ELCA_data

Zaragoza | Ontologies and Interactive Network Visualizations

Connections with the data universe

High-Level Project Summary

Astronomical information is gathered by multiple agencies around the world. However, its volume is growing rapidly and there is no consensus on its taxonomy. This makes specific information difficult to find, and time is inevitably wasted. For that reason, our goal is to find relationships among the datasets, APIs, etc. released by different agencies and to visualize those relationships in a useful, interactive dashboard. We use techniques such as network graphs, text mining, clustering, and NLP to reach this goal. As a result, private and public researchers will more easily find what they are looking for and can continue their investigations.

Link to Project "Demo"

https://drive.google.com/drive/folders/1oDOLO0M2HkRY11LV7DCpxiJ2NSULUJ7Y?usp=sharing

Link to Final Project

https://github.com/luismanriqueruiz/space_apps_challenge/tree/main/2021

Detailed Project Description

General Schema

Data Collection

The data is first collected from multiple sources: NASA’s taxonomy website (1), which contains SKOS files, and NASA’s APIs (2). The latter provide several types of information in different formats such as JSON, CSV, and JPEG.

Power BI Solution

The BI solution contains two parts. The first is data integration, analytics, and modelling; in this step (3) we integrated Python to analyze large texts using text mining and NLP techniques. The second is uploading the solution to the Power BI service for visualization.

Dashboard

The dashboard gathers and combines information from multiple taxonomy SKOS files and APIs. First, it presents the relationships from NASA’s taxonomy. Narrower and broader relationships between terms are shown as network graphs.
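As an illustration of how narrower and broader relationships can be turned into graph edges, here is a minimal sketch. It assumes the SKOS file is RDF/XML with skos:Concept elements, and it uses Python's standard xml.etree module as a stand-in for the rdflib-based processing in the project. The three-concept taxonomy and the function names broader_edges and narrower_index are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical three-concept taxonomy; real SKOS files are far larger.
SAMPLE_SKOS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/planets">
    <skos:prefLabel>Planets</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/mars">
    <skos:prefLabel>Mars</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/planets"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/venus">
    <skos:prefLabel>Venus</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/planets"/>
  </skos:Concept>
</rdf:RDF>"""


def broader_edges(skos_xml):
    """Return (narrower_label, broader_label) pairs from SKOS RDF/XML."""
    root = ET.fromstring(skos_xml)
    labels = {}
    links = []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        label = concept.find(f"{{{SKOS}}}prefLabel")
        labels[uri] = label.text if label is not None else uri
        for broader in concept.findall(f"{{{SKOS}}}broader"):
            links.append((uri, broader.get(f"{{{RDF}}}resource")))
    # Replace URIs with labels where a label is known.
    return [(labels.get(a, a), labels.get(b, b)) for a, b in links]


def narrower_index(edges):
    """Invert the edges: map each broader term to its narrower terms."""
    index = defaultdict(list)
    for narrow, broad in edges:
        index[broad].append(narrow)
    return dict(index)


edges = broader_edges(SAMPLE_SKOS)
print(edges)
print(narrower_index(edges))
```

Each pair is one edge of the network graph, so the resulting list can be loaded as an edge table by a visualization tool.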

In the dashboard, the user or researcher can look up specific terms or concepts using the “search by Term” and “search by concept” fields. All the data comes from multiple sources such as SKOS files, datasets, and APIs, in several formats such as JSON, CSV, and PNG.

The user’s search updates the visualization, and the filtered data and graphs are shown. The user is informed of the related words and their frequencies (word cloud). The information in the word cloud comes not only from single fields but from long texts as well. These texts are processed with text mining and NLP techniques that allow us to build a document-term matrix, remove stop words, analyze distances between texts, etc.
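The text-processing steps mentioned above (stop-word removal, a document-term matrix, and distances between texts) can be sketched as follows. This is a simplified standard-library stand-in, not the project's sklearn and nltk pipeline. The stop-word list, the sample sentences, and the choice of cosine distance are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, 1 means no shared terms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical descriptions, not taken from any NASA dataset.
docs = [
    "The rover landed on Mars",
    "A rover is exploring the surface of Mars",
    "Solar flares and the Sun",
]
vocab, matrix = document_term_matrix(docs)
# Word frequencies across all texts: the raw input for a word cloud.
frequencies = Counter(w for d in docs for w in tokenize(d))
print(vocab)
print(frequencies.most_common(2))
print(round(cosine_distance(matrix[0], matrix[1]), 3))
```

The two rover texts share terms and end up closer to each other than to the solar text, which is the kind of signal a clustering step can build on.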

The benefits of this dashboard are that the user will easily understand how specific terms are related and how to find this information inside NASA’s datasets. It is also easy to use.

By creating this software, we want to make astronomical data much easier to understand through multiple charts. Another useful aspect is that the software automatically creates a URL query for users who want to delve into NASA’s information.
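A URL query of this kind could be generated along the following lines. This is a minimal sketch that assumes the target is the search endpoint of the NASA Image and Video Library, with its q and media_type parameters. The helper name build_query_url is hypothetical, and the dashboard may build its links differently.

```python
from urllib.parse import urlencode

# Assumed target: the public search endpoint of the NASA Image and Video Library.
BASE_URL = "https://images-api.nasa.gov/search"


def build_query_url(term, media_type=None):
    """Build a search URL for a term; parameters are URL-encoded."""
    params = {"q": term}
    if media_type:
        params["media_type"] = media_type
    return f"{BASE_URL}?{urlencode(params)}"


print(build_query_url("solar flares", media_type="image"))
```

Using urlencode rather than string concatenation keeps search terms with spaces or special characters valid inside the URL.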

We used Power BI Desktop to create the dashboard and its visualizations. We also used Python 3.9.7 as a backend system to analyze the APIs and retrieve results from them in an automated way. Programming also let us find relationships between their fields and apply text mining and NLP techniques.

Space Agency Data

We use NASA’s information, which comes from multiple sources, as follows:

APIs:

APOD
Asteroids - NeoWs
DONKI
Earth
EONET
EPIC
Exoplanet archive
GeneLab Public API
InSight: Mars Weather Service API
Mars Rover Photos
NASA Image and Video Library
TechTransfer
SBDB Close-Approach Data API
Techport
TLE API
Vesta/Moon/Mars Trek WMTS

We also use NASA Taxonomy 2.0.

To understand the complexity of this vast collection of information, we explored each of the APIs listed above. By doing this (working as developers), we could:

Collect multiple samples
Check different fields and relationships
Consider how standardization and normalization are possible.

The different types of data, such as JSON structures, tables, and pictures, made us think further about how we can help users and researchers access any type of astronomical information.
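One way to check fields and relationships across API samples is to flatten each JSON response and compare the field names. The sketch below assumes that approach. The two sample responses and their field names are hypothetical and do not reproduce any real NASA API schema.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def shared_fields(samples):
    """Return the field names present in every flattened sample."""
    field_sets = [set(flatten(s)) for s in samples.values()]
    return set.intersection(*field_sets) if field_sets else set()


# Hypothetical responses; the field names are illustrative only.
samples = {
    "api_a": json.loads('{"title": "Nebula", "date": "2021-10-02", "media": {"type": "image"}}'),
    "api_b": json.loads('{"title": "Flare", "date": "2021-10-03", "source": {"id": "X1"}}'),
}
print(sorted(shared_fields(samples)))
```

Fields that appear in every sample are natural candidates for join keys or for a common, normalized schema.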

Hackathon Journey

The experience of working on this hackathon has been terrific. Fortunately, although the team has few members, they know what they want, and drawing on their specific experience and knowledge makes the discussion much richer.

Another interesting aspect is that the two members are in different countries. This allows them to work in different time zones, so one can take a break while the other continues working.

As a team we learned about the importance of taxonomies and ontologies, not only for NASA but for all the information collected around the world. There needs to be a global consensus on how to handle and name the data, or specific organizations that can clean and organize it.

Our team consists of two people who have worked with astronomical information and are passionate about the subject. Our friendship also grew stronger, and we now know more about each other’s capabilities and personalities. Mutual respect and fast, clear communication about innovative ideas, setbacks, and challenges were very important to us. It helped us appreciate how the other person could see and solve an issue in an easier way.

After discussing our experiences, knowledge, and capabilities, we shortlisted a couple of challenges. We chose this one because we have worked in the past with databases, file structures, visualizations, etc., and we are passionate about tidying information and making it more accessible to everyone.

We knew that visualizations are important for understanding the relationships between datasets, so we focused on this point. Another relevant topic is network graphs, which help us not only to see relationships but also to find clusters. Finally, text mining and NLP techniques were important in this project because we also delved into long texts and tried to find specific clusters and common topics in the data.

We would like to thank Zaragoza’s board members for their comments and for fueling the discussion that led to novel perspectives. We also thank our families, who know that we are passionate about this topic and that, in some way, we want to contribute an idea of how to solve this challenge for humanity.

References

Power BI Desktop 2.97

Python 3.9.7: pandas, json, rdfpandas, rdflib, matplotlib, sklearn, nltk

Global Judging

This project has been submitted for consideration during the Judging process.