Noodle: A NASA search engine, like Google

High-Level Project Summary

Data, and information about data, can be found in many different places, and it can be challenging for researchers to get an overview and to access data easily. We want to remedy this by offering a single source of truth for research data, through an easy-to-use interface, where users can design their own queries and be presented with datasets, articles, and technical papers relevant to their specific needs. By leveraging a Labeled Property Graph, search queries are made fast. The graph also reveals interesting connections between datasets that would otherwise go unnoticed, which adds further value to the research process.

Detailed Project Description


The project



Our solution is a search engine that makes it easy to find data in NASA datasets. It has three search inputs, so that one can search by taxonomy, SKOS, or keywords. The results show the datasets that match the search words. By clicking on a result, we see a visualization of how the different data in the dataset is related, drawn from a database with over 80K nodes and 301K relationships!
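As a rough illustration of how the three search inputs could be combined into a single graph query, here is a minimal sketch. The node labels (`Dataset`, `Taxonomy`, `Concept`, `Keyword`), the relationship types, and the property names are assumptions made for this example and may not match the actual data model. The function only builds a parameterized Cypher string, so it runs without a database.

```python
def build_search_query(taxonomy=None, skos=None, keyword=None, limit=25):
    """Build a parameterized Cypher query from whichever inputs are filled in.

    All labels, relationship types and property names are hypothetical.
    """
    clauses = []
    params = {"limit": limit}

    if taxonomy:
        # Exact match on a taxonomy code such as "TX08.1.4".
        clauses.append(
            "MATCH (d:Dataset)-[:CLASSIFIED_AS]->(:Taxonomy {code: $taxonomy})"
        )
        params["taxonomy"] = taxonomy
    if skos:
        # Case-insensitive substring match on a SKOS concept label.
        clauses.append(
            "MATCH (d:Dataset)-[:ABOUT]->(c:Concept)\n"
            "WHERE toLower(c.prefLabel) CONTAINS toLower($skos)"
        )
        params["skos"] = skos
    if keyword:
        # Case-insensitive substring match on a keyword.
        clauses.append(
            "MATCH (d:Dataset)-[:HAS_KEYWORD]->(k:Keyword)\n"
            "WHERE toLower(k.name) CONTAINS toLower($keyword)"
        )
        params["keyword"] = keyword

    if not clauses:
        raise ValueError("at least one search input is required")

    # Every MATCH reuses the variable d, so the filters combine as a logical AND.
    clauses.append("RETURN DISTINCT d.title AS title\nORDER BY title\nLIMIT $limit")
    return "\n".join(clauses), params


if __name__ == "__main__":
    query, params = build_search_query(taxonomy="TX08.1.4", keyword="Mars")
    print(query)
    print(params)
```

Passing the search words as parameters rather than pasting them into the query text keeps user input out of the Cypher string itself.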


This will enable people in STEAM education and professional researchers to find information much more easily.


As an extra feature, we included a view from Google Scholar that shows articles and papers relevant to the searched word or dataset.


A video of the project





  • Note: Download the client folder and open index.html with a live server to see an example of an interactive visualization! The example shows projects, keywords, and datasets related to the taxonomy code "TX08.1.4".
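To give an idea of the kind of data such a visualization might consume, the sketch below converts a list of relationships into the nodes-and-links JSON shape commonly fed to a d3.js force layout. That the client uses this exact shape is an assumption, and the identifiers other than "TX08.1.4" are made up for illustration.

```python
import json


def to_node_link(edges):
    """Convert relationship tuples into a {"nodes": [...], "links": [...]} dict.

    Each edge is (source_id, source_label, rel_type, target_id, target_label).
    """
    nodes = {}
    links = []
    for source_id, source_label, rel_type, target_id, target_label in edges:
        # Register each node once, however many relationships it takes part in.
        nodes.setdefault(source_id, {"id": source_id, "label": source_label})
        nodes.setdefault(target_id, {"id": target_id, "label": target_label})
        links.append({"source": source_id, "target": target_id, "type": rel_type})
    return {"nodes": list(nodes.values()), "links": links}


# Hypothetical relationships around the taxonomy code from the example.
sample_edges = [
    ("example-project-1", "Project", "CLASSIFIED_AS", "TX08.1.4", "Taxonomy"),
    ("example-project-1", "Project", "HAS_KEYWORD", "example-keyword", "Keyword"),
]

graph_json = to_node_link(sample_edges)

if __name__ == "__main__":
    print(json.dumps(graph_json, indent=2))
```

Links refer to nodes by `id`, which d3's link force can resolve when it is given an id accessor.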


Picture 0: Our visualization of the data




Project background


In order for the search engine to work, we developed a data model that joins together taxonomy, SKOS, and keywords (and therefore joins data from different datasets). We then parsed the data with Python and seeded a graph database (Neo4j) based on the data model. By using Neo4j, we could see the data as nodes and relationships, which is specified as one of the challenge's objectives.
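The idea of joining datasets through shared taxonomy, SKOS, and keyword nodes can be sketched with a tiny in-memory stand-in for a labeled property graph. This is not the Neo4j data model itself: the class, the labels, the relationship types, and the sample records below are all made up for illustration.

```python
class PropertyGraph:
    """Tiny in-memory stand-in for a labeled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}     # (label, key) -> property dict
        self.edges = set()  # (source_id, relationship_type, target_id)

    def merge_node(self, label, key, **props):
        """Create the node if it is missing, otherwise update its properties."""
        node_id = (label, key)
        self.nodes.setdefault(node_id, {}).update(props)
        return node_id

    def relate(self, source, rel_type, target):
        self.edges.add((source, rel_type, target))

    def linked_through_shared_nodes(self, node_id):
        """Return the nodes that point at any node `node_id` also points at."""
        targets = {t for s, _, t in self.edges if s == node_id}
        return sorted({s for s, _, t in self.edges if t in targets and s != node_id})


graph = PropertyGraph()

# Made-up records from two different sources.
dataset = graph.merge_node("Dataset", "example-dataset-1", source="data.nasa.gov")
other = graph.merge_node("Dataset", "example-dataset-2", source="data.nasa.gov")
project = graph.merge_node("Project", "example-project-1", source="TechPort")

# Shared vocabulary nodes are what join the sources together.
mars = graph.merge_node("Keyword", "mars")
ocean = graph.merge_node("Keyword", "ocean")
taxonomy = graph.merge_node("Taxonomy", "TX08.1.4")

graph.relate(dataset, "HAS_KEYWORD", mars)
graph.relate(project, "HAS_KEYWORD", mars)
graph.relate(project, "CLASSIFIED_AS", taxonomy)
graph.relate(other, "HAS_KEYWORD", ocean)

related = graph.linked_through_shared_nodes(dataset)

if __name__ == "__main__":
    print(related)
```

Because both records attach to the same keyword node, the dataset and the project become connected even though they come from different sources.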



Picture 1: Our whiteboard when trying to figure out how to tackle the challenge.





Picture 2: The Data Model we developed.



Picture 3: Snippet of Neo4j database, where three datasets are connected.




Picture 4: A graph that shows all nodes and relationships of NASA's taxonomies (made in Neo4j)







Picture 5: Maybe the most important picture of them all: the ontology we have built in Neo4j, with all of the data sources we used connected to each other (notice the number of nodes and relationships in the graph: over 80K nodes and 301K relationships!). This enables us to find information much more easily.



Next step


A possible next step would be to implement this proof of concept in a web application built in Vue.js and d3.js.


We believe the data from Neo4j, with its nodes and relationships, should be accessible to others, so that anyone could create a smart solution with the data as a source, maybe a very different search engine or a visualization. This could then be distributed as an API service.

Space Agency Data

Simple Knowledge Organization System (SKOS) files from NASA have been used as part of the overall ontology in the Labeled Property Graph.


The datasets listed in data.nasa.gov/data.json have been implemented as a first dataset in the Labeled Property Graph.


The JSON data and the SKOS files are read programmatically via REST API calls and then parsed in Python. The parsed data is then pushed to the Neo4j database, from where it is presented in the user interface (website).
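The parse-and-push step might look roughly like the sketch below. It assumes the catalog has a top-level "dataset" list whose entries carry "identifier", "title", and "keyword" fields. The sample record, the `Dataset` and `Keyword` labels, and the `HAS_KEYWORD` relationship are made up for illustration. To stay runnable offline, it parses an inline JSON string and only yields Cypher statements with their parameters, without calling the REST API or the database.

```python
import json

# Made-up sample in the assumed catalog shape (not real NASA data).
SAMPLE_CATALOG = """
{
  "dataset": [
    {
      "identifier": "example-001",
      "title": "Example dataset",
      "keyword": ["Mars", " rover "]
    }
  ]
}
"""

# MERGE creates a node or relationship only if it does not exist yet,
# so the seeding can be re-run without creating duplicates.
MERGE_DATASET = (
    "MERGE (d:Dataset {identifier: $identifier}) "
    "SET d.title = $title"
)
MERGE_KEYWORD = (
    "MATCH (d:Dataset {identifier: $identifier}) "
    "MERGE (k:Keyword {name: $keyword}) "
    "MERGE (d)-[:HAS_KEYWORD]->(k)"
)


def catalog_to_statements(catalog):
    """Yield (cypher, params) pairs for every dataset and keyword in the catalog."""
    for record in catalog.get("dataset", []):
        identifier = record.get("identifier")
        if not identifier:
            continue  # skip records that cannot be keyed
        yield MERGE_DATASET, {"identifier": identifier, "title": record.get("title", "")}
        for raw in record.get("keyword", []):
            keyword = raw.strip().lower()  # normalize so "Mars" and "mars" share a node
            if keyword:
                yield MERGE_KEYWORD, {"identifier": identifier, "keyword": keyword}


statements = list(catalog_to_statements(json.loads(SAMPLE_CATALOG)))

if __name__ == "__main__":
    for cypher, params in statements:
        print(cypher, params)
```

Each pair could then be handed to a database session, for example with the Neo4j Python driver's `session.run(cypher, params)`.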

Hackathon Journey

We applied to the NASA Space Apps Challenge because we saw an interesting opportunity to learn something new. We felt that the ontology challenge was a good fit since we have some experience with this from other industries. It is always interesting to apply knowledge from one field in a new context.


During the weekend we learned many new things about NASA datasets, APIs, and taxonomies/ontologies. We also gained a better understanding of the problems researchers face when trying to find relevant data for their projects, as well as better insight into the possibilities of connected research data.

In addition to the above, this weekend has been a fun journey with many laughs and maybe a bit too much junk food ;)

References

  • data.nasa.gov/data.json
  • SKOS
  • https://api.nasa.gov/ (TechPort and taxonomy)
  • Neo4j (as database)
  • Python (for scripting)
  • Vue.js (for front-end development)

Tags

#d3 #graph #neo4j #ontology #search #datasets

Global Judging

This project has been submitted for consideration during the Judging process.