DeepRoot: Let's have fun finding relationships between NASA datasets while saving time! Let's go!!

High-Level Project Summary

How much time do you spend looking for the right dataset? Finding related NASA datasets takes unnecessarily long, which makes searching an overwhelming experience. DeepRoot is a friendly and inclusive web interface that allows deep exploration by both scientific and amateur audiences, so they can carry out effective research. Using word embeddings and the information available in the metadata, such as the description, subject, title, and tags, we analyze the semantic similarity of the datasets in order to generate connections between them. These connections are displayed in an innovative interactive graph, which facilitates access to information and suggests similar datasets.

Detailed Project Description

Why did we choose it? Introducing the challenge

We have all heard at least once that doing scientific research online is tedious. But is the search process equally difficult and time-consuming for high school students, university students, and scientists? Is the research they do effective? Can we offer an inclusive website that lets any audience find significant sources based on the semantic similarities of datasets, so that it is easy to contrast information, while the interactive interface motivates and empowers their enthusiasm?

First, we need to discuss: 

The aptitudes of digital natives versus digital immigrants, and the implications these have for finding the correct sources while doing online research.

It is true that students who are digital natives have tech aptitudes that differ from those of digital immigrants, but can we say they are better?

Research is not easy, but since online research became an option, the Internet has made it seem deceptively easy.

"Research requires students to read, interpret, and analyze new information, reshape their research question, and start again. This kind of sustained focus on a challenging task is very hard for most students" (1)

Digital natives adapt faster to challenges that involve technology, for example doing research for a science project, but we cannot be sure that the information they find is accurate, up to date, and validated. If this information is used as the foundation of the investigation, the result will not:

1) be approved by their teachers, because there is no theoretical basis

2) have any meaning, as the purpose of scientific research is to seek the truth and comprehend phenomena.

"When researching online, students unsuccessfully scan pages of text as opposed to reading those pages of text for comprehension. Therefore, they cannot tell whether or not the source they are looking at, is applicable to their research question." (2)

"(...) Digital natives have the world of information at their fingertips, for some reason they are often unable to take basic problem-solving skills and apply them to simple online research. Students today are accustomed to instant gratification, and therefore can be overwhelmed by tasks that require time-consuming research." (1)

Digital immigrants, on the other hand, take much more time to find sources that convince them, because they are used to making connections and checking them across different sources. They spend so long doing it that in the end it only fatigues them and strains their eyes.

The online search turns out to be really overwhelming and in most cases paralyzes the scientific spirit, keeping researchers from doing their best and preventing a great outcome.

What's DeepRoot, what exactly does it do, and what benefits does it have?

We propose to develop a friendly and inclusive web interface that allows deep exploration by both scientific and amateur audiences, so they can carry out effective research.

The prototype can be found at: https://www.figma.com/proto/mlrHqxNRsuGkOKiiP0v3iV/Untitled?page-id=0%3A1&node-id=4%3A202&viewport=241%2C48%2C0.67&scaling=min-zoom&starting-point-node-id=1%3A2


DeepRoot facilitates access to information and suggests similar datasets. Using the information available in the metadata, such as the description, subject, title, and tags, we analyze the semantic similarity of the datasets in order to generate connections between them.

These connections are displayed in an innovative interactive network graph that makes it easy to visualize how datasets relate to each other and to find sources based on their semantic similarities, so it is easy to contrast information. This makes a significant contribution in two ways: 1) users understand the topic better and in less time, and 2) the outcome of the investigation is more relevant.
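As a rough illustration of the data behind such a network graph, here is a minimal sketch that turns scored dataset pairs into an undirected adjacency map and reads suggestions from it. The dataset names and scores are hypothetical, the function names are ours, and the 0.95 cutoff is the one given in the "How does it work?" section; this is not the prototype's actual code.

```python
from collections import defaultdict


def build_graph(scored_pairs, threshold=0.95):
    """Build an undirected graph: one edge per pair scoring above the threshold."""
    graph = defaultdict(set)
    for first, second, score in scored_pairs:
        if score > threshold:
            graph[first].add(second)
            graph[second].add(first)
    return graph


def suggestions(graph, dataset):
    """Return the datasets directly connected to `dataset`, in alphabetical order."""
    return sorted(graph.get(dataset, set()))


# Hypothetical dataset names and similarity scores, for illustration only.
pairs = [
    ("sea-ice-extent", "arctic-ice-thickness", 0.97),
    ("sea-ice-extent", "mars-rover-images", 0.41),
    ("arctic-ice-thickness", "polar-albedo", 0.96),
]

graph = build_graph(pairs)
print(suggestions(graph, "arctic-ice-thickness"))
```

An adjacency map like this can be handed to most graph-drawing libraries as a list of nodes and edges; a dataset with no edge above the cutoff simply gets no suggestions.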

How does it work? What tools, coding languages or software did you use to develop your project?

Using word embeddings, the similarity between each pair of datasets is calculated on a scale from 0 to 1. If the similarity is greater than 0.95 (95%), we consider the two datasets to contain related information.
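The thresholding step can be sketched as follows. This is a minimal example under stated assumptions: each dataset is assumed to be already represented by a single embedding vector, the score is assumed to be cosine similarity (the text does not name the measure), and negative values are clipped to 0 so the score stays on the 0 to 1 scale. The vectors and names below are hypothetical.

```python
import math

THRESHOLD = 0.95  # from the text: a score above 95% means "related"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, clipped to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    # Clipping is an assumption made to match the 0-to-1 scale in the text.
    return max(0.0, min(1.0, dot / (norm_a * norm_b)))


def are_related(a, b, threshold=THRESHOLD):
    """Two datasets count as related when their score exceeds the threshold."""
    return cosine_similarity(a, b) > threshold


# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
sea_ice = [0.90, 0.10, 0.30]
arctic_ice = [0.88, 0.12, 0.33]
mars_rover = [0.10, 0.90, 0.20]

print(are_related(sea_ice, arctic_ice))   # vectors point almost the same way
print(are_related(sea_ice, mars_rover))   # vectors point in different directions
```

A strict cutoff such as 0.95 keeps only very close matches, which limits the number of edges in the graph at the cost of missing looser relationships.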

What do we hope to achieve?

Our prototype runs a word embedding model trained for general purposes. We seek to develop a model trained for technical and scientific use so that, by learning from the NASA metadata, it achieves higher precision.

The next goal is to calculate the similarity of all the available datasets. Since we had neither enough computing power nor enough time, we calculated it for a sample of 10,000 datasets. One of our ultimate aspirations is to use this model to make connections with other institutions' datasets, so that the universe of information becomes bigger.
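To give a sense of why computing power was the limit, here is a small arithmetic sketch. It assumes every dataset is compared once with every other dataset (an all-pairs approach, which is our assumption about the method), so the number of comparisons grows with the square of the number of datasets.

```python
def pair_count(n):
    """Number of unordered pairs among n datasets: n * (n - 1) / 2."""
    return n * (n - 1) // 2


sample_pairs = pair_count(10_000)   # 49,995,000 comparisons for the sample
doubled_pairs = pair_count(20_000)  # 199,990,000 comparisons for twice as many

print(sample_pairs, doubled_pairs)
```

Doubling the number of datasets roughly quadruples the work, which is why scaling to the full catalog needs either more computation or a way to avoid comparing every pair.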


Space Agency Data

  • https://open.nasa.gov/
  • https://github.com/nasa/dictionaries/tree/master/thesauri/STI
  • https://catalog.data.gov/dataset/nasa-data-json
  • http://webarchive.loc.gov/all/20111207224704/http:/nasataxonomy.jpl.nasa.gov/fordevelopers/
  • https://data.nasa.gov/browse
  • https://nasa.github.io/data-nasa-gov-frontpage/data_visualizations.html
  • https://www.data.gov/

Hackathon Journey

When we first heard the word "Ontology", we were afraid of not having enough knowledge to face the challenge, but with the selection of resources that NASA provided, we were able to understand the concept and the related terminology and start brainstorming.

The Space Apps experience gave us, first of all, the chance to meet a community that is open and eager to challenge itself and reimagine the future. The networking has been a gift that will help us in our next projects and help us grow academically.

We would like to thank the Peruvian committee of the event in Lima, who were ready to answer all our questions, and NASA SPACE CENTER for providing such useful resources and launching this challenge, which brings together, without prejudice, all who share a love for science.

References

NASA resources:

  • https://ntrs.nasa.gov/api/citations/20140003144/downloads/20140003144.pdf
  • https://www.youtube.com/watch?v=2NYYYUG-zMc&list=PL37Yhb2zout05pUjr7OoRFpTNroq_wd9f&index=1

External resources:

  • (1) https://blogs.scientificamerican.com/guest-blog/whats-so-hard-about-research/
  • (2) https://blogs.scientificamerican.com/guest-blog/being-a-digital-native-isnt-enough/

**All the images displayed were obtained from the model we built.

Tags

#NASA, #WordEmbbeding, #VectorsOfSimilarities, #research, #improvements, #Promoviendounaculturacientífica, #TheoryOfGraphs, #2021, #MakingImpact

Global Judging

This project has been submitted for consideration during the Judging process.