High-Level Project Summary
Our team developed a concept for a machine learning model that combines satellite data with a deep neural network to estimate a risk factor for possible landslides. This risk factor is displayed on a simple website as an overlay on a global map, highlighting the zones where the current predicted risk is high. The service lets anyone with internet access check whether a zone has become dangerous and could be hit by a landslide.
Link to Project "Demo"
Link to Final Project
Detailed Project Description
Data gathering is one of the most important parts of a project of this kind, so we spent many hours identifying which features are highly correlated with the occurrence of landslides.
Based on many scientific articles, we identified five main features that could be valuable for our project:
- Vegetation density
- Terrain temperature
- Daily rainfall
- Cumulative rainfall
- Terrain slope
Although a few more features could be used (e.g. soil moisture, type of terrain), we chose these as our main features, both because the data are easy to read and because they correlate strongly with landslides.
These features had to be processed in different ways, since not all of the values were directly available, and this took a lot of time because of the sheer size of the datasets.
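As a small illustration of this kind of processing, a cumulative rainfall value can be derived from a series of daily totals with a trailing sum. This is only a sketch: the function name and the window length are hypothetical, and the write-up does not say which accumulation period was actually used.

```python
def cumulative_rainfall(daily_mm, window_days):
    """Trailing sum of daily rainfall (mm) over the last `window_days` days.

    For the first days, where fewer than `window_days` values exist,
    the sum covers only the days available so far.
    """
    totals = []
    running = 0.0
    for i, value in enumerate(daily_mm):
        running += value
        # Drop the day that has just fallen out of the window.
        if i >= window_days:
            running -= daily_mm[i - window_days]
        totals.append(running)
    return totals
```

For example, a 2-day window over daily values of 1, 2, 3 and 4 mm gives totals of 1, 3, 5 and 7 mm.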
We used a landslide repository and selected a subset of its entries, as we decided to focus on a smaller region (between 40 and 50 degrees North and between 115 and 125 degrees West, which covers the north-western United States). More landslides have been catalogued there, so this choice increased the number of events while reducing the amount of data required.
We also decided to focus on landslides from the last 15 years, as earlier entries had partially missing data, and to use only landslides related to precipitation, in order to maximize the correlation with our features.
This led to a dataset of only 706 events, which became highly imbalanced once entries that did not lead to landslides were added (5000 as a first test).
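The selection criteria above (region, recency, precipitation-related trigger) can be sketched as a simple filter. The record fields, the trigger labels, and the cutoff date passed by the caller are all assumptions made for illustration; the actual catalog columns and values may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LandslideEvent:
    latitude: float    # degrees, north positive
    longitude: float   # degrees, east positive (so west is negative)
    event_date: date
    trigger: str       # hypothetical trigger label


# Study region from the text: 40 to 50 degrees North, 115 to 125 degrees West.
LAT_MIN, LAT_MAX = 40.0, 50.0
LON_MIN, LON_MAX = -125.0, -115.0

# Assumed labels that count as precipitation-related triggers.
RAIN_TRIGGERS = {"rain", "downpour", "continuous_rain"}


def select_events(events, earliest):
    """Keep events inside the region, on or after `earliest`, triggered by rain."""
    return [
        e for e in events
        if LAT_MIN <= e.latitude <= LAT_MAX
        and LON_MIN <= e.longitude <= LON_MAX
        and e.event_date >= earliest
        and e.trigger in RAIN_TRIGGERS
    ]
```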
Due to this situation, we decided to apply a combination of "oversampling" and "weighted loss":
- Oversampling: the positive entries (the landslides) were added to the dataset multiple times, to reduce the imbalance between positive and negative entries.
- Weighted loss: the loss is heavier when the error is a misclassified landslide than when it is a misclassified non-landslide.
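The two techniques can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the project: the function names, the duplication factor, and the positive-class weight are assumptions, and binary cross-entropy is assumed as the underlying loss.

```python
import math
import random


def oversample(samples, labels, factor, seed=0):
    """Return shuffled (sample, label) pairs with each positive repeated `factor` times."""
    pairs = []
    for x, y in zip(samples, labels):
        copies = factor if y == 1 else 1
        pairs.extend([(x, y)] * copies)
    random.Random(seed).shuffle(pairs)
    return pairs


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy where errors on positives count `pos_weight` times."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

With `pos_weight` above 1, predicting 0.1 for a real landslide is penalized more than predicting 0.9 for a non-landslide. When oversampling, duplicates should be created only after the validation split, otherwise copies of the same event can end up on both sides of it.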
The model, which we initially set up as a dense neural network (16-32-16 architecture), is trained as a binary classifier (1 = certain landslide, 0 = no landslide). A website then shows the results on a global map, combining the most recent satellite data with a real-time weather forecast obtained through a Web API. The risk factor is plotted as an overlay with a colorbar going from green (0) to red (1), passing through yellow and orange for low and medium risk respectively.
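To make the path from features to overlay colour concrete, here is a rough sketch of a 16-32-16 dense network followed by a colour mapping. Several details are assumptions not stated in the write-up: five inputs (one per feature), ReLU hidden activations, a sigmoid output, evenly spaced colour stops, and the exact RGB values. The weights are random and untrained, so the output only illustrates the shapes involved.

```python
import math
import random

# Five inputs, hidden layers of 16, 32 and 16 units, one output.
LAYER_SIZES = [5, 16, 32, 16, 1]

# Colour stops for the overlay; positions and RGB values are assumptions.
COLOR_STOPS = [
    (0.0, (0, 128, 0)),        # green
    (1 / 3, (255, 255, 0)),    # yellow
    (2 / 3, (255, 165, 0)),    # orange
    (1.0, (255, 0, 0)),        # red
]


def init_params(seed=0):
    """Random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def predict_risk(features, params):
    """Forward pass: ReLU on hidden layers, sigmoid on the single output."""
    activations = list(features)
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        activations = [sigmoid(v) if i == last else max(0.0, v) for v in z]
    return activations[0]


def risk_to_rgb(risk):
    """Linearly interpolate between the colour stops; risk is clamped to [0, 1]."""
    risk = min(max(risk, 0.0), 1.0)
    for (r0, c0), (r1, c1) in zip(COLOR_STOPS, COLOR_STOPS[1:]):
        if risk <= r1:
            t = (risk - r0) / (r1 - r0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
    return COLOR_STOPS[-1][1]
```

A call such as `risk_to_rgb(predict_risk(features, params))` would then give the overlay colour for one map cell, assuming `features` holds the five normalized feature values.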
Space Agency Data
The NASA Earth Observations site provided data on vegetation distribution (Normalized Difference Vegetation Index) and terrain temperature, both collected by MODIS.
From the JAXA Global Rainfall Watch we obtained data on daily and cumulative rainfall.
From the ALOS World 3D project (JAXA) we obtained the elevation of the region of interest, from which we derived the approximate terrain slope of each point.
From the Cooperative Open Online Landslide Repository (NASA) we obtained a list of landslides.
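Deriving an approximate slope from the elevation data can be sketched with central differences on a regular grid. This is one common approach and not necessarily the one used in the project. It assumes the grid spacing has already been converted to metres and is the same in both directions; the function name is hypothetical.

```python
import math


def slope_degrees(elevation, cell_size):
    """Approximate slope in degrees for interior cells of an elevation grid.

    elevation: list of rows of heights in metres.
    cell_size: grid spacing in metres (assumed equal along rows and columns).
    Border cells have no neighbour on one side and are returned as None.
    """
    rows, cols = len(elevation), len(elevation[0])
    slope = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dz_dx = (elevation[i][j + 1] - elevation[i][j - 1]) / (2 * cell_size)
            dz_dy = (elevation[i + 1][j] - elevation[i - 1][j]) / (2 * cell_size)
            # Slope angle is the arctangent of the gradient magnitude.
            slope[i][j] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope
```

A surface that rises 30 m for every 30 m cell in one direction comes out at 45 degrees, and a flat surface at 0 degrees.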
Hackathon Journey
This experience taught us a great deal about the real dangers of landslides and how important it is to make information freely available to everyone.
On the technical side, we experienced how a real-life project is developed and, most importantly, how to use the abilities of every team member to overcome the obstacles in a project like this.
While we initially chose this challenge because of the machine learning involvement, we came closer to the topic of hazards and grew attached to the project, which motivated us to work long and hard and give our best.
This experience was one of a kind, thanks also to our local mentor and judges, the organizers, and the other participants, who brought inspiring work to the table.
References
Landslides | NASA Global Precipitation Measurement Mission
U.S. Landslide Inventory (arcgis.com)
JAXA Global Rainfall Watch (GSMaP)
Index of /archive/csv (nasa.gov)
(PDF) Tectonic geomorphology: Landslides limit mountain relief (researchgate.net)
Global Landslide Catalog Downloadable Products Gallery (nasa.gov)
World Internet Users Statistics and 2021 World Population Stats (internetworldstats.com)
Vegetation Index [NDVI] (16 day - Terra/MODIS) | NASA
Tags
#machinelearning #landslides #prevention #deforestation
Global Judging
This project has been submitted for consideration during the Judging process.

