GitHub Projects

Big Data: A Cassandra DB for geopolitical data (GDELT)

The GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes … in the entire world. With new files uploaded every 15 minutes, the GDELT databases contain more than 500 GB of zipped data for 2018 alone.

In a group project, we built a resilient NoSQL (Cassandra) database architecture on EC2 instances. The data processing pipeline was developed in Spark with Scala, and the visualization was done in Zeppelin notebooks.

See GitHub page: https://github.com/AnthonyHoudaille/Cassandra-GDELT-Queries-AWS

Predicting the predominant kind of tree (Kaggle)

In this challenge, I try to predict the forest cover type (the predominant kind of tree cover) from strictly cartographic variables (as opposed to remotely sensed data).

See GitHub page: https://github.com/AnthonyHoudaille/Kaggle-Forest-Type

Estimating a position from a received signal strength for IoT sensors

Smart devices such as IoT sensors use low-power networks such as those provided by Sigfox or LoRa. Without GPS, however, it becomes harder to estimate the position of the sensor. The aim of this study is to estimate the geolocation of low-consumption connected devices on the Sigfox network using the Received Signal Strength Indicator (RSSI). State-of-the-art models are precise to the nearest kilometer in urban areas, and to around ten kilometers in less populated areas.
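To give a rough idea of how signal strength can be turned into a position, here is a minimal sketch of a classic baseline, the RSSI-weighted centroid. It is not the model used in the project. The function names, the input format (latitude, longitude, RSSI in dBm for each base station that received the message) and the sample coordinates are assumptions made for illustration only.

```python
def rssi_to_weight(rssi_dbm):
    """Convert an RSSI value in dBm to a linear power weight (milliwatts)."""
    return 10 ** (rssi_dbm / 10)


def weighted_centroid(observations):
    """Estimate a device position as the RSSI-weighted centroid of receivers.

    observations: iterable of (latitude, longitude, rssi_dbm) tuples, one per
    base station that received the message (hypothetical input format).
    Averaging latitudes and longitudes directly is only reasonable over
    small areas, away from the poles and the antimeridian.
    """
    observations = list(observations)
    if not observations:
        raise ValueError("at least one observation is required")

    weights = [rssi_to_weight(rssi) for _, _, rssi in observations]
    total = sum(weights)

    # Stations that heard the device more strongly pull the estimate closer.
    est_lat = sum(w * obs[0] for w, obs in zip(weights, observations)) / total
    est_lon = sum(w * obs[1] for w, obs in zip(weights, observations)) / total
    return est_lat, est_lon


if __name__ == "__main__":
    # Made-up base stations: (latitude, longitude, RSSI in dBm).
    stations = [
        (48.85, 2.35, -100.0),
        (48.87, 2.30, -120.0),
        (48.80, 2.40, -120.0),
    ]
    print(weighted_centroid(stations))
```

A baseline like this ignores terrain, antenna characteristics and propagation effects, which is part of why learned models are used in practice.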

See GitHub page: https://github.com//Received-Signal-Strength-Geo-Location

Ship Detection (Deep Learning on satellite images)

See GitHub page: https://github.com/AnthonyHoudaille/FilRougeAIRBUS