Photo by Nik Ramzi Nik Hassan on Unsplash
AI is one of the most powerful tools we have to create a better world for everyone.
AI for good is about putting software and data together to make a positive impact on society. It is about solving problems that might seem impossible to tackle, about making life better and easier for people, and about making accessible the services that today only the one percent can reach. With software and data, we can scale and deliver products to everyone, products that address complex problems like biodiversity collapse, climate change, poverty, or hunger. Of course, machine learning is not a magic wand. None of these problems will be solved with software and data alone, but we need to use every tool we have to at least try to improve the world. By combining AI as a tool with humans making the right decisions, we can solve the biggest problems that we, as humanity, are facing. It only takes action and commitment to a cause bigger than ourselves. We can do better; we always can.
Earlier this year I had the opportunity to collaborate with Omdena on one of their AI for good projects. It was an amazing experience, not just because of the technical challenge but because I spent eight weeks with a group of amazing people from all over the world, with different backgrounds, countries, and cultures.
The challenge was to create a land cover classification dataset for Ireland that would allow farmers to self-report their environmental impact. We decided to follow two different paths: one used a random forest model on structured data built from individual pixel values, and the other used a U-Net model to perform semantic segmentation on individual images. Both approaches used labels from the CORINE Land Cover methodology.
I focused mainly on data collection for both approaches. I collected tiles of Ireland using the United States Geological Survey (USGS) and Google Earth Engine (GEE) as data sources and created a dataset for each one. Every tile had to be divided into 128 x 128 pixel images to be suitable for the U-Net neural network. In both cases, we used images from the Sentinel-2 satellites because of their higher pixel resolution. To create the random forest dataset, we manually picked individual pixel values as training examples and exported those values to a CSV file.
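To give an idea of what the tiling step involves, here is a minimal sketch in Python with NumPy. It is not the code we used in the project: the function name `split_into_patches`, the band count, and the choice to drop incomplete edge patches are all assumptions made for illustration, and the tile is filled with placeholder values rather than real Sentinel-2 data.

```python
import numpy as np


def split_into_patches(tile: np.ndarray, patch_size: int = 128) -> np.ndarray:
    """Split a (height, width, bands) tile into non-overlapping square patches.

    Rows and columns that do not fill a whole patch are dropped (an assumption;
    padding the tile would be another reasonable choice).
    """
    height, width, bands = tile.shape
    rows = height // patch_size
    cols = width // patch_size
    # Crop the tile so both dimensions are exact multiples of patch_size.
    cropped = tile[: rows * patch_size, : cols * patch_size, :]
    # (rows, patch, cols, patch, bands) -> (rows, cols, patch, patch, bands)
    grid = cropped.reshape(rows, patch_size, cols, patch_size, bands).swapaxes(1, 2)
    # Flatten the grid of patches into a single batch dimension.
    return grid.reshape(rows * cols, patch_size, patch_size, bands)


# Hypothetical tile: 300 x 400 pixels with 4 bands of placeholder values.
tile = np.arange(300 * 400 * 4, dtype=np.float32).reshape(300, 400, 4)
patches = split_into_patches(tile)
print(patches.shape)  # (6, 128, 128, 4)
```

Patches come out in row-major order, so `patches[1]` is the second patch of the first row. The same function can be applied to the label raster so that images and masks stay aligned.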
Creating both datasets was difficult, especially because of cloud cover: we had to discard a lot of images and areas of Ireland. The large class imbalance (it turns out Ireland is a country with a LOT of grassland) was also a problem for model performance. One main challenge during modeling was the difference in resolution between the label images and the satellite images: the former had a resolution of 100 m per pixel and the latter 10 m. We applied resampling, but it had little effect on the final images.
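The resolution gap is easier to see with a small example. The sketch below upsamples a 100 m label grid to a 10 m grid with nearest-neighbour repetition. I am assuming nearest-neighbour here purely for illustration (it is a common choice for categorical labels, but not necessarily the method we applied), and the function name `upsample_labels` and the class codes are made up.

```python
import numpy as np


def upsample_labels(labels: np.ndarray, factor: int = 10) -> np.ndarray:
    """Nearest-neighbour upsampling: repeat each label cell factor x factor times."""
    return np.repeat(np.repeat(labels, factor, axis=0), factor, axis=1)


# Hypothetical 2 x 3 label grid at 100 m resolution; class codes are placeholders.
labels_100m = np.array([[1, 1, 2],
                        [3, 1, 2]], dtype=np.uint8)

labels_10m = upsample_labels(labels_100m)
print(labels_10m.shape)  # (20, 30)
```

Each 100 m label simply becomes a 10 x 10 block of identical 10 m labels. The grids now line up pixel by pixel, but no new detail is created, which is one way to understand why resampling alone changes so little.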
In the end, we selected the random forest model, with which we managed to achieve an average accuracy of 66% on the test set after some feature engineering and hyperparameter tuning. It was a very difficult challenge and we had to overcome big problems, but in the end it was a good result.
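For readers who have not tuned a random forest before, here is a rough sketch of what that step can look like with scikit-learn. Everything in it is an assumption for illustration: the data is random noise standing in for pixel values, the parameter grid is arbitrary, and using `class_weight="balanced"` together with balanced accuracy is just one possible way to deal with imbalanced classes, not a description of what we actually did.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data: 600 "pixels" with 4 band values and 3 imbalanced classes.
# It is random noise, so the scores it produces mean nothing.
rng = np.random.default_rng(0)
X = rng.random((600, 4))
y = rng.choice([0, 1, 2], size=600, p=[0.8, 0.1, 0.1])

# Stratify so the rare classes appear in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small, arbitrary grid; a real search would cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    cv=3,
    scoring="balanced_accuracy",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best model on the full training split by default.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, test_accuracy)
```

With real data, `X` would come from the CSV of sampled pixel values and `y` from the matching land cover labels.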
This experience was great for me, not just because it was a real-world, end-to-end project, but because I learned a lot, both through my own work and research and, above all, from the other collaborators. They were amazing the whole time, providing feedback and building great tools that moved the project forward. It was my very first AI for good project, and it felt great throughout. Thank you, Omdena and all collaborators! I am really looking forward to participating in more AI for good projects.