New discoveries #24
CropNet, A simple & strong xBD baseline, DengueNet, OpenEarthMap challenge, Vehicle Perception dataset, QGPT Agent & Streaming spatial data with Lightning
Welcome to the 24th edition of the newsletter, now reaching 8,433 subscribers 🔥 If you want to boost your business or service's exposure, consider sponsoring a future edition of this newsletter. Sponsors get a mention in the opening statement and a dedicated section in the newsletter. Check out editions #18 and #9 for sponsorship examples. For further details, please email me 📧
CropNet: An Open Large-Scale Dataset with Multiple Modalities for Climate Change-aware Crop Yield Predictions
The CropNet dataset is an open, terabyte-sized, deep learning-ready dataset targeting climate change-aware crop yield predictions at the county level across the contiguous United States. It comprises three modalities of data (Sentinel-2 Imagery, WRF-HRRR Computed Dataset, and USDA Crop Dataset), aligned in both the spatial and temporal domains, for over 2200 U.S. counties spanning 6 years (2017-2022).
It is expected to help researchers develop deep learning models for timely and precise crop yield prediction at the county level, by accounting for the effects of both short-term growing season weather variations and long-term climate change on crop yields.
A simple, strong baseline for building damage detection on the xBD dataset
The paper revisits the xView2 competition (2019), which tasked participants with developing machine learning models using satellite imagery for post-disaster building damage assessment. The authors streamlined the winning solution by eliminating certain components to create a more straightforward, yet sufficiently effective method. This approach aims to enhance the solution's accessibility and adaptability across various user needs and datasets, facilitated by simple hyperparameter heuristics.
To test the models, the dataset was modified to differentiate test locations from training ones, contrasting with the original competition format. This adjustment uncovered a limitation in both the complex and simplified models' ability to generalize to unseen areas. The analysis indicates that this generalization issue may be attributed not only to the models' design but also to the dataset's skewed class distribution among different disaster events.
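To make the location-held-out evaluation concrete, here is a minimal sketch in Python, assuming each sample is tagged with the disaster event it came from. The record layout, the event names, and the helpers `split_by_event` and `class_distribution_by_event` are hypothetical illustrations, not the paper's code.

```python
from collections import Counter, defaultdict


def split_by_event(samples, test_events):
    """Hold out whole disaster events so no test location appears in training."""
    train = [s for s in samples if s["event"] not in test_events]
    test = [s for s in samples if s["event"] in test_events]
    return train, test


def class_distribution_by_event(samples):
    """Count damage labels per event to expose skew between events."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[s["event"]][s["label"]] += 1
    return {event: dict(c) for event, c in counts.items()}


# Toy records, made up for illustration (not real xBD data).
samples = [
    {"event": "flood-a", "label": "no-damage"},
    {"event": "flood-a", "label": "destroyed"},
    {"event": "fire-b", "label": "no-damage"},
    {"event": "fire-b", "label": "no-damage"},
    {"event": "quake-c", "label": "major-damage"},
]

train, test = split_by_event(samples, test_events={"quake-c"})
distribution = class_distribution_by_event(samples)
```

In this toy example the held-out event contains a damage class that never appears in training, which is the kind of skew between events that the paper identifies as a possible cause of poor generalization.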
The study's retrospective examination sheds light on critical aspects: the benefits of simplifying complex models and the challenges posed by imbalanced datasets, underscoring the importance of dataset composition and model adaptability.
DengueNet: Dengue Prediction using Spatiotemporal Satellite Imagery for Resource-Limited Countries
Dengue fever poses a significant challenge in developing countries, exacerbated by inadequate sanitation infrastructure. Responding swiftly to dengue outbreaks is difficult due to constrained financial resources and limited access to current information, and most dengue prediction studies depend on data collection methods that are labor-intensive and time-consuming.
DengueNet combines Vision Transformer, Radiomics, and Long Short-term Memory to exploit spatial and temporal patterns in satellite image sequences. This enables dengue predictions on a weekly basis. The paper's promising results underline the feasibility and effectiveness of using satellite imagery for disease prediction. The versatility of this approach holds potential for application in a broader spectrum of vector-borne diseases, as well as in diverse fields that benefit from precise environmental monitoring.
OpenEarthMap Land Cover Mapping Few-Shot Challenge
This challenge aims to evaluate and benchmark methods for few-shot semantic segmentation on the OpenEarthMap dataset. The motivation is to enable researchers to develop few-shot learning algorithms for high-resolution remote sensing image semantic segmentation, which has applications in disaster response, urban planning, and natural resource management.
🖥️ Website
🗓️ March 29, 2024: Challenge submission deadline.
Vehicle Perception dataset
This new dataset is designed to pioneer traffic monitoring from a satellite perspective, addressing the complexities of analyzing tiny, slowly moving vehicles against a moving background, with the aim of catalyzing research in this high-potential field.
QGPT Agent
The QGPT Agent is an innovative plugin for QGIS, designed to enhance user interaction with the platform through natural language commands. Leveraging the advanced natural language processing capabilities of the OpenAI GPT model, it streamlines and automates a variety of tasks within QGIS. The trend towards integrating conversational interfaces into geospatial software is gaining momentum, and it's particularly encouraging to observe open-source solutions at the forefront of this movement.
💻 Code
Streaming spatial data with Lightning ⚡
Lightning.ai is a new platform for training deep learning models, created by the team behind the popular training framework PyTorch Lightning. I have been a regular user of Lightning.ai for several months and this article lists some of the benefits.
The Lightning team has now open sourced a new approach for streaming datasets, called Lightning Data. This enables training on datasets that are too large to download locally, without significantly impacting training times compared with using a local dataset. The article below describes how to prepare data for streaming, and demonstrates the creation of a StreamingDataset. Given the trend towards ever larger training datasets, the adoption of dataset streaming seems inevitable, with Lightning Data potentially setting a new standard in the field.
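The general idea behind dataset streaming can be sketched in plain Python, as below. This is not the Lightning Data API: the `ChunkStream` class and the `fetch_chunk` callable are hypothetical stand-ins, assuming the dataset has been pre-split into chunks that can be fetched one at a time from remote storage.

```python
from typing import Callable, Iterator, List, Sequence


class ChunkStream:
    """Iterate over samples while loading only one chunk at a time."""

    def __init__(
        self,
        chunk_ids: Sequence[str],
        fetch_chunk: Callable[[str], List[dict]],
    ):
        self.chunk_ids = list(chunk_ids)
        # Hypothetical loader, e.g. a download from object storage.
        self.fetch_chunk = fetch_chunk

    def __iter__(self) -> Iterator[dict]:
        for chunk_id in self.chunk_ids:
            # Fetch lazily: a chunk is loaded only when iteration reaches it.
            for sample in self.fetch_chunk(chunk_id):
                yield sample


# In-memory stand-in for remote storage (made up for illustration).
FAKE_REMOTE = {
    "chunk-0": [{"tile": 0}, {"tile": 1}],
    "chunk-1": [{"tile": 2}],
}
fetched = []  # records the order in which chunks were requested


def fake_fetch(chunk_id: str) -> List[dict]:
    fetched.append(chunk_id)
    return FAKE_REMOTE[chunk_id]


stream = ChunkStream(["chunk-0", "chunk-1"], fake_fetch)
tiles = [sample["tile"] for sample in stream]
```

A production streaming dataset would add prefetching, local caching, and shuffling on top of this loop; the sketch only shows why the full dataset never needs to be downloaded up front.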
TorchGeo joins the Open Source Geospatial (OSGeo) Foundation
TorchGeo is a PyTorch library that provides datasets, samplers, transforms, and pre-trained models specific to geospatial data. TorchGeo has now officially joined the Open Source Geospatial (OSGeo) Foundation 🎉 According to Adam Stewart, the creator and lead developer of TorchGeo, ‘not much will change in terms of direction or governance, but OSGeo offers additional advertising, stability, and legal support’.
I would like to extend my congratulations to all the TorchGeo contributors on this noteworthy achievement. The association with the OSGeo Foundation not only emphasises the project's strength and future prospects but also instills enhanced confidence within the community to engage with and contribute to TorchGeo, assured by the substantial support and acknowledgment it has received.
Consulting
If you need expert guidance on any of the following topics, I’m available for video call consulting:
Applying machine learning techniques to satellite and aerial imagery, including dataset selection, model training, and deployment
Building data processing pipelines in the cloud
Appraisal of your product offering
Building your brand and community for technical products
Personal career development
As an experienced consultant, I offer customised advice and practical solutions to help you achieve your goals. To discuss this service please email me 📧