New discoveries #25
Clay Foundation Model, Self-supervised spatio-temporal representation learning of SITS, Guidelines to Compare Semantic Segmentation Maps at Different Resolutions, and interpretable machine learning
Welcome to the 25th edition of the newsletter. I'm delighted to share that the newsletter now has 9,062 subscribers 🔥 Please note that this edition does not have a sponsor. If you're interested in gaining visibility for your business or service, sponsoring a future edition is an excellent way to achieve this. As a sponsor, you'll receive a shout-out in the opening statement and a dedicated section in the newsletter, reaching a wide audience in the community. For more information on how to sponsor the newsletter, please email me 📧
Clay Foundation Model
Clay is a foundation model for Earth Observation data, trained on Sentinel-2, Sentinel-1, and DEM data via self-supervised learning (SSL) using the Masked Autoencoder (MAE) method. This places Clay in the same family of models as Prithvi-100M and other well-publicised models.
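For readers unfamiliar with the MAE pretext task, here is a minimal sketch of the idea in PyTorch: randomly hide most of the image patches, encode only the visible ones, and train the network to reconstruct what was hidden. The tiny architecture and mask ratio below are illustrative placeholders, not Clay's actual configuration.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked-autoencoder pretext task on pre-embedded image patches."""

    def __init__(self, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, dim)  # stand-in for a lightweight decoder
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, patches):
        # patches: (B, N, dim) embedded patches, e.g. from stacked S2/S1/DEM bands
        B, N, D = patches.shape
        num_keep = int(N * (1 - self.mask_ratio))
        # Randomly select which patches stay visible; the rest are masked
        ids = torch.rand(B, N, device=patches.device).argsort(dim=1)
        keep = ids[:, :num_keep]
        visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
        encoded = self.encoder(visible)  # encode only the visible patches
        # Rebuild the full sequence, filling masked slots with a learned token
        full = self.mask_token.expand(B, N, D).clone()
        full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, D), encoded)
        recon = self.decoder(full)
        # Reconstruction loss is computed on the masked patches only
        masked = torch.ones(B, N, device=patches.device)
        masked.scatter_(1, keep, 0.0)
        return (((recon - patches) ** 2).mean(-1) * masked).sum() / masked.sum()

loss = TinyMAE()(torch.randn(2, 196, 128))  # e.g. 14x14 patches per tile
loss.backward()
```

Because the encoder never sees the masked patches, it must learn representations that capture spatial and spectral structure well enough to fill in the gaps, which is what makes the learned features useful for downstream tasks.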
So what makes Clay noteworthy? Clay is a fiscally sponsored project of the 501(c)(3) non-profit Radiant Earth Foundation, which has a clear objective: to create the most capable foundation model by developing in the open, making all training code, datasets, and weights as open and freely accessible as possible. Training and evaluation code is released on GitHub under the permissive Apache 2.0 license, and weights are available on HuggingFace under an OpenRAIL-M license.
The Clay team is seeking a senior data scientist or AI research engineer to deploy and assess new, published foundation models for satellite imagery. Send an email to contact@madewithclay.org with a CV and an expression of interest.
Self-supervised spatio-temporal representation learning of SITS (Satellite Image Time Series)
The paper introduces the Unet-BERT spAtio-temporal Representation eNcoder (U-BARN), a new self-supervised approach for utilising irregularly sampled satellite image time series (SITS). U-BARN learns rich features from unlabelled SITS data, combining spatial-spectral and temporal information. It uses a time-series reconstruction pretext task inspired by BERT for pretraining on a Sentinel-2 dataset. To demonstrate its feature learning capability, representations of SITS encoded by U-BARN are then fed into a shallow classifier to generate semantic segmentation maps.
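As a rough illustration of the BERT-style reconstruction idea (not the actual U-BARN architecture, which combines spatial-spectral and temporal encoders), the sketch below hides random acquisition dates in a toy spectral time series and trains a stand-in encoder to reconstruct them:

```python
import torch
import torch.nn as nn

def mask_timesteps(series, mask_ratio=0.3):
    """series: (B, T, C) per-pixel spectral time series. Hide random dates."""
    B, T, _ = series.shape
    mask = torch.rand(B, T) < mask_ratio   # True where an acquisition is hidden
    corrupted = series.clone()
    corrupted[mask] = 0.0                  # zero out the masked dates
    return corrupted, mask

B, T, C = 8, 12, 10                        # batch, acquisition dates, S2 bands
series = torch.randn(B, T, C)
corrupted, mask = mask_timesteps(series)

encoder = nn.GRU(C, 64, batch_first=True)  # simple stand-in for U-BARN's encoder
head = nn.Linear(64, C)                    # reconstruction head
features, _ = encoder(corrupted)
recon = head(features)
loss = ((recon - series) ** 2)[mask].mean()  # penalise masked dates only
loss.backward()
```

Reconstructing hidden dates forces the encoder to model temporal dynamics (phenology, seasonal change) rather than memorising individual acquisitions, which is exactly the property that makes the features transferable.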
U-BARN outperforms the baseline U-TAE method in crop and land cover classification, especially with pretraining and fine-tuning on the PASTIS and MultiSenGE datasets. The study also shows that fine-tuning U-BARN yields significant gains in scenarios with limited reference data and explores the effect of masking during pretraining on feature quality.
It's encouraging to see the development of self-supervised techniques like U-BARN that harness the time series nature of satellite imagery, offering new avenues for feature learning and analysis in the satellite imagery domain without the need for extensive labelled datasets.
TreeSatAI dataset 🌲
TreeSatAI is a dataset for tree species classification in Central Europe, built on multi-sensor data from aerial imagery, Sentinel-1, and Sentinel-2. The dataset contains labels for 20 European tree species (spanning 15 tree genera), derived from forest administration data of the federal state of Lower Saxony, Germany.
Guidelines to Compare Semantic Segmentation Maps at Different Resolutions
Semantic segmentation is typically evaluated at the pixel level, unlike scene classification and object detection, which are assessed at the scene and object levels respectively. Pixel-level evaluation is strongly affected by the image's spatial resolution, indicated by its Ground Sample Distance (GSD), which makes it hard to judge a model's true performance independently of the resolution of the imagery it was evaluated on.
This work introduces guidelines for fairly comparing semantic segmentation results across different spatial resolutions. It suggests augmenting the standard scene-based pixel-wise metrics with region-based pixel-wise metrics for a more nuanced evaluation of model performance.
Demonstrated through building and swimming pool detection case studies, the guidelines and region-based metrics enable consistent comparison of segmentation maps across different resolutions, offering a clearer insight into model performance.
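To make the idea concrete, here is a hedged sketch of one plausible region-based measure: treat each connected component of the ground truth as a region, and count it as detected when the prediction covers enough of its pixels. The function and threshold below are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy import ndimage

def region_recall(gt_mask, pred_mask, cover_thresh=0.5):
    """Fraction of ground-truth regions mostly covered by the prediction."""
    gt_labels, n_regions = ndimage.label(gt_mask)   # connected components
    if n_regions == 0:
        return float("nan")
    hits = 0
    for i in range(1, n_regions + 1):
        region = gt_labels == i
        coverage = np.logical_and(region, pred_mask).sum() / region.sum()
        if coverage >= cover_thresh:
            hits += 1                               # region counted as found
    return hits / n_regions

# Toy example: two buildings in the ground truth, the model finds only one
gt = np.zeros((10, 10), dtype=bool)
gt[1:4, 1:4] = True
gt[6:9, 6:9] = True
pred = np.zeros_like(gt)
pred[1:4, 1:4] = True
print(region_recall(gt, pred))  # 0.5
```

A metric like this is far less sensitive to a few boundary pixels shifting with GSD than raw pixel accuracy, which is what makes cross-resolution comparisons more consistent.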
SARDet_100K dataset
SARDet_100K is a large-scale Synthetic Aperture Radar (SAR) object detection dataset, created by collecting and standardising 10 existing SAR detection datasets.
Recognizing protected and anthropogenic patterns in landscapes using interpretable machine learning and satellite imagery
In environmental research, accurately mapping land cover, including areas shaped by both natural processes and human activities (anthropogenic), is crucial for effective conservation efforts. The approach in this paper applies a machine learning model to satellite imagery, combined with feature-attribution techniques such as Grad-CAM (Gradient-weighted Class Activation Mapping).
Grad-CAM enhances the model's transparency by visually highlighting the image regions that most influence its predictions. This emphasis on explainable AI matters: it builds trust in the model's capabilities and makes its predictions easier to validate, both of which are crucial for advancing environmental conservation.
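For those curious how Grad-CAM works under the hood, here is a minimal PyTorch sketch using forward and backward hooks; the torchvision ResNet and random input are placeholders, and any CNN works the same way once you pick its final convolutional block.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()  # placeholder CNN
feats, grads = {}, {}

# Hook the last conv block to capture its activations and their gradients
layer = model.layer4
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)               # stand-in for a satellite image
logits = model(x)
logits[0, logits.argmax()].backward()         # gradient of the top class score

# Weight each feature map by its average gradient, then ReLU and normalise
w = grads["a"].mean(dim=(2, 3), keepdim=True)         # (1, C, 1, 1)
cam = F.relu((w * feats["a"]).sum(dim=1))             # (1, h, w)
cam = F.interpolate(cam[None], size=x.shape[2:], mode="bilinear")[0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]
```

The resulting cam can be overlaid on the input image as a heatmap to show which regions drove the prediction.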
📘 Interpretable Machine Learning with Python
I recently received a copy of the book Interpretable Machine Learning with Python by Serg Masís. This book provides a comprehensive introduction to model interpretability and techniques for explainability. It discusses key considerations such as the trade-off between performance and interpretability, and covers techniques such as SHAP (SHapley Additive exPlanations) with in-depth examples.
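For a taste of the kind of workflow the book walks through, here is a minimal SHAP example, assuming a fitted scikit-learn tree model; the dataset is just a placeholder.

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)          # exact, fast SHAP for tree models
shap_values = explainer.shap_values(X.iloc[:200])
shap.summary_plot(shap_values, X.iloc[:200])   # per-feature contribution plot
```

The summary plot ranks features by their average contribution to the model's output, giving a global view of what the model relies on alongside per-sample explanations.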
Readers of this newsletter will be particularly drawn to Chapter 7, Visualizing Convolutional Neural Networks, which covers gradient-based attribution methods such as saliency maps and the Grad-CAM visualisations demonstrated earlier in this edition. There are also advanced chapters on interpreting NLP transformers, and on methods for multivariate forecasting and sensitivity analysis.
Overall, this book is an excellent guide to the vital subject of model interpretability, which is crucial for securing stakeholder endorsement of machine learning projects. It offers a comprehensive introduction that will benefit newcomers to the field, while its advanced sections provide in-depth exploration for seasoned practitioners seeking concrete examples. It can be purchased on Amazon at this link
Consulting
If you need expert guidance on any of the following topics, I’m available for video call consulting:
- Applying machine learning techniques to satellite and aerial imagery, including dataset selection, model training, and deployment
- Building data processing pipelines in the cloud
- Appraisal of your product offering
- Building your brand and community for technical products
- Personal career development
As an experienced consultant, I offer customised advice and practical solutions to help you achieve your goals. To discuss this service please email me 📧