New discoveries #19
Prithvi-100M foundational model, HSAT, RapidAI4EO & Five-Billion-Pixel datasets, and SeaDroneSim2
Welcome to the 19th edition of the newsletter, which is going out to 6,638 subscribers. First of all, a big shout-out and special thanks to the sponsor of this edition, Heimdal Satellite Technologies Ltd (HSAT) 🙏 If you'd like to both support this newsletter and enhance the visibility of your business, service, event, or competition, sponsoring an upcoming edition would be a brilliant choice. Please email me 📧 to discuss this opportunity.
This month there has been speculation online that the tech world is dividing into two camps: the ‘GPU rich’ (large tech companies like Meta & Google) and the ‘GPU poor’ (individuals, startups and small-medium companies). A lack of access to a fundamental resource (the GPU) could have significant implications for innovation in our domain. It creates a bottleneck for the GPU poor, and potentially forces them in a different direction to the GPU rich, for example towards small, optimised models. Only time will tell if this will slow the pace of innovation, or act as a stimulant for it!
Prithvi-100M foundational model
A foundational model is a model that serves as a base for various specialized tasks by leveraging extensive training on broad and diverse datasets. One recent and compelling example is Prithvi-100M, announced jointly by NASA & IBM. Built specifically for remote sensing applications, Prithvi-100M has 100 million trainable parameters, was trained on harmonized 30m resolution Landsat & Sentinel 2 imagery, and took 5000 GPU hours to train.
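To put those two numbers in context for the 'GPU poor', here is a rough back-of-envelope sketch in Python. It only uses the figures quoted above (100 million parameters, 5000 GPU hours). The bytes-per-parameter values are the standard sizes of float32 and float16, the cluster sizes are hypothetical, and the wall-clock estimate assumes perfect linear scaling across GPUs, which real training runs rarely achieve.

```python
# Back-of-envelope sizing for a 100M-parameter model.
PARAMS = 100_000_000  # trainable parameters, as quoted above
GPU_HOURS = 5_000     # pre-training cost, as quoted above

# Standard storage sizes for common floating point types.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_megabytes(n_params: int, dtype: str) -> float:
    """Memory needed just to hold the weights, in megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


def wall_clock_days(gpu_hours: float, n_gpus: int) -> float:
    """Days of training on n_gpus, assuming perfect linear scaling (optimistic)."""
    return gpu_hours / n_gpus / 24


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weights_megabytes(PARAMS, dtype):.0f} MB of weights")

# Hypothetical cluster sizes, chosen only for illustration.
for n_gpus in (1, 8, 64):
    print(f"{n_gpus} GPU(s): about {wall_clock_days(GPU_HOURS, n_gpus):.1f} days")
```

The weights alone fit comfortably on a single consumer GPU, so it is the pre-training budget, not the model size, that puts training from scratch out of reach for most individuals. Note that optimiser state and activations add to the memory needed during training, so the weights figure is a floor rather than a full estimate.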
What sets Prithvi-100M apart is its adaptability. It has been utilized as the foundational layer to fine-tune models for diverse applications such as flood detection, burn scar mapping, and multi-temporal crop classification. The resulting models, customized from this base, have demonstrated up to 15% improvement in performance compared to models crafted using conventional training methods.
The accessibility of Prithvi-100M further contributes to its potential impact. It is an open-source project hosted on HuggingFace, where researchers, developers, and enthusiasts can explore demo applications, including the flood and burn scar models. The barrier to entry is relatively low, as fine-tuning does not require substantial GPU resources. For instance, the flood detection model took roughly an hour to fine-tune on an NVIDIA V100 GPU. The fine-tuning code is built on the MMSegmentation framework and supports multispectral tiffs, which broadens the range of use-cases it can be applied to.
Prithvi-100M represents a promising stride in the ongoing evolution of foundational models. By enabling the training of downstream models with smaller datasets and achieving superior performance, it paves the way for more efficient and effective solutions. The emergence of Prithvi-100M and similar innovations hints at a burgeoning ecosystem around foundational models, where collaboration and creativity might lead to further advancements. This trend is likely to accelerate, contributing to the development of more specialized applications, and fostering a community centered around the most impactful and popular models.
HSAT: Do You Need Ground Truth Data?
HSAT has developed a unique ground truth collection platform that continually gathers data from Brazil to Bangladesh. Our platform, Tessa, is designed to collect data globally at an astonishing pace. Every week, our team visits thousands of sites, maps locations, and labels the data. Our method costs over 90% less and is 10 times faster than traditional methods.
How Fast? Just last month, we surveyed over 9,000 fields in Thailand, India and Pakistan in under three weeks. We visited every field, photographed it, mapped it, and linked the picture to the polygon. Then we obtained satellite data and weather data for every polygon. All within 20 days.
Where Do We Operate? Our reach is global. We have dedicated teams consistently collecting data in countries like Argentina, Brazil, Kenya, Rwanda, Sudan, Turkey, India, Pakistan, Thailand, and many more.
Applications of Our Data? The data we collect is used for a wide variety of purposes—from powering machine learning models to enhancing logistics understanding.
Examples:
Pakistan: During the floods, we surveyed 2,000 fields in just 10 days to gauge the extent of the damage. We then revisited the exact same locations two weeks later to monitor the impact of the floods.
Within 5 days of the earthquake in Turkey, our team was on the ground assessing our factories to determine their capacity to supply essential goods.
Over the past few years, we've surveyed tens of thousands of fields worldwide, creating a comprehensive database of crops.
In Need of Rapid and Precise Data? 📧 data@hsat.info
RapidAI4EO time series dataset
The RapidAI4EO dataset, comprising time-series satellite imagery at 500,000 locations across Europe (illustrated above), offers monthly Sentinel-2 cloud-free mosaics and five-day Planet Fusion imagery within non-overlapping 600x600 meter footprints. It is designed for training deep learning models for land use and land cover (LULC) classification and change detection but can be utilised in other applications needing dense time series satellite data.
It's hosted on Source Cooperative, which is a Radiant Earth initiative. The corpus was created under the RapidAI4EO project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004356.
I think this is an exciting dataset because it provides a rich and dense time series of satellite imagery, and the integration of both Sentinel-2 and Planet Fusion imagery makes it a versatile resource for various environmental and geospatial applications.
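To get a feel for what 'dense' means here, the short Python sketch below works out the image chip size and number of time steps per location. The 600 m footprint, the 500,000 locations and the monthly and five-day cadences come from the description above. The 10 m Sentinel-2 pixel size applies to its highest resolution bands, and the 3 m Planet Fusion pixel size is my assumption rather than something stated in the dataset description here, so treat the output as indicative.

```python
# Rough chip-size and time-series arithmetic for one 600 m x 600 m footprint.
FOOTPRINT_M = 600        # side length of each footprint, as described above
N_LOCATIONS = 500_000    # number of locations, as described above
SENTINEL2_GSD_M = 10     # pixel size of Sentinel-2's highest resolution bands
PLANET_FUSION_GSD_M = 3  # ASSUMPTION: nominal Planet Fusion pixel size


def pixels_per_side(footprint_m: int, gsd_m: int) -> int:
    """Number of pixels along one side of a square footprint."""
    return round(footprint_m / gsd_m)


def frames_per_year(cadence_days: int) -> int:
    """Number of complete time steps in a 365-day year at a fixed cadence."""
    return 365 // cadence_days


s2_side = pixels_per_side(FOOTPRINT_M, SENTINEL2_GSD_M)
pf_side = pixels_per_side(FOOTPRINT_M, PLANET_FUSION_GSD_M)
total_area_km2 = N_LOCATIONS * FOOTPRINT_M**2 / 1e6  # m^2 converted to km^2

print(f"Sentinel-2 chip: {s2_side} x {s2_side} px, 12 monthly mosaics per year")
print(f"Planet Fusion chip: {pf_side} x {pf_side} px, "
      f"{frames_per_year(5)} five-day frames per year")
print(f"Total footprint area: {total_area_km2:,.0f} km^2")
```

Under these assumptions each location pairs a small, low-frequency Sentinel-2 chip with a larger and much more frequent Planet Fusion chip, which is the combination that makes the dataset attractive for change detection.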
SeaDroneSim2: Whale Detection Enhancement through Synthetic Satellite Images
With the urgent need to monitor declining whale populations, deep learning models are being employed to detect whales in aerial and satellite images. Training these models can be arduous because training datasets are difficult to collect in marine environments. SeaDroneSim2, a new software suite, addresses this by generating synthetic aerial and satellite image datasets for whale detection, reducing the effort needed to collect training data. Adding synthetic data leads to a 15% performance boost in whale detection compared to training with real data alone. This showcases the value of synthetic data and its promising potential in wildlife monitoring.
Five-Billion-Pixels segmentation dataset
The Five-Billion-Pixels dataset contains more than 5 billion labeled pixels across 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificial-constructed, agricultural, and natural classes. In addition, the authors propose a deep-learning-based unsupervised domain adaptation approach that can transfer classification models trained on a labeled dataset to unlabelled data for large-scale land cover mapping, and demonstrate this on PlanetScope (3 m) and Sentinel-2 (10 m) imagery.
Poll
In the previous poll I wanted to get a broad understanding of the readership of this newsletter. 27% responded student (undergrad and masters), 34% academic (PhD and upward), and the largest group, at 39%, were professionals in industry. I take it as a healthy sign that all three groups are roughly balanced. In this poll, in the context of a worldwide shortage of GPUs, I am interested in where people use GPUs, and whether you already use, or plan to use, one of these options:
Consulting with Robin
If you need expert guidance on any of the following topics, I’m available for hourly video call consulting:
Applying machine learning and deep learning techniques to satellite and aerial imagery, including dataset selection, model training, and deployment.
Understanding the physics of remote sensing imaging systems.
Building data processing pipelines in the cloud.
Building your brand and community for technical products.
Personal career development.
As an experienced consultant, I offer customised advice and practical solutions to help you achieve your goals in these areas. To discuss this service please email me 📧.