New Discoveries #28

GeoSynth, Greenlens, AI2-S2-NAIP dataset, The FutureCrop Challenge & SSL4EO

Jun 28, 2024

Welcome to the 28th edition of the newsletter. I am delighted to share that the newsletter continues to grow and now reaches over 10,200 subscribers 🥳 Shout out and special thanks to the sponsor of this newsletter edition, Greenlens 🙏

This month, I had the privilege of presenting on Deep Learning with Satellite Imagery at the Lightning.ai meetup in London. Surprisingly, about 20% of attendees had experience with satellite images, ranging from professional uses such as vehicle recovery to simply preferring satellite images over traditional maps for navigation.

The discussion generated many questions on how language models could transform interactions with satellite imagery and prompted numerous creative ideas for enhancing user experiences. Given the rapid pace of innovation in this field, we are on the cusp of a surge in new ways to interact with this valuable yet underutilised resource. I’m personally very excited about these developments and their potential to significantly expand access to satellite imagery. This expansion will lower the technical barriers to utilising this data, and hopefully accelerate the pace of innovation and adoption. This is truly a thrilling time to be working in remote sensing!

GeoSynth

GeoSynth is a suite of models designed to synthesize realistic-looking satellite images, offering control over global style and image-driven layout. This control is achieved through textual prompts and/or geographic locations, utilising features extracted from SatCLIP. This allows users to dictate specific scene semantics or regional appearances. The model is trained on a vast dataset of satellite images paired with automatically generated captions and supplemented by OpenStreetMap data. Additionally, the dataset includes SAM masks for each satellite image.

GeoSynth excels in generating diverse, high-quality images and demonstrates strong zero-shot generalisation capabilities. The developers of GeoSynth suggest that this model could be instrumental in remote sensing applications such as urban planning, data augmentation, and the generation of pseudo labels for weakly supervised learning, enhancing the efficacy and scope of these processes.

While the capability of diffusion-based models like GeoSynth is evident, it remains to be seen whether their effectiveness is primarily due to the diffusion technology itself or to the extensive datasets and resources involved in their development.

You can learn more about generative models in the excellent DiffusionFastForward course by Mikolaj Czerkawski.

Shape the future of foundational models at Greenlens 🚀

Hi, I’m Szymon and I recently launched Greenlens.world, a venture-backed company dedicated to simplifying the fine-tuning, deployment, and hosting of geospatial foundational models. Our mission in the coming months is to test our prototype platform, build an exceptional team, and develop the world’s leading geospatial foundational model, and we need your help!

Founding Machine Learning Engineers: Are you a passionate ML engineer with experience in scaling geospatial AI? We’re looking for talented individuals to join our founding team. Let’s chat if you’re ready to make an impact.

Early Adopters and Testers: Are you struggling with accessing current foundational models, facing challenges with data pipeline scalability, or needing a quick way to fine-tune models on your private dataset? We want you to try our platform. We are dedicated to ensuring your success.

Open Source Community: If you want to help design the new family of models, join us on our Discord.

Interested? Reach out at szymon@greenlens.world

AI2-S2-NAIP dataset

AI2-S2-NAIP is a comprehensive new dataset from the Allen Institute for AI (AI2), consisting of aligned NAIP (1.25 m/pixel), Sentinel-2, Sentinel-1, and Landsat images (all at 10 m/pixel) that cover the entire continental US. It includes detailed annotations, leveraging OpenStreetMap data to identify buildings, roads, and 30 other categories, along with WorldCover classes.

This dataset is particularly useful for training both supervised and unsupervised models. It facilitates advanced applications such as super-resolution (e.g., enhancing Sentinel-2 images to the resolution of NAIP), segmentation and detection (e.g., mapping features from NAIP or Sentinel-2 images to OpenStreetMap or WorldCover data), and multi-modal masked autoencoder pre-training for Foundational Models like Satlas.
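
To make the resolution gap concrete, here is a minimal sketch that works out the upsampling factor between the 10 m/pixel Sentinel-2 grid and the 1.25 m/pixel NAIP grid, using only the pixel sizes quoted above. The function names and the 64-pixel tile width are hypothetical, chosen for illustration, and are not documented properties of the dataset.

```python
# Pixel sizes quoted in the text (metres per pixel).
NAIP_M_PER_PX = 1.25
SENTINEL2_M_PER_PX = 10.0


def upsampling_factor(low_res_m: float, high_res_m: float) -> float:
    """Linear scale factor needed to go from the coarse grid to the fine grid."""
    return low_res_m / high_res_m


def matching_tile_width(low_res_width_px: int, factor: float) -> int:
    """Width in pixels of the fine-grid tile covering the same ground footprint."""
    return int(round(low_res_width_px * factor))


factor = upsampling_factor(SENTINEL2_M_PER_PX, NAIP_M_PER_PX)
s2_tile_px = 64  # hypothetical tile width, for illustration only
naip_tile_px = matching_tile_width(s2_tile_px, factor)

print(f"Upsampling factor: {factor:g}x")
print(f"A {s2_tile_px} px Sentinel-2 tile matches a {naip_tile_px} px NAIP tile")
```

Because the factor applies along each axis, the number of pixels to be predicted per input pixel grows with its square.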

🤗 Dataset on HuggingFace

The FutureCrop Challenge

Can insights from recent history help us forecast the impact of climate change on agriculture? The FutureCrop Challenge poses this question, inviting participants to predict future maize and wheat yields based on soil and daily weather data, assuming a high-emissions scenario. This contest requires predictions for the years 2021 to 2100, using training data spanning from 1980 to 2020. As actual future crop yields are unknown, the challenge employs simulated outputs from a rigorously tested and validated crop model. These simulations serve as a stand-in for real-world data, allowing participants to apply and test their predictive models against plausible future scenarios.
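
The setup described above amounts to a strict split by year: fit on 1980 to 2020, predict 2021 to 2100. The sketch below illustrates that split with a naive mean-yield baseline. The record layout, field names, and synthetic values are all invented for illustration and do not reflect the competition's actual data format or any recommended method.

```python
from statistics import mean

TRAIN_YEARS = range(1980, 2021)  # 1980 to 2020 inclusive, as stated in the text
TEST_YEARS = range(2021, 2101)   # 2021 to 2100 inclusive


def split_by_year(records):
    """Split records into training and test sets using the year ranges above."""
    train = [r for r in records if r["year"] in TRAIN_YEARS]
    test = [r for r in records if r["year"] in TEST_YEARS]
    return train, test


def mean_yield_baseline(train):
    """Naive baseline: the training-period mean yield, reused for every future year."""
    return mean(r["yield"] for r in train)


# Synthetic placeholder records; field names and values are invented for illustration.
records = [{"year": y, "yield": 5.0 + 0.01 * (y - 1980)} for y in range(1980, 2021)]
records += [{"year": y, "yield": None} for y in range(2021, 2101)]

train, test = split_by_year(records)
baseline = mean_yield_baseline(train)
predictions = {r["year"]: baseline for r in test}

print(f"{len(train)} training years, {len(test)} years to predict")
```

A constant baseline like this ignores the weather and soil inputs entirely, so it is only a reference point against which a real model could be compared.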

🖥️ Competition page on Kaggle
🗓️ Deadline: 7 September 2024
😎 The prize: Kudos
🤓 Techniques for Crop yield & vegetation forecasting

SSL4EO course

The University of Copenhagen is organising a summer PhD course on SSL4EO. The course is aimed at PhD students in computer science, remote sensing, geospatial sciences, or related fields, as well as researchers and practitioners interested in leveraging self-supervised learning for Earth observation tasks.

Attend to gain an overview of self-supervised learning (SSL) methods, discuss SSL challenges tailored for Earth observation data, network with fellow PhD students, and acquire hands-on experience through practical projects.

Registration is now open via the website below. Note that seats are limited.

🖥️ Website

Lily

Jul 3, 2024

Hey, thanks a lot for featuring our FutureCrop competition!

Our community (AgML) has organised sessions and a workshop on machine learning for agricultural modelling, and we plan to run more competitions, publish other benchmark datasets and organise more workshops and events in future. If you're interested in staying up to date on our activities, you are welcome to join our mailing list (https://mail.agml.org/mailman/listinfo/agml).


satellite-image-deep-learning
