New discoveries #16

Presto, FLAIR dataset, Whale detection in satellite imagery with active learning, Identifying contrails competition and new video: The ML workflow at Development Seed with Ryan Avery

Jun 01, 2023

Welcome to the 16th edition of the newsletter. I'm delighted to share that the newsletter now has 5.5k subscribers 🔥

Presto - Lightweight, Pre-trained Transformers for Remote Sensing Timeseries

In the remote sensing domain, the temporal dimension is critical for many tasks and data is often collected from many complementary sensors. This paper shows that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. The paper introduces Presto, a transformer-based model pre-trained on remote sensing time-series data which excels at a wide variety of remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale. State-of-the-art methods are widely perceived as too unwieldy for real-world use, so it's exciting to see tangible progress that narrows this gap and encourages wider practical adoption of these techniques.
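
To make the "feature extractor for simple models" idea concrete, here is a minimal sketch. It does not use the real Presto code or API: `encode` is a hypothetical stand-in for a frozen pre-trained encoder (here just a fixed random projection), the time series are synthetic, and the nearest-centroid classifier is only one example of a "simple model".

```python
import numpy as np

rng = np.random.default_rng(0)
N_TIMESTEPS, N_BANDS, EMBED_DIM = 12, 4, 16

# Frozen weights: HYPOTHETICAL placeholder for a pre-trained encoder such as Presto.
_PROJ = rng.normal(size=(N_TIMESTEPS * N_BANDS, EMBED_DIM))

def encode(series):
    """Map (n, timesteps, bands) time series to (n, EMBED_DIM) embeddings.
    Stand-in only: a real encoder would be a pre-trained transformer."""
    flat = series.reshape(len(series), -1)
    return np.tanh(flat @ _PROJ / np.sqrt(flat.shape[1]))

def fit_centroids(emb, labels):
    """The 'simple model': one mean embedding per class."""
    classes = np.unique(labels)
    return classes, np.stack([emb[labels == c].mean(axis=0) for c in classes])

def predict(emb, classes, centroids):
    """Assign each embedding to the class with the nearest centroid."""
    dists = np.linalg.norm(emb[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def make_series(n, sign):
    """Synthetic pixels: a seasonal curve (or its inverse) plus noise."""
    t = np.arange(N_TIMESTEPS)
    seasonal = sign * np.sin(2 * np.pi * t / N_TIMESTEPS)
    return seasonal[None, :, None] + rng.normal(0, 0.3, (n, N_TIMESTEPS, N_BANDS))

X = np.concatenate([make_series(150, 1), make_series(150, -1)])
y = np.repeat([0, 1], 150)
perm = rng.permutation(len(X))
train, test = perm[:200], perm[200:]

# The encoder is never updated; only the small downstream model is fitted.
train_emb = encode(X[train])
classes, centroids = fit_centroids(train_emb, y[train])
accuracy = (predict(encode(X[test]), classes, centroids) == y[test]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

The appeal of this pattern is that the expensive part (the encoder) is computed once per pixel, while the downstream model is cheap to train and run at scale.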

📖 Paper
💻 Code

FLAIR dataset

FLAIR is a large dataset (>20 billion pixels) of aerial imagery, topographic information and land cover (buildings, water, forest, agriculture...) annotations, aimed at advancing research on semantic segmentation, domain adaptation and transfer learning. The dataset covers a total of approximately 812 km², with patches sampled across the entire metropolitan French territory to illustrate the different climates and landscapes (spatial domains). The aerial images included in the dataset were acquired during different months and years (temporal domains).
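
As a back-of-the-envelope check on those two figures, the implied ground sampling distance can be estimated as below. This is only a rough sketch: it assumes square pixels and that the 20 billion pixel count refers to a single image layer covering the full 812 km².

```python
# Figures quoted above; ">20 billion" is treated as exactly 20 billion here.
area_km2 = 812
pixels = 20e9

area_m2 = area_km2 * 1_000_000             # 1 km² = 1,000,000 m²
gsd_m = (area_m2 / pixels) ** 0.5          # side length of one square pixel, in metres
print(f"about {gsd_m:.2f} m per pixel")
```

Since the true pixel count is somewhat above 20 billion, this estimate is an upper bound on the pixel size under the stated assumptions.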

🖥️ Website
📖 Paper
💻 Code

Whale detection in satellite imagery with active learning 🐋

This project from the Microsoft AI for Good Lab aims to help researchers and conservationists detect whales in high-resolution satellite imagery using an active learning approach. Active learning is a technique that combines AI with human input, enabling the AI to learn more effectively and efficiently. In this case, the AI is being used to identify potential whale sightings in satellite images, while human experts verify the accuracy of these predictions. As the AI receives feedback from human experts, it adapts its learning to improve its ability to identify whales accurately. This iterative process continues until the AI model reaches a desired level of performance.
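
The iterative loop described above can be sketched roughly as follows. This is not the project's code: it uses synthetic 2D features instead of satellite imagery, a small hand-written logistic regression as the model, uncertainty sampling as the query strategy, and the known ground truth (`oracle_labels`) to simulate the human expert. All function names are illustrative.

```python
import numpy as np

def predict_proba(X, w, b):
    """Probability of the positive class (e.g. 'whale') under a logistic model."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def fit_logistic(X, y, lr=0.5, steps=500):
    """Fit logistic regression with plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        grad = predict_proba(X, w, b) - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def active_learning_loop(X, oracle_labels, n_initial=10, batch_size=10,
                         n_rounds=5, seed=0):
    """Repeatedly train, pick the most uncertain samples, and ask the oracle."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X), size=n_initial, replace=False))
    for _ in range(n_rounds):
        w, b = fit_logistic(X[labeled], oracle_labels[labeled])
        unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
        # Predictions closest to 0.5 are the ones the model is least sure about.
        closeness = np.abs(predict_proba(X[unlabeled], w, b) - 0.5)
        query = unlabeled[np.argsort(closeness)[:batch_size]]
        # "Human verification": here simulated by looking up the true labels later.
        labeled.extend(query.tolist())
    w, b = fit_logistic(X[labeled], oracle_labels[labeled])
    return w, b, labeled

# Synthetic stand-in data: two noisy clusters in a 2D feature space.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.5, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b, labeled = active_learning_loop(X, y)
accuracy = ((predict_proba(X, w, b) > 0.5) == y).mean()
print(f"labelled {len(labeled)} of {len(X)} samples, accuracy {accuracy:.2f}")
```

In a real setting the loop would stop once a validation metric reaches the desired level rather than after a fixed number of rounds, and the expert's time would be spent only on the queried samples.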

💻 Code

Competition: Identify Contrails

Contrails are clouds of ice crystals that form in aircraft engine exhaust. They can contribute to global warming by trapping heat in the atmosphere. The goal of this competition is to train ML models to identify contrails in satellite images and ultimately to help prevent their formation. The satellite images were obtained from the Advanced Baseline Imager (ABI) on GOES-16, a geostationary satellite. Because contrails are easier to identify with temporal context, a sequence of images at 10-minute intervals is provided.

🖥️ Competition webpage
💰 $50,000 prize money

Video: The ML workflow at Development Seed with Ryan Avery

In this video, I caught up with Ryan Avery to learn about the machine learning workflow at Development Seed. The video was inspired by a three-part blog series Ryan has authored on the ML tooling stack used at Development Seed.

🎙️ Note this is also available as an audio podcast here

Bio: Ryan is an expert in developing machine learning-powered services for processing satellite and camera trap imagery, and he is deeply passionate about leveraging machine learning to enhance environmental outcomes and improve livelihoods. In addition to his work at Development Seed, Ryan has made significant contributions to open source. These include a comprehensive two-day geospatial Python curriculum, an image segmentation model service, and a TorchServe deployment of MegaDetector for wildlife monitoring.

Poll

In the previous poll I asked if people use ChatGPT as an assistant when reading papers. The majority (57%) do not, whilst 32% use it occasionally. Only 11% regularly use ChatGPT when reading papers. This contrasts with the poll I ran in newsletter 14, where 37% were regularly using an AI assistant for coding. This reinforces my perception that AI adoption is higher when the service is built into a tool people already use regularly (e.g. the IDE in the case of an AI assistant for coding). Therefore, in this poll I would like to know whether you intend to integrate AI features into your products or services.

satellite-image-deep-learning
