New discoveries #22
SatBird, Multi-Stage Semantic Segmentation Quantifies Fragmentation of Small Habitats at a Landscape Scale, Digital Typhoon Dataset & Ten deep learning techniques to address small data problems
Welcome to the 22nd and final edition of the newsletter for 2023. This month an additional 274 people subscribed to the newsletter, and this edition goes out to 7641 subscribers 🚀
This month's AI news has been dominated by turmoil at OpenAI, the creator of ChatGPT. The most notable incident was the firing and subsequent reinstatement of the company founder within a week. While further developments might unfold rapidly, it's essential to ponder their broader implications.
A year ago, OpenAI was barely known outside tech circles. That changed dramatically with ChatGPT, which transformed AI from a concept to a practical tool. These recent events resemble the internet's rise in 1995 and e-commerce in 2005, but with AI, the transition is happening at an unprecedented pace. In 2024, I foresee AI moving from public awareness to widespread industry disruption.
In the traditionally cautious remote sensing industry, I predict 2024 as a year of significant upheaval, akin to the transformative period following the mainstream adoption of GPS technology in the early 2000s. This shift marked a leap from specialised military use to an essential tool in commercial remote sensing. We may see new companies, like OpenAI in the tech sphere, emerge from obscurity to lead in our industry. Additionally, I anticipate big tech firms making significant strides in this sector. How are you preparing for these impending changes? I'd love to hear your strategies and thoughts - feel free to reply to this email. Let's navigate these upcoming shifts together!
SatBird: Bird Species Distribution Modelling with Remote Sensing and Citizen Science Data
The global decline in biodiversity poses a significant challenge to ecosystem services that are foundational for food security, water availability, and overall human health and well-being. Traditional methods for monitoring species distribution are often hampered by their limited scope, focusing on select species or specific geographical areas, and by the immense resources required for field data collection. However, the advent of freely available remote sensing data, combined with citizen science tools, is revolutionising biodiversity monitoring.
SatBird is a groundbreaking new dataset aimed at mapping bird species to their habitats using satellite images. Developed using presence-absence observation data from the eBird citizen science database, SatBird covers locations in the USA across summer and winter seasons, as well as a dataset from Kenya representing areas with limited data. It also includes environmental data and species range maps for each location.
Evaluations using various machine learning models demonstrate SatBird's potential for globally scalable ecosystem modelling. Code is also provided to extend the dataset to other regions of the world.
🚀 Become a sponsor of this newsletter
If you want to boost your business or service's exposure, consider sponsoring a future edition of this newsletter. Sponsors get a mention in the opening statement and a specific section, reaching over 7,500 niche community members. Check out editions #18 and #9 for sponsorship examples. For sponsorship details, please email me 📧 Your consideration is greatly appreciated!
Multi-Stage Semantic Segmentation Quantifies Fragmentation of Small Habitats at a Landscape Scale
Traditionally, land cover (LC) maps suffer from low spatial resolution and broad categorisation, limiting their effectiveness for small, specific habitats. This paper presents a multi-stage semantic segmentation approach applied to the Peak District National Park in the UK. It uses a detailed LC schema, achieving high accuracy in classifying both high-level (95% accuracy) and low-level classes (72-92% accuracy). This approach enabled an analysis of habitat fragmentation, specifically for wet grassland and rush pasture, revealing varying degrees of fragmentation across primary habitats. The findings underscore the value of high-resolution, CNN-derived LC maps in nature conservation and landscape planning.
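To make the idea of quantifying fragmentation from a land cover map concrete, here is a minimal sketch. It is not the paper's method: the metrics (patch count, mean patch size, largest patch share), the 4-connectivity rule, the class code 1 for wet grassland, and the function names are all assumptions chosen for illustration.

```python
from collections import deque


def patch_sizes(grid, target):
    """Return the pixel count of each 4-connected patch of `target` in `grid`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited habitat pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] == target:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def fragmentation_summary(grid, target):
    """Summarise how fragmented one land cover class is (illustrative metrics)."""
    sizes = patch_sizes(grid, target)
    total = sum(sizes)
    return {
        "n_patches": len(sizes),
        "total_pixels": total,
        "mean_patch_size": total / len(sizes) if sizes else 0.0,
        "largest_patch_share": max(sizes) / total if sizes else 0.0,
    }


# Toy land cover map: 1 = wet grassland (hypothetical class code), 0 = other.
toy_map = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
summary = fragmentation_summary(toy_map, target=1)
print(summary)
```

A habitat that is split into many small patches shows a high patch count and a low mean patch size for the same total area, which is the kind of signal a high-resolution LC map makes visible.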
Digital Typhoon Dataset
The Digital Typhoon dataset, spanning 40 years of satellite imagery, is crafted to benchmark machine learning models on long-term spatial-temporal data. Developed with a specialised workflow for infrared, typhoon-centred image cropping, it addresses key data quality issues, including inter-satellite calibration. The dataset is a significant tool for machine learning research in tropical cyclones, aiding in tackling major challenges such as disaster response and climate change.
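As a rough illustration of what typhoon-centred cropping means, the sketch below cuts a fixed-size square window around a given centre pixel and pads where the window runs past the image edge. This is not the dataset's actual workflow, which is not described here; the function name, the pixel-coordinate centre, and the fill value are assumptions for illustration only.

```python
def crop_centred(image, centre_row, centre_col, half_size, fill=0):
    """Return a square window of side 2 * half_size + 1 centred on a pixel.

    `image` is a list of equal-length rows. Pixels that fall outside the
    image are replaced with `fill`, so every crop has the same shape.
    """
    rows, cols = len(image), len(image[0])
    window = []
    for r in range(centre_row - half_size, centre_row + half_size + 1):
        row = []
        for c in range(centre_col - half_size, centre_col + half_size + 1):
            if 0 <= r < rows and 0 <= c < cols:
                row.append(image[r][c])
            else:
                row.append(fill)
        window.append(row)
    return window


# Toy "infrared" frame where each pixel value encodes its position (row * 10 + col).
frame = [[r * 10 + c for c in range(5)] for r in range(4)]

interior_crop = crop_centred(frame, centre_row=2, centre_col=2, half_size=1)
corner_crop = crop_centred(frame, centre_row=0, centre_col=0, half_size=1, fill=-1)
print(interior_crop)
print(corner_crop)
```

Keeping every crop the same shape, with the storm centre at the same pixel, is what lets frames from different dates be stacked into a consistent time series for model training.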
Ten deep learning techniques to address small data problems with remote sensing
This paper presents a framework for deep learning practitioners facing the challenge of training models on small datasets, which often leads to poor generalisation and transferability. It explores 10 techniques that can mitigate these issues: transfer learning, self-supervised learning, semi-supervised learning, few-shot learning, zero-shot learning, active learning, weakly supervised learning, multitask learning, process-informed learning, and ensemble learning. The flowchart above serves as a decision-making tool to guide practitioners in choosing the most appropriate technique for their specific context, based on a series of questions about the dataset and the problem at hand.
📖 Paper
📘 Streamlit for Data Science
I recently received a copy of the book Streamlit for Data Science (2nd edition) by Tyler Richards. Streamlit, for those unfamiliar, is a user-friendly, open-source Python framework for rapidly developing interactive web apps. My experience with Streamlit spans various projects:
This book comprehensively covers Streamlit, from fundamental concepts to advanced topics like cloud deployment and custom component integration. A highlight for me was the section on creating interactive maps with streamlit-folium, which aligns with my work in geospatial data exploration. The book caters to both Streamlit newcomers and seasoned users, offering many valuable insights. For US readers, I’m excited to share a special offer: use the code 25TYLER for a 25% discount on the print edition at Amazon US, valid until December 15th.
Consulting
If you need expert guidance on any of the following topics, I’m available for video call consulting:
Applying machine learning techniques to satellite and aerial imagery, including dataset selection, model training, and deployment
Building data processing pipelines in the cloud
Appraisal of your product offering
Building your brand and community for technical products
Personal career development
As an experienced consultant, I offer customised advice and practical solutions to help you achieve your goals in these areas. To discuss this service, please email me 📧