A brief introduction to satellite image classification with neural networks
What is image classification, which models are used and how to approach your first project
You may already be familiar with image classification from the numerous cats vs dogs tutorials on the internet. Image classification is the task of assigning one (or more) labels to an entire image. Note however that the term 'classification' can mean different things to different people - in particular, many articles use it to describe pixel or pixel-cluster level labels, which I would call semantic segmentation. To be clear, in this post we are discussing single labels applied to single images, using deep learning neural networks to generate the label. When applied to satellite imagery, single label classification has two common uses:
label the dominant subject of an image, e.g. golf course, harbour
perform binary detection of some subject, e.g. ship present or not
There are also more advanced classification techniques, for example using a time-series of images to classify crops where the unique seasonal changes are a strong indicator of crop type.
Image classification datasets
To get more familiar with satellite image classification I recommend exploring a couple of benchmark datasets. A benchmark dataset is one that the community uses as a standard to compare the performance of different techniques. Two good benchmark datasets are the UC Merced dataset (a sample of which is shown below) and the EuroSAT dataset. Both of these datasets are available in the standard RGB/single label format, but also in more interesting multi-class versions. For these and other datasets see my Datasets repository.
Selecting & training models
The Classification section in my repository lists many different resources demonstrating the training of classification models on satellite imagery. In practice it is relatively rare to train a model from scratch on your own dataset, and far more common to take a model that has been pre-trained on a benchmark dataset (usually ImageNet) and then fine-tune it on your own dataset. To learn more about fine-tuning I recommend the fine-tuning lesson on d2l.ai. In the simplest form of fine-tuning, the feature extraction layers are frozen and only the fully connected classification layers are updated.
The internet regularly reports new 'state of the art' models which improve performance on some benchmark dataset or other, and it would be reasonable to assume that the latest and greatest models are the ones usually used in applications. For an approachable article comparing models I highly recommend reading The best vision models for fine-tuning by Jeremy Howard. In this article Jeremy compares 86 models on two benchmark datasets: the Oxford-IIIT Pet dataset and the Kaggle Planet dataset (a remote sensing dataset). He shows that the modern models are the top performers in terms of accuracy, shown in the table below:
Interestingly, the best performers vary between the Pets and Planet datasets. Jeremy attributes this to the fact that the Planet dataset does not resemble the images in the ImageNet dataset (which most models are pre-trained on), so the models which learn new features the fastest are the best performers. He also notes that "there's little correlation between model size and performance" on the Planet dataset, and therefore advises selecting smaller models (which will also be faster in use). An additional advantage of choosing a small model is that the pace of experimentation is faster. For me a surprising result on the Planet dataset is that the relatively old (published in 2015) ResNet-18 model is in the top 10 performers. As Jeremy says, "Resnet 18 has very low memory use, is fast, and is still quite accurate", and for these reasons I suggest it is a good default model to begin projects with.
How to approach your first classification project
Perhaps you already have a use case for classification from your day job, but if not I suggest deciding on a topic that interests you (e.g. deforestation, crop classification) and finding a relevant dataset on Kaggle, Roboflow data hub, or in my repository. There are also regular competitions run by organisations including ESA and the Radiant Earth Foundation, and these typically provide a dataset and an exciting challenge. Begin by following a tutorial on fine-tuning a vision model (e.g. the fine-tuning lesson on d2l.ai) and then adapt it to use your chosen dataset. You will probably encounter some challenges just from switching datasets, such as dealing with different image sizes or numbers of channels. If you are not particularly familiar with geospatial images (GeoTIFFs) then I recommend sticking to datasets where the images are simply PNGs or JPEGs. If you do want to work with geospatial images you will probably need to 'chip' large images into smaller training chips, and I list many tools to do this on my repository here.
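To make the idea of chipping concrete, the sketch below computes the pixel boxes of fixed-size chips over a large image. It uses only plain Python, and the function name `chip_windows` and the example dimensions are hypothetical. This version simply drops partial chips at the right and bottom edges; padding them instead is an equally reasonable choice, and the dedicated tools mentioned above handle georeferencing, which this sketch ignores.

```python
def chip_windows(width, height, chip_size, stride=None):
    """Yield (left, top, right, bottom) pixel boxes for square chips.

    Hypothetical helper for illustration. A stride smaller than
    chip_size produces overlapping chips. Partial chips at the
    right and bottom edges are skipped.
    """
    stride = stride or chip_size
    for top in range(0, height - chip_size + 1, stride):
        for left in range(0, width - chip_size + 1, stride):
            yield (left, top, left + chip_size, top + chip_size)


# A 1000 x 600 pixel image cut into non-overlapping 256 pixel chips.
windows = list(chip_windows(1000, 600, 256))
print(len(windows))  # 6
print(windows[0])    # (0, 0, 256, 256)
```

The boxes follow the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so each one can be passed straight to it when working with ordinary PNG or JPEG images.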
Once you have assembled your dataset you will probably have to modify the model training code for loading and preprocessing the dataset, and this is a good opportunity to practice your Python programming skills and get familiar with your chosen deep learning framework (TensorFlow or PyTorch). Note that the UC Merced & EuroSAT datasets can be accessed via TensorFlow Datasets, simplifying the process of using these datasets. PyTorch users will want to access these datasets via torchgeo, and will also benefit from much additional functionality that simplifies working with geospatial datasets.
Moving on to model fine-tuning, begin experimenting to see which factors improve or degrade model performance. You will find that data augmentation and training parameters such as batch size and number of epochs will have a significant impact on the model's performance. As a general guide, the classification accuracy you can achieve roughly depends on three factors:
The quality of the input images; including appropriate image pre-processing, spatial & radiometric resolution of the images
Quality, quantity and balance of the training dataset and labels
Selection and fine-tuning of the deep learning model
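Dataset balance in particular is easy to check before any training. The sketch below counts labels and derives inverse-frequency class weights, using the same "balanced" heuristic as scikit-learn (number of samples divided by number of classes times the class count). The function name `class_weights` and the ship example are illustrative assumptions only.

```python
from collections import Counter


def class_weights(labels):
    """Return {label: weight}, where rarer classes get larger weights.

    Hypothetical helper for illustration, using the heuristic
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# An imbalanced binary detection dataset: 10 chips with a ship, 90 without.
labels = ["ship"] * 10 + ["no_ship"] * 90
weights = class_weights(labels)
print(weights["ship"])  # 5.0
```

Weights like these can be passed to a weighted loss function, although collecting more examples of the rare class is often the better fix when it is possible.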
Keep iterating on these factors until you feel confident in fine-tuning a model on a published dataset. Next you could move on to creating your own dataset, and a classification dataset can be created by downloading images from Google Earth using one of the scripts listed on my repository here. If you are unsure which tool to use I suggest first checking out Map Tiles Downloader, which provides a helpful UI. To prepare the dataset for training it will be necessary to sort your images into folders where the folder name is the label that will be used for that class. Fortunately this can be done using just the file browser on your computer, and no special 'annotation' software is required. If you're interested in a tool to better understand and curate your classification data, I recommend Roboflow. You can also use Roboflow to host a model API after training your custom model.
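If you have more images than you want to drag around by hand, the folder-per-class layout described above can also be produced with a few lines of Python. This is only a sketch: the function name `sort_into_class_folders` is hypothetical, and it assumes you already hold a mapping from file name to label (for example from a spreadsheet you filled in while reviewing the images).

```python
import shutil
from pathlib import Path


def sort_into_class_folders(image_dir, labels, output_dir):
    """Copy images into output_dir/<label>/<filename>.

    Hypothetical helper for illustration. `labels` maps each file
    name in image_dir to its class label. Returns the sorted list
    of class folder names that now exist in output_dir.
    """
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    for filename, label in labels.items():
        class_dir = output_dir / label
        class_dir.mkdir(parents=True, exist_ok=True)
        # Copy rather than move, so the original download stays intact.
        shutil.copy2(image_dir / filename, class_dir / filename)
    return sorted(p.name for p in output_dir.iterdir() if p.is_dir())
```

Calling it with something like `{"tile_001.png": "harbour", "tile_002.png": "golf_course"}` would give one folder per label. This is the directory structure that loaders such as torchvision's `ImageFolder` expect.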
At this point you may wish to write a blog post about your project, or simply publish a notebook on Kaggle. I personally think that summarising and presenting your work is an important part of the learning process, and recommend doing it even for small projects. If you want to take your project to the next level, consider creating a web app or API to provide a live service which you can use to demonstrate the model. There are a few examples of how to do this in my repository here and here. Note that if you want to deploy a production ready service there may be a significant amount of engineering required to handle pre-processing of the uploaded images, for example to handle multiple image types, detect quality issues etc.
I hope that this has been a useful introduction to satellite imagery classification, and provided an interesting overview of how models are trained in practice. If you have any questions please use the comments section below.