Generative AI and remote sensing imagery
Diffusion models and remote sensing
Can we utilize realistic but artificial samples when researching natural geographic phenomena?
Generative Artificial Intelligence is not new to remote sensing practitioners. We have successfully applied it in our research for several years to enhance our models: inpainting areas covered by clouds, denoising, up- and downscaling images, and even creating artificial samples. Fine-tailored algorithms handled all these activities, often built on AutoEncoders (AEs) or Generative Adversarial Networks (GANs).
I worked on introducing novel aerial imagery data generation techniques. Initially, I doubted whether generative techniques could help solve problems related to natural phenomena and the environment. Eventually, I weighed the pros and cons and concluded that generative models, as long as they produce realistic output, can aid us in carrying out remote sensing activities. Multiple tests and model benchmarks have confirmed this as well.
Unfortunately, the realism of the artificial samples created by these models was limited. Most spatial features were properly recreated, often indistinguishable even to a human expert. However, some minor flaws spoiled the overall result: shadow aberrations, too-smooth edges in the geometries of human-made objects, invisible water flow, and color issues when representing various plants. They did not dramatically decrease model usability, but the samples were far too synthetic for a geographer to accept in all scenarios. This was about to change.
Recently, with undisguised fascination, I caught up with the scientific literature and code repositories regarding diffusion models. I am sure you’ve heard of DALL-E or Midjourney, which are examples of applying these models in practice. I was astonished at how much progress has been made in such a short time. If someone had asked me a year ago what generative AI's role in remote sensing would be, I would have said it was an exciting addition to classically used methods. Now, I am sure we are facing a significant update in our GIS toolboxes.
Undoubtedly, the Generative AI revolution is here and will significantly affect remote sensing. Our discipline will face enormous changes, and it’s better to be prepared!
Generative AI techniques are powerful but, at the same time, complex. To take full advantage of them and understand their limitations, it’s best to study the scientific literature and complete hands-on assignments in parallel. If you are new to Generative AI, the following papers and their corresponding code are a good place to start:
- Pix2Pix - a conditional image-to-image translation architecture based on GANs.
- BigGAN - a generative adversarial network designed to scale generation to high-resolution, high-fidelity images.
- Diffusion models - generate samples by gradually removing noise from a signal.
Now, let’s proceed with a simple example. Assume that we are facing a classification problem. Our task is to process satellite imagery patches and assign each of them a label. The label roughly describes what we can see in the patch: a bridge, a forest, a stadium, an industrial complex, etc. One of the previous blog posts shows an example of a machine-learning classifier that solves this task.
AID dataset image samples: river, mountain, and forest classes.
Although it’s a classical machine learning problem, the task is not trivial in the remote sensing context. There are many distinct classes, and the samples are heavily imbalanced. One can easily imagine that more areas are covered by forests than by power plants (thankfully). Collecting samples of frequently occurring classes is not an issue. The problem starts when we must provide our greedy neural network classifier with patches representing rarely occurring objects. Looking for specific samples can be tedious. It’s even more complicated when only a few such objects exist.
We can always try generating more samples by utilizing Generative AI and augmenting our imbalanced training dataset. We will use a diffusion model and the AID dataset to present a solution for this task.
Before running the code, ensure you have a decent GPU (8 GB+ VRAM) and that CUDA is installed correctly in your environment. If you’re struggling with the setup, it may be easier to use Colab; remember to set the runtime type to GPU. When you’re ready, install PyTorch and Denoising Diffusion:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install denoising_diffusion_pytorch
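Before moving on, it’s worth making sure PyTorch actually sees your GPU. A quick sanity check, using only standard PyTorch calls:

import torch

# Confirm that the GPU is visible before committing to a long training run.
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device:    {torch.cuda.get_device_name(0)}")
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free / 1024**3:.1f} / {total / 1024**3:.1f} GiB")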
Now it’s time to prepare training samples. The implementation handles RGB imagery with various extensions (TIF included). Although this implementation is not adjusted to multi-spectral imagery, diffusion models are not limited to three channels; generative AI has already been successfully utilized in near-infrared synthesis. It’s only a matter of time before open-source software handles multiple spectral channels.
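To give a concrete idea of this step, here is a minimal sketch, assuming the AID archive has been unpacked into an AID/ directory with one subfolder per class (the folder names and the .jpg extension are illustrative). It copies a single class catalog into the flat training folder that the trainer reads later, forcing three channels along the way:

from pathlib import Path
from PIL import Image

SRC = Path("AID/Mountain")        # one AID class catalog (illustrative path)
DST = Path("data/AID_mountains")  # flat training folder used later by the Trainer
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    # Open each patch, force RGB, and save it into the training folder.
    Image.open(img_path).convert("RGB").save(DST / f"{img_path.stem}.png")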
Back to the dataset: you might be tempted to feed the whole AID dataset into the neural network. More is better in ML, isn’t it? It could be a good idea in some circumstances, but in this case I will save you some time. The truth is that the more complex the observed phenomena and the more diverse the data, the more time and examples it takes for the generator to create reasonable samples. Look at the example below. The output is pleasant to the eye but doesn’t present realistic-looking satellite imagery patches.
Diffusion model training on all available AID classes.
Let’s focus on individual classes and start with generating artificial mountains. Extract the AID mountains catalog and place it in a folder according to your preferences. In the code below, we prepare a backbone model (Unet) and configure the parameters of the diffusion process. Please note that we are working with 256 px square patches; if you lack resources, you can change the image_size value to 128 px or even 64 px. The last part is related to the training process. Most parameters were set to their default values defined in the denoising_diffusion_pytorch repository, but you should pay close attention to three of them:
- train_batch_size controls the number of samples processed simultaneously and is directly related to the amount of your computational resources. In my case, I could fit 10 patches on my Titan RTX card.
- num_samples sets how many preview samples are generated at each milestone (every 1000 iterations, as set by save_and_sample_every). To speed up the process, you can decrease this number.
- amp defines the acceleration type and is quite problematic; several GitHub issues relate to this parameter. The training takes a lot of time, and it is safer to disable mixed-precision acceleration to avoid a NaN loss.
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

if __name__ == '__main__':
    # Backbone: a U-Net that predicts the noise present in an image.
    model = Unet(
        dim=64,
        dim_mults=(1, 2, 4, 8)
    )

    # Diffusion wrapper: defines patch size, number of noising steps, and loss.
    diffusion = GaussianDiffusion(
        model,
        image_size=256,  # lower to 128 or 64 if you lack GPU memory
        timesteps=1000,
        loss_type='l1'
    )

    # Trainer: points at the folder with training patches and runs the loop.
    trainer = Trainer(
        diffusion,
        'data/AID_mountains',
        train_batch_size=10,        # fit this to your GPU memory
        train_lr=8e-5,
        train_num_steps=10000,
        gradient_accumulate_every=2,
        ema_decay=0.995,
        amp=False,                  # mixed precision disabled to avoid NaN loss
        calculate_fid=False,
        convert_image_to='RGB',
        save_and_sample_every=1000,
        num_samples=36              # preview samples generated at each milestone
    )
    trainer.train()
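Once training finishes, the Trainer saves model milestones and preview grids in its results folder. If you want to draw additional patches yourself, you can reuse the trainer and diffusion objects from the script above; a minimal sketch could look like this (the milestone number is illustrative and depends on how long you trained):

# Load the last milestone saved by the Trainer (milestone 10 = step 10,000 here)
# and generate a batch of brand-new artificial patches.
trainer.load(10)
images = diffusion.sample(batch_size=4)  # tensor of shape (4, 3, 256, 256)

from torchvision.utils import save_image
save_image(images, 'artificial_mountains.png', nrow=2)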
After around 8 hours, we can enjoy the initial results. Amazing, right? To the trained eye, these are still synthetic samples, but we’re close to a high-quality result! Imagine what we could achieve with a larger remote sensing imagery dataset and more resources.
Artificial mountain samples generated using a diffusion model.
This is the perfect time to show how diffusion-based models work. Their main aim is to predict (find) the noise in images that have been obfuscated by the systematic addition of jitter. The noise is added over the span of multiple time steps, and the trained model is capable of removing these image distortions. How does this relate to generating new samples? The model can denoise an image that is composed entirely of noise. During each time step, it deducts a bit of distortion until it reaches the final stage, i.e., a clean image. The trick is that the initial noise does not have to come from a real, obfuscated image; it may be completely random. Denoising random noise produces a random, artificial image. Excellent idea!
Diffusion model training process
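To make the mechanics tangible, here is a toy sketch of both halves of the process, assuming a linear noise schedule and treating the model as any callable that predicts noise (a stand-in for the trained U-Net; this illustrates the idea and is not code from the library):

import torch

T = 1000                                   # number of diffusion time steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # fraction of original signal left at step t

def add_noise(x0, t):
    # Forward process: jump straight to step t by mixing the image with noise.
    # During training, the model learns to predict the returned `noise`,
    # e.g. by minimizing ||noise - model(noisy, t)||.
    noise = torch.randn_like(x0)
    noisy = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise
    return noisy, noise

@torch.no_grad()
def sample(model, shape):
    # Reverse process: start from pure random noise and remove a bit of the
    # predicted distortion at every step until a clean image remains.
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps = model(x, torch.tensor([t]))  # predicted noise at step t
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)
        else:
            x = mean
    return x

# Tiny demo with a dummy "model" that predicts zero noise; a real model
# would be the trained U-Net from the training script above.
dummy = lambda x, t: torch.zeros_like(x)
image = sample(dummy, (1, 3, 64, 64))

The initial noise fed to sample() never came from a real image, which is exactly why the output is a brand-new, artificial sample.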
I believe this example demonstrates the potential of generative AI techniques in remote sensing. Of course, it doesn’t have to end with generating new samples and data augmentation. I see great potential in correcting real samples, and it’s not just about clouds. These techniques can be very useful when combining different materials: we all know how complex it is to process and compose long-term series of remote sensing images, especially if some time sample is missing, or how complicated it is to combine point clouds acquired by LiDAR. Keep a close eye on the world of science and engineering; in the months ahead, you will see many exciting projects related to Generative AI. I am sure you will find something useful that can boost your research. And if you have any interesting ideas, feel free to contact us :)
Remember to leave the authors of Denoising Diffusion a star! They definitely deserve it.