FoXAI for pneumonia

Rafał Pytel

23 May 2023 · 7 minutes read


What is pneumonia?

Pneumonia is quite a common infection that has drawn more attention since COVID-19 (where pneumonia is caused by the SARS-CoV-2 virus). But before we go deeper into how pneumonia is investigated, it is worth understanding more about the lungs. The picture below shows the parts of the lungs (the left lung, shown on the right, is smaller because of the presence of the heart).


Picture of lungs: 1 - trachea, 2-3 - right/left bronchus, 4 - right lobe (4a - superior, 4b - middle, 4c - inferior), 5 - left lobe (5a - superior, 5b - inferior), 6 (on the right) and 7 - horizontal fissure, 6 (on the left) - oblique fissure, 8 - pulmonary artery.

The definition of pneumonia is the following:

Pneumonia refers to infection within the lung and results in infective fluid and pus filling the alveolar spaces.

This means infective fluids start at the bronchus (2 and 3 in the picture) and advance later around the pulmonary arteries (8 in the image above).

The diagnosis pipeline for pneumonia looks as follows:

  1. Blood test: check for a raised white blood cell count and inflammatory markers
  2. Chest X-ray: characteristics described in the next section
  3. Chest CT scan: characteristics similar to an X-ray

In this blog post, we will try to automate the diagnosis in step 2: with COVID-19, quite a lot of labeled data became openly available, while the number of skilled radiologists is limited.

Chest X-rays - pneumonia or not pneumonia?

Below are examples of two healthy lungs and two lungs with pneumonia. The lungs with pneumonia have clear white “smoke” on them, which indicates air spaces and infective fluids.


Healthy lungs (examples from the dataset).


Lungs with pneumonia (examples from the dataset).

Dataset creation & Training

For this blog post, we use a combination of three datasets:

  1. Open dataset of Covid-19
  2. Paul Mooney’s dataset
  3. Covid-19 Radiography dataset

In the medical domain, images are always limited, so it is crucial to be clever and resourceful with what you have. Additionally, such datasets are often imbalanced, so for our dataset we only use a fraction of the “normal” (healthy) examples.

The class proportions of our dataset are shown in the graphic below.
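The undersampling idea above can be sketched in a few lines. This is only a minimal illustration, assuming the dataset is a list of (path, label) pairs; the helper name `undersample_normal`, the `keep_fraction` value and the file names are hypothetical and not taken from the training code behind this post.

```python
import random
from collections import Counter


def undersample_normal(samples, keep_fraction, seed=0):
    """Keep every non-normal sample but only a fraction of the "normal" ones."""
    rng = random.Random(seed)
    normal = [s for s in samples if s[1] == "normal"]
    other = [s for s in samples if s[1] != "normal"]
    n_keep = int(len(normal) * keep_fraction)
    kept = rng.sample(normal, n_keep)  # random subset without replacement
    result = other + kept
    rng.shuffle(result)
    return result


# Made-up file list for illustration: 1000 normal and 300 pneumonia images.
samples = [(f"normal_{i}.png", "normal") for i in range(1000)]
samples += [(f"pneumonia_{i}.png", "pneumonia") for i in range(300)]

balanced = undersample_normal(samples, keep_fraction=0.3)
counts = Counter(label for _, label in balanced)
print(counts)
```

Fixing the seed keeps the subset reproducible, which matters if the unused “normal” examples are later reused elsewhere, for instance in a test set.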


Proportions of examples in the train-val dataset.

For training, we use a ResNet-18 model pre-trained on ImageNet and an 80-20 train-val split. We use the Adam optimiser together with the less common Cosine Annealing learning rate schedule, as it is advised for training powerful models fast. Additionally, to add more robustness, we use common data augmentations such as resizing, random cropping, and flipping.
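To make the learning rate schedule more concrete, here is a small sketch of cosine annealing in plain Python. It uses the standard closed form for a single cosine cycle without restarts; the number of epochs and the learning rate bounds are assumed values for illustration, not the ones used for this post.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Learning rate at `epoch` for one cosine cycle going from lr_max down to lr_min."""
    cos_term = 1 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


total_epochs = 10  # assumed value
lr_max = 1e-3      # assumed value

schedule = [cosine_annealing_lr(e, total_epochs, lr_max) for e in range(total_epochs + 1)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

The rate drops slowly at first, fastest in the middle of training, and slowly again towards the end, which is what distinguishes it from a linear or step decay.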

Finally, we use unused examples from the “Covid-19 Radiography dataset” for the test set.

Results: we got about 93-94% accuracy on both the test set and the validation set, which seems quite decent. The confusion matrices show that we still get some false positives and false negatives, so let's look at some examples and at the explanations provided by FoXAI.


Confusion matrix for test set.
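Accuracy alone hides how the errors are split between false positives and false negatives. The sketch below shows how the usual metrics follow from the four cells of a binary confusion matrix. The counts are made up for illustration and are not the numbers from this post's confusion matrix; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common metrics from a binary confusion matrix (pneumonia = positive class)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of pneumonia cases that were found
        "specificity": tn / (tn + fp),  # share of healthy cases correctly cleared
        "precision": tp / (tp + fp),    # share of pneumonia predictions that were right
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=90, fp=8, fn=10, tn=92)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting, sensitivity is usually the number to watch, since a false negative means a missed case of pneumonia.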

Explanations analysis

With explanation analysis, we want to gain confidence that the model correctly focuses on the right part of the picture. Additionally, we can identify biases and artefacts in the data which might pollute the model's reasoning and result in worse performance and strange behaviour.

To better understand the model's behaviour, we use the Grad-CAM algorithm from FoXAI, a library from ReasonField Lab.

To install it, run:

pip install foxai

Explanation generation and visualisation then look as follows:

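# Import paths are assumed from the FoXAI README (release exposing `Explainers`).
from foxai.context_manager import ExplainerWithParams, Explainers, FoXaiExplainer
from foxai.visualizer import mean_channels_visualization

# `model`, `layer`, `input_data`, `sample`, `title` and `imshow` are assumed to be defined earlier.
with FoXaiExplainer(
    model=model,  # assumption: the trained network is passed in, as in the FoXAI README
    explainers=[ExplainerWithParams(explainer_name=Explainers.CV_LAYER_GRADCAM_EXPLAINER, layer=layer)],
) as xai_model:
    # calculate attributes for every explainer
    probs, attributes_dict = xai_model(input_data)
    for key, value in attributes_dict.items():
        # create figure from attributes and original image
        figure = mean_channels_visualization(
            attributions=value[0], transformed_img=sample, title=f"Mean of channels ({key})"
        )
        # display the original image
        imshow(sample, (8, 8), title=title)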
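For intuition about what a Grad-CAM heatmap contains, here is a toy sketch of the core calculation in plain Python. It is not FoXAI's implementation: it assumes the feature maps of the chosen layer and the gradients of the class score with respect to them are already available as nested lists, and it skips the final upsampling to the input image size. The function name `grad_cam` and the toy numbers are illustrative.

```python
def grad_cam(activations, gradients):
    """Core Grad-CAM step on K feature maps of size H x W (nested lists).

    Returns ReLU(sum_k w_k * A_k), where w_k is the spatial mean of gradient map k.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * feature_map[i][j]
    # ReLU keeps only the regions that increase the class score.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Toy input: two 2x2 feature maps and their gradients.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # mean 0.5: this channel supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # mean -1.0: this channel works against it
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Locations that are active only in the second channel end up at zero after the ReLU, which is why Grad-CAM heatmaps highlight evidence for the predicted class rather than against it.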

With all that in place, we can move on to analysing some interesting examples.

Good examples

Let's start with good examples. We can see that the model focuses on both lungs for “normal” examples and more on the smoky area for pneumonia examples. So it seems to be working great, right?


Predicted: normal, label: normal


Predicted: pneumonia, label: pneumonia

Bad examples

General errors of label

Well, it is not always great. When we analyse samples where the model is wrong, we can find many peculiarities. In the first example, it wrongly assigns the normal label while focusing on the smoky area of the lung, which indicates the air spaces that often occur with pneumonia.

The other example is even more severe. Even a non-expert would notice a lot of abnormalities in the lung, but the model gives the label “normal”, even though it focuses on the part of the lung with quite an advanced stage of illness. Even with 90+% accuracy, we get this kind of peculiarity.


Predicted: normal, label: pneumonia

Focus outside of the picture

I also noticed that for some examples, the model focuses outside of the region of interest, as in the first example below. This explanation indicates that, as the model focuses on the wrong part of the picture, it cannot give a reliable prediction.

Another example shows that the model does not focus on the wrong part only when its prediction is wrong or its confidence is perhaps low. Here the prediction matches the label, yet the model goes out of its way to focus on the lower part of the left lobe, while viral “smoke” is visible across the upper lobes and in the right lung.


Predicted: normal, label: pneumonia


Predicted: pneumonia, label: pneumonia

Focus on letters

The last peculiarity I noticed was the focus on the letter “L”. It is common for radiologists to put such letters on the picture to mark which side is left and which is right. Even if this helps to orient the picture, it should not be the most discriminative feature on which the model bases its decision about pneumonia.
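One way to turn this kind of observation into a number is to measure how much of the explanation falls inside a suspicious region, such as the corner where the marker sits. The sketch below is a hypothetical check, not part of FoXAI: it assumes a non-negative heatmap (as Grad-CAM produces) given as a nested list, and a hand-picked bounding box for the marker.

```python
def attribution_fraction_in_box(heatmap, top, left, bottom, right):
    """Fraction of total attribution inside rows [top, bottom) and columns [left, right)."""
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0
    inside = sum(sum(row[left:right]) for row in heatmap[top:bottom])
    return inside / total


# Made-up 4x4 heatmap with most of its mass in the top-right corner.
heatmap = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.1, 0.0, 0.6],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]

# Hypothetical marker box: rows 0-1, column 3.
marker_fraction = attribution_fraction_in_box(heatmap, top=0, left=3, bottom=2, right=4)
print(f"attribution on marker region: {marker_fraction:.0%}")
```

The same function works with a box around the lungs instead, in which case a low fraction would point to the "focus outside of the picture" problem described earlier.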

Problems with X-ray pictures

As you may have noticed by now, X-ray pictures vary a lot, and there is no single way of taking them. They are sensitive to both machine settings and picture-specific settings. This can be seen in the examples: in some pictures the human body is clearly darker, while in others it is significantly whiter and blurrier. There is no clear standardisation between hospitals. To the untrained eye, a picture of the same person taken at hospital A can look completely different from one taken at hospital B.

Patient-specific characteristics also play a massive role in the picture. The level of fat tissue has a considerable effect on how the X-ray looks. Age also affects the look, as it is much easier to see the lungs of a child than those of an older person. Older people might also have loose skin, which may produce reflections that could be mistakenly labelled as some form of lung disease. Overlapping soft tissues may cause quite some confusion in this area.

Patients who have had operations are less standard examples. Even skilled radiologists are often confused without a proper check of the particular patient's background, as there may be misleading shades or unwanted lines.

After training models for this blog post, I noticed that data diversity is of utmost importance. I started by training on a combination of the first two datasets (Open dataset of Covid-19 and Paul Mooney’s dataset) and testing on the third one (Covid-19 Radiography dataset). This resulted in high accuracy (97+%) on the training and validation sets, but coin-flip-like results on the test set. After some investigation, I noticed that the pictures differ between datasets, which made it much harder to learn generalisable features.
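The experiment above amounts to holding out a whole source dataset for testing. A minimal sketch of that kind of split is shown below, assuming each sample carries a `source` field; the function name, the field names and the placeholder dataset names are hypothetical.

```python
def leave_one_dataset_out(samples):
    """Yield (held_out_source, train_samples, test_samples) for every source dataset."""
    sources = sorted({s["source"] for s in samples})
    for held_out in sources:
        train = [s for s in samples if s["source"] != held_out]
        test = [s for s in samples if s["source"] == held_out]
        yield held_out, train, test


# Placeholder samples from three made-up sources.
samples = [
    {"path": "a_0.png", "label": "pneumonia", "source": "dataset_a"},
    {"path": "a_1.png", "label": "normal", "source": "dataset_a"},
    {"path": "b_0.png", "label": "pneumonia", "source": "dataset_b"},
    {"path": "b_1.png", "label": "normal", "source": "dataset_b"},
    {"path": "c_0.png", "label": "pneumonia", "source": "dataset_c"},
    {"path": "c_1.png", "label": "normal", "source": "dataset_c"},
]

splits = list(leave_one_dataset_out(samples))
for held_out, train, test in splits:
    print(f"held out {held_out}: {len(train)} train, {len(test)} test samples")
```

A large gap between validation accuracy and accuracy on the held-out source is a sign that the model has learned dataset-specific cues rather than features of the disease.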


To create an AI system that can assist radiologists end-to-end, we should consider doing the following:

  1. Carry out further analyses with radiologists to clean and expand the dataset.
  2. Try to better understand the risk of false negatives and false positives and address it via proper means like class balancing, results preprocessing or a weighted loss function.
  3. Add other modalities, like patient history, which influence radiologists' decisions.
  4. Add other steps, like a lateral X-ray picture or a CT scan.

This list is not exhaustive but should be treated more as directions worth considering.
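As an illustration of the weighted loss idea from point 2, the sketch below computes inverse-frequency class weights and a weighted cross-entropy in plain Python. The weighting rule (number of samples divided by number of classes times class count) is one common heuristic, not necessarily the right choice for this problem, and the labels and probabilities are made up.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / (len(counts) * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log(p_true), normalised by the sum of the sample weights."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Made-up imbalanced labels and a model that always leans towards "normal".
labels = ["normal"] * 8 + ["pneumonia"] * 2
probs = [{"normal": 0.8, "pneumonia": 0.2} for _ in labels]

weights = inverse_frequency_weights(labels)
uniform = {"normal": 1.0, "pneumonia": 1.0}

weighted_loss = weighted_cross_entropy(probs, labels, weights)
plain_loss = weighted_cross_entropy(probs, labels, uniform)
print(weights)
print(f"plain loss: {plain_loss:.4f}, weighted loss: {weighted_loss:.4f}")
```

With the weights applied, a model that ignores the rare class is penalised more heavily, which pushes training towards fewer false negatives at the cost of more false positives.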


Lateral picture from


CT scan of lungs from


In this blog post, we learned about applying deep learning to the medical domain, about pneumonia and how it can be seen, about how problematic X-ray pictures are and, most importantly, about how we can use FoXAI to better understand our model's mistakes.

If you are interested in the topic, check the accompanying notebook to explore the subject further and see how FoXAI can be used to analyse explanations.
