What is AI watermarking?

With Generative AI becoming more prevalent via ChatGPT, DALL·E, Midjourney, and others, we are increasingly surrounded by generated content. Additionally, there are many ongoing lawsuits over the allegedly illegal use of data for training models, like Getty Images vs Stability AI (creators of Stable Diffusion) or visual artists vs Midjourney and DeviantArt.

One answer to these problems is AI watermarking. In this blog post, we will cover what it is and how it is applied to different modalities like images, text, and audio.

AI watermarking - definition and importance


Invisible watermark with C2PA signature from Steg.AI

When we hear "watermarking", we may associate it with UV marks on banknotes or the special paper marks on diplomas, which are extremely hard to forge. In AI, we have a similar kind of signature.

Watermarking is a method of marking content to transmit extra information, such as authenticity. In AI, watermarking can either be fully visible (like a visible logo or code in the image) or invisible (like the one in the picture above). These patterns can be recognized either by humans or by specialized tools.

Types of watermarking

We can consider two main classifications of AI watermarking:

  • Open/closed watermarking: whether the implementation is open to the public. Open watermarking stimulates innovation, and the community can spot and fix errors. However, knowing what the watermark looks like and how it is created might make it easy for anyone to remove or forge it.
  • Model/content watermarking: where the watermark is applied, inside the model itself or by an external tool. This choice matters because in some domains, like text, it is impossible to create a watermark without access to the model. Model watermarking mostly focuses on enforcing certain distributions of the output and intermediate layers for a chosen set of inputs. Content watermarking focuses on adding noise or a signature to the data itself.
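To make the content-watermarking idea concrete, here is a minimal sketch of the classic least-significant-bit (LSB) technique, not any particular product's scheme: a bit string is hidden in the lowest bit of each pixel, so values change by at most 1 and the mark stays invisible to the eye.

```python
import numpy as np

def embed_lsb(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Embed a bit string into the least significant bits of pixel values."""
    flat = image.flatten().copy()
    flat[: len(bits)] = (flat[: len(bits)] & 0xFE) | bits  # overwrite the LSB
    return flat.reshape(image.shape)

def extract_lsb(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the first n_bits least significant bits."""
    return image.flatten()[:n_bits] & 1

# A tiny 4x4 grayscale "image" and an 8-bit watermark.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
mark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)

watermarked = embed_lsb(img, mark)
assert np.array_equal(extract_lsb(watermarked, 8), mark)
# Each pixel changes by at most 1 out of 255 intensity levels.
assert np.max(np.abs(watermarked.astype(int) - img.astype(int))) <= 1
```

Note that plain LSB embedding is fragile: re-encoding or resizing the image destroys it, which is why real systems train models to embed marks that survive such transformations.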

Watermarking across different modalities

In this section, we will review examples of these types of watermarking and discuss how they are applied to various domains, such as computer vision, text, and audio.



Example of visible watermarks from images generated by Stable Diffusion models.

In the computer vision domain, it became quite popular to add watermarks to verify that data came from a certain source. A good example is the ongoing lawsuit between Getty Images and Stability AI, where generated images can contain a distorted version of the image provider's watermark.

On the other hand, in computer vision there are also many watermarks for generated data that are not visible, like SynthID (available on Google Cloud for images but also for audio), the watermarking API introduced by AWS, or open-source demos from Truepic or IMATAG. OpenAI also introduced a new format for their images by adding watermarking in DALL·E 3 (using C2PA).

In computer vision, there is also a popular area of data poisoning, like Nightshade, which adds extra noise to your images (invisible to humans) but causes devastating effects for models trained on illegally scraped data. For example, suppose you have a picture of a cow lying in a field, casting a shadow. The image transformed by Nightshade looks the same to the human eye; however, for the model being trained, there is a hidden shape of a green purse in the shadow, which harms the learning process. Transformations like cropping, screenshotting, smoothing out pixels, or adding noise will not change anything, as the patterns persist. A similar method is Glaze (by the very same lab from the University of Chicago), which alters pictures with an uncorrelated style (invisible to humans but highly disruptive for machines), causing prompt-based image generation models to produce irrelevant pictures. For instance, a charcoal image with Glaze applied might be seen by the machine as abstract art à la Jackson Pollock, so the prompt "generate me a charcoal-style image" might produce irrelevant and unwanted results.


Photoguard immunization from the original paper.

On the other hand, when we consider preventing deepfakes based on existing images, there are tools like PhotoGuard. Their aim is similar to the aforementioned methods: they alter original pictures in ways invisible to the human eye, but when such an image is used at inference time, the model either ignores the prompt or disregards the context of the original image.

Truepic is an example of a tool that uses the C2PA standard to sign an image in its metadata, which allows you to track where the image came from. Their Hugging Face demo allows signing and verifying the signature.

Around the Internet, there are multiple deepfake detectors, like the Hugging Face detector or the Content at Scale detector, but their quality and stability are often questionable.



GLTR demo for detecting text generated by AI

While watermarking image data is quite easy to imagine, watermarking text is a completely different story. With images, we could manipulate pixels in a space not perceptible to the human eye; with text, this is not an option. Most solutions in this space exploit the mechanism of current text generation models, which is next-token prediction (i.e., the model generates text one token at a time, choosing each token based on the ones before it).

An interesting method for watermarking text randomly divides the vocabulary into two sets: preferred tokens ("green tokens") and restricted tokens ("red tokens"). Then, depending on the watermarking mode, red tokens are either completely avoided ("hard watermarking") or green tokens have their probability increased ("soft watermarking"). Manipulating model output might harm text generation capabilities, so the authors of WaterBench proposed a unified benchmark to compare various watermarking algorithms for autoregressive LLMs.
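A minimal sketch of the green/red-token idea in "soft" mode, assuming a toy setup: the vocabulary size, split fraction, and logit boost below are illustrative placeholders, and the previous token is hashed to seed the split, so a detector that knows the scheme can recompute the green list without the model.

```python
import hashlib
import numpy as np

VOCAB_SIZE = 50_000
GREEN_FRACTION = 0.5
DELTA = 2.0  # logit boost for green tokens ("soft" watermarking)

def green_list(prev_token: int) -> np.ndarray:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % 2**32
    rng = np.random.default_rng(seed)
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    green = rng.choice(VOCAB_SIZE, int(VOCAB_SIZE * GREEN_FRACTION), replace=False)
    mask[green] = True
    return mask

def watermark_logits(logits: np.ndarray, prev_token: int) -> np.ndarray:
    """Soft watermarking: boost green-token logits before sampling."""
    return logits + DELTA * green_list(prev_token)

def green_rate(tokens: list[int]) -> float:
    """Detection: fraction of tokens that fall in their green list."""
    hits = sum(green_list(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

# Greedy generation with made-up base logits: watermarked text lands in the
# green list far more often than the ~50% expected for human text.
rng = np.random.default_rng(1)
tokens = [0]
for _ in range(50):
    base_logits = rng.normal(size=VOCAB_SIZE)
    tokens.append(int(np.argmax(watermark_logits(base_logits, tokens[-1]))))
```

Detection then reduces to a statistical test: a human writer picks green tokens about half the time, while watermarked output does so almost always.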

When we consider methods working purely at inference time, we need to mention the GLTR method. Based on the previous tokens, it predicts the logits for the next token and finds the position of the actually chosen token among the returned logits. For text written by humans, you get fewer "most common choices" (green colour in the graphic) and more rare ones (purple colour in the graphic). This is only possible if we have full access to the model's outputs, i.e., the logits for any input we want. Moreover, we are limited to the models we have access to, so instead of answering the question "Did AI generate this text?", we answer the question "Was this text generated by model XYZ?".
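The core of a GLTR-style check can be sketched as follows: given the model's logits at some position, find the rank of the token that was actually chosen. The logits below are made up for illustration; a real check runs a language model over the text and repeats this at every position.

```python
import numpy as np

def token_rank(logits: np.ndarray, actual_token: int) -> int:
    """Rank of the actually chosen token among the model's predictions (0 = top-1)."""
    order = np.argsort(logits)[::-1]  # token ids sorted by descending logit
    return int(np.where(order == actual_token)[0][0])

# Toy example: the model strongly prefers token 2 and considers token 3 unlikely.
logits = np.array([1.5, 0.2, 3.0, -1.0])
assert token_rank(logits, 2) == 0  # a top choice, typical of generated text
assert token_rank(logits, 3) == 3  # a rare choice, more typical of human text
```

Aggregating these ranks over a whole document gives the green/yellow/purple histogram GLTR visualizes.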

It is important to mention that methods for watermarking text currently do not work perfectly and can produce many false positives. Because of that, companies like OpenAI have quietly removed public access to such tools.


With advancements in GenAI for both images and text, audio is also developing rapidly. This is worrying because voice recognition systems are often used for biometric verification at banks (fortunately, not as the sole factor, but usually paired with a PIN, fingerprint, or password), and cloned voices can be used for malicious activity.
Audio watermarking is done similarly to images: changes are introduced to the recording outside of the humanly perceivable range (roughly 20 Hz to 20 kHz), but they can be detected by specialized models.
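As a toy illustration of hiding information outside the audible range (real systems like SynthID or AudioSeal are far more sophisticated and robust), one can add a faint ultrasonic tone to a recording and detect it in the frequency spectrum. The sample rate, mark frequency, and thresholds below are illustrative choices.

```python
import numpy as np

SR = 48_000       # sample rate high enough to carry an ultrasonic tone
MARK_HZ = 21_000  # above the ~20 kHz limit of human hearing

def embed(audio: np.ndarray, amplitude: float = 0.01) -> np.ndarray:
    """Add a faint sinusoid above the audible range."""
    t = np.arange(len(audio)) / SR
    return audio + amplitude * np.sin(2 * np.pi * MARK_HZ * t)

def detect(audio: np.ndarray, threshold: float = 10.0) -> bool:
    """Check whether the mark frequency stands out from the spectral floor."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / SR)
    bin_idx = int(np.argmin(np.abs(freqs - MARK_HZ)))
    return spectrum[bin_idx] > threshold * np.median(spectrum)

# Stand-in for a voice recording: a 440 Hz tone plus background noise.
rng = np.random.default_rng(0)
t = np.arange(SR) / SR
speechlike = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.01 * rng.normal(size=SR)

assert not detect(speechlike)
assert detect(embed(speechlike))
```

A single fixed tone like this would not survive resampling or lossy compression, which is exactly why production systems train detectors against such transformations.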


Helpful picture from Venomave paper.

With voice systems, there are popular methods for poisoning data against unwanted trainers. WaveFuzz introduces targeted noise that makes voice samples useless for training while sounding the same to humans: the samples keep the same characteristics in frequency space but have a different structure in the MFCC representation (commonly used by AI systems for audio). On the other hand, the authors of Venomave introduce noise that, for selected elements of an audio recording, pushes the model towards the decision boundary, making it very difficult to replicate the audio directly.


Training setup for AudioSeal watermarking scheme from the original paper.

AudioSeal is a good example of a watermarking system that jointly trains a generator and a detector, making them robust to natural audio transformations while allowing high-quality detection. Here, a perceptual loss aims to make the original and watermarked samples (created by the watermark generation model) indistinguishable. The other component is the watermark detector, which uses a localization loss to predict, for each sample, whether the watermark is present, regardless of the perturbations introduced to the watermarked signal.
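A rough numpy sketch of how the two loss terms fit together. This is a simplification of the training objective, not AudioSeal's actual implementation: the real system uses learned networks and richer perceptual losses, while here plain MSE stands in for the perceptual term, per-sample binary cross-entropy for localization, and the arrays and weight are placeholders.

```python
import numpy as np

def perceptual_loss(original: np.ndarray, watermarked: np.ndarray) -> float:
    """Keep the watermarked audio close to the original (here: plain MSE)."""
    return float(np.mean((original - watermarked) ** 2))

def localization_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    """Per-sample binary cross-entropy: the detector predicts, for every
    sample position, whether the watermark is present there."""
    eps = 1e-9
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))

def total_loss(original, watermarked, probs, labels, lam=1.0) -> float:
    """Joint objective balancing imperceptibility and detectability."""
    return perceptual_loss(original, watermarked) + lam * localization_loss(probs, labels)

# Toy check: a detector that localizes the mark well scores a lower loss
# than one that merely guesses.
rng = np.random.default_rng(0)
audio = rng.normal(size=100)
watermarked = audio + 0.01 * rng.normal(size=100)  # stand-in for the generator
labels = np.ones(100)                              # watermark present everywhere
good_probs = np.full(100, 0.95)
guess_probs = np.full(100, 0.5)
assert total_loss(audio, watermarked, good_probs, labels) \
       < total_loss(audio, watermarked, guess_probs, labels)
```

The tension between the two terms is the point: the generator is pushed to perturb the audio as little as possible while still leaving something the detector can localize after perturbations.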


In this blog post, we went over what watermarking in AI is and its possible types, and we gained insight into how it is applied across different domains like image, text, and audio. Equipped with that knowledge, perhaps we can use the right tools, e.g., to prevent our photos from being scraped and used for training by unwanted parties, or from becoming the target of deepfakes.

If you are interested in watermarking the data produced by your product, do not hesitate to contact us!
