Beyond Regex, Hello AI: Smarter Spring Validation with Semantic AI Validator

Have you ever been in a situation where you had to validate data that goes beyond classic email or phone validation?
Validating form data often leads to unreadable, complex regular expressions that no one understands, and everyone hopes there are unit tests for them so you can at least grasp what they are supposed to achieve. On top of that, when you need to cross-validate multiple fields or detect sarcasm entered by the user, you are on your own when it comes to, e.g., Spring validation (or any popular framework I am aware of).
What if your framework validation could understand language and intent, not just syntax? Meet Semantic AI Validator from SoftwareMill: a lightweight, annotation-based (JSR-380 compliant), async library for the Spring framework that solves some of the problems you may run into when building web solutions where forms are an essential part of the business.
TL;DR
Semantic AI Validator is a Kotlin library that enables intelligent, context-aware validation of form fields using Large Language Models. Instead of writing complex regex patterns or business logic for custom validators, describe what you want to validate in plain English (or any other language if needed) with an LLM prompt.
Features
- Simple Annotation-Based API: Add @AIVerify to any field with a validation prompt
- Multiple LLM Providers: OpenAI (GPT-4), Anthropic (Claude), Google (Gemini)
- Context-Aware Validation: Validate fields based on values of other fields
- Spring Boot Integration: Seamless integration with Spring Validation framework
- Async by Default: Built on Kotlin coroutines for high performance (although blocking versions of validate are also available for Java interop)
- Type-Safe Configuration: Fully typed configuration properties with IDE support
- Production Ready: Comprehensive error handling, logging, and testing
- JSR-380 (Bean Validation) compliant
Whenever you need semantic or subjective checks (for bios, reviews), or you need to check cross-field consistency on your web forms, you can utilise an LLM and execute your validations across multiple providers with the simple @AIVerify annotation and its attributes, triggered through the standard @Valid mechanism.
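To show how this fits into a standard Spring setup, here is a minimal sketch of a form and a controller (the class names and the prompt are made up, and the library's own imports are omitted; the wiring relies on the standard @Valid mechanism the library integrates with):

import jakarta.validation.Valid
import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestBody
import org.springframework.web.bind.annotation.RestController
// plus the library's @ValidateWithAI and @AIVerify imports

@ValidateWithAI
data class ProjectForm(
    @AIVerify(prompt = "Check that this is a plausible, non-offensive project name.")
    val name: String
)

@RestController
class ProjectController {
    @PostMapping("/projects")
    fun createProject(@Valid @RequestBody form: ProjectForm): String {
        // If the @AIVerify check fails, the request is rejected the same way
        // a classic Bean Validation constraint violation would be.
        return "Project accepted: ${form.name}"
    }
}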
Before & after
Let’s take a simple email validation with regexp in Spring:
@field:NotBlank(message = "Contact email cannot be blank")
@field:Pattern(
    regexp = "^[A-Za-z0-9+_.-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$",
    message = "Invalid email format. Expected format: user@example.com"
)
val contactEmail: String
In this particular case, it's actually not so bad. With some basic knowledge of regular expressions, you can easily decipher what we expect in this field. But regex validation can become complex. If you have ever wanted to validate something other than an email, e.g., a URL with path and query parameters, you know the pain.
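For instance, even a simplified URL check quickly turns into something hard to read; the pattern below is a rough, incomplete sketch rather than a production-ready URL validator:

@field:Pattern(
    regexp = "^https?://[A-Za-z0-9.-]+(?::\\d{1,5})?(?:/[\\w./%-]*)?(?:\\?[\\w=&%./-]*)?$",
    message = "Invalid URL"
)
val websiteUrl: String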
Now compare that to the power of Semantic AI Validator:
@AIVerify(
    prompt = """Verify that this project description contains all of the following elements:
        1. Project name
        2. Main goal or objective
        3. Target audience
        4. Timeline or schedule
        If any of these elements are missing, respond with: "INVALID: Missing [list what's missing]"
        If all elements are present, respond with: "VALID"
        Be strict in your validation. Vague or implied information doesn't count."""
)
val description: String,
A whole new world of validation opens up for us: we can validate not only syntax but also intent.
We can, of course, still validate emails this way, but it wouldn’t be the best solution, as strict formats should be left for the classic validation approach with cheap synchronously enforced constraints.
Project intro
Semantic AI Validator brings LLM-powered validation to Spring. What we were trying to accomplish with this simple project is to give developers the ability to assess meaning and intent beyond the classic format checks we usually have on our forms.
The project's main goals are simplicity and flexibility: the API is annotation-based, built around the @AIVerify and @ValidateWithAI annotations, and the actual LLM interactions are handled by the Koog library.
Besides validating single fields, the library allows you to build context before sending data to the LLM and attach data from other fields while constructing your validations.
The project is not a replacement for classic Bean Validation rules; it only complements them for semantic and cross-field language tasks.
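As a quick illustration of that split (the field names and the prompt below are made up; @field:Email is a standard Bean Validation constraint, while @AIVerify comes from the library):

import jakarta.validation.constraints.Email
// plus the library's @ValidateWithAI and @AIVerify imports

@ValidateWithAI
data class ContactForm(
    // cheap, synchronous, format-only check - classic Bean Validation
    @field:Email(message = "Invalid email format")
    val contactEmail: String,

    // semantic check that no regex can express - delegated to the LLM
    @AIVerify(prompt = "Check that this message is a genuine support request, not spam or marketing.")
    val message: String
)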
Setup & configuration
Installation takes less than a minute, assuming you already have an API key for your LLM provider available. Whichever build system you are using, the setup is straightforward.
Step 1 - Add dependency
dependencies {
    implementation("com.softwaremill.aivalidator:semantic-ai-validator-spring-boot-starter:0.2.0")
}
or for Maven builds:
<dependency>
    <groupId>com.softwaremill.aivalidator</groupId>
    <artifactId>semantic-ai-validator-spring-boot-starter</artifactId>
    <version>0.2.0</version>
</dependency>
Step 2 - Configure provider and API key
aivalidator:
  openai:
    api-key: ${OPENAI_API_KEY}
    base-url: https://api.openai.com # optional, default shown
    default-model: gpt-4
The default model is OpenAI GPT-4, but that can easily be changed to other models (whatever the Koog Kotlin library supports). The rest of the defaults are listed below:

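If you want to use one of the other supported providers, the configuration presumably mirrors the openai block above; the anthropic property names below are an assumption based on that block, so check the project README for the exact keys:

aivalidator:
  anthropic:                         # assumed section name, mirroring the openai block above
    api-key: ${ANTHROPIC_API_KEY}    # assumed property name
    default-model: claude-sonnet-4-5 # example model name - use whatever your account offers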
Semantic validation
One of the best examples of semantic validation was already shown at the beginning of the article. Imagine you have a simple text field on your form, but you want to make sure the text written there is relevant to the business case you are trying to solve. You can write down, directly in the @AIVerify prompt, a list of requirements that must be fulfilled for the description to be valid.
This example comes from a real-world problem we had to solve in one of our projects, where the description of an entity had to include specific pieces of information and was often entered incorrectly by users, requiring manual verification on each form submission. Applying @AIVerify automates the process, ultimately saving you time and money.
@AIVerify(
    prompt = """Verify that this project description contains all of the following elements:
        1. Project name
        2. Main goal or objective
        3. Target audience
        4. Timeline or schedule
        If any of these elements are missing, respond with: "INVALID: Missing [list what's missing]"
        If all elements are present, respond with: "VALID"
        Be strict in your validation. Vague or implied information doesn't count."""
)
val description: String,
Now, imagine how you could approach this validation with classic validators: it would be very difficult, or almost impossible, and would definitely bring a whole new level of business logic into your application or require manual input to finish the work.
Cross-field semantic validation
Another functionality that Semantic AI Validator provides is cross-field validation. You can extend the context of your LLM call with the values of other fields available on your form.
For example, imagine a popular review form where you enter a number, e.g., from 1 to 5, via a star widget, and you can also write a description for your review. One possible problem in this scenario is that users sometimes select a number of stars that completely mismatches the description they have written. Whether by simple mistake or sometimes on purpose, the description can be very positive, yet only 1 star was given, or the opposite.
Semantic AI Validator solves this problem by giving you the ability to include other fields' values as context in @AIVerify:
@ValidateWithAI
public data class ProductReviewForm(
    @field:NotBlank(message = "Review text cannot be blank")
    @AIVerify(
        prompt = """Validate this product review for quality and appropriateness.
        CRITERIA:
        1. Constructive: Review provides useful, specific feedback
        2. Non-offensive: No profanity, hate speech, or abusive language
        3. Product-related: About the product itself, not delivery/seller
        4. Substantive: More than just "good" or "bad", has details
        RESPONSE FORMAT:
        - If review meets ALL criteria: "VALID"
        - If review fails any criteria: "INVALID: [specific reason why it fails]"
        Examples of INVALID reasons:
        - "Contains offensive language"
        - "Review is about delivery service, not the product"
        - "Review lacks constructive feedback, only says 'bad'"
        - "Contains hate speech or discriminatory content"
        """,
        temperature = 0.1 // Low temperature for consistent quality checks
    )
    val reviewText: String,

    @field:Min(value = 1, message = "Rating must be at least 1")
    @field:Max(value = 5, message = "Rating must be at most 5")
    @AIVerify(
        prompt = """Verify the rating is justified by the review text.
        TASK:
        Analyze if the {fieldValue} star rating aligns with the sentiment and content
        of the review for "{productName}".
        RATING GUIDE:
        - 5 stars = Excellent, highly positive feedback
        - 4 stars = Good, mostly positive with minor issues
        - 3 stars = Mixed feelings, both pros and cons
        - 2 stars = Disappointing, mostly negative
        - 1 star = Very poor, highly negative
        RESPONSE FORMAT:
        - If rating matches review sentiment: "VALID"
        - If mismatch detected: "INVALID: Rating mismatch - [explain the mismatch]"
        Example mismatches:
        - "Rating is 5 stars but review describes multiple serious problems"
        - "Rating is 1 star but review is mostly positive"
        - "Rating is 3 stars but review has no negative points"
        """,
        contextFields = ["reviewText", "productName"]
    )
    val rating: Int,

    @field:NotBlank(message = "Product name cannot be blank")
    val productName: String
)
For the rating validation, the contextFields attribute is present, which adds reviewText and productName to the context of the LLM prompt sent when validating the rating itself.
Context fields don't need to be text fields. Another example, again taken from a real-world problem we had, is to compare what the user writes on the form to the data they submit in an attached file.
Let's take an example of an invoice submission form. It's important for the accounts division to keep real invoice numbers in a separate system from the one where users submit their invoices.
Users are required to enter the invoice number in a separate text field and upload their invoice as a PDF file. Unsurprisingly, it turns out that the invoice numbers entered often don't match the real ones on the PDF itself. Semantic AI Validator can submit the uploaded PDF as context to your LLM call, just as we did with other fields in the review validation example.
@ValidateWithAI
public data class InvoiceValidationForm(
    @AIVerify(
        prompt = """You are validating an invoice number against a PDF document.
        TASK:
        1. Extract the invoice number from the uploaded PDF document
        2. Compare it with the provided invoice number: {fieldValue}
        3. Determine if they match
        RESPONSE FORMAT:
        - If they match exactly: "VALID"
        - If they don't match: "INVALID: Invoice number mismatch - PDF shows [extracted_number] but [provided_number] was provided"
        - If no invoice number found in PDF: "INVALID: No invoice number found in the PDF document"
        - If PDF is unreadable: "INVALID: Unable to read the PDF document"
        Be strict about matching - consider format, spacing, and case sensitivity.""",
        contextFields = ["invoicePdf"],
        llmProvider = LLMProvider.OPENAI,
        model = "gpt-4-vision-preview", // Vision-capable model for PDF reading
        maxTokens = 300 // Allow longer response for detailed mismatch explanations
    )
    val invoiceNumber: String,

    @AIVerifyContext
    val invoicePdf: MultipartFile? = null
)
There are other interesting scenarios in which the semantic validator can be useful. All of the examples described in this article can be found in the project examples; demo projects for both Kotlin and Java are available.
Advanced options
For production use, you can fine-tune almost every aspect of the LLM call with different @AIVerify attributes.
@Target(AnnotationTarget.FIELD)
@Retention(AnnotationRetention.RUNTIME)
@MustBeDocumented
public annotation class AIVerify(
    val prompt: String,
    val contextFields: Array<String> = [],
    val language: String = "",
    val llmProvider: LLMProvider = LLMProvider.OPENAI,
    val model: String = "",
    val temperature: Double = 0.0,
    val maxTokens: Int = 150,
    val failOnError: Boolean = true,
    val groups: Array<KClass<*>> = []
)
You have already seen some of these in action. As you can see from the list, for each field you can specify not only a different model but also a different provider. There is also a fail-soft mode with failOnError = false, if needed, and so on.
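For example, a low-stakes field could run a fail-soft check against a different provider, using only the attributes listed above (the LLMProvider.ANTHROPIC value and the model name are assumptions here, so adjust them to whatever your setup supports):

@AIVerify(
    prompt = "Check that this nickname is not offensive.",
    llmProvider = LLMProvider.ANTHROPIC, // assumed enum value, based on the providers listed in the features
    model = "claude-3-5-haiku",          // hypothetical model name
    maxTokens = 50,
    failOnError = false                  // fail-soft: an LLM/provider error will not hard-fail this field
)
val nickname: String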
For Java users, a separate BlockingAIObjectValidator is available if needed.
Refer to the README.md on the project website.
I18n
Semantic AI Validator supports internationalization through two distinct but complementary mechanisms. The first lets you specify which language the LLM-generated error messages should be in (via the language attribute of the @AIVerify annotation). The second covers framework error messages via Spring MessageSource.
Language attribute
The @AIVerify annotation has a language attribute that controls which language the LLM uses when generating validation error messages:
@AIVerify(
    prompt = "Check if this feedback is appropriate and constructive",
    language = "pl" // ISO 639-1 language code
)
val feedback: String
The language code is a standard ISO 639-1 two-letter code; any language supported by your LLM is supported here.
This language support is implemented by injecting an additional instruction into your @AIVerify prompt, telling the model to return the response in the specified language, if one is given.
val languageInstruction = request.language
    ?.takeIf { it.isNotBlank() }
    ?.let { tag ->
        val languageName = Locale.forLanguageTag(tag)
            .getDisplayLanguage(Locale.ENGLISH)
            .replaceFirstChar { c ->
                if (c.isLowerCase()) c.titlecase(Locale.ROOT) else c.toString()
            }
        "When building JSON response, respond to the user in $languageName language regardless of the input language."
    }.orEmpty()
Please note that your prompt can actually be written in any language; the response will be constructed according to the specified language property anyway, e.g.:
@ValidateWithAI
data class PolishFeedbackForm(
    @AIVerify(
        prompt = """Check if this feedback is appropriate and constructive.
        CRITERIA:
        1. No spam or promotional content
        2. No offensive or abusive language
        3. Constructive and relevant to the topic
        """,
        language = "pl"
    )
    val feedbackPolish: String
)
Example response:
{
  "valid": false,
  "errors": ["Opinia zawiera obraźliwy język"]
}
Besides specifying the language as an @AIVerify attribute, we can also set a global default language for all responses using the default-language property:
ai:
  validator:
    default-language: "pl" # All validations use Polish by default
The precedence for such a language setup would be:
- the @AIVerify language attribute
- ai.validator.default-language in configuration
- English (fallback)
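As a minimal sketch of that precedence (assuming the configuration above sets default-language to "pl"; the fields and prompts are made up):

// with ai.validator.default-language: "pl" set in application.yml

@AIVerify(prompt = "Check that this bio is appropriate.")
val bio: String // no language attribute: error messages come back in Polish (configuration default)

@AIVerify(
    prompt = "Check that this bio is appropriate.",
    language = "de" // field-level attribute wins: error messages come back in German
)
val bioInGerman: String

If neither the attribute nor the configuration property is set, messages fall back to English.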
Framework error messages
Those error messages are standard, regardless of the prompt attribute, and are displayed to inform users about common problems encountered during validation, e.g., "LLM provider not configured" or "Validation timed out".
Framework-level messages are stored in resource bundle property files, and loaded via Spring’s MessageSource.
Location: semantic-ai-validator-spring-boot-starter/src/main/resources/messages/
You can find i18n examples in the demos; play around with them, change the language attribute, the configuration property, or the default browser language, and see exactly how the language is picked for your responses.
Hands-on: Demo time!
To see the full power of Semantic AI Validator yourself, you can easily start it up on your local machine. Assuming you can run Kotlin/Java projects, the only thing you have to do is provide the API key for the LLM provider of your choice. You can either configure it in application.yml or just export it directly in the shell with:
export OPENAI_API_KEY=sk-proj-eRhfasdfasdf……….667asdf78
Start up one of the demo versions with, e.g.:
./gradlew :semantic-ai-validator-demo:bootRun -x test
and you are all set and ready to test.
The demo app has 4 main pages where you can see the different functionality of the library:
- Project description validation - typical semantic validation described earlier in the article
- Invoice validation - cross-validation example with a file attached as context to the LLM query
- Product review validation - another cross-validation example with the rating and description fields described earlier in the article
- I18n demo - taking into account the default browser language, the language attribute, and the configuration in application.yml

Each demo page allows you to quickly fill out the form with valid or invalid data through pre-defined content and play around with different inputs to see different validation outcomes.
For the project description demo, the user has to enter the project details, including its name, purpose, audience, and timeline. If those requirements are not met, a validation error message is displayed under the description field.


The review cross-validation demo checks how the actual review text aligns with the number of stars assigned in the star-rating widget. In the example below, the review itself is very positive, but the user submitted only 1 star, which results in the validation error displayed below.

The PDF cross-validation example demonstrates the capability to include attached file data in the context of the LLM prompt. The example verifies whether the entered invoice number matches the one on the uploaded PDF.

The last example was created solely to demonstrate the i18n capabilities of the library. Change your browser's language settings and use the different input examples prepared for three example languages to see how the labels and error messages change.

Final notes
Introducing semantic AI validations into your projects can solve many problems, but you need to be aware of the consequences. First and foremost is the cost: if you are using external LLM providers, you of course pay for every API call you make to validate your data. If your form is heavily used, that can generate significant operational cost, which you need to calculate and be aware of.
Additionally, calling LLMs, especially with cross-field validations where the context is enhanced with uploaded files, can be time-consuming. In such cases, additional frontend solutions, such as displaying a loader, may be necessary to maintain a proper user experience.
Last but not least, when you validate emails or other data that is easy to validate by other means, you should probably stick to those and avoid additional calls that slow down your app. Treat Semantic AI Validator as an augmentation of, not a replacement for, classic validation.
Despite the aforementioned issues, we believe that Semantic AI Validator can be of great benefit to your application, especially when you need validation that understands meaning, not just syntax. AI-based validation is a new area in the world of web form validation.
There are no established best practices yet, or they are only just emerging. Nevertheless, given the complexities one can encounter while building web forms, it is a fascinating topic to research and develop.
Reviewed by: Łukasz Zalewski, Rafał Maciak
