LLMs for LegalTech
Unleashing the Potential of LLM Agents for Chatbot with the EU AI Act
LLM-based applications are sprouting in sectors like customer service, medicine, and LegalTech. We have seen large advancements in customer service (as a first point of contact on various platforms) and in the medical domain (like the Meditron model, specialised in medical questions; check our blog post using it); in LegalTech, however, broader applications remain limited.
In this blog post, we will discuss applying LLMs in LegalTech, starting with an explanation of why it is challenging. Afterwards, we will briefly describe the structure of the EU AI Act (used as a resource in our example), an agentic framework called CrewAI, and an email-crafting system that introduces clients to the changes brought by the EU AI Act.
Why is LegalTech difficult?
Let's start by explaining why LegalTech is difficult. You have probably read some agreements or bylaws where the choice of words is very specific, and there are many references to related legal code and cross-references to articles of other bills.
Regarding the specificity of words, a good example is the pair “possession” and “ownership,” which, to a layperson, seem quite similar, while in legal language they apply to completely different situations.
Another critical matter is that the law is constantly changing, with some bills being superseded (e.g., the US Senate passing a bill to raise the debt ceiling) while some old laws remain in force (e.g., in Ohio, you can’t make faces at a dog).
Last but not least is how the law is enforced in different countries. In the US, precedents can be cited in a court of law (as often depicted in TV series), while in Poland, precedents do not carry such weight as in the US. This leads to very different strategies for lawyers in different countries.
LLM applications are prone to approximation mistakes at every step (“close enough” is not good enough when it comes to law), so pipelines using LLM agents might be an answer to this problem.
Existing solutions using large language models for LegalTech
This Stanford paper offers an interesting analysis of the errors and current performance of popular RAG systems for LegalTech (including Lexis+ AI, Westlaw, Practical Law, and GPT-4).
According to their analysis, there seems to be a lot of room for improvement (with the best of them still hallucinating on 17% of the queries).
Each solution was susceptible to the RAG assumption that “similar texts are relevant” and hallucinated significantly on jurisdiction- and time-specific queries, where the retrieved groundings were not suitable for the answer.
Additionally, for most of the solutions, there is a real risk of incomplete answers with insufficient grounding.
EU AI Act
EU AI Act pyramid of risks, based on: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
EU AI Act - What is it?
In the past few months, the EU AI Act has been on everybody's lips, as it is the first regulatory effort of its kind in AI (we have a dedicated blog post about it). It is quite easy to read, with few links to other bills (interested readers may explore https://artificialintelligenceact.eu/ai-act-explorer/).
At its core, it defines a pyramid of risks with four levels: unacceptable risk (prohibited practices), high risk (regulated high-risk systems), limited risk (transparency required), and low and minimal risk (no obligations).
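The four tiers can be sketched as a simple lookup table. The example systems below are common illustrations of each tier, simplified for demonstration; they are not a legal reading of the Act:

```python
# Illustrative mapping of the EU AI Act's four risk tiers to their core
# obligations. Example systems are simplified for demonstration only.
RISK_TIERS = {
    "unacceptable": {
        "obligation": "prohibited practice",
        "example": "social scoring by public authorities",
    },
    "high": {
        "obligation": "conformity assessment and ongoing oversight",
        "example": "CV-screening system for recruitment",
    },
    "limited": {
        "obligation": "transparency (disclose AI interaction)",
        "example": "customer-service chatbot",
    },
    "minimal": {
        "obligation": "no mandatory obligations",
        "example": "spam filter",
    },
}

def obligation_for(tier: str) -> str:
    """Return the core obligation for a given risk tier."""
    return RISK_TIERS[tier]["obligation"]
```

A chatbot like the one in this post would sit in the "limited" tier, which is why transparency towards the client matters in the example later on.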
Technical analysis of the dataset
Considering that it is not overly complicated and is the first regulation of its kind, it is a valid choice for our agent example, so we chose it as one of our resources.
From a technical perspective, the EU AI Act has more than 78k words and is composed of 13 chapters (113 articles in total) and 13 annexes (so far).
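Statistics like these are easy to reproduce with a short script, assuming you have saved a plain-text copy of the Act locally (the file name and the regex heuristics below are assumptions for illustration):

```python
# Rough corpus statistics for a locally saved plain-text copy of the Act.
# The word tokenizer and article pattern are simple heuristics.
import re

def corpus_stats(text: str) -> dict:
    words = re.findall(r"\b\w+\b", text)
    articles = re.findall(r"^Article\s+\d+", text, flags=re.MULTILINE)
    return {"words": len(words), "articles": len(articles)}

# Usage (assuming a local copy exists):
# stats = corpus_stats(open("eu_ai_act.txt", encoding="utf-8").read())
```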
Crafting agentic workflow
CrewAI - a simple agentic framework for complex tasks
To create an AI agentic workflow, you have plenty of frameworks to choose from (at least three): LangGraph from the LangChain team, AutoGen from Microsoft, and finally CrewAI; the last of these will be used in this blog post for its simplicity and ease of use.
CrewAI's structure is rather simple, with a division into tools, agents, and tasks.
Tools
It offers a wide selection of tools, from general-purpose web scrapers to specialised file readers (e.g., PDF or Markdown) and dedicated search tools (e.g., GitHub or YouTube).
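Conceptually, a tool boils down to a named, described callable that an agent can invoke. Here is a minimal stdlib sketch of that idea (a stand-in, not CrewAI's actual tool classes; the scraper stub is hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """Minimal stand-in for a CrewAI-style tool: a named, described callable."""
    name: str
    description: str
    run: Callable[[str], str]

# Hypothetical stub standing in for a real web-scraping tool.
def fake_scrape(url: str) -> str:
    return f"<scraped contents of {url}>"

scraper = Tool(
    name="website_scraper",
    description="Fetches and returns the text of a web page.",
    run=fake_scrape,
)
```

The description field is what the agent's LLM sees when deciding which tool to call, which is why clear tool descriptions matter in practice.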
Agents
Three main parameters define agents:
- their role in the pipeline (e.g., Analyst)
- their goal, i.e. what they want to achieve (e.g., compose a report based on retrieved information)
- their backstory, defining their character and behaviour.
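The three parameters map directly onto a small data structure. Below is a minimal stdlib sketch mirroring them (a stand-in for CrewAI's actual `Agent` class, whose exact API may differ between versions):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Mirrors the three defining parameters of a CrewAI-style agent."""
    role: str        # who the agent is in the pipeline
    goal: str        # what it wants to achieve
    backstory: str   # character and behaviour shaping its responses

analyst = Agent(
    role="Analyst",
    goal="Compose a report based on retrieved information",
    backstory="A meticulous researcher who double-checks every source.",
)
```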
Tasks
Tasks, on the other hand, are defined by:
- a description of the task
- a designated agent
- the expected output, i.e. a definition of what we expect as a result.
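Continuing the sketch above, a task simply binds a description and an expected output to a specific agent (again a stand-in mirroring CrewAI's `Task` fields, not its real class):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str

@dataclass
class Task:
    """Mirrors CrewAI's Task fields: what to do, who does it, what comes out."""
    description: str
    agent: Agent
    expected_output: str

analyst = Agent("Analyst", "Compose a report", "Meticulous researcher")
report_task = Task(
    description="Summarise retrieved EU AI Act fragments into a report.",
    agent=analyst,
    expected_output="A structured report with cited articles.",
)
```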
Setup
The code created based on the premise of the example can be found in the associated repository.
Agents configuration
For this problem, we define three agents with their underlying tasks:
- Senior Paralegal, who does the heavy lifting of research and composes the general report on changes in the law affecting the client's situation under the EU AI Act.
- Law Associate, who consolidates the information from the paralegal’s report, adds broader legal knowledge of the sector, and defines the risks and benefits for the client’s sector once the EU AI Act comes into force.
- Senior Partner, who crafts an easy-to-understand email to the client, adding a personal touch and translating legal jargon into a more accessible form.
In this setup, there is no leader, and the agents work in sequential order (the results of each agent's work are fed as extra input to the next one).
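The sequential process can be sketched as a simple fold over the agent chain, where each step's output becomes part of the next step's context. The lambdas below are placeholders standing in for the actual LLM-backed agents:

```python
# Stdlib sketch of a sequential agentic process: each step's output is
# appended to the next step's context. The step functions stand in for
# LLM calls made by the real agents.
from typing import Callable, List

def run_sequential(steps: List[Callable[[str], str]], initial_input: str) -> str:
    context = initial_input
    for step in steps:
        context = step(context)  # previous result feeds the next agent
    return context

paralegal = lambda ctx: ctx + " -> paralegal report"
associate = lambda ctx: ctx + " -> risk/benefit analysis"
partner = lambda ctx: ctx + " -> client email"

result = run_sequential([paralegal, associate, partner], "client brief")
# result: "client brief -> paralegal report -> risk/benefit analysis -> client email"
```

This mirrors the pipeline above: paralegal research flows into the associate's analysis, which flows into the partner's email.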
Tools configuration
Regarding tools, the Senior Paralegal and Law Associate can use Google search and more in-depth web scraping of the EU AI Act website to gather helpful information. The Senior Partner just translates the associate's report into simpler language for the client.
For this experiment, GPT-4o was used as the backend LLM for each agent due to its large context window.
Results
In this section, we will qualitatively compare the answers from two setups:
- one using plain ChatGPT with the given input
- the other using the agentic workflow described in the repository mentioned earlier.
Input prompt for ChatGPT
The input for ChatGPT using GPT-4o can be seen below:
The output of the plain request to ChatGPT using GPT-4o can be seen below:
The output of the agentic workflow can be seen below:
Comparison
The simple ChatGPT output feels more like an excerpt from a bill or an article, while the output from the agentic workflow feels much more engaging and easier to understand.
Additionally, there were iterations of ChatGPT's responses where the cited articles were inaccurate and lacked relevant legal grounding.
Bonus: AgentOps - observability for LLM Agents
As the console output can be a bit chaotic and hard to read, there are tools that allow for a better understanding and tracking of what is happening. One of them is AgentOps.
How does it look?
With only a few lines invoking AgentOps and setting the correct API keys, it is possible to observe and better track costs (the running cost of agents is higher, as they require multiple sequential LLM requests), processing time (more requests mean more processing time), communications, and intermediate outputs.
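To make concrete what such an observability layer records per LLM call, here is a minimal stdlib sketch tracking request count, a token-based cost estimate, and wall-clock time. This is not the AgentOps API; the pricing figure and word-count token proxy are placeholders for illustration:

```python
# Minimal sketch of per-call observability: request count, token-based
# cost estimate, and elapsed time. Pricing and tokenisation are crude
# placeholders, not real rates.
import time

class CallTracker:
    def __init__(self, usd_per_token: float = 0.00001):
        self.calls = 0
        self.tokens = 0
        self.seconds = 0.0
        self.usd_per_token = usd_per_token

    def record(self, fn, prompt: str) -> str:
        start = time.perf_counter()
        reply = fn(prompt)
        self.seconds += time.perf_counter() - start
        self.calls += 1
        # Approximate tokens by whitespace-separated words.
        self.tokens += len(prompt.split()) + len(reply.split())
        return reply

    @property
    def cost(self) -> float:
        return self.tokens * self.usd_per_token

tracker = CallTracker()
fake_llm = lambda prompt: "stub reply"   # stands in for a real LLM call
tracker.record(fake_llm, "hello world")
```

In a multi-agent pipeline, every sequential agent call adds to these counters, which is exactly why agentic workflows cost more and take longer than a single ChatGPT request.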
Conclusions
In this blog post, we discussed why LegalTech is challenging for LLMs, reviewed the EU AI Act fundamentals once more, discussed the CrewAI framework for creating agentic workflows, provided an example output for legal advice, and briefly compared plain ChatGPT and agentic workflows.
As a bonus, we observed AgentOps as an observability tool to better understand the behaviour of the created system.
If you want to unlock the full potential of AI with expertly crafted LLMs, feel free to reach out to us. I would be happy to discuss your use case.
Review by: Kamil Rzechowski