LLM for tapir - Chatbot for OSS
In this blog post, I want to share the news that we are working on a custom chatbot to research how LLMs can be used with private data. The tapir documentation is public but covers a specialised area, so it is a great challenge for us, with possible use cases for the Scala community.
We will now explore why we decided to build it, what steps we have taken, and what comes next in our quest to improve its quality.
Image generated with DALL·E 3
The story behind the project - what is tapir?
Tapir is a Scala library for creating type-safe web endpoints. It was created in 2019, and since then, more than 200 people have contributed to its development. It is used by companies such as Adobe, Ocado Technology, and Swissborg, and is actively maintained by SoftwareMill.
The idea behind the chatbot for OSS
In an era where you can see chatbots on almost every corner, we at SoftwareMill decided to enhance the experience of our library's users by limiting lengthy searches through documentation, past issues, and specialised forums. With a properly formed question, you could get a comprehensive, well-rounded answer, together with links to the resources used to create it.
The reported level of programming language knowledge for GPT-4 (response from chatbot)
Out-of-the-box ChatGPT knows Scala less well than other languages, e.g., Python. Even though it can understand good practices of Scala code, it is limited to what is publicly available on the internet and optimised for general-purpose use, not specialised in a single library such as tapir. That is exactly the area we want to focus on. We saw that a similar solution exists for ZIO, but unfortunately, we could not find out how its inner mechanisms work or what its quality looks like. Within this series of blog posts, we will present how we created our chatbot, how we improved it, and what we learned along the way.
Steps to improve our chatbot
As a baseline solution for our Q&A project, we used the Retrieval-Augmented Generation (RAG) architecture. It consists of two parts: one provides the context (using the LangChain framework), and the other is responsible for text generation (the databricks/dolly-v2-3b LLM). Starting from that baseline, we decided to work in the following steps:
- Focus on document retrieval
To generate a proper answer, it is crucial to supply the text generation model with the right passages from the documentation. Without them, there is little chance the answer will be helpful: our niche is quite specialised, and the library's name can easily be mistaken for an animal from Asia. A sketch of this step follows the list.
- Improve answer generation
With the large number of available LLMs (e.g. Mistral, Falcon, Llama), it is crucial to choose the best one for your use case. Additionally, you might get better answers by using the right prompt for the chosen model, or even by fine-tuning it on your target domain. A sketch of this step also follows the list.
- Incorporate users' feedback
As we have easy access to experts in Scala and tapir, we can incorporate users' feedback in a more automated fashion, either to pinpoint the model's current shortcomings or to enrich the dataset automatically. A hypothetical sketch of such a feedback loop closes this section.
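To make the retrieval step concrete, here is a minimal sketch of how the context-providing part of such a RAG pipeline can be wired with LangChain. The documentation path, chunk sizes, and embedding model are illustrative assumptions, not our exact setup (the example also assumes the sentence-transformers and faiss-cpu packages are installed):

```python
# Minimal RAG retrieval sketch (illustrative, not our exact setup):
# index the tapir documentation and fetch passages relevant to a question.
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Load the documentation (assumed to be checked out locally as markdown files)
# and split it into overlapping chunks that fit the model's context window.
docs = DirectoryLoader("./tapir-docs", glob="**/*.md", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed the chunks and index them in a local FAISS vector store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# Retrieve the passages most similar to the user's question.
passages = index.similarity_search(
    "How do I define an endpoint with a query parameter?", k=4
)
```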
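The generation side can then be attached to the same index. Again, this is a hedged sketch continuing from the snippet above: the prompt wording and the retrieval parameters are assumptions; only the model itself, databricks/dolly-v2-3b, is the one we used as our baseline.

```python
# Sketch of the generation part: dolly-v2-3b answers using retrieved context.
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from transformers import pipeline

# dolly-v2-3b ships its own instruction-following pipeline, hence
# trust_remote_code=True (as described on its Hugging Face model card).
llm = HuggingFacePipeline(
    pipeline=pipeline(
        model="databricks/dolly-v2-3b", trust_remote_code=True, device_map="auto"
    )
)

# An illustrative prompt that grounds the model in the retrieved passages.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question about the tapir library using only the context below.\n\n"
        "Context:\n{context}\n\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

# The "stuff" chain pastes the retrieved passages into the prompt as {context};
# return_source_documents=True keeps the links to the passages that were used.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.as_retriever(search_kwargs={"k": 4}),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)
result = qa({"query": "How do I define an endpoint with a query parameter?"})
print(result["result"], result["source_documents"])
```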
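For the feedback loop, we do not have a finished mechanism to show yet, so the sketch below is purely a hypothetical illustration of the idea: record every question, answer, and user rating, so the log can later be reviewed by our experts or used to enrich the dataset.

```python
# Hypothetical feedback log: every record can later be reviewed by an expert
# or promoted into the evaluation / fine-tuning dataset.
import json
from datetime import datetime, timezone

def log_feedback(question: str, answer: str, helpful: bool,
                 path: str = "feedback.jsonl") -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "helpful": helpful,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage, reusing the answer generated in the previous sketch.
log_feedback("How do I define an endpoint with a query parameter?",
             result["result"], helpful=True)
```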
Wrap up
In this blog post, we gave you an insight into our ongoing LLM project for tapir, briefly explaining what tapir is, where the idea for such an application came from, and finally, what steps we are taking to improve our solution.
In the next blog posts, we will dive deeper into each improvement step and share more hands-on examples.