This question hasn’t left the legal discourse.
We’ve been plagued by horror stories of hallucinations from AI. And making up legal precedent is just the tip of the iceberg - the most visible problem. But there are more hallucinations under the surface of the water - ones that could sink or damage legal cases. Using AI for summarization or drafting runs the risk of pulling in false or partial facts about the case. A study run by Stanford University revealed that general-use AI chatbots “hallucinated between 58% and 82% of the time on legal queries”!
We’re not fear-mongering; we bring up the issue of hallucination - and AI trust - because there are measures that both AI providers and users can take to minimize the risk. To push the metaphor a bit further, we can work to make the icebergs smaller, and also equip ships (legal professionals) with better processes to avoid potential dangers.
That’s what the Eve team has been focused on for the last few months - leveraging our exceptional AI engineering team to create robust systems that minimize errors in the underlying models that run Eve, while also implementing a system for users to easily verify answers. Working with our visionary customers, we’ve been able to make marked improvements to Eve’s quality and trustworthiness.
Let’s back up a bit and uncover what can be done to improve the underlying technology of AI models to increase quality and trust for legal users.
Right now there is one standard method employed to help minimize hallucination: Retrieval-Augmented Generation, or RAG. To put it simply, RAG is a process of having the AI search through a controlled repository before (or instead of) relying on conclusions drawn from its generic training data.
Let’s take an example - finding a set of relevant cases based on a particular fact pattern. Without RAG, the AI would answer based on everything it knows to be true (just like a human lawyer who, put on the spot, gives you an answer based on what they learned in law school). Like a human with an imperfect memory, the AI might make up a case that isn’t real, combine several cases into one, or, of course, provide a correct answer. If instead the AI were using a RAG approach, it would be as if, before answering, it walked into a room full of case law, skimmed through everything in the room, found the most promising cases, read them in full, and then picked the most relevant ones. Simply put - when you put parameters on what the AI can use to answer, you can prevent it from pulling in external and irrelevant sources.
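For the technically curious, here is a minimal sketch of the RAG pattern in Python. The repository, the keyword-overlap ranking, and the final prompt are all illustrative placeholders (a real system would use a proper search index and an LLM call), not Eve’s actual implementation:

```python
# Minimal RAG sketch: retrieve from a controlled repository, then answer
# using only what was retrieved. Everything here is an illustrative
# placeholder, not Eve's actual implementation.

def retrieve(query: str, repository: dict, top_k: int = 3) -> list:
    """Rank repository documents by simple keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(text.lower().split())), title)
        for title, text in repository.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

def answer_with_rag(query: str, repository: dict) -> str:
    """Build a grounded prompt from the retrieved documents."""
    context = "\n\n".join(repository[t] for t in retrieve(query, repository))
    # In a real system this prompt would be sent to an LLM; returning it
    # here keeps the sketch self-contained and runnable.
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

The important part is the constraint: the model is only shown material pulled from the controlled repository, rather than answering purely from memory.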
Although RAG is becoming common in legal research AI applications, it’s still one of the biggest opportunities for improvement. The way in which the model retrieves the right “room” full of information can have a significant impact on the quality of the response. Compare two models that “walk” into the same “room” of case law books - one that is very good at choosing the right books to read fully, and one that isn’t. Even if both are equally good at reading a book in full and providing a good summary, the model that chose the right books based on their titles would give you the better answer.
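To make the “choosing the right books” point concrete, here is a hedged sketch of a two-stage retrieval step: a cheap first pass over titles narrows the room down, and only the shortlisted books are read in full. The keyword scoring and the assumed “title”/“full_text” fields are stand-ins for whatever ranking a production system would actually use:

```python
# Two-stage retrieval sketch: skim cheap metadata (titles) first, then read
# only the shortlisted documents in full. Keyword overlap stands in for a
# real ranking model.

def overlap(text: str, query: str) -> int:
    return len(set(text.lower().split()) & set(query.lower().split()))

def two_stage_retrieve(query: str, library: list, skim_k: int = 20, read_k: int = 3) -> list:
    # Stage 1: rank every "book" by its title alone (fast, but rough).
    shortlist = sorted(library, key=lambda d: overlap(d["title"], query), reverse=True)[:skim_k]
    # Stage 2: re-rank the shortlist by full text (slower, but only for a few).
    return sorted(shortlist, key=lambda d: overlap(d["full_text"], query), reverse=True)[:read_k]
```

If stage one picks the wrong books, no amount of careful reading in stage two can recover the right answer - which is exactly why retrieval quality matters so much.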
But minimizing the risk of poor answers or hallucination is not enough. At Eve, we’ve built an extra layer on top of the RAG methodology to increase the trust in (and ease of verification of) AI’s answers. We call it “Trust but Verify”.
What does “Trust but Verify” mean in the world of AI? It means that you should have a base level of trust in the quality of answers you are provided, and that it should be easy (an organic part of your process) to verify responses that AI gives you.
To support this “trust but verify” model, we’ve added two new elements to Eve - a new verification framework and a new “Fact Search” feature.
Eve’s new AI Verification Framework is an extra layer in how Eve provides her answers. Simply put, it’s a sanity check. Eve runs every AI-generated response through a custom set of rules to determine whether the answers are valid. For example, whenever Eve produces a quote from a document, our “sanity check” asks whether that quote is actually present in the document - visually flagging the certainty of each quote (showing you whether the quote can be found, in full, in one of your case documents).
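As a rough illustration of what a quote-checking rule could look like (a hypothetical sketch, not Eve’s verification code), the check can first look for the quote verbatim in the source document and then fall back to a fuzzy match, so near-misses are flagged with lower certainty:

```python
from difflib import SequenceMatcher

def verify_quote(quote: str, document_text: str) -> str:
    """Return a certainty flag for an AI-generated quote against its source document."""
    if quote in document_text:
        return "verified"  # exact match found in the document
    # Fuzzy fallback: slide a quote-sized window over the document and look
    # for a near-match (e.g. whitespace or punctuation drift).
    window = len(quote)
    best = 0.0
    for start in range(0, max(1, len(document_text) - window), window // 2 or 1):
        chunk = document_text[start:start + window]
        best = max(best, SequenceMatcher(None, quote, chunk).ratio())
    return "close match" if best > 0.9 else "not found"
```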
Once Eve has run the rules to check that a quote provided in an answer is true to the original documents, the platform links the response to the original document. A user can simply click on the quote in the answer and be taken to the original quote in the document, highlighted for easy verification. Not only does this allow a user to verify responses with the click of a button, it also allows legal teams to find key context for surfaced quotes in a matter of seconds.
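One simple way to support that click-through (again, a sketch under assumptions rather than Eve’s actual data model) is to store each verified quote with its document ID and character offsets, so the interface knows exactly what passage to scroll to and highlight:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuoteLink:
    """Ties a verified quote back to its exact location in a source document."""
    quote: str
    document_id: str
    start: int  # character offset where the quote begins
    end: int    # character offset where the quote ends

def link_quote(quote: str, document_id: str, document_text: str) -> Optional[QuoteLink]:
    """Locate a verified quote so the UI can jump to it and highlight it on click."""
    position = document_text.find(quote)
    if position == -1:
        return None  # quote not found verbatim; nothing to link
    return QuoteLink(quote, document_id, position, position + len(quote))
```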
These “rules” that sit on top of the base AI model and the RAG approach are unique to Eve - custom-built by our world-class team of engineers. And verifying quotes isn’t the only thing Eve will be enhancing. The platform will also offer in-line sources for long-form answers, verification of cited cases, and even confirmation of math calculations.
The second new element on the Eve platform is called “Fact Search”. Fact Search is a new feature that helps users find key facts throughout case documents. Instead of RAG or a general LLM approach, when performing Fact Search, Eve reads each page of the case documents, word by word, systematically searching for quotes and facts that support a particular claim or search query. For example, Eve will search through all case files, including transcripts, email chains, and employment documents, to find facts that support a Wrongful Termination cause of action. Each fact is then linked to its source and presented to the end user in a table that can be investigated and downloaded. This new way of searching for information provides a significantly higher level of confidence in any answer, backed, of course, by the ability to verify.
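A simplified sketch of that page-by-page approach might look like the following, where the keyword scoring, the threshold, the assumed document structure, and the wrongful-termination example are all illustrative assumptions rather than Eve’s implementation:

```python
# Illustrative page-by-page fact search: scan every page of every document,
# score each sentence against a claim, and collect supporting passages as
# table rows linked back to their source.
import re

def fact_search(claim: str, case_files: dict, threshold: int = 3) -> list:
    """case_files maps a document name to a list of page texts."""
    claim_terms = set(claim.lower().split())
    rows = []
    for doc_name, pages in case_files.items():
        for page_number, page_text in enumerate(pages, start=1):
            for sentence in re.split(r"(?<=[.!?])\s+", page_text):
                overlap = len(claim_terms & set(sentence.lower().split()))
                if overlap >= threshold:
                    rows.append({
                        "fact": sentence.strip(),
                        "source": doc_name,
                        "page": page_number,
                    })
    return rows

# Hypothetical usage:
# rows = fact_search("terminated shortly after reporting harassment",
#                    {"email_chain.pdf": ["page one text...", "page two text..."]})
```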
This is just the beginning. Legal generative AI is still so new, and we envision there will only be more quality improvements - both in the market and at Eve. Whether it’s more “rules” built on top of base AI models that verify answers and surface an easy human check, continued improvements to the way AI searches its context in the “retrieval” step of RAG, or something new entirely, we encourage all legal users to keep quality and trust top of mind when using and growing with legal AI.