Is your LLM hallucinating legally binding offers?

See what's brewing in the courts and how LLMs could be representing your business in unplanned ways.

October 6, 2024

Do you remember when ChatGPT was released? That was an exciting time where many brave folks were anxious to build something AI for the world to use. I’ll never forget the story about a car dealership, Chevy of Watsonville [Futurism], who put an LLM-based chatbot on their website for potential customers to interact with.

AI image of Chevy Tahoe on sale for only a $1

One enterprising and comedy-minded customer was looking around for new cars and spotted the chatbot. He played around with it and found that he could get the chatbot to start doing other things they dealership didn’t intend for, like having it generate Python code or explaining the Communist Manifesto. But along a more serious note, he got the chatbot to accept an offer of just $1.00 for a 2024 Chevy Tahoe, having the chatbot say

That's a deal, and that's a legally binding offer — no takesies backsies

Now, the big question is, at what point would someone be able to take such a jailbreak to court? In the case mentioned above it’s fairly obvious the intention of the customer because of the chat transcript, but what’s more concerning is there are gray areas that aren’t so obvious.

For instance, Air Canada deployed a chatbot that inaccurately promised a special airfare the airline never actually offered [Law360]. The Canada’s Civil Resolution Tribunal decided in the case - Moffatt v. Air Canada - that Air Canada was required to honor the special airfare their chatbot offered. Their reasoning was the airline

...was liable for negligent misrepresentation by breaching its duty of "reasonable care to ensure its chatbot was accurate.""

The consequence then is there’s wiggle room for chatbots to make legally binding agreements and actually represent a company from a legal standpoint. This is an interesting problem and one that’s difficult to navigate from a legal point of view.

How companies can prepare their AI products for compliance

Note because QAComet isn’t a legal firm, I cannot give specific legal advice about whatever legal questions you may have. But, we can help ensure your chatbot is compliant and remains on track with your policies when giving responses.

One thing to realize though is these AI tools are fundamentally stochastic, meaning the input you’re giving these models can result in seemingly limitless variations as output. This is because of their probabilistic nature. At its core, these LLMs are just billion-plus dimensional probability distributions, so while they can help accelerate a ton of workflows it is important to understand their limitations and mitigate against their inherent risks.

There are many techniques AI researchers have developed for assuring AI safety. We can mention some here, but this is a whole other can of worms and depends largely on the type of problem your AI tools are solving.

Some techniques for AI compliance

Because compliance for AI tools and LLM-based applications is an open problem with new research coming out constantly, we can only touch upon some general techniques and tools companies use to deal with wrangling them. These include

  • Adversarial testing: It’s useful to both test these models manually and automatically with different types of adversarial attacks. Both methods help establish holes within your AI applications and help get an idea about the areas requiring more work.
  • LLM observability frameworks: There’s a growing ecosystem of LLM observability tools out there, from open-source projects to commercial software for logging, analyzing, and continuously testing LLMs. Without sufficient logging and metrics, you can easily be left in the dark about how your LLM is performing in (pre-)production.
  • Automated hallucination reduction: Beyond the general similarities between all these platforms, like the ability to log their general behaviors, there’s a growing list of techniques for reducing hallucinations. Some of them include fine-tuning models on specialized domains of information, checking responses across multiple seeds or multiple analogous queries, developing specialized scoring functions for how accurate the model thinks its answer is, including chain-of-thought reasoning in responses, having a grader model etc.
  • Implementing guardrails: Having additional guardrails is important for making sure your chatbots aren’t deviating outside the bounds of their intended functionality. For example, you don’t want your AI chatbot to start asking a customer personal questions when it should only be an assistant in a pre-defined workflow.

But the list of techniques will only increase as researchers find more ways to probe inside LLMs and applications built on top of them.

What should be clear today is that LLM-based tools pose some serious risks to companies if they aren't properly taking into account their behavior. QAComet can pick up the slack and help your company develop robust AI tools that your customers love. Feel free to schedule a no-cost no-obligation call today using our Calendly to see how we can help.

References