
A Grounding Technique to Improve Factual Accuracy and Reduce Hallucinations

Lies, Lies, Lies...

We've all had an experience using LLMs where they just straight up lie to you. This is commonly known as a hallucination. It's one of those frustrating things we've just come to accept, but what if I told you there's a simple little trick you can add to your prompts to slightly reduce hallucinations and increase the level of detail and factual accuracy? Well, a team of researchers at Johns Hopkins University developed a method to do exactly that!

Inspired by two recent research areas (natural language prompting and pre-training improvements in larger LLMs), they set about steering LLMs to use their memory to produce more grounded outputs. Their technique involves incorporating the words "according to" (see image below) into prompts. By doing so, they found that language models (especially the larger ones) are more likely to ground their responses in previously observed text. Language models are skilled at picking up syntactic and semantic cues, so when they encounter the phrase "according to," they are more likely to treat it as an instruction and search for specific quotations from their training data, rather than generating false answers.
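To make that concrete, here's a minimal sketch of the prompting pattern in Python. The helper function and the exact wording are illustrative choices of mine, not an official API or prompt from the paper (which tests several phrasings); the resulting string would be passed to whatever model endpoint you use.

```python
# A minimal sketch of the "according to" grounding trick.
# The helper and the exact phrasing are illustrative choices,
# not an official API or prompt from the paper.

GROUNDING_PHRASE = "According to Wikipedia,"

def grounded_prompt(question: str, phrase: str = GROUNDING_PHRASE) -> str:
    """Append a grounding cue so the model leans on memorized text."""
    return f"{question} {phrase}"

question = "What causes the aurora borealis?"
print(grounded_prompt(question))
# -> What causes the aurora borealis? According to Wikipedia,
```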


Image showing the "according to" technique drawing more detailed, grounded text from the model's pre-training data (shown in purple). Credit: arXiv (2023). DOI: 10.48550/arxiv.2305.13252

To evaluate the effectiveness of their method, the researchers used Data Portraits, a tool created by Marc Marone and Benjamin Van Durme, to verify whether the LLM's responses were present in its original training data. This verification process led to the development of a metric called the "QUIP-Score" (quoted information precision). With grounding prompts such as "According to Wikipedia...", the QUIP-Score increased by 5% to 15%. Implementing grounding prompts not only improves the language model's ability to quote text but also enhances the overall accuracy and detail of its answers.
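To give a feel for what the metric measures, below is a rough, simplified sketch of a QUIP-like score. The real evaluation uses Data Portraits, a Bloom-filter membership sketch over the pre-training corpus; here a plain Python set of character n-grams stands in for that, and the n-gram length is an arbitrary illustrative choice. Treat this as an approximation of the idea, not the authors' implementation.

```python
# A simplified, illustrative QUIP-like score: the fraction of character
# n-grams in a model's output that also appear in a reference corpus.
# The real metric uses Data Portraits (a Bloom-filter membership sketch
# over the pre-training data); a plain Python set stands in here, and
# the n-gram length of 25 is an illustrative choice.

def char_ngrams(text: str, n: int = 25) -> set[str]:
    """All character n-grams of length n in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def quip_like_score(generation: str, corpus_ngrams: set[str], n: int = 25) -> float:
    """Precision of the generation's character n-grams against the corpus."""
    grams = [generation[i:i + n] for i in range(len(generation) - n + 1)]
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if g in corpus_ngrams)
    return hits / len(grams)

# Toy usage: a one-sentence "corpus" and a generation that quotes part of it.
corpus = "The aurora borealis is caused by charged particles from the sun."
generation = "Auroras happen when charged particles from the sun hit the atmosphere."
print(f"{quip_like_score(generation, char_ngrams(corpus)):.2f}")
```

A generation that quotes the corpus verbatim scores higher, because more of its n-grams match exactly; paraphrases score lower, which is exactly the "grounded in previously observed text" behavior the researchers wanted to measure.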

One extra note on this study: the LLMs the researchers tested did not have access to the internet (much like ChatGPT 3.5 in its default form), so the models were working purely from their training data, with no external inputs.

The "according to" technique works effectively across a wide range of language models, and it will work very well with the most recent 3.5+ models due to the size of their training data, but it works even better when used with commands such as "Answer the following question correctly: According to..."

Ultimately, while this new technique has reduced hallucinations and improved the output, the LLM is only as good as the data it's trained on.

It's an exciting step (of many) towards improving the factual output of LLMs.

Reference: Weller et al., ""According to...": Prompting Language Models Improves Quoting from Pre-Training Data," Johns Hopkins University, DOI: 10.48550/arxiv.2305.13252