HEM by Vectara: Rating AI Hallucinations for Reliable Benchmarking

Does AI hallucinate? Yes. But what if we could rate the hallucinations produced by different LLMs and use those ratings to benchmark their performance?

Vectara has released the Hallucination Evaluation Model (HEM), an open-source model for evaluating LLM-generated output and measuring its factual accuracy.

Much like a personal credit score, it assigns each LLM a rating, and those ratings will be updated frequently.

Here are some highlights:

+ It is aimed at detecting and quantifying hallucinations in Retrieval Augmented Generation (RAG) systems.

+ It provides a FICO-like score for grading LLMs, which is crucial for businesses considering AI adoption.

+ The model addresses major concerns about AI-generated errors, like misinformation or biases.

+ HEM’s leaderboard offers an objective comparison of popular models like GPT-4, Cohere, and Google PaLM.

+ Vectara’s model opens the door for safer AI integration in sectors where factual accuracy is non-negotiable.

From the current leaderboard, it seems that the GPT models and Llama are faring better, with lower hallucination rates, than Cohere or PaLM. But time will tell as LLMs evolve and these evaluations become more accurate.
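
To make the leaderboard idea concrete, here is a minimal sketch of how per-response scores could be turned into a ranking. It is an illustration only, not Vectara's actual pipeline. It assumes an evaluator returns a consistency score between 0 and 1 for each generated summary (higher meaning better supported by the source text) and that anything below a cutoff counts as a hallucination. The 0.5 cutoff, the function names, and the sample scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: scores are in [0, 1], higher = more consistent with the source.
# The 0.5 cutoff is a hypothetical choice for this sketch.
THRESHOLD = 0.5


def hallucination_rate(scores: List[float], threshold: float = THRESHOLD) -> float:
    """Fraction of responses whose consistency score falls below the cutoff."""
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)


def rank_models(
    scores_by_model: Dict[str, List[float]], threshold: float = THRESHOLD
) -> List[Tuple[str, float]]:
    """Return (model, hallucination rate) pairs, lowest rate first."""
    rates = {
        name: hallucination_rate(scores, threshold)
        for name, scores in scores_by_model.items()
    }
    return sorted(rates.items(), key=lambda item: item[1])


# Made-up scores for two placeholder models, purely for illustration.
example_scores = {
    "model_a": [0.9, 0.8, 0.3, 0.95],
    "model_b": [0.4, 0.2, 0.7, 0.6],
}

for name, rate in rank_models(example_scores):
    print(f"{name}: {rate:.0%} of responses flagged")
```

A real benchmark would run every model over the same set of source documents so that the rates are comparable, and the choice of cutoff would shift every model's number.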

What are your thoughts on LLM accuracy benchmarking and collaboration?

#generativeai #hallucinations #aibusiness #aichallenges #aicompliance

Data: Vectara / GitHub
