Anthropic says its latest AI bot can beat Gemini and ChatGPT

Photo illustration of a brain made of data points.
Image: The Verge

Anthropic, the AI company started by several former OpenAI employees, says the new Claude 3 family of AI models performs as well as or better than leading models from Google and OpenAI. Unlike earlier versions, Claude 3 is also multimodal, able to understand text and photo inputs.

Anthropic says Claude 3 will answer more questions, understand longer instructions, and be more accurate. Claude 3 can understand more context, meaning it can process more information. There’s Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus being the largest and “most intelligent model.” Anthropic says Opus and Sonnet are now available on claude.ai and its API. Haiku will be released soon. All three models can be deployed on chatbots, auto-completion, and data extraction tasks.

Previous versions of Claude refused to answer some prompts that were harmless, which the company writes “suggests a lack of contextual understanding.” The new models are less likely to refuse to answer prompts that toe the line of its safety guardrails, similar to rumors about Meta’s plans for Llama 3 when it’s released.

Bar chart showing a significantly lower rate of “refused on harmless prompt” responses by Claude 3 AI models (near or below 10 percent), compared to Claude 2.1 (around 25 percent).
Image: Anthropic
Incorrect refusals on Claude 3 versus Claude 2.1.

Anthropic claims Claude 3 models can give near-instant results even while parsing dense material like a research paper. A blog post says Haiku, the smallest version of Claude 3, is “the fastest and most cost-effective model on the market,” able to read a dense research paper complete with charts and graphs “in less than three seconds.”

Anthropic says Opus outperformed most models in several benchmarking tests. It showed better graduate-level reasoning than OpenAI’s GPT-4, getting 50.4 percent in that test over GPT-4’s 35.7 percent. It also answered math questions, coded, and understood reasoning better.

A list of benchmark scores comparing AI models from Anthropic, OpenAI, and Google, showing Claude 3 (Opus) as the highest scoring model on all of the tests listed.
Image: Anthropic
Claude 3 models compared to GPT-4, GPT-3.5, and Gemini 1.0 Ultra / Pro.

The new models also significantly improve against the previous Claude 2.1 model. Sonnet, the middle ground model, was twice as fast as Claude 2 and Claude 2.1. “It excels in tasks demanding rapid responses, like knowledge retrieval or sales automation,” Anthropic said.

Anthropic trained the Claude 3 models on a mix of nonpublic internal and third-party datasets and publicly available data as of August 2023. The company says in a paper introducing the three models that these were trained using hardware from Amazon’s AWS and Google Cloud. Both companies invested in Anthropic, with Amazon putting $4 billion into the company. Claude 3 will be available on AWS’s model library Bedrock and in Google’s Vertex AI.