
Anthropic's Claude 3 beats OpenAI's GPT-4. Right? In the benchmarks published by the company, the largest model, Opus, does beat GPT-4, but a closer look reveals a more complicated picture: Anthropic tested its latest model against the first version of GPT-4, not newer versions such as GPT-4 Turbo. The reason is that OpenAI has so far only published benchmark results for the old GPT-4 model, which can only be accessed via the API.

However, GPT-4 Turbo results do exist for some benchmarks, even if they do not come directly from OpenAI; AI researcher Lawrence Chan has compiled them. These numbers make one thing clear: in every benchmark where Claude 3 and GPT-4 Turbo have been compared, the OpenAI model still beats Anthropic's best model, if only by a few percentage points. The models are so close, however, that which one is better depends very much on the task at hand, and is often a matter of taste.


ServiceNow, Hugging Face, and Nvidia have released StarCoder2, a family of open-access code generation LLMs. StarCoder2, trained on 619 programming languages, was developed in collaboration with the BigCode community as the successor to StarCoder, which was released in May 2023. StarCoder2 comes in three sizes: a 3 billion parameter model trained by ServiceNow, a 7 billion parameter model trained by Hugging Face, and a 15 billion parameter model trained by Nvidia.

StarCoder2 was trained on the new The Stack v2 code dataset, which is also available. New training methods are designed to help the model better understand low-resource programming languages, mathematics, and source code discussions. Companies can fine-tune the model for their own tasks.
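Because the weights are distributed as open-access checkpoints, an off-the-shelf or fine-tuned StarCoder2 model can be used for code completion with the standard Hugging Face transformers API. The snippet below is a minimal sketch; the checkpoint name "bigcode/starcoder2-3b", the prompt, and the generation settings are assumptions for illustration, not taken from the announcement.

```python
# Minimal sketch: code completion with a StarCoder2 checkpoint via transformers.
# The model ID below is an assumption; check the Hugging Face Hub for the exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Complete a short code prompt with greedy decoding.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For company-specific tasks, the same checkpoint would typically serve as the starting point for fine-tuning on in-house code before being used for generation.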
