Coding Language Performance Bench

MiniMax releases M2.1 AI model for multi-language programming versatility

MiniMax M2 was released in late October this year. The company stated that M2.1 demonstrated significant improvements in ...

MIT Technology Review

How to build a better AI benchmark

To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...

InfoQ

Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus beyond 30 Hours

Anthropic has released Claude Sonnet 4.5, its most advanced coding model to date, featuring major improvements in agentic tasks, long-horizon task performance, and computer use capabilities. The ...

Fox21Online

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source ...

Inc

The Winners (and Losers) of This New Vibe-Coding Benchmark Will Surprise You

In a new benchmark named Vibe Code Bench, OpenAI’s GPT-5.1 achieved the highest level of accuracy in completing a series of software engineering tasks, narrowly beating rival Anthropic’s Claude 4.5 ...

VentureBeat

Qwen2.5-Coder just changed the game for AI programming—and it's free

Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and ...

Forbes

IBM’s New Granite 3.0 AI Models Show Strong Performance On Benchmarks

Forbes contributors publish independent expert analyses and insights. Paul Smith-Goodson is an analyst covering quantum computing and AI. IBM just announced a new collection of AI models, its third ...

Geeky Gadgets

How good is ChatGPT-o1-Preview at Coding?

OpenAI’s latest large language model has been specifically designed for reasoning and is capable of generating code to a much higher standard than previous models. The ChatGPT-o1-Preview model ...

Tech Xplore on MSN

Exo 2: A new programming language for high-performance computing, with much less code

Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial ...

Wired

Developers Say GPT-5 Is a Mixed Bag

Last week, when OpenAI launched GPT-5, it told software engineers the model was designed to be a “true coding collaborator” that excels at generating high-quality code and performing agentic, or ...
