MiniMax M2 was released in late October this year. The company stated that M2.1 demonstrated significant improvements in ...
To fix the way we test and measure models, AI is learning tricks from social science. It’s not easy being one of Silicon Valley’s favorite benchmarks. SWE-Bench (pronounced “swee bench”) launched in ...
Anthropic has released Claude Sonnet 4.5, its most advanced coding model to date, featuring major improvements in agentic tasks, long-horizon task performance, and computer use capabilities. The ...
Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source ...
In a new benchmark named Vibe Code Bench, OpenAI’s GPT-5.1 achieved the highest level of accuracy in completing a series of software engineering tasks, narrowly beating rival Anthropic’s Claude 4.5 ...
Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and ...
Forbes contributors publish independent expert analyses and insights. Paul-Smith Goodson is an analyst covering quantum computing and AI. IBM just announced a new collection of AI models, its third ...
OpenAI’s latest large language model has been specifically designed for reasoning and is capable of generating code to a much higher standard than previous models. The ChatGPT-o1-Preview model ...
Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial ...
Last week, when OpenAI launched GPT-5, it told software engineers the model was designed to be a “true coding collaborator” that excels at generating high-quality code and performing agentic, or ...