OpenAI

121 articles

Why we no longer evaluate SWE-bench Verified

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.

Our First Proof submissions

We share our AI model’s proof attempts for the First Proof math challenge, testing research-grade reasoning on expert-level problems.

Introducing OpenAI for India

OpenAI for India expands AI access across the country—building local infrastructure, powering enterprises, and advancing workforce skills.

Introducing EVMbench

OpenAI and Paradigm introduce EVMbench, a benchmark evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities.

Scaling social science research

GABRIEL is a new open-source toolkit from OpenAI that uses GPT to turn qualitative text and images into quantitative data, helping social scientists analyze research at scale.

Introducing GPT-5.3-Codex-Spark

Introducing GPT-5.3-Codex-Spark, our first real-time coding model. It offers 15x faster generation and a 128k context window, and is now in research preview for ChatGPT Pro users.