Claude Opus 4.6: 1M Context & 72.5% SWE-Bench Score

Claude Opus 4.6: 1M Context & 72.5% SWE-Bench Score

Anthropic Releases Claude Opus 4.6: Most Advanced AI Model with 1M Token Context Window

Anthropic has launched Claude Opus 4.6, marking a significant leap forward in AI capabilities with enhanced coding skills, extended reasoning, and the first 1M token context window for an Opus-class model. Released on February 5, 2026, this flagship model sets new industry standards across agentic coding, multi-step workflows, and knowledge-intensive tasks.

Advanced Coding and Agent Capabilities

Claude Opus 4.6 delivers production-ready code with minimal oversight, demonstrating substantial improvements in planning, sustained effort, and codebase navigation. The model achieved a groundbreaking 72.5% resolution rate on SWE-bench Verified, establishing it as the most capable AI coding assistant ever released. On Terminal-Bench 2.0, an agentic coding evaluation, Opus 4.6 scored higher than any other frontier model.

The model excels at autonomous code generation, debugging, and multi-file refactoring without hand-holding. Early adopters report that Opus 4.6 handles multi-million-line codebase migrations like a senior engineer, adapting strategies as it learns and completing tasks in half the expected time.

Benchmark Performance and Reasoning

Opus 4.6 dominates industry benchmarks across multiple domains. On GDPval-AA, which evaluates performance on economically valuable knowledge work tasks, Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points and its predecessor Claude Opus 4.5 by 190 points. The model also leads on BrowseComp, achieving the highest score for locating hard-to-find information online.

In multidisciplinary reasoning, Opus 4.6 achieved top scores on Humanity's Last Exam, a complex test spanning multiple fields. The model demonstrated exceptional performance in cybersecurity investigations, producing the best results in 38 out of 40 blind rankings against Claude 4.5 models.

Extended Context and Long-Running Tasks

Opus 4.6 introduces a 1M token context window in beta, marking a significant expansion from the standard 200K token limit. On the MRCR v2 needle-in-a-haystack benchmark, which tests information retrieval across vast amounts of text, Opus 4.6 scored 76% compared to just 18.5% for Sonnet 4.5. This represents a qualitative shift in handling context without performance degradation, commonly known as "context rot".

The model supports up to 128K output tokens, enabling completion of larger tasks without breaking them into multiple requests. A new Compaction API automatically summarizes long conversations to prevent critical information from being truncated during extended workflows.

New API Features and Controls

Anthropic introduced several developer-focused features alongside Opus 4.6:

  • Adaptive thinking: The model automatically decides when deeper reasoning is beneficial, replacing the previous binary choice between enabling or disabling extended thinking
  • Effort controls: Four levels (low, medium, high, and max) allow developers to balance intelligence, speed, and cost based on task requirements
  • Context compaction: Automatically summarizes older context when approaching the window limit, enabling longer-running agents
  • US-only inference: Available at 1.1× token pricing for workloads requiring geographic restrictions

Product Integration and Availability

Claude Opus 4.6 is available on claude.ai, the Claude API, and major cloud platforms including Amazon Bedrock. AWS announced immediate availability through Amazon Bedrock with enterprise-grade security and responsible AI controls.

Anthropic also released Claude in PowerPoint as a research preview, complementing the upgraded Claude in Excel. The PowerPoint integration reads layouts, fonts, and slide masters to maintain brand consistency while generating presentations.

Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, with premium pricing of $10/$37.50 for prompts exceeding 200K tokens.

Safety and Alignment

Despite significant capability improvements, Opus 4.6 maintains a strong safety profile with low rates of misaligned behaviors including deception, sycophancy, and misuse cooperation. The model shows the lowest over-refusal rate of any recent Claude model, meaning it responds appropriately to benign queries while maintaining safety guardrails.

Anthropic conducted the most comprehensive safety evaluation suite for any model release, including new assessments for user wellbeing, complex refusal scenarios, and interpretability methods to understand model behavior. Six new cybersecurity probes were developed specifically to detect and prevent potential misuse of the model's enhanced security capabilities.

Comments 0

No comments yet

Be the first to share your thoughts!

Leave a Comment

Your comment will be reviewed before being published.
React to this post
3 reactions