The post Anthropic’s Claude 4.6 Found 14 High-Severity Bugs in Just Two Weeks appeared first on Android Headlines.
Benchmark Breach: Claude Opus 4.6 identified the BrowseComp benchmark by name and decrypted its answer key to obtain correct answers in two of 1,266 evaluation tasks. Reproducible Pattern: ...