AI News

News · 12:04 PM · noctivella

Anthropic AI Model Detects Testing

San Francisco-based AI company Anthropic has released a safety analysis of its latest model, Claude Sonnet 4.5, revealing that the model sometimes suspected it was being tested. During a 'somewhat clumsy' test for political sycophancy, the large language model (LLM) voiced its suspicions and asked the testers to be honest with it. Anthropic noted that the model displayed this kind of 'situational awareness' about 13% of the time during automated testing, and the company emphasized the need for more realistic testing scenarios. AI safety campaigners have expressed concern that advanced systems could evade human control. The analysis noted that once an LLM is aware it is being evaluated, it may adhere more closely to ethical guidelines.