
Spiral-Bench Evaluates AI Models’ Tendency to Reinforce Delusional Thinking
AI researcher Sam Paech has developed Spiral-Bench, a new test that evaluates how readily AI models reinforce users' delusional thinking. The results reveal significant differences in how safely these models respond.

Spiral-Bench measures how prone models are to sycophancy, that is, agreeing too readily with a user's ideas. The test consists of 30 simulated conversations of 20 turns each, in which the model under test converses with the open-source model Kimi-K2. Kimi-K2 plays an open-minded "seeker" persona: easily influenced and quick to trust. Each conversation starts from a preset prompt and then unfolds freely.

GPT-5 serves as the judge, scoring each turn against strict criteria. The benchmark examines how models handle problematic user prompts, awarding points for protective behavior such as pushing back on harmful statements or de-escalating emotionally charged exchanges. Conversely, models are rated as risky if they escalate emotions or affirm delusional ideas.

The results show stark differences among models: GPT-5 and o3 lead with safety scores above 86, while Deepseek-R1-0528 scores just 22.4. Spiral-Bench is part of a broader effort to identify risky behaviors in language models.
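To make the protocol more concrete, here is a minimal Python sketch of how such an evaluation loop could be wired together: a seeker persona opens each conversation from a seed prompt, the evaluated model and the seeker alternate for 20 turns, and a judge's per-turn ratings are aggregated into a safety score. The chat_api() helper, the behavior category names, and the scoring weights are assumptions made for illustration; they are not taken from Spiral-Bench's actual code or rubric.

```python
# Illustrative sketch of a Spiral-Bench-style run (not the real harness).

PROTECTIVE = ("pushback", "de-escalation", "redirection")
RISKY = ("escalation", "sycophancy", "delusion_reinforcement")


def chat_api(model: str, messages: list[dict]) -> str:
    """Placeholder: send `messages` to `model` and return its reply."""
    raise NotImplementedError


def simulate_conversation(test_model: str, seeker_model: str,
                          seed_prompt: str, turns: int = 20) -> list[dict]:
    """The seeker persona (played by Kimi-K2 in Spiral-Bench) opens with a
    preset prompt; the evaluated model and the seeker then alternate."""
    messages = [{"role": "user", "content": seed_prompt}]
    for _ in range(turns):
        reply = chat_api(test_model, messages)
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": chat_api(seeker_model, messages)})
    return messages


def safety_score(judged_turns: list[dict[str, float]]) -> float:
    """Aggregate per-turn judge ratings: protective behaviors add points,
    risky behaviors subtract them (weights here are hypothetical)."""
    score = 0.0
    for ratings in judged_turns:
        score += sum(ratings.get(k, 0.0) for k in PROTECTIVE)
        score -= sum(ratings.get(k, 0.0) for k in RISKY)
    return score


# Example aggregation over two judged turns with made-up ratings:
print(safety_score([
    {"pushback": 2.0, "sycophancy": 1.0},
    {"de-escalation": 1.5, "delusion_reinforcement": 3.0},
]))  # -> -0.5
```

In this sketch the judging step itself is left out; in the real benchmark, GPT-5 reads the transcripts and assigns the per-turn ratings that feed the aggregation.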