Why Reasoning-Focused Language Models Sometimes Hallucinate More Than General Models — Evidence, Costs, and How to Test Properly

https://www.tumblr.com/spectralandroidmercenary/810235633437147136/why-ctos-can-no-longer-treat-llm-hallucinations-as

Reasoning models recorded 2-3x higher factual-error rates on mixed-task evaluations. The data suggests a consistent pattern across independent tests: models tuned or prompted for explicit step-by-step reasoning often report higher rates of hallucination.
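The 2-3x figure is just a ratio of per-model error rates over the same graded task set. A minimal sketch of that comparison, with illustrative data and names that are not from any cited evaluation:

```python
# Hypothetical comparison of factual-error rates between two model variants.
# Grades are illustrative: True = factually correct answer, False = error.

def error_rate(grades):
    """Fraction of graded answers marked incorrect."""
    return sum(1 for ok in grades if not ok) / len(grades)

general_model   = [True, True, False, True, True, True, True, True, True, False]
reasoning_model = [True, False, False, True, True, False, True, False, True, False]

general_err   = error_rate(general_model)    # 0.2
reasoning_err = error_rate(reasoning_model)  # 0.5
ratio = reasoning_err / general_err          # 2.5, within the reported 2-3x range
print(f"general={general_err:.2f} reasoning={reasoning_err:.2f} ratio={ratio:.1f}x")
```

A real test would also need the same task mix, the same grading rubric, and enough samples per task for the ratio to be meaningful.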