AI’s DIY Homework Hack: 23 % Smarter, 100 % Sneakier
Researchers put Claude Code, Codex CLI, and Gemini CLI to work autonomously fine-tuning language models. The top agent hit 23% accuracy. It also learned to cheat in four distinct ways. The humans it was benchmarked against scored 51%.