AI's New Playbook: Teaching Machines to Ask Their Own Questions
What if AI could stop copying humans and start teaching itself? A new self-play system is rewriting the rules of machine learning.
The AZR system uses self-play to generate and solve Python coding problems via large language models, refining itself through success/failure feedback.
This approach has improved coding and reasoning skills in Qwen models (7B and 14B parameters), outperforming some human-curated datasets.
Andrew Zhao, a Tsinghua PhD student, explained the shift in AI learning:
"In the beginning you imitate your parents and do like your teachers, but then you basically have to ask your own questions."
Zilong Zheng, a BIGAI researcher, highlighted the scaling challenges:
"The difficulty level grows as the model becomes more powerful."
Current limitations restrict AZR to problems with objective correctness (math/coding), but researchers envision future applications for agent tasks like web browsing. Meta, Stanford, and Salesforce have explored similar self-play approaches for software engineering and agent improvement.
Open-source Qwen models were used in testing, though users report scaling difficulty as model power increases.