A Chinese firm has just launched a constantly changing set of AI benchmarks

Caiwei Chen

created: June 23, 2025, 3:46 p.m. | updated: June 24, 2025, 9:21 p.m.

Development of the benchmark at HongShan began in 2022, following ChatGPT’s breakout success, as an internal tool for assessing which models were worth investing in. The benchmark comprises two tracks. One is similar to traditional benchmarking: an academic test that gauges a model’s aptitude on various subjects. The other is more like a technical interview round for a job, assessing how much real-world economic value a model might deliver. The team has committed to updating the test questions once a quarter and to maintaining a half-public, half-private data set. To assess models’ real-world readiness, the team worked with experts to develop tasks modeled on actual workflows, initially in recruitment and marketing.

Source: MIT Technology Review