Tongyi Lab has released PawBench v1.0 as open source. The benchmark targets personal-assistant and general-agent scenarios and brings base models and the runtime framework, Harness, into a single evaluation system. Rather than operating as a simple model leaderboard, PawBench cross-evaluates three components: model, Harness, and task.

2026-06-05
