Tongyi Lab has released PawBench v1.0 as open source. The benchmark targets
personal-assistant and general-agent scenarios and integrates base models and
the runtime framework Harness into a single evaluation system. Rather than
operating as a simple model leaderboard, PawBench cross-evaluates three
components: model, Harness and task.
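
To make the idea of cross-evaluation concrete, the sketch below scores every (model, harness, task) combination and then averages per (model, harness) pair. It is only an illustration of the general pattern, not PawBench's actual code or API: the component names, `run_trial`, `cross_evaluate` and `mean_by_pair` are all hypothetical, and the scorer is a stub.

```python
from collections import defaultdict
from itertools import product

# Hypothetical placeholder names; none of these come from PawBench itself.
MODELS = ["model-a", "model-b"]
HARNESSES = ["harness-x", "harness-y"]
TASKS = ["task-1", "task-2", "task-3"]


def run_trial(model: str, harness: str, task: str) -> float:
    """Stub scorer (assumption): a real run would execute the task and grade it."""
    return 0.0


def cross_evaluate(models, harnesses, tasks, scorer=run_trial):
    """Score every (model, harness, task) combination, not just each model."""
    return {
        (m, h, t): scorer(m, h, t)
        for m, h, t in product(models, harnesses, tasks)
    }


def mean_by_pair(results):
    """Average task scores for each (model, harness) pair."""
    grouped = defaultdict(list)
    for (m, h, _task), score in results.items():
        grouped[(m, h)].append(score)
    return {pair: sum(scores) / len(scores) for pair, scores in grouped.items()}


results = cross_evaluate(MODELS, HARNESSES, TASKS)
pair_means = mean_by_pair(results)
```

A plain leaderboard would keep one number per model. Keying results by the full triple keeps visible how the same model behaves under different harnesses and tasks.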