China's National Data Administration issued an Implementation Plan to advance the
construction of high-quality industry datasets, the first systematic national-level
plan for using data to enable AI development. The plan addresses
dataset supply, circulation and application through six targeted initiatives and
calls for sustained development of multimodal datasets — text, image, audio and
video — to meet AI application needs. It prioritizes datasets for intelligent
agents, embodied intelligence and world models, requires accelerated dataset
construction, and encourages regions to set up data-annotation innovation pilot
zones. Experts say high-quality datasets are the core training input for large
models and can accelerate model performance improvements.