This is a collaboration between Cambridge and this Chinese lab. It's actually successfully integrating the audio, the video, and the text. This is all consistent with that meta AI philosophy of achieving general AI through multimodality. They explicitly say this in the paper. They're like, this is basically a step towards AGI.