There's a ton of things we don't put in text because they're obvious to us about how the world works. Of course, there is audio and video and images and all that kind of stuff. But we haven't trained models sufficiently across both across all of those modalities yet. I think that's probably the final frontier for a lot of these models.