Do you see model distillation as a way to get around the size issue, or do you see something more fundamental there? The language text inputs are sort of key to being able to reason about instructions and to having all this knowledge encoded in them, so when you distill things, you inherently lose some of that reasoning capability and that knowledge.

To some extent, probably, but also probably not to the extent that it's needed for edge hardware.
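For context on what distillation actually optimizes, here is a minimal sketch of the standard soft-target objective: the student is trained to match the teacher's temperature-softened output distribution via a KL divergence. This is an illustrative stdlib-only implementation, not any specific system's code; the function names and the temperature value are assumptions.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over softened distributions; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student's logits match the teacher's and grows as the distributions diverge; the point made above is that matching output distributions on a training set does not guarantee the student retains all of the teacher's reasoning behavior.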