Their system is built around a core LLM, Vicuna, connected to adapter modules that let it process other modalities such as images and audio. Depending on the prompt, outputs are generated by different adapters, so the model can handle text, images, video, and audio. Although it is a proof of concept, it demonstrates the potential of an end-to-end, general-purpose multimodal language model.
Our 138th episode with a summary and discussion of last week's big AI news, with guest host Jon Krohn of the SuperDataScience podcast!
Note: this one is coming out a week late, but we'll be back on schedule going forward!
Check out our text newsletter and comment on the podcast at https://lastweekin.ai/
Email us your questions and feedback at contact@lastweekin.ai
Timestamps + Links:
- (00:00) Intro / Banter
- (04:30) Preview of news
- Tools & Apps
- Applications & Business
- Projects & Open Source
- Research & Advancements
- Policy & Safety
- Synthetic Media & Art