EP51: OpenAI's Sora, Gemini Pro 1.5 10M Context, ChatGPT Memory, GraphRAG, ChatRTX, Microsoft UFO...
Feb 16, 2024
OpenAI's Sora, a system for generating 1-minute videos that keep track of objects, is discussed, along with Google's Gemini 1.5. The potential of larger context windows with multi-modal inputs to replace RAG is explored. Microsoft's GraphRAG, which aims to improve RAG using knowledge graphs, is introduced. Nvidia's ChatRTX is tested on high-end graphics cards, and Microsoft UFO, an open-source project using vision AI, is mentioned. OpenAI's memory feature in ChatGPT is highlighted, as well as recent developments at OpenAI, including a researcher leaving and fundraising for chip development.
Google's Gemini 1.5 offers a 1-million-token context window, enabling extensive data processing and positioning Google as a leader in AI development.
Microsoft's Graph RAG introduces a hierarchical structure of semantic clusters, improving contextual understanding and enabling more accurate answers.
Massive context windows and improved structures in AI applications have the potential to revolutionize data comprehension and reduce the need for traditional methods like selective document summarization.
Deep dives
Gemini 1.5: Faster and Cheaper with a Million Context Window
Google announced Gemini 1.5, a next-generation AI model that uses a mixture-of-experts approach. It offers a 1-million-token context window, with potential for even larger windows in the future. The Pro version of the model provides faster and more cost-effective processing. Because it can take in large amounts of data at once, Gemini 1.5 enables tasks that require extensive context, such as finding specific moments in videos or answering questions over massive code bases. While the model's full potential is yet to be realized, this advancement positions Google as a leading player in AI development.
Graph RAG: Enhancing Contextual Understanding for RAG Models
Microsoft Research introduced Graph RAG, a breakthrough technique for Retrieval-Augmented Generation (RAG). Graph RAG uses a hierarchical structure of semantic clusters to organize large volumes of information. It addresses limitations of current RAG approaches, allowing for better thematic analysis and improved contextual understanding, which results in more accurate answers. By capturing the entire knowledge hierarchy in context, Graph RAG shows potential in domains where extensive semantic knowledge is essential. Microsoft's approach enriches the capabilities of RAG and opens possibilities for more sophisticated applications in AI.
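To make the idea of organizing facts into semantic clusters more concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the function names are hypothetical, connected components stand in for real community detection, and the "summary" simply lists facts where a real system would presumably ask a language model to write one.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an undirected adjacency map from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subj, _rel, obj in triples:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph

def find_clusters(graph):
    """Group entities by connected component (a simple stand-in for community detection)."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            members.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(members))
    return clusters

def summarize_cluster(members, triples):
    """Placeholder summary: list the facts whose entities both sit in this cluster."""
    member_set = set(members)
    facts = [f"{s} {r} {o}" for s, r, o in triples
             if s in member_set and o in member_set]
    return "; ".join(facts)

def build_index(triples):
    """Return one entry per cluster, each with its entities and a summary string."""
    graph = build_graph(triples)
    return [{"entities": c, "summary": summarize_cluster(c, triples)}
            for c in find_clusters(graph)]

# Toy facts taken from this episode's topics.
TRIPLES = [
    ("Sora", "made_by", "OpenAI"),
    ("ChatGPT", "made_by", "OpenAI"),
    ("Gemini", "made_by", "Google"),
]
INDEX = build_index(TRIPLES)
for entry in INDEX:
    print(entry["entities"], "->", entry["summary"])
```

A question about a broad theme could then be answered from the cluster summaries rather than from individual text chunks, which is roughly the benefit described above.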
Impact and Potential of Massive Context Windows
Massive context windows, like the 1-million-token window offered by Google Gemini 1.5, and hierarchical structures, like those used in Graph RAG, have the potential to revolutionize AI applications. By enabling large-scale information processing and thematic analysis, these approaches allow for a more complete understanding of complex data. The use cases range from processing entire code bases and analyzing extensive collections of documents to answering questions about vast video archives. As these technologies mature, they may render traditional methods, such as selective document summarization, unnecessary. However, challenges remain, such as cost optimization and maintaining efficiency when processing huge amounts of information.
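The trade-off between sending everything to a large context window and falling back to retrieval can be sketched as a simple budget check. This is only an illustration: the function names are hypothetical, the four-characters-per-token ratio is a rough assumed heuristic rather than any model's real tokenizer, and the default window size and answer reserve are placeholder values.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; chars_per_token is an assumed heuristic."""
    return -(-len(text) // chars_per_token)  # ceiling division

def choose_strategy(documents, context_window=1_000_000, reserve_for_answer=8_000):
    """Return ("full_context" or "retrieval", estimated_total_tokens)."""
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = context_window - reserve_for_answer  # leave room for the reply
    strategy = "full_context" if total <= budget else "retrieval"
    return strategy, total

# A small corpus fits; a very large one does not.
print(choose_strategy(["a" * 400]))
print(choose_strategy(["a" * 4_000_000]))
```

Even when a corpus fits, the cost concern raised above still applies, since every request would resend the whole corpus.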
Future Directions and Considerations
As models with massive context windows and improved structures for contextual understanding continue to develop, questions arise about their potential impact and limitations. These advancements provide opportunities for lazy development, where complex algorithms can be replaced with a single API call. However, care is needed to ensure that tasks and instructions are still handled reliably within such a massive context, and that doing so remains cost-effective. There are also open questions about emergent behaviors and about the AI's ability to understand and control its own prompt and memory. While the future remains uncertain, the progress made in scaling up context windows is significant and warrants further exploration.
Google's Advancement in AI Technology
Google's new technology has the potential to solve complex tasks by analyzing documents, extracting information, and answering questions more efficiently. With access to a vast corpus of data from YouTube, emails, files, photos, and search indexes, Google is positioning itself as a major player in the AI space.
NVIDIA's Chat with RTX AI PC
NVIDIA's latest offering, Chat with RTX AI PC, allows users to have conversational interactions with their local models. By running quantized models locally, users can select a folder on their desktop and engage in dialogues with the AI. While it requires a powerful GPU to run, it demonstrates the potential for AI to operate on personal devices and perform tasks like generating responses and even executing actions on the computer.
This week we take several shots of vodka before trying to make sense of all the announcements. OpenAI attempted to trump Google's Gemini 1.5 with the announcement of Sora, 1-minute video generation that does an incredible job of keeping track of objects. Google showed us that context windows of up to 10M are possible with multi-modal inputs. We discuss whether a larger context window could end the need for RAG and take a first look at GraphRAG by Microsoft, which hopes to improve RAG with a knowledge graph. We road test Nvidia's ChatRTX on our baller graphics cards, and Chris tries to delete all of his files using Microsoft UFO, a new open source project that uses GPT-4 vision to navigate and execute tasks on your Windows PC. We briefly cover V-JEPA (will try for next week's show) and its ability to learn through watching videos and listening, and finally discuss Stability's Stable Cascade, which we've made available for "research" on SimTheory.
If you like the show please consider subscribing and leaving a comment. We appreciate your support.
Chapters:
00:00 - OpenAI's Sora That Creates Videos Instantly From Text
13:49 - ChatGPT Memory Released in Limited Preview
23:31 - OpenAI Rumored To Be Building Web Search, Andrej Karpathy Leaves OpenAI, Have OpenAI Slowed Down?
33:04 - Google Announces Gemini Pro 1.5. Huge Breakthrough 10M Context Window!
50:11 - Microsoft Research Publishes GraphRAG: Knowledge Graph Based RAG
1:02:03 - Nvidia's ChatRTX Road Tested
1:07:18 - AI Computers, AI PCs & Microsoft's UFO: An Agent for Windows OS Interaction. Risk of AI Computers.
1:18:46 - Meta's V-JEPA: new architecture for self-supervised learning
1:24:26 - Stability AI's Stable Cascade