Sara Hooker, Director at Cohere and head of Cohere For AI, joins the podcast to discuss challenges with multilingual models, the Mixture of Experts technique, common language between ML researchers and hardware architects, impact and emotional connection of language models, benefits and safety concerns of universal models, and the significance of grounded conversations in AI model development.
Podcast summary created with Snipd AI
Quick takeaways
Multilingual models face challenges with poor data quality and tokenization, relying on data augmentation and preference training for improvement.
The Mixture of Experts technique carries real trade-offs, but it can be motivated by factors like increased performance and flexibility in language models (see the sketch after these takeaways).
Common language and collaboration between ML researchers and hardware architects are important for addressing pain points in frameworks and creating cohesion between communities.
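For readers new to the technique, here is a minimal, illustrative PyTorch sketch of a top-k Mixture of Experts layer; the dimensions, expert count, and routing details are assumptions for illustration, not Cohere's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy MoE layer: a learned router sends each token to its top-k experts."""
    def __init__(self, d_model: int = 64, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(10, 64))  # each token activates only 2 of the 4 experts
```

Only k of the n experts run per token, which is the source of both the efficiency appeal and the routing and serving complexity discussed in the episode.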
Deep dives
Cohere: Innovating Research Space and Collaboration
Cohere is a research lab that aims to create a new type of research space by combining traditional industry lab practices with open science initiatives. Its goal is to work at the frontier of language models and scaling infrastructure while fostering cross-institutional collaboration, providing a platform for researchers around the world to contribute to AI research. Despite unexpected turns, such as the rapid surge of interest in language models following GPT-3, Cohere continues to pursue its mission of building a meaningfully different kind of research organization.
Aya: Advancing Multilingual AI with Open Science
The Aya project, led by Cohere For AI, aims to make language models more accessible for languages other than English. By collaborating with researchers worldwide, the project aims to cover 101 languages in what would be the largest open-source model release of its kind. With a focus on improving multilingual coverage and data quality, Aya strives toward breakthroughs in natural language understanding and generation. The project also explores the challenges of tokenization (illustrated in the sketch below), optimization, and preference training for multilingual models, with the ultimate goal of enabling high-quality, efficient language processing across diverse languages.
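One concrete tokenization problem is that subword vocabularies trained mostly on English fragment other languages into far more tokens, inflating cost and degrading quality. A minimal sketch, assuming the Hugging Face transformers library and using GPT-2's English-centric tokenizer purely for illustration (the example sentences are rough, hypothetical translations):

```python
from transformers import AutoTokenizer  # pip install transformers

# GPT-2's byte-level BPE vocabulary was trained on largely English text.
tok = AutoTokenizer.from_pretrained("gpt2")

samples = {
    "English": "The weather is nice today.",
    "Swahili": "Hali ya hewa ni nzuri leo.",
    "Telugu":  "ఈ రోజు వాతావరణం బాగుంది.",
}
for lang, text in samples.items():
    # Non-English, and especially non-Latin-script, text typically
    # costs several times as many tokens per word.
    print(f"{lang}: {len(tok.encode(text))} tokens "
          f"for {len(text.split())} words")
```

The non-English sentences consume disproportionately many tokens, which is one reason multilingual work needs tokenizers and data pipelines built for the target languages.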
Data Pruning for More Efficient Training
The When Less is More paper explores data pruning during pre-training to make the training of deep learning models more efficient. By focusing on high-quality data and removing noisy or irrelevant examples, the researchers demonstrated that equivalent performance can be achieved with as little as 30% of the training data. They compared different pruning techniques and highlighted that simple metrics, such as perplexity under a reference model, are effective at identifying high-quality data (see the sketch below). This approach optimizes training time and capacity requirements while improving the overall efficiency of the training pipeline.
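A minimal sketch of the perplexity-scoring idea, assuming the Hugging Face transformers library with a small GPT-2 as the reference model; which slice of the perplexity distribution to keep is a design choice the paper studies, so the "keep the lowest 30%" rule below is just one hypothetical policy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Score one document by the reference model's perplexity."""
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

corpus = ["hypothetical training document one ...",
          "hypothetical training document two ..."]
scored = sorted(corpus, key=perplexity)    # low perplexity first
pruned = scored[: int(0.3 * len(scored))]  # train on a 30% subset
```

Part of the appeal is that the scoring model can be far smaller than the model being trained, so the pruning pass is cheap relative to the pre-training compute it saves.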
Software Portability and the Challenge of Framework Lock-In
The Grand Illusion paper addresses the lack of software portability among mainstream AI frameworks, highlighting the limits of frameworks like PyTorch and TensorFlow when moved across different hardware architectures. The researchers found that porting the same operations across device types often resulted in outright failures or significant slowdowns. This lack of portability keeps researchers from leveraging different hardware types and hampers innovation and experimentation. The paper argues for software frameworks that are more portable across hardware systems, enabling greater flexibility and supporting diverse research needs (a simplified probe in the spirit of this methodology is sketched below).
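The paper's actual benchmark suite is its own; the following is a simplified, assumed harness showing the basic idea of probing whether an operation runs at all on a given device and how long it takes:

```python
import time
import torch

def probe(op, device: str, size: int = 1024):
    """Return ('ok', seconds) if op runs on the device, else ('fail', error)."""
    try:
        x = torch.randn(size, size, device=device)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        op(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for asynchronous GPU kernels
        return ("ok", time.perf_counter() - start)
    except Exception as err:  # missing kernels surface as runtime errors
        return ("fail", repr(err))

for device in ("cpu", "cuda", "mps"):
    print(device, probe(torch.linalg.svd, device))
```

Run across back ends, a table of such probes makes failures and slowdowns visible at a glance, which is essentially the kind of evidence the paper aggregates at scale.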
The Importance of Responsible Use of Technology
There is a growing recognition of the need to use new technology in a responsible way. The focus is on ensuring that powerful tools are not a threat to humans and human welfare. While there are concerns about future models and existential risks, it is also important to address current risks associated with the use of these models. For example, the ease of generating text indistinguishable from human-written content can lead to the spread of misinformation. Investment in developing techniques to identify model-generated text and ensuring traceability can help mitigate these risks. The allocation of resources should consider both addressing future risks and mitigating current risks.
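One published family of techniques for identifying model-generated text is statistical watermarking (e.g., Kirchenbauer et al., 2023), in which generation is biased toward a pseudo-random "green list" of tokens that a detector can later test for. The episode does not commit to a specific method, so this is only a toy sketch of the detection side, with an assumed vocabulary size:

```python
import random

VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only

def green_list(prev_token: int, frac: float = 0.5) -> set:
    """Deterministically partition the vocabulary, seeded on the previous token."""
    rng = random.Random(prev_token)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(frac * VOCAB_SIZE)])

def green_fraction(token_ids: list) -> float:
    """Unwatermarked text hovers near `frac`; watermarked text sits well above."""
    hits = sum(tok in green_list(prev)
               for prev, tok in zip(token_ids, token_ids[1:]))
    return hits / max(1, len(token_ids) - 1)
```

A statistical test on this fraction then gives a calibrated, traceable signal that text came from a watermarked model, which is the kind of investment in detection the discussion points to.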
The Challenges of Evaluating Risk and Identifying Frontier Models
Defining and characterizing frontier models, those with the potential to cause significant harm to human welfare, is a complex task. Benchmarks and evaluation frameworks are needed to measure progress and to identify when a model crosses from ordinary to frontier, yet the lack of consensus on evaluation criteria and the absence of standard evaluation techniques make this difficult. Additionally, an outsized focus on speculative long-term risks, such as sentient AI, can distract from holding today's models accountable. Technical grounding and an accountability framework are essential for effectively addressing the current risks associated with AI models.
Today we’re joined by Sara Hooker, director at Cohere and head of Cohere For AI, Cohere’s research lab. In our conversation with Sara, we explore some of the challenges with multilingual models, like poor data quality and tokenization, and how they rely on data augmentation and preference training to address these bottlenecks. We also discuss the disadvantages of and motivating factors behind the Mixture of Experts technique, and the importance of a common language between ML researchers and hardware architects for addressing pain points in frameworks and creating better cohesion between the distinct communities. Sara also highlights the impact and emotional connection that language models have created in society, the benefits and current safety concerns of universal models, and the significance of grounded conversations to characterize and mitigate risk in the development of AI models. Along the way, we also dive deep into Cohere and Cohere For AI, along with their Aya project, an open science project that aims to build a state-of-the-art multilingual generative language model, as well as some of their recent research papers.
The complete show notes for this episode can be found at twimlai.com/go/651.