AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
James Zhu from Stanford introduces the concept of Mixture of Agents, a novel technique where each 'component' in the neural architecture is an 'agent' like a language model. This collaboration among different language models enhances overall model capabilities, outperforming the latest GPT models on benchmarks like Alpaca Eval.
Kyle Corbett from OpenPipe discusses their parallel work using GPT-4.0 within a Mixture of Agents framework to achieve high-quality output for inference tasks. While acknowledging the speed and cost challenges, they aim to streamline the process for faster response times and more cost-effective operations.
OpenPipe's efforts focus on distilling high-performing Mixture of Agents output into smaller models, significantly reducing costs and latency. Their experiments show exceeding GPT-4.0 performance on specific tasks, maintaining high quality while offering a substantial cost reduction of up to 125x and decreasing latency by one-third.
OpenRouter has continued to expand its marketplace by onboarding various suppliers like OpenAI, Entropic, Google, Mistral, Movita, and Win. This platform aims to serve as a marketplace for LLN calls or tokens, allowing seamless switching between providers. The focus is on democratizing access to models, improving distribution, and enhancing the developer experience.
OpenRouter offers free accessibility to select models to encourage trial before purchase, leveraging lower-cost 7 billion parameter models for this purpose. The platform implements 100% discounts on certain models while emphasizing load balancing and distribution. In terms of security, users have the option to share prompts with OpenRouter, which aids in automatic model selection. Data privacy is addressed by enabling users to opt-out of data collection, emphasizing user consent and data protection.
OpenRouter not only facilitates seamless model switching but also encourages app discovery and exposure, particularly for open-source and niche models. The platform serves as a hub for developers to spotlight their apps, fostering community engagement and innovation. By maintaining a user-friendly interface and prioritizing user feedback, OpenRouter continues to evolve as a dynamic marketplace connecting developers, providers, and users.
Hey all, Alex hereā¦ well, not actually here, Iām scheduling this post in advance, which I havenāt yet done, because I'm going on vacation!
Thatās right, next week is my birthday š and a much needed break, somewhere with a beach is awaiting, but I didnāt want to leave you hanging for too long, so posting this episode with some amazing un-released before material.
Mixture of Agents x2
Back in the far away days of June 20th (not that long ago but feels like ages!), Together AI announced a new paper, released code and posted a long post about a new method to collaboration between smaller models to beat larger models. They called it Mixture of Agents, and James Zou joined us to chat about that effort.
Shortly after that - in fact, during the live ThursdAI show, Kyle Corbitt announced that OpenPipe also researched an approached similar to the above, using different models and a bit of a different reasoning, and also went after the coveted AlpacaEval benchmark, and achieved SOTA score of 68.8 using this method.
And I was delighted to invite both James and Kyle to chat about their respective approach the same week that both broke AlpacaEval SOTA and hear how utilizing collaboration between LLMs can significantly improve their outputs!
This weeks buzz - what I learned at W&B this week
So much buzz this week from the Weave team, itās hard to know what to put in here. I can start with the incredible integrations my team landed, Mistral AI, LLamaIndex, DSPy, OpenRouter and even Local Models served by Ollama, LmStudio, LLamaFile can be now auto tracked with Weave, which means you literally have to only instantiate Weave and itāll auto track everything for you!
But I think the biggest, hugest news from this week is this great eval comparison system that the Weave Tim just pushed, itās honestly so feature rich that Iāll have to do a deeper dive on it later, but I wanted to make sure I include at least a few screencaps because I think it looks fantastic!
Open Router - A unified interface for LLMs
Iāve been a long time fan of OpenRouter.ai and I was very happy to have Alex Atallah on the show to talk about Open Router (even if this did happen back in April!) and Iām finally satisfied with the sound quality to released this conversation.
Open Router is serving both foundational models like GPT, Claude, Gemini and also Open Source ones, and supports the OpenAI SDK format, making it super simple to play around and evaluate all of them on the same code. They even provide a few models for free! Right now you can use Phi for example completely free via their API.
Alex goes deep into the areas of Open Router that I honestly didnāt really know about, like being a marketplace, knowing what trendy LLMs are being used by people in near real time (check out WebSim!) and more very interesting things!
Give that conversation a listen, Iām sure youāll enjoy it!
Thatās it folks, no news this week, I would instead like to recommend a new newsletter by friends of the pod Tanishq Abraham and Aran Komatsuzaki both of whom are doing a weekly paper X space and recently start posting it on Substack as well!
Itās called AI papers of the week, and if youāre into papers which we donāt usually cover, thereās no better duo! In fact, Tanishq often used to come to ThursdAI to explain papers so you may recognize his voice :)
See you all in two weeks after I get some seriously needed R&R š ššļø
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode
Hear something you like? Tap your headphones to save it with AI-generated key takeaways
Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode