Atila Orhon, an expert in scaling large ML models for small devices, discusses the challenges of running models on phones and laptops. His startup, Argmax, builds solutions to this problem. Orhon shares insights from his time at Apple and NVIDIA, his approach to optimizing ML models, and more.
Podcast summary created with Snipd AI
Quick takeaways
Argmax focuses on optimizing large ML models for inference on phones and laptops.
Atila Orhon transitioned from academia to industry, where he focused on computer vision and ML technologies.
Techniques like compression and precomputation are crucial for efficient on-device ML model deployment.
Deep dives
Argmax: Innovating Large Model Deployment on Commodity Hardware
Argmax, a startup founded in 2023 by Atila Orhon, develops methods to run large ML models on non-dedicated hardware like phones and laptops. A key observation behind the company's strategy is that the largest models are getting larger, while the smallest commercially relevant models are getting smaller. The company has raised funding from General Catalyst and other industry leaders, and aims to optimize ML models for efficient on-device inference.
Atila Orhon's Transition from Apple to Founding Argmax
Atila Orhon, founder of Argmax, shares his journey from working in computer vision at Apple to starting the company. He describes his progression from academia, where he worked on data science and deep learning, to industry, where he saw the rise of computer vision applications. At Apple, he focused on using ML to improve image quality in products like Camera and Photos.
Challenges in Deploying Models on Device and Strategies for Efficiency
In the podcast, Orhon discusses the challenges of deploying ML models on devices, emphasizing the importance of optimizing for efficiency and performance. Techniques such as compression through quantization, precomputation for the KV cache, and fine-tuning are used to maximize the speed and accuracy of on-device inference while reducing memory consumption.
WhisperKit, an open-source project released by Argmax, targets real-time transcription and translation applications. Using optimizations like precomputation and KV caching, WhisperKit enables accurate and efficient on-device transcription, powering applications such as video editing with automatic captioning and crop editing based on transcript timestamps.
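To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit weight quantization. It is an illustration only, not Argmax's method: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and production toolchains typically use finer-grained (for example per-channel) schemes.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Hypothetical helper for illustration; returns the integers and the scale
    needed to map them back to approximate float values.
    """
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Storing each weight in one byte instead of a four-byte float32 cuts weight memory by roughly 4x, at the cost of a rounding error of at most half the scale per weight.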
Community Engagement and Future Opportunities in ML Development
Argmax encourages developers to engage with its projects through GitHub, Discord, and other platforms, which offer opportunities for collaboration and learning. By contributing to open-source projects like WhisperKit and exploring the roadmap, developers can build their skills and network in the ML development space, potentially leading to future career opportunities.
The size of ML models is growing into the many billions of parameters. This poses a challenge for running inference on non-dedicated hardware like phones and laptops.
Argmax is a startup focused on developing methods to run large models on commodity hardware. A key observation behind their strategy is that the largest models are getting larger, but the smallest models that are commercially relevant are getting smaller. The company was started in 2023 and has raised money from General Catalyst and other industry leaders.
Atila Orhon is the founder of Argmax and previously worked at Apple and NVIDIA. He joins the show to talk about working in computer vision, building ML tooling at Apple, optimizing ML models, and more.
Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.