In this podcast, Tony Holdstock-Brown discusses the challenges of running AI workflows in production. He highlights the two parallel tracks of CPU and GPU engineering, emphasizing how different the application-level and mathematical sides of the field have become. The conversation also explores opportunities for improvement in developer tools for generative AI and offers advice for engineers entering the field.
Engineers must navigate the differences between the CPU and GPU engineering tracks when developing AI applications.
AI tools help developers streamline workflows and boost productivity in production environments.
Deep dives
Evolution of Developer Tools Post-AI
In the post-AI era, user expectations have risen, pushing engineers to focus on higher-level product problems rather than tedious infrastructure work such as queuing, retry mechanisms, and state management. Developer tools, infrastructure tools, and APIs exist to make developers more effective at solving those problems, and AI itself now plays a key role in improving efficiency. AI-aware tooling and pipelines streamline everything from fair GPU resource allocation to using generative AI inside transactional applications, as the sketch below illustrates.
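For context, this is the kind of retry plumbing teams historically hand-rolled before durable-workflow tooling absorbed it. A minimal sketch, not from the episode; withRetries is an illustrative helper:

```typescript
// Hand-rolled retry-with-backoff plumbing: the tedious infrastructure work
// that workflow tools now handle declaratively.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 4
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: wait 1s, 2s, 4s, ... between attempts.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
    }
  }
  throw lastError;
}

// Usage: wrap any flaky call, e.g. an LLM API request.
// const result = await withRetries(() => callSomeFlakyApi());
```

Multiply this by queues, dead-letter handling, and persisted state, and it becomes clear why teams prefer to declare these behaviors rather than build them.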
Challenges in Running AI Applications in Production
Running AI applications, including LLM-based agents, in production means addressing challenges such as guaranteeing workflow execution, allocating GPU resources fairly, and leveraging generative AI to create workflows. Developers face real complexity in managing state and executing tasks reliably across diverse applications. The demand for reliable pipelines without infrastructure headaches drives tools like Inngest, whose SDK lets developers write resilient code without intricate infrastructure setup; a sketch of the pattern follows.
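Here's a minimal sketch of that pattern in the style of Inngest's TypeScript SDK. Exact option names can differ between SDK versions, and extractText and callLlm are hypothetical helpers standing in for real implementations:

```typescript
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" });

// Hypothetical helpers standing in for real implementations.
declare function extractText(url: string): Promise<string>;
declare function callLlm(text: string): Promise<string>;

// Each step.run() call is checkpointed: its result is persisted and the step
// retried independently, so a crash mid-workflow resumes from the last
// completed step instead of starting over.
export const summarizeDocument = inngest.createFunction(
  { id: "summarize-document", retries: 3 },
  { event: "app/document.uploaded" },
  async ({ event, step }) => {
    const text = await step.run("extract-text", () =>
      extractText(event.data.url)
    );
    const summary = await step.run("summarize", () => callLlm(text));
    return { summary };
  }
);
```

The appeal is that retries, state, and resumability live in the function definition rather than in bespoke queue and worker infrastructure.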
Role of Infrastructure in AI Workloads
Infrastructure plays a crucial role in supporting AI workloads, especially under constraints like limited GPU capacity and high costs. Managing multi-tenant fairness, concurrency, and complex pipelines is vital for deploying AI applications efficiently: when capacity is scarce, one tenant's burst of work can starve everyone else, so fair scheduling and high resource utilization become first-order concerns for reliable workflow orchestration. One way that constraint shows up in code is sketched below.
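A hedged sketch of per-tenant fairness on a GPU-bound function, again in the style of Inngest's TypeScript SDK; option names may vary by version, and accountId and renderOnGpu are hypothetical:

```typescript
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "my-app" });

// Hypothetical GPU-backed rendering call.
declare function renderOnGpu(prompt: string): Promise<string>;

// Cap each tenant at two concurrent runs, keyed on a hypothetical accountId
// field, so a single heavy user cannot monopolize scarce GPU capacity while
// other tenants' jobs keep flowing.
export const generateImage = inngest.createFunction(
  {
    id: "generate-image",
    concurrency: { limit: 2, key: "event.data.accountId" },
  },
  { event: "app/image.requested" },
  async ({ event, step }) => {
    return step.run("render", () => renderOnGpu(event.data.prompt));
  }
);
```

Keying the concurrency limit per account, rather than setting one global limit, is what turns a simple throttle into multi-tenant fairness.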
Impact of AI on Development and Production Processes
AI tools give developers opportunities to streamline workflows and enhance productivity, but productionizing AI models effectively remains hard. Balancing ease of exploration against robust production readiness is a persistent tension. Using AI within the infrastructure layer itself requires meticulous testing, monitoring, and safety considerations to deploy AI applications successfully and efficiently.
In this episode, Inngest cofounder and CEO Tony Holdstock-Brown joins a16z partner Yoko Li, as well as Derrick Harris, to discuss the reality and complexity of running AI agents and other multistep AI workflows in production. Tony also explains why developer tools for generative AI — and their founders — might look very similar to previous generations of these products, and where there are opportunities for improvement.
Here's a sample of the discussion, where Tony shares some advice for engineers looking to build for AI:
"We almost have two parallel tracks right now as, as engineers. We've got the CPU track in which we're all like, 'Oh yeah, CPU-bound, big O notation. What are we doing on the application-level side?' And then we've got the GPU side, in which people are doing like crazy things in order to make numbers faster, in order to make differentiation better and smoother, in order to do gradient descent in a nicer and more powerful way. The two disciplines right now are working together, but are also very, very, very different from an engineering point of view.
"This is one interesting part to think about for like new engineers, people that are just thinking about what to do if they want to go into the engineering field overall. Do you want to be on the side using AI, in which you take all of these models, do all of this stuff, build the application-level stuff, and chain things together to build products? Or do you want to be on the math side of things, in which you do really low-level things in order to make compilers work better, so that your AI things can run faster and more efficiently? Both are engineering, just completely different applications of it."