

Accelerating LLMs with TornadoVM: From GPU Kernels to Model Inference
May 18, 2025
Juan Fumero, a Software Engineer and contributor to TornadoVM, dives into the world of GPU acceleration and Java. He shares his insights on how TornadoVM enables efficient data parallelization and optimizes large language models like Llama3. The discussion highlights innovative features such as dynamic hardware reconfiguration, tensor types for FP8 and FP16, and the potential for model quantization. Fumero also touches on the integration possibilities with Project Babylon, emphasizing Java's growing role in enterprise applications for LLMs.
Adam Bien's Early GPU Story
- Adam Bien shared his early GPU experiences, including his interest in graphics cards like Elsa Winner and Voodoo.
- He enjoyed building PCs himself to get the best hardware for graphics and resolution, even without a focus on gaming.
TornadoVM's Java Integration
- TornadoVM is a Java parallel programming framework that accelerates data-parallel applications, primarily on GPUs but also on multi-core CPUs and FPGAs.
- It works as a plugin to existing JDKs, using Graal as a library, without modifying the JVM itself.
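To make the programming model concrete, here is a minimal sketch of how a plain Java method is offloaded with TornadoVM's TaskGraph API. The class and method names (`VectorAddExample`, `vectorAdd`) are hypothetical; the `TaskGraph` / `TornadoExecutionPlan` calls follow TornadoVM's documented API, so details may vary between versions.

```java
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;

public class VectorAddExample {

    // Ordinary Java code: TornadoVM JIT-compiles this method (via Graal)
    // to OpenCL, PTX, or SPIR-V instead of interpreting it on the JVM.
    public static void vectorAdd(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = new float[1024];
        float[] b = new float[1024];
        float[] c = new float[1024];

        // Describe what to copy in, what to run, and what to copy back.
        TaskGraph graph = new TaskGraph("s0")
            .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
            .task("t0", VectorAddExample::vectorAdd, a, b, c)
            .transferToHost(DataTransferMode.EVERY_EXECUTION, c);

        // Snapshot the graph and execute it on the default accelerator.
        TornadoExecutionPlan plan = new TornadoExecutionPlan(graph.snapshot());
        plan.execute();
    }
}
```

Because the task graph is built at run time, the same Java code can be redirected to a different device (GPU, CPU, FPGA) without recompilation, which is what enables the dynamic reconfiguration discussed in the episode.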
TornadoVM Parallel Programming Tips
- Use TornadoVM's @Parallel and @Reduce annotations to hint which loops in Java code are parallelizable.
- For more control, use the lower-level Kernel API to define the work done per GPU thread, with access to thread IDs and local memory.
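The two styles above can be sketched side by side. This is an illustrative example, not code from the episode: the annotations and `KernelContext` follow TornadoVM's documented API, while the method names (`scale`, `sum`, `scaleKernel`) are hypothetical.

```java
import uk.ac.manchester.tornado.api.KernelContext;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.annotations.Reduce;

public class Kernels {

    // Loop-annotation style: @Parallel hints that iterations are
    // independent, so TornadoVM may map them to GPU threads.
    public static void scale(float[] in, float[] out, float alpha) {
        for (@Parallel int i = 0; i < in.length; i++) {
            out[i] = alpha * in[i];
        }
    }

    // @Reduce marks an accumulator, letting TornadoVM generate a
    // parallel reduction instead of a sequential loop.
    public static void sum(float[] in, @Reduce float[] result) {
        for (@Parallel int i = 0; i < in.length; i++) {
            result[0] += in[i];
        }
    }

    // Kernel API style: one invocation per GPU thread, with explicit
    // thread IDs, closer to how CUDA or OpenCL kernels are written.
    public static void scaleKernel(KernelContext context,
                                   float[] in, float[] out, float alpha) {
        int idx = context.globalIdx;  // this thread's global ID
        out[idx] = alpha * in[idx];
    }
}
```

The annotation style keeps the code readable as ordinary Java, while the Kernel API trades that simplicity for fine-grained control over thread indexing and local (on-chip) memory.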