
AI Tinkerers - "One-Shot": How Tomasz Kolinko Is Rewriting the Rules of AI Inference
What if you could skip half of your LLM’s computations—and still get the same output?
In this episode of One-Shot, we sit down with Tomasz Kolinko, the Warsaw-based founder of Effort Engine, a new AI inference algorithm that dynamically adjusts precision in real time.
This isn't quantization. It's something weirder, and maybe more useful.
Tomasz walks us through how he:
- Built a custom algorithm that runs 2–3x faster on MacBooks
- Developed a system that can skip 50%+ of model computations dynamically
- Created heatmaps to visualize token-level divergence
- Benchmarked everything himself… and shared the code
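To build intuition for what "skipping computations dynamically" can mean, here is a minimal sketch of one well-known approach: approximating a matrix-vector product by multiplying only against the largest-magnitude input activations. This is an illustrative toy, not Effort Engine's actual kernel; the function name and the `effort` parameter are hypothetical stand-ins.

```python
import numpy as np

def approx_matvec(W, x, effort=1.0):
    """Approximate y = W @ x using only the top `effort` fraction
    of input coordinates, ranked by |x|.

    Illustrative sketch only; the real Effort Engine algorithm
    works differently and operates on model weights at scale.
    """
    d = x.shape[0]
    k = max(1, int(round(effort * d)))
    # Indices of the k largest-magnitude activations.
    idx = np.argpartition(np.abs(x), d - k)[d - k:]
    # Multiply only the selected columns of W; the rest are skipped.
    return W[:, idx] @ x[idx]

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 256))
x = rng.standard_normal(256)

full = approx_matvec(W, x, effort=1.0)  # identical to W @ x
half = approx_matvec(W, x, effort=0.5)  # performs 50% of the multiplies
```

At `effort=0.5` half the multiply-accumulates are skipped, yet the dropped terms are the smallest contributors, so the output stays close to the exact product.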
You’ll also see:
- Live demos of inference tuning from 100% to 5%
- Why AI models still work (sometimes better!) with just 30% effort
- How a DIY hacker space in a car shop led to one of the most creative AI projects in Europe
If you're building with LLMs, pushing inference limits, or just obsessed with optimization, this episode will change how you think about AI computation.
