Learning from Machine Learning cover image

Learning from Machine Learning

Leland McInnes: UMAP, HDBSCAN & the Geometry of Data | Learning from Machine Learning #10

Oct 25, 2024
55:27

In this episode of Learning from Machine Learning, we explore the intersection of pure mathematics and modern data science with Leland McInnes, the mind behind an ecosystem of tools for unsupervised learning including UMAP, HDBSCAN, PyNN Descent and DataMapPlot. As a researcher at the Tutte Institute for Mathematics and Computing, Leland has fundamentally shaped how we approach and understand complex data.

Leland views data through a unique geometric lens, drawing from his background in algebraic topology to uncover hidden patterns and relationships within complex datasets. This perspective led to the creation of UMAP, a breakthrough in dimensionality reduction that preserves both local and global data structure to allow for incredible visualizations and clustering. Similarly, his clustering algorithm HDBSCAN tackles the messy reality of real-world data, handling varying densities and noise with remarkable effectiveness.

But perhaps what's most striking about Leland isn't just his technical achievements – it's his philosophy toward algorithm development. He champions the concept of "decomposing black box algorithms," advocating for transparency and understanding over blind implementation. By breaking down complex algorithms into their fundamental components, Leland argues, we gain the power to adapt and innovate rather than simply consume.

For those entering the field, Leland offers poignant advice: resist the urge to chase the hype. Instead, find your unique angle, even if it seems unconventional. His own journey – applying concepts from algebraic topology and fuzzy simplicial sets to data science – demonstrates how breakthrough innovations often emerge from unexpected connections.

Throughout our conversation, Leland's passion for knowledge and commitment to understanding shine through. His approach reminds us that the most powerful advances in data science often come not from following the crowd, but from diving deep into fundamentals and drawing connections across disciplines.

There's immense value in understanding the tools you use, questioning established approaches, and bringing your unique perspective to the field. As Leland shows us, sometimes the most significant breakthroughs come from seeing familiar problems through a new lens.

Resources for Leland McInnes

Leland’s Github

  • UMAP
  • HDBSCAN
  • PyNN Descent
  • DataMapPlot
  • EVoC

References

Learning from Machine Learning

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode