

Contributor
Eric Anderson
The origin story behind the best open source projects and communities.
Episodes

Jul 20, 2022 • 28min
Milvus with Frank Liu
Eric Anderson (@ericmander) and Frank Liu (@frankzliu) talk about Milvus, the open-source vector database built for scalable similarity search. Vector databases are built to search, index and store embeddings, a requirement for powerful AI applications. Frank is Director of Operations at Zilliz, the company that stewards the project. Tune in to find out how Milvus is the database for the AI era.
Subscribe to Contributor on Substack for email notifications, and join our Slack community!
In this episode we discuss:
A crash course on embeddings and vector databases
Using Milvus for logo search, crypto predictions, drug discovery, and more
Other open-source projects at Zilliz that complement Milvus
“Embedding Everything”
How Milvus incorporates tunable consistency into its search process
Links:
Milvus
Zilliz
Towhee
Attu
Feder
Other episodes:
ClickHouse with Alexey Milovidov and Ivan Blinkov
Correction:
Milvus is based on a “shared storage” architecture, not “shared nothing.”

Jul 6, 2022 • 38min
Apache Beam with Kenn Knowles and Pablo Estrada
Eric Anderson (@ericmander) reunites with old colleagues Kenn Knowles (@KennKnowles) and Pablo Estrada (@polecitoem) for a conversation on Apache Beam, the open-source programming model for data processing. The trio once worked together at Google, and Beam was a turning point in the history of open-source there. Today, both Kenn and Pablo are members of the Beam PMC, and join the show with the inside scoop on Beam’s past, present and future.
In this episode we discuss:
Transitioning Beam to the Apache Way
How “inner source” works at Google
Thoughts on the relationship between batch processing and streaming
Some ways that community “power users” have contributed to Beam
Information on Beam Summit 2022, the first onsite summit since COVID began
The first few people to register can use code BEAM_POD_INV for a discount on tickets!
Links:
Apache Beam
Apache Spark
Apache Flink
Apache Nemo
Apache Samza
Apache Crunch
MapReduce paper
MillWheel paper
FlumeJava paper
Dataflow paper
Beam Summit 2022 Website
Other episodes:
TensorFlow with Rajat Monga

Jun 22, 2022 • 41min
Temporal (Part 2) with Maxim Fateev and Dominik Tornow
Eric Anderson (@ericmander) returns to Temporal with co-founder Maxim Fateev (@mfateev) and principal engineer Dominik Tornow (@DominikTornow). When Maxim joined us in September of 2020, the company called their project a “workflow orchestrator.” Today, Temporal has grown in popularity and usability, but the terminology around that abstraction has changed. Tune in to track the evolution of what Maxim calls a genuinely “new category of software.”
In this episode we discuss:
New features and developments in the last 2 years
The proper way to pronounce “Temporal”
How Temporal guarantees that workflow execution actually runs to completion
Describing Temporal as a new pair of glasses
Replay, Temporal’s first developer conference on August 25-26, in Seattle
Links:
Temporal
Cadence
Apache Cassandra
Replay
People mentioned:
Samar Abbas (@samarabbas77)
Other episodes:
Temporal with Maxim Fateev
Apache Cassandra with Patrick McFadin

Jun 8, 2022 • 28min
Scarf with Avi Press
Eric Anderson (@ericmander) interviews Avi Press (@avi_press) about Scarf, the distribution platform for open-source software that facilitates analytics and commercialization. Scarf offers a set of tools that allows founders and maintainers to understand adoption of their products, including Scarf Gateway, which provides a central access point to containers and packages. From there, open-source developers can connect with the people that rely on their work.
In this episode we discuss:
Why you can’t rely on GitHub as a source of comprehensive data about open-source software
Tracing a user’s journey interacting with a project across multiple platforms
How better observability allows maintainers to make better software
Inspiring indie maintainers to commercialize their projects
The privilege of being able to work in open-source, and how Scarf can enable a more inclusive developer community
Links:
Scarf
Tidelift
Gitcoin
OpenTeams
Aviyel

May 25, 2022 • 29min
Rasgo with Patrick Dougherty
Eric Anderson (@ericmander) and Patrick Dougherty (@cpdough) talk about Rasgo, the data transformation platform for MLOps that makes generating SQL easy. The team at Rasgo recently open-sourced a package called RasgoQL, which allows users to execute SQL queries against a data warehouse using Python syntax. Tune in to find out how Rasgo aims to bridge an important gap in the Modern Data Stack.
In this episode we discuss:
The advantages of offering both a low-code/no-code UI and a Python interface
"How can a data scientist, without needing full-time resources from data engineering, be somewhat self-sufficient in data prep and able to deliver those insights without a massive human capital investment needed?"
Where Rasgo fits into the world of feature stores
Why one Rasgo user took a trip to a wind farm in Texas
Eric’s predictions for the future of data prep and transformation
Links:
Rasgo
RasgoQL
DuckDB
Delta Lake
People mentioned:
Jared Parker (@jaredtparker_)

May 11, 2022 • 32min
Feast with Willem Pienaar
Eric Anderson (@ericmander) and Willem Pienaar (@willpienaar) talk about Feast, the open-source feature store for machine learning. Feature stores act as a bridge between models and data, and allow data scientists to ship features into production without the need for engineers. Willem co-created Feast at Gojek, and later teamed up with the folks at Tecton to back the project.
In this episode we discuss:
The value of feature stores in MLOps
What happens when you open-source too early
Why most open-source code has nothing to hide
Bringing an open-source project to an existing company
Good and bad use cases for a feature store
Links:
Feast
Tecton
Turing
Merlin
Kubeflow
apply() Conference
People mentioned:
Mike Del Balso
Kevin Stumpf (@kevinmstumpf)
Ajey Gore (@AjeyGore)
Demetrios Brinkmann (@Dpbrinkm)
Wes McKinney (@wesmckinn)
Other episodes:
Flyte with Ketan Umare
Great Expectations with Abe Gong and Kyle Eaton

Apr 27, 2022 • 36min
Flyte with Ketan Umare
Ketan Umare, a former engineer at Lyft, created Flyte, a groundbreaking open-source platform for workflow automation in machine learning. He discusses how Flyte integrates compute and workflow to optimize user experience. Ketan emphasizes the pivotal role of accurate fare and ETA predictions in ride-sharing. He also shares insights on transitioning from 'Better Airflow' to Flyte and the benefits of typed programming for machine learning. Additionally, he explores open-sourcing protocols and the project’s partnership with the Linux Foundation.

Apr 13, 2022 • 31min
Activeloop with Davit Buniatyan
Eric Anderson (@ericmander) meets with Davit Buniatyan (@DBuniatyan) of Activeloop, the database for AI. Davit was inspired to found Activeloop while working on large datasets in a neuroscience research lab at Princeton. Powering the technology at Activeloop is Hub, the open-source dataset format for AI applications. Join us to learn how Hub promises to enhance and expand various verticals in deep learning.
In this episode we discuss:
Reconfiguring traditional ML tooling for the cloud
Connectomics - working with thin slices of a mouse brain with neuroscientist Sebastian Seung
Choosing between university, a start-up, and open-source
Davit’s original product, which ran distributed computation on crypto-mining GPUs
Focusing on different data modalities for computer vision
Links:
Activeloop
Activeloop Hub
Apache Parquet
Apache Spark
TensorFlow
Snowflake
Databricks
Timescale
People mentioned:
Sebastian Seung (@SebastianSeung)
Other episodes:
TensorFlow with Rajat Monga

Mar 30, 2022 • 27min
Unikraft with Alexander Jung and Simon Kuenzer
Eric Anderson (@ericmander), Alexander Jung (@nderjung) and Simon Kuenzer (GitHub: @skuenzer) get technical on Unikraft, the open-source unikernel development kit. Unikernels are specialized, high-performing OS images that have the potential to revolutionize virtualization. Unikraft makes unikernels easy to use by prioritizing modularity, security, and POSIX-compatibility.
In this episode we discuss:
How Unikraft seeks wider adoption of unikernels in real-world applications
Unikraft’s background in research and academia
Bottom-up as well as top-down specialization
Building a community with a large proportion of students
Links:
Unikraft
Unikraft: Fast, Specialized Unikernels the Easy Way
Xen Project
MirageOS
HermitCore
Firecracker

Mar 16, 2022 • 45min
EdgeDB with Yury Selivanov
Eric Anderson (@ericmander) has a conversation with Yury Selivanov (@1st1), the co-founder of EdgeDB. EdgeDB is the world’s first “graph-relational database.” It’s a term coined specifically for this new type of database, designed to ease the pain of dealing with the usual relational and NoSQL models. And no, EdgeDB is NOT a graph database!
In this episode we discuss:
A glitch at EdgeDB’s Matrix-inspired launch event
Origin of the term and design philosophy, “graph-relational”
What to know about becoming a Python core developer
How EdgeDB’s next-gen query language compares to GraphQL and SQL
Links:
EdgeDB
magicstack
uvloop
People mentioned:
Elvis Pranskevichus (@elprans)
Colin McDonnell (@colinhacks)
Victor Petrovykh (GitHub: @vpetrovykh)
Dan Abramov (@dan_abramov)
Brett Cannon (@brettsky)
Daniel Levine (@daniel_levine)
Other episodes:
Hasura with Tanmai Gopal
Dgraph with Manish Jain