Contributor

Eric Anderson
Jul 20, 2022 • 28min

Milvus with Frank Liu

Eric Anderson (@ericmander) and Frank Liu (@frankzliu) talk about Milvus, the open-source vector database built for scalable similarity search. Vector databases are built to search, index and store embeddings, a requirement for powerful AI applications. Frank is Director of Operations at Zilliz, the company that stewards the project. Tune in to find out how Milvus is the database for the AI era. Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:
- A crash course on embeddings and vector databases
- Using Milvus for logo search, crypto predictions, drug discovery, and more
- Other open-source projects at Zilliz that complement Milvus
- “Embedding Everything”
- How Milvus incorporates tunable consistency into its search process

Links: Milvus; Zilliz; Towhee; Attu; Feder

Other episodes: Clickhouse with Alexey Milovidov and Ivan Blinkov

Correction: Milvus is based on a “shared storage” architecture, not “shared nothing.”
Jul 6, 2022 • 38min

Apache Beam with Kenn Knowles and Pablo Estrada

Eric Anderson (@ericmander) reunites with old colleagues Kenn Knowles (@KennKnowles) and Pablo Estrada (@polecitoem) for a conversation on Apache Beam, the open-source programming model for data processing. The trio once worked together at Google, and Beam was a turning point in the history of open-source there. Today, both Kenn and Pablo are members of the Beam PMC, and join the show with the inside scoop on Beam’s past, present and future.

In this episode we discuss:
- Transitioning Beam to the Apache Way
- How “inner source” works at Google
- Thoughts on the relationship between batch processing and streaming
- Some ways that community “power users” have contributed to Beam
- Information on Beam Summit 2022, the first onsite summit since COVID began

The first few people to register can use code BEAM_POD_INV for a discount on tickets!

Links: Apache Beam; Apache Spark; Apache Flink; Apache Nemo; Apache Samza; Apache Crunch; MapReduce paper; MillWheel paper; FlumeJava paper; Dataflow paper; Beam Summit 2022 Website

Other episodes: TensorFlow with Rajat Monga
Jun 22, 2022 • 41min

Temporal (Part 2) with Maxim Fateev and Dominik Tornow

Eric Anderson (@ericmander) returns to Temporal with co-founder Maxim Fateev (@mfateev) and principal engineer Dominik Tornow (@DominikTornow). When Maxim joined us in September of 2020, the company called their project a “workflow orchestrator.” Today, Temporal has grown in popularity and usability, but the terminology around that abstraction has changed. Tune in to track the evolution of what Maxim calls a genuinely “new category of software.”

In this episode we discuss:
- New features and developments in the last 2 years
- The proper way to pronounce “Temporal”
- How Temporal guarantees that workflow execution actually runs to completion
- Describing Temporal as a new pair of glasses
- Replay, Temporal’s first developer conference on August 25-26, in Seattle

Links: Temporal; Cadence; Apache Cassandra; Replay

People mentioned: Samar Abbas (@samarabbas77)

Other episodes: Temporal with Maxim Fateev; Apache Cassandra with Patrick McFadin
Jun 8, 2022 • 28min

Scarf with Avi Press

Eric Anderson (@ericmander) interviews Avi Press (@avi_press) about Scarf, the distribution platform for open-source software that facilitates analytics and commercialization. Scarf offers a set of tools that allows founders and maintainers to understand adoption of their products, including Scarf Gateway, which provides a central access point to containers and packages. From there, open-source developers can connect with the people that rely on their work.

In this episode we discuss:
- Why you can’t rely on GitHub as a source of comprehensive data about open-source software
- Tracing a user’s journey interacting with a project across multiple platforms
- How better observability allows maintainers to make better software
- Inspiring indie maintainers to commercialize their projects
- The privilege of being able to work in open-source, and how Scarf can enable a more inclusive developer community

Links: Scarf; Tidelift; Gitcoin; OpenTeams; Aviyel
May 25, 2022 • 29min

Rasgo with Patrick Dougherty

Eric Anderson (@ericmander) and Patrick Dougherty (@cpdough) talk about Rasgo, the data transformation platform for MLOps that makes generating SQL easy. The team at Rasgo recently open-sourced a package called RasgoQL, which allows users to execute SQL queries against a data warehouse using Python syntax. Tune in to find out how Rasgo aims to bridge an important gap in the Modern Data Stack.

In this episode we discuss:
- The advantages of offering both a low-code/no-code UI and a Python interface
- "How can a data scientist, without needing full-time resources from data engineering, be somewhat self-sufficient in data prep and able to deliver those insights without a massive human capital investment needed?"
- Where Rasgo fits into the world of feature stores
- Why one Rasgo user took a trip to a wind farm in Texas
- Eric’s predictions for the future of data prep and transformation

Links: Rasgo; RasgoQL; DuckDB; Delta Lake

People mentioned: Jared Parker (@jaredtparker_)
May 11, 2022 • 32min

Feast with Willem Pienaar

Eric Anderson (@ericmander) and Willem Pienaar (@willpienaar) talk about Feast, the open-source feature store for machine learning. Feature stores act as a bridge between models and data, and allow data scientists to ship features into production without the need for engineers. Willem co-created Feast at Gojek, and later teamed up with the folks at Tecton to back the project.

In this episode we discuss:
- The value of feature stores in MLOps
- What happens when you open-source too early
- Why most open-source code has nothing to hide
- Bringing an open-source project to an existing company
- Good and bad use cases for a feature store

Links: Feast; Tecton; Turing; Merlin; Kubeflow; apply() Conference

People mentioned: Mike Del Balso; Kevin Stumpf (@kevinmstumpf); Ajey Gore (@AjeyGore); Demetrios Brinkmann (@Dpbrinkm); Wes McKinney (@wesmckinn)

Other episodes: Flyte with Ketan Umare; Great Expectations with Abe Gong and Kyle Eaton
Apr 27, 2022 • 36min

Flyte with Ketan Umare

Ketan Umare, a former engineer at Lyft, created Flyte, a groundbreaking open-source platform for workflow automation in machine learning. He discusses how Flyte integrates compute and workflow to optimize user experience. Ketan emphasizes the pivotal role of accurate fare and ETA predictions in ride-sharing. He also shares insights on transitioning from 'Better Airflow' to Flyte and the benefits of typed programming for machine learning. Additionally, he explores open-sourcing protocols and the project’s partnership with the Linux Foundation.
Apr 13, 2022 • 31min

Activeloop with Davit Buniatyan

Eric Anderson (@ericmander) meets with Davit Buniatyan (@DBuniatyan) of Activeloop, the database for AI. Davit was inspired to found Activeloop while working on large datasets in a neuroscience research lab at Princeton. Powering the technology at Activeloop is Hub, the open-source dataset format for AI applications. Join us to learn how Hub promises to enhance and expand various verticals in deep learning.

In this episode we discuss:
- Reconfiguring traditional ML tooling for the cloud
- Connectomics: working with thin slices of a mouse brain with neuroscientist Sebastian Seung
- Choosing between university, a start-up, and open-source
- Davit’s original product, which ran computation on crypto mining GPUs on a distributed scale
- Focusing on different data modalities for computer vision

Links: Activeloop; Activeloop Hub; Apache Parquet; Apache Spark; TensorFlow; Snowflake; Databricks; Timescale

People mentioned: Sebastian Seung (@SebastianSeung)

Other episodes: TensorFlow with Rajat Monga
Mar 30, 2022 • 27min

Unikraft with Alexander Jung and Simon Kuenzer

Eric Anderson (@ericmander), Alexander Jung (@nderjung) and Simon Kuenzer (GitHub: @skuenzer) get technical on Unikraft, the open-source unikernel development kit. Unikernels are specialized, high-performing OS images that have the potential to revolutionize virtualization. Unikraft makes unikernels easy to use by prioritizing modularity, security, and POSIX-compatibility.

In this episode we discuss:
- How Unikraft seeks wider adoption of unikernels in real-world applications
- Unikraft’s background in research and academia
- Bottom-up as well as top-down specialization
- Building a community with a large proportion of students

Links: Unikraft; Unikraft: Fast, Specialized Unikernels the Easy Way; Xen Project; MirageOS; HermitCore; Firecracker
Mar 16, 2022 • 45min

EdgeDB with Yury Selivanov

Eric Anderson (@ericmander) has a conversation with Yury Selivanov (@1st1), the co-founder of EdgeDB. EdgeDB is the world’s first “graph-relational database.” It’s a term coined specifically for this new type of database, designed to ease the pain of dealing with the usual relational and NoSQL models. And no, EdgeDB is NOT a graph database!

In this episode we discuss:
- A glitch at EdgeDB’s Matrix-inspired launch event
- Origin of the term and design philosophy, “graph-relational”
- What to know about becoming a Python core developer
- How EdgeDB’s next-gen query language compares to GraphQL and SQL

Links: EdgeDB; magicstack; uvloop

People mentioned: Elvis Pranskevichus (@elprans); Colin McDonnell (@colinhacks); Victor Petrovykh (GitHub: @vpetrovykh); Dan Abramov (@dan_abramov); Brett Cannon (@brettsky); Daniel Levine (@daniel_levine)

Other episodes: Hasura with Tanmai Gopal; Dgraph with Manish Jain
