marimo: Reactive Notebooks and Deployable Web Apps in Python
Nov 29, 2024
Akshay Agrawal, creator of the marimo notebook, discusses improving the Python notebook experience for data scientists. He covers common problems with traditional notebooks, such as hidden state and poor reproducibility, and explains how marimo models a notebook as a directed acyclic graph (DAG) to avoid them. Akshay describes how marimo notebooks are stored as readable, git-friendly Python files and use PEP 723 inline metadata to make notebooks standalone. He also explains how they can be deployed as interactive web apps, making it easier for data scientists to collaborate and share their work.
The marimo notebook addresses hidden state issues common in traditional notebooks by utilizing a Directed Acyclic Graph (DAG) structure for better reproducibility.
Designed as pure Python files, marimo notebooks enhance readability and compatibility with version control systems, simplifying collaboration and code management.
marimo notebooks support building interactive applications: UI widgets can be embedded in a notebook, and the notebook can be deployed as a web app.
Deep dives
Common Notebook Issues and Hidden State
Traditional notebooks for Python development suffer from hidden state: the variables in the kernel's memory depend on the order in which cells were run, not on the code currently shown on the page. As a result, a shared notebook can produce different outputs depending on its execution sequence, which complicates reproducibility. For instance, if a cell that defines a variable is deleted, the variable stays in memory, so other cells keep working with no indication that the defining code is gone, and then fail after a restart. Users trying to replicate an analysis are left to reconstruct the notebook's exact execution history.
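The deleted-cell scenario can be reproduced without any notebook software. The sketch below is only an illustration: it assumes a "kernel" is a shared dictionary that cells are executed into, and `run_cell`, `cell_a`, and `cell_b` are hypothetical names, not part of any notebook API.

```python
# Simulate a notebook kernel: one shared namespace that outlives any cell.
kernel_namespace = {}


def run_cell(source, namespace):
    """Execute one cell's source code in the given namespace."""
    exec(source, namespace)


cell_a = "threshold = 10"
cell_b = "result = [x for x in range(20) if x > threshold]"

# Session 1: run both cells, then "delete" cell A from the page.
run_cell(cell_a, kernel_namespace)
run_cell(cell_b, kernel_namespace)
cells_on_page = [cell_b]  # cell A is gone, but `threshold` is still in memory

# Re-running cell B still succeeds because of the leftover variable.
run_cell(cell_b, kernel_namespace)
still_works = kernel_namespace["result"] == list(range(11, 20))

# Session 2: restart the kernel and run only what is on the page.
fresh_namespace = {}
restart_error = None
try:
    for source in cells_on_page:
        run_cell(source, fresh_namespace)
except NameError as exc:
    restart_error = exc  # `threshold` is not defined anywhere on the page
```

The same page of code succeeds in the first session and raises `NameError` in the second, which is the reproducibility gap described above.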
The Concept of Directed Acyclic Graphs (DAGs)
To address these limitations, marimo models the notebook as a directed acyclic graph (DAG), which keeps the outputs consistent with the code on the page. Each cell is a node in the graph, and an edge connects two cells when one references a variable that the other defines. Running a cell automatically triggers the cells that depend on it, which keeps the notebook consistent and prevents hidden state. Because execution order comes from these dependencies rather than from position, users can arrange cells in a non-linear fashion without relying on top-to-bottom order.
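A minimal sketch of this idea, using only the standard library, is shown below. It is not marimo's implementation: `defs_and_refs`, `build_graph`, and `run_order` are hypothetical helpers, and the variable analysis is deliberately simplified (it ignores scopes, imports, and function definitions).

```python
import ast
import builtins


def defs_and_refs(source):
    """Return (names assigned, names read but not assigned) in a cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    # Keep only names that must come from another cell (drop builtins).
    refs = {name for name in refs - defs if not hasattr(builtins, name)}
    return defs, refs


def build_graph(cells):
    """Map each cell to the set of cells that read a variable it defines."""
    info = {name: defs_and_refs(source) for name, source in cells.items()}
    children = {name: set() for name in cells}
    for parent, (defs, _) in info.items():
        for child, (_, refs) in info.items():
            if parent != child and defs & refs:
                children[parent].add(child)
    return children


def run_order(changed, children):
    """Return `changed` and all of its descendants in dependency order."""
    affected, stack = set(), [changed]
    while stack:
        cell = stack.pop()
        if cell not in affected:
            affected.add(cell)
            stack.extend(children[cell])
    # Kahn's algorithm restricted to the affected cells.
    indegree = {cell: 0 for cell in affected}
    for cell in affected:
        for child in children[cell]:
            indegree[child] += 1
    ready = sorted(cell for cell in affected if indegree[cell] == 0)
    order = []
    while ready:
        cell = ready.pop(0)
        order.append(cell)
        for child in sorted(children[cell]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order


# Cells are listed out of order on purpose: position does not matter.
cells = {
    "report": "summary = f'mean={mean:.1f}'",
    "stats": "mean = sum(data) / len(data)",
    "load": "data = [1, 2, 3, 6]",
    "unrelated": "greeting = 'hello'",
}
graph = build_graph(cells)

namespace = {}
for name in run_order("load", graph):
    exec(cells[name], namespace)
```

Running `load` re-runs `stats` and then `report`, while `unrelated` is left alone because it shares no variables with the changed cell.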
Marimo Notebooks and Git Integration
marimo notebooks are stored as pure Python files, which makes them readable and easy to manage with version control systems like Git. Jupyter notebooks, by contrast, are stored as JSON files that mix code, outputs, and metadata, which makes diffs and merges difficult. In a marimo file, each cell is a function wrapped in a decorator, which keeps cells modular and lets the notebook be imported and reused like any other Python module. This structure simplifies the management of individual notebook components and gives developers accustomed to Python scripts a familiar coding environment.
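To give a feel for the cells-as-decorated-functions layout, the sketch below mimics it with a small stand-in so that it runs without marimo installed. The `App` class, its `cell` decorator, and its `run` method are hypothetical illustrations written for this example, not marimo's code, and returning a dict from each cell is an assumption made to keep the stand-in short.

```python
import inspect


class App:
    """Hypothetical stand-in for a notebook app: collects cells and runs them."""

    def __init__(self):
        self.cells = []

    def cell(self, func):
        # Register the function as a cell. Its parameter names are the
        # variables it needs from other cells.
        self.cells.append(func)
        return func

    def run(self):
        """Run every cell once, each as soon as its inputs are available."""
        namespace = {}
        pending = list(self.cells)
        while pending:
            for func in pending:
                needs = list(inspect.signature(func).parameters)
                if all(name in namespace for name in needs):
                    # In this sketch a cell returns a dict of what it defines.
                    result = func(*(namespace[name] for name in needs))
                    namespace.update(result or {})
                    pending.remove(func)
                    break
            else:
                raise RuntimeError("unresolved or cyclic cell dependencies")
        return namespace


app = App()


@app.cell
def report(total, count):
    return {"message": f"{count} values, total {total}"}


@app.cell
def load():
    return {"values": [3, 4, 5]}


@app.cell
def aggregate(values):
    return {"total": sum(values), "count": len(values)}


if __name__ == "__main__":
    print(app.run()["message"])
```

Because the file is ordinary Python, a line-based diff shows exactly which cell changed, and each cell function can be imported from another module.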
Reproducibility and Package Management
marimo improves reproducibility by serializing package requirements with the PEP 723 inline metadata standard, which embeds a notebook's dependencies directly in the notebook file. Combined with a modern package manager such as uv, marimo can automatically create an isolated virtual environment for the notebook, so users can replicate the author's setup. This addresses a major pain point of traditional notebooks, where authors frequently neglect to document package versions and others then fail to reproduce their work. Users can focus on their analysis instead of untangling dependency issues.
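PEP 723 metadata is a specially formatted comment block at the top of a Python file. The sketch below extracts that block using the regular expression from the PEP's reference implementation. The helper name `read_script_block` and the dependency list in `NOTEBOOK_SOURCE` are illustrative assumptions, not taken from a real notebook.

```python
import re
from typing import Optional

# Regular expression from the PEP 723 reference implementation.
REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"


def read_script_block(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None."""
    matches = [
        match for match in re.finditer(REGEX, source)
        if match.group("type") == "script"
    ]
    if len(matches) > 1:
        raise ValueError("Multiple script blocks found")
    if not matches:
        return None
    # Strip the leading "# " (or a bare "#") from each comment line.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )


# Illustrative file contents; the packages and versions are made up.
NOTEBOOK_SOURCE = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "pandas==2.2.3",
# ]
# ///

import marimo
'''

toml_text = read_script_block(NOTEBOOK_SOURCE)
```

On Python 3.11 or later, `tomllib.loads(toml_text)` turns the extracted text into a dictionary with `requires-python` and `dependencies` keys, which is the information a tool needs to build a matching environment.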
Interactive Applications and Deployment
marimo notebooks can also serve as interactive applications, which takes them beyond traditional data science notebooks. UI widgets such as sliders, dropdowns, and input fields can be placed directly in a notebook, and cells that use a widget's value update when the user interacts with it. Notebooks can be deployed as web apps, including via Pyodide, which runs Python in the browser, so users can share interactive work without extensive backend knowledge. This is useful for personal projects and for teams that want to present their analyses dynamically.
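The core idea behind a reactive widget, a value whose changes cause dependent code to run again, can be sketched in a few lines. The `Slider` class below is a hypothetical stand-in, not marimo's widget API, and it uses explicit callbacks only to stay self-contained; in a DAG-based notebook the re-run would be driven by the dependency graph described earlier.

```python
class Slider:
    """Hypothetical stand-in for a UI slider that notifies its dependents."""

    def __init__(self, start, stop, value=None):
        self.start, self.stop = start, stop
        self._value = start if value is None else value
        self._listeners = []

    @property
    def value(self):
        return self._value

    def on_change(self, callback):
        """Register dependent code and run it once with the current value."""
        self._listeners.append(callback)
        callback(self._value)

    def set(self, new_value):
        """Simulate the user dragging the slider to a new position."""
        if not self.start <= new_value <= self.stop:
            raise ValueError(f"{new_value} is outside {self.start}..{self.stop}")
        self._value = new_value
        for callback in self._listeners:
            callback(new_value)


rendered = []  # stands in for the output area below a cell
slider = Slider(1, 10, value=3)
slider.on_change(lambda n: rendered.append("#" * n))
slider.set(5)  # the dependent code runs again with the new value
```

Each interaction appends a freshly computed output, so `rendered` ends up holding one bar for the initial value and one for the updated value.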
What are common issues with using notebooks for Python development? How do you know the current state, share reproducible results, or create interactive applications? This week on the show, we speak with Akshay Agrawal about the open-source reactive marimo notebook for Python.
Before writing any code, Akshay wrote a 2,500-word design document. He wanted to create a maintainable and reproducible tool that avoided the hidden state of traditional notebooks. We discuss solving the hidden state problem by building the notebook as a directed acyclic graph (DAG).
Akshay shares how marimo notebooks are stored as pure Python files, which makes them easy to read, importable, and git-friendly. We discuss serializing package requirements using PEP 723 inline metadata to create standalone reproducible notebooks. We also cover how marimo notebooks can be deployed as a web app or dashboard using Pyodide.
Video Course Spotlight: In this course, you’ll learn about Python namespaces, the structures used to store and organize the symbolic names created during execution of a Python program. You’ll learn when namespaces are created, how they are implemented, and how they define variable scope.
Topics:
00:00:00 – Introduction
00:02:06 – Akshay’s background and studies
00:04:14 – Work at Google and PhD program
00:06:29 – Sharing notebooks
00:08:18 – Starting work on marimo 2 years ago
00:12:48 – Avoiding notebook issues and building a DAG
00:18:39 – The difference of reactivity
00:20:39 – What is a marimo notebook?
00:23:39 – Video Course Spotlight
00:24:50 – Reproducibility and managing package requirements
00:27:49 – Using decorators for cells
00:30:23 – Writing a design document before any coding
00:34:08 – Interactivity and UI widgets
00:38:20 – Design decisions and built-in widgets
00:42:05 – Creating a deployable web application
00:44:34 – Exploring examples and tutorials
00:46:13 – Supporting DataFrame libraries with narwhals
00:48:00 – Migrating from a Jupyter notebook
00:52:02 – Working with cells and not running code
00:54:30 – A couple favorite tutorials
00:56:17 – What are you excited about in the world of Python?
00:57:39 – What do you want to learn next?
00:59:34 – How can people follow the project and yourself?