
Running in Production: Mux Is an API Based Platform That Lets You Process and Stream Videos
May 18, 2020
01:20:48
In this episode of Running in Production, Dylan Jhaveri talks about building an API driven video platform called Mux. It uses Phoenix, Elixir and Go to handle billions of video views a month. It's hosted on AWS and GCP with Kubernetes and has been up and running since early 2016.
Dylan covers how video streaming works, processing billions of events a month, taking advantage of Elixir and Phoenix's features, providing a zero downtime public API, continuously deploying their products, working with massive databases, metered billing and tons more.
Topics Include
- 1:14 - How online streaming video works with HLS and where Mux fits into the picture
- 7:51 - Mux lets you post a video to their API and they give you an HLS playback URL
- 8:24 - Mux has been up and running since January 2016 and went through Y Combinator
- 8:37 - Mux Data is another service they offer, it's like New Relic but for video data
- 12:04 - They process billions of video views per month through Mux Data
- 12:36 - You could use Mux as a lower level alternative to Vimeo or Wistia
- 13:33 - Sometimes embedding iframes can be problematic and Mux can help in this area
- 14:35 - About 45 people work at Mux and half are involved with engineering
- 15:03 - Motivation for using Phoenix and Elixir, even when they were very new tools
- 16:52 - Their main public API is an out of the box Phoenix app
- 17:52 - They have a real-time dashboard that is powered by websockets and channels
- 20:28 - Some of Mux's customers have millions of concurrent video views through that
- 20:42 - Will you switch to using Live View? Probably not since they are so API driven
- 21:51 - A dozen or so Go microservices and Kafka handle processing the videos
- 23:25 - Go is a great fit for super CPU intensive tasks such as video encoding
- 24:03 - The video processing infrastructure was very well thought out early on
- 24:50 - The public API is RESTful and there's ~40-50 endpoints with a few private endpoints
- 26:14 - Cookie based auth is done in a browser but there's tokens for API access
- 26:47 - The exq library is used for processing jobs asynchronously in Elixir land
- 27:22 - exq runs within a supervisor of your app, not a dedicated OS level service
- 28:21 - Prometheus is used for metrics but it's not hooked into Elixir Telemetry (yet)
- 29:26 - Kubernetes and Docker drive their production infrastructure
- 29:47 - Buildkite is used for their CI / CD pipeline
- 32:08 - Deployments are very automated, a human only needs to merge to a specific branch
- 32:53 - The video processing microservices are in 1 mono repo, but there's 2 other repos
- 33:33 - There's PR approvals in place but all developers can merge to the production branch
- 34:39 - Code reviews are really important and you need to trust your developers
- 35:41 - The Elixir app has a PostgreSQL billing DB and also uses ClickHouse (SQL based)
- 37:53 - ClickHouse lets them store billions of rows and access everything quickly
- 40:58 - You do write SQL queries with ClickHouse but it doesn't work with Ecto out of the box
- 41:44 - The Elixir API runs on AWS with an AWS load balancer sitting in front of it all
- 42:20 - The video infrastructure runs on Google Cloud
- 42:56 - How many servers do you run in total? Hard to tell really, but it's a lot of compute
- 43:44 - Despite being on AWS, they are not using Amazon's managed Kubernetes (EKS)
- 44:01 - All payments go through Stripe, including the metered billing which they hand rolled
- 45:06 - Instead of billing based on bandwidth, Mux bills by minutes watched
- 46:06 - SendGrid is used for transactional emails, Sentry for errors and Opsgenie for paging
- 46:48 - All sorts of CI / CD related information gets sent over to a Slack channel
- 47:08 - Developers are broken out into 4 cross functional teams
- 48:31 - There's 2 flavors of SDKs that Mux has (REST API wrappers and video players)
- 50:21 - They currently have 22 different video players to account for across many platforms
- 50:36 - Efficiently creating so many different SDKs by having a core library for each language
- 54:20 - It's sort of like having a core payment library and supporting Stripe, PayPal, etc.
- 54:41 - The SDK team needs to be aware of many different languages and players
- 55:16 - Another key metric to track is the video upscale and downscale percentages
- 56:47 - As of today Mux is focused on supplying service quality metrics
- 58:08 - There's a lot of data stored but it all gets rolled over after 90 days
- 58:42 - The API is deployed all the time, but there's zero downtime deploys
- 59:45 - There's been one day in the past where they had to put the API in read-only mode
- 1:00:19 - The data is backed up, but Dylan isn't sure how often (but it happens, he swears!)
- 1:00:42 - Video thumbnails can be picked out from any timestamp, even animated GIFs too
- 1:02:21 - For now you need to supply your own closed captions to Mux
- 1:03:52 - Captions are downloaded, cached locally until processed and then backed up too
- 1:04:38 - Smoke tests and various alarms help detect issues in production (they use Flink)
- 1:06:25 - Uptime is important, Mux has high profile clients where downtime is not an option
- 1:06:52 - Rate limiting is done at the Elixir level for API calls with the ex_rated library
- 1:07:25 - It's a reasonable idea to always assume users are out to get you
- 1:07:52 - For video rate limiting, it's up to the CDN and they use a few different CDNs
- 1:09:33 - You could build a live streaming service like Twitch with Mux's API
- 1:13:19 - The Elixir API doesn't get billions of calls a month but it's still a lot
- 1:16:37 - Best tips? Video is hard and it keeps getting more and more complicated
- 1:18:15 - Fortunately the video player SDK's churn isn't too high due to the HTML5 spec
- 1:19:14 - You can email Dylan or contact him on Twitter, and Mux is hiring too!
Links
References
- https://en.wikipedia.org/wiki/HTTP_Live_Streaming
- https://howvideo.works/
- https://www.ycombinator.com/about/
- https://bugzilla.mozilla.org/show_bug.cgi?id=356558
- https://golang.org/
- https://en.wikipedia.org/wiki/Column-oriented_DBMS
- https://mux.com/blog/from-russia-with-love-how-clickhouse-saved-our-data/
- https://en.wikipedia.org/wiki/WebVTT
- https://flink.apache.org/usecases.html
- https://www.fastly.com/
- https://en.wikipedia.org/wiki/Real-Time_Messaging_Protocol
- https://obsproject.com/
Tech Stack
- phoenix
- elixir
- golang
- aws
- buildkite
- clickhouse
- docker
- fastly
- gcp
- kafka
- kubernetes
- opsgenie
- postgres
- prometheus
- sendgrid
- sentry
- slack
- stackpath
- stripe
Libraries Used
Support the Show
This episode does not have a sponsor and this podcast is a labor of love. If you want to support the show, the best way to do it is to purchase one of my courses or suggest one to a friend.
- Dive into Docker is a video course that takes you from not knowing what Docker is to being able to confidently use Docker and Docker Compose for your own apps. Long gone are the days of "but it works on my machine!". A bunch of follow along labs are included.
- Build a SAAS App with Flask is a video course where we build a real world SAAS app that accepts payments, has a custom admin, includes high test coverage and goes over how to implement and apply 50+ common web app features. There's 20+ hours of video.
