

Running in Production
Nick Janetakis - Full stack developer
Hear about how folks are running their web apps in production. We'll cover tech choices, why they chose them, lessons learned and more.

Jun 29, 2020 • 46min
A Real Estate Order and Appraisal System for a Small Business
In this episode of Running in Production, Austin Lewis goes over replacing
an Excel sheet with a custom / internal Django app to manage his real estate
business. It’s been up and running on the AWS free tier since April 2020.
It has processed over 300 orders in the few months it’s been up and Austin is
the sole developer of this project. It is one of the first apps he’s deployed.
Topics Include
6:40 – Motivation for using Django and Python and taking advantage of Django’s admin
9:22 – Breaking down how the app is structured as a monolith and a few helpful libraries
15:28 – Having the foresight to upload files to S3 while having only 1 production EC2 server
17:24 – Sticking with Django templates and sprinkles of JavaScript to avoid Yak Shaving
20:06 – Using Docker / Docker Compose with PostgreSQL and Traefik
25:26 – Recap of AWS services (free tier) and setting up the EC2 servers
27:07 – It’s very helpful to deploy your app early and to also use Docker
30:25 – Covering the deploy process, the value in testing and secret management
36:01 – Using Mailgun for sending email and Sentry for error reporting
41:36 – Planning for disaster by letting RDS handle backups
43:05 – Best tips? Keep learning and just get something up and running
45:10 – You can find Austin on GitHub or contact him by email
Links
📄 References
https://en.wiktionary.org/wiki/yak_shaving
⚙️ Tech Stack
django →
python →
aws →
bootstrap →
docker →
jquery →
lets-encrypt →
mailgun →
postgres →
rds →
route53 →
s3 →
sentry →
traefik →
ubuntu →
🛠 Libraries Used
https://github.com/shivanshs9/pdfgen-python
https://github.com/jschneier/django-storages
https://gunicorn.org/
https://github.com/pytest-dev/pytest
Support the Show
This episode does not have a sponsor and this podcast is a labor of love. If
you want to support the show, the best way to do it is to purchase one of my
courses or suggest one to a friend.
Dive into Docker is a video course that takes you from not knowing what Docker is
to being able to confidently use Docker and Docker Compose for your own apps.
Long gone are the days of "but it works on my machine!". A bunch of follow
along labs are included.
Build a SAAS App with Flask is a video course where we build a real
world SAAS app that accepts payments, has a custom admin, includes high test
coverage and goes over how to implement and apply 50+ common web app features.
There's over 20 hours of video.

Jun 22, 2020 • 1h 24min
Passiv Is a Portfolio Management and Automation Platform
In this episode of Running in Production, Brendan Wood talks about building
a portfolio management platform with Django and Python. It’s been running in
production since mid 2017 and is hosted on DigitalOcean.
There’s about 3,000+ active users and overall they are responsible for
managing hundreds of millions of dollars in funds for their users.
Topics Include
3:13 – It started as a 50 line Python script that replaced an Excel sheet
10:49 – Motivation for using Django, Python, NumPy and creating a monolithic app
15:38 – Eventually decommissioning a legacy version of the back-end over time
19:00 – There’s about 33,000+ lines of back-end code, including tests
22:24 – There’s a clean split between the back-end API and the TypeScript React front-end
30:52 – The entire front-end is open source on GitHub
32:13 – It’s hosted on DigitalOcean w/ Ubuntu 18.04, PostgreSQL, Redis, Celery and nginx
39:08 – There’s ~5 seconds of down time per deploy which is done outside of trading hours
46:00 – Everything runs on a single server + a managed PostgreSQL DB (with replicas)
48:20 – Ansible is being used to configure the server
55:22 – Getting code from dev to production in a few minutes with git and a deploy script
1:01:07 – Brendan’s philosophy on starting a business is to do things when you need to do them
1:02:58 – Logging, email alerts and using Stripe to handle payments
1:08:35 – Handling disasters and other unexpected events with backups and alerts
1:16:19 – Best tips? Use the tools that you know unless you have a compelling reason not to
1:19:27 – Setting up a customer support system only after they had a need for it
1:21:39 – Check out https://getpassiv.com/
Links
📄 References
https://en.wikipedia.org/wiki/Exchange-traded_fund
⚙️ Tech Stack
django →
python →
react →
ansible →
digitalocean →
lets-encrypt →
nginx →
postgres →
redis →
stripe →
supervisor →
ubuntu →
webpack →
🛠 Libraries Used
https://numpy.org/
https://github.com/encode/django-rest-framework
https://github.com/celery/celery
https://gunicorn.org/
https://github.com/kakulukia/django-secrets
https://github.com/dj-stripe/dj-stripe

Jun 15, 2020 • 36min
Determine What Your Toilet Paper Supply Is Based on Your Usage
In this episode of Running in Production, Ben Sassoon goes over building a
site that helps you figure out how much toilet paper you have left. It’s a
static site using pure HTML. It’s been running in production since March 2020
and it’s hosted on GitHub Pages.
The site has had over 10 million visitors and was featured on various cable TV
news outlets and talk shows. The MVP was built as a joke for his friends in
about 20 minutes.
Topics Include
6:47 – A very simple static site let him spin up an MVP in about 20 minutes
8:00 – GitHub embraced his service, even though he surpassed the GH Pages traffic limit
10:26 – Making the site mobile friendly using BrowserStack and Polypane
12:59 – There’s not even a static site generator being used, it’s pure HTML
14:14 – Ezoic helped quickly add Google AdSense to the page
19:30 – Getting 300-400 donations and $5,000 / day from ads during its peak
22:01 – The core of the site is about 6 lines of vanilla JavaScript
27:32 – The process of transferring a domain name / site to another person
30:23 – A DNS mystery caused a bit of down time at one point
32:37 – Ben’s workflow for pushing code from development to production
34:17 – Going from basically no traffic to millions of visitors in a short period of time
35:27 – You can find Ben on Twitter at @bensassoon
Links
📄 References
https://education.github.com/pack
https://www.browserstack.com/
https://polypane.app/
https://empireflippers.com/
⚙️ Tech Stack
static-site →
bootstrap →
cloudflare →
github-pages →
namecheap →
weglot →
🛠 Libraries Used
https://www.ezoic.com/
https://ko-fi.com/
https://icons8.com/

Jun 8, 2020 • 1h 8min
Confectionery Connect Is an E-commerce Video Course Marketplace
In this episode of Running in Production, Sean Parsons goes over building
an e-commerce video course marketplace to sell Confectionery goods with Django
and Python. It’s been running in production since December 2019 and it’s hosted
on AWS.
The app has roughly 100k lines of code and was solo developed part time over
about 3 months before shipping an MVP.
Topics Include
3:00 – Modifying an existing e-commerce library called Saleor
7:20 – Figuring out how to pay out instructors fairly based on activity
10:04 – Picking Django, avoiding burnout and splitting the code into ~15 Django apps
20:49 – Celery is being used extensively, along with Celery Beat
25:10 – Stripe as a payment gateway was a natural fit given their subscription model
29:44 – It is a server rendered site with Django templates, except for the video player
35:26 – Turns out using Amazon’s video encoding service is expensive, so Sean uses ffmpeg
38:48 – High level overview about the rest of the tech stack
42:21 – Using Fabric to deploy to a single EC2 instance
45:00 – Going over the deploy process from development to production
50:08 – Benefits of switching to a compute optimized C5n.large EC2 instance
1:00:46 – Handling disasters and unexpected events
1:04:39 – Best tips? Pick the tool you’re the most productive with and ship something
1:07:03 – They’re on Instagram with a new account name of ZenVur
Links
📄 References
https://en.wikipedia.org/wiki/Confectionery
https://transferwise.com/us
https://www.youtube.com/watch?v=8hY6DSSVvYw (Etsy talk on deployment)
⚙️ Tech Stack
django →
python →
aws →
cloudfront →
cloudwatch →
docker →
elasticache →
postgres →
rds →
redis →
route53 →
s3 →
statuscake →
stripe →
supervisor →
ubuntu →
🛠 Libraries Used
https://github.com/mirumee/saleor
https://github.com/deschler/django-modeltranslation
https://github.com/celery/celery
https://flower.readthedocs.io/en/latest/
https://videojs.com/
https://gunicorn.org/
https://www.fabfile.org/
https://github.com/antonagestam/collectfast

Jun 1, 2020 • 1h 14min
Zego Lets You Easily Buy Insurance by the Hour
In this episode of Running in Production, Stuart Kelly lets us know what it’s
like to build an insurance company from scratch with Django and Python. It’s
been running in production since early 2017 and they’ve issued 290+ million
hours of insurance so far. It’s hosted on AWS.
Stuart covers building an MVP in 8 weeks, using Stripe with SCA, creating 25+
Django apps over time, working with a GraphQL API back-end, querying 45+
million DB rows quickly, making app deploys a pleasant experience for his team,
achieving 99.99% uptime and so much more.
Topics Include
2:02 – Shipping an MVP insurance company in 8 weeks with little insurance knowledge
3:53 – React Native was used to build mobile apps after a demand for it was seen
4:46 – Motivation for using Django and Python to build this site
6:15 – The Django admin is used for simple config changes and CRUD operations
6:59 – Examples of when they needed to roll their own admin UI due to added complexity
8:41 – Stripe is being used to handle the payments with SCA support
11:41 – How do you even start an insurance company?
13:32 – It’s a monolithic app broken up by Django apps which is a nice way to break things up
15:15 – Django apps are a nice stepping stone to maybe microservices due to easy refactors
16:18 – What type of Django apps do you have to power your site? There’s 25+ of them
17:36 – Not every Django app would end up being its own service in the future
18:10 – The MVP didn’t start off with this many apps, it grew organically over time
18:46 – Which microservices would you tease out later if it came down to it?
20:28 – The split up services would end up having their own dedicated databases too
22:42 – The back-end is powered by a GraphQL API
23:37 – Using an API back-end came from realizing they are building a platform not an app
24:39 – Hole in one insurance isn’t offered, but they did offer rocket launcher insurance
25:46 – Graphene is used on the Python side of things and it works nicely with Django models
26:04 – On the front-end Relay is being used, but in hindsight maybe Apollo would be better
27:40 – The front-end is about 500,000 lines of code (not including node_modules)
27:53 – The back-end is about 300,000 to 350,000 lines of code
28:44 – There’s about 40-50 top level dependencies in the requirements.txt file
29:54 – PostgreSQL is used through RDS on AWS, along with a RedShift cluster
30:34 – What is RedShift and how does it help make certain queries much faster?
32:43 – They don’t connect to RedShift through Django’s ORM but you do write SQL
33:34 – Their financial reconciliation engine has 40-50 million rows and queries are fast
34:12 – Celery, Redis, Kubernetes, AWS Lambda, oh my!
34:52 – There’s 3-5 web app servers but up to 24 background workers
36:08 – Payment handling doesn’t need to happen live as a driver is working
37:41 – A majority of things are running on t3 EC2 instances
38:24 – Steps taken to safely go from 1 background worker to running many of them
40:50 – One mistake they made early on was not having idempotent worker tasks
41:52 – Having zero down time deploys with AWS CodeDeploy, but migrations are tricky
44:41 – The infrastructure is managed with Terraform, Stuart knows enough to be dangerous
47:12 – Trusting your developers to do reviews is important, along with having tests
48:41 – There’s a few different environments, such as QA which is after a dev pushes code
49:31 – Moving from a git flow model to doing PRs that get merged to a deployed master
50:55 – Every pull request that comes in gets a sub-domain that can be directly accessed
51:33 – Feature flags are sometimes used, but not with a dedicated library or framework
52:55 – Secrets are managed using AWS’ Parameter Store
53:45 – The EC2 instances are spun up using pre-baked AMIs, except for the code itself
55:11 – They pay somewhere between $10,000 and $50,000 a month on hosting
55:46 – How they went from $3,000 to $3 a month from making a database backup change
57:21 – Cloudflare is used as their CDN, DNS host, anti-DDoS and SSL certificate service
58:06 – The imgix service is used to do on the fly image resizing and optimizations
58:31 – Cloudflare is a solid service and competitively priced
58:54 – The JavaScript payload for the front-end is about 1MB after being gzipped
1:00:29 – The Next.js library is used to do server side rendering initially
1:00:56 – Mailgun is used for sending emails and Twilio is used for sending text messages
1:01:40 – Sentry.io (hosted version) captures all of their errors with loads of integrations
1:02:11 – DataDog is used for alerting, APM metrics and logging
1:03:54 – It’s valuable to have your metrics and logging on 1 service
1:04:22 – Various alarms and alerts get sent through DataDog
1:04:42 – Health checks are done with Django Health Check, and they query the DB in it
1:06:02 – So far in 2020 they’re operating at 99.99% uptime which is quite the feat
1:06:46 – Checking your database in your health check is totally worth it
1:07:44 – There’s not many live tests that happen in production due to the nature of the app
1:08:53 – Best tips? Release as often as you can and invest in your release process
1:10:03 – That’s also been the biggest pain point as they scaled up to a larger dev team
1:11:19 – Database migrations are run on every deploy
1:12:33 – Check out https://zego.com, their open source work or email Stuart for questions
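The idempotent worker tasks mentioned at 40:50 can be illustrated with a minimal sketch. This is not Zego's code: the PaymentLedger class, the charge_once method and the key format are hypothetical names, and an in-memory dict stands in for what would normally be a database table with a unique constraint. The idea is that running the same task twice (for example after a worker retry) has the same effect as running it once.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentLedger:
    """Hypothetical in-memory stand-in for a table keyed by a unique task key."""

    charges: dict = field(default_factory=dict)

    def charge_once(self, idempotency_key: str, amount_pence: int) -> bool:
        """Record a charge unless this key was already processed.

        Returns True if the charge was applied, False if it was a repeat.
        """
        if idempotency_key in self.charges:
            return False  # retry or duplicate delivery: nothing to do
        self.charges[idempotency_key] = amount_pence
        return True

    def total(self) -> int:
        """Sum of all charges that were actually applied."""
        return sum(self.charges.values())


ledger = PaymentLedger()
# The same task delivered twice only results in one charge.
first = ledger.charge_once("policy-123:2020-06-01", 450)
second = ledger.charge_once("policy-123:2020-06-01", 450)
```

With many background workers running at once, a key-based check like this is one common way to make retries safe; a real system would enforce the uniqueness in the database rather than in process memory.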
Links
📄 References
https://reactnative.dev/
https://en.wikipedia.org/wiki/Underwriting
https://relay.dev/
https://www.apollographql.com/
https://en.wikipedia.org/wiki/Column-oriented_DBMS
https://www.stitchdata.com/
https://fivetran.com/
https://blog.getdbt.com/what--exactly--is-dbt-/
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
https://www.imgix.com/solutions/resizing-and-cropping
⚙️ Tech Stack
django →
python →
react →
aws →
cloudflare →
codedeploy →
datadog →
graphql →
mailgun →
postgres →
rds →
redis →
sentry →
sns →
sqs →
stripe →
terraform →
twilio →
🛠 Libraries Used
https://docs.graphene-python.org/projects/django/en/latest/
https://github.com/celery/celery
https://github.com/joealcorn/laboratory
https://github.com/KristianOellegaard/django-health-check

May 25, 2020 • 50min
Building a Site Around Thousands of Diary Entries from Samuel Pepys
In this episode of Running in Production, Phil Gyford goes over building a
community around 9+ years of diary entries from Samuel Pepys. The site was
built with Django. It gets about 150k+ page views a month and has been up and
running since 2002. It’s currently hosted on Heroku.
Phil talks about being in the sweet spot in terms of engagement while not being
under high load, rewriting the platform with Django as a monolith, how Heroku
helps him get it all up and running without needing to bother with servers and
much more. The site is open source.
Topics Include
1:21 – Who is Samuel Pepys and why a weblog is a natural fit for this site
2:26 – John Carmack had daily write ups in the mid-1990s
3:35 – It gets about 150,000+ page views a month with 30,000+ users
4:39 – The site is more than just weblog entries, there’s 88k+ user comments
6:34 – It’s the sweet spot of engagement between popular but not crazy popular
7:05 – Motivation for using Django and Python after using Movable Type for 9 years
9:03 – Deadlines are a great way to ensure you abort the idea of perfect and release it
9:26 – Django was enjoyable to use, and Phil thought about using Rails and PHP too
11:17 – We live in a really nice time where we have so many good choices for web frameworks
12:23 – It’s a monolithic app with about 12,000 lines of Python across 200 files
12:53 – It’s split into a bunch of Django apps, here’s a few
13:45 – The idea of using apps to organize your code is a great idea
14:43 – This whole site is open source on GitHub, you can use it as a learning resource
16:08 – How new entries make their way onto the site (spoiler alert: it was laborious)
19:21 – The site uses server rendered Django templates with sprinkles of JavaScript
19:43 – Tiny bit of JS for things like maps (Leaflet) and charts (D3.js)
20:19 – Server rendered templates are simple and fast, it’s a great combo
21:21 – It runs on Heroku with PostgreSQL and a bit of caching with Redis
21:43 – The site runs on (1) $7 / month “Hobby” Dyno and it’s more than enough
23:43 – There’s full text search using Django’s built in PostgreSQL search features
26:12 – Django 3.0 powers the site as of today and Phil likes to keep it up to date
27:54 – If you postpone updating your dependencies for too long it can get painful
28:48 – What are you caching? Everything! At least for anonymous users
31:26 – The PostgreSQL database runs off the $9 / month Heroku add-on
32:54 – Have you ever thought about spinning up your own server?
35:17 – If you don’t like the idea of managing your own servers, Heroku can be decent
37:27 – Heroku handles issuing SSL certificates for you for free
38:13 – Sentry is used for error handling through the Heroku add-on
39:14 – Errors coming in are pretty rare
40:04 – Phil’s site holds its own in terms of SEO, even against Wikipedia
42:51 – Heroku handles backing up the database once a day, and Phil backs it up to S3 too
43:49 – He also uses S3 to store some of the static files, such as uploaded blog post images
45:15 – Django storage is used to handle uploading to S3
46:52 – Best tips? Start simple and grow it from there, writing any code is important
48:22 – Maybe using an app generator isn’t worth it, unless you make a lot of new apps
49:45 – You can find Phil on Twitter, he also has his own site at https://www.gyford.com/
Links
📄 References
https://www.gutenberg.org/
https://github.com/ESWAT/john-carmack-plan-archive/tree/master/by_day
https://en.wikipedia.org/wiki/Movable_Type
https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/search/
https://devcenter.heroku.com/articles/django-assets
⚙️ Tech Stack
django →
python →
aws →
bootstrap →
heroku →
open-source →
postgres →
redis →
s3 →
sentry →
🛠 Libraries Used
https://github.com/django/django-contrib-comments
https://d3js.org/
https://leafletjs.com/
https://django-storages.readthedocs.io/en/latest/

May 18, 2020 • 1h 21min
Mux Is an API Based Platform That Lets You Process and Stream Videos
In this episode of Running in Production, Dylan Jhaveri talks about building
an API driven video platform called Mux. It uses Phoenix, Elixir and Go to
handle billions of video views a month. It’s hosted on AWS and GCP with
Kubernetes and has been up and running since early 2016.
Dylan covers how video streaming works, processing billions of events a month,
taking advantage of Elixir and Phoenix’s features, providing a zero downtime
public API, continuously deploying their products, working with massive
databases, metered billing and tons more.
Topics Include
1:14 – How online streaming video works with HLS and where Mux fits into the picture
7:51 – Mux lets you post a video to their API and they give you an HLS playback URL
8:24 – Mux has been up and running since January 2016 and went through YCombinator
8:37 – Mux Data is another service they offer, it’s like New Relic but for video data
12:04 – They process billions of video views per month through Mux Data
12:36 – You could use Mux as a lower level alternative to Vimeo or Wistia
13:33 – Sometimes embedding iframes can be problematic and Mux can help in this area
14:35 – About 45 people work at Mux and half are involved with engineering
15:03 – Motivation for using Phoenix and Elixir, even when they were very new tools
16:52 – Their main public API is an out of the box Phoenix app
17:52 – They have a real-time dashboard that is powered by websockets and channels
20:28 – Some of Mux’s customers have millions of concurrent video views through that
20:42 – Will you switch to using Live View? Probably not since they are so API driven
21:51 – A dozen or so Go microservices and Kafka handle processing the videos
23:25 – Go is a great fit for super CPU intensive tasks such as video encoding
24:03 – The video processing infrastructure was very well thought out early on
24:50 – The public API is RESTful and there’s ~40-50 endpoints with a few private endpoints
26:14 – Cookie based auth is done in a browser but there’s tokens for API access
26:47 – The exq library is used for processing jobs asynchronously in Elixir land
27:22 – exq runs within a supervisor of your app, not a dedicated OS level service
28:21 – Prometheus is used for metrics but it’s not hooked into Elixir Telemetry (yet)
29:26 – Kubernetes and Docker drive their production infrastructure
29:47 – Buildkite is used for their CI / CD pipeline
32:08 – Deployments are very automated, a human only needs to merge to a specific branch
32:53 – The video processing microservices are in 1 mono repo, but there’s 2 other repos
33:33 – There’s PR approvals in place but all developers can merge to the production branch
34:39 – Code reviews are really important and you need to trust your developers
35:41 – The Elixir app has a PostgreSQL billing DB and also uses ClickHouse (SQL based)
37:53 – ClickHouse lets them store billions of rows and access everything quickly
40:58 – You do write SQL queries with ClickHouse but it doesn’t work with Ecto out of the box
41:44 – The Elixir API runs on AWS with an AWS load balancer sitting in front of it all
42:20 – The video infrastructure runs on Google Cloud
42:56 – How many servers do you run in total? Hard to tell really, but it’s a lot of compute
43:44 – Despite being on AWS, they are not using Amazon’s managed Kubernetes (EKS)
44:01 – All payments go through Stripe, including the metered billing which they hand rolled
45:06 – Instead of billing based on bandwidth, Mux bills by minutes watched
46:06 – SendGrid is used for transactional emails, Sentry for errors and Opsgenie for paging
46:48 – All sorts of CI / CD related information gets sent over to a Slack channel
47:08 – Developers are broken out into 4 cross functional teams
48:31 – There’s 2 flavors of SDKs that Mux has (REST API wrappers and video players)
50:21 – They currently have 22 different video players to account for across many platforms
50:36 – Efficiently creating so many different SDKs by having a core library for each language
54:20 – It’s sort of like having a core payment library and supporting Stripe, PayPal, etc.
54:41 – The SDK team needs to be aware of many different languages and players
55:16 – Another key metric to track is the video upscale and downscale percentages
56:47 – As of today Mux is focused on supplying service quality metrics
58:08 – There’s a lot of data stored but it all gets rolled over after 90 days
58:42 – The API is deployed all the time, but there’s zero down time deploys
59:45 – There’s been one day in the past where they had to put the API in read-only mode
1:00:19 – The data is backed up, but Dylan isn’t sure how often (but it happens, he swears!)
1:00:42 – Video thumbnails can be picked out from any timestamp, even animated GIFs too
1:02:21 – For now you need to supply your own closed captions to Mux
1:03:52 – Captions are downloaded, cached locally until processed and then backed up too
1:04:38 – Smoke tests and various alarms help detect issues in production (they use Flink)
1:06:25 – Uptime is important, Mux has high profile clients where downtime is not an option
1:06:52 – Rate limiting is done at the Elixir level for API calls with the ex_rated library
1:07:25 – It’s a reasonable idea to always assume users are out to get you
1:07:52 – For video rate limiting, it’s up to the CDN and they use a few different CDNs
1:09:33 – You could build a live streaming service like Twitch with Mux’s API
1:13:19 – The Elixir API doesn’t get billions of calls a month but it’s still a lot
1:16:37 – Best tips? Video is hard and it keeps getting more and more complicated
1:18:15 – Fortunately the video player SDK’s churn isn’t too high due to the HTML5 spec
1:19:14 – You can email Dylan or contact him on Twitter, and Mux is hiring too!
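The API rate limiting mentioned at 1:06:52 can be sketched in general terms. Mux does this in Elixir with the ex_rated library; the Python below is not that library's API and not Mux's implementation. It is a generic fixed-window counter, and the class name, parameters and the "api-token-abc" key are hypothetical. The clock is injectable so the behaviour can be checked without waiting for real time to pass.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per key in each window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._buckets = {}  # key -> (window_index, count)

    def allow(self, key: str) -> bool:
        """Return True and count the call if the key is under its limit."""
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self._buckets.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started, so reset the counter
        if count >= self.limit:
            self._buckets[key] = (window, count)
            return False
        self._buckets[key] = (window, count + 1)
        return True


# A fake clock makes the example deterministic.
now = [0.0]
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("api-token-abc") for _ in range(3)]
now[0] = 61.0  # move into the next window
after_reset = limiter.allow("api-token-abc")
```

Here the third call inside the first window is rejected and the call in the next window is allowed again. A fixed window is the simplest variant; sliding-window or token-bucket limiters smooth out bursts at window boundaries.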
Links
📄 References
https://en.wikipedia.org/wiki/HTTP_Live_Streaming
https://howvideo.works/
https://www.ycombinator.com/about/
https://bugzilla.mozilla.org/show_bug.cgi?id=356558
https://golang.org/
https://en.wikipedia.org/wiki/Column-oriented_DBMS
https://mux.com/blog/from-russia-with-love-how-clickhouse-saved-our-data/
https://en.wikipedia.org/wiki/WebVTT
https://flink.apache.org/usecases.html
https://www.fastly.com/
https://en.wikipedia.org/wiki/Real-Time_Messaging_Protocol
https://obsproject.com/
⚙️ Tech Stack
phoenix →
elixir →
golang →
aws →
buildkite →
clickhouse →
docker →
fastly →
gcp →
kafka →
kubernetes →
opsgenie →
postgres →
prometheus →
sendgrid →
sentry →
slack →
stackpath →
stripe →
🛠 Libraries Used
https://github.com/akira/exq
https://github.com/elixir-mint/mint
https://github.com/grempe/ex_rated

May 11, 2020 • 43min
TradeRev Is a Machine Learning Vehicle Appraisal / Auctioning System
In this episode of Running in Production, Amit Jain goes over building an auctioning
system that uses machine / deep learning and is powered by Flask and Python.
It’s all hosted on AWS and has been up and running since mid 2011.
Amit goes over a few machine learning libraries, refactoring a 100k+ line
monolith into microservices without any automated tests, the importance of
machine learning accuracy, using a bunch of AWS services to deploy a large
site, treating your infrastructure as code and more.
Topics Include
3:58 – Amit led a team of ~10 R&D engineers responsible for Data Science / ML
4:33 – Roughly 1,000 cars a day are being traded with 8-10k auctions / bids per day
5:15 – Motivation for using Flask and Python
6:55 – Scikit-Learn and TensorFlow for machine / deep learning
7:39 – Did things start off with multiple microservices or was it a monolith early on?
9:41 – There’s about 80,000 to 120,000 lines of code across 200-300+ Python files
10:14 – The huge refactor to microservices was done without automated tests initially
11:11 – After the refactor now there’s 86% test coverage which is enough to be confident
12:24 – Flask-Restplus is the main library used to build their RESTful APIs
12:43 – Other notable libraries were gunicorn and boto3 (AWS SDK for Python)
13:05 – Locust is an open source load / performance testing tool
13:40 – With machine learning, speed is important but accuracy is even more important
15:30 – gunicorn is very compact, performant and easy to configure
16:28 – Most caches were in memory and they used Amazon DynamoDB
17:09 – The primary database is MySQL running on Amazon RDS
18:04 – SQLAlchemy is used on the Python side as an ORM
19:29 – Docker is sort of being used in development
21:02 – The platform runs on AWS with Lambda, API Gateway and AWS Fargate with ECS
22:24 – What is AWS Fargate and what does it allow you to do?
23:48 – Scaling with Fargate while using auto scaling policies and configuration
26:28 – Taking advantage of the cloud and setting up load balancing with configuration
28:04 – How do you deal with secrets when using Fargate / ECS?
30:02 – What about logging and metrics? Are you exclusively using all of AWS’ services?
31:12 – What about error reporting, such as getting notified if an error happens
31:34 – The deploy process from development to production (includes CI / CD with Jenkins)
33:26 – A walkthrough of how the different AWS services come together
36:54 – Terraform is being used to manage the infrastructure as code (valuable tool)
40:04 – Database backups were performed by the DevOps team
40:41 – Best tips? Start slow and expect failures, also don’t chase perfection
42:14 – You can find Amit on Twitter at @ml_amit and on LinkedIn
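The topics above mention that most caches were kept in memory, and the references link to the least recently used (LRU) replacement policy. Here is a minimal sketch of that idea using Python's standard library. The `appraise` function, its fake price and the cache size of 2 are all made up for illustration. This is not TradeRev's code, just one way an in-memory LRU cache could sit in front of a slow call.

```python
from functools import lru_cache

# Hypothetical call counter so we can see when the "slow" function really runs.
CALLS = {"count": 0}


@lru_cache(maxsize=2)  # keep only the 2 most recently used results in memory
def appraise(vehicle_id: str) -> float:
    """Stand-in for an expensive appraisal; the price formula is made up."""
    CALLS["count"] += 1
    return 1000.0 + len(vehicle_id)


if __name__ == "__main__":
    appraise("car-a")  # miss, computed
    appraise("car-b")  # miss, computed
    appraise("car-a")  # hit, "car-a" becomes the most recently used entry
    appraise("car-c")  # miss, evicts "car-b" (the least recently used entry)
    appraise("car-b")  # miss again, recomputed
    print(appraise.cache_info())
```

A small `maxsize` makes the eviction easy to see. A real service would size the cache based on memory limits and how often the same inputs repeat.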
Links
📄 References
https://en.wikipedia.org/wiki/Machine_learning
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Natural_language_processing
https://en.wikipedia.org/wiki/Convolutional_neural_network (CNN)
https://en.wikipedia.org/wiki/Smoke_testing_(software)
https://locust.io/
https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)
⚙️ Tech Stack
flask →
python →
aws →
cloudwatch →
docker →
dynamodb →
ecs →
fargate →
jenkins →
lambda →
mysql →
pagerduty →
rds →
stripe →
terraform →
🛠 Libraries Used
https://scikit-learn.org/stable/
https://www.tensorflow.org/
https://github.com/noirbizarre/flask-restplus
https://gunicorn.org/
https://github.com/boto/boto3
https://github.com/sqlalchemy/sqlalchemy
Support the Show
This episode does not have a sponsor and this podcast is a labor of love. If
you want to support the show, the best way to do it is to purchase one of my
courses or suggest one to a friend.
Dive into Docker is a video course that takes you from not knowing what Docker is
to being able to confidently use Docker and Docker Compose for your own apps.
Long gone are the days of "but it works on my machine!". A bunch of follow
along labs are included.
Build a SAAS App with Flask is a video course where we build a real
world SAAS app that accepts payments, has a custom admin, includes high test
coverage and goes over how to implement and apply 50+ common web app features.
There's 20+ hours of video.

May 4, 2020 • 45min
Cover Tuner Uses NLP to Help Improve Your Cover Letters
In this episode of Running in Production, Saad Malik talks about building
a free cover letter analysis tool with Flask and Python. It uses NLP (Natural
language processing) and has been up and running on Google App Engine since
April 2020.
Saad goes over various Python NLP libraries, processing 400+ cover letters in
his first month after shipping an MVP, using MongoDB as a primary database,
keeping his front-end simple with a bit of jQuery, what it’s like to deploy a
Python app using Google App Engine and more.
Topics Include
2:54 – You can upload your cover letter and get back an analysis without an account
3:50 – Motivation for using Flask and Python
4:48 – Writing the “business logic” in a standalone script before adding a web layer
5:48 – What is NLP (Natural language processing) and what Python libraries exist for it
7:01 – Using an NLP library vs using a full text search database
8:14 – About 1,000 users a month go to the site and 50% of them upload a cover letter
9:25 – Lots of users re-upload new copies of their cover letter after making changes to it
10:06 – Server side rendered templates with Jinja plus a touch of jQuery here and there
10:53 – After submitting a cover letter, an ajax response fills in the info after ~5 seconds
11:31 – Gunicorn is used as the app server for Flask
11:46 – Why Saad chose to use Google App Engine over using Google Compute Engine
12:43 – Motivation for using Google App Engine over Heroku and other PaaS alternatives
14:09 – It’s mostly a monolithic application but with a separate script that runs locally
14:59 – The local script helps validate cover letters
16:47 – MongoDB Atlas is used to host MongoDB along with Google Cloud Storage
17:51 – Why did you choose MongoDB over PostgreSQL or another SQL database?
18:50 – MongoDB Compass is a way for you to visually explore your data
19:29 – Docker isn’t being used in development but App Engine uses it in production
20:20 – nginx isn’t needed because App Engine handles all of that for you
20:57 – App Engine is nice but it does come at a price (it’s quite a bit more expensive)
22:13 – App Engine costs won’t necessarily scale linearly with your traffic
23:46 – A run down on all of the Google Cloud services Saad is using and how they connect
25:14 – Are MongoDB databases really schemaless?
26:05 – PyMongo is used to connect the Python app to MongoDB
27:13 – It only took 4-5 days to turn the standalone script into an MVP Flask app
29:13 – The Python NLP libraries are the only noteworthy libs needed to make this app work
29:44 – There’s no user authentication needed because no user accounts are necessary
30:14 – WuFoo is used to accept form submissions using their free tier
30:35 – WTForms is also used to process the cover letter form submissions
31:17 – Google Search Console helped make the site more mobile friendly
32:09 – The site isn’t using Bootstrap, it’s just plain old hand rolled CSS and JavaScript
32:53 – Both App Engine and MongoDB Atlas provide notifications for various events
33:26 – Walking through deploying code from development to production on App Engine
34:38 – Saad has tests set up with Pytest
35:09 – What exactly is that YAML file with App Engine?
35:48 – Dealing with secret keys
37:05 – Both MongoDB Atlas and Google App Engine have tools for disaster recovery
38:17 – Alerts can be set up to measure resource consumption, including cost limits
39:22 – App Engine’s price is high, Saad would probably use Google Compute Engine instead
40:47 – Best tips? Be mindful of the SAAS tools you use and how they interact with your app
42:11 – If you crank out code ASAP to ship an MVP, don’t forget to go back and refactor
43:10 – Using the Spyder IDE to help develop certain features faster and easier
44:28 – If you want to contact Saad you can find him on LinkedIn
Links
📄 References
https://en.wikipedia.org/wiki/Cover_letter
https://en.wikipedia.org/wiki/Natural_language_processing
https://www.mongodb.com/products/compass
https://www.wufoo.com/
https://cloud.google.com/appengine/docs/standard/python/config/appref
https://www.spyder-ide.org/
⚙️ Tech Stack
flask →
python →
app-engine →
gcp →
jquery →
mongodb →
🛠 Libraries Used
https://www.nltk.org/
https://spacy.io/
https://jquery.com/
https://gunicorn.org/
https://pymongo.readthedocs.io/en/stable/
https://github.com/wtforms/wtforms
https://github.com/pytest-dev/pytest

Apr 27, 2020 • 1h 19min
Easily Find, Reproduce and Track Your JavaScript Errors with TrackJS
In this episode of Running in Production, Todd Gardner goes over how he
built TrackJS. It’s written in .NET and pulls
together a number of different technologies to get the job done. It’s all
hosted on OVH using dedicated hardware and has been running in production since
2013.
Todd talks about how to track JavaScript errors in production, creating a
data pipeline to ingest thousands of errors a minute in ~80 milliseconds, the
benefits of pjax, how dedicated hardware ended up being half the price of
cloud servers and using Ansible to configure all of the servers.
Topics Include
2:06 – Working nights and weekends until it made enough to replace consulting work
3:05 – Being your own boss changes how you think about writing software
4:23 – Thousands of developers a day use TrackJS resulting in thousands of errors per minute
5:45 – Installing TrackJS is painless, just drop the JS snippet into your site’s HTML
6:38 – Debugging client side JavaScript can be difficult for a number of reasons
7:43 – Motivation for using .NET / C#
9:05 – Why it’s a good idea to avoid shiny new tech when you’re building a new product
12:14 – Why TrackJS is mostly a monolithic application instead of microservices
13:23 – But there are bits that are broken into their own service when it makes sense
13:40 – Creating a pipeline to efficiently capture and process a ton of incoming data
15:21 – Leveraging nginx to quickly create logs for requests that are processed later
16:23 – For data that is more time sensitive they wrote a .NET service that uses Redis
18:45 – If TrackJS gets slammed, it will never affect page load speeds for their customers
19:40 – nginx was configured to write out JSON formatted logs
21:46 – The processor service ingests those log files and figures out what to do next
22:38 – Then there’s the web front-end service that developers use to browse their errors
23:30 – Elasticsearch is used to store the errors to create very fine grained reports and filtering
24:15 – A quick recap of the technologies used so far
24:39 – ASP.NET is similar to Rails, it’s server rendered templates but they use React too
25:52 – pjax is used to make the app feel very fast even with server rendered templates
27:10 – pjax / Turbolinks is one of the best bangs for your buck to make your site feel fast
28:23 – Making the most of your tech stack with a small team of developers
31:12 – Elasticsearch needs a bit of tuning if you’re using it as your primary database
32:04 – Writing their own .NET class to interface with the Redis backed queue
33:31 – IIS (Microsoft’s web server) serves the app without nginx sitting in front of it
34:17 – Load balancing is done over DNS with a round-robin strategy across 3 servers
36:13 – All 3 web servers get restarted at once during updates because IIS is great like that
37:31 – Everything is hosted on dedicated hardware with OVHCloud after moving off Azure
39:58 – Poor support and opaque downtime resolutions are why they moved off the cloud
41:26 – After thinking about it, using Ansible to set up machines seemed like a good idea
42:01 – They landed on using OVH after doing a bunch of research
42:48 – $180 / month for a high end Xeon 8 CPU core server w/ 64 GB of RAM + 2 TB of SSDs
43:28 – It was more work to set up but it’s A LOT faster and costs were cut in half
44:40 – When something goes wrong, it’s obvious what went wrong and when it will be fixed
45:47 – Even while running at 10% capacity, they do capacity planning every quarter
46:44 – $180 / month is an average figure, they have smaller servers doing different things
47:19 – They run about 12x Elasticsearch servers that are pretty beefy ($240 / month)
47:49 – Overall they have about 30 servers that they have to manage
48:31 – Some servers run Ubuntu LTS, and the web servers run Windows Server 2016
49:10 – Managing Windows servers is kind of a pain in the butt
51:04 – Ansible is used to configure both the Windows and Linux servers
53:32 – It takes about 48 hours to get new hardware from OVH, but that’s not a problem
54:11 – Using TeamCity to help get code from development to production
55:50 – The test environment gets real production data synced every hour
56:32 – Their “dev” environment is really a test environment
58:20 – It gets pushed to production manually through a TeamCity job by choice
59:28 – But every time they git push code, a new test environment is set up automatically
59:43 – They use their own service to help monitor JavaScript errors and it helps
1:00:29 – They built their own back-end monitoring tools too due to lack of choices
1:00:55 – Todd has opinions on back-end monitoring in general
1:01:47 – Real exceptions get sent to their primary Slack chat channel
1:03:30 – Payments are handled using Stripe but it doesn’t use SCA
1:05:16 – Monitis is used to monitor their infrastructure load and website up-time
1:07:39 – They would still use rented hardware but maybe use .NET Core today
1:09:10 – Depending on well tested and mature tools allows you to use them years later
1:10:38 – Best tips? Don’t build something in new tech just to use new tech
1:12:36 – When it comes to billing code, try to deal with it early on (it’s tricky)
1:15:26 – It’s hard to test webhooks and other external interactions in an automated way
1:17:23 – You can find Todd on Twitter @toddhgardner and check out his new monitoring service at https://requestmetrics.com
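The pipeline described in the topics above has nginx writing JSON formatted logs that a separate processor service ingests later. Here is a rough sketch of what the "ingest" half could look like. TrackJS's processor is written in .NET, so this Python version is only an illustration of the pattern. The `customer` and `message` field names and the sample lines are assumptions, not their real log format.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed JSON log line, skipping blanks and garbage."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real processor would likely count or set aside bad lines
        if isinstance(record, dict):
            yield record


def errors_per_customer(records: Iterable[dict]) -> Counter:
    """Count records per customer; 'customer' is a hypothetical field name."""
    return Counter(r["customer"] for r in records if "customer" in r)


# Made-up sample of one-JSON-object-per-line log output.
SAMPLE = [
    '{"customer": "acme", "message": "x is undefined"}',
    "not json at all",
    "",
    '{"customer": "acme", "message": "y is not a function"}',
    '{"customer": "globex", "message": "Script error."}',
]

if __name__ == "__main__":
    print(errors_per_customer(parse_log_lines(SAMPLE)))
```

The appeal of this split is that the web server only has to append a line to a file at request time, while the heavier work happens afterwards, off the request path.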
Links
📄 References
https://docs.microsoft.com/en-us/dotnet/core/
https://dotnet.microsoft.com/apps/aspnet
https://github.com/turbolinks/turbolinks
https://en.wikipedia.org/wiki/Round-robin_DNS
https://en.wikipedia.org/wiki/Windows_Server_2016
https://www.jetbrains.com/teamcity/
https://www.monitis.com/
⚙️ Tech Stack
dotnet →
c-sharp →
react →
jekyll →
ansible →
elasticsearch →
iis →
monitis →
nginx →
ovh →
redis →
slack →
stripe →
teamcity →
ubuntu →
windows →
🛠 Libraries Used
https://github.com/defunkt/jquery-pjax