

Running in Production
Nick Janetakis - Full stack developer
Hear about how folks are running their web apps in production. We'll cover tech choices, why they chose them, lessons learned and more.

Feb 10, 2020 • 56min
Smart Music Helps Musicians Practice More Efficiently
In this episode of Running in Production, Julien Blanchard goes over building
Smart Music which uses a combination of Rails,
Phoenix and .NET Core. It has roughly half a million users and it’s all hosted
on AWS with EKS, ECS and Elastic Beanstalk. It’s been up and running since
2016.
There are around 20 developers working on the project. We talked about managing
git repos with a few apps, TDD, using GraphQL with Phoenix, contexts, multiple
databases with Rails, InfluxDB, GitHub Actions and tons more.
Topics Include
2:41 – Roughly half a million users are on the platform (~1.5k requests a minute at times)
3:27 – What Rails, Phoenix and .NET Core are being used for
5:38 – End users of the site interact with the Rails app and .NET Core is for authentication
6:10 – It’s an API back-end driven app and React / EmberJS is used on the front-end
6:35 – Motivation for using Phoenix and Elixir for the data ingesting app
9:28 – About 20 developers work on all of the different parts of the site
9:55 – Organizing the git repos for each of the apps
10:34 – The back-end code has many tens of thousands of lines of code
11:04 – TDD is something their company practices and they like it a lot
12:24 – A JS front-end makes sense for this app since the UI is live and dynamic
13:17 – Trying to visualize a live sheet music application that helps you learn notes
14:02 – Maybe Phoenix LiveView could have been used, but they prefer what they chose
14:33 – The TL;DR on GraphQL and why in this case it works better than a RESTful API
17:55 – Docker isn’t being used in dev, but Kubernetes is being used in production
18:29 – PostgreSQL, InfluxDB and Redis are used to manage the data and for caching
19:32 – They knew from the start that InfluxDB would be needed to store the time data
20:33 – Redis is being used as a cache through AWS ElastiCache
21:49 – nginx is sitting in front of the Rails application with Elastic Beanstalk
22:44 – Motivation for picking AWS over Google Cloud and other providers
23:40 – AWS Aurora is being used to manage PostgreSQL
24:51 – They are using the Rails 6.x feature to select multiple databases
25:33 – Rails is very nice when it comes to getting community driven features merged in
26:08 – Julien also really likes Phoenix and here’s how they use contexts
28:50 – File uploads are sent directly to S3 using the ex_aws Elixir library
30:02 – For Kubernetes, they are using the managed EKS service from AWS
31:07 – Two pretty beefy boxes with 8 GB of memory power the EKS cluster (overkill)
31:36 – They are still feeling out the resource usage of their services
32:18 – About 20 EC2 instances power the Elastic Beanstalk setup for the Rails app
32:54 – CloudFront is being used as a CDN for book covers but not much else
33:38 – Walking us through a code deploy from development to production
34:46 – Getting rid of Jenkins is the next step but GitHub Actions is a bit insecure currently
35:49 – GitHub Actions is a great tool and it’s being used for more than just CI
36:44 – You can use GitHub Actions to run tasks periodically (separate from git pushes)
37:27 – Dealing with big database migrations with scheduled down time
38:54 – New Relic and the ELK stack take care of metrics and logging
39:18 – Sentry.io (self-hosted version) is being used to track exceptions
39:42 – The time series data doesn’t end up getting logged by these tools
40:20 – Braintree is used as a payment gateway to handle credit card and PayPal payments
41:44 – Transactional emails are being sent through AWS SES
42:24 – In-app notifications are coming soon to SmartMusic (websockets, etc.)
44:05 – Another use case for websockets and events will be for collaboration features
44:49 – The databases are backed up daily and S3 is very redundant in itself for user files
45:31 – VictorOps handles alerting if a service happens to go down
45:58 – New Relic is being used in a few of the applications
46:55 – Handling bot related issues with nginx’s allow / deny IP address feature
48:46 – Best tips? Make a solid proof of concept in your new tech before switching to it fully
50:36 – Biggest mistake? Trying to use your old coding habits in a different language
51:27 – Dealing with N + 1 queries with GraphQL using DataLoader
52:58 – Ecto Multi is awesome for ensuring multiple things happen successfully
54:10 – Check out Julien’s blog, @julienXX on GitHub and he’s on Source Hut
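The DataLoader fix mentioned at 51:27 is worth a concrete sketch: instead of issuing one query per record as GraphQL resolvers walk the tree, you collect keys first and then resolve them all with a single batch query. Here is a minimal, hypothetical Python illustration of that batching idea (not the actual DataLoader API — the class and function names are made up):

```python
# Sketch of the batching idea behind DataLoader: queue up keys during
# resolution, then fetch them all with one batch call instead of N queries.

class BatchLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # fn(list_of_keys) -> dict of key -> value
        self.queue = []
        self.cache = {}

    def load(self, key):
        """Register a key to fetch later; returns a thunk for the value."""
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)
        return lambda: self.cache[key]

    def dispatch(self):
        """Resolve every queued key with a single batch call."""
        if self.queue:
            self.cache.update(self.batch_fn(self.queue))
            self.queue = []

# Example: one "query" for all user IDs instead of one query per ID.
calls = []
def fetch_users(ids):
    calls.append(list(ids))  # record how many batch queries actually ran
    return {i: {"id": i, "name": f"user-{i}"} for i in ids}

loader = BatchLoader(fetch_users)
thunks = [loader.load(i) for i in (1, 2, 3)]
loader.dispatch()
users = [t() for t in thunks]
```

Three `load` calls still produce a single call to `fetch_users`, which is exactly how the N+1 problem disappears.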
Links
📄 References
https://en.wikipedia.org/wiki/Practice_(learning_method)#Deliberate_practice
https://www.w3.org/2017/12/musicxml31/
https://emberjs.com/
https://guides.rubyonrails.org/active_record_multiple_databases.html
https://github.com/dependabot
https://www.elastic.co/what-is/elk-stack
http://nginx.org/en/docs/http/ngx_http_access_module.html
https://martinfowler.com/bliki/BlueGreenDeployment.html
https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping
https://sourcehut.org/
⚙️ Tech Stack
rails, ruby, graphql, phoenix, elixir, aurora, aws, braintree, dotnet-core, eks, github-actions, influxdb, jenkins, kubernetes, new-relic, nginx, postgres, react, redis, s3, sentry, ses, victorops
🛠 Libraries Used
https://github.com/ex-aws/ex_aws
https://github.com/absinthe-graphql/absinthe
https://github.com/graphql/dataloader
Support the Show
This episode does not have a sponsor and this podcast is a labor of love. If
you want to support the show, the best way to do it is to purchase one of my
courses or suggest one to a friend.
Dive into Docker is a video course that takes you from not knowing what Docker is
to being able to confidently use Docker and Docker Compose for your own apps.
Long gone are the days of "but it works on my machine!". A bunch of follow
along labs are included.
Build a SAAS App with Flask is a video course where we build a real
world SAAS app that accepts payments, has a custom admin, includes high test
coverage and goes over how to implement and apply 50+ common web app features.
There are over 20 hours of video.

Feb 3, 2020 • 47min
VA.gov Provides an API to Get Information about Veterans
In this episode of Running in Production, Charley Stran goes over building
the developer.va.gov API with Ruby on Rails and
React. It’s running on 10+ auto scaling EC2 instances on AWS GovCloud and has
been since mid-2018.
There are 140,000+ lines of code and ~20 developers. We covered what it’s
like working on government contracts, how AWS GovCloud is different from the
regular AWS platform, the code base being open source, code reviews and a whole
lot more.
Topics Include
2:17 – 20 developers (~50 people total) run just the developer.va.gov site
3:10 – The platform has been up and running for 18+ months
4:28 – Motivation for using Ruby on Rails
5:55 – The application is running Rails 5.2, but they want to upgrade to 6.x
6:14 – It’s currently a single Rails monolith but it may get broken up at some point
8:13 – What’s it like working on a government contract?
9:13 – The app is roughly 140,000+ lines of code which is API driven and uses React
10:25 – The entire application is open source on GitHub (to my surprise)
11:32 – What makes React a good fit for this application? Complicated forms mostly
13:56 – The VA has their own UI design specifications publicly posted
15:09 – Tailwind CSS isn’t being used but Charley likes it
16:07 – Docker is being used in production and it runs on AWS GovCloud
17:59 – PostgreSQL and Redis are used but there’s not a ton of data in the DB
18:45 – How AWS GovCloud is different from the regular AWS platform
20:32 – It’s all on EC2 instances that are managed by Terraform and Ansible
21:15 – They use Auto Scaling Groups, CloudWatch, SNS, Elasticsearch and more
22:45 – Sentry.io is being used for error reporting
23:03 – Getting external services approved for usage on AWS GovCloud
23:56 – On average 10-15 t3.large instances power the web servers, but it fluctuates a lot
25:41 – The EC2 instances are running the Amazon Linux 2 AMI
26:35 – Each deploy takes about 20 minutes to run from start to finish
27:28 – Charley walks us through deploying from development to production
29:24 – So far he hasn’t had to get woken up at 3am (except from his 2 year old)
30:07 – Jenkins controls their CI pipeline, which is kicked off from git pushing code
30:54 – With multiple instances and an ELB, there are zero downtime deploys
31:16 – Database migrations can sometimes get complicated
32:14 – They aim for 90%+ test coverage
33:10 – Between 2 and 5 developers typically review code before it gets merged
33:52 – Their team works remotely and waiting for builds can get interesting
35:08 – Rubocop analyzes the code base along with Code Climate
35:50 – A “development” environment exists on AWS but developers run the code locally
36:45 – VCR is used to help cache remote API calls to other VA systems
38:27 – Each API has its own version
39:47 – Attempting to get rid of the need for fax machines
40:41 – All of the data is backed up and recovery would be quick if something went wrong
42:18 – How is Terraform being used?
43:03 – Best tips? With undocumented APIs, write tests and pry into the details
44:10 – Biggest mistakes that were corrected? The mocking layer
45:17 – Every developer is accountable for their work and will help to resolve issues
46:27 – Charley’s consulting company Oddball is hiring and you can also find him on Twitter
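The VCR approach mentioned at 36:45 records a real API response once and replays it afterwards, which keeps tests fast and off the network. A rough Python sketch of that record/replay idea follows — the "cassette" file path and URL are made up for illustration, and the real VCR library does far more:

```python
# Record/replay sketch: the first call hits the real API and writes the
# response to a "cassette" file; later calls replay the saved response.
import json
import os
import tempfile

def with_cassette(path, fetch_fn):
    """Wrap fetch_fn so each URL's response is recorded once, then replayed."""
    def wrapped(url):
        cassette = {}
        if os.path.exists(path):
            with open(path) as f:
                cassette = json.load(f)
        if url not in cassette:
            cassette[url] = fetch_fn(url)  # the real request happens only once
            with open(path, "w") as f:
                json.dump(cassette, f)
        return cassette[url]
    return wrapped

# Stand-in for a slow remote VA system (hypothetical URL).
hits = []
def fake_api(url):
    hits.append(url)
    return {"status": "ok", "url": url}

path = os.path.join(tempfile.mkdtemp(), "cassette.json")
cached = with_cassette(path, fake_api)
first = cached("https://api.example.gov/veterans/1")
second = cached("https://api.example.gov/veterans/1")
```

The second call never reaches `fake_api` — it is served entirely from the cassette file.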
Links
📄 References
https://oddball.io/
https://twitter.com/dhh
https://en.wikipedia.org/wiki/Agile_software_development
https://en.wikipedia.org/wiki/Freedom_of_Information_Act_(United_States)
https://en.wikipedia.org/wiki/Edward_Snowden
https://stimulusjs.org/
https://design.va.gov/
https://tailwindcss.com/
https://aws.amazon.com/govcloud-us/
https://codeclimate.com/
https://youtu.be/NV3sBlRgzTI?t=35 (First Principles explained by Elon Musk)
⚙️ Tech Stack
rails, ruby, node, react, ansible, aws, cloudwatch, elasticsearch, jenkins, open-source, postgres, redis, s3, sentry, sns, terraform
🛠 Libraries Used
https://github.com/rubocop-hq/rubocop
https://github.com/vcr/vcr

Jan 27, 2020 • 56min
Kernl.us Helps WordPress Plugin and Theme Developers Manage Updates
In this episode of Running in Production, Jack Slingerland goes over building
his platform with Express / Node. It handles 2.5+
million requests a day and hosting costs about $65 / month on DigitalOcean for
2 web servers and a few other things. It’s been up and running since early
2015.
Jack wrote 100,000+ lines of code on his own in his spare time. We talked about
building monoliths, switching from Apache to nginx during a 10 hour car ride,
keeping your deployments as simple as possible (even with zero down time) and a
whole lot more.
Topics Include
1:22 – From 2,000 requests a day to 2.5 million requests a day in a few years
2:01 – WordPress is still really popular
2:39 – Motivation for using Express and Node
5:30 – TJ Holowaychuk created Express and he was a JavaScript legend
6:06 – Express is still actively developed by the community
6:26 – The back-end is using ES6 JavaScript
7:46 – There’s 100,000+ lines of code and Jack wrote it all
8:05 – What does Kernl allow WordPress developers to do?
10:27 – The 100k lines of code include both the back-end and front-end
12:08 – The code is split up across a few git repos
12:42 – Breaking a few things out into services came naturally, it wasn’t forced
14:09 – A new WordPress site health monitor service will be coming out soon
15:50 – Part of the reason for choosing Angular with an API back-end was to learn new things
16:29 – MongoDB, PostgreSQL, Redis and Node
17:13 – Some of the Node services are using TypeScript
17:37 – Is it worth it to refactor the other services to use TypeScript? Probably not
18:38 – This whole app is a long running side project that’s worked on after hours
19:25 – Trello plays a huge role in helping Jack organize what to do
20:21 – Jack’s super power is being able to context switch really quickly
21:48 – DigitalOcean is being used to host the site and Stripe handles payments
22:17 – Pusher is used to update a counter on the home page with websockets
23:12 – SendGrid is used to send out emails
23:42 – Stripe isn’t configured for SCA, he’s still on an API version from 2016 (but it works)
24:58 – What does log management look like with 75 million requests a month?
26:20 – DigitalOcean’s managed load balancer replaces what nginx used to do
27:20 – Docker isn’t being used in development or production
28:21 – Jack’s been running his own Linux servers since 2002
29:06 – Ubuntu 18.04 (LTS) is being used on all of Kernl’s servers
29:37 – What will upgrading to Ubuntu 20.04 (LTS) look like for you?
30:26 – Two $5 / month DigitalOcean servers power the entire web application
31:41 – Node serves static files directly, but there’s very few requests for static assets
33:16 – 1 static asset is served from S3 because it needs to handle massive traffic spikes
33:57 – What about DigitalOcean Spaces? It’s just not stable enough (and I agree)
36:22 – How code gets from Jack’s dev box into production
37:32 – Deployments are done on multiple servers at once in parallel with no down time
38:04 – How zero down time deploys are handled without a complicated set up
39:35 – His main competitor had hours of down time so that had to be avoided
40:20 – Secrets get transferred straight from dev to the server over SSH / SCP
41:13 – Have you thought about what would need to change if 2 devs worked on the project?
42:15 – The song and dance of making “Fix CI” commits until it’s actually fixed
42:29 – All customer data is backed up daily and things can be recreated quickly
43:26 – Configuration management takes time to learn which is why it’s done by hand
45:00 – Pingdom will send out alerts if the site goes down, it gets checked every minute
45:31 – Switching from Apache to nginx in the middle of a 10 hour car ride with his wife
47:39 – Experiencing problems like that really helps you learn what not to do
48:29 – DigitalOcean alerts are also set up for additional system resource alerts
49:05 – Block Storage is also being used (it’s an extra drive you can connect to a server)
50:24 – Best tips? If you’re not comfortable with a technology, don’t self host it
51:44 – Jack pays $65-100 a month for DigitalOcean hosting which includes everything
53:37 – Biggest mistake? Probably using MongoDB because his schema is very relational
54:52 – Check out Kernl and you can also find Jack on Twitter @jackslingerland
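The zero-downtime deploys discussed at 37:32 and 38:04 work because there is more than one server behind the load balancer, so servers can be updated while others keep serving traffic. Here is a hedged Python sketch of the general shape — Jack's actual scripts update servers in parallel over SSH; this sequential version with hypothetical server names just shows the health-check gating that keeps a bad build from taking everything down:

```python
# Rolling-deploy sketch: update servers one at a time and only continue
# once a health check passes, so something is always serving traffic.

def rolling_deploy(servers, update, health_check):
    """Update each server in turn; halt if one comes back unhealthy."""
    deployed = []
    for server in servers:
        update(server)                # e.g. pull new code + restart the app
        if not health_check(server):  # e.g. GET /healthz must return 200
            raise RuntimeError(f"{server} unhealthy, halting deploy")
        deployed.append(server)
    return deployed

# Simulated run with two hypothetical web servers.
log = []
result = rolling_deploy(
    ["web1", "web2"],
    update=lambda s: log.append(f"updated {s}"),
    health_check=lambda s: True,
)
```

If `web1` failed its health check, `web2` would never be touched and would keep handling all requests — which is the whole point of the approach.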
Links
📄 References
https://wordpress.com/
http://boringtechnology.club/
https://twitter.com/tjholowaychuk
http://es6-features.org/
https://trello.com/en-US
https://pm2.keymetrics.io/
⚙️ Tech Stack
express, node, angularjs, bitbucket-pipelines, datadog, digitalocean, mongodb, pingdom, postgres, redis, s3, sendgrid, sentry, stripe, ubuntu, websockets
🛠 Libraries Used
https://www.npmjs.com/package/pg

Jan 20, 2020 • 55min
ScrollKeeper Is a Collaboration Tool for Researchers
In this episode of Running in Production, Ian Butler goes over building a
collaboration tool for researchers called Scrollkeeper. Ian is all-in with AWS and hosting costs about $400 / month for a
multi-node AWS ECS cluster. It’s been up and running since mid 2019.
His app is ~8,000 lines of Elixir code and there are a few things being done
through AWS Lambda. We talked about ways to run various AWS services locally,
auto-scaling, background workers, developing a custom caching solution and a
whole lot more.
Topics Include
1:19 – Working on it during nights and weekends as a part time project
2:03 – Motivation for using Phoenix and Elixir
3:09 – Ian is confident that if it came down to it, he could hire Elixir developers later
3:43 – Generally speaking it’s a monolithic app but it has a few tiny services broken out
4:15 – At the moment there’s not much benefit in breaking it out into umbrella apps
4:50 – The main app is about 8,000 lines of Elixir and it’s using Phoenix contexts
5:21 – Phoenix has some opinions but you still need to make a lot of decisions
5:48 – A couple of context names that are used in the main app
6:34 – Parsing PDF files and doing some work in the background
7:40 – Phoenix channels are used to ferry back the parser’s status to the user
7:54 – Live View isn’t currently being used
8:08 – It’s a split app style with a Phoenix API and a React front-end
8:24 – Live View currently doesn’t have enough features to replace a React app
10:20 – The PDF parsing rabbit hole goes quite deep
10:51 – AWS ECS, AWS Lambda and a service called Spotinst helps run and scale the app
12:16 – Docker isn’t being used in development but tests are run in Docker
12:52 – GitLab CI handles testing, building images and pushing them to ECR
13:54 – Why exactly isn’t Docker being used in development?
14:41 – Testing various AWS services locally using LocalStack
16:21 – LocalStack is open source and free but they also have a paid tier if you want it
17:32 – PostgreSQL is the primary database and S3 is used for storing flat files
17:54 – The ex_aws library is used to connect to AWS using Elixir
18:43 – Caching is being done directly in Elixir with a custom GenServer approach
20:27 – What exactly is being cached? Mainly the PDF documents
21:11 – nginx isn’t being used because AWS’ API Gateway and load balancers fill that role
21:26 – Static files are being hosted with AWS CloudFront (CDN)
21:53 – Going all-in with AWS and being very happy with it for productivity
22:57 – 3 to 6 EC2 instances are used in the ECS cluster depending on the load
23:14 – Scaling up is automated and takes about 30 seconds
23:37 – Spotinst helps with that by having idle machines that are ready to go
23:57 – The highest load comes from uploading many fairly large PDF files in parallel
24:44 – Between AWS’s and Spotinst’s logs and alarms, it’s easy to keep an eye on the load
25:53 – Stripe is being used to handle payments and the payment strategy is interesting
26:58 – Moving to Stripe’s hosted checkout eventually would remove a lot of stress
27:46 – Currently PaymentIntents and SCA aren’t supported, by choice
29:10 – Handling payments was the last feature that was added to the app before shipping
30:16 – PayPal isn’t supported yet because there’s only so many hours in the day
31:09 – The only emails being sent out are for user actions which is handled by Cognito
32:01 – Walking us through a deployment from development to production
33:13 – ECS has gotten a little nicer to work with in regards to updating services
34:17 – Having issues with AWS App Mesh and Envoy due to issues with websockets
36:27 – Secrets are managed with env variables hard coded into the task definition files
37:56 – The AWS web console is starting to become quite good
39:14 – Rolling restarts are done over ECS to deploy without downtime
39:52 – How do you deal with draining worker connections to avoid losing partial uploads?
41:11 – Rihanna is an Elixir job processing library backed by PostgreSQL
42:00 – You usually end up needing a job processing library even with Elixir
44:03 – As for backups, all of the data and flat files are backed up and could be recovered
44:45 – IP bans at the firewall level helps with denial of service attacks
45:48 – Everything together on AWS costs about $400 / a month
47:13 – Gust Launch gave him $15,000 in AWS credits for starting a business with them
48:38 – Trying to crash things on purpose by throwing massive traffic at it
49:25 – Thinking that all users are out to get you and designing your app to be robust
50:18 – Plug.Upload goes straight to S3 to handle file uploads
51:31 – Best tips? Profile and load test your app before you launch your app publicly
52:35 – If you have a lot of task based jobs, look into using Lambda early on
54:06 – Check out Scrollkeeper, also Ian is on Twitter and GitHub
54:41 – Ian also wrote a blog post series on writing your own web crawler in Elixir
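The custom caching described at 18:43 holds data in the running app itself, using Elixir's GenServer to keep state between requests. A loose Python analogue of that in-process, TTL-based idea is below — the class name and the PDF example are illustrative, not Ian's code:

```python
# In-process TTL cache sketch, loosely analogous to caching state inside a
# GenServer: entries live in the app's own memory and expire after a TTL.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        """Return the cached value, recomputing it once the TTL has passed."""
        now = time.monotonic()
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1]
        value = compute()  # e.g. re-parse a PDF pulled from S3
        self.store[key] = (now + self.ttl, value)
        return value

# Simulate caching an expensive PDF parse (hypothetical workload).
calls = []
def parse_pdf():
    calls.append(1)
    return "parsed-document"

cache = TTLCache(ttl_seconds=60)
a = cache.get("doc-1", parse_pdf)
b = cache.get("doc-1", parse_pdf)  # within the TTL: served from the cache
```

Like a GenServer, this keeps hot data next to the code that uses it, with no Redis round trip at all.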
Links
📄 References
https://spotinst.com/
https://github.com/localstack/localstack
https://www.envoyproxy.io/
https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar
https://github.com/samphilipd/rihanna
https://gust.com/launch
⚙️ Tech Stack
phoenix, elixir, react, acm, aws, cloudfront, docker, ecs, gitlab-ci, lambda, postgres, route53, s3, serverless, stripe
🛠 Libraries Used
https://github.com/mozilla/pdf.js
https://github.com/ex-aws/ex_aws
https://github.com/samsondav/rihanna

Jan 13, 2020 • 1h 9min
Remote.com Helps You Find Remote Jobs Anywhere
In this episode of Running in Production, Marcelo Lebre goes over building a
remote job platform on remote.com. They serve about
100k+ requests a day and it’s all hosted on a small AWS ECS cluster. It’s been
up and running since early 2019.
We covered a lot of ground, from using Elasticsearch to developing API based
applications to Elixir’s ecosystem and everything in between. One takeaway is
to be mindful of over engineering your code base and try to focus on the things
that are important to your application.
Topics Include
3:15 – Taking over the old remote.com site and rebuilding it from scratch
4:03 – Motivation for rewriting everything in Elixir and Phoenix
6:25 – Phoenix quickly became a go-to tool for Marcelo
7:27 – He first used it in another project to deal with 1000s of requests per minute
8:08 – It was pretty easy to get the rest of his team on-board with Elixir
9:40 – Writing a monolithic code base using single responsibility principles
11:47 – The code base’s lib/ folder has 500+ files and many thousands of lines of code
12:59 – Phoenix contexts are being used but with an added level of SRPs
14:26 – This will allow Marcelo to easily break up his code base later if needed
16:15 – The value in writing about things after having a lot of real world experience
20:28 – The Phoenix app is API based with a React front-end
21:36 – NextJS is also being used to have server rendered pages
22:31 – Letting developers pick and choose their dev style leads to increased productivity
23:36 – The front-end team doesn’t need to know about the Phoenix back-end
24:52 – There’s 3 different git repos (back-end, front-end and the new app)
25:13 – The front-end devs can get going easily due to excellent documentation
27:19 – Docker is being used in staging and production but not development
29:43 – PostgreSQL, Redis and Elasticsearch are being used and it’s hosted on AWS ECS
30:16 – Not super happy with AWS ECS and he wants to switch to Kubernetes eventually
30:51 – GitLab is used for continuous integration and deployment
31:17 – Reasons for going with Elasticsearch (full text search and caching)
35:59 – Live View isn’t being used but he’s keeping an eye on it
37:40 – The core of what’s needed to make Live View amazing is there
38:48 – Everyone would still be using IE 5 if no one tried things out in practice
40:53 – The Elixir community is very helpful if you get stuck
41:09 – Keeping the ops side under control by using AWS for the DB, cache and ES
42:27 – After looking at Heroku, Google Cloud, Azure and others - AWS looked good
43:26 – Serving 100k to a million requests per day for $4,000 a month before Phoenix
45:26 – The moment Remote switched to Phoenix the bill dropped to $500 / a month
45:56 – The entire staging environment runs with ~512 MB of memory
46:01 – The production environment runs on 2-4 instances with 2 GB of memory each
46:36 – Exq is being used to process jobs and it’s running in its own service
47:35 – 2 web servers and 1 worker is enough to serve things with a ~150ms response time
48:24 – Redis is being used to cache a few things (nothing like Rails level caching)
49:16 – Elixir / Phoenix are incredibly efficient with memory usage and releasing it
51:11 – AppSignal’s notifications help with giving a general peace of mind
52:28 – Searching and applying to jobs is free but posting jobs require a payment
54:44 – Payments are handled with Stripe using the stripity_stripe library
55:40 – Walking through a code deploy from development to staging to production
57:04 – The Docker images are being stored on GitLab’s registry service
57:23 – Secret management is done with GitLab’s secret management tool
58:47 – Elixir releases are created using Docker multi-stage builds
1:00:05 – Having small Docker images helps a lot due to their deployment strategy
1:00:21 – Everything is backed up and the cache can be recreated from the DB
1:02:05 – Mailgun is used for transactional emails and Customer.io is used too
1:02:39 – Sometimes you get used to a service or tool always existing
1:03:34 – Best tips? Be careful about over engineering and keep your eyes on the prize
1:06:18 – Mistakes? Not enough tests, too many tests, finding a perfect structure, etc.
1:06:53 – Coming to accept that Elixir and Phoenix have a fairly new ecosystem
1:07:52 – Check out the remote.com blog or follow Marcelo on Twitter @marcelo_lebre
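The Exq setup described at 46:36 follows the standard background-job split: the web process enqueues work and returns immediately, while a separate worker process actually executes it. Here is a hedged, in-memory Python sketch of that enqueue/worker split — Exq itself is Redis-backed and distributed, so this only illustrates the shape, and the example jobs are made up:

```python
# Enqueue/worker sketch: a request handler pushes jobs onto a queue and a
# worker thread runs them outside the request/response cycle.
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Pull jobs off the queue and run them, one at a time, until told to stop."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut the worker down cleanly
            break
        results.append(job())

t = threading.Thread(target=worker)
t.start()

# A web request would enqueue slow work like this, then respond right away.
jobs.put(lambda: "sent welcome email")
jobs.put(lambda: "indexed job posting")
jobs.put(None)  # no more work; let the worker exit
t.join()
```

Swapping the in-memory queue for Redis is what lets a real system (like Exq, or Sidekiq in Ruby) run the worker in a completely separate service, which is exactly how Remote runs theirs.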
Links
📄 References
https://en.wikipedia.org/wiki/Single_responsibility_principle
https://www.getpostman.com/
https://dockyard.com/blog/2018/12/12/phoenix-liveview-interactive-real-time-apps-no-need-to-write-javascript
https://martinfowler.com/bliki/BlueGreenDeployment.html
https://edgeguides.rubyonrails.org/caching_with_rails.html#russian-doll-caching
https://customer.io/
⚙️ Tech Stack
phoenix, elixir, react, appsignal, aws, cloudflare, docker, ecs, elasticache, elasticsearch, gitlab-ci, mailgun, nextjs, postgres, rds, redis, stripe
🛠 Libraries Used
https://github.com/akira/exq
https://github.com/code-corps/stripity_stripe

Jan 6, 2020 • 1h 14min
Learn Ruby on Rails through Screencast Tutorials on GoRails
In this episode of Running in Production, Chris Oliver goes over how he
builds and deploys his screencast tutorial platform called GoRails. The site handles about 2 million page views a year on a
single $20 / month DigitalOcean server. GoRails has been up and running since
2014.
There’s a lot of useful nuggets of information in this episode around keeping a
pulse on similar communities that you’re in. For example, Chris took a lot of
inspiration from Laravel when it came to implementing the billing code for
GoRails. Spoiler alert: Rails does scale.
Topics Include
1:42 – Avoiding burnout by having a 2nd project to work on
3:11 – Scratching your own business needs is a healthy way to drive a project
4:13 – GoRails gets 2 million page views a year (~500k unique visitors)
4:36 – Looking at Laravel for inspiration when it comes to batteries included
7:12 – Talking a bit about Bootstrap vs Tailwind CSS
9:47 – Being aware of developer driven vs user driven features
10:24 – GoRails uses server side templates with Turbolinks
13:11 – Using Turbolinks has been good but there are gotchas
14:16 – Flatpickr is a really nice datetime picker with minimal dependencies
14:43 – Websockets and Action Cable aren’t used in GoRails but it is with Hatchbox
17:03 – Introducing just enough JavaScript complexity as needed, but no more
18:54 – Trying to avoid heavy client side JS for performance issues on low end devices
20:09 – GoRails is using Rails 6.x with Webpacker but it’s not using Sidekiq
22:31 – Docker isn’t being used in development or production to keep complexity low
23:40 – PostgreSQL is used as a primary database along with Redis for caching
25:13 – Using the strong migrations gem to help make production migrations less scary
28:23 – Hopefully more advanced database related features make their way into Rails
29:31 – The entire GoRails site is hosted on a single $20 / month DigitalOcean server
30:24 – Making extensive use of multi-level caching helps a lot for performance
31:57 – Passenger is being used as the web server (it’s an nginx module)
34:15 – Let’s Encrypt is still being used on the server for end to end encryption
36:28 – Errbit is being used for catching errors which gets emailed back to him
37:47 – Keeping analytics tracking in-house with Ahoy to keep costs down and help against fraud
40:35 – Wistia is used for hosting / streaming videos and it has useful built in metrics
43:04 – Manually transcoding video is hard and expensive (Wistia does the dirty work here)
44:02 – Both Stripe and Braintree are being used as payment gateways
45:49 – Inspired by Laravel, Chris wrote a Rails Engine called Pay
46:50 – It took 3 months to get payments to work with Stripe’s new SCA APIs
48:12 – Accepting payments went from being simple to outrageously complex
50:24 – You should deal with SCA now in the US to future proof yourself later
52:06 – Even the database is hosted on that single $20 server (2 CPU cores / 4 GB of memory)
52:36 – Honestly the database for GoRails is pretty tiny but it’s heavily backed up
55:39 – Walking through the deployment process from development to production
57:57 – GoRails isn’t using Hatchbox yet, but it will be eventually
58:13 – Upgrading Ubuntu LTS releases gets tricky without a 2nd web server
59:46 – Having a managed database would help with upgrading servers with minimal risk
1:00:41 – There’s a few seconds of down time for each deploy at the moment
1:01:30 – Passenger isn’t just for Ruby apps, it works with Python and Node too
1:02:34 – Everything will come up automatically after a system reboot
1:05:25 – Environment variables are protected with Rails’ encrypted credentials
1:07:39 – Best tips? Things are more changeable than you think, keep it simple initially
1:08:20 – Always keep your master branch deployable with automated tests
1:10:12 – Open sourcing and writing about the tools you’ve built helps everyone
1:13:08 – Chris is on Twitter @excid3, also check out GoRails, Hatchbox.io and Jumpstart
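The multi-level caching at 30:24 is done with Rails’ built-in caching helpers backed by Redis, but the underlying idea is language-agnostic. Here is a minimal sketch in Python, assuming a fast per-process layer in front of a slower shared store (both simulated with plain dicts here; this is an illustration of the concept, not GoRails’ code):

```python
class TwoLevelCache:
    """Check a fast local layer first, then a slower shared layer.

    Misses fall through to a loader function whose result is written
    back to both layers (read-through caching).
    """

    def __init__(self, shared_store):
        self.local = {}             # per-process layer (e.g. in-memory)
        self.shared = shared_store  # cross-process layer (e.g. Redis)

    def fetch(self, key, loader):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
            self.local[key] = value
            return value
        value = loader()            # expensive work (render, query, ...)
        self.local[key] = value
        self.shared[key] = value
        return value
```

The payoff of layering is visible in the hit path: a hit in the local layer never touches the shared store at all, which is a big part of why a $20 / month server can keep up.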
Links
📄 References
https://www.hatchbox.io/
https://jumpstartrails.com/
https://twitter.com/dhh
https://www.youtube.com/watch?v=Gzj723LkRJY (famous 15 minute blog in Rails)
https://tailwindcss.com/
https://twitter.com/adamwathan
https://github.com/flatpickr/flatpickr
https://twitter.com/andrewkane
https://egghead.io/
⚙️ Tech Stack
rails →
ruby →
turbolinks →
stimulusjs →
bootstrap →
braintree →
cloudflare →
digitalocean →
lets-encrypt →
nginx →
postgres →
redis →
s3 →
stripe →
ubuntu →
webpack →
wistia →
🛠 Libraries Used
https://github.com/rails/webpacker
https://github.com/brandonhilkert/sucker_punch
https://github.com/ankane/strong_migrations
https://github.com/ankane/production_rails
https://github.com/ankane/secure_rails
https://github.com/phusion/passenger
https://github.com/puma/puma
https://github.com/errbit/errbit
https://github.com/ankane/ahoy
https://github.com/pay-rails/pay
https://github.com/backup/backup
Support the Show
This episode does not have a sponsor and this podcast is a labor of love. If
you want to support the show, the best way to do it is to purchase one of my
courses or suggest one to a friend.
Dive into Docker is a video course that takes you from not knowing what Docker is
to being able to confidently use Docker and Docker Compose for your own apps.
Long gone are the days of "but it works on my machine!". A bunch of follow
along labs are included.
Build a SAAS App with Flask is a video course where we build a real
world SAAS app that accepts payments, has a custom admin, includes high test
coverage and goes over how to implement and apply 50+ common web app features.
There's over 20+ hours of video.

Dec 30, 2019 • 41min
Logflare Is a Log Management and Event Analytics Platform
In this episode of Running in Production, Chase Granberry goes over running a
logging platform that deals with 7+ billion log events per month. The back-end
and front-end is powered by a Phoenix / Elixir application that’s running on
Google Cloud (GCP).
6 pretty beefy servers power everything but for a long time it was all on 1
server. Also, Live View is being used for search results and a few counters on
the web dashboard. Phoenix Tracker is being used for a cluster-wide rate
limiter too. The app is open source on
GitHub.
Topics Include
2:16 – Over 7 billion log events are being handled per month
2:43 – What are CloudFlare apps?
3:42 – In the future native language libraries will have more support
4:11 – Currently there’s support in Elixir for the Logflare library
5:13 – Elixir and Phoenix is powering the web front-end for Logflare
5:35 – Motivation for choosing Elixir and Phoenix
8:35 – Phoenix allowed Chase to get things up and running in a few months
9:18 – The web UI is using server side templates with a touch of Live View
11:04 – Live View is mostly used for counters but it powers a search page too
12:24 – A monolithic / mono-repo Phoenix app ingests all of the logs
12:49 – Phoenix contexts are being used to break up the domain a bit
13:47 – Docker isn’t being used in dev, but it is in production on Google Cloud
14:34 – Google’s managed instance groups are being used to host the app
15:28 – These managed instance groups help with doing rolling deploys
16:24 – Motivation for choosing GCP came down to free hosting credits mostly
16:57 – (6) 16 CPU core / 32 GB of memory instances power the Phoenix app
17:49 – Each instance has an identical copy of the Phoenix app
18:23 – Google’s container-optimized OS is being used
19:10 – PostgreSQL stores a bit of user data, but the logs are in BigQuery
19:38 – BigQuery does not work with Ecto, but it’s a fairly simple set up
19:58 – For development, Chase connects to a dev BigQuery database on GCP
20:27 – Cowboy is in front of Google’s load balancer (nginx isn’t being used)
20:51 – SSL certificates are issued and signed by CloudFlare
21:35 – A step by step walk through of how code goes from development to production
25:04 – It takes 5 minutes for each server to get drained, but it’s configurable
25:45 – Having manual components of your deploy can be beneficial
26:26 – Logflare is very stateful, so a 5 minute time-out is necessary
27:43 – Chase uses his own tool and there are public charts to look at online
28:44 – Scratching your own itch is a great way to build a useful service
28:58 – Sign ups are free right now, but a billing system is coming soon
29:26 – Getting $3,000, $17,000 and then $80,000 in GCP hosting credits for free
30:51 – A majority of the cost is processing the log events on the web servers
31:55 – Server costs could probably be cut down but Chase wanted to learn / experiment
32:49 – Initially everything was on a single 32 CPU core server (3 billion reqs / month)
33:54 – Rate limiting is cluster-wide and uses Phoenix Tracker
36:34 – Using Redis vs rolling your own Elixir based set up for cluster state
38:29 – Best tips? Stop over-thinking things because just starting helps a lot
39:36 – Chase is on Twitter @chasers and Logflare is open source on GitHub
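The cluster-wide rate limiter at 33:54 relies on Phoenix Tracker to share counts between Elixir nodes. The per-key windowing half of it can be sketched in a few lines of Python (single-node only; distributing the counts across nodes is the hard part Tracker solves, and these names are illustrative, not Logflare’s code):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.events = {}  # key -> deque of event timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.events.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit, reject this event
        q.append(now)
        return True
```

In a clustered setup each node would also need to see (or approximate) the other nodes’ deques, which is where Redis or Phoenix Tracker come in (36:34).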
Links
📄 References
https://cloud.google.com/
https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups
https://martinfowler.com/bliki/CanaryRelease.html
https://www.cloudflare.com/apps/developer/offers/google
⚙️ Tech Stack
phoenix →
elixir →
bigquery →
cloudflare →
docker →
gcp →
logflare →
open-source →
postgres →
🛠 Libraries Used
https://github.com/ninenines/cowboy

Dec 23, 2019 • 1h 17min
ScholarPack Runs 10% of the UK's Primary Schools and Gets Huge Traffic
In this episode of Running in Production, Gareth Thomas goes over running a
platform that helps manage 3.5+ million students. There’s over 1,500 databases
and it peaks at 65k requests per second. A legacy Zope server and a series of
Flask microservices power it all on AWS Fargate.
ScholarPack has been running in production since 2010.
This episode is loaded up with all sorts of goodies related to running
microservices at scale, handling multi-tenancy databases with PostgreSQL,
aggressively using feature flags and so much more.
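The feature flags mentioned above (and discussed at 57:56) boil down to a small check at each call site. A minimal sketch in Python, with a hypothetical in-memory flag store; ScholarPack presumably keeps its flags in the database:

```python
# Hypothetical flag store: flag name -> set of tenant IDs it is enabled for.
FLAGS = {
    "new_attendance_report": {"school-123", "school-456"},
}

def feature_enabled(flag, school_id, flags=FLAGS):
    """Return True if `flag` is turned on for this tenant."""
    return school_id in flags.get(flag, set())

# Call sites branch on the flag instead of shipping separate code paths,
# so a feature can roll out school by school and be switched off instantly.
def attendance_report(school_id):
    if feature_enabled("new_attendance_report", school_id):
        return "new report"
    return "legacy report"
```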
Topics Include
0:57 – The current stack is a legacy Zope system combined with Flask
1:27 – ScholarPack has been running for 12+ years and Zope was very popular then
2:12 – 10% of the schools in the UK are using ScholarPack, it peaks at 65k reqs / second
2:40 – Their traffic patterns are predictable based on a school’s working hours
3:39 – Feature development during school sessions / architecture upgrades during holidays
4:36 – Zope vs Flask and the main reason they wanted to move to Flask
6:20 – Since Flask is so flexible, you need to be on the ball with setting standards
7:06 – 17-18 folks deal with the infrastructure and development of the project
7:31 – Gareth has a fetish for microservices but it really does fit well for their app
8:00 – Microservices let you split out your responsibilities and independently scale
8:47 – At their scale, downtime can have a serious impact on the kids at school
10:16 – A well maintained skeleton app works wonders for working with microservices
11:15 – A developer’s workflow for starting and working with a microservice
12:10 – Mocking responses for unrelated services helps with the development process
14:32 – Dealing with multi-tenancy across 1,500+ databases using SQLAlchemy binds
16:59 – Splitting the data up with a foreign key and 1 database would be too risky
18:02 – A school’s database gets picked from a sub-domain / unique URL
19:15 – What it’s like running database migrations on 1,500+ databases with PostgreSQL
20:03 – Point in time database backups make running so many migrations less scary
20:52 – Point in time backups are why they are on AWS instead of GCP
22:26 – Most services render Jinja on the server with sprinkles of JavaScript
23:08 – Supporting browsers like IE8 limits what you can do on the front-end
24:58 – IE8 is getting a little crusty, but it’s necessary to support it
26:29 – Redis and CloudFront are the only 2 other services being used in their stack
27:39 – Using signed cookies vs Redis for storing session state
28:56 – What about Celery and background workers? Most things are synchronous
29:41 – Celery could still be used in the future since it has benefits
30:13 – Schools do pay to use this service, but not with a credit card
34:32 – Using checks has the advantage of not needing a billing back-end
36:04 – Cost and scaling requirements of their old platform led them to AWS Fargate
37:34 – GCP was looked into initially but the lack of point in time backups killed that idea
38:07 – The added complexity of going multi-cloud wasn’t worth it and RDS won
38:50 – Managed Kubernetes is not that great on AWS (especially not in 2017)
39:03 – ECS was also not going to work out due to their scaling requirements
39:20 – Fargate allows them to focus on scaling containers, not compute resources
40:21 – The TL;DR on what AWS Fargate allows you to do and not have to worry about
42:25 – Their microservices set up fits well with the Fargate style of scaling
43:11 – You still need to allocate memory and CPU constraints on your containers
44:40 – Everything runs in the AWS UK region across its multiple availability zones
45:10 – AWS initially limits you to 50 Fargate containers but you can easily raise that cap
46:06 – Setting a cap on the number of containers Fargate will ever spawn
46:30 – Pre-warming things to prepare for the massive traffic spike at 9am
47:25 – It’s fun to watch the traffic spikes on various dashboards
48:05 – Number of requests per host is their primary way to measure scaling requirements
48:32 – DataDog plays a big role in monitoring and reporting
49:08 – But CloudWatch is used too and DataDog alerts get sent to Slack
49:28 – Jira is used for error logging and ticket management
49:44 – 100s of errors occur a day in the legacy Zope system, but they are not serious
50:32 – It’s very rare to have a system level error where things are crashing
50:45 – The longest down time in the last 3.5 years has been 35 minutes
51:10 – All of the metrics to help detect errors have a strong purpose
52:16 – Walking through a deployment from development to production
52:29 – The Zope deployment experience has been a dream come true
54:02 – The Flask deployment has more steps but it’s still very automated
55:59 – Dealing with the challenges of doing a rolling restart
57:12 – Complex database changes are done after hours with a bit of down time
57:41 – That’s a great time to do a Friday evening deploy!
57:56 – Most new additions are behind a feature toggle at the Flask level
58:35 – Feature flags can be tricky but a good mindset helps get everyone on board
1:00:08 – A company policy combined with experience dictates best practices
1:00:43 – Switching from Flask-RESTPlus to Connexion
1:01:03 – What is Connexion and how does it compare to other API libraries?
1:03:07 – It only took a few days to get a real API service running with Connexion
1:04:04 – Everything is in git, it’s all deterministic and they use Pipenv with lock files
1:04:57 – The Zope structure is in a RAID file system and has daily backups
1:05:27 – Extensive user auditing is done at the database level (everything is logged)
1:07:06 – The audit tables get a huge amount of writes
1:07:38 – (10) t3.2xlarge (8 CPU cores / 32 GB of memory) instances power the RDS database
1:08:07 – How much does it all cost on AWS? Too much!
1:08:49 – The cloud is nice but you need to really keep tabs on your bills
1:09:54 – Gareth spends 2 days a month reviewing the AWS bills
1:10:16 – RDS will automatically restart stopped instances after 7 days
1:11:18 – Best tips? Look at what you have, what you want to do and how to get there
1:12:36 – A microservice should be broken up by its scope / domain, not lines of code
1:13:24 – There is no “wrong”, there is only the thing that works
1:13:50 – One mistake they made early on was trying to be too perfect, which delayed shipping
1:15:34 – Gareth is on Twitter @thestub and his personal site is at https://munci.co.uk
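The per-school database selection at 18:02, combined with the SQLAlchemy binds at 14:32, amounts to mapping a request’s subdomain to a bind key. A sketch of that mapping, assuming a hypothetical naming convention since the episode doesn’t spell out the exact scheme:

```python
def bind_key_for_host(host, domain="scholarpack.example"):
    """Map a request's Host header to a SQLAlchemy bind key.

    Hypothetical convention: <school-slug>.<domain> -> "school_<slug>".
    Hosts outside the expected domain raise instead of guessing.
    """
    suffix = "." + domain
    if not host.endswith(suffix):
        raise ValueError(f"unexpected host: {host}")
    slug = host[: -len(suffix)]
    if not slug or "." in slug:
        raise ValueError(f"unexpected host: {host}")
    return f"school_{slug}"
```

With Flask-SQLAlchemy, a key like this would index into `SQLALCHEMY_BINDS`, so each request’s session talks only to that school’s database — which is how one codebase can front 1,500+ databases.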
Links
📄 References
https://en.wikipedia.org/wiki/Single_responsibility_principle
https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
https://www.getpostman.com/
https://aws.amazon.com/fargate/
https://docker.com
https://flask-sqlalchemy.palletsprojects.com/en/2.x/binds/
https://en.wikipedia.org/wiki/Universally_unique_identifier
https://en.wikipedia.org/wiki/Benefit_corporation
https://aws.amazon.com/rds/
https://github.com/helm/helm
https://www.elastic.co/what-is/elk-stack
https://en.wikipedia.org/wiki/Open_API
https://www.thoughtworks.com/insights/blog/microservices-nutshell
https://martinfowler.com/articles/break-monolith-into-microservices.html
https://martinfowler.com/bliki/StranglerFigApplication.html
⚙️ Tech Stack
flask →
zope →
python →
aws →
centos →
cloudfront →
cloudwatch →
datadog →
docker →
ecs →
fargate →
jenkins →
postgres →
rds →
redis →
slack →
terraform →
🛠 Libraries Used
https://www.zope.org/
https://www.sqlalchemy.org/
https://github.com/psf/black
https://github.com/pgbouncer/pgbouncer
https://github.com/zalando/connexion

Dec 19, 2019 • 1h 5min
Load Balance, Secure and Observe Your Web Applications with Nova ADC
In this episode of Running in Production, Dave Blakey goes over how their
load balancing service (Nova) handles 33,000+
events per second across a 100+ server Kubernetes cluster that runs on both AWS
and DigitalOcean. There’s a sprinkle of Serverless thrown in too.
If you ever wondered how a large scale app is developed and deployed, you’re in
for a treat. Some of Nova’s clients spend $5,000,000+ a month on hosting fees.
We covered everything from development best practices, how to create a scalable
application architecture and how they replicate their environment across
multiple cloud providers and regions.
P.S., Nova is also really useful for small sites too and they have a free tier
with no strings attached, so you may want to give it a
try.
After this episode the first thing I thought was “wtf, why am I not using
this?”. I’m going to start playing with it over the holidays for my own sites.
Topics Include
1:31 – 2 teams composed of 9 developers work on the back-end and front-end
1:59 – Motivation for choosing Golang for the back-end came down to scaling requirements
2:57 – Tens of thousands of clients are connected to 1 point of control (the Golang server)
3:24 – Balancing operational scale with developer programming speed
3:43 – Their dev team has lots of programming experience and decided Golang was solid
4:28 – The client / server architecture of how their load balancer is installed
5:38 – The “cloud” component which is the managed web UI to configure your load balancer
5:54 – The web UI is written in PHP using Laravel
6:39 – It wasn’t really a matter of using Laravel, it was “should we even use a framework?”
7:16 – Motivation for picking Laravel for the web interface
8:08 – Picking a web framework where hiring isn’t a problem and the documentation is great
8:47 – The Laravel app isn’t a monolithic app, many things run on Kubernetes and Serverless
9:38 – As an end user, if you click a button on the site it ultimately leads to a job running
9:57 – Docker and Vagrant are used heavily in development
10:43 – This isn’t a typical web app, there’s lots of moving parts to deal with in development
11:34 – Vagrant makes it really easy to network together your VMs to other systems
12:08 – The value in spending the time in your dev tooling to spin new devs up quickly
12:46 – InfluxDB is being used as a time series database and what problems it solves
13:45 – After only 4 months of being around, they’re writing 33,000+ metrics per second
14:37 – Nova operates at massive scale but if you’re not, maybe stick with a SQL database
15:19 – Their load balancer is the single source that your clients (web visitors) connect to
15:50 – Even if Nova happens to have growing pains, it won’t affect your site’s up-time at all
17:18 – What makes Nova different than a load balancer on AWS, GCP or anywhere else?
17:42 – It’s an ADC with availability, security, caching and observability
18:37 – Nova is more than load balancing and there’s also multi-cloud hosting to think about
19:14 – For example, Nova is currently hosted on both AWS and DigitalOcean
19:30 – It’s difficult to rely on cloud specific services (ELB, ALB, Firewalls, etc.)
20:14 – Nova is replicated between AWS and DigitalOcean for redundancy
20:57 – (40) $20 / month servers on DigitalOcean running in Kubernetes
21:42 – And another (100) servers for their testing environment to perform load tests
21:55 – About (20) servers are running on AWS
22:01 – For the Nova load balancers, they are running on $5 / month DigitalOcean droplets
22:21 – Everything is running Ubuntu 18.04 LTS, except for a few servers running 19.x
22:49 – On AWS, those 20 servers range from $40-60 / month
23:07 – 2-4 CPU cores is the sweet spot for their work load, more cores doesn’t help much
23:55 – They run their own load balancer to manage their own infrastructure
24:29 – Most of their servers are a piece of a Kubernetes cluster
24:49 – The rest of the servers are template driven but they’re not using Ansible, etc.
25:42 – Those development changes were great because it makes things easier to scale
26:11 – Kubernetes is nice but it took a lot of changes in development to make it work
26:23 – There is no magic Kubernetes button to scale, it takes a lot of preparation
27:35 – Nova supports many different deployment environments, not just Kubernetes
28:11 – For example, you can load balance (2) DigitalOcean droplets, here’s how it works
30:01 – Doing things like rolling restarts is all handled by Nova
31:05 – Using Kubernetes is hard, especially for larger organizations
31:30 – What would the deploy process look like for an end user load balancing 2 servers?
33:35 – Performing an automated rolling restart with Kubernetes
34:16 – The dangers of a fully automated rolling restart without extra “smarts”
34:44 – Nova’s deploy process for their own infrastructure (Golang server and client)
36:21 – Their CI / CD environment runs on CircleCI but the deploy script is custom
36:57 – Secrets are managed mostly with environment variables
38:16 – Being cloud neutral is a trend right now (AKA, not being locked into a vendor)
38:41 – Moving the data, replication and keeping things in sync are the hardest parts
38:54 – End to end, a new web UI deploy for Nova could be done in under 10 seconds
40:20 – What about deploying the client / server component? It’s quite a bit different
40:57 – Shell scripts can go a long ways, especially for gluing together deployment code
41:44 – A lot of monitoring and reporting is kept in-house for performance reasons
42:12 – For error reporting on the web UI with Laravel, it goes to Sentry
42:16 – Then it’s all integrated with an agent-less DataDog and Slack notifications
43:17 – It’s nice using 3rd party tools that integrate easily like Laravel and Sentry
43:36 – Not using DigitalOcean’s alerting mainly due to working with a cluster of servers
44:30 – Being able to switch cloud hosting providers without a huge fuss
44:37 – One of Nova’s clients spends 5 million US dollars a month on cloud hosting fees
45:00 – Another client has 500+ load balancers deployed across 5,000+ servers
45:30 – Spending that kind of money on hosting fees is a whole different level
45:59 – Nowadays a small team can be responsible for a huge amount of infrastructure
46:51 – WhatsApp and Instagram are great examples of a few devs with lots of end users
47:12 – We live in an interesting time where 1 developer can do so much
47:27 – Nova’s test environment has 1 million connected clients
48:00 – Their databases are run across multiple data centers and are auto-backed up
49:19 – Their business is redundancy and up-time, so good disaster plans are necessary
49:36 – Working with high traffic clients helped define best practices
50:31 – Nova is great for small sites too and you get 5 nodes for free (no strings attached)
52:01 – It’s not a pay to win system, the free tier has everything you would need at that scale
52:55 – Stripe is being used to process payments and it uses the new Payments Intents API
53:28 – Nova bills you per hour instead of per month and Stripe makes this really easy
55:46 – SendGrid is being used to send transactional emails to end users
56:12 – They send fewer than 1,000 emails a day (mostly for notifications and alerts)
56:41 – Their load balancer deals with handling SSL certificates using Let’s Encrypt
56:59 – End users of the service don’t need to worry about issuing their own SSL certs
57:34 – Push button, receive bacon
57:51 – If you’re in the public cloud you can also get encryption end to end
58:43 – Dealing with SSL at the load balancer level can save a lot of headaches
58:54 – It comes back to setting up your app using best practices in order to scale
59:35 – Best tips? Create a code contract to help keep your code sane for years to come
1:00:56 – It’s sort of like Heroku’s 12 factor guidelines but not really
1:01:06 – It’s more like 8 things to do to avoid an angry meme in your pull request
1:01:36 – It’s 8 core concepts (but it doesn’t need to be 8) to define your system’s purpose
1:02:49 – One mistake that was corrected was underestimating time series databases
1:03:52 – Snapt is their parent company, Nova is the load balancer service that we talked about today, and you can also find them @SnaptADC on Twitter
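The rolling restarts discussed at 30:01, and the warning at 34:16 about automation without extra “smarts”, can be sketched as a drain / restart / verify loop. A Python illustration with the steps passed in as callables (not Nova’s implementation — just the shape of the technique):

```python
import time

def rolling_restart(backends, drain, restart, health_check, pause=0.0):
    """Restart backends one at a time, draining each first.

    Aborts as soon as a restarted backend fails its health check,
    so one bad build can't take out the whole pool.
    """
    restarted = []
    for backend in backends:
        drain(backend)    # stop routing new connections to it
        restart(backend)
        if not health_check(backend):
            return restarted, backend  # stop early, report the failure
        restarted.append(backend)
        time.sleep(pause)  # let the pool stabilize before the next one
    return restarted, None
```

The health check between restarts is the “smarts”: a fully automated loop without it would happily roll a broken release across every server.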
Links
📄 References
https://laracasts.com/
https://en.wikipedia.org/wiki/Time_series_database
https://en.wikipedia.org/wiki/Application_delivery_controller
https://en.wikipedia.org/wiki/Geo-fence
https://en.wikipedia.org/wiki/Web_application_firewall
https://martinfowler.com/bliki/BlueGreenDeployment.html
https://en.wikipedia.org/wiki/Timeline_of_Instagram
https://i.pinimg.com/originals/bb/98/e0/bb98e03ac44d2fa7b07cfd364eab6e2e.jpg
https://en.wikipedia.org/wiki/Heartbleed
https://12factor.net/
⚙️ Tech Stack
golang →
laravel →
php →
aws →
circle-ci →
cloudflare →
datadog →
digitalocean →
docker →
influxdb →
kubernetes →
lambda →
lets-encrypt →
postgres →
sendgrid →
sentry →
serverless →
slack →
stripe →
ubuntu →

Dec 16, 2019 • 53min
Openship Is a Shopify App for Drop Shipping and Order Fulfillment
In this episode of Running in Production, Junaid Kabani goes over how he
built and deploys Openship which is a Shopify app
that was written in Koa. The front-end uses React.
We covered a lot of ground in this episode, such as how Prisma, Apollo, Next.js
and React all come together to build an app that uses Shopify’s API. There’s
also quite a lot of details on the value of testing and how CI helps keep open
source projects well tested.
Topics Include
1:02 – Junaid was running his own online store before making this app
1:38 – Zapier and Google Sheets worked for a while but it wasn’t sustainable
2:12 – Shopify’s API has extensive documentation
2:47 – Drop shipping is a great way to test items before holding your own inventory
4:24 – A lot of these services are trying to compete with Amazon’s fulfillment service
4:37 – Openship lets you transition from drop shipping to having your own inventory
5:24 – Drop shipping and testing items is almost like pre-selling an app idea
5:28 – Junaid hired a contractor early on to help with anything he gets stuck on
5:41 – He didn’t have much luck with StackOverflow early on (I’m not surprised!)
6:19 – He paid about $500 to $1,000 while developing his project and it was worth it
7:03 – Motivation for using Koa and Node
8:13 – Shopify has official packages for Koa
8:41 – Shopify lets you write custom apps in a lot of different web frameworks
9:26 – There’s an Apollo server and a React front-end with Prisma handling the data layer
9:53 – The back-end and front-end are in their own separate git repos
10:11 – Trade-offs between working with a mono-repo and a multi-repo set up
11:32 – Going into a bit more details about the back-end / front-end set up
12:36 – Websockets might be used later when an upcoming messaging system is in place
12:57 – The workflow for adding Openship to your Shopify store
13:10 – Dealing with returns is cumbersome with drop shipping
15:24 – High level recap of the workflow as a shop owner
15:42 – End customers who purchase items see the usual Shopify checkout workflow
16:39 – The marketplace aspect of Openship is very powerful and it’s competitively priced
18:25 – Private labeling is another feature that’s coming soon
18:53 – The marketplace is a separate Shopify shop that uses Shopify’s API
19:06 – The Shopify app is hosted on DigitalOcean using CapRover (self hosted PaaS)
20:04 – Prisma runs on its own server which contains the MySQL database
20:12 – The 2nd server hosts the back-end (web server) and front-end (React app)
20:32 – It was all hosted on 1 server initially but it kept crashing
21:19 – Prisma is an open source ORM that works with a bunch of popular databases
22:54 – You typically use tools like Apollo to limit access to Prisma
23:19 – The Apollo server prevents anyone from accessing your database
23:50 – What exactly is the Apollo server? It’s a GraphQL implementation
24:34 – Breaking down the layers of your database, Prisma, Apollo and your client
26:22 – Apollo helps deal with multi-tenancy concerns by letting you isolate users
27:06 – Openship doesn’t store any confidential info in their own database
27:55 – Access control between Shopify and Openship is handled with OAuth
28:34 – CapRover handles setting up a reverse proxy and setting up HTTPS
29:31 – Openship isn’t running in Docker but Prisma provided its own Dockerfile
30:35 – CapRover has a bunch of 1 click installers, one of which is for Sentry
30:54 – CapRover is only being used in production
31:42 – Postmark is being used to send transactional emails (free tier)
32:08 – Junaid pays about $10 to $20 a month for Zapier
32:33 – Zapier helps you glue together APIs from external services
34:28 – CapRover uses Let’s Encrypt under the hood for managing SSL certificates
34:37 – CapRover has a 1 click app on DigitalOcean so it’s easy to install
35:36 – DigitalOcean’s monitoring / alerts aren’t being used at the moment
35:59 – On the horizon Junaid may switch to using now.sh
36:20 – Should you go Serverless or stick with a more traditional app?
37:12 – Dealing with secrets and sensitive values when using CapRover
37:58 – The full break down of how code gets from development to production
38:34 – Running automated tests and the value of CI / CD
40:04 – Get a test suite up and running and then worry about CI
40:52 – TDD vs writing tests after you write your implementation
41:40 – Having a test suite really helps you refactor and improve your code later on
42:06 – The difference between testing locally vs using a continuous integration server
44:27 – The benefits of CI in an open source project for testing pull requests
45:14 – There’s no database backups in place because Shopify is the source of truth
45:58 – No health check services are being used but Junaid is using Openship all the time
46:42 – Uptime Robot’s free tier is very generous and it pings your site every 5 minutes
47:53 – Best tips? Jump into the code, there’s a lot to take in but it’s manageable
48:25 – A bad decision beats indecision because you can fix bad decisions
49:08 – Junaid got this far with 1 year’s worth of experience which is very impressive
52:00 – Check out openship.org and its open source repo on GitHub
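HMAC appears in the references below because Shopify signs its webhooks. A sketch of the verification step in Python, assuming Shopify’s documented scheme (a base64-encoded HMAC-SHA256 of the raw request body, compared in constant time); the function name is illustrative, not from Openship’s code:

```python
import base64
import hashlib
import hmac

def verify_shopify_webhook(raw_body: bytes, header_hmac: str, secret: bytes) -> bool:
    """Recompute the body's HMAC and compare it against the header value.

    hmac.compare_digest avoids leaking where the comparison fails
    (a timing side channel), which is why it's used instead of `==`.
    """
    digest = hmac.new(secret, raw_body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, header_hmac)
```

The important detail is hashing the raw request bytes before any JSON parsing — re-serialized JSON will almost never match the signed payload.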
Links
📄 References
https://www.shopify.com/
https://en.wikipedia.org/wiki/Drop_shipping
https://www.shipbob.com/
https://www.shipmonk.com/
https://en.wikipedia.org/wiki/HMAC
https://github.com/zeit/micro
https://github.com/lerna/lerna
https://en.wikipedia.org/wiki/Third-party_logistics (3PL)
https://github.com/prisma-labs/graphql-yoga
https://ifttt.com/
https://zeit.co/home
https://www.cypress.io/
https://airbnb.io/enzyme/
⚙️ Tech Stack
koa →
node →
react →
apollo →
prisma →
caprover →
digitalocean →
mysql →
nextjs →
open-source →
postmark →
sentry →
zapier →
🛠 Libraries Used
https://github.com/Shopify/quilt