The speaker stresses the importance of running an experiment for every code change or feature launch: even small bug fixes can have unexpected impacts. Experimentation is a crucial practice, and high-risk, high-reward ideas are worth pursuing; teams should be prepared for failure and keep testing big ideas despite the high failure rate.
Controlled experiments are essential in driving product decisions. The podcast guest, a recognized expert in A/B testing, shares advice on implementing experiments at a company: building an experiment-driven culture, changing a company's approach, recognizing signs that an experiment is invalid, and prioritizing trust as a critical element of successful experimentation. The guest also explains p-values and shares insights from Airbnb's experiments.
The podcast highlights the impact of seemingly small changes and shares surprising A/B test results. The guest describes a case where shifting the position of one line in search results increased revenue by 12%. The importance of incremental improvements is emphasized, citing examples from Bing's and Airbnb's search relevance teams. The guest also notes that most experiments fail to improve metrics, stressing the need for patience and perseverance in testing many ideas.
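To make the perseverance point concrete, here is a back-of-the-envelope calculation in Python. The 10% per-experiment win rate is an illustrative assumption, not a figure from the episode:

```python
# Back-of-the-envelope: if each idea independently has a 10% chance of
# truly improving the metric, how many experiments do you need before
# you are likely to see at least one win?
p_success = 0.10  # assumed per-experiment win rate (illustrative)

for n in (5, 10, 20, 50):
    p_at_least_one = 1 - (1 - p_success) ** n
    print(f"{n:>2} experiments -> {p_at_least_one:.0%} chance of at least one win")
```

Even with a 10% win rate, running 20 experiments gives roughly an 88% chance of finding at least one winner, which is why testing many ideas pays off.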
The podcast tackles the misconception that experimentation hinders innovation and discourages risk-taking. The guest advocates a test-everything mindset, emphasizing that small changes can lead to breakthroughs, and, given the high percentage of experiments that fail, suggests deliberately allocating some resources to high-risk, high-reward ideas. Long-term growth should be paramount, and defining an overall evaluation criterion (OEC) gives experimentation a guiding principle.
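The episode does not spell out a formula, but an OEC is typically a single metric, often a weighted combination of metrics, chosen as a proxy for long-term value. A loose, hypothetical sketch; the metric names and weights are invented for illustration:

```python
# Hypothetical OEC: a single composite score combining short-term metrics
# as a proxy for long-term value. Metric names and weights are invented
# for illustration; a real OEC must be defined carefully per product,
# and metrics should be normalized before combining.
WEIGHTS = {
    "sessions_per_user": 0.5,   # engagement proxy
    "revenue_per_user": 0.3,
    "time_to_success": -0.2,    # lower is better, hence the negative weight
}

def oec(metrics: dict) -> float:
    """Weighted sum of (already normalized) metric values."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

# Compare treatment vs. control on the single composite number
# (values here are normalized to the control mean, so control = 1.0).
control = {"sessions_per_user": 1.00, "revenue_per_user": 1.00, "time_to_success": 1.00}
treatment = {"sessions_per_user": 1.05, "revenue_per_user": 0.98, "time_to_success": 0.93}
print(oec(treatment) - oec(control))  # positive -> treatment wins on the OEC
```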
One of the most common issues in running experiments is a sample ratio mismatch (SRM), where the allocation of users to control and treatment deviates from the designed split (often 50/50). This red flag indicates that something is wrong with the experiment and the results cannot be trusted. By diagnosing and addressing the causes of sample ratio mismatches, such as bot activity or issues in the data pipeline, experimenters can ensure the validity of their results and make accurate data-driven decisions.
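An SRM check is typically a chi-square goodness-of-fit test on the observed user counts. A minimal sketch in Python, assuming a 50/50 design; the counts and the p < 0.001 alarm threshold are illustrative:

```python
# Sample ratio mismatch (SRM) check: a chi-square goodness-of-fit test
# comparing observed user counts against the designed split.
from scipy.stats import chisquare

control_users, treatment_users = 821_588, 815_482  # example counts
total = control_users + treatment_users
expected = [total * 0.5, total * 0.5]              # designed 50/50 split

stat, p_value = chisquare([control_users, treatment_users], f_exp=expected)
if p_value < 0.001:  # conventional, deliberately strict alarm threshold
    print(f"SRM detected (p = {p_value:.2e}): do not trust this experiment")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```

Note that the split above looks close to 50/50 (about 50.2% vs. 49.8%), yet at this sample size the test flags it decisively, which is exactly why an automated check beats eyeballing the ratio.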
Twyman's Law states that if a result looks too good to be true, it usually is, so experimenters must be cautious when interpreting statistically significant results. A p-value is the probability of seeing a result at least as extreme as the observed one assuming the null hypothesis (no effect) is true; a p-value of 0.05 therefore does not mean there is a 95% probability that the treatment is better than the control. Inverting that conditional requires prior information about how likely the idea was to work in the first place, and the resulting false positive risk among significant results can be far higher than the commonly assumed 5%. Replication and combining experiments can help mitigate the risk of false positives.
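A minimal sketch of both points in Python; the prior probability of success and the statistical power below are illustrative assumptions:

```python
# Why p < 0.05 is not "95% probability the treatment wins": the false
# positive risk (FPR) depends on the prior probability that an idea is
# truly effective.
from scipy.stats import combine_pvalues

alpha, power = 0.05, 0.80
prior = 0.10  # assumed: only ~10% of tested ideas truly move the metric

# Of all statistically significant results, this fraction are false positives:
fpr = (alpha * (1 - prior)) / (alpha * (1 - prior) + power * prior)
print(f"False positive risk: {fpr:.0%}")  # ~36%, far above the naive 5%

# Replication mitigates this: Fisher's method combines p-values from
# independent runs of the same experiment into one overall p-value.
stat, combined_p = combine_pvalues([0.04, 0.03], method="fisher")
print(f"Combined p-value across replications: {combined_p:.4f}")
```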
Brought to you by Mixpanel—Event analytics that everyone can trust, use, and afford | Round—The private network built by tech leaders for tech leaders | Eppo—Run reliable, impactful experiments
—
Ronny Kohavi, PhD, is a consultant, teacher, and leading expert on the art and science of A/B testing. Previously, Ronny was Vice President and Technical Fellow at Airbnb, Technical Fellow and corporate VP at Microsoft (where he led the Experimentation Platform team), and Director of Data Mining and Personalization at Amazon. He was also honored with a lifetime achievement award by the Experimentation Culture Awards in September 2020 and teaches a popular course on experimentation on Maven. In today’s podcast, we discuss:
• How to foster a culture of experimentation
• How to avoid common pitfalls and misconceptions when running experiments
• His most surprising experiment results
• The critical role of trust in running successful experiments
• When not to A/B test something
• Best practices for helping your tests run faster
• The future of experimentation
—
Enroll in Ronny’s Maven class: Accelerating Innovation with A/B Testing at https://bit.ly/ABClassLenny. Promo code “LENNYAB” will give $500 off the class for the first 10 people to use it.
—
Find the full transcript at: https://www.lennysnewsletter.com/p/the-ultimate-guide-to-ab-testing
—
Where to find Ronny Kohavi:
• Twitter: https://twitter.com/ronnyk
• LinkedIn: https://www.linkedin.com/in/ronnyk/
• Website: http://ai.stanford.edu/~ronnyk/
—
Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• Twitter: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
—
In this episode, we cover:
(00:00) Ronny’s background
(04:29) How one A/B test helped Bing increase revenue by 12%
(09:00) What data says about opening new tabs
(10:34) Small effort, huge gains vs. incremental improvements
(13:16) Typical fail rates
(15:28) UI resources
(16:53) Institutional learning and the importance of documentation and sharing results
(20:44) Testing incrementally and acting on high-risk, high-reward ideas
(22:38) A failed experiment at Bing on integration with social apps
(24:47) When not to A/B test something
(27:59) Overall evaluation criterion (OEC)
(32:41) Long-term experimentation vs. models
(36:29) The problem with redesigns
(39:31) How Ronny implemented testing at Microsoft
(42:54) The stats on redesigns
(45:38) Testing at Airbnb
(48:06) Covid’s impact and why testing is more important during times of upheaval
(50:06) Ronny’s book, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
(51:45) The importance of trust
(55:25) Sample ratio mismatch and other signs your experiment is flawed
(1:00:44) Twyman’s law
(1:02:14) P-value
(1:06:27) Getting started running experiments
(1:07:43) How to shift the culture in an org to push for more testing
(1:10:18) Building platforms
(1:12:25) How to improve speed when running experiments
(1:14:09) Lightning round
—
Referenced:
• Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing: https://experimentguide.com/
• Seven rules of thumb for website experimenters: https://exp-platform.com/rules-of-thumb/
• GoodUI: https://goodui.org
• Defaults for A/B testing: http://bit.ly/CH2022Kohavi
• Ronny’s LinkedIn post about A/B testing for startups: https://www.linkedin.com/posts/ronnyk_abtesting-experimentguide-statisticalpower-activity-6982142843297423360-Bc2U
• Sanchan Saxena on Lenny’s Podcast: https://www.lennyspodcast.com/sanchan-saxena-vp-of-product-at-coinbase-on-the-inside-story-of-how-airbnb-made-it-through-covid-what-he8217s-learned-from-brian-chesky-brian-armstrong-and-kevin-systrom-much-more/
• Optimizely: https://www.optimizely.com/
• Optimizely was statistically naive: https://analythical.com/blog/optimizely-got-me-fired
• SRM: https://www.linkedin.com/posts/ronnyk_seat-belt-wikipedia-activity-6917959519310401536-jV97
• SRM checker: http://bit.ly/srmCheck
• Twyman’s law: http://bit.ly/twymanLaw
• “What’s a p-value” question: http://bit.ly/ABTestingIntuitionBusters
• Fisher’s method: https://en.wikipedia.org/wiki/Fisher%27s_method
• Evolving experimentation: https://exp-platform.com/Documents/2017-05%20ICSE2017_EvolutionOfExP.pdf
• CUPED for variance reduction/increased sensitivity: http://bit.ly/expCUPED
• Ronny’s recommended books: https://bit.ly/BestBooksRonnyk
• Chernobyl on HBO: https://www.hbo.com/chernobyl
• Blink cameras: https://blinkforhome.com/
• Narrative not PowerPoint: https://exp-platform.com/narrative-not-powerpoint/
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed.