Stop abusing A/B testing, toxic experimentation culture, how to run A/B tests with rigor - Che Sharma - The Data Scientist Show #071
Nov 4, 2023
Che Sharma, former data scientist at Airbnb and founder of Eppo, discusses toxic behaviors in experimentation culture, A/B testing best practices, and A/B testing for ML models on The Data Scientist Show. Topics include statistical power, effect size, monitoring metrics, alternative methods to A/B testing, difference in differences method, and A/B testing in ML and AI.
A/B testing should be the preferred method for comparing different versions of a product or service, while difference-in-differences should be used when A/B testing is not feasible.
Toxic behaviors in experimentation, such as cutting corners and engaging in statistical theater, compromise the integrity of the experimentation process. Overcoming them requires the right data, good hypotheses, and proper statistical application.
Small companies with limited data should avoid running experiments: with low statistical power, real effects are often missed, and a larger share of the significant results that do appear are false positives. Selecting metrics that are volatile but not impactful can also lead to misleading results.
Deep dives
The Importance of A/B Testing
A/B testing is seen as a more trustworthy method than difference-in-differences, especially when it comes to comparing different versions of a product or service. A/B testing should be the preferred method whenever possible, and difference-in-differences should be reserved for situations where A/B testing is not feasible.
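For readers unfamiliar with difference-in-differences, here is a minimal sketch of the basic two-group, two-period estimate. It is an illustration and not the method described in the episode: the function name and the sample numbers are hypothetical, and the estimate is only meaningful under the assumption that both groups would have followed parallel trends without the change.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences estimate from four lists of observations.

    Assumes (hypothetically) that treated and control groups would have
    moved in parallel if the change had never shipped.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Subtract the control group's change to remove the shared trend.
    return treated_change - control_change


# Made-up weekly values for markets with and without a launch.
estimate = diff_in_diff(
    treated_pre=[10.0, 12.0], treated_post=[14.0, 16.0],
    control_pre=[9.0, 11.0], control_post=[10.0, 12.0],
)
print(estimate)
```

Because the groups are not randomly assigned, the result rests on the parallel-trends assumption, which is one reason a randomized A/B test is treated as more trustworthy when it is available.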
Toxic Behaviors in Experimentation Culture
Toxic behaviors in experimentation include cutting corners and engaging in statistical theater. Cutting corners compromises the integrity of the experiment process and destroys the experiment culture over time. Removing statistical theater requires using the right data, testing good hypotheses, and applying statistics properly.
Understanding Statistical Power and Experiment Design
Statistical power is the ability of an experiment to detect a true effect. It depends on the variability of the measured data, the size of the effect, and the sample size. Small companies with limited data points should avoid running experiments: with low statistical power, real effects are often missed, and a larger share of the significant results that do appear are false positives. Additionally, choosing metrics that are volatile but not impactful for the feature can lead to misleading results.
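To make the link between sample size and power concrete, here is a rough sketch of a sample-size estimate for a conversion-rate test. It uses the standard normal-approximation formula for a two-sided two-proportion test. The function name, the default significance level and power, and the example rates are assumptions for illustration, not figures from the episode.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each arm to detect a relative lift.

    Normal-approximation formula for comparing two proportions;
    alpha and power defaults are common conventions, not requirements.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)


# Hypothetical scenario: 5% baseline conversion, hoping to detect a 10% relative lift.
needed = sample_size_per_arm(baseline_rate=0.05, relative_lift=0.10)
print(needed)
```

Under these assumed inputs the requirement runs to tens of thousands of users per arm, and halving the detectable lift roughly quadruples it, which is why a product with little traffic can struggle to run a well-powered test.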
Importance of Guardrail Metrics
Guardrail metrics are discussed as an important tool for continuous monitoring during experiments. These metrics act as early warning signals that can proactively notify experimenters if a specific metric they care about is trending downwards. By paying attention to guardrail metrics, experimenters can decide whether to extend the experiment to address concerning effects on those metrics, or modify their approach.
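A guardrail check can be sketched as a one-sided test for whether a rate-style metric is lower in treatment than in control. This is a simplified illustration under stated assumptions: the function name, the counts, and the alert threshold are hypothetical, and it does not reflect how any particular experimentation platform implements monitoring.

```python
from math import sqrt
from statistics import NormalDist


def guardrail_breached(control_successes, control_n, treat_successes, treat_n, alpha=0.05):
    """Return True if the treatment rate is significantly below the control rate.

    One-sided pooled two-proportion z-test. Checking this repeatedly on the
    same experiment inflates false alarms unless the threshold is adjusted.
    """
    p_control = control_successes / control_n
    p_treat = treat_successes / treat_n
    pooled = (control_successes + treat_successes) / (control_n + treat_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    if se == 0:
        return False  # no variation observed, nothing to flag
    z = (p_treat - p_control) / se
    p_value = NormalDist().cdf(z)  # small when treatment is well below control
    return p_value < alpha


# Made-up counts: 10.0% vs 9.0% conversion on 10,000 users each.
print(guardrail_breached(1000, 10000, 900, 10000))
```

A check like this is only an early warning. Whether to extend, stop, or modify the experiment remains a judgment call for the experimenter.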
Challenges of Cherry Picking Metrics
The podcast highlights the importance of avoiding cherry picking metrics and focusing on the core metrics that matter. Cherry picking, or selectively choosing metrics that support a preconceived idea, can lead to biased and misleading interpretations. The discussion emphasizes the need for consistency and a standardized report template to ensure transparency and trust in the experiment results. It also highlights the importance of building a strong relationship between data scientists and product managers to push back against cherry picking and interpret results accurately.
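A short calculation shows why scanning many metrics for a winner is risky. The sketch below computes the chance of at least one false positive across several metrics, under the simplifying assumption that the metrics are independent and there is no true effect, along with a Bonferroni-adjusted threshold as one common correction. The function names and the choice of 20 metrics are illustrative.

```python
def chance_of_any_false_positive(alpha, num_metrics):
    """Probability that at least one of several independent null metrics looks significant."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_threshold(alpha, num_metrics):
    """Per-metric significance threshold that keeps the overall error rate at or below alpha."""
    return alpha / num_metrics


# Hypothetical report with 20 metrics, each tested at the 5% level.
risk = chance_of_any_false_positive(alpha=0.05, num_metrics=20)
print(round(risk, 3))
print(bonferroni_threshold(alpha=0.05, num_metrics=20))
```

Under these assumptions, roughly two reports in three would contain at least one spuriously significant metric, which is the argument for fixing the core metrics and the report template before the experiment starts.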
Che Sharma came back to discuss toxic behaviors in experimentation culture and to give actionable advice on how to handle those situations and how to maintain rigor and integrity when designing and analyzing A/B tests.
Che was the 4th data scientist at Airbnb and later joined Webflow as an early employee. In 2021 he founded Eppo, a next-gen A/B experimentation platform designed for modern data and product teams to run more trustworthy and advanced experiments. We talked about A/B testing best practices, A/B testing for ML models, and Che’s career journey.
Reach out to Che: https://www.linkedin.com/in/chetanvsharma/