Causal inference is a set of methods, increasingly used alongside machine learning, that helps teams understand causes and impacts so they can make better decisions. If this sounds like a foreign language to you, you’re not alone. Causal inference is only now starting to move outside the world of academics and research scientists and become a relevant asset for businesses.
By Adam Kinney
Throughout my time at Google, Twitter, and Schibsted, every team incorporated machine learning differently. At Mixpanel, we’re constantly trying to unlock deeper insights for our customers through user behavior data. That data typically offers a partial view, but not always the full story, depending on how much data customers have to work with. We’re exploring how various elements of machine learning can help break through confusing, or even conflicting, observational data and give insights that drive businesses forward.
In my experience, I see two key scenarios where companies can make use of causal inference — the planning side and the impact assessment side.
Learning from existing data
Most companies have high-level objectives like growing the user base, reducing customer churn, or increasing conversions. But it’s difficult to know what to change in your product or marketing to achieve them. Typically, companies will try out different approaches and see what works best. This is expensive and time-consuming, because each experiment involves development time, marketing spend, or both. It would be much better to learn from the data you have already collected to see which areas of your product or marketing are most likely to get you to your goals. This is the planning side of causal inference: it helps you zero in on the most important areas so you focus your efforts in the right place.
After you’ve built a new feature, you need to know if it actually got you closer to your goal. This isn’t always easy. High-level goals tend to be very difficult to move in dramatic ways. You’re more likely to succeed with small, incremental increases over time. But each of those small incremental increases can get lost in the noise of day-to-day fluctuations in your KPIs. Causal inference helps you understand if the newly launched feature causes users to behave in a way that will get you closer to your goal. For example, you can see if that new email digest causes users to churn less. Of course, you can get answers like this from A/B tests as well, but A/B tests themselves take time and engineering work to run for many product features.
Let’s say you run a digital media site that has a subscription paywall. You want to reduce the number of subscribers who cancel their subscription. You start by comparing users who cancel with those who don’t. You find that users who read at least three stories a day cancel less often than users who read fewer than three. Does that mean that if you get more users to read at least three stories per day, cancellations will decrease? Not necessarily. It could be the case, for example, that the users who are most passionate about the media site like to read a lot of stories and are also less likely to cancel than other users. In that case, reading more stories doesn’t cause users to be less likely to cancel, even though reading more stories is correlated with being less likely to cancel.
Let’s say you also find that users who sign up for the media site’s email newsletter cancel less often than those who don’t. It is possible that there’s no causal relationship there, and it’s once again just the super-users who are both signing up for the newsletter and keeping their subscriptions. On the other hand, maybe the newsletter is a daily reminder for users of the value they get from their subscription, and it actually causes users to be less likely to cancel. In that case, getting more users to sign up for the newsletter really would work.
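To make the two stories concrete, here is a small simulation, offered as a sketch rather than a model of any real site. Every number in it is an assumption: a hidden `engagement` score stands in for the super-user effect, reading three stories a day is given no causal effect on cancellation, and the newsletter is given a real one (10 percentage points). The point is that a simple comparison of cancel rates comes out negative for both behaviors, so the comparison alone cannot tell you which one is worth investing in.

```python
import random

def simulate(n=20000, seed=0):
    """Simulated subscribers; all probabilities here are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.random()                # hidden "super-user" score in [0, 1)
        reads_three = rng.random() < engagement  # engaged users read more
        newsletter = rng.random() < engagement   # engaged users also sign up more
        p_cancel = 0.5 - 0.3 * engagement        # engagement itself lowers cancellation
        if newsletter:
            p_cancel -= 0.1                      # assumed true causal effect of the newsletter
        # note: reads_three never enters p_cancel, so it has no causal effect
        cancels = rng.random() < p_cancel
        rows.append({"reads_three": reads_three,
                     "newsletter": newsletter,
                     "cancels": cancels})
    return rows

def cancel_gap(rows, behavior):
    """Cancel rate of users who did the behavior minus those who did not."""
    did = [r["cancels"] for r in rows if r[behavior]]
    did_not = [r["cancels"] for r in rows if not r[behavior]]
    return sum(did) / len(did) - sum(did_not) / len(did_not)

rows = simulate()
reading_gap = cancel_gap(rows, "reads_three")
newsletter_gap = cancel_gap(rows, "newsletter")
print(f"reading three stories: {reading_gap:+.3f}")
print(f"newsletter signup:     {newsletter_gap:+.3f}")
```

Both gaps are clearly negative, even though only the newsletter actually changes anyone's behavior in this simulated world.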
Causal inference is a technique that can uncover whether the causal explanation is true in scenarios like these. It generally works by controlling for confounding factors. In the first example of users reading three stories a day, you can control for the super-user confounder by predicting which users are most likely and least likely to read three stories per day (we call this prediction the user’s propensity to read three stories per day). You then group users with similar propensities together and, within each propensity group, check whether cancellations differ between those who really do read three stories per day and those who don’t. If the difference disappears once you compare like with like, the original correlation was driven by the confounder rather than by reading itself.
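The propensity idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a description of how any particular product implements it. The data is simulated, the `engagement` feature is assumed to be observed (in practice you would use whatever behavioral signals you have), reading is given no true effect on cancellation, and the propensity model is a one-feature logistic regression fit with simple gradient descent. A real project would more likely use a statistics or machine learning library for that step.

```python
import math
import random

def simulate_users(n=20000, seed=1):
    """Rows of (engagement, reads_three, cancels); reading has no true effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        engagement = rng.random()
        reads_three = 1 if rng.random() < engagement else 0
        cancels = 1 if rng.random() < 0.5 - 0.4 * engagement else 0
        rows.append((engagement, reads_three, cancels))
    return rows

def fit_propensity(rows, steps=150, lr=1.0):
    """Estimate P(reads_three | engagement) with a one-feature logistic regression."""
    w, b, n = 0.0, 0.0, len(rows)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t, _ in rows:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - t) * x
            grad_b += p - t
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))

def rate_gap(group):
    """Cancel rate of readers minus non-readers, or None if one side is empty."""
    readers = [c for _, t, c in group if t == 1]
    others = [c for _, t, c in group if t == 0]
    if not readers or not others:
        return None
    return sum(readers) / len(readers) - sum(others) / len(others)

def stratified_gap(rows, propensity, n_strata=5):
    """Average the within-group gaps across equal-sized propensity groups."""
    scored = sorted(rows, key=lambda r: propensity(r[0]))
    size = len(scored) // n_strata
    total, weight = 0.0, 0
    for k in range(n_strata):
        end = len(scored) if k == n_strata - 1 else (k + 1) * size
        group = scored[k * size:end]
        gap = rate_gap(group)
        if gap is not None:  # skip groups with no one to compare against
            total += gap * len(group)
            weight += len(group)
    return total / weight

rows = simulate_users()
naive = rate_gap(rows)
adjusted = stratified_gap(rows, fit_propensity(rows))
print(f"naive gap:    {naive:+.3f}")
print(f"adjusted gap: {adjusted:+.3f}")
```

In this simulated data the naive gap is large and negative, while the gap within propensity groups shrinks toward zero, which matches the fact that reading was given no causal effect. The adjustment only works for confounders that the propensity model can actually see, so the choice of input features matters a great deal.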
As causal inference becomes more and more popular for businesses, it’s important to tap into tools that are already using it in order to help your teams innovate and grow more quickly.
Adam Kinney is Head of Machine Learning and Automated Insights at Mixpanel.