In data analysis, it is crucial to recognize that correlation does not necessarily imply causation. Assuming that because two variables are correlated, one must cause the other is a classic logical fallacy: a third factor may be driving both variables, or the relationship may be purely coincidental. Additional evidence, such as experimental data or a strong theoretical rationale, is essential to establish causation.
Moving beyond correlation involves delving deeper into the underlying mechanisms and relationships that govern phenomena, seeking to understand the true drivers of observed patterns. By embracing this approach, analysts can uncover valuable insights that transcend surface-level correlations, enabling more informed decision-making and the development of robust predictive models.
Figure 1: The xkcd comic humorously explores the classic confusion between causation and correlation, emphasizing the significance of background knowledge in making causal inferences. https://xkcd.com/552/
Introduction
In this blog, we explore why "correlation does not equal causation," delve into the basics of causal inference, discuss how causal relationships can be discovered, and showcase how AI research benefits from embracing causality. As causal inference gains momentum in industry, major players are recognizing its pivotal role and investing in developing causal data science skills among their personnel.
From a marketing perspective, picture this: a company launches a groundbreaking advertising campaign and witnesses a surge in sales. It's tempting to attribute the spike directly to the campaign's brilliance. But are we jumping to conclusions too quickly? Without causal inference, we are left in the dark, unable to discern whether the campaign truly ignited the sales boost or whether other forces lurk behind the scenes: seasonal shifts, competitor maneuvers, or evolving consumer tastes. Causal inference techniques guide analysts through this uncertainty, disentangling the web of influences and delivering clear insights into the campaign's genuine impact on sales.
Figure 2: Follow the Customer Journey
Causal Inference
Classical machine learning aims to minimize prediction error, striving for accurate models. This objective, easily understood by all, has propelled ML research through competitions across various domains. Unlike prediction tasks, causal inference lacks a clear, objective evaluation criterion due to its deeper focus on understanding underlying causal relationships.
Causal inference poses greater challenges than optimizing a loss function, because context-specific domain knowledge becomes pivotal. Benchmarking model estimates against actual experiments, although rare in practice, offers retrospective insight into performance. Without a straightforward accuracy criterion, however, assessing the quality of causal estimates is intricate. Valid causal inferences hinge on assumptions that cannot be tested from the available data alone, prompting a fundamental reevaluation of how data science and ML problems are approached.
Indeed, a fundamental theoretical framework shedding light on the challenges of causal data science is the Pearl causal hierarchy (PCH), also known as the ladder of causation. This hierarchy categorizes data analysis into three distinct layers of an information hierarchy.
At the lowest rung are associations: simple conditional probability statements between variables in the data. They remain purely correlational ("How does X relate to Y?") and therefore carry no causal meaning. The second rung concerns interventions ("What happens to Y if I manipulate X?"); here we have already entered the world of causality. On the third rung sit counterfactuals ("What would Y have been had X been x, given what actually happened?"), the highest form of causal reasoning.
Figure 3: The Causal Hierarchy. Questions at level i can only be answered if information from level i or higher is available. Table from Pearl (2018).
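In Pearl's notation, the three rungs correspond to three kinds of queries:

```latex
% Rung 1, association: what does observing X tell us about Y?
P(y \mid x)
% Rung 2, intervention: what happens to Y if we set X to x?
P(y \mid \mathrm{do}(x))
% Rung 3, counterfactual: what would Y have been under X = x,
% given that we actually observed X = x' and Y = y'?
P(y_x \mid x', y')
```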
The theoretical demands of causal inference outlined by the PCH necessitate a paradigm shift in data science, bringing forth significant organizational hurdles. Engaging domain experts, including clients, engineers, and sales partners, becomes imperative to validate assumptions and ensure the accuracy of problem modeling. This collaborative approach fosters a more comprehensive perspective on data science and prompts the restructuring of team dynamics, heralding a more integrated and holistic approach to tackling complex challenges.
Causal Discovery
In scientific inquiry, uncovering causal relationships is a foundational pursuit. One commonly employed method is the randomized A/B experiment. Consider assessing the efficacy of a novel cancer treatment: researchers recruit participants and randomly assign them to either a control group, receiving a placebo, or a treatment group, receiving the experimental drug. Randomization mitigates potential confounding factors. Age, for instance, could act as a confounder, influencing both the likelihood of receiving the drug and the outcome itself; random assignment keeps the age distributions of the two groups comparable in expectation.
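To make the role of randomization concrete, here is a small simulation with made-up numbers: age drives both drug uptake and the outcome in the observational setting, biasing the naive comparison, while random assignment recovers the true effect.

```python
# A minimal sketch (hypothetical numbers) of why randomization matters.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
age = rng.uniform(20, 80, n)
true_effect = 5.0  # assumed treatment effect on the outcome

# Observational setting: older patients are more likely to take the drug.
treated_obs = rng.random(n) < (age - 20) / 60
# Outcome worsens with age and improves with treatment.
outcome = 100 - 0.5 * age + true_effect * treated_obs + rng.normal(0, 5, n)
naive = outcome[treated_obs].mean() - outcome[~treated_obs].mean()

# Randomized setting: a coin flip decides treatment, independent of age.
treated_rct = rng.random(n) < 0.5
outcome_rct = 100 - 0.5 * age + true_effect * treated_rct + rng.normal(0, 5, n)
rct = outcome_rct[treated_rct].mean() - outcome_rct[~treated_rct].mean()

print(f"Naive observational estimate: {naive:.2f}")  # biased away from 5
print(f"Randomized estimate:          {rct:.2f}")    # close to 5
```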
Nevertheless, randomized experiments can prove prohibitively expensive and logistically challenging to execute, occasionally raising ethical concerns. Consequently, causal discovery from observational data, a field gaining traction across machine learning, philosophy, statistics, and computer science, has emerged as a compelling alternative. This approach involves inferring causal relationships directly from observational data, circumventing the need for costly and potentially ethically fraught randomized experiments.
Figure 4: Causal Discovery Based on Observational Data and Process Knowledge in Industrial Processes
Quasi-Experiments
Quasi-experiments offer a practical alternative to randomized experiments by leveraging naturally occurring processes to divide treatment and control groups. While lacking true randomness, these methods are deemed close enough to random assignment to yield reliable estimates. Various techniques are used to assess how comparable the control and test groups are, albeit under several underlying assumptions.
Before delving into methods, it is crucial to understand the pitfalls of quasi-experiments. The main concern is unanticipated confounders that distort the measured effect and undermine the attribution of causality. Quasi-experiments therefore demand careful execution to produce trustworthy results.
Interrupted Time Series
Interrupted Time Series (ITS) serves as a quasi-experimental design, allowing control over changes within a system but lacking the ability to randomize treatment for a proper control group. Instead, the same population serves as both control and treatment, with variations introduced over time.
A straightforward approach to conducting ITS involves implementing interventions for specific periods, followed by cessation periods without intervention.
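Here is a minimal sketch of an ITS analysis on synthetic data, using segmented regression with a level-shift dummy at an assumed intervention point:

```python
# A minimal ITS sketch: fit a segmented regression and read off the
# coefficient on the post-intervention dummy (the estimated level shift).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.arange(100)
intervention = 60  # assumed start of the intervention period
post = (t >= intervention).astype(float)

# Synthetic series: trend plus a true level shift of +8 after the intervention.
y = 50 + 0.3 * t + 8.0 * post + rng.normal(0, 2, t.size)

# Design matrix: intercept, pre-existing trend, and the post-intervention dummy.
X = sm.add_constant(np.column_stack([t, post]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # the third coefficient estimates the level shift (~8)
```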
Figure 5: A good read on how Netflix uses ITS
Difference-in-Differences
To apply difference-in-differences (DiD) effectively, you must identify a control group whose metric of interest trends parallel to the treatment group's before any intervention occurs (the parallel-trends assumption). After the treatment is introduced, any deviation from this parallel trend is attributed to the treatment itself.
Figure 6: Explaining the DiD method of quasi-experiments
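For intuition, the DiD estimate itself is simple arithmetic on four group means. A minimal sketch with made-up numbers:

```python
# Under the parallel-trends assumption, the treatment effect is the change
# in the treated group minus the change in the control group.
treated_pre, treated_post = 100.0, 120.0  # hypothetical averages
control_pre, control_post = 90.0, 100.0

did = (treated_post - treated_pre) - (control_post - control_pre)
print(f"DiD estimate of the treatment effect: {did:.1f}")  # 20 - 10 = 10

# Equivalently, this is the coefficient on the treated-times-post interaction
# in a regression of the outcome on treated, post, and their product.
```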
Bayesian Structural Time Series
BSTS is a Bayesian approach to fitting "structural time series" models. A structural time series is a member of the very broad class of state-space models, which treat a time series as noisy observations of a hidden state that evolves over time. Specifically, the model states that the observations y_t that make up the time series are a function of a hidden state α_t, which evolves according to a Markov process (one in which each state depends only on the previous state).
Figure 7: State-space model
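Written out, the two equations of the state-space formulation (following the notation of Brodersen et al.'s CausalImpact paper) are:

```latex
% Observation equation: y_t is a noisy linear readout of the hidden state.
y_t = Z_t^{\top} \alpha_t + \varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2)

% State equation: the hidden state evolves as a Markov process.
\alpha_{t+1} = T_t \alpha_t + R_t \eta_t,
  \qquad \eta_t \sim \mathcal{N}(0, Q_t)
```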
The main goal of the algorithm is to infer the effect a given intervention (or any action) had on some response variable by analyzing differences between the expected and observed time series. The approach is flexible enough to model a wide range of scenarios while providing a Bayesian framework in which our assumptions about the world are encoded as priors.
The Bayesian approach in BSTS offers several advantages through prior information and posterior inference. Unlike single point estimates, the Bayesian treatment represents parameters as probability distributions, offering a comprehensive view of the solution space and quantifying uncertainty: if many parameter settings are nearly as likely as the "best" one, caution is warranted in interpretation. Moreover, BSTS allows pre-existing knowledge and uncertainty to be incorporated into estimates through specified prior probabilities.
Apart from raw performance, one significant advantage of the BSTS approach lies in its modular nature. The ability to deconstruct the components influencing our outcomes separately is invaluable. Although this may not directly enhance predictive performance, its utility should not be underestimated: components can be added to or removed from the analysis with ease. While adding multiple seasonalities to an ARIMA model is a significant challenge, incorporating them (and more) into BSTS is straightforward.
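As an illustration of this modularity, here is a sketch using TensorFlow Probability's sts module, on which tfcausalimpact's model building is based; the particular components and seasonal settings are hypothetical:

```python
# Components are composed with sts.Sum, so adding a weekly and a yearly
# season is just two more list entries alongside the trend component.
import tensorflow_probability as tfp

sts = tfp.sts

def build_model(observed):
    # Hypothetical composition: local trend + weekly + yearly seasonality.
    trend = sts.LocalLinearTrend(observed_time_series=observed)
    weekly = sts.Seasonal(num_seasons=7, observed_time_series=observed)
    yearly = sts.Seasonal(num_seasons=12, num_steps_per_season=30,
                          observed_time_series=observed)
    return sts.Sum([trend, weekly, yearly], observed_time_series=observed)
```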
Implementation
Google's Causal Impact library provides a very straightforward implementation of a structural time series model that estimates the effect of a "designed" intervention on a target time series. The effect is measured by analyzing differences between expected and observed behavior: the model generates a counterfactual forecast, i.e., how the dependent variable would have evolved after the event had the event not occurred.
Figure 8: tfcausalimpact Implementation
Originally developed as an R package, Causal Impact works by fitting a BSTS model to a set of target and control time series and then performing posterior inference on the counterfactual. For Python users, the tfcausalimpact package provides a powerful equivalent: it fits a Bayesian structural model to historical data and assesses forecast accuracy against the actual responses. Let's look at a practical example of its usage.
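Below is a minimal sketch of a tfcausalimpact analysis. The file name, column layout, dates, and prior value are all illustrative; the package expects the response in the first DataFrame column, with any remaining columns treated as control covariates.

```python
# pip install tfcausalimpact
import pandas as pd
from causalimpact import CausalImpact

# Hypothetical dataset: first column is the target (e.g. sales), the rest
# are control series unaffected by the intervention.
df = pd.read_csv("sales.csv", index_col=0, parse_dates=True)
pre_period = ["2023-01-01", "2023-06-30"]   # before the intervention
post_period = ["2023-07-01", "2023-09-30"]  # after the intervention

# Optional model_args encode prior beliefs; prior_level_sd controls how
# much local-level variation the model expects a priori (0.01 is the default).
ci = CausalImpact(df, pre_period, post_period,
                  model_args={"prior_level_sd": 0.01})

print(ci.summary())                 # point estimates and credible intervals
print(ci.summary(output="report"))  # plain-language narrative of the analysis
ci.plot()                           # observed vs. counterfactual forecast
```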
Enhance your decision-making with causal impact analysis! Uncover actionable insights through advanced data science services. Reach out today!
Author: Shreyas Kale, Associate Manager, Data Science