Leveraging CHAID Decision Trees in Blackbox Models for Enhanced Explanations

Data Science

Date: 03/18/2024

Discover how CHAID Decision Trees enhance black box model explanations, offering intuitive and scalable solutions for data-driven decision-making.

Vishal Sachan Singh
Associate Manager, Data Science

A Quick Intro to Decision Trees

Decision trees are popular algorithms used for both classification and regression tasks.

The idea behind a decision tree is to recursively split the dataset into smaller subsets based on feature values until reaching nodes where all data points share a single label (or another stopping criterion is met). A tree consists of the following components:

  1. Root Node- This represents the starting point holding the entire dataset; the root node acts as the parent to all subsequent nodes within the tree.
  2. Decision Nodes- These internal nodes are split into multiple sub-nodes based on the splitting criterion.
  3. Terminal Nodes- These represent the bottommost nodes, where no further splitting occurs; they are also known as leaf nodes.
  4. Splitting Criterion- This is the algorithm/logic used to select the best feature to split the data at each node, e.g., Gini impurity or entropy.

CHAID Trees

CHAID (Chi-squared Automatic Interaction Detection) trees were proposed by Gordon Kass in 1980. As the name suggests, the basic criterion for recursively splitting the independent features is the Pearson chi-square statistic.

CHAID trees work like other decision trees, except that the splitting criterion is the chi-square statistic instead of entropy or Gini impurity. Additionally, CHAID differs in that it performs multiway splits, dividing a feature into multiple decision and leaf nodes.

Algorithm

  • The outcome variable can be continuous or categorical, but the predictor variables must be categorical, producing non-binary trees; that is, a node can have more than two splits. A workaround for continuous predictor variables is to bin them before feeding the features into the algorithm.
  • At each step, the CHAID algorithm tests the dependence between a predictor and the outcome. If the dependent variable is categorical, a chi-square test is used to determine the best next split at each node; if it is continuous, an F-test is used instead.
  • CHAID works recursively by dividing the dataset into subsets according to the categorical predictor that exhibits the most significant association with the response variable. At every iteration, CHAID computes the chi-squared test of independence between the response variable and each categorical predictor, selects the variable showing the strongest association as the split variable, and partitions the data into subsets based on its categories. This iterative process continues until the predefined stopping criteria are satisfied, as sketched below.
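To make the selection step concrete, here is a minimal sketch of how a chi-squared test can pick the split variable. This is an illustration, not the library's internals; `best_chaid_split` is a hypothetical helper name.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def best_chaid_split(df: pd.DataFrame, predictors: list, response: str):
    """Return (predictor, p_value) with the most significant association."""
    best_col, best_p = None, 1.0
    for col in predictors:
        # Cross-tabulate the predictor against the response and run a
        # Pearson chi-squared test of independence on the contingency table.
        contingency = pd.crosstab(df[col], df[response])
        _, p_value, _, _ = chi2_contingency(contingency)
        if p_value < best_p:
            best_col, best_p = col, p_value
    return best_col, best_p

# Continuous predictors must be binned first, e.g. with quantile binning:
# df["age_bin"] = pd.qcut(df["age"], q=4)
```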

Why CHAID Trees?

The CHAID algorithm finds extensive application in fields like market research and social sciences, where the analysis of interactions is crucial. Its ability to handle categorical predictors and identify significant interactions makes it a valuable tool in these domains.

  • CHAID uses multiway splits by default (a multiway split divides the current node into more than two child nodes). In contrast, most other decision tree algorithms, such as CART, perform binary splits by default, dividing each node into exactly two child nodes.
  • Trees built using the CHAID algorithm generally resist overfitting: a node is split only if the split meets a statistical significance criterion, which keeps the model robust and reliable, as sketched below.
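This stopping rule can be sketched in a few lines, building on the hypothetical `best_chaid_split` helper above (`ALPHA` is the chosen significance level):

```python
ALPHA = 0.05  # significance threshold; splits with larger p-values are rejected

def grow_or_stop(df, predictors, response):
    """Decide whether a node should split further or become a leaf."""
    col, p_value = best_chaid_split(df, predictors, response)
    if col is None or p_value > ALPHA:
        return None  # no statistically significant split: make this node a leaf
    return col       # otherwise, split the node on `col`
```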

Implementation

  • Let's read the Titanic dataset from Seaborn, and to speed up the training, we'll keep only categorical features to build the tree quickly (a runnable sketch follows this list).

  • We will be using the plot_tree method from sklearn to analyze how our decision tree is built. 

  • Furthermore, we notice that the tree's depth is 5+, and features are repeated across splits in sub-branches to further divide the data based on different values of a categorical feature (such as the "embark_town" variable in our case). For instance, the "embark_town" variable is split multiple times, at both level 1 and level 2 of the tree. Since it's a binary tree, the depth (complexity) increases as the data's cardinality increases.
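A minimal, runnable sketch of these steps, assuming the standard Seaborn Titanic columns (the exact feature subset and encoding choices here are illustrative):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

titanic = sns.load_dataset("titanic")

# Keep a handful of categorical predictors plus the binary target.
features = ["sex", "pclass", "embark_town", "who"]
df = titanic[features + ["survived"]].dropna()

# sklearn trees need numeric inputs, so encode the categories as integers.
X = OrdinalEncoder().fit_transform(df[features])
y = df["survived"]

clf = DecisionTreeClassifier(random_state=42).fit(X, y)
print("Tree depth:", clf.get_depth())  # the 5+ depth observed above

# Visualize the (binary) tree; truncate the plot for readability.
plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=features, class_names=["died", "survived"],
          filled=True, max_depth=3)
plt.show()
```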

In certain instances, this may also lead our decision tree to overfit the data, making it challenging to understand which features are more prominent and significant according to business and marketing needs.

To better visualize the importance of features and overcome the above problems, the CHAID tree comes to our rescue. Let's look at how we can construct the tree:
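A minimal sketch using the open-source CHAID package (pip install CHAID), reusing the same Titanic frame and feature list as above; the parameter values shown are illustrative assumptions:

```python
import seaborn as sns
from CHAID import Tree

titanic = sns.load_dataset("titanic")
features = ["sex", "pclass", "embark_town", "who"]
df = titanic[features + ["survived"]].dropna()

# CHAID needs each predictor's variable type ("nominal" or "ordinal").
chaid_tree = Tree.from_pandas_df(
    df,
    dict(zip(features, ["nominal"] * len(features))),  # predictor types
    "survived",                                        # dependent variable
    alpha_merge=0.05,  # splits/merges must pass this significance level
    max_depth=3,
)

chaid_tree.print_tree()  # text representation of the CHAID tree
```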

We then used the “to_graphviz()” method to convert the tree into digraph format.
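One route to such a digraph, assuming a recent version of the CHAID package, is its treelib export; the exact method names below are an assumption, not code shown in the original post:

```python
# CHAID can export its tree as a treelib.Tree, and treelib trees can emit
# DOT source for Graphviz rendering.
lib_tree = chaid_tree.to_tree()          # returns a treelib.Tree (assumed API)
lib_tree.to_graphviz("chaid_tree.dot")   # write DOT; render with e.g.:
#   dot -Tpng chaid_tree.dot -o chaid_tree.png
```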

Text Representation of CHAID tree

Graphical Representation

Figure: Digraph with multiway splits on the root node

It's evident from the CHAID tree's construction, and from the differences with the decision tree built on the same data, that the CHAID tree tends to be simpler with multiway splits, making it less susceptible to overfitting. This characteristic becomes particularly apparent when a node has more than two child nodes. Additionally, when a split is insignificant (with a p-value higher than the alpha value), the tree stops splitting at that node.
Furthermore, the features positioned at the top of the CHAID tree hold higher importance and can significantly help the decision-making process.

Pros

  • CHAID produces wider trees with multiple branches per node; in contrast, other decision trees are binary, so each split can show only two outcomes.
  • CHAID produces easily interpretable results, offers insight into the algorithm's decision-making process, helps in understanding the predictive power of the features, and can therefore be used in market segmentation, brand tracking, etc.

Cons

  • It may not always yield satisfactory results; other algorithms may outperform it, particularly on raw predictive accuracy.

Conclusion

The CHAID tree offers a straightforward tool for analyzing features and identifying statistically significant predictors, enhancing our understanding of the tree construction process.

However, the algorithm for a specific use case should be chosen carefully based on the nuances of the dataset, as CHAID may occasionally underperform in specific scenarios. This emphasizes the importance of thoughtful evaluation and comparison with alternative methods.
