A Quick Intro to Decision Trees
Decision trees are popular algorithms used for both classification and regression tasks.
The idea of a decision tree is to split the dataset into smaller and smaller subsets based on feature values until reaching nodes where all data points share a single label. A decision tree consists of the following components:
- Root Node - The starting point containing the entire dataset; the root node acts as the parent of all subsequent nodes in the tree.
- Terminal Nodes - The bottommost nodes, also known as leaf nodes.
- Splitting Criterion - The algorithm/logic used to select the best feature to split the data at each node, e.g., Gini impurity or entropy.
- Decision Nodes - Also called internal nodes, these are split into multiple sub-nodes based on the splitting criterion.
CHAID Trees
CHAID (Chi-square Automatic Interaction Detection) trees were proposed by Gordon Kass in 1980. As the name suggests, the basic criterion for recursively splitting on the independent features is the Pearson chi-square statistic.
CHAID trees work like other decision trees, except the splitting criterion is the chi-square statistic instead of Gini impurity or information gain. Additionally, CHAID differs in that it performs multiway splits, dividing a node into multiple decision and leaf nodes.
Algorithm
- The outcome variable can be continuous or categorical, but the predictor variables must be categorical, which produces non-binary trees; that is, a node can have more than two splits. A workaround for continuous predictor variables is to bin them into categories before feeding the features into the algorithm.
- At each step, the CHAID algorithm tests the dependence between a predictor and the outcome variable. If the dependent variable is categorical, a chi-square test is used to determine the best next split at each node; if it is continuous, an F-test is used instead.
- CHAID works recursively by dividing the dataset into subsets according to the categorical predictor that exhibits the most significant association with the response variable. At every iteration, CHAID computes the chi-square test of independence between the response variable and each categorical predictor, selects the predictor with the strongest association (the smallest p-value) as the split variable, and partitions the data into subsets based on its categories. This process repeats until the predefined stopping criteria are satisfied (see the sketch after this list).
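To make the split-selection step concrete, here is a minimal, illustrative sketch; it is not the actual CHAID implementation (which also merges similar categories and applies a Bonferroni adjustment), and the dataset and column names are made up. It bins a continuous predictor and then picks the predictor whose chi-square test of independence against the response has the smallest p-value.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: two predictors and a categorical response.
df = pd.DataFrame({
    "region":  ["north", "south", "south", "east", "north", "east", "south", "north"],
    "age":     [23, 45, 31, 52, 37, 29, 61, 40],   # continuous predictor
    "churned": ["yes", "no", "yes", "no", "no", "yes", "no", "no"],
})

# Workaround for continuous predictors: bin them into categories first.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 45, 100], labels=["<=30", "31-45", "46+"])

def best_split(data, predictors, response):
    """Return the predictor whose chi-square test against the response is most significant."""
    p_values = {}
    for col in predictors:
        contingency = pd.crosstab(data[col], data[response])
        _, p_value, _, _ = chi2_contingency(contingency)
        p_values[col] = p_value
    return min(p_values, key=p_values.get), p_values

split_var, p_values = best_split(df, ["region", "age_band"], "churned")
print(split_var, p_values)  # the chosen split variable has the smallest p-value
```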
Why CHAID Trees?
The CHAID algorithm finds extensive application in fields like market research and social sciences, where the analysis of interactions is crucial. Its ability to handle categorical predictors and identify significant interactions makes it a valuable tool in these domains.
- CHAID uses multiway splits by default (a multiway split divides the current node into more than two child nodes). In contrast, other decision tree algorithms such as CART perform binary splits by default (each node is split into exactly two child nodes).
- Trees built using the CHAID algorithm are generally less prone to overfitting: a node is split only if it meets a specific statistical significance criterion, which keeps the model compact and reliable.
Implementation
- Let's read the Titanic dataset from Seaborn and, to speed up training, keep only categorical features so the tree can be built quickly.
- We will use the plot_tree function from sklearn to analyze how our decision tree is built (see the sketch after this list).
- Furthermore, we notice that the tree's depth is 5+, and features are repeated across splits in sub-branches to further divide the data based on different values of a categorical feature (such as the "embark_town" variable in our case). For instance, "embark_town" is split on multiple times, at both level 1 and level 2 of the tree. Since the tree is binary, its depth (and hence complexity) increases as the cardinality of the categorical features increases.
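The snippet below is a minimal sketch of these steps. Since sklearn trees require numeric input, the categorical columns are label-encoded with OrdinalEncoder; the feature subset and encoding choice are illustrative and may differ from other preprocessing setups.

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the Titanic dataset and keep only categorical predictors.
titanic = sns.load_dataset("titanic")
features = ["sex", "embark_town", "class", "who", "alone"]
data = titanic[features + ["survived"]].dropna()

# sklearn trees need numeric input, so label-encode the categorical columns.
X = OrdinalEncoder().fit_transform(data[features].astype(str))
y = data["survived"]

clf = DecisionTreeClassifier(random_state=42).fit(X, y)

# Visualize the (binary) tree; features such as embark_town reappear at
# several levels as the tree grows deeper.
plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=features, class_names=["died", "survived"], filled=True)
plt.show()
```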
In certain instances, this may also lead our decision tree to overfit the data, making it challenging to understand which features are most prominent and significant from a business and marketing standpoint.
To better visualize feature importance and overcome the problems above, the CHAID tree comes to our rescue. Let's look at how we can construct one.
We have used the to_graphviz() method to convert the tree into digraph format.
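As a rough sketch, assuming the open-source CHAID package (pip install CHAID) and its Tree.from_pandas_df constructor, the tree can be fitted as shown below; the alpha_merge and max_depth values are illustrative choices. print_tree() produces the text representation that follows, and the to_graphviz() export mentioned above yields the graphical digraph.

```python
import seaborn as sns
from CHAID import Tree

# Same categorical subset of the Titanic data as before.
titanic = sns.load_dataset("titanic")
features = ["sex", "embark_town", "class", "who", "alone"]
data = titanic[features + ["survived"]].dropna().copy()
for col in features:
    data[col] = data[col].astype(str)   # plain string categories for CHAID

# Fit a CHAID tree: every predictor is declared nominal, 'survived' is the
# categorical dependent variable, and each split must pass the chi-square test.
tree = Tree.from_pandas_df(
    data,
    {col: "nominal" for col in features},
    "survived",
    alpha_merge=0.05,   # significance level used when merging categories
    max_depth=3,        # illustrative depth limit
)

tree.print_tree()       # text representation of the multiway tree
```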
Text Representation of CHAID tree
Graphical Representation
Figure: Digraph with multiway splits on the root node
It's evident from the CHAID tree's construction, and from the differences with the decision tree built from the same data, that the CHAID tree tends to be simpler with multiway splits, making it less susceptible to overfitting. This characteristic becomes particularly apparent when a node has more than two child nodes. Additionally, when a split is not statistically significant (its p-value exceeds the chosen alpha), the tree stops splitting at that node.
Furthermore, the features positioned at the top of the CHAID tree hold higher importance and can significantly help the decision-making process.
Pros
- CHAID produces wider trees with multiple branches per node; in contrast, other decision trees are binary and can show only two outcomes at each split.
- CHAID produces easily interpretable results, offers insights into the algorithm's decision-making process, and helps in understanding the predictive power of the features; it can therefore be used in market segmentation, brand tracking, etc.
Cons
- It may not always yield satisfactory predictive performance; other algorithms may outperform it on a given dataset.
Conclusion
The CHAID tree offers a straightforward tool for analyzing features and identifying statistically significant predictors, enhancing our understanding of how the tree is constructed.
However, careful consideration should be given to selecting the algorithm for a specific use case based on the nuances of the dataset, as CHAID may occasionally underperform in specific scenarios. This emphasizes the importance of thoughtful evaluation and comparison with alternative methods.
AUTHOR
Vishal Sachan Singh
Associate Manager, Data Science