From the course: Machine Learning and AI: Advanced Decision Trees with SPSS

Overview

- [Instructor] Let's take an overview of the QUEST algorithm. Now remember, we're focused on the algorithm, not any particular software implementation. QUEST is an acronym that stands for Quick, Unbiased, and Efficient Statistical Tree. So what were the co-authors thinking when they came up with this acronym? Well, a little bit of history is helpful. CHAID, Chi-square Automatic Interaction Detection, came out in 1980. CART, Classification And Regression Trees, came out in 1984. So those algorithms were well known when QUEST came out in 1997.

So what were they trying to improve upon? Well, a perceived weakness of CART was that it was slow. The reason is that CART examines all possible split points. And as we'll see, QUEST doesn't do it that way. Also, CHAID was perceived to be biased towards branches with a large number of child nodes. What this means is that CHAID would often gravitate towards categorical variables with lots of categories, or grow trees that were somewhat wider than those of other techniques.

So how does QUEST do it? QUEST uses statistical tests instead of a brute-force search over all possible cut points. So it examines fewer cut points, but it does so by performing calculations that try to zero in on what the optimal cut point would be. It also uses different tests appropriate to different variable types. So it uses a Chi-square test on categorical variables, but it uses an F-test on scale variables, as we'll see. Once those tests are performed, it can simply rank all of the variables in the data set by their p-values. Finally, QUEST uses surrogates for missing data, just like CART.
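
To make the ranking step a little more concrete, here is a minimal Python sketch of the idea: run a Chi-square test on each categorical predictor, an F-test on each scale predictor, and then sort the predictors by p-value. This is only an illustration under stated assumptions, not the QUEST implementation in any particular package. It assumes SciPy is available, the function names (`categorical_p_value`, `scale_p_value`, `rank_predictors`) and the toy data are hypothetical, and it leaves out any further adjustments the full algorithm makes as well as the later step of choosing the cut point.

```python
from collections import Counter

from scipy.stats import chi2_contingency, f_oneway


def categorical_p_value(predictor, target):
    """Chi-square test of independence between a categorical predictor and the target."""
    rows = sorted(set(predictor))
    cols = sorted(set(target))
    counts = Counter(zip(predictor, target))
    # Contingency table: one row per predictor category, one column per target class.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


def scale_p_value(predictor, target):
    """One-way ANOVA F-test: does the predictor's mean differ across target classes?"""
    groups = {}
    for x, y in zip(predictor, target):
        groups.setdefault(y, []).append(x)
    return f_oneway(*groups.values()).pvalue


def rank_predictors(data, target, categorical):
    """Return (name, p_value) pairs sorted from smallest to largest p-value."""
    p_values = {}
    for name, values in data.items():
        test = categorical_p_value if name in categorical else scale_p_value
        p_values[name] = test(values, target)
    return sorted(p_values.items(), key=lambda item: item[1])


# Hypothetical toy data: 12 cases, a two-class target, three predictors.
target = ["yes"] * 6 + ["no"] * 6
data = {
    # Scale variable that separates the two classes well.
    "income": [72, 68, 75, 80, 77, 70, 41, 45, 39, 50, 43, 47],
    # Categorical variable with no relationship to the target.
    "region": ["north", "south"] * 6,
    # Categorical variable with a moderate relationship to the target.
    "owns_home": ["y", "y", "y", "y", "n", "y", "n", "n", "y", "n", "n", "n"],
}

if __name__ == "__main__":
    for name, p in rank_predictors(data, target, categorical={"region", "owns_home"}):
        print(f"{name}: p = {p:.4g}")
```

On this toy data, `income` ranks first, `owns_home` second, and `region` last. The point of the sketch is that one test per variable is enough to decide which variable to split on, without scoring every possible cut point on every variable.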
