From the course: Machine Learning and AI: Advanced Decision Trees with SPSS

Understanding information gain

- [Instructor] In order to build the tree, C5 uses a criterion called information gain ratio. We've seen many times that knowing how an algorithm works under the hood can help you use it effectively. However, it's also possible to get lost in all the formulas, so let's hit some of the high points. First, information gain ratio, as the name implies, is strongly in the machine learning camp. So unlike QUEST or CHAID, there are going to be no statistical tests and no p-values. Also, information gain ratio will be similar in many ways to the Gini coefficient that is used by CART, which is also firmly in the machine learning camp. However, in contrast to CART, C5 does allow splits that are not binary in nature, so there are some differences. But let's talk a little bit more about how information gain works. By the way, C5 does not allow interactive trees in Modeler. And the reason is that until recently the C5 algorithm has been kept somewhat secret and proprietary. So what I've done instead…
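To make the idea of information gain ratio a little more concrete, here is a minimal Python sketch of the textbook definition: the entropy reduction from a split, divided by the entropy of the split itself. This is an illustration only. The function names `entropy` and `gain_ratio` and the toy data are assumptions made for this example, and the sketch is not claimed to match what C5 does internally in Modeler.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Textbook information gain ratio for splitting `labels` on `values`.

    One branch is created per distinct value, so the split does not
    have to be binary. (Hypothetical helper, for illustration only.)
    """
    n = len(labels)

    # Group the target labels by the value of the candidate predictor.
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)

    # Weighted average entropy of the target after the split.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - remainder

    # Split information: the entropy of the branch sizes themselves.
    # Dividing by it penalizes splits with many small branches.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: both predictors separate the target perfectly, but the
# four-way split is penalized relative to the two-way split.
target = ["yes", "yes", "no", "no"]
two_way = ["a", "a", "b", "b"]
four_way = ["a", "b", "c", "d"]

print(gain_ratio(two_way, target))
print(gain_ratio(four_way, target))
```

Both toy predictors have the same raw information gain, since each produces pure branches. The ratio is what distinguishes them: the predictor with four branches gets a lower score than the one with two, and that penalty matters for an algorithm that allows non-binary splits.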
