From the course: Machine Learning and AI: Advanced Decision Trees with SPSS

Random forests

- [Instructor] Before XGBoost became the hot algorithm on Kaggle, Random Forest was doing very well, and it continues to be extremely popular. So what is Random Forest all about? Well, essentially, under the hood, it's really just CART, but combined with bagging. Let's take a look in Modeler. I prepared a stream called Random Trees stream. The Random Trees implementation of Random Forest in Modeler is interesting, in that this algorithm potentially works very well on distributed systems, and it's been designed in Modeler to do so. Imagine the following. Let's say that because you have big data, you're building your model on multiple machines. Well, you can build 10 trees on each of 10 machines, and then once they're built, combine those 100 trees together. So it can be highly scalable, even though you're doing a lot of computation. I've already prepared a CART model which I ran on defaults, so now I'm going to go ahead and run Random Forest, which will produce 100 trees. I'll hook…
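
To make the "10 trees on each of 10 machines" idea concrete, here is a minimal Python sketch, not Modeler's actual implementation. It assumes toy data, uses depth-1 CART-style trees (stumps) to stay short, and simulates each machine with a separate random seed. All function names (`fit_stump`, `build_on_machine`, `predict_forest`) are hypothetical. It covers only the bagging part described above; the per-split feature sampling that Random Forest implementations usually add is left out.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in the list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Depth-1 CART-style tree: pick the split with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def build_on_machine(rows, labels, n_trees, seed):
    """Simulate one machine: fit each tree on its own bootstrap sample (bagging)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees

def predict_forest(trees, row):
    """Combine the trees by majority vote."""
    return majority([predict_stump(t, row) for t in trees])

# Toy data (an assumption for illustration): class 1 when the first feature > 0.5.
data_rng = random.Random(0)
rows = [(data_rng.random(), data_rng.random()) for _ in range(60)]
labels = [1 if x > 0.5 else 0 for x, _ in rows]

# 10 simulated machines, 10 trees each, then combine into one forest.
forest = []
for machine in range(10):
    forest.extend(build_on_machine(rows, labels, n_trees=10, seed=machine))

print(len(forest))  # 100
```

Because each tree depends only on its own bootstrap sample, the machines never need to talk to each other while building; the only shared step is concatenating the tree lists at the end, which is why this approach distributes so easily.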
