Let's now use what we learned to create a model that can estimate the share counts for a given piece of content. We'll use the features that we have already created, as well as a few additional features.
Ideally, we would have a much larger sample of content, especially content that had more typical share counts. Despite this, we'll make do with what we have here.
We're going to use an algorithm called random forest regression. In prior chapters, we used random forests in their more common form, as classifiers. Here, we're going to use the regression variant and attempt to predict the share counts directly. We could bucket our share counts into ranges and treat those ranges as classes, but it is preferable to use regression when dealing with continuous variables.
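To see what bucketing would cost us, here is a small sketch. The share counts, bin edges, and labels are all made up for illustration; they are not taken from our dataset.

```python
import pandas as pd

# Made-up share counts, purely for illustration.
shares = pd.Series([12, 480, 950, 15200, 310000])

# Hypothetical bin edges and labels; any real choice would be a judgment call.
bins = [0, 100, 1000, 10000, float("inf")]
labels = ["0-100", "100-1k", "1k-10k", "10k+"]

# pd.cut assigns each count to the range it falls into.
buckets = pd.cut(shares, bins=bins, labels=labels)
print(pd.DataFrame({"shares": shares, "bucket": buckets}))
```

Notice that 15,200 and 310,000 shares would land in the same class, even though one piece of content was shared roughly twenty times more than the other. A regression target keeps that difference.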
To begin, we'll create a bare-bones model. We'll use the number of images, the site, and the word count as our features. The target we'll train our model on is the number of Facebook likes.
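Before working with the real data, it may help to see the overall shape of such a model. The following is only a minimal sketch on synthetic data: the column names (`img_count`, `word_count`, `site`, `fb`), the site values, and the way the target is generated are all assumptions made for illustration, not our actual DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; every column name and value here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "img_count": rng.integers(0, 10, size=n),
    "word_count": rng.integers(100, 3000, size=n),
    "site": rng.choice(["site_a", "site_b", "site_c"], size=n),
})

# Invented target standing in for the number of Facebook likes.
noise = rng.normal(0, 25, size=n)
df["fb"] = (50 * df["img_count"] + 0.1 * df["word_count"] + noise).clip(lower=0)

# The site is categorical, so one-hot encode it into indicator columns.
site_dummies = pd.get_dummies(df["site"], dtype=int)
X = pd.concat([df[["img_count", "word_count"]], site_dummies], axis=1)
y = df["fb"]

# Hold out the last 20% of rows to check the predictions.
split = int(n * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Fit the regressor and predict like counts for the held-out rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(pd.DataFrame({"actual": y_test.values, "predicted": predictions}).head())
```

The site has to be converted to indicator columns because the regressor expects numeric inputs. The number of trees (`n_estimators=100`) is simply a common starting value, not a tuned choice.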
We'll first import the scikit-learn library, then...