Python Artificial Intelligence Projects for Beginners

By: Dr. Joshua Eckroth

Overview of this book

Artificial Intelligence (AI) is increasingly employed across businesses, industries, and sectors. Python Artificial Intelligence Projects for Beginners demonstrates AI projects in Python, covering modern techniques that make up the world of Artificial Intelligence. The book begins by helping you build your first prediction model using the popular Python library scikit-learn. You will learn how to build a classifier using decision trees and random forests, an effective machine learning technique. Through projects on predicting bird species, analyzing student performance data, identifying song genres, and detecting spam, you will learn the fundamentals, algorithms, and techniques behind these smart applications. In the concluding chapters, you will also explore deep learning and neural networks through these projects with the help of the Keras library. By the end of this book, you will be confident in building your own AI projects with Python and ready to take on more advanced projects as you progress.
Table of Contents (11 chapters)

Detecting YouTube comment spam


In this section, we look at a technique for detecting YouTube comment spam using bags of words and random forests. The dataset is straightforward: it contains about 2,000 comments from popular YouTube videos (https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection). Each row holds a comment together with a class value of 1 for spam or 0 for not spam.
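To make the bag-of-words idea concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration under stated assumptions, not the book's code: the helper names `tokenize` and `bag_of_words` are hypothetical, the tokenization rule is a simplification, and the three comments are invented rather than taken from the dataset. Each comment becomes a vector of word counts over a shared vocabulary, which is the kind of input a classifier such as a random forest can then be trained on.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(comments):
    """Return (vocabulary, vectors): one word-count vector per comment."""
    counts = [Counter(tokenize(comment)) for comment in comments]
    # The vocabulary is every distinct word seen in any comment, sorted
    # so that each vector position always refers to the same word.
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors


# Invented comments for illustration only; they are not from the dataset.
comments = [
    "Check out my channel",
    "I love this song",
    "Subscribe to my channel and check out my videos",
]

vocabulary, vectors = bag_of_words(comments)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Word order is discarded on purpose: only how often each word occurs in a comment is kept, which is what gives the technique its name.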

First, we import a single file. The dataset is actually split across several files, one per video, and our set of comments comes from the PSY-Gangnam Style video.

Then we print a few comments to inspect them.
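The loading and printing steps might look roughly like the following sketch. Several things here are assumptions rather than details from the text: the file name `Youtube01-Psy.csv`, the uppercase column names, the hypothetical helper `load_comments`, and the use of the standard `csv` module instead of a data-frame library. The three rows are invented stand-ins so that the example runs without downloading anything.

```python
import csv
import io

# Invented stand-in for the real file. The column names are an
# assumption about the dataset's header, and the rows are made up.
SAMPLE_CSV = """COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS
id-001,user_a,2013-11-07T06:20:48,"Great video, love it",0
id-002,user_b,2013-11-07T12:37:15,Still watching this in the morning,0
id-003,user_c,2013-11-08T17:34:21,Subscribe to my channel please,1
"""


def load_comments(handle):
    """Read every CSV row into a dict keyed by the header names."""
    return list(csv.DictReader(handle))


# With the downloaded data, the call would look something like this
# (the file name is an assumption):
#   with open("Youtube01-Psy.csv", newline="", encoding="utf-8") as f:
#       rows = load_comments(f)
rows = load_comments(io.StringIO(SAMPLE_CSV))

# Print the first few rows to see which columns are available.
for row in rows[:3]:
    print(row)
```

Note that `csv.DictReader` returns every value as a string, so the class label arrives as `"0"` or `"1"` and needs converting before it is used as a numeric target.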

Here we can see that there are more than two columns, but we only need the content and class columns. The content column contains the comment text, and the class column contains 1 for spam or 0 for not spam. For example, notice that the first two comments are marked as not spam, but then the comment subscribe to...
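Keeping only the content and class columns, as described above, can be sketched as follows. The rows are invented and shaped like loaded CSV records, the uppercase key names are an assumption about the header, and `select_text_and_label` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter

# Invented rows shaped like loaded CSV records. The extra columns are
# present on purpose to show that they are simply ignored.
rows = [
    {"COMMENT_ID": "id-001", "AUTHOR": "user_a",
     "CONTENT": "Great video, love it", "CLASS": "0"},
    {"COMMENT_ID": "id-002", "AUTHOR": "user_b",
     "CONTENT": "Still watching this in the morning", "CLASS": "0"},
    {"COMMENT_ID": "id-003", "AUTHOR": "user_c",
     "CONTENT": "Subscribe to my channel please", "CLASS": "1"},
]


def select_text_and_label(rows, text_key="CONTENT", label_key="CLASS"):
    """Keep only the comment text and its integer label (1 = spam)."""
    texts = [row[text_key] for row in rows]
    labels = [int(row[label_key]) for row in rows]
    return texts, labels


texts, labels = select_text_and_label(rows)

# Tally the labels to see how the two classes are balanced.
label_counts = Counter(labels)
print(label_counts[1], "spam,", label_counts[0], "not spam")
```

The two parallel lists, comment texts and integer labels, are the inputs that a bag-of-words step and a classifier would then work from.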