Artificial Intelligence with Python

Overview of this book

Artificial intelligence is becoming increasingly relevant in the modern world. By harnessing the power of algorithms, you can create apps that interact intelligently with the world around you, such as recommender systems, automatic speech recognition systems, and more. Starting with AI basics, you'll move on to developing building blocks using data mining techniques. You'll discover how to make informed decisions about which algorithms to use and how to apply them to real-world scenarios. This practical book covers a range of topics, including predictive analytics and deep learning.

Tokenizing text data


When we work with text, we need to break it down into smaller pieces for analysis. This process is called tokenization: it divides the input text into pieces such as words or sentences, and these pieces are called tokens. Depending on the task, we can define our own methods for dividing the text into tokens. Let's take a look at how to tokenize input text using NLTK.
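
Before turning to NLTK, it may help to see what "defining our own method" can look like. The following is a minimal sketch using only the standard library; the function names `whitespace_tokenize` and `regex_word_tokenize` are hypothetical and are not part of NLTK, and the regular expression is just one possible rule set, not the one NLTK uses.

```python
import re

def whitespace_tokenize(text):
    # Simplest scheme: split on runs of whitespace.
    # Punctuation stays attached to the neighboring word.
    return text.split()

def regex_word_tokenize(text):
    # Illustrative rule: a token is either a word (internal apostrophes
    # are kept, so "It's" stays whole) or a single punctuation mark.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

sample = "It's actually quite interesting!"
print(whitespace_tokenize(sample))
print(regex_word_tokenize(sample))
```

The two functions disagree on the last word: the whitespace version keeps the exclamation mark attached, while the regex version emits it as its own token. Which behavior is preferable depends on the analysis that follows.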

Create a new Python file and import the following packages:

from nltk.tokenize import (sent_tokenize, word_tokenize,
                           WordPunctTokenizer)

Define some input text that will be used for tokenization:

# Define input text 
input_text = "Do you know how tokenization works? It's actually quite interesting! Let's analyze a couple of sentences and figure it out."  

Divide the input text into sentence tokens:

# Sentence tokenizer  
print("\nSentence tokenizer:") 
print(sent_tokenize(input_text)) 
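
To get a feel for what a sentence tokenizer has to do, here is a naive stand-in written with the standard library only. It is a sketch under a simplifying assumption (a sentence ends wherever `.`, `!` or `?` is followed by whitespace) and is not how `sent_tokenize` works internally; the name `naive_sent_tokenize` is hypothetical.

```python
import re

def naive_sent_tokenize(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # The lookbehind keeps the punctuation attached to its sentence.
    return re.split(r"(?<=[.!?])\s+", text.strip())

input_text = ("Do you know how tokenization works? It's actually quite "
              "interesting! Let's analyze a couple of sentences and figure it out.")

# The rule happens to be enough for the sample text: three sentences.
sentences = naive_sent_tokenize(input_text)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)

# It breaks down on abbreviations, where a period does not end the sentence.
tricky = "Dr. Smith arrived. He sat down."
print(naive_sent_tokenize(tricky))
```

The second example splits after "Dr.", which is wrong. Cases like this are the reason to prefer a purpose-built sentence tokenizer over a hand-written rule for real text.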

Divide the input text into word...