Book Image

Artificial Intelligence with Python - Second Edition

By : Prateek Joshi
Book Image

Artificial Intelligence with Python - Second Edition

By: Prateek Joshi

Overview of this book

Artificial Intelligence with Python, Second Edition is an updated and expanded version of the bestselling guide to artificial intelligence using the latest version of Python 3.x. Not only does it provide you an introduction to artificial intelligence, this new edition goes further by giving you the tools you need to explore the amazing world of intelligent apps and create your own applications. This edition also includes seven new chapters on more advanced concepts of Artificial Intelligence, including fundamental use cases of AI; machine learning data pipelines; feature selection and feature engineering; AI on the cloud; the basics of chatbots; RNNs and DL models; and AI and Big Data. Finally, this new edition explores various real-world scenarios and teaches you how to apply relevant AI algorithms to a wide swath of problems, starting with the most basic AI concepts and progressively building from there to solve more difficult challenges so that by the end, you will have gained a solid understanding of, and when best to use, these many artificial intelligence techniques.
Table of Contents (26 chapters)
24
Other Books You May Enjoy
25
Index

Tokenizing text data

When we deal with text, we need to break it down into smaller pieces for analysis. To do this, tokenization can be applied. Tokenization is the process of dividing text into a set of pieces, such as words or sentences. These pieces are called tokens. Depending on what we want to do, we can define our own methods to divide the text into many tokens. Let's look at how to tokenize the input text using NLTK.

Create a new Python file and import the following packages:

from nltk.tokenize import sent_tokenize, \
        word_tokenize, WordPunctTokenizer

Define the input text that will be used for tokenization:

# Define input text
input_text = "Do you know how tokenization works? It's actually \ 
   quite interesting! Let's analyze a couple of sentences and \
   figure it out."

Divide the input text into sentence tokens:

# Sentence tokenizer 
print("\nSentence tokenizer:")
print(sent_tokenize(input_text...