Raspberry Pi 3 Cookbook for Python Programmers - Third Edition

By: Steven Lawrence Fernandes, Tim Cox

Overview of this book

Raspberry Pi 3 Cookbook for Python Programmers – Third Edition begins by guiding you through setting up Raspberry Pi 3, performing tasks using Python 3.6, and introducing the first steps to interface with electronics. As you work through each chapter, you will build your skills and apply them as you progress. You will learn how to build text classifiers, predict sentiments in words, develop applications using the popular Tkinter library, and create games by controlling graphics on your screen. You will harness the power of the built-in graphics processor using Pi3D to generate your own high-quality 3D graphics and environments. You will understand how to connect Raspberry Pi's hardware pins directly to control electronics, from switching on LEDs and responding to push buttons to driving motors and servos. Get to grips with monitoring sensors to gather real-life data, using that data to control other devices, and viewing the results over the internet. You will apply what you have learned by creating your own Pi-Rover or Pi-Hexipod robots. You will also learn about sentiment analysis, face recognition techniques, and building neural network modules for optical character recognition. Finally, you will learn to build a movie recommendation system on Raspberry Pi 3.

Pre-processing data using tokenization


Pre-processing converts raw text into a form that the learning algorithm can work with.

Tokenization is the process of dividing text into a set of meaningful pieces. These pieces are called tokens.
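
The steps below assume that NLTK is already installed, that its punkt tokenizer models have been downloaded, and that a variable named text holds the text to be processed; a minimal setup sketch (the sample sentence is purely illustrative):

import nltk
nltk.download('punkt')  # tokenizer models used by sent_tokenize and word_tokenize

# Hypothetical sample text for the steps that follow
text = "Are you curious about tokenization? Let's see how it works! We need to analyze a couple of sentences."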

How to do it...

  1. Import the sentence tokenizer:
from nltk.tokenize import sent_tokenize
  2. Tokenize the input text into sentences:
tokenize_list_sent = sent_tokenize(text)
print("\nSentence tokenizer:")
print(tokenize_list_sent)
  3. Tokenize the text into words:
from nltk.tokenize import word_tokenize
print("\nWord tokenizer:")
print(word_tokenize(text))
  4. Create a new WordPunct tokenizer, which splits punctuation into separate tokens:
from nltk.tokenize import WordPunctTokenizer
word_punct_tokenizer = WordPunctTokenizer()
print("\nWord punct tokenizer:")
print(word_punct_tokenizer.tokenize(text))

The results obtained by the tokenizers are shown here. The sentence tokenizer divides the text into sentences, while the two word tokenizers divide it into words and punctuation marks.
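
For instance, with the sample text defined earlier, the output would look roughly as follows (exact tokens may vary slightly between NLTK versions):

Sentence tokenizer:
['Are you curious about tokenization?', "Let's see how it works!", 'We need to analyze a couple of sentences.']

Word tokenizer:
['Are', 'you', 'curious', 'about', 'tokenization', '?', 'Let', "'s", 'see', 'how', 'it', 'works', '!', 'We', 'need', 'to', 'analyze', 'a', 'couple', 'of', 'sentences', '.']

Word punct tokenizer:
['Are', 'you', 'curious', 'about', 'tokenization', '?', 'Let', "'", 's', 'see', 'how', 'it', 'works', '!', 'We', 'need', 'to', 'analyze', 'a', 'couple', 'of', 'sentences', '.']

Note how word_tokenize keeps the contraction suffix 's as a single token, whereas WordPunctTokenizer splits every punctuation character into its own token.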