Getting Started with Python for the Internet of Things

By: Tim Cox, Steven Lawrence Fernandes, Sai Yamanoor, Srihari Yamanoor, Prof. Diwakar Vaish

Overview of this book

This Learning Path takes you on a journey into the world of robotics and teaches you everything you can achieve with the Raspberry Pi and Python. It shows you how to harness the power of Python with the Raspberry Pi 3 and the Raspberry Pi Zero to build superlative automation systems that can transform your business. You will learn to create text classifiers, predict sentiment in words, and develop applications with the Tkinter library. Things get more interesting when you build a human face detection and recognition system and a home automation system in Python, where different appliances are controlled using the Raspberry Pi. With such diverse robotics projects, you'll grasp the basics of robotics and its functions, and understand the integration of robotics with the IoT environment. By the end of this Learning Path, you will have covered everything from configuring a robotic controller to creating a self-driven robotic vehicle using Python.

This Learning Path includes content from the following Packt products:

• Raspberry Pi 3 Cookbook for Python Programmers - Third Edition by Tim Cox, Dr. Steven Lawrence Fernandes
• Python Programming with Raspberry Pi by Sai Yamanoor, Srihari Yamanoor
• Python Robotics Projects by Prof. Diwakar Vaish

Pre-processing data using tokenization


Pre-processing data involves converting existing text into information that the learning algorithm can accept.

Tokenization is the process of dividing text into a set of meaningful pieces. These pieces are called tokens.

How to do it...

  1. Import the sentence tokenizer (if the required NLTK data is missing, see the setup note after these steps):
from nltk.tokenize import sent_tokenize
  2. Tokenize the input text into sentences:
text = "Let's see how it works! We need to analyze a couple of sentences."  # sample input; substitute your own text
tokenize_list_sent = sent_tokenize(text)
print("\nSentence tokenizer:")
print(tokenize_list_sent)
  3. Create a new word tokenizer and tokenize the same text into words:
from nltk.tokenize import word_tokenize
print("\nWord tokenizer:")
print(word_tokenize(text))
  4. Create a new WordPunct tokenizer, which splits punctuation into separate tokens:
from nltk.tokenize import WordPunctTokenizer
word_punct_tokenizer = WordPunctTokenizer()
print("\nWord punct tokenizer:")
print(word_punct_tokenizer.tokenize(text))
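
The tokenizers above rely on NLTK's Punkt sentence model, which is not bundled with the library itself. If it has not been downloaded yet, a one-time setup along these lines (a minimal sketch) is needed before running the steps:

import nltk

# One-time download of the Punkt model used by sent_tokenize and
# word_tokenize; safe to re-run, as existing data is skipped
nltk.download('punkt')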

The results produced by the tokenizers are shown here. Each one divides the same text into a different set of tokens:
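
With the sample text used above, the output would look something like the following (illustrative; the exact tokens depend on your input and NLTK version). Note that the word tokenizer keeps the contraction "Let's" as Let and 's, while the WordPunct tokenizer splits the apostrophe into its own token:

Sentence tokenizer:
["Let's see how it works!", 'We need to analyze a couple of sentences.']

Word tokenizer:
['Let', "'s", 'see', 'how', 'it', 'works', '!', 'We', 'need', 'to', 'analyze', 'a', 'couple', 'of', 'sentences', '.']

Word punct tokenizer:
['Let', "'", 's', 'see', 'how', 'it', 'works', '!', 'We', 'need', 'to', 'analyze', 'a', 'couple', 'of', 'sentences', '.']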