NLTK Essentials

By: Nitin Hardeniya

Overview of this book

Natural Language Processing (NLP) is the field of artificial intelligence and computational linguistics that deals with the interactions between computers and human languages. As human-computer interaction increases, it is becoming imperative for computers to comprehend all major natural languages. The Natural Language Toolkit (NLTK) is a powerful and robust tool for this task.

You start with an introduction to get the gist of how to build systems around NLP. You then move on to explore data science-related tasks, after which you will learn how to create a customized tokenizer and parser from scratch. Throughout, the book delves into the essential concepts of NLP while giving practical insights into the open source tools and libraries available in Python for NLP. You will then learn how to analyze social media sites to discover trending topics and perform sentiment analysis. Finally, you will see tools that help you deal with large-scale text.

By the end of this book, you will be confident about NLP and data science concepts and know how to apply them in your day-to-day work.
Table of Contents (17 chapters)
NLTK Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Text classification


In its simplest form, text classification means assigning a piece of text to a category based on its content. Most machine learning methods and algorithms are written for numeric features or variables, so one of the most important problems with a text corpus is how to represent text as numeric features. The literature prescribes several transformations for this. Let's start with one of the simplest and most widely used transformations.
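The passage does not name the transformation at this point. As an illustration only, the sketch below assumes a bag-of-words count representation, in which each text becomes a vector of word counts over a shared vocabulary. The helper names `build_vocabulary` and `to_count_vector` and the toy messages are hypothetical, invented for this example; they are not taken from the book or from NLTK.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of every distinct lowercase token in the documents."""
    vocabulary = set()
    for text in documents:
        # Naive whitespace tokenization; a real pipeline would use a proper tokenizer.
        vocabulary.update(text.lower().split())
    return sorted(vocabulary)


def to_count_vector(text, vocabulary):
    """Map one text to a list of token counts, one slot per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy messages, made up for this illustration.
documents = [
    "free prize call now",
    "call me when you are free",
    "free free prize",
]

vocabulary = build_vocabulary(documents)
vectors = [to_count_vector(text, vocabulary) for text in documents]

print(vocabulary)
for text, vector in zip(documents, vectors):
    print(vector, "<-", text)
```

Each row has the same length as the vocabulary, so the texts can be fed to any algorithm that expects fixed-size numeric input. Word order is discarded, which is the main simplification this representation makes.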

To understand the process of text classification, let's take the real-world problem of spam. In the world of WhatsApp and SMS, you get many spam messages. We will solve this spam detection problem with the help of text classification and use it as a running example throughout the chapter.

Here are a few real examples of SMS messages that we asked people to tag manually:

SMS001 ['spam', 'Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for...