
Writing your first crawler


Let's start with a very basic crawler that will crawl the entire content of a web page. To write the crawler, we will use Scrapy, one of the best crawling frameworks available in Python. We will explore its different features over the course of this chapter. First, we need to install Scrapy for this exercise.

To do this, type in the following command:

$ pip install scrapy

This is the easiest way to install Scrapy, using the pip package manager. Now, let's test whether the installation went through correctly (ideally, Scrapy should now be part of sys.path):

>>> import scrapy

Tip

If there is any error, then take a look at http://doc.scrapy.org/en/latest/intro/install.html.
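Assuming the import succeeded, you can also print the installed version as a quick sanity check (scrapy.__version__ is the version string exposed by the package; the exact value depends on your installation):

>>> import scrapy
>>> scrapy.__version__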

At this point, we have Scrapy working. Let's create an example spider project with Scrapy:

$ scrapy startproject tutorial

Once you run the preceding command, the directory structure should look like the following:

tutorial/
    scrapy.cfg   # the project configuration file
    tutorial/  ...
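To give you a feel for what will eventually live in the project's spiders directory, here is a minimal spider sketch. This is an illustrative example rather than the project's final code: the class name FirstSpider, the spider name first, the output filename, and the http://example.com URL are all placeholders.

import scrapy

class FirstSpider(scrapy.Spider):
    name = "first"                       # unique name used to invoke the spider
    start_urls = ["http://example.com"]  # placeholder page(s) to start crawling from

    def parse(self, response):
        # parse() receives each downloaded response; here we simply
        # save the raw HTML of the page to a local file.
        with open("page.html", "wb") as f:
            f.write(response.body)

If you save this as, say, tutorial/spiders/first_spider.py, you can run it from the project root with:

$ scrapy crawl first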