Machine Learning for the Web
Scrapy is a Python library used to extract content from web pages or to crawl the pages linked from a given web page (see the Web crawlers (or spiders) section of Chapter 4, Web Mining Techniques, for more details). To install the library, type the following in the terminal:
sudo pip install Scrapy
Install the executable in the bin folder:
sudo easy_install scrapy
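One way to confirm that the installation worked is sketched below using only the Python standard library. The helper name scrapy_install_status is hypothetical (it is not part of Scrapy); it simply reports whether a package named scrapy can be located and whether a scrapy command is on the PATH.

```python
import importlib.util
import shutil


def scrapy_install_status():
    """Report whether the scrapy package and executable can be found.

    Hypothetical helper for illustration; not part of Scrapy itself.
    """
    return {
        # True if Python can locate an importable package named "scrapy"
        "package_found": importlib.util.find_spec("scrapy") is not None,
        # Full path of the "scrapy" command on PATH, or None if it is missing
        "executable": shutil.which("scrapy"),
    }


if __name__ == "__main__":
    print(scrapy_install_status())
```

If package_found is False or executable is None, the corresponding install step probably needs to be repeated.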
From the movie_reviews_analyzer_app folder, we initialize our Scrapy project as follows:
scrapy startproject scrapy_spider
This command will create the following tree inside the scrapy_spider folder:
├── __init__.py
├── items.py
├── pipelines.py
├── settings.py
├── spiders
│   ├── __init__.py
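As a quick sanity check, the listing above can be compared against what is actually on disk. The following is a minimal sketch, assuming the files live in a package folder nested inside the project folder (scrapy_spider/scrapy_spider); the names EXPECTED_FILES and missing_project_files are hypothetical and introduced only for this example.

```python
from pathlib import Path

# Files named in the tree listing, relative to the package folder
EXPECTED_FILES = [
    "__init__.py",
    "items.py",
    "pipelines.py",
    "settings.py",
    "spiders/__init__.py",
]


def missing_project_files(package_dir):
    """Return the expected files that do not exist under package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location: startproject nests a package folder of the same name
    print(missing_project_files("scrapy_spider/scrapy_spider"))
```

An empty list means every file in the listing was found.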
The pipelines.py and items.py files manage how the scraped data is stored and manipulated, and they will be discussed later in the Spiders and Integrate Django with Scrapy sections. The settings.py file sets the parameters each spider (or crawler) defined in the spiders folder uses to operate...
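To make the role of settings.py concrete, here is a minimal sketch of what such a file might contain. The setting names are standard Scrapy settings, but every value shown is an assumption chosen for illustration and is not the configuration used by the book's project.

```python
# settings.py (illustrative values only, not the book's configuration)

BOT_NAME = "scrapy_spider"

# Where Scrapy looks for spider classes and where new spiders are generated
SPIDER_MODULES = ["scrapy_spider.spiders"]
NEWSPIDER_MODULE = "scrapy_spider.spiders"

# Assumed values for illustration
DEPTH_LIMIT = 2        # maximum crawl depth from the start page
DOWNLOAD_DELAY = 1.0   # seconds to wait between requests to the same site
ROBOTSTXT_OBEY = True  # respect the rules in each site's robots.txt
```

Every spider in the spiders folder picks up these values unless it overrides them individually.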