Web Scraping with Python

By: Richard Penman

Starting a project


Now that Scrapy is installed, we can run the startproject command to generate the default structure for this project. To do this, open the terminal and navigate to the directory where you want to store your Scrapy project, and then run scrapy startproject <project name>. Here, we will use example for the project name:

$ scrapy startproject example
$ cd example

Here are the files generated by the scrapy command:

    scrapy.cfg
    example/
        __init__.py  
        items.py  
        pipelines.py  
        settings.py  
        spiders/
            __init__.py
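
As a quick sanity check, the listing above can be compared against what is actually on disk. The following is only a minimal sketch using the standard library: the helper name missing_project_files and the EXPECTED_FILES list are hypothetical names introduced here for illustration, and the paths assume the project was named example, as in the command above.

```python
from pathlib import Path

# Relative paths copied from the listing above; "example" is the project name.
EXPECTED_FILES = [
    "scrapy.cfg",
    "example/__init__.py",
    "example/items.py",
    "example/pipelines.py",
    "example/settings.py",
    "example/spiders/__init__.py",
]


def missing_project_files(project_root):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Run from inside the directory created by `scrapy startproject example`.
    missing = missing_project_files(".")
    print("missing files:", missing or "none")
```

An empty result means every file in the listing was found. The check ignores any additional files that a particular Scrapy version may generate.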

The important files for this chapter are as follows:

  • items.py: This file defines a model of the fields that will be scraped

  • settings.py: This file defines settings, such as the user agent and crawl delay

  • spiders/: The actual scraping and crawling code is stored in this directory
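
To make the role of settings.py more concrete, here is a minimal sketch of the two settings mentioned above. USER_AGENT and DOWNLOAD_DELAY are standard Scrapy setting names, but the values shown are assumptions chosen for illustration, not values taken from this book.

```python
# example/settings.py (excerpt)

# User agent string sent with each request (hypothetical value).
USER_AGENT = "example-bot (+http://www.example.com)"

# Seconds to wait between consecutive requests to the same website
# (hypothetical value).
DOWNLOAD_DELAY = 5
```

Settings are plain module-level Python assignments, so the file can be edited like any other Python module.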

Additionally, Scrapy uses scrapy.cfg for project configuration and pipelines.py to process the scraped fields, but they will not...