In this Scrapy example, we used the CrawlSpider, which is particularly useful when crawling a website or a series of websites. Scrapy provides several other spider classes you may want to use, depending on the site and your extraction needs. These spiders fall into the following categories:
- Spider: A normal scraping spider, usually used for scraping just one type of page.
- CrawlSpider: A crawl spider, usually used to traverse a domain, following links and scraping one or more types of pages it finds along the way.
- XMLFeedSpider: A spider that traverses an XML feed and extracts content from each node.
- CSVFeedSpider: Similar to the XML spider, but parses CSV rows within the feed instead.
- SitemapSpider: A spider that first parses a site's sitemap and then crawls it, applying different rules to different URLs.
Each of these spiders is included in your default Scrapy installation, so you can access...