Getting Started with Beautiful Soup

By: Vineeth G Nair

Overview of this book

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need without writing excess code. Getting Started with Beautiful Soup is a practical guide to using Beautiful Soup with Python. The book starts by walking you through installation and then through each feature of Beautiful Soup, using simple examples that include sample Python code, with diagrams and screenshots wherever they aid understanding. It discusses the problem of how exactly to get data out of a website and provides an easy solution with the help of a real website and sample code. It covers the different methods of installing Beautiful Soup on both Linux and Windows systems. You will then learn about searching, navigating, content modification, encoding support, and output formatting, each with examples and sample Python code that you can try out yourself. This book is a practical guide for scraping information from any website. If you want to learn how to scrape pages from websites efficiently, then this book is for you.

Getting selling prices from Amazon


We can search Amazon for books by their ISBNs. Normally, we would open the default search page on Amazon and enter the ISBN by hand. A program or scraper cannot do that; it needs to know which URL to request for a given ISBN. To find out, let us go to the Amazon site and search for this book by its ISBN, as shown in the following screenshot:

The page that Amazon generates for the search has a URL of the following form:

http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=1783289554

If we search for another ISBN, for example by requesting http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=1847195164, the page returns the details for the ISBN 1847195164.

From this, we can conclude that if we set the value of the field-keywords parameter in the URL to the required ISBN, we will get the details for that ISBN.
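The substitution described above can be sketched in a few lines of Python 3. This is only an illustration: the helper name build_search_url and the constant SEARCH_BASE are hypothetical, and the sketch assumes the URL structure observed above still applies, which may no longer be the case on the live site.

```python
from urllib.parse import urlencode

# Base of the search URL exactly as it appears in the text above.
SEARCH_BASE = "http://www.amazon.com/s/ref=nb_sb_noss"


def build_search_url(isbn):
    """Return the search URL for the given ISBN (hypothetical helper)."""
    query = urlencode({
        "url": "search-alias=aps",   # urlencode turns "=" into %3D
        "field-keywords": isbn,      # the only part that changes per book
    })
    return SEARCH_BASE + "?" + query


# Build the URLs for the two ISBNs used in the text.
for isbn in ("1783289554", "1847195164"):
    print(build_search_url(isbn))
```

Only the URL is constructed here; nothing is requested from the site. Building the query string with urlencode rather than plain string concatenation keeps the percent-encoding of search-alias%3Daps consistent with the URL shown earlier.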

From the http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords...