Network Science with Python
First, what even is web scraping, and who can do it? Web scraping is the act of harvesting content from web resources so that you can use the data in your own products and software. Anyone with basic programming skills can scrape, and many programming languages can do it, but we will use Python. Scraping lets you pull information that a website hasn’t exposed as a data feed or through an API. One warning, though: do not scrape too aggressively, or you could take down a web server through an accidental denial-of-service (DoS) attack. Get only what you need, only as often as you need it. Go slow. Don’t be greedy or selfish.
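One way to put "go slow" into practice is to enforce a minimum delay between requests and to skip pages the site asks crawlers to avoid. The following is a minimal sketch using only the standard library. The `PoliteFetcher` class, the `my-scraper` user agent, and the five-second default delay are hypothetical choices, not from the text, and honoring robots.txt is a common crawling courtesy added here as an assumption.

```python
import time
import urllib.request
import urllib.robotparser


class PoliteFetcher:
    """Hypothetical helper: fetch pages slowly, and only where robots.txt allows."""

    def __init__(self, robots_lines, user_agent="my-scraper", min_delay=5.0):
        # Parse robots.txt rules from lines you have already downloaded once,
        # so each URL can be checked without another request to the server.
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)
        self.user_agent = user_agent
        self.min_delay = min_delay  # assumed seconds between requests
        self._last_request = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.robots.can_fetch(self.user_agent, url)

    def wait_time(self, now=None):
        """Seconds still to wait before the next request is permitted."""
        now = time.monotonic() if now is None else now
        if self._last_request is None:
            return 0.0
        return max(0.0, self.min_delay - (now - self._last_request))

    def fetch(self, url):
        """Download one page, refusing disallowed URLs and sleeping as needed."""
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        time.sleep(self.wait_time())
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
        self._last_request = time.monotonic()
        return body
```

For example, with the rules `["User-agent: *", "Disallow: /private/"]`, `allowed("https://example.com/news/1")` is true and `allowed("https://example.com/private/x")` is false. Every call to `fetch` then waits until at least `min_delay` seconds have passed since the previous download, which keeps a long scraping run from hammering a single server.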
BeautifulSoup is a powerful Python library for pulling content out of the HTML of any web page you have access to. I frequently use it to harvest story URLs from news websites, and then I scrape each of those URLs for its text content. I typically do not want the actual HTML, CSS, or JavaScript...
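The two-step workflow described above (collect story links from an index page, then keep only the readable text of each story) might look roughly like this with BeautifulSoup. This is a sketch under assumptions: the `story-link` class, the function names, and the choice of Python's built-in `"html.parser"` backend are hypothetical, and real news sites use their own markup, so the selector will need adjusting. The HTML is passed in as strings so that downloading stays a separate concern.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def story_urls(index_html, base_url):
    """Collect absolute story URLs from a news index page (hypothetical markup)."""
    soup = BeautifulSoup(index_html, "html.parser")
    urls = []
    # Assumption: story links are <a> tags with class "story-link" and an href.
    for link in soup.select("a.story-link[href]"):
        url = urljoin(base_url, link["href"])  # resolve relative links
        if url not in urls:  # keep first-seen order, drop duplicates
            urls.append(url)
    return urls


def story_text(story_html):
    """Return only the readable text, discarding markup, scripts, and styles."""
    soup = BeautifulSoup(story_html, "html.parser")
    # Remove <script> and <style> elements so their code does not leak into the text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)
```

In practice, the HTML passed to these functions would come from slow, infrequent requests, and `story_text` would be applied to each URL that `story_urls` returns.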