4. Collecting Text Data from the Web
Activity 6: Extracting Information from an Online HTML Page
Solution
Let's extract the data from an online source and analyze it. Follow these steps to implement this activity:
- Open a Jupyter notebook.
- Import the requests and BeautifulSoup libraries. Pass the URL to requests.get() to fetch the page, then parse the fetched content using BeautifulSoup's HTML parser. Add the following code to do this:

    import requests
    from bs4 import BeautifulSoup
    r = requests.get('https://en.wikipedia.org/wiki/Rabindranath_Tagore')
    soup = BeautifulSoup(r.text, 'html.parser')
- To extract the list of headings, look for the h3 tag. Here, we only need the first six headings. We will look for a span tag that has a class attribute with the following set of commands:

    for ele in soup.find_all('h3')[:6]:
        tx = BeautifulSoup(str(ele),'html.parser').find('span'...
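
The heading-extraction step can also be tried offline. The following is a minimal sketch, not the book's exact solution: it assumes a small hand-written HTML string (SAMPLE_HTML, with a made-up class name) in place of r.text, and a hypothetical helper named first_headings. It looks inside each h3 tag for a span that carries a class attribute and falls back to the heading's own text when no such span exists.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for r.text. The class name
# "heading-text" is invented for illustration; the live page's markup
# may differ and can change over time.
SAMPLE_HTML = """
<html><body>
  <h3><span class="heading-text">Drama</span></h3>
  <h3><span class="heading-text">Short stories</span></h3>
  <h3>Novels</h3>
  <h3><span class="heading-text">Poetry</span></h3>
</body></html>
"""


def first_headings(html, tag="h3", limit=6):
    """Return the text of the first `limit` headings with the given tag."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for ele in soup.find_all(tag)[:limit]:
        # class_=True matches a <span> that has any class attribute.
        span = ele.find("span", class_=True)
        # Fall back to the heading element itself if no such span exists.
        node = span if span is not None else ele
        headings.append(node.get_text(strip=True))
    return headings


print(first_headings(SAMPLE_HTML, limit=3))
```

Searching within each h3 element directly avoids re-parsing str(ele) into a second BeautifulSoup object. To run the same helper against the fetched page, pass r.text instead of SAMPLE_HTML.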