Book Image

Python Natural Language Processing

Book Image

Python Natural Language Processing

Overview of this book

This book starts off by laying the foundation for Natural Language Processing and why Python is one of the best options to build an NLP-based expert system with advantages such as Community support, availability of frameworks and so on. Later it gives you a better understanding of available free forms of corpus and different types of dataset. After this, you will know how to choose a dataset for natural language processing applications and find the right NLP techniques to process sentences in datasets and understand their structure. You will also learn how to tokenize different parts of sentences and ways to analyze them. During the course of the book, you will explore the semantic as well as syntactic analysis of text. You will understand how to solve various ambiguities in processing human language and will come across various scenarios while performing text analysis. You will learn the very basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly understand sentences and language parts. You will learn the power of Machine Learning and Deep Learning to extract information from text data. By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.
Table of Contents (13 chapters)

Web scraping

To develop a web scraping tool, we can use libraries such as beautifulsoup and scrapy. Here, I'm giving some of the basic code for web scraping.

Take a look at the code snippet in Figure 2.6, which is used to develop a basic web scraper using beautifulsoup:

Figure 2.6: Basic web scraper tool using beautifulsoup

The following Figure 2.7 demonstrates the output:

Figure 2.7: Output of basic web scraper using beautifulsoup

You can find the installation guide for beautifulsoup and scrapy at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/Chapter_2_Installation_Commands.txt.

You can find the code at this link:

https://github.com/jalajthanaki/NLPython/blob/master/ch2/2_2_Basic_webscraping_byusing_beautifulsuop.py.

If you get any warning while running the script, it will be fine; don't worry about warnings.

Now, let's do some web scraping...