The Natural Language Processing Workshop

By: Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar, Muzaffar Bashir Shah, Sohom Ghosh, Dwight Gunning

Overview of this book

Do you want to learn how to communicate with computer systems using Natural Language Processing (NLP) techniques, or make a machine understand human sentiments? Do you want to build applications like Siri, Alexa, or chatbots, even if you’ve never done it before? With The Natural Language Processing Workshop, you can expect to make consistent progress as a beginner, and get up to speed in an interactive way, with the help of hands-on activities and fun exercises. The book starts with an introduction to NLP. You’ll study different approaches to NLP tasks, and perform exercises in Python to understand the process of preparing datasets for NLP models. Next, you’ll use advanced NLP algorithms and visualization techniques to collect datasets from open websites, and to summarize and generate random text from a document. In the final chapters, you’ll use NLP to create a chatbot that detects positive or negative sentiment in text documents such as movie reviews. By the end of this book, you’ll be equipped with the essential NLP tools and techniques you need to solve common business problems that involve processing text.

Dealing with Semi-Structured Data

We learned about various types of data in Chapter 2, Feature Extraction Methods. Let's quickly recap what semi-structured data is. A dataset is said to be semi-structured if it is not stored in a row-column format but can, if required, be converted into a structured format with a definite number of rows and columns. Such data is often stored as key-value pairs or embedded between tags, as is the case with JSON (JavaScript Object Notation) and XML (Extensible Markup Language) files, which are the most widely used formats for semi-structured data.
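
To make the idea of converting semi-structured data into rows and columns concrete, here is a minimal sketch that uses only plain Python. The records and the to_table helper are hypothetical and invented for illustration; they do not come from the book's exercises.

```python
# Hypothetical records: key-value pairs with no fixed set of fields.
records = [
    {"name": "Asha", "city": "Pune"},
    {"name": "Ben", "age": 31},
]


def to_table(records):
    """Flatten a list of dicts into (columns, rows), filling gaps with None."""
    # Collect every key that appears, in order of first appearance.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Build one row per record, with one cell per column.
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


columns, rows = to_table(records)
print(columns)
for row in rows:
    print(row)
```

Each record may carry a different set of keys, so the sketch first gathers the union of keys as columns and then fills missing values with None. The result has a definite number of rows and columns, which is what makes it structured.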

JSON

JSON files are used for storing and exchanging data. JSON is human-readable and easy to interpret. Just like text files and CSV files, JSON files are language-independent. This means that different programming languages, such as Python, Java, and so on, can work with JSON files effectively. In Python, a built-in data structure called a dictionary is capable of...
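
As a small illustration of working with JSON in Python, the sketch below uses the standard library's json module to parse a JSON string into a dictionary and serialize it back. The JSON text and the parse_review helper are hypothetical examples, not taken from the book.

```python
import json

# Hypothetical JSON text, made up for illustration.
json_text = '{"title": "Example review", "rating": 4, "tags": ["nlp", "json"]}'


def parse_review(text):
    """Parse a JSON object string into a Python dictionary."""
    return json.loads(text)


review = parse_review(json_text)
print(type(review).__name__)  # the parsed JSON object is a dict
print(review["rating"])       # values are looked up by key

# Serialize the dictionary back into a JSON string.
round_trip = json.dumps(review)
print(round_trip)
```

Here, json.loads turns the JSON object into a dictionary, the JSON array into a list, and the JSON number into an int, while json.dumps performs the reverse conversion.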