Natural Language Processing with AWS AI Services

By: Mona M, Premkumar Rangarajan

Overview of this book

Natural language processing (NLP) uses machine learning to extract information from unstructured data. This book will help you to move quickly from business questions to high-performance models in production. To start with, you'll understand the importance of NLP in today’s business applications and learn the features of Amazon Comprehend and Amazon Textract to build NLP models using Python and Jupyter Notebooks. The book then shows you how to integrate AI in applications for accelerating business outcomes with just a few lines of code. Throughout the book, you'll cover use cases such as smart text search, setting up compliance and controls when processing confidential documents, real-time text analytics, and much more to understand various NLP scenarios. You'll deploy and monitor scalable NLP models in production for real-time and batch requirements. As you advance, you'll explore strategies for including humans in the loop for different purposes in a document processing workflow. Moreover, you'll learn best practices for auto-scaling your NLP inference for enterprise traffic. Whether you're new to ML or an experienced practitioner, by the end of this NLP book, you'll have the confidence to use AWS AI services to build powerful NLP applications.
Table of Contents (23 chapters)

Section 1: Introduction to AWS AI NLP Services
Section 2: Using NLP to Accelerate Business Outcomes
Section 3: Improving NLP Models in Production

Preface

Authors are a quirky lot; almost like the weather in London. The sky is overcast, you want to go for a walk in Trafalgar Square, you wear your raincoat, pick up your umbrella just in case, and you think you are ready for anything. But you are woefully unaware of the sinister plan nature has for you. You walk a mile or so, and suddenly, without warning, the sky clears, the sun pours its brightest song upon your face, and lo and behold, you are caught unaware (like a deer in headlights) with your raincoat and umbrella and you are too far from home to go back and get rid of them. This is exactly what happens to the best of us when we set out to write a book. You set out with a clear objective, focus your thoughts, write a fantastic outline, get it approved, and start formulating your chapters, but unbeknown to you, the book has other plans on how it wants to write itself.

When this happens, as in life, there are always choices. You can let the creative stream express itself through your hands onto the pages of the book, or you can resist and follow the preconceived pattern you laid out. There is, of course, also a third choice, which is to follow the overall structure for what you want to convey, but allow creativity to take control when it wants to. This is what we did for this book. But it was not as easy as we thought at first, because creativity doesn't take no for an answer. The famous Sufi poet Jalaluddin Rumi said: "In silence, there is eloquence. Stop weaving and see how the pattern improves." The most difficult part was to stop "weaving" or to stop being inspired by the content that we had already published as AWS authors. This was also a hard requirement for the book, and so it was a strong motivation for us to be creative and come up with original, in-demand, and fresh content for the book.

So, we stopped "weaving." The next logical step was for the pattern to improve. But nothing happened. The deadline for the first chapter was looming, and our editors were very politely reminding us of the due date. Still nada. We used this "no weaving" time to storyboard and architect the technical chapters, but the glue that was to hold the book together, the main narrative, continued to elude us. And then suddenly, one day, without warning, it struck. We had totally missed the important first part of Rumi's saying: "In silence, there is eloquence." A walk on a nearby nature trail took care of the daily quota of silence, during which a faint thought appeared: a memory of a story called Ali Baba and the Forty Thieves that my father (Shri T. Rangarajan) had narrated to me when I was a kid. It dawned on me that the famous sequence from the story was in fact my first recollection of using voice to perform a task (please refer to Chapter 1, NLP in the Business Context and Introduction to AWS AI Services, in the book). And from then on, the floodgates opened. They never stopped until the book was written in its entirety. And that is how this book came about.

An interesting fact about life we all know is that change is the only constant. This was true when writing this book as well. One of the best things about AWS is the pace at which new features are introduced. The AWS product roadmap is based on direct customer feedback, and services are improved iteratively, with new features launched continuously. So, as we were writing this book, Amazon Comprehend and Amazon Textract kept evolving. For example, Amazon Comprehend modified its console experience, added support for custom entity recognition training from PDF documents directly, and improved its custom entity recognition model framework to support training with just 100 annotations per entity and 250 documents. Amazon Textract reduced pricing by up to 32% for the AnalyzeDocument and DetectDocumentText APIs in eight global AWS Regions and announced support for the automated processing of invoices. A full list of what's new in AWS in 2021 can be reviewed at this link: https://aws.amazon.com/about-aws/whats-new/2021/.

You will notice these changes as you build the solutions for the various NLP use cases in this book. Please note that since the Amazon Textract and Amazon Comprehend consoles have changed, the instructions in the book may not be a word-for-word match with your experience in the AWS Management Console; however, they are accurate and adequate for your needs.

For example, the Train Recognizer button in the Amazon Comprehend console for custom entity recognition has now changed to Create new model. Similarly, Train Classifier in the Amazon Comprehend console for custom classification has also changed to Create new model. When you specify Training and test dataset for custom entity recognition, a new option now appears in the console for selecting PDF and Word documents. The Amazon Textract console has also changed and now shows AnalyzeExpense as an option for viewing the results for your document.
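The AnalyzeExpense results shown in the console can also be retrieved programmatically. The following is a minimal sketch, not code from the book: the helper name summarize_expense and the file name invoice.png are hypothetical, and the response keys used (ExpenseDocuments, SummaryFields, Type, ValueDetection) are assumed to match what the Boto3 analyze_expense call returns.

```python
def summarize_expense(response):
    """Collect (field type, detected value) pairs from an AnalyzeExpense response.

    `response` is assumed to be the dictionary returned by the Textract
    `analyze_expense` call; missing keys are tolerated with defaults.
    """
    fields = []
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text", "OTHER")
            value = field.get("ValueDetection", {}).get("Text", "")
            fields.append((field_type, value))
    return fields


# Hypothetical usage (requires AWS credentials and a local invoice image):
#
#   import boto3
#   textract = boto3.client("textract")
#   with open("invoice.png", "rb") as f:
#       response = textract.analyze_expense(Document={"Bytes": f.read()})
#   for field_type, value in summarize_expense(response):
#       print(field_type, "->", value)
```

Keeping the parsing separate from the API call means the helper can be tried against a saved response without calling AWS.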

In the majority of the book, however, we have used APIs to build the solutions, and the best thing about AWS is that the APIs remain stable even as the consoles change: request and response formats stay consistent. You only need to upgrade Boto3 (the AWS SDK for Python) if you want to use the latest features. Moreover, our goal is to make sure this book remains relevant and up to date.
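To illustrate the API-based approach, here is a minimal sketch (not taken from the book) that detects text with Amazon Textract and then finds entities with Amazon Comprehend through Boto3. The helper names extract_lines and extract_entities and the file name sample-document.png are hypothetical. Boto3 itself can be upgraded with pip install --upgrade boto3.

```python
def extract_lines(textract_client, image_bytes):
    """Return the text of every LINE block Textract detects in an image."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def extract_entities(comprehend_client, text, language_code="en"):
    """Return (entity text, entity type) pairs that Comprehend finds in `text`."""
    response = comprehend_client.detect_entities(Text=text, LanguageCode=language_code)
    return [(entity["Text"], entity["Type"]) for entity in response["Entities"]]


# Hypothetical usage (requires AWS credentials and a local document image):
#
#   import boto3
#   print("Boto3 version:", boto3.__version__)
#   textract = boto3.client("textract")
#   comprehend = boto3.client("comprehend")
#   with open("sample-document.png", "rb") as f:
#       lines = extract_lines(textract, f.read())
#   print(extract_entities(comprehend, " ".join(lines)))
```

Passing the clients in as arguments is a design choice for this sketch: the same helpers work with any Region or credentials configuration and can be exercised with stand-in clients.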