Machine Learning with AWS

By: Jeffrey Jackovich, Ruze Richards

Overview of this book

Machine Learning with AWS is a good place to start if you are a beginner interested in learning practical artificial intelligence (AI) and machine learning skills using Amazon Web Services (AWS), a popular and powerful cloud platform. You will learn how to use AWS to turn your projects into fast, highly scalable apps. The book covers natural language processing (NLP) applications, such as language translation and understanding news articles and other text sources, as well as chatbots with both voice and text interfaces. You will also learn how to process large numbers of images quickly and how to create machine learning models.

By the end of this book, you will have developed the skills you need to use AWS efficiently in your machine learning and artificial intelligence projects.

Using Amazon Comprehend to Inspect Text and Determine the Primary Language


Amazon Comprehend extracts insights from text data across a variety of subject areas (health, media, telecom, education, government, and so on) and languages. The first step in analyzing text, before applying more complex features such as topic, entity, and sentiment analysis, is therefore to determine the dominant language, because the accuracy of the more in-depth analysis depends on it.

Amazon Comprehend provides two operations for determining the primary language of a text: DetectDominantLanguage and BatchDetectDominantLanguage.
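As a minimal sketch, the two operations might be called from Python through boto3, whose Comprehend client exposes them as `detect_dominant_language` and `batch_detect_dominant_language`. The helper names (`make_client`, `top_language`, `detect_one`, `detect_many`) and the default region are assumptions made for illustration and do not come from the book. Running it against the real service requires boto3 and configured AWS credentials.

```python
from typing import Any, Dict, List, Optional


def make_client(region_name: str = "us-east-1") -> Any:
    """Create a Comprehend client. The region is an assumed default."""
    import boto3  # imported lazily so the helpers below work without boto3
    return boto3.client("comprehend", region_name=region_name)


def top_language(languages: List[Dict[str, Any]]) -> Optional[str]:
    """Return the language code with the highest confidence score, if any."""
    if not languages:
        return None
    return max(languages, key=lambda lang: lang["Score"])["LanguageCode"]


def detect_one(client: Any, text: str) -> Optional[str]:
    """DetectDominantLanguage: one string in, one dominant language out."""
    response = client.detect_dominant_language(Text=text)
    return top_language(response["Languages"])


def detect_many(client: Any, texts: List[str]) -> List[Optional[str]]:
    """BatchDetectDominantLanguage: a list of strings in, one code per string.

    Documents reported in the response's ErrorList keep the value None.
    """
    response = client.batch_detect_dominant_language(TextList=texts)
    codes: List[Optional[str]] = [None] * len(texts)
    for item in response["ResultList"]:
        # Index refers to the document's position in the submitted list.
        codes[item["Index"]] = top_language(item["Languages"])
    return codes
```

With credentials in place, `detect_one(make_client(), "Machine learning is fun to learn.")` would return a language code such as `en`. Passing the client in as an argument keeps the helpers easy to exercise with a stub object when no AWS account is available.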

DetectDominantLanguage accepts a single UTF-8 text string that is at least 20 characters long and contains fewer than 5,000 bytes of UTF-8 encoded characters. BatchDetectDominantLanguage accepts a list of strings containing at most 25 documents. Each document should have at least 20 characters, and must contain fewer than 5,000 bytes of UTF-8 encoded...
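The input limits described above could be checked locally before a request is sent. The following is a small sketch under that assumption: the constants simply restate the numbers given in the text (the service's current limits may differ), and the names `is_valid_document` and `chunk_batches` are hypothetical helpers, not part of any AWS SDK.

```python
from typing import List

# Limits as stated in the text above; treat them as assumptions to verify.
MIN_CHARS = 20        # minimum document length, in characters
MAX_BYTES = 5000      # a document must be *fewer* than this many UTF-8 bytes
MAX_BATCH_DOCS = 25   # maximum documents per batch request


def is_valid_document(text: str) -> bool:
    """Check one document against the character and byte limits."""
    # Length in characters and length in bytes differ for non-ASCII text,
    # so the byte check is done on the UTF-8 encoding.
    return len(text) >= MIN_CHARS and len(text.encode("utf-8")) < MAX_BYTES


def chunk_batches(texts: List[str], size: int = MAX_BATCH_DOCS) -> List[List[str]]:
    """Split a list of documents into batches of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```

Because the byte limit applies to the encoded form, a string of 2,500 two-byte characters such as "é" would be rejected even though it is well under 5,000 characters.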