Artificial Intelligence for Big Data

By: Anand Deshpande, Manish Kumar

Overview of this book

In this age of big data, companies hold more consumer data than ever before, far more than current technologies can hope to keep up with. However, Artificial Intelligence closes the gap by moving past human limitations in analyzing data. With the help of Artificial Intelligence for Big Data, you will learn to use Machine Learning algorithms such as k-means, SVM, RBF, and regression to perform advanced data analysis. You will understand the current state of Machine Learning and Deep Learning techniques and work with genetic and neuro-fuzzy algorithms. In addition, you will explore how to develop Artificial Intelligence algorithms that learn from data, why they are necessary, and how they can help solve real-world problems. By the end of this book, you'll have learned how to implement various Artificial Intelligence algorithms, such as reinforcement learning, natural language processing, image recognition, genetic algorithms, and fuzzy logic systems, for your big data systems and integrate them into your product offerings.

Text preprocessing


Preprocessing is the process of cleaning and preparing text for classification and the derivation of meaning. Because our data may contain a lot of noise, uninformative parts such as HTML tags need to be eliminated or re-aligned. At the word level, there may be many words that have little impact on the overall semantics of the text. Text preprocessing involves several steps, such as extraction, tokenization, stop word removal, text enrichment, and normalization with stemming and lemmatization. In addition, some basic and generic techniques that improve accuracy are converting the text to lowercase, removing numbers (depending on the context), removing punctuation, stripping whitespace (which sometimes adds noise to the input signal), and eliminating sparse terms, that is, terms that occur infrequently in the document. In the subsequent sections, we'll analyze some of these techniques in detail.
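
The basic techniques listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the book's own implementation: the function names (`clean_text`, `tokenize`, `drop_sparse_terms`), the regex-based tag stripping, the whitespace tokenizer, and the `min_count` threshold are all assumptions chosen for the example. A real pipeline would more likely use a proper HTML parser and a linguistic tokenizer.

```python
import re
import string
from collections import Counter
from html import unescape

# Naive patterns, assumed adequate for simple, well-formed input.
TAG_RE = re.compile(r"<[^>]+>")   # matches HTML tags such as <p> or </div>
NUM_RE = re.compile(r"\d+")       # matches runs of digits
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(raw, remove_numbers=True):
    """Strip tags, lowercase, optionally drop numbers, remove punctuation,
    and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)       # remove HTML tags
    text = unescape(text)             # decode entities such as &amp;
    text = text.lower()               # convert to lowercase
    if remove_numbers:                # context-dependent, so it is optional
        text = NUM_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)  # remove ASCII punctuation
    return " ".join(text.split())     # strip and collapse whitespace


def tokenize(text):
    """Whitespace tokenizer (a simplifying assumption)."""
    return text.split()


def drop_sparse_terms(token_lists, min_count=2):
    """Remove terms that occur fewer than min_count times across all documents."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return [[t for t in tokens if counts[t] >= min_count] for tokens in token_lists]


# Hypothetical sample documents, made up for this illustration.
docs = [
    "<p>Big Data is BIG!</p>",
    "<div>Big data needs   cleaning, 100% of the time.</div>",
]
cleaned = [clean_text(d) for d in docs]
tokens = [tokenize(c) for c in cleaned]
print(cleaned)
print(drop_sparse_terms(tokens))
```

Number removal is left as a flag because, as noted above, whether digits are noise depends on the context. Sparse-term filtering is applied across the whole collection rather than per document, since term frequency in a single short document is rarely informative.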

Removing stop words

Stop words are words that...