Artificial Intelligence for Big Data

By: Anand Deshpande, Manish Kumar

Overview of this book

In this age of big data, companies hold larger amounts of consumer data than ever before, far more than current technologies can keep up with. Artificial Intelligence closes the gap by moving past human limitations in analyzing data. With the help of Artificial Intelligence for Big Data, you will learn to use machine learning algorithms such as k-means, SVM, RBF, and regression to perform advanced data analysis. You will understand the current state of machine learning and deep learning techniques and work with genetic and neuro-fuzzy algorithms. In addition, you will explore how to develop Artificial Intelligence algorithms that learn from data, why they are necessary, and how they can help solve real-world problems. By the end of this book, you'll have learned how to implement various Artificial Intelligence algorithms, such as reinforcement learning, natural language processing, image recognition, genetic algorithms, and fuzzy logic systems, for your big data systems and integrate them into your product offerings.

Feature extraction


As mentioned earlier in this chapter, NLP systems do not understand string values. They need numerical input, sometimes called numerical features, to build models. Feature extraction in NLP is the process of converting text into a set of numerical features. Any machine learning algorithm that you train needs its features as numerical vectors, because it cannot operate on raw strings. There are many ways to represent text as numerical vectors, including one hot encoding, TF-IDF, Word2Vec, and CountVectorizer.
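To make the idea of turning text into numerical vectors concrete, here is a minimal sketch in plain Python that builds simple word-count vectors, in the spirit of a count vectorizer. The two input documents, the `tokenize` helper, and the `count_vectors` function are illustrative assumptions rather than anything defined in the book, and the tokenizer is deliberately naive (lowercase alphabetic words only).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (naive)."""
    return re.findall(r"[a-z]+", text.lower())


def count_vectors(documents):
    """Return (vocabulary, vectors) with one word-count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical documents, chosen only for illustration.
docs = ["big data needs big storage", "AI learns from data"]
vocabulary, vectors = count_vectors(docs)
print(vocabulary)  # ['ai', 'big', 'data', 'from', 'learns', 'needs', 'storage']
print(vectors)     # [[0, 2, 1, 0, 0, 1, 1], [1, 0, 1, 1, 1, 0, 0]]
```

Each document becomes a fixed-length list of integers, one position per vocabulary term, which is the kind of input a machine learning algorithm can consume.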

One hot encoding

One hot encoding is a sparse binary vector representation of text. In this encoding, the resulting vector is zero everywhere except at the index of the token, where it is one. Let's look at an example. Suppose there are two sentences: "This is Big Data AI Book." and "This book explains AI algorithms on Big Data".
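The encoding can be sketched in a few lines of plain Python. This is an illustrative sketch under some assumptions: the helper names (`tokenize`, `build_vocabulary`, `one_hot`) are hypothetical, every word is treated as a token rather than only the nouns, and each token's index is simply its order of first appearance.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (naive)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(sentences):
    """Map each unique token to an index, in order of first appearance."""
    index = {}
    for sentence in sentences:
        for token in tokenize(sentence):
            index.setdefault(token, len(index))
    return index


def one_hot(token, index):
    """Return a vector of zeros with a single one at the token's index."""
    vector = [0] * len(index)
    vector[index[token]] = 1
    return vector


sentences = [
    "This is Big Data AI Book.",
    "This book explains AI algorithms on Big Data",
]
token_index = build_vocabulary(sentences)
print(token_index)
print(one_hot("data", token_index))  # [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

With nine unique words, every token maps to a nine-element vector containing exactly one 1. The vector length grows with the vocabulary, which is why the representation is sparse.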

Unique tokens (nouns) for the two example sentences would be {data,AI,book...