Artificial Intelligence for Big Data

By: Anand Deshpande, Manish Kumar

Overview of this book

In this age of big data, companies have larger amounts of consumer data than ever before, far more than current technologies can hope to keep up with. Artificial Intelligence closes the gap by moving past human limitations in order to analyze data. With the help of Artificial Intelligence for big data, you will learn to use Machine Learning algorithms such as k-means, SVM, RBF, and regression to perform advanced data analysis. You will understand the current status of Machine and Deep Learning techniques to work on Genetic and Neuro-Fuzzy algorithms. In addition, you will explore how to develop Artificial Intelligence algorithms to learn from data, why they are necessary, and how they can help solve real-world problems. By the end of this book, you'll have learned how to implement various Artificial Intelligence techniques, such as reinforcement learning, natural language processing, image recognition, genetic algorithms, and fuzzy logic systems, for your big data systems and integrate them into your product offerings.

Title Page

Packt Upsell

Contributors

Preface

Big Data and Artificial Intelligence Systems

Results pyramid

What the human brain does best

What the electronic brain does best

Best of both worlds

Summary

Ontology for Big Data

Human brain and Ontology

Ontology of information science

Summary

Learning from Big Data

Supervised and unsupervised machine learning

The Spark programming model

The Spark MLlib library

Regression analysis

Data clustering

The K-means algorithm

Data dimensionality reduction

Singular value decomposition

The principal component analysis method

Content-based recommendation systems

Frequently asked questions

Summary

Neural Network for Big Data

Fundamentals of neural networks and artificial neural networks

Perceptron and linear models

Nonlinearities model

Feed-forward neural networks

Gradient descent and backpropagation

Overfitting

Recurrent neural networks

Frequently asked questions

Summary

Deep Big Data Analytics

Deep learning basics and the building blocks

Building data preparation pipelines

Practical approach to implementing neural net architectures

Hyperparameter tuning

Distributed computing

Distributed deep learning

Frequently asked questions

Summary

Natural Language Processing

Natural language processing basics

Text preprocessing

Feature extraction

Applying NLP techniques

Implementing sentiment analysis

Frequently asked questions

Summary

Fuzzy Systems

Fuzzy logic fundamentals

ANFIS network

Fuzzy C-means clustering

NEFCLASS

Frequently asked questions

Summary

Genetic Programming

Genetic algorithms structure

KEEL framework

Encog machine learning framework

Introduction to the Weka framework

Attribute search with genetic algorithms in Weka

Frequently asked questions

Summary

Swarm Intelligence

Swarm intelligence

The particle swarm optimization model

Ant colony optimization model

MASON Library

Opt4J library

Applications in big data analytics

Handling dynamical data

Multi-objective optimization

Frequently asked questions

Summary

Reinforcement Learning

Reinforcement learning algorithms concept

Reinforcement learning techniques

Deep reinforcement learning

Frequently asked questions

Summary

Cyber Security

Big Data for critical infrastructure protection

Understanding stream processing

Cyber security attack types

Understanding SIEM

Splunk

ArcSight ESM

Frequently asked questions

Summary

Cognitive Computing

Cognitive science

Cognitive Systems

Application in Big Data analytics

Cognitive intelligence as a service

Frequently asked questions

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Preface

We are at an interesting juncture in the evolution of the digital age, where an enormous amount of computing power and data is in the hands of everyone. The amount of data we now have in digital form has grown exponentially. Having worked with data-related technologies for more than six years, we have seen a rapid shift toward enterprises that are willing to leverage data assets, initially for insights and eventually for advanced analytics. What sounded like hype initially has become a reality in a very short period of time. Most companies have realized that data is the most important asset needed to stay relevant. As practitioners in the big data analytics industry, we have seen this shift very closely by working with many clients of various sizes, across regions and functional domains. A common theme is emerging: a move toward open source distributed computing to store data assets and perform advanced analytics to predict future trends and risks for businesses.

This book is an attempt to share the knowledge we have acquired over time so that new entrants in the big data space can learn from our experience. We realize that the field of artificial intelligence is vast and that this is just the beginning of a revolution in the history of mankind. We are going to see AI becoming mainstream in everyone’s life and complementing human capabilities to solve some of the problems that have troubled us for a long time. This book takes a holistic approach to the theory of machine learning and AI, starting from the very basics and moving on to building applications with cognitive intelligence. We have taken a simple approach to illustrate the core concepts and theory, supplemented by illustrative diagrams and examples.

We will be encouraged if readers benefit from the book and fast-track their learning and innovation in one of the most exciting fields of computing, so that they can create truly intelligent systems that augment our abilities to the next level.

Who this book is for

This book is for anyone with a curious mind who is exploring the fields of machine learning, artificial intelligence, and big data analytics. This book does not assume that you have in-depth knowledge of statistics, probability, or mathematics. The concepts are illustrated with easy-to-follow examples. A basic understanding of the Java programming language and the concepts of distributed computing frameworks (Hadoop/Spark) will be an added advantage. This book will be useful for data scientists, members of technical staff in IT products and service companies, technical project managers, architects, business analysts, and anyone who deals with data assets.

What this book covers

Chapter 1, Big Data and Artificial Intelligence Systems, will set the context for the convergence of human intelligence and machine intelligence at the onset of a data revolution. We have the ability to consume and process volumes of data that were never possible before. We will understand how our quality of life is the result of our decisive power and actions and how it translates into the machine world. We will understand the paradigm of big data along with its core attributes before diving into the basics of AI. We will conceptualize the big data frameworks and see how they can be leveraged for building intelligence into machines. The chapter will end with some of the exciting applications of Big Data and AI.

Chapter 2, Ontology for Big Data, introduces semantic representation of data into knowledge assets. A semantic and standardized view of the world is essential if we want to implement artificial intelligence, which fundamentally derives knowledge from data and utilizes contextual knowledge for insights and meaningful actions in order to augment human capabilities. This semantic view of the world is expressed as ontologies.

Chapter 3, Learning from Big Data, presents the broad categories of machine learning, supervised and unsupervised learning, and explains some of the fundamental algorithms that are very widely used. At the end, we will have an overview of the Spark programming model and Spark's Machine Learning library (Spark MLlib).

Chapter 4, Neural Networks for Big Data, explores neural networks and how they have evolved with the increase in computing power with distributed computing frameworks. Neural networks get their inspiration from the human brain and help us solve some very complex problems that are not feasible with traditional mathematical models.

Chapter 5, Deep Big Data Analytics, takes our understanding of neural networks to the next level by exploring deep neural networks and the building blocks of deep learning: gradient descent and backpropagation. We will review how to build data preparation pipelines, the implementation of neural network architectures, and hyperparameter tuning. We will also explore distributed computing for deep neural networks with examples using the DL4J library.

Chapter 6, Natural Language Processing, introduces some of the fundamentals of Natural Language Processing (NLP). As we build intelligent machines, it is imperative that the interface with the machines should be as natural as possible, like day-to-day human interactions. NLP is one of the important steps towards that. We will be learning about text preprocessing, techniques for extraction of relevant features from natural language text, application of NLP techniques, and the implementation of sentiment analysis with NLP.

Chapter 7, Fuzzy Systems, explains that a level of fuzziness is essential if we want to build intelligent machines. In real-world scenarios, we cannot depend on exact mathematical and quantitative inputs for our systems to work with, although our models (deep neural networks, for example) require actual inputs. Uncertainties are frequent and, due to the nature of real-world scenarios, are amplified by incompleteness of contextual information, characteristic randomness, and ignorance of data. Human reasoning is capable enough to deal with these attributes of the real world. A similar level of fuzziness is essential for building intelligent machines that can complement human capabilities in a real sense. In this chapter, we are going to understand the fundamentals of fuzzy logic, its mathematical representation, and some practical implementations of fuzzy systems.

Chapter 8, Genetic Programming, explains that big data mining tools need to be empowered by computationally efficient techniques to increase their degree of efficiency. Applying genetic algorithms to data mining creates robust, computationally efficient, and adaptive systems. With the exponential explosion of data, data analytics techniques take more and more time, which adversely affects throughput. Also, due to their static nature, complex hidden patterns are often left out. In this chapter, we want to show how to use genes to mine data with great efficiency. To achieve this objective, we’ll introduce the basics of genetic programming and the fundamental algorithms.

Chapter 9, Swarm Intelligence, analyzes the potential of swarm intelligence for solving big data analytics problems. By combining swarm intelligence with data mining techniques, we can better understand big data analytics problems and design more effective algorithms to solve them in real-world settings. In this chapter, we’ll show how to use these algorithms in big data applications. The basic theory and some programming frameworks will also be explained.

Chapter 10, Reinforcement Learning, covers reinforcement learning as one of the categories of machine learning. With reinforcement learning, the intelligent agent learns the right behavior based on the reward it receives for the actions it takes within a specific environmental context. We will understand the fundamentals of reinforcement learning, along with the mathematical theory and some of the commonly used techniques for reinforcement learning.

Chapter 11, Cyber Security, analyzes the cybersecurity problem for critical infrastructure. Data centers, database factories, and information system factories are continuously under attack. Online analysis can detect potential attacks to ensure infrastructure security. This chapter also explains Security Information and Event Management (SIEM). It emphasizes the importance of managing log files and explains how they can bring benefits. Subsequently, the Splunk and ArcSight ESM systems are introduced.

Chapter 12, Cognitive Computing, introduces cognitive computing as the next level in the development of artificial intelligence. By leveraging the five primary human senses along with the mind as the sixth sense, a new era of cognitive systems can begin. We will see the stages of AI and the natural progression towards strong AI, along with the key enablers for achieving strong AI. We will take a look at the history of cognitive systems and see how their growth is accelerated by the availability of big data, which brings large data volumes and processing power in a distributed computing framework.

To get the most out of this book

The chapters in this book are sequenced in such a way that the reader can progressively learn about Artificial Intelligence for Big Data starting from the fundamentals and eventually move towards cognitive intelligence. Chapter 1, Big Data and Artificial Intelligence Systems, to Chapter 5, Deep Big Data Analytics, cover the basic theory of machine learning and establish the foundation for practical approaches to AI. Starting from Chapter 6, Natural Language Processing, we conceptualize theory into practical implementations and possible use cases. To get the most out of this book, it is recommended that the first five chapters are read in order. From Chapter 6, Natural Language Processing, onward, the reader can choose any topic of interest and read in whatever sequence they prefer.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Artificial-Intelligence-for-Big-Data. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/ArtificialIntelligenceforBigData_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

StopWordsRemover remover = new StopWordsRemover().setInputCol("raw").setOutputCol("filtered");

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Tip

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.
