Java for Data Science

By: Richard M. Reese, Jennifer L. Reese

Overview of this book

Get the lowdown on Java and explore big data analytics with Java for Data Science. Packed with examples and data science principles, this book uncovers the techniques and Java tools supporting data science and machine learning.

The stability and power of Java combine with key data science concepts for effective exploration of data. By working with Java APIs and techniques, this data science book allows you to build applications and use analysis techniques centred on machine learning.

Java for Data Science gives you the understanding you need to examine the techniques and Java tools supporting big data analytics. These Java-based approaches allow you to tackle data mining and statistical analysis in detail. Deep learning and Java data mining are also featured, so you can explore and analyse data effectively, and build intelligent applications using machine learning.

What's Inside

• Understand data science principles with Java support
• Discover machine learning and deep learning essentials
• Explore data science problems with Java-based solutions

Deep learning approaches

Deep learning networks are often described as neural networks that use multiple intermediate layers. Each layer trains on the outputs of the previous layer, potentially identifying features and subfeatures of a dataset. Features are those aspects of the data that may be of interest. In Chapter 8, Deep Learning, we will examine these types of networks and how they can support several different data science tasks.

These networks often work with unstructured and unlabeled datasets, which make up the vast majority of the data available today. A typical approach is to take the data, identify features, and then use these features and their corresponding layers to reconstruct the original dataset, thus validating the network. The Restricted Boltzmann Machine (RBM) is a good example of the application of this approach.
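As a rough illustration of the validation step described above, the following sketch measures how closely a reconstruction matches its original. The class and method names are hypothetical, and mean squared error is an assumption chosen for simplicity; it is not necessarily the measure that a particular network uses internally.

```java
// Minimal sketch (hypothetical names): compare an original input vector
// with the values a network reconstructed from its learned features.
final class ReconstructionErrorSketch {

    /** Returns the mean of the squared differences between the two vectors. */
    static double meanSquaredError(double[] original, double[] reconstructed) {
        if (original.length == 0 || original.length != reconstructed.length) {
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < original.length; i++) {
            double diff = original[i] - reconstructed[i];
            sum += diff * diff;
        }
        return sum / original.length;
    }

    public static void main(String[] args) {
        // Made-up pixel intensities, for illustration only
        double[] original = {1.0, 0.0, 1.0, 0.0};
        double[] reconstructed = {0.9, 0.1, 0.8, 0.0};
        System.out.println("reconstruction error = "
                + meanSquaredError(original, reconstructed));
    }
}
```

A lower value means the reconstruction is closer to the original, which suggests that the learned features capture more of the data.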

The deep learning network needs to ensure that its results are accurate and to minimize any error that can creep into the process. This is accomplished by adjusting the internal weights assigned to neurons using a technique known as gradient descent. The gradient represents the slope of the error with respect to each weight. The approach modifies each weight in the direction that reduces the error, which also speeds up the learning process.
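The idea behind gradient descent can be sketched with a single weight. The example below is a simplified illustration and not the book's code: it assumes one neuron with one input, a squared-error measure, and a fixed learning rate, and all names are hypothetical.

```java
// Minimal sketch (hypothetical names): gradient descent on one weight w
// for the error E(w) = (w * x - target)^2.
final class GradientDescentSketch {

    /** Squared error of a one-weight linear neuron. */
    static double error(double w, double x, double target) {
        double diff = w * x - target;
        return diff * diff;
    }

    /** Slope of the error with respect to w: dE/dw = 2 * (w * x - target) * x. */
    static double gradient(double w, double x, double target) {
        return 2.0 * (w * x - target) * x;
    }

    /** Repeatedly moves the weight a small step against the slope of the error. */
    static double train(double w, double x, double target,
                        double learningRate, int steps) {
        for (int i = 0; i < steps; i++) {
            w -= learningRate * gradient(w, x, target);
        }
        return w;
    }

    public static void main(String[] args) {
        // With x = 2 and target = 6, the error is smallest when w = 3
        double w = train(0.0, 2.0, 6.0, 0.05, 100);
        System.out.println("learned weight = " + w
                + ", remaining error = " + error(w, 2.0, 6.0));
    }
}
```

Each step subtracts a fraction of the slope from the weight, so the weight moves toward the value where the error is smallest. A real network applies the same kind of update to every weight at once.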

Several types of networks are classified as deep learning networks. One of these is the autoencoder network. In this network, the layers are symmetrical: the number of input values is the same as the number of output values, and the intermediate layers effectively compress the data down to a single, smaller internal layer. Each layer of the autoencoder, apart from the final output layer, is an RBM.

This structure is reflected in the following example, which will extract the numbers found in a set of images containing handwritten numbers. The details of the complete example are not shown here, but notice that the input and output layers both hold numberOfRows * numberOfColumns values, while the internal RBM layers narrow from 1,000 values down to 30 and then widen back to 1,000. The sizes of the layers are specified with the nIn and nOut methods.

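// Imports as used by Deeplearning4j releases that include the RBM layer class
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.RBM;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// seed, numberOfIterations, numberOfRows, and numberOfColumns are defined
// elsewhere in the complete example, which is not shown here
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .iterations(numberOfIterations)
        .optimizationAlgo(
           OptimizationAlgorithm.LINE_GRADIENT_DESCENT)
        .list()
        // Encoder: each layer's nIn matches the previous layer's nOut
        .layer(0, new RBM.Builder()
            .nIn(numberOfRows * numberOfColumns).nOut(1000)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(1, new RBM.Builder().nIn(1000).nOut(500)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(2, new RBM.Builder().nIn(500).nOut(250)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(3, new RBM.Builder().nIn(250).nOut(100)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(4, new RBM.Builder().nIn(100).nOut(30)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build()) //encoding stops
        // Decoder: mirrors the encoder sizes in reverse order
        .layer(5, new RBM.Builder().nIn(30).nOut(100)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build()) //decoding starts
        .layer(6, new RBM.Builder().nIn(100).nOut(250)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(7, new RBM.Builder().nIn(250).nOut(500)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        .layer(8, new RBM.Builder().nIn(500).nOut(1000)
            .lossFunction(LossFunctions.LossFunction.RMSE_XENT)
            .build())
        // Output layer restores the original number of values
        .layer(9, new OutputLayer.Builder(
                LossFunctions.LossFunction.RMSE_XENT).nIn(1000)
                .nOut(numberOfRows * numberOfColumns).build())
        .pretrain(true).backprop(true)
        .build();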
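The symmetry of the layer sizes in the configuration above can be checked with plain Java, without any deep learning library. The sketch below is illustrative only: the class and method names are hypothetical, and the input size of 784 assumes 28 x 28 pixel images, a value the excerpt does not state.

```java
// Minimal sketch (hypothetical names): build the full list of layer sizes
// for a symmetrical autoencoder by mirroring the encoder sizes.
final class AutoencoderShapeSketch {

    /** Mirrors the encoder sizes around the final (smallest) entry. */
    static int[] mirror(int[] encoderSizes) {
        int n = encoderSizes.length;
        int[] full = new int[2 * n - 1];
        for (int i = 0; i < n; i++) {
            full[i] = encoderSizes[i];
            full[2 * n - 2 - i] = encoderSizes[i];
        }
        return full;
    }

    /** Returns true when the sizes read the same forwards and backwards. */
    static boolean isSymmetric(int[] sizes) {
        for (int i = 0, j = sizes.length - 1; i < j; i++, j--) {
            if (sizes[i] != sizes[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 784 is an assumed input size (28 x 28 pixels); the remaining
        // values are the nOut sizes used in the configuration above
        int[] sizes = mirror(new int[] {784, 1000, 500, 250, 100, 30});
        for (int layer = 0; layer < sizes.length - 1; layer++) {
            System.out.println("layer " + layer + ": nIn=" + sizes[layer]
                    + " nOut=" + sizes[layer + 1]);
        }
        System.out.println("symmetric = " + isSymmetric(sizes));
    }
}
```

Each consecutive pair of sizes corresponds to the nIn and nOut values of one layer, so six encoder sizes yield the ten layers numbered 0 through 9.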

Once the model has been trained, it can be used for prediction and search tasks. In a search, the compressed middle layer of one image can be matched against the compressed forms of other images that need to be classified.
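One way to picture such a search is as a nearest-neighbour lookup over the compressed vectors. The sketch below makes several assumptions: the compressed codes have already been extracted into plain arrays, Euclidean distance is used as the similarity measure, and all names are hypothetical. It does not show how to obtain the codes from a trained network.

```java
// Minimal sketch (hypothetical names): find the stored compressed code
// that lies closest to a query code.
final class CodeSearchSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the index of the closest stored code, or -1 if none are stored. */
    static int nearest(double[] query, double[][] codes) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < codes.length; i++) {
            double distance = squaredDistance(query, codes[i]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up two-value codes; the network above would produce 30 values each
        double[][] codes = {{0.0, 0.0}, {1.0, 1.0}, {5.0, 5.0}};
        double[] query = {0.9, 1.2};
        System.out.println("closest stored code: index " + nearest(query, codes));
    }
}
```

Because the middle layer is much smaller than the original image, comparing codes is far cheaper than comparing the images pixel by pixel.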
