Book Image

Java for Data Science

By : Richard M. Reese, Jennifer L. Reese
Book Image

Java for Data Science

By: Richard M. Reese, Jennifer L. Reese

Overview of this book

para 1: Get the lowdown on Java and explore big data analytics with Java for Data Science. Packed with examples and data science principles, this book uncovers the techniques & Java tools supporting data science and machine learning. Para 2: The stability and power of Java combines with key data science concepts for effective exploration of data. By working with Java APIs and techniques, this data science book allows you to build applications and use analysis techniques centred on machine learning. Para 3: Java for Data Science gives you the understanding you need to examine the techniques and Java tools supporting big data analytics. These Java-based approaches allow you to tackle data mining and statistical analysis in detail. Deep learning and Java data mining are also featured, so you can explore and analyse data effectively, and build intelligent applications using machine learning. para 4: What?s Inside ? Understand data science principles with Java support ? Discover machine learning and deep learning essentials ? Explore data science problems with Java-based solutions
Table of Contents (19 chapters)
Java for Data Science
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface

Chapter 1. Getting Started with Data Science

Data science is not a single science as much as it is a collection of various scientific disciplines integrated for the purpose of analyzing data. These disciplines include various statistical and mathematical techniques, including:

  • Computer science

  • Data engineering

  • Visualization

  • Domain-specific knowledge and approaches

With the advent of cheaper storage technology, more and more data has been collected and stored permitting previously unfeasible processing and analysis of data. With this analysis came the need for various techniques to make sense of the data. These large sets of data, when used to analyze data and identify trends and patterns, become known as big data.

This in turn gave rise to cloud computing and concurrent techniques such as map-reduce, which distributed the analysis process across a large number of processors, taking advantage of the power of parallel processing.

The process of analyzing big data is not simple and evolves to the specialization of developers who were known as data scientists. Drawing upon a myriad of technologies and expertise, they are able to analyze data to solve problems that previously were either not envisioned or were too difficult to solve.

Early big data applications were typified by the emergence of search engines capable of more powerful and accurate searches than their predecessors. For example, AltaVista was an early popular search engine that was eventually superseded by Google. While big data applications were not limited to these search engine functionalities, these applications laid the groundwork for future work in big data.

The term, data science, has been used since 1974 and evolved over time to include statistical analysis of data. The concepts of data mining and data analytics have been associated with data science. Around 2008, the term data scientist appeared and was used to describe a person who performs data analysis. A more in-depth discussion of the history of data science can be found at http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#3d9ea08369fd.

This book aims to take a broad look at data science using Java and will briefly touch on many topics. It is likely that the reader may find topics of interest and pursue these at greater depth independently. The purpose of this book, however, is simply to introduce the reader to the significant data science topics and to illustrate how they can be addressed using Java.

There are many algorithms used in data science. In this book, we do not attempt to explain how they work except at an introductory level. Rather, we are more interested in explaining how they can be used to solve problems. Specifically, we are interested in knowing how they can be used with Java.