Mastering Java for Data Science

By: Alexey Grigorev

Overview of this book

Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises. Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software and bring data science into production with less effort. This book will teach you how to create data science applications with Java. First, we will review the most important things to consider when starting a data science application, and then brush up on the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data. Finally, we finish the book by talking about ways to deploy the model and evaluate it in production settings.
Table of Contents (17 chapters)
Title Page
About the Author
About the Reviewers
Customer Feedback

About the Reviewers

Stanislav Bashkyrtsev has been working with Java for the last 9 years. In recent years, he has focused on the automation and optimization of development processes.

Luca Massaron is a data scientist and a marketing research director specializing in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience in solving real-world problems and in generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of Web audience analysis in Italy to achieving the rank of top ten Kaggler, he has always been passionate about everything regarding data and analysis and about demonstrating the potential of data-driven knowledge discovery to both experts and nonexperts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science just by doing the essential. He is the coauthor of five recently published books and is currently working on a sixth. For Packt Publishing, he contributed as an author to Python Data Science Essentials (both 1st and 2nd editions), Regression Analysis with Python, and Large Scale Machine Learning with Python.

You can find him on LinkedIn at

Prashant Verma started his IT career in 2011 as a Java developer at Ericsson, working in the telecom domain. After a couple of years of Java EE experience, he moved into the big data domain and has worked on almost all the popular big data technologies, such as Hadoop, Spark, Flume, Mongo, Cassandra, and so on. He has also played with Scala. Currently, he works with QA Infotech as lead data engineer, solving e-learning domain problems using analytics and machine learning.

Prashant has worked for companies such as Ericsson and QA Infotech, gaining domain knowledge of telecom and e-learning. He has also been working as a freelance consultant in his free time.

I want to thank Packt Publishing for giving me the chance to review the book as well as my employer and my family for their patience while I was busy working on this book.