Java Data Analysis

By: John R. Hubbard

Overview of this book

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages for data analysis tasks. This book will help you learn the tools and techniques for conducting data analysis in Java. After a quick overview of what data science is and the steps involved in the process, you’ll learn statistical data analysis techniques and implement them using popular Java APIs and libraries. Through practical examples, you will also learn machine learning concepts such as classification and regression. In the process, you’ll familiarize yourself with tools such as RapidMiner and Weka and see how these Java-based tools can be used effectively for analysis. You will also learn how to analyze text and other types of multimedia, and how to work with relational, NoSQL, and time-series data. This book will also show you how to use different Java-based libraries to create insightful, easy-to-understand plots and graphs. By the end of this book, you will have a solid understanding of the various data analysis techniques and how to implement them using Java.

Herman Hollerith


The decennial United States Census was mandated by the U.S. Constitution in 1789 for the purposes of apportioning representatives and taxes. The first census was taken in 1790, when the U.S. population was under four million. It was little more than a head count, sorting the members of each household into a few broad categories. But by 1880, the country had grown to over 50 million, and the census itself had become much more complicated, recording dependents, parents, places of birth, property, and income.

Figure 4 Hollerith

The 1880 census took over eight years to compile. The United States Census Bureau realized that some sort of automation would be required to complete the 1890 census. It hired a young engineer named Herman Hollerith, who had proposed a system of electric tabulating machines that would use punched cards to record the data.

This was the first large-scale application of automated data processing, and it was a huge success. A total population of more than 62 million was reported after only six weeks of tabulation.

Hollerith was awarded a Ph.D. from Columbia University for his achievement. In 1896, he founded the Tabulating Machine Company, which merged with several other firms in 1911 to form the Computing-Tabulating-Recording Company. That company was renamed the International Business Machines Corporation (IBM) in 1924. More recently, IBM built the supercomputer Watson, which is probably one of the most successful commercial applications of data mining and artificial intelligence yet produced.