Mastering Apache Cassandra - Second Edition

Using Hadoop


Hadoop is for data processing. You may ask, "So are MATLAB, R, Octave, Python (with NLTK and many other data analysis libraries), and SAS, so why Hadoop?" They are great tools, but they work best on data that fits in memory. This means you can churn through a couple of GB, perhaps tens of GB, and the rate of processing is bounded by the CPU of that one machine, maybe 16 cores. This is a big restriction. At Internet scale, data no longer stays within gigabyte limits. In the age of billions of mobile phones (there were an estimated 7.7 billion mobile users at the end of 2014, source: http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/a#subscribers), we generate humongous amounts of data every second (Twitter has reported a peak of 143,199 tweets per second, source: http://dazeinfo.com/2014/04/29/7-7-billion-mobile-devices-among-7-1-billion-world-population-end-2014/) by checking in to places, tagging photos, uploading videos, commenting, messaging, purchasing, dining, running (fitness...