Practical Data Analysis - Second Edition

By: Hector Cuesta, Dr. Sampath Kumar

Overview of this book

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you’ll get hands-on experience turning data into insights using machine learning techniques. We will apply data-driven processing to several types of data, such as text, images, social network graphs, documents, and time series, and show you how to implement large-scale data processing with MongoDB and Apache Spark.

An introduction to the distributed file system


A distributed file system behaves much like any other file system: it supports the same basic operations, such as storing, reading, and deleting files and assigning security levels. The main difference is the number of servers that can be used at the same time without having to deal with the complexity of synchronization. As a result, we can store large files across different server nodes without worrying about redundancy or parallel operations.
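To make the idea of spreading one large file over several servers concrete, here is a minimal sketch in Python. It is a toy model rather than the behavior of any particular file system: the function name `plan_block_placement`, the node names, the block size, and the round-robin placement rule are all assumptions made for illustration.

```python
from math import ceil


def plan_block_placement(file_size, block_size, nodes, replication=2):
    """Return {block_index: [node, ...]} for a file of file_size bytes.

    Hypothetical helper: real systems use more elaborate placement policies.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    n_blocks = ceil(file_size / block_size)
    plan = {}
    for i in range(n_blocks):
        # Round-robin: the replicas of block i go to consecutive nodes.
        plan[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return plan


# A 300-byte file cut into 128-byte blocks needs three blocks.
plan = plan_block_placement(
    file_size=300,
    block_size=128,
    nodes=["node-a", "node-b", "node-c"],
)
for block, replicas in plan.items():
    print(block, replicas)
```

Because every block is kept on two different nodes in this sketch, losing a single node still leaves one copy of each block available, which is the kind of redundancy the file system handles on our behalf.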

There are many frameworks for distributed file systems, such as Red Hat Cluster FS, Ceph File System, Hadoop Distributed File System (HDFS), and Tachyon File System.

In this chapter, we will use HDFS, which is an open source implementation of the Google File System, built to handle large files on a cluster of commodity hardware. An HDFS cluster has a NameNode that manages file system operations, and a series of DataNodes that each manage the storage of files on their own cluster node,...
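The split between one metadata manager and many storage nodes can be sketched with two small Python classes. This is a toy in-memory model and not the HDFS API: the class names simply borrow the NameNode and DataNode terms, and the tiny block size, the block id format, and the alternating placement are assumptions. One deliberate simplification is that the toy NameNode moves the bytes itself, whereas in HDFS clients exchange block data with the DataNodes directly.

```python
class DataNode:
    """Stores raw block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class NameNode:
    """Keeps only metadata: which blocks form a file and where they live."""

    def __init__(self, datanodes, block_size=4):
        self.datanodes = datanodes
        self.block_size = block_size
        self.files = {}  # path -> ordered list of (block_id, datanode name)

    def write(self, path, data):
        entries = []
        for offset in range(0, len(data), self.block_size):
            index = offset // self.block_size
            block_id = f"{path}#{index}"  # hypothetical id scheme
            # Alternate over the DataNodes so blocks are spread out.
            node = self.datanodes[index % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + self.block_size]
            entries.append((block_id, node.name))
        self.files[path] = entries

    def read(self, path):
        by_name = {node.name: node for node in self.datanodes}
        # Look up each block's location, then reassemble in order.
        return b"".join(
            by_name[name].blocks[block_id]
            for block_id, name in self.files[path]
        )


nodes = [DataNode("dn1"), DataNode("dn2")]
namenode = NameNode(nodes, block_size=4)
namenode.write("/logs/a.txt", b"hello hdfs")
restored = namenode.read("/logs/a.txt")
print(restored)
```

In this model the NameNode never keeps file contents, only the list of block locations, so it stays small even when the files themselves are very large.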