Book Image

Mastering Python for Finance

Book Image

Mastering Python for Finance

Overview of this book

Table of Contents (17 chapters)
Mastering Python for Finance
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Hadoop for big data


Apache Hadoop is a 100 percent open source software framework used for two important fundamental tasks: storing and processing big data. It has been the leading big data tool for distributed parallel processing of data stored across multiple servers and is able to scale without limits. Because of its scalability, flexibility, fault tolerance, and low-cost features, many cloud-based solution vendors, financial institutions, and enterprises use Hadoop for their big data needs.

The Hadoop framework contains modules that are critical to its functions: the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN), and MapReduce (MapR).

HDFS

HDFS is a file system unique to Hadoop that is designed to be scalable and portable, and allows large amounts of file storage over multiple nodes in a Hadoop cluster spanning gigabytes or terabytes of data. Data in a cluster is split into smaller blocks of 128 Megabytes typically and distributed throughout the cluster....