So far we've mainly focused on datasets that can fit on a single machine. For larger datasets, we may need to access them through distributed file systems such as Amazon S3 or HDFS. For this purpose, we can utilize the open-source distributed computing framework PySpark (http://spark.apache.org/docs/latest/api/python/). PySpark is a distributed computing framework that uses the abstraction of Resilient Distributed Datasets (RDDs) for parallel collections of objects, which allows us to programmatically access a dataset as if it fits on a single machine. In later chapters we will demonstrate how to build predictive models in PySpark, but for this introduction we focus on data manipulation functions in PySpark.
Mastering Predictive Analytics with Python
By :
Mastering Predictive Analytics with Python
By:
Overview of this book
The volume, diversity, and speed of data available has never been greater. Powerful machine learning methods can unlock the value in this information by finding complex relationships and unanticipated trends. Using the Python programming language, analysts can use these sophisticated methods to build scalable analytic applications to deliver insights that are of tremendous value to their organizations.
In Mastering Predictive Analytics with Python, you will learn the process of turning raw data into powerful insights. Through case studies and code examples using popular open-source Python libraries, this book illustrates the complete development process for analytic applications and how to quickly apply these methods to your own data to create robust and scalable prediction services.
Covering a wide range of algorithms for classification, regression, clustering, as well as cutting-edge techniques such as deep learning, this book illustrates not only how these methods work, but how to implement them in practice. You will learn to choose the right approach for your problem and how to develop engaging visualizations to bring the insights of predictive modeling to life
Table of Contents (16 chapters)
Mastering Predictive Analytics with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Free Chapter
From Data to Decisions – Getting Started with Analytic Applications
Exploratory Data Analysis and Visualization in Python
Finding Patterns in the Noise – Clustering and Unsupervised Learning
Connecting the Dots with Models – Regression Methods
Putting Data in its Place – Classification Methods and Analysis
Words and Pixels – Working with Unstructured Data
Learning from the Bottom Up – Deep Networks and Unsupervised Features
Sharing Models with Prediction Services
Reporting and Testing – Iterating on Analytic Systems
Index
Customer Reviews