Book Image

Practical Data Science Cookbook, Second Edition - Second Edition

By : Prabhanjan Narayanachar Tattar, Bhushan Purushottam Joshi, Sean Patrick Murphy, ABHIJIT DASGUPTA, Anthony Ojeda
Book Image

Practical Data Science Cookbook, Second Edition - Second Edition

By: Prabhanjan Narayanachar Tattar, Bhushan Purushottam Joshi, Sean Patrick Murphy, ABHIJIT DASGUPTA, Anthony Ojeda

Overview of this book

As increasing amounts of data are generated each year, the need to analyze and create value out of it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don’t. Because of this, there will be an increasing demand for people that possess both the analytical and technical abilities to extract valuable insights from data and create valuable solutions that put those insights to use. Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python.
Table of Contents (17 chapters)
Title Page
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface

Understanding the data


Understanding your data is critical to all data-related work. In this recipe, we will acquire and take a first look at the data that we will be using to build our recommendation engine.

Getting ready

To prepare for this recipe, and the rest of the chapter, download the MovieLens data from the GroupLens website of the University of Minnesota. You can find the data at http://grouplens.org/datasets/movielens/ .

In this chapter, we will use the smaller MoveLens 100k dataset (4.7 MB in size) in order to load the entire model into the memory with ease.

How to do it...

Perform the following steps to better understand the data that we will be working with throughout this chapter:

  1. Download the data from http://grouplens.org/datasets/movielens/ . The 100K dataset is the one that you want (ml-100k.zip):

  1. Unzip the downloaded data into the directory of your choice.
  2. The two files that we are mainly concerned with are u.data, which contains the user movie ratings, and u.item, which contains...