Mastering pandas

By: Femi Anthony

Overview of this book

Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared with other programming languages. pandas brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for data analysis. By mastering pandas, users will be able to perform complex data analysis in a short period of time and illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to using pandas for data analysis, for both the seasoned data analysis practitioner and the novice user. It provides a basic introduction to the pandas framework and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.
Table of Contents (18 chapters)
Mastering pandas
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Logical subsetting


In R as well as in pandas, there is more than one way to perform logical subsetting. Suppose we wish to display all players whose average goals-per-game ratio is greater than or equal to 0.5, that is, players who average at least one goal every two games.
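For comparison, here is a minimal pandas sketch of the same filter. It assumes the data sits in a DataFrame named goal_stats with the same column names as the R data frame used below. The four real rows are copied from the R output in this section; the fifth row ("Hypothetical Player" at "Example FC") is invented purely so that the filter has something to exclude. A boolean mask plays the role of R's logical slice, and DataFrame.query() reads much like R's subset().

```python
import pandas as pd

# Four rows copied from the R output in this section, plus one
# hypothetical player (not from the book) who falls below the threshold.
goal_stats = pd.DataFrame({
    "Club": ["Atletico Madrid", "Real Madrid", "Real Madrid", "Chelsea", "Example FC"],
    "Player": ["Diego Costa", "Cristiano Ronaldo", "Gareth Bale", "Demba Ba",
               "Hypothetical Player"],
    "Goals": [8, 17, 6, 3, 2],
    "GamesPlayed": [9, 11, 12, 6, 10],
})
# Derive the ratio column from goals and games played
goal_stats["GoalsPerGame"] = goal_stats["Goals"] / goal_stats["GamesPlayed"]

# 1) Boolean mask: keep rows where the condition is True
#    (analogous to goal_stats[goal_stats$GoalsPerGame>=0.5,] in R)
by_mask = goal_stats[goal_stats["GoalsPerGame"] >= 0.5]

# 2) DataFrame.query(): the condition is written as a string expression
#    over column names, similar in spirit to R's subset()
by_query = goal_stats.query("GoalsPerGame >= 0.5")

print(by_mask)
```

Both approaches return the same rows. The mask form composes easily with .loc and with & / | for compound conditions, while query() keeps longer conditions readable.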

Logical subsetting in R

Here's how we can do this in R:

  • Via a logical slice:

    >goal_stats[goal_stats$GoalsPerGame>=0.5,]
                  Club            Player Goals GamesPlayed GoalsPerGame
    1  Atletico Madrid       Diego Costa     8           9    0.8888889
    6      Real Madrid Cristiano Ronaldo    17          11    1.5454545
    7      Real Madrid       Gareth Bale     6          12    0.5000000
    17         Chelsea          Demba Ba     3           6    0.5000000
    
  • Via the subset() function:

    >subset(goal_stats,GoalsPerGame>=0.5)
                  Club            Player Goals GamesPlayed GoalsPerGame
    1  Atletico Madrid       Diego Costa     8           9    0.8888889
    6      Real Madrid Cristiano Ronaldo    17          11    1.5454545
    7      Real Madrid       Gareth Bale     6    ...