Mastering Data Analysis with R

By: Gergely Daróczi

Analyzing overlaps between our lists of R users


But our original idea was to predict the number of R users around the world, not to focus on a few minor segments, right? Now that we have multiple data sources, we can start building models that combine them to estimate the global number of R users.

The basic idea behind this approach is the capture-recapture method, which is well known in ecology: we first estimate the probability of capturing a unit from the population, and then use this probability to estimate the number of units that were not captured.
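To make the two steps concrete, here is a minimal sketch of the simplest capture-recapture estimator, the two-sample Lincoln-Petersen estimator. It is only an illustration of the principle and not necessarily the model used later in this section; the names in `sample1` and `sample2` are made-up toy data.

```r
# Hypothetical toy data: two samples of names drawn from the same population
sample1 <- c("alice", "bob", "carol", "dave", "erin", "frank")
sample2 <- c("carol", "dave", "grace", "heidi")

n1 <- length(unique(sample1))              # units captured in the first sample
n2 <- length(unique(sample2))              # units captured in the second sample
m  <- length(intersect(sample1, sample2))  # units captured in both samples

# Step 1: estimate the capture probability of the first sample as the
# share of the second sample that had already been captured
p_hat <- m / n2

# Step 2: use that probability to estimate the total population size
N_hat <- n1 / p_hat

# Estimated number of units that neither sample captured
unseen <- N_hat - length(union(sample1, sample2))
```

With these toy values, half of the second sample was already seen in the first, so the six names in the first sample are taken to represent about half of the population. The estimator assumes a closed population and independent samples, which real name lists may well violate.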

In our current study, the units are R users and the samples are the name lists we collected previously:

  • Supporters of the R Foundation

  • R package maintainers who submitted at least one package to CRAN

  • R-help mailing list e-mail senders
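As a rough illustration of what "overlap" means for three such lists, the sketch below counts the names shared by each pair of lists. The vectors and the labels in `toy_lists` are hypothetical stand-ins, not the real data collected earlier.

```r
# Hypothetical toy name lists standing in for the three sources
toy_lists <- list(
    supporter  = c("alice", "bob", "carol"),
    maintainer = c("bob", "carol", "dave", "erin"),
    sender     = c("carol", "erin", "frank"))

# Pairwise overlap matrix: each cell holds the number of names that two
# lists have in common; the diagonal is simply the size of each list
overlap <- sapply(toy_lists, function(a)
    sapply(toy_lists, function(b) length(intersect(a, b))))

# Names that appear on all three lists
in_all <- Reduce(intersect, toy_lists)
```

The off-diagonal counts are the "recaptures" that a capture-recapture model relies on: the more two lists overlap relative to their sizes, the smaller the estimated number of unseen users.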

Let's merge these lists with a tag referencing the data source:

> lists <- rbindlist(list(
+     data.frame(name = unique(supporterlist), list = 'supporter'),
+     data.frame(name...