R Data Mining

Overview of this book

R is widely used to apply data mining techniques across many industries, including finance, medicine, and scientific research. This book will empower you to produce and present impressive analyses from data by selecting and implementing the appropriate data mining techniques in R. You will gain these skills while immersed in a one-of-a-kind data mining crime case, in which you are asked to help resolve a real fraud case affecting a commercial company by means of both basic and advanced data mining techniques. As you move through the plot of the story, you will learn and practice, on real data, the various R packages commonly employed for this kind of task. You will also get the chance to apply some of the most popular and effective data mining models and algorithms, from basic multiple linear regression to the more advanced support vector machines. Unlike other data mining learning resources, this book explains the theory behind these models, their relevant assumptions, and when they can be applied to the data you are facing. By the end of the book, you will hold a new and powerful toolbox of instruments and know exactly when and how to employ each of them to solve your data mining problems and get the most out of your data. Finally, to maximize your exposure to the concepts described and to support the learning process, the book comes with a reproducible bundle of commented R scripts and a practical set of data mining model cheat sheets.
Table of Contents (22 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Epilogue

About the Reviewers

Enrico Pegoraro graduated in statistics from the Italian University of Padua more than 20 years ago. He says that he has experienced firsthand the fast-growing worlds of computer science and statistics. He has worked on projects involving databases, software development, programming languages, data integration, Linux, Windows, and cloud computing. He is currently working as a freelance statistician and data scientist.

Enrico has gained more than 10 years of experience in training and consulting activities with R and other statistical software, with a special focus on Six Sigma, industrial statistical analysis, and corporate training courses. He is also a partner of the main company supporting the MilanoR Italian community. In this company, he works as a freelance principal data scientist, as well as a teacher of training courses on statistical models and data mining with R.

In his first job, Enrico collaborated with Italian medical institutions, contributing to some regional projects and publications on nosocomial infections. His main expertise is in consulting and teaching statistical modeling, data mining, data science, medical statistics, predictive models, SPC, and industrial statistics. Enrico is planning to develop an Italian-language website dedicated to R (www.r-project.it).

Enrico can be contacted at [email protected].

I would like to thank all the people who support me and my activities, particularly my partner, Sonja, and her son, Gianluca.

Doug Ortiz is an enterprise cloud, big data, data analytics, and solutions architect who has been architecting, designing, developing, and integrating enterprise solutions throughout his career. Organizations that leverage his skillset have been able to rediscover and reuse their underutilized data via existing and emerging technologies such as Amazon Web Services, Microsoft Azure, Google Cloud, Microsoft BI Stack, Hadoop, Spark, NoSQL databases, and SharePoint along with related toolsets and technologies.

He is also the founder of Illustris, LLC and can be reached at [email protected].

Some interesting aspects of his profession are:

  • Experience in integrating multiple platforms and products
  • Big data, data science, R, and Python certifications
  • He helps organizations gain a deeper understanding of the value of their current investments in data and existing resources, turning them into useful sources of information
  • He has improved, salvaged, and architected projects by utilizing unique and innovative techniques
  • He regularly reviews books on Amazon Web Services, data science, machine learning, R, and cloud technologies

His hobbies are yoga and scuba diving.

I would like to thank my wonderful wife, Mila, for all her help and support, as well as Maria, Nikolay, and our wonderful children.

Radovan Kavicky is the principal data scientist and president at GapData Institute, based in Bratislava, Slovakia, where he harnesses the power of data and the wisdom of economics for the public good. He is a macroeconomist by education, and a consultant and analyst by profession (8+ years of experience in consulting for clients from the public and private sectors), with strong mathematical and analytical skills. He is able to deliver top-level research and analytical work. From MATLAB, SAS, and Stata, he switched to Python, R, and Tableau.

Radovan is an evangelist of open data and a member of the Slovak Economic Association (SEA), Open Budget Initiative, Open Government Partnership, and the global Tableau #DataLeader network (2017). He is the founder of PyData Bratislava, R <- Slovakia, and the SK/CZ Tableau User Group (skczTUG). He has been a speaker at @TechSummit (Bratislava, 2017) and @PyData (Berlin, 2017).

You can follow him on Twitter at @radovankavicky, @GapDataInst, or @PyDataBA. His full profile and experience are available at https://www.linkedin.com/in/radovankavicky/ and https://github.com/radovankavicky.

GapData Institute: https://www.gapdata.org.

Oleg Okun is a machine learning expert and author/editor of four books, numerous journal articles, and many conference papers. His career spans more than a quarter of a century. He has worked in both academia and industry in his home country, Belarus, and abroad (Finland, Sweden, and Germany). His work experience includes document image analysis, fingerprint biometrics, bioinformatics, online/offline marketing analytics, credit scoring analytics, and text analytics.

He is interested in all aspects of distributed machine learning and the Internet of Things. Oleg currently lives and works in Hamburg, Germany.

I would like to express my deepest gratitude to my parents for everything that they have done for me.