Numerical Computing with Python

By: Pratap Dangeti, Allen Yu, Claire Chung, Aldrin Yim, Theodore Petrou

Overview of this book

Data mining, or parsing data to extract useful insights, is a niche skill that can transform your career as a data scientist. Python is a flexible programming language equipped with a strong suite of libraries and toolkits, and it gives you a solid platform to sift through your data and mine the insights you seek. This Learning Path is designed to familiarize you with the Python libraries and the underlying statistics that you need to get comfortable with data mining. You will learn how to use Pandas, Python's popular data analysis library, to work with different kinds of data, and leverage the power of Matplotlib to generate appealing visualizations for the insights you have derived. You will also explore different machine learning techniques and statistics that enable you to build powerful predictive models. By the end of this Learning Path, you will have a firm foundation to take your data mining skills to the next level and set yourself on the path to becoming a sought-after data science professional.

This Learning Path includes content from the following Packt products:

• Statistics for Machine Learning by Pratap Dangeti
• Matplotlib 2.x By Example by Allen Yu, Claire Chung, Aldrin Yim
• Pandas Cookbook by Theodore Petrou

Title Page

Contributors

About Packt

Preface

Journey from Statistics to Machine Learning

Statistical terminology for model building and validation

Summary

Tree-Based Machine Learning Models

Introducing decision tree classifiers

Comparison between logistic regression and decision trees

Comparison of error components across various styles of models

Remedial actions to push the model towards the ideal region

HR attrition data example

Decision tree classifier

Tuning class weights in decision tree classifier

Bagging classifier

Random forest classifier

Random forest classifier - grid search

AdaBoost classifier

Gradient boosting classifier

Comparison between AdaBoosting versus gradient boosting

Extreme gradient boosting - XGBoost classifier

Ensemble of ensembles - model stacking

Ensemble of ensembles with different types of classifiers

Ensemble of ensembles with bootstrap samples using a single type of classifier

Summary

K-Nearest Neighbors and Naive Bayes

K-nearest neighbors

KNN classifier with breast cancer Wisconsin data example

Tuning of k-value in KNN classifier

Naive Bayes

Probability fundamentals

Understanding Bayes theorem with conditional probability

Naive Bayes classification

Laplace estimator

Naive Bayes SMS spam classification example

Summary

Unsupervised Learning

K-means clustering

Principal Component Analysis - PCA

Singular value decomposition - SVD

Deep auto encoders

Model building technique using encoder-decoder architecture

Deep auto encoders applied on handwritten digits using Keras

Summary

Reinforcement Learning

Reinforcement learning basics

Markov decision processes and Bellman equations

Dynamic programming

Grid world example using value and policy iteration algorithms with basic Python

Monte Carlo methods

Temporal difference learning

SARSA on-policy TD control

Q-learning - off-policy TD control

Cliff walking example of on-policy and off-policy of TD control

Contributors

About the authors

Pratap Dangeti is currently working as a Senior Data Scientist at Bidgely Technologies, Bangalore. He has extensive experience in analytics and data science. He received his master's degree from IIT Bombay in its industrial engineering and operations research program. Pratap is an artificial intelligence enthusiast. When not working, he likes to read about next-gen technologies and innovative methodologies.

First and foremost, I would like to thank my mom, Lakshmi, for her support throughout my career and in writing this book. She has been my inspiration and motivation for continuing to improve my knowledge and helping me move ahead in my career. She is my strongest supporter, and I dedicate this book to her. I also thank my family and friends for their encouragement, without which it would not be possible to write this book. I would like to thank my acquisition editor, Aman Singh, and content development editor, Mayur Pawanikar, who chose me to write this book and encouraged me constantly throughout the period of writing with their invaluable feedback and input.

Allen Yu, Ph.D., is a Chevening Scholar, 2017-18, and an MSc student in computer science at the University of Oxford. He holds a Ph.D. degree in Biochemistry from the Chinese University of Hong Kong, and he has used Python and Matplotlib extensively during his 10 years of bioinformatics experience.

Apart from academic research, Allen is the co-founder of Codex Genetics Limited, which aims to provide a personalized medicine service in Asia through the use of the latest genomics technology.

I wish to thank my fiancée, Dorothy, for her constant love and support, especially during the difficult time in balancing family, work, and life. On behalf of the authors, I would like to thank the wonderful team at Packt Publishing—Mayur, Tushar, Vikrant, Vivek, and the whole editorial team who helped in the creation of this book. Thanks to Tushar's introduction, the authors feel greatly honored to take part in this amazing project. Special thanks and much appreciation to Mayur for guiding the production of this book from the ground up. The authors truly appreciate the comprehensive reviews from Nikhil Borkar. We cannot be thankful enough to the entire Matplotlib and Python community for their hard work in creating open and incredibly useful tools. Last but not least, I would like to express my sincere gratitude to Prof. Ting-Fung Chan, my parents, friends, and colleagues for their guidance in my life and work. Chevening Scholarships, the UK government’s global scholarship programme, are funded by the Foreign and Commonwealth Office (FCO) and partner organizations.

Claire Chung is pursuing her Ph.D. degree as a Bioinformatician at the Chinese University of Hong Kong. She enjoys using Python daily for work and life hacks. While passionate about science, her challenge-loving character motivates her to go beyond data analytics. She has participated in web development projects, as well as developed skills in graphic design and multilingual translation. She led the Campus Network Support Team in college and shared her experience in data visualization at PyCon HK 2017.

I would like to thank Allen for getting me on board in this exciting authorship journey, and for being a helpful senior, always generous in sharing his experience and insights. It has been a great pleasure to work closely with Allen, Aldrin, and the whole editorial team at Packt. I am grateful to everyone along the way who turned my interest in computers into daily practice. I wish to extend my sincere gratitude to my supervisor, Prof. Ting-Fung Chan, my parents, teachers, colleagues, and friends. I would like to make a special mention of my dearest group of high school friends for their unfailing support and source of cheer. I would also like to thank my childhood friend, Eugene, for introducing and provoking me into technological areas. With all the support, I will continue to prove that girls are capable of achieving in the STEM field.

Aldrin Yim is a Ph.D. candidate and Markey Scholar in the Computation and System Biology program at Washington University, School of Medicine. His research focuses on applying big data analytics and machine learning approaches in studying neurological diseases and cancer. He is also the founding CEO of Codex Genetics Limited, which provides precision medicine solutions to patients and hospitals in Asia.

It is not a one-man task to write a book, and I would like to thank Allen and Claire for their invaluable input and effort during the time; the authors also owe a great debt of gratitude to all the editors and reviewers who made this book happen. I also wish to thank my parents for their love and understanding over the years, as well as my best friends, Charles and Angus, for accompanying me through my ups and downs over the past two decades. Last but not least, I also wish to extend my heartfelt thanks to Kimmy for all the love and support in life and moving all the way to Chicago to keep our love alive.

Theodore Petrou is a data scientist and the founder of Dunder Data, a professional educational company focusing on exploratory data analysis. He is also the head of Houston Data Science, a meetup group with more than 2,000 members that has the primary goal of getting local data enthusiasts together in the same room to practice data science. Before founding Dunder Data, Ted was a data scientist at Schlumberger, a large oil services company, where he spent the vast majority of his time exploring data. Some of his projects included using targeted sentiment analysis to discover the root cause of past failures from engineer text, developing customized client/server dashboarding applications, and real-time web services to avoid mispricing sales items. Ted received his Master's degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist. Ted is a strong supporter of learning through practice and can often be found answering questions about pandas on Stack Overflow.

About the reviewers

Manuel Amunategui is vice president of data science at SpringML, a startup offering Google Cloud TensorFlow and Salesforce enterprise solutions. Prior to that, he worked as a quantitative developer on Wall Street for a large equity-options market-making firm and as a software developer at Microsoft. He holds master's degrees in predictive analytics and international administration.

He is a data science advocate, a blogger/vlogger (amunategui.github.io), a trainer on Udemy and O'Reilly Media, and a technical reviewer at Packt Publishing.

Nikhil Borkar holds a CQF designation and a postgraduate degree in quantitative finance. He also holds certified financial crime examiner and certified anti-money laundering professional qualifications. He is a registered research analyst with the Securities and Exchange Board of India (SEBI) and has a keen grasp of laws and regulations pertaining to securities and investment. He is currently working as an independent FinTech and legal consultant. Prior to this, he worked with Morgan Stanley Capital International as a Global RFP project manager. He is self-motivated, intellectually curious, and hardworking. He likes to tackle problems using a multidisciplinary, holistic approach. Currently, he is actively working on machine learning, artificial intelligence, and deep learning projects. He has expertise in the following areas:

Quantitative investing: equities, futures and options, and derivatives engineering
Econometrics: time series analysis, statistical modeling
Algorithms: parametric, non-parametric, and ensemble machine learning algorithms
Code: R programming, Python, Scala, Excel VBA, SQL, and big data ecosystems
Data analysis: Quandl and Quantopian
Strategies: trend following, mean reversion, cointegration, Monte Carlo simulations, Value at Risk, Credit Risk Modeling and Credit Rating
Data visualization: Tableau and Matplotlib

Sonali Dayal is a master's candidate in biostatistics at the University of California, Berkeley. Previously, she worked as a freelance software and data science engineer for early-stage start-ups, where she built supervised and unsupervised machine learning models as well as data pipelines and interactive data analytics dashboards. She received her bachelor of science (B.S.) in biochemistry from Virginia Tech in 2011.

Kuntal Ganguly is a big data machine learning engineer focused on building large-scale data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building several big data and machine learning applications.

Kuntal provides solutions to AWS customers in building real-time analytics systems using managed cloud services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, Solr, and so on, along with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, Keras, and BigDL. He enjoys hands-on software development, and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. He is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.

Kuntal is the author of the books Learning Generative Adversarial Network and R Data Analysis Cookbook - Second Edition, both from Packt Publishing.

Shilpi Saxena is a seasoned professional who leads in management with an edge of being a technology evangelist. She is an engineer who has exposure to a variety of domains (machine-to-machine space, healthcare, telecom, hiring, and manufacturing). She has experience in all aspects of the conception and execution of enterprise solutions. She has been architecting, managing, and delivering solutions in the big data space for the last 3 years, handling high-performance, geographically distributed teams of elite engineers. Shilpi has more than 12 years of experience (3 years in the big data space) in the development and execution of various facets of enterprise solutions, in both the product and services dimensions of the software industry. An engineer by degree and profession, she has worn various hats (developer, technical leader, product owner, tech manager) and has seen all the flavors that the industry has to offer. She has architected and worked through some of the pioneering production implementations in big data on Storm and Impala with auto scaling in AWS. LinkedIn: http://in.linkedin.com/pub/shilpi-saxena/4/552/a30

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
