Machine Learning with Python

By: Oliver Theobald

Overview of this book

The course starts by setting the foundation with an introduction to machine learning, Python, and essential libraries, ensuring you grasp the basics before diving deeper. It then progresses through exploratory data analysis, data scrubbing, and pre-model algorithms, equipping you with the skills to understand and prepare your data for modeling. The journey continues with detailed walkthroughs on creating, evaluating, and optimizing machine learning models, covering key algorithms such as linear and logistic regression, support vector machines, k-nearest neighbors, and tree-based methods. Each section is designed to build upon the previous, reinforcing learning and application of concepts. Wrapping up, the course introduces the next steps, including an introduction to Python for newcomers, ensuring a comprehensive understanding of machine learning applications.
Table of Contents (18 chapters)

1. FOREWORD
2. DATASETS USED IN THIS BOOK
3. INTRODUCTION
4. DEVELOPMENT ENVIRONMENT
5. MACHINE LEARNING LIBRARIES
6. EXPLORATORY DATA ANALYSIS
7. DATA SCRUBBING
8. PRE-MODEL ALGORITHMS
9. SPLIT VALIDATION
10. MODEL DESIGN
11. LINEAR REGRESSION
12. LOGISTIC REGRESSION
13. SUPPORT VECTOR MACHINES
14. k-NEAREST NEIGHBORS
15. TREE-BASED METHODS
16. NEXT STEPS
APPENDIX 1: INTRODUCTION TO PYTHON
APPENDIX 2: PRINT COLUMNS

FOREWORD

 

While it’s alluring to watch a trend rise quickly, it’s worth remembering the long period of resilience that often precedes the curve. For those pursuing a career in machine learning, it’s reassuring to know this field of study not only predates the Internet and the moon landing but also most readers of this book.

Machine learning is not an overnight movement, and the path to the present day has been anything but smooth sailing. Conceptual theories emerged in the 1950s, but progress was stalled by computational constraints and limited data. This resulted in a logjam of research and good intentions as theoretical models of prediction, algorithm design, and extrapolation of future possibilities accumulated in research institutions until powerful processing chips and large datasets emerged in the 1990s. Renewed interest helped to bridge the gap between theory and capability during this decade, but it still wasn’t enough to produce field-altering breakthroughs in the space of deep learning.

That breakthrough came in 2009 when Adjunct Professor Andrew Ng and his team at Stanford University experimented with tethering gaming chips—better known for image rendering—to solve complex data problems. The combination of inexpensive GPU (graphics processing unit) chips and compute-intensive algorithms pushed the lead domino in the development of deep learning. This crucial breakthrough coalesced with other developments in reinforcement learning to spark a surge in interest, an oversupply of newspaper analogies to Hollywood movies, and an international hunt for AI talent.

In 2016, media interest climbed to a new high at the glitzy Four Seasons Hotel in Seoul, where TV cameras locked lenses on a 19-by-19 Go board with the world champion on one side and an AI program on the other. The game of Go consists of an astronomical number of possible positions, and commentators described the then world champion, Lee Sedol, as having a sixth sense for interpreting the state of play. His opponent was AlphaGo, a sophisticated deep learning model designed to outperform any opponent—mortal or synthetic.

The team of human developers responsible for designing the AlphaGo program scarcely knew the rules of the game when they began work on the project, but they watched on excitedly as AlphaGo performed its first move.

The AI model unsettled Lee early—forcing him to take a nervous cigarette break—before systematically defeating the South Korean four games to one. News headlines of AlphaGo’s cold and mechanistic victory beamed across the globe—as had been the case with other televised AI feats before it. Predictably, these reports focused on the superiority of machine intelligence over humans.

Contrary to these initial headlines, the 2017 Netflix documentary AlphaGo helped to later realign attention towards the human ingenuity behind AlphaGo’s victory. The documentary details the lead-up to Seoul and in doing so shines the light on a team of talented employees thriving in a new and far-reaching line of work.

Dressed in casual attire, the AlphaGo team can be seen working hard behind their screens stocking the model with training data, optimizing its hyperparameters, and coordinating vital computational resources before extracting game tactics from human experts honed over many years of competition.

Despite its prolific success, the AlphaGo program has not replaced any of the programmers who worked on its source code or taken away their salaries. In fact, the development of AlphaGo has helped to expand the size and profile of the company DeepMind Technologies, which was acquired by Google (now part of Alphabet Inc.) in 2014.

 

Working in AI

After two AI winters and ongoing battles for academic funding, we have entered a golden age in industry employment. Complex databases, fast and affordable processing units, and advanced algorithms have rejuvenated established fields of human expertise in mathematics, statistics, computer programming, graphics and visualization as well as good old problem-solving skills.

In a global job market steadily automated and simplified by Web 2.0 technology, the field of machine learning provides a professional nirvana for human ingenuity and meaningful work. It’s a cognitively demanding occupation; one that goes far beyond tuning ad campaigns or tracking web traffic on side-by-side monitors. With jobs in this industry demanding expertise across three distinct fields, achieving machine intelligence is far from easy.

The ideal skillset for a machine learning developer spans coding, data management, and knowledge of statistics and mathematics. Optional areas of expertise include data visualization, big data management, and practical experience in distributed computing architecture. This book converges on the vital coding part of machine learning using Python.

Released in 1991 by Guido van Rossum, Python is widely used in the field of machine learning and is easy to learn courtesy of van Rossum’s emphasis on code readability. Python is versatile too; while other popular languages like R offer advantages in advanced mathematical operations and statistical functions, they have limited practical use outside of hard data crunching. The utility of Python, however, extends to data collection (web scraping) and data piping (Hadoop and Spark), which are important for sending data to the operating table. In addition, Python interoperates with C and C++, enabling practitioners to run performance-critical code on graphics processing units reserved for advanced computation.

The other advantages of learning a popular programming language (such as Python) are the breadth of job opportunities and the availability of relevant support. Access to documentation, tutorials, and assistance from a helpful community to troubleshoot code problems cannot be overstated, especially for anyone beginning their journey in the complex world of computer programming.

As a practical introduction to coding machine learning models, this book stops short of a complete introduction to programming with Python. Instead, general nuances are explained to enlighten beginners without stalling the progress of experienced programmers. For those new to Python, a basic overview of Python can be found in the Appendix section of this book. It’s also recommended that you spend 2-3 hours watching introductory Python tutorials on YouTube or Udemy if this is your first time working with Python.

 

What You Will Learn

As the second book in the Machine Learning for Beginners series, the key premise of this title is to teach you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as logistic regression and decision trees. If this doesn’t describe your experience or you’re in need of a refresher, I have summarized key concepts from machine learning in the opening chapter, and there are overviews of specific algorithms dispersed throughout the book. For a gentle and more detailed explanation of machine learning theory minus the code, I suggest reading the first title in this series, Machine Learning for Absolute Beginners (Third Edition), which is written for a more general audience.

Finally, it’s important to note that as new versions of Python code libraries become available, it’s possible for small discrepancies to materialize between the code shown in this book and the actual output of Python in your development environment. To clarify any discrepancies or to help troubleshoot your code, please contact me at [email protected] for assistance. General code problems can also be solved by searching for answers on Stack Overflow (www.stackoverflow.com) or by Google searching the error message outputted by the Python interpreter.

 

Conventions Used in This Book

Italics indicate the introduction of new technical terms.
Lowercase bold indicates programming code in Python.
The terms “target variable” and “output” are used interchangeably.
The terms “variable” and “feature” are used interchangeably.
As is typical of machine learning literature, “independent variables” are expressed as an uppercase “X” and the “dependent variable” as a lowercase “y”.
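To make the last convention concrete, here is a minimal sketch of the X/y naming pattern in plain Python. The dataset values and column names (rooms, area, price) are hypothetical, used only to illustrate the convention.

```python
# Hypothetical housing data: two features per row plus a target value
rows = [
    {"rooms": 3, "area": 120, "price": 250},
    {"rooms": 2, "area": 80,  "price": 180},
    {"rooms": 4, "area": 200, "price": 400},
]

# Independent variables (features) are conventionally named uppercase X
X = [[row["rooms"], row["area"]] for row in rows]

# The dependent variable (target/output) is conventionally named lowercase y
y = [row["price"] for row in rows]

print(X)  # [[3, 120], [2, 80], [4, 200]]
print(y)  # [250, 180, 400]
```

This same X/y pairing appears throughout machine learning code, for example when passing a feature matrix and target vector to a model's fit method.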