Hands-On Unsupervised Learning with Python

By: Giuseppe Bonaccorso

Overview of this book

Unsupervised learning is about making use of raw, untagged data and applying learning algorithms to it so that a machine can find structure in it without predefined labels. With this book, you will explore the concept of unsupervised learning and use Python to cluster large sets of data and analyze them iteratively until the desired outcome is found. The book starts with the key differences between supervised, unsupervised, and semi-supervised learning. You will be introduced to the most widely used libraries and frameworks from the Python ecosystem and address unsupervised learning in both the machine learning and deep learning domains. You will explore various algorithms and techniques that are used to implement unsupervised learning in real-world use cases. You will learn a variety of unsupervised learning approaches, including randomized optimization, clustering, feature selection and transformation, and information theory. You will get hands-on experience with how neural networks can be employed in unsupervised scenarios. You will also explore the steps involved in building and training a GAN in order to process images. By the end of this book, you will have learned the art of unsupervised learning for different real-world challenges.
Table of Contents (12 chapters)

Why Python for data science and machine learning?

Before moving on to more technical discussions, I think it's helpful to explain the choice of Python as the programming language for this book. In the last decade, research in the field of data science and machine learning has seen exponential growth, with thousands of valuable papers and dozens of complete tools. In particular, thanks to its efficiency, elegance, and compactness, Python has been chosen by many researchers and programmers to create a complete scientific ecosystem that is freely available.

Nowadays, packages such as scikit-learn, SciPy, NumPy, Matplotlib, pandas, and many others form the backbone of hundreds of production-ready systems, and their usage keeps growing. Moreover, deep learning frameworks such as Theano, TensorFlow, and PyTorch allow every Python user to create and train complex models without being held back by interpreter speed, because the heavy computation runs in optimized native code. In fact, it's important to note that Python is no longer just a scripting language. It supports dozens of specific tasks (for example, web development and graphics) and it can be interfaced with native code written in C or C++.
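
As a small illustration of the last point, the following is a minimal sketch (not taken from the book) of calling a C function from Python with the standard-library ctypes module. It assumes that the platform's C math library can be located; the helper names load_c_math_library and native_cos are hypothetical and exist only for this example.

```python
import ctypes
import ctypes.util
import math
import sys


def load_c_math_library():
    """Load the platform's C math library (assumes one is available)."""
    if sys.platform.startswith("win"):
        # On Windows, the C runtime DLL exports cos
        return ctypes.CDLL("msvcrt")
    # On POSIX systems, look for libm; if the lookup returns None,
    # CDLL(None) falls back to the symbols already loaded in the process
    return ctypes.CDLL(ctypes.util.find_library("m"))


libm = load_c_math_library()

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def native_cos(x):
    """Compute cos(x) by calling the C implementation directly."""
    return libm.cos(x)


if __name__ == "__main__":
    # Compare the native call with Python's own math module
    print(native_cos(1.0), math.cos(1.0))
```

Scientific packages typically rely on compiled extension modules rather than ctypes, but the principle is the same: the Python code describes the computation, while the numerical work is delegated to native code.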

For these reasons, Python is an optimal choice in almost any data science project and, thanks to its features, programmers from different backgrounds can learn to use it effectively in a short time. Other free solutions are also available (for example, R, Java, or Scala). R offers complete coverage of statistical and mathematical functions, but it lacks the support frameworks that are necessary to build complete applications. Conversely, Java and Scala have a complete ecosystem of production-ready libraries, but Java in particular is not as compact and easy to use as Python. Moreover, their support for native code is much more complex and the majority of libraries rely exclusively on the JVM (with a consequent performance loss).

Scala has gained an important position in the big data landscape, thanks to its functional properties and the existence of frameworks such as Apache Spark (which can be employed to carry out machine learning tasks with big data). However, considering all the pros and cons, Python remains the optimal choice, and that's why it has been chosen for this book.