Building Machine Learning Systems with Python

Overview of this book

Machine learning, the field of building systems that learn from data, is exploding on the Web and elsewhere. Python is a wonderful language in which to develop machine learning applications. As a dynamic language, it allows for fast exploration and experimentation, and an increasing number of machine learning libraries are being developed for Python.

Building Machine Learning Systems with Python shows you exactly how to find patterns in raw data. The book starts by brushing up on your Python ML knowledge and introducing libraries, and then moves on to more serious projects on datasets, modeling, recommendations, and improving recommendations through examples, before covering sound and image processing in detail. Using open-source tools and libraries, readers will learn how to apply methods to text, images, and sounds. You will also learn how to evaluate, compare, and choose machine learning techniques. Written for Python programmers, Building Machine Learning Systems with Python teaches you how to use open-source libraries to solve real problems with machine learning. The book is based on real-world examples that the reader can build on. Readers will learn how to write programs that classify the quality of StackOverflow answers or decide whether a music file is Jazz or Metal. They will learn regression, which is demonstrated on the problem of recommending movies to users. Advanced topics such as topic modeling (finding a text's most important topics), basket analysis, and cloud computing are covered, as well as many other interesting aspects.

Building Machine Learning Systems with Python will give you the tools and understanding required to build your own systems, tailored to solve your problems.
Table of Contents (20 chapters)
Building Machine Learning Systems with Python
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Choosing the number of topics


So far, we have used a fixed number of topics, 100. This was a purely arbitrary choice; we could just as well have used 20 or 200 topics. Fortunately, for many uses, this number does not really matter. If you are only going to use the topics as an intermediate step, as we did previously, the final behavior of the system is rarely very sensitive to the exact number of topics. As long as you use enough topics, whether 100 or 200, the recommendations that result from the process will not be very different. One hundred is often a good number (while 20 is too few for a general collection of text documents). The same is true of setting the alpha (α) value: playing around with it can change the topics, but the final results are again robust against this change.
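
To make the two settings concrete, here is a minimal from-scratch sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling, using only the Python standard library (math.dist needs Python 3.8+). It is not how production libraries fit the model: gensim's LdaModel, for instance, uses online variational inference and exposes the same two settings as its num_topics and alpha arguments. The sketch only shows where the number of topics K and the document-topic prior alpha enter the computation. The names lda_gibbs and closest_document, the topic-word prior beta=0.01, the iteration count, the seed, and the toy corpus are illustrative assumptions. closest_document mirrors the intermediate use described above: documents are compared through their topic vectors, not their words.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return each document's topic proportions.

    docs:       list of documents, each a list of word tokens
    num_topics: K, the number of topics
    alpha:      symmetric Dirichlet prior on every document's topic mixture
    beta:       symmetric Dirichlet prior on every topic's words (assumed value)
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: tokens per (document, topic), per (topic, word), and per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Take the current token out of the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised p(z = t | all other assignments): alpha smooths
                # the document side, beta smooths the topic side.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimate of each document's topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


def closest_document(thetas, query):
    """Index of the document whose topic vector is closest (Euclidean) to thetas[query]."""
    others = [i for i in range(len(thetas)) if i != query]
    return min(others, key=lambda i: math.dist(thetas[i], thetas[query]))


if __name__ == "__main__":
    # Hypothetical toy corpus with two obvious themes.
    docs = [
        "cat dog pet cat dog".split(),
        "dog pet cat pet".split(),
        "stock market trade stock".split(),
        "market trade stock trade".split(),
    ]
    # The same downstream step, run with two different numbers of topics.
    for k in (2, 5):
        thetas = lda_gibbs(docs, num_topics=k, alpha=0.1)
        print(k, [closest_document(thetas, q) for q in range(len(docs))])
```

Because documents are compared only through their topic vectors, changing num_topics from, say, 100 to 200 changes the length of each vector but leaves the comparison step itself untouched.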

Tip

Topic modeling is often a means to an end. In that case, it is not always important exactly which parameters you choose. Different numbers of topics or values...
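
One way to check whether your own end results really are insensitive to these settings is to fit the model twice (for example with two different numbers of topics or two alpha values) and measure how often each document keeps the same nearest neighbour. The minimal sketch below assumes you already have one topic-proportion vector per document from each run, in the same document order. The helper names nearest_neighbours and neighbour_agreement, the Euclidean distance, and the example vectors are illustrative assumptions, not part of the text.

```python
import math


def nearest_neighbours(thetas):
    """For each document, the index of its closest other document (Euclidean distance)."""
    result = []
    for i, a in enumerate(thetas):
        others = [j for j in range(len(thetas)) if j != i]
        result.append(min(others, key=lambda j: math.dist(a, thetas[j])))
    return result


def neighbour_agreement(thetas_a, thetas_b):
    """Fraction of documents whose nearest neighbour is the same under both runs.

    The two runs may use different numbers of topics (vector lengths differ);
    only the document order has to match.
    """
    nn_a = nearest_neighbours(thetas_a)
    nn_b = nearest_neighbours(thetas_b)
    same = sum(1 for x, y in zip(nn_a, nn_b) if x == y)
    return same / len(nn_a)


if __name__ == "__main__":
    # Hypothetical topic vectors for four documents from a 2-topic and a 3-topic run.
    two_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    three_topics = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]
    print(neighbour_agreement(two_topics, three_topics))
```

An agreement close to 1.0 suggests the downstream behavior does not depend much on the setting being varied; a low value indicates that, for your data, the choice does matter.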