Clojure for Machine Learning

By: Akhil Wali

Overview of this book

Clojure for Machine Learning is an introduction to machine learning techniques and algorithms. This book demonstrates how you can apply these techniques to real-world problems using the Clojure programming language.

It explores many machine learning techniques and also describes how to use Clojure to build machine learning systems. This book starts off by introducing the simple machine learning problems of regression and classification. It also describes how you can implement these machine learning techniques in Clojure. The book also demonstrates several Clojure libraries, which can be useful in solving machine learning problems.

Clojure for Machine Learning familiarizes you with several pragmatic machine learning techniques. By the end of this book, you will be fully aware of the Clojure libraries that can be used to solve a given machine learning problem.

Preface

Machine learning has a vast variety of applications in computing. Software systems that use machine learning techniques tend to provide their users with a better user experience. With cloud data becoming more relevant these days, developers will eventually build more intelligent systems that simplify and optimize any routine task for their users.

This book will introduce several machine learning techniques and also describe how we can leverage these techniques in the Clojure programming language.

Clojure is a dynamic, functional programming language built on the Java Virtual Machine (JVM). It is also a member of the Lisp family of languages. Lisp played a key role in the artificial intelligence revolution of the 70s and 80s. Artificial intelligence lost its spark in the late 80s, but Lisp continued to evolve, and several dialects of Lisp have been developed since. Clojure is a simple and powerful dialect of Lisp that was first released in 2007. At the time of writing this book, Clojure is one of the most rapidly growing programming languages for the JVM. It supports some of the most advanced language features and programming methodologies available, such as optional typing, software transactional memory, asynchronous programming, and logic programming. The Clojure community is also known for its elegant and powerful libraries, which is yet another compelling reason to use Clojure.

Machine learning techniques are based on statistics and logic-based reasoning. In this book, we will focus on the statistical side of machine learning. Most of these techniques are based on principles from the artificial intelligence revolution. Machine learning is still an active area of research and development. Large players from the software world, such as Google and Microsoft, have also made significant contributions to machine learning. More software companies are now realizing that applications that use machine learning techniques provide a much better experience to their users.

Although there is a lot of mathematics involved in machine learning, we will focus more on the ideas and practical usage of these techniques, rather than concentrating on the theory and mathematical notations used by these techniques. This book seeks to provide a gentle introduction to machine learning techniques and how they can be used in Clojure.

What this book covers

Chapter 1, Working with Matrices, explains matrices and the basic operations on matrices that are useful for implementing the machine learning algorithms.

Chapter 2, Understanding Linear Regression, introduces linear regression as a form of supervised learning. We will also discuss the gradient descent algorithm and the ordinary least-squares (OLS) method for fitting the linear regression models.

Chapter 3, Categorizing Data, covers classification, which is another form of supervised learning. We will study the Bayesian method of classification, decision trees, and the k-nearest neighbors algorithm.

Chapter 4, Building Neural Networks, explains artificial neural networks (ANNs) that are useful in the classification of nonlinear data, and describes a few ANN models. We will also study and implement the backpropagation algorithm that is used to train an ANN and describe self-organizing maps (SOMs).

Chapter 5, Selecting and Evaluating Data, covers evaluation of machine learning models. In this chapter, we will discuss several methods that can be used to improve the effectiveness of a given machine learning model. We will also implement a working spam classifier as an example of how to build machine learning systems that incorporate evaluation.

Chapter 6, Building Support Vector Machines, covers support vector machines (SVMs). We will also describe how SVMs can be used to classify both linear and nonlinear sample data.

Chapter 7, Clustering Data, explains clustering techniques as a form of unsupervised learning and how we can use them to find patterns in unlabeled sample data. In this chapter, we will discuss the K-means and expectation maximization (EM) algorithms. We will also explore dimensionality reduction.

Chapter 8, Anomaly Detection and Recommendation, explains anomaly detection, which is another useful form of unsupervised learning. We will also discuss recommendation systems and several recommendation algorithms.

Chapter 9, Large-scale Machine Learning, covers techniques that are used to handle a large amount of data. Here, we explain the concept of MapReduce, which is a parallel data-processing technique. We will also demonstrate how we can store data in MongoDB and how we can use the BigML cloud service to build machine learning models.

Appendix, References, lists all the bibliographic references used throughout the chapters of this book.

What you need for this book

One of the pieces of software required for this book is the Java Development Kit (JDK), which you can get from http://www.oracle.com/technetwork/java/javase/downloads/. The JDK is necessary to run and develop applications on the Java platform.

The other major piece of software that you'll need is Leiningen, which you can download and install from http://github.com/technomancy/leiningen. Leiningen is a tool for managing Clojure projects and their dependencies. We will explain how to work with Leiningen in Chapter 1, Working with Matrices.
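
To give a rough idea of what Leiningen manages, a project is typically described by a project.clj file at its root. The sketch below is a minimal, hypothetical example: the project name ml-sandbox, its version, and the description are placeholders, and the single dependency is the Clojure version that appears later in this preface.

```clojure
;; project.clj: a minimal sketch; `ml-sandbox` and its version are hypothetical
(defproject ml-sandbox "0.1.0-SNAPSHOT"
  :description "Scratch project for working through the examples"
  ;; Leiningen downloads everything listed here when it is first needed
  :dependencies [[org.clojure/clojure "1.5.1"]])
```

With a file like this in place, running lein deps in the project directory fetches the listed libraries, and lein repl starts a REPL with them on the classpath.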

Throughout this book, we'll use a number of other Clojure and Java libraries, including Clojure itself. Leiningen will download these libraries for us as required. You'll also need a text editor or an integrated development environment (IDE). If you already have a text editor that you like, you can probably use it. See http://dev.clojure.org/display/doc/Getting+Started for tips and plugins for your preferred environment. If you don't have a preference, I suggest that you look at using Eclipse with Counterclockwise. There are instructions for getting this set up at http://dev.clojure.org/display/doc/Getting+Started+with+Eclipse+and+Counterclockwise.

In Chapter 9, Large-scale Machine Learning, we also use MongoDB, which can be downloaded and installed from http://www.mongodb.org/.

Who this book is for

This book is for programmers or software architects who are familiar with Clojure and want to use it to build machine learning systems. This book does not introduce the syntax and features of the Clojure language (you are expected to be familiar with the language, but you need not be a Clojure expert).

Similarly, although you don't need to be an expert in statistics and coordinate geometry, you should be familiar with these concepts to understand the theory behind the several machine learning techniques that we will discuss. When in doubt, don't hesitate to look up and learn more about the mathematical concepts used in this book.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "The previously defined probability function requires a single argument to represent the attribute or condition whose probability of occurrence we wish to calculate."

A block of code is set as follows:

(defn predict [coefs X]
  {:pre [(= (count coefs)
            (+ 1 (count X)))]}
  (let [X-with-1 (conj X 1)
        products (map * coefs X-with-1)]
    (reduce + products)))
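
As a quick sanity check of the function above, the following sketch calls it with made-up numbers; the coefficient and feature values are hypothetical and chosen only to make the arithmetic easy to follow. Because conj appends to the end of a vector, the intercept is assumed here to be the last coefficient.

```clojure
;; Repeated from above so the snippet can be pasted into a REPL on its own
(defn predict [coefs X]
  {:pre [(= (count coefs)
            (+ 1 (count X)))]}
  (let [X-with-1 (conj X 1)
        products (map * coefs X-with-1)]
    (reduce + products)))

;; Hypothetical model: y = 2*x1 + 3*x2 + 1
(def example-coefs [2 3 1])

(predict example-coefs [4 5])
;; => 24
```

Passing a feature vector of the wrong length, such as [4], violates the :pre condition, which by default causes an AssertionError to be thrown instead of returning a misleading result.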

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

:dependencies [[org.clojure/clojure "1.5.1"]
        [incanter "1.5.2"]
        [clatrix "0.3.0"]
        [net.mikera/core.matrix "0.10.0"]]

Any command-line input or output is written as follows:

$ lein deps

Another simple convention that we use is to always show the Clojure code that's entered in the REPL (read-eval-print-loop) starting with the user> prompt. In practice, this prompt will change depending on the Clojure namespace that we are currently using. However, for simplicity, REPL code starts with the user> prompt, as follows:

user> (every? #(< % 0.0001)
              (map - ols-linear-model-coefs
                   (:coefs iris-linear-model)))
true
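
The check above compares two sets of coefficients element by element. A self-contained variant can be sketched as follows; the helper name close-enough? and the sample vectors are hypothetical, and, unlike the snippet above, this version takes the absolute value of each difference so that large negative differences are not accepted.

```clojure
;; Hypothetical coefficient vectors standing in for two fitted models
(def coefs-a [1.00001 2.00002 3.0])
(def coefs-b [1.0 2.0 3.0])

(defn close-enough?
  "True when every pairwise difference between xs and ys is,
  in absolute value, smaller than tolerance."
  [xs ys tolerance]
  (every? #(< (Math/abs (double %)) tolerance)
          (map - xs ys)))

(close-enough? coefs-a coefs-b 0.0001)
;; => true
```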

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "clicking the Next button moves you to the next screen".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/4351OS_Graphics.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.