Machine Learning with Swift

By: Jojo Moolayil, Alexander Sosnovshchenko, Oleksandr Baiev

Overview of this book

Machine learning as a field promises to bring increased intelligence to software by helping us learn and analyze information efficiently and discover patterns that humans cannot. This book will be your guide as you embark on an exciting journey in machine learning using the popular Swift language. We'll start with machine learning basics in the first part of the book to develop a lasting intuition about fundamental machine learning concepts. The second part explores various supervised and unsupervised statistical learning techniques and how to implement them in Swift, while the third section walks you through deep learning techniques with the help of typical real-world cases. In the last section, we will dive into some hardcore topics such as model compression and GPU acceleration, and provide some recommendations to avoid common mistakes during machine learning application development. By the end of the book, you'll be able to develop intelligent applications written in Swift that can learn for themselves.

Choosing a model


Let's say you've defined a task and you have a dataset. What's next? Now you need to choose a model and train it on the dataset to perform that task.

The model is the central concept in ML. ML is basically the science of building models of the real world using data. A model relates to the phenomenon being modeled the way a map relates to the real territory. Depending on the situation, it can play the role of a good approximation, an outdated description (in a swiftly changing environment), or even a self-fulfilling prophecy (if the model affects the modeled object).

"All models are wrong, but some are useful" is a well-known proverb in statistics.

Types of ML algorithms

ML models/algorithms are often divided into three groups depending on the type of input:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

This division is rather vague because some algorithms fall into two of these groups while others do not fall into any. There are also some middle states, such as semi-supervised learning.

Algorithms in these three groups can perform different tasks, and hence can be divided into subgroups according to the output of the model. Table 1.3 shows the most common ML tasks and their classification.

Supervised learning

Supervised learning is arguably the most common and easy-to-understand type of ML. All supervised learning algorithms have one prerequisite in common: you need a labeled dataset to train them. Here, a dataset is a set of samples, plus an expected output (label) for each sample. These labels play the role of a supervisor during training.

Note

In different publications, you'll see different synonyms for labels, including dependent variable, predicted variable, and explained variable.

The goal of supervised learning is to get a function that returns the desired output for every given input. In the most simplified version, a supervised learning process consists of two phases: training and inference. During the first phase, you train the model using your labeled dataset. In the second phase, you use your model to do something useful, like make predictions. For instance, given a set of labeled images (dataset), a neural network (model) can be trained to predict (inference) correct labels for previously unseen images.

Using supervised learning, you will usually solve one of two problems: classification or regression. The difference is in the type of labels: categorical in the first case and real numbers in the second.

To classify simply means to assign one of the labels from a predefined set. Binary classification is a special kind of classification in which you have only two labels (positive and negative). An example of a classification task is to assign spam/not-spam labels to emails. We will train our first classifier in the next chapter, and throughout this book we will apply different classifiers to many real-world tasks.
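To make the two phases concrete, here is a minimal sketch of a classifier in Swift. It is not the classifier we build in the next chapter: the type names, the nearest-neighbor rule, and the two made-up features (number of links and share of capital letters in a message) are all assumptions chosen for illustration.

```swift
// A labeled sample: a feature vector plus its expected output (label).
struct LabeledSample {
    let features: [Double]
    let label: String
}

// A hypothetical 1-nearest-neighbor classifier, for illustration only.
struct NearestNeighborClassifier {
    var memory: [LabeledSample] = []

    // Training phase: this very simple model just memorizes the dataset.
    mutating func train(on dataset: [LabeledSample]) {
        memory = dataset
    }

    // Inference phase: return the label of the closest memorized sample,
    // or nil if the model has not been trained yet.
    func predict(_ features: [Double]) -> String? {
        func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
            var sum = 0.0
            for (x, y) in zip(a, b) {
                sum += (x - y) * (x - y)
            }
            return sum
        }
        let closest = memory.min { first, second in
            squaredDistance(first.features, features)
                < squaredDistance(second.features, features)
        }
        return closest?.label
    }
}

// Made-up features: [number of links, share of capital letters].
var spamClassifier = NearestNeighborClassifier()
spamClassifier.train(on: [
    LabeledSample(features: [8, 0.9], label: "spam"),
    LabeledSample(features: [7, 0.7], label: "spam"),
    LabeledSample(features: [0, 0.1], label: "not-spam"),
    LabeledSample(features: [1, 0.2], label: "not-spam"),
])
let spamPrediction = spamClassifier.predict([6, 0.8])
```

Here the unseen message sits closest to a sample labeled "spam", so that is the label the model returns. Real classifiers generalize in more sophisticated ways, but the train-then-predict shape stays the same.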

Regression is the task of assigning a real number to a given case, for example, predicting a salary given employee characteristics. We will discuss regression in more detail in Chapter 6, Linear Regression and Gradient Descent, and Chapter 7, Linear Classifier and Logistic Regression.
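As a small taste of regression, the following sketch fits a straight line to a handful of points using ordinary least squares. The function name `fitLine` and the salary figures are invented for illustration; the later chapters approach the same problem differently.

```swift
// Fits y ≈ slope * x + intercept by ordinary least squares.
// Returns nil when the fit is undefined (fewer than two points,
// mismatched array lengths, or all x values equal).
func fitLine(x: [Double], y: [Double]) -> (slope: Double, intercept: Double)? {
    guard x.count == y.count, x.count >= 2 else { return nil }
    let n = Double(x.count)
    let meanX = x.reduce(0, +) / n
    let meanY = y.reduce(0, +) / n
    var covariance = 0.0
    var varianceX = 0.0
    for (xi, yi) in zip(x, y) {
        covariance += (xi - meanX) * (yi - meanY)
        varianceX += (xi - meanX) * (xi - meanX)
    }
    guard varianceX > 0 else { return nil }
    let slope = covariance / varianceX
    return (slope, meanY - slope * meanX)
}

// Made-up data: years of experience versus salary (in thousands).
let experienceYears: [Double] = [1, 2, 3, 4, 5]
let salaries: [Double] = [40, 50, 60, 70, 80]

if let line = fitLine(x: experienceYears, y: salaries) {
    // Inference: the label is a real number, not a category.
    let predictedSalary = line.slope * 6 + line.intercept
    print("Predicted salary after 6 years: \(predictedSalary)")
}
```

Note that the output is a number on a continuous scale, which is exactly what separates regression from classification.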

If the task is to sort objects in some order (to output a permutation, in combinatorial terms), and the labels are not real numbers but rather an order of objects, you are dealing with learning to rank. You see ranking algorithms in action when you open the Siri suggestions menu on iOS. Each app is placed in the list according to its relevance for you.

If labels are complicated objects, like graphs or trees, neither classification nor regression will be of use. Structured prediction algorithms are the type of algorithms to tackle those problems. Parsing English sentences into syntactic trees is an example of this kind of task.

Ranking and structured learning are beyond the scope of this book because their use cases are not as common as classification or regression, but at least now you know what to search for when you need them.

Unsupervised learning

In unsupervised learning, you don't have the labels for the cases in your dataset. Types of tasks to solve with unsupervised learning are: clustering, anomaly detection, dimensionality reduction, and association rule learning.

Sometimes you don't have the labels for your data points but you still want to group them in some meaningful way. You may or may not know the exact number of groups. This is the setting where clustering algorithms are used. The most obvious example is clustering users into some groups, like students, parents, gamers, and so on. The important detail here is that a group's meaning is not predefined from the very beginning; you name it only after you've finished grouping your samples. Clustering also can be useful to extract additional features from the data as a preliminary step for supervised learning. We will discuss clustering in Chapter 4, K-Means Clustering.
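To show what grouping without labels looks like, here is a deliberately simplified, one-dimensional sketch of the k-means idea. The function name, the fixed number of iterations, the hand-picked starting centroids, and the usage data are all assumptions for illustration; Chapter 4 treats the algorithm properly.

```swift
// A simplified one-dimensional k-means sketch (illustration only).
// Returns the final centroids and the cluster index of each value.
func kMeans1D(_ values: [Double], initialCentroids: [Double],
              iterations: Int = 10) -> (centroids: [Double], assignments: [Int]) {
    var centroids = initialCentroids
    var assignments = [Int](repeating: 0, count: values.count)
    guard !centroids.isEmpty else { return (centroids, assignments) }

    for _ in 0..<iterations {
        // Assignment step: attach every value to its nearest centroid.
        for (i, value) in values.enumerated() {
            var best = 0
            for c in centroids.indices {
                if abs(value - centroids[c]) < abs(value - centroids[best]) {
                    best = c
                }
            }
            assignments[i] = best
        }
        // Update step: move each centroid to the mean of its members.
        for c in centroids.indices {
            let members = zip(values, assignments).filter { $0.1 == c }.map { $0.0 }
            if !members.isEmpty {
                centroids[c] = members.reduce(0, +) / Double(members.count)
            }
        }
    }
    return (centroids, assignments)
}

// Made-up data: minutes per day each user spends in an app.
let dailyMinutes: [Double] = [5, 7, 6, 58, 62, 60]
let clustering = kMeans1D(dailyMinutes, initialCentroids: [0, 100])
```

The algorithm only returns group indices (0 and 1 here). Deciding that one group means "casual users" and the other "heavy users" is up to you, after the grouping is done.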

Outlier/anomaly detection algorithms are used when the goal is to find anomalous patterns or unusual data points in the data. This can be especially useful for automated fraud or intrusion detection. Outlier analysis is also an important part of data cleansing.
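One of the simplest ways to flag unusual points is a z-score rule: mark everything that lies too many standard deviations away from the mean. The sketch below assumes this rule, a threshold of 2, and invented purchase amounts; it is far cruder than the algorithms listed in Table 1.3, but it shows the shape of the task.

```swift
// Returns the indices of values lying more than `threshold` standard
// deviations away from the mean (a simple z-score rule).
func outlierIndices(in values: [Double], threshold: Double = 2.0) -> [Int] {
    guard values.count > 1 else { return [] }
    let n = Double(values.count)
    let mean = values.reduce(0, +) / n
    var squaredDeviations = 0.0
    for value in values {
        squaredDeviations += (value - mean) * (value - mean)
    }
    let standardDeviation = (squaredDeviations / n).squareRoot()
    // If all values are identical, nothing can be an outlier.
    guard standardDeviation > 0 else { return [] }
    return values.indices.filter {
        abs(values[$0] - mean) / standardDeviation > threshold
    }
}

// Made-up purchase amounts; the last one looks suspicious.
let purchaseAmounts: [Double] = [20, 22, 19, 21, 20, 23, 500]
let suspiciousPurchases = outlierIndices(in: purchaseAmounts)
```

Keep in mind that a single extreme value also inflates the mean and the standard deviation, which is one reason more robust methods exist.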

Dimensionality reduction is a way to distill data to its most informative and, at the same time, most compact representation. The goal is to reduce the number of features without losing important information. It can be used as a preprocessing step before supervised learning or data visualization.

Association rule learning looks for repeated patterns of user behavior and peculiar co-occurrences of items. An example from retail practice: if a customer buys milk, isn't it more probable that they will also buy cereal? If so, then perhaps it's better to move the shelf with the cereals closer to the shelf with the milk. With rules like this, business owners can make informed decisions and adapt their services to customers' needs. In the context of software development, this can empower anticipatory design, where the app seemingly knows what you want to do next and provides suggestions accordingly. In Chapter 5, Association Rule Learning, we will implement Apriori, one of the most well-known rule learning algorithms.
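The milk-and-cereal question can be quantified with two standard measures: support (how often the items appear together) and confidence (how often the second item appears when the first one does). The sketch below only computes these two numbers for a tiny invented set of shopping baskets; it is not a rule-mining algorithm, and the function names are assumptions.

```swift
// Each transaction is the set of items bought together (made-up data).
let transactions: [Set<String>] = [
    ["milk", "cereal", "bread"],
    ["milk", "cereal"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "cereal", "eggs"],
]

// Support: the fraction of transactions that contain every item in `items`.
func support(of items: Set<String>, in transactions: [Set<String>]) -> Double {
    guard !transactions.isEmpty else { return 0 }
    let matches = transactions.filter { items.isSubset(of: $0) }.count
    return Double(matches) / Double(transactions.count)
}

// Confidence of the rule "antecedent => consequent": among transactions
// containing the antecedent, the fraction that also contain the consequent.
func confidence(of antecedent: Set<String>, implying consequent: Set<String>,
                in transactions: [Set<String>]) -> Double {
    let antecedentSupport = support(of: antecedent, in: transactions)
    guard antecedentSupport > 0 else { return 0 }
    let jointSupport = support(of: antecedent.union(consequent), in: transactions)
    return jointSupport / antecedentSupport
}

let milkAndCerealSupport = support(of: ["milk", "cereal"], in: transactions)
let milkImpliesCereal = confidence(of: ["milk"], implying: ["cereal"],
                                   in: transactions)
```

In this toy dataset, three of the five baskets contain both items, and three of the four baskets with milk also contain cereal, so the rule looks promising.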

Figure 1.2: Datasets for three types of learning: supervised, unsupervised, and semi-supervised

Note

Labeling data manually is usually costly, especially if special qualifications are required. Semi-supervised learning can help when only some of your samples are labeled and others are not (see Figure 1.2). It is a hybrid of supervised and unsupervised learning. First, it looks in an unsupervised manner for unlabeled instances that are similar to the labeled ones, and includes them in the training dataset. After this, the algorithm can be trained on this expanded dataset in a typical supervised manner.

Reinforcement learning

Reinforcement learning is special in the sense that it doesn't require a dataset (see Figure 1.3). Instead, it involves an agent that takes actions, changing the state of the environment. After each step, the agent gets a reward or punishment, depending on the state and previous actions. The goal is to obtain the maximum cumulative reward. It can be used to teach the computer to play video games or drive a car. If you think about it, reinforcement learning is the way our pets train us humans: by rewarding our actions with tail-wagging, or punishing with scratched furniture.

One of the central topics in reinforcement learning is the exploration-exploitation dilemma—how to find a good balance between exploring new options and using what is already known:

Figure 1.3: Reinforcement learning process
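The exploration-exploitation dilemma is easiest to see in a toy setting with no states at all: an agent repeatedly picks one of several actions, each of which pays off with a fixed but unknown probability. The sketch below uses an epsilon-greedy strategy, which is one common way to balance the two and not the only one. The small seeded generator (a SplitMix64 variant) is there only to make runs repeatable; all names and probabilities are invented for illustration.

```swift
// A small deterministic generator (SplitMix64) so that runs are repeatable.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Epsilon-greedy agent: with probability `epsilon` it explores a random
// action, otherwise it exploits the action with the best estimate so far.
func runEpsilonGreedy(rewardProbabilities: [Double], epsilon: Double,
                      steps: Int, seed: UInt64) -> (estimates: [Double], pulls: [Int]) {
    var generator = SplitMix64(state: seed)
    var estimates = [Double](repeating: 0, count: rewardProbabilities.count)
    var pulls = [Int](repeating: 0, count: rewardProbabilities.count)
    guard !rewardProbabilities.isEmpty else { return (estimates, pulls) }

    for _ in 0..<steps {
        let action: Int
        if Double.random(in: 0..<1, using: &generator) < epsilon {
            // Explore: try a random action.
            action = Int.random(in: 0..<rewardProbabilities.count, using: &generator)
        } else {
            // Exploit: use what is already known.
            action = estimates.indices.max { estimates[$0] < estimates[$1] }!
        }
        // The environment answers with a reward of 1 or 0.
        let roll = Double.random(in: 0..<1, using: &generator)
        let reward: Double = roll < rewardProbabilities[action] ? 1 : 0
        pulls[action] += 1
        // Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / Double(pulls[action])
    }
    return (estimates, pulls)
}

let banditOutcome = runEpsilonGreedy(rewardProbabilities: [0.2, 0.5, 0.8],
                                     epsilon: 0.1, steps: 5_000, seed: 42)
```

With `epsilon` set to 0, the agent would never explore and could stay loyal to a mediocre action forever; with `epsilon` set to 1, it would never use what it has learned. Most of the interesting behavior lives between those extremes.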

Table 1.3: ML tasks

| Task | Output type | Problem example | Algorithms |
|---|---|---|---|
| Supervised learning | | | |
| Regression | Real numbers | Predict a house price, given its characteristics | Linear regression and polynomial regression |
| Classification | Categorical | Spam/not-spam classification | KNN, Naïve Bayes, logistic regression, decision trees, random forest, and SVM |
| Ranking | Natural number (ordinal variable) | Sort search results by relevance | Ordinal regression |
| Structured prediction | Structures: trees, graphs, and so on | Part-of-speech tagging | Recurrent neural networks and conditional random fields |
| Unsupervised learning | | | |
| Clustering | Groups of objects | Build a tree of living organisms | Hierarchical clustering, k-means, and GMM |
| Dimensionality reduction | Compact representation of given features | Find the most important components in brain activity | PCA, t-SNE, and LDA |
| Outlier/anomaly detection | Objects that are out of pattern | Fraud detection | Local outlier factor |
| Association rule learning | Set of rules | Smart house intrusion detection | Apriori |
| Reinforcement learning | | | |
| Control learning | Policy with maximum expected return | Learn to play a video game | Q-learning |

Mathematical optimization – how learning works

The magic behind the learning process is delivered by the branch of mathematics called mathematical optimization. It is sometimes also referred to, somewhat misleadingly, as mathematical programming; the term was coined long before computer programming became widespread and is not directly related to it. Optimization is the science of choosing the best option among available alternatives; for example, choosing the best ML model.

Mathematically speaking, ML models are functions. You as an engineer choose the function family depending on your preferences: linear models, trees, neural networks, support vector machines, and so on. Learning is the process of picking from that family the function that serves your goals best. This notion of the best model is often defined by another function, the loss function. It estimates how good the model is according to some criteria; for instance, how well the model fits the data, how complex it is, and so on. You can think of the loss function as a judge at a competition whose role is to assess the models. The objective of learning is to find the model that minimizes the loss function, so the whole learning process is formalized in mathematical terms as a task of function minimization.
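Here is a minimal sketch of that idea. The function family is "lines through the origin", y = w * x; the loss function is the mean squared error; and learning is reduced to trying a handful of candidate weights and keeping the one the judge likes best. The data, the candidate list, and the names are invented for illustration, and real learners search the family far more cleverly than this.

```swift
// Loss function: mean squared error between the model's predictions
// and the expected outputs. Lower is better.
func meanSquaredError(of model: (Double) -> Double,
                      x: [Double], y: [Double]) -> Double {
    precondition(x.count == y.count && !x.isEmpty, "Need matching, non-empty data")
    var total = 0.0
    for (input, expected) in zip(x, y) {
        let error = model(input) - expected
        total += error * error
    }
    return total / Double(x.count)
}

// Made-up data that roughly follows y = 2x.
let inputs: [Double] = [1, 2, 3, 4]
let targets: [Double] = [2.1, 3.9, 6.2, 7.8]

// The function family: y = w * x for a few candidate values of w.
let candidateWeights: [Double] = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

// Learning as minimization: keep the weight whose model has the lowest loss.
let bestWeight = candidateWeights.min { a, b in
    let lossA = meanSquaredError(of: { a * $0 }, x: inputs, y: targets)
    let lossB = meanSquaredError(of: { b * $0 }, x: inputs, y: targets)
    return lossA < lossB
}
```

The judge here only cares about fit. A loss function that also penalized complexity would simply add another term to the returned value.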

The minimum of a function can be found in two ways: analytically (using calculus) or numerically (using iterative methods). In ML, we often go for numerical optimization because the loss functions get too complex for analytical solutions.

A nice interactive tutorial on numerical optimization can be found here: http://www.benfrederickson.com/numerical-optimization/.

From the programmer's point of view, learning is an iterative process of adjusting model parameters until the optimal solution is found. In practice, after a number of iterations, the algorithm stops improving because it is stuck in a local optimum or has reached the global optimum (see the following diagram). If the algorithm always finds the local or global optimum, we say that it converges. On the other hand, if you see your algorithm oscillating more and more and never approaching a useful result, it diverges:

Figure 1.4: Learner represented as a ball on a complex surface: it can get stuck in a local minimum and never reach the global one
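Convergence and divergence are easy to reproduce on a toy problem. The sketch below runs gradient descent, one common iterative method, on the function f(x) = (x - 3)^2, whose only minimum is at x = 3. This function has no local minima to get stuck in, so it illustrates only the converge/diverge distinction; the function, starting point, and learning rates are assumptions chosen for illustration.

```swift
// Minimizes f(x) = (x - 3)^2 by gradient descent.
// The derivative of f is 2 * (x - 3).
func gradientDescent(start: Double, learningRate: Double,
                     iterations: Int) -> Double {
    var x = start
    for _ in 0..<iterations {
        let gradient = 2 * (x - 3)
        // Adjust the parameter by stepping against the gradient.
        x -= learningRate * gradient
    }
    return x
}

// A small step size: each iteration shrinks the distance to the minimum.
let convergedX = gradientDescent(start: 0, learningRate: 0.1, iterations: 100)

// A step size that is too large: each iteration overshoots the minimum
// by more than the previous one, so the result oscillates and blows up.
let divergedX = gradientDescent(start: 0, learningRate: 1.1, iterations: 100)
```

With the smaller learning rate, the result ends up very close to 3. With the larger one, the value swings from one side of the minimum to the other with growing amplitude, which is the oscillation described above.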

Mobile versus server-side ML

Most Swift developers are writing their applications for iOS. Those among us who develop their Swift applications for macOS or the server side are in a lucky position regarding ML. They can use whatever libraries and tools they want, counting on powerful hardware and compatibility with interpreted languages. Most ML libraries and frameworks are developed with the server side (or at least powerful desktops) in mind. In this book, we talk mostly about iOS applications, and therefore most practical examples take the limitations of handheld devices into account.

But if mobile devices have limited capabilities, we can do all ML on the server-side, can't we? Why would anyone bother to do ML locally on mobile devices at all? There are at least three issues with client-server architecture:

  • The client app will be fully functional only when it has an internet connection. This may not be a big problem in developed countries, but it can still limit your target audience significantly. Just imagine your translator app being non-functional during travel abroad.
  • Additional time delay is introduced by sending data to the server and getting a response. Who enjoys watching progress bars or, even worse, infinite spinners while your data is being uploaded, processed, and downloaded back again? What if you need those results immediately and without consuming your internet traffic? Client-server architecture makes ML applications such as real-time video and audio processing almost impossible.
  • Privacy concerns: any data you've uploaded to the internet is not yours anymore. In the age of total surveillance, how do you know that those funny selfies you've uploaded today to the cloud will not be used tomorrow to train face recognition, or for target-tracking algorithms for some interesting purposes, like killer drones? Many users don't like their personal information to be uploaded to some servers and possibly shared/sold/leaked to some third parties. Apple also argues for reducing data collection as much as possible.

Some of the applications can be OK (can't be great, though) with those limitations, but most developers want their apps to be responsive, secure, and useful all the time. This is something only on-device ML can deliver.

For me, the most important argument is that we can do ML without a server side. Hardware capabilities are increasing every year, and ML on mobile devices is a hot research field. Modern mobile devices are already powerful enough for many ML algorithms. Smartphones are the most personal and arguably the most important devices nowadays just because they are everywhere. Coding ML is fun and cool, so why should server-side developers have all the fun?

Additional bonuses that you get when you implement ML on the mobile side are free computation power (you are not paying for the electricity) and unique marketing points (our app puts the power of AI inside your pocket).

Understanding mobile platform limitations

Now, if I have persuaded you to use ML on mobile devices, you should be aware of some limitations:

  • Computation complexity restriction. The more you load your CPU, the faster your battery will die. It's easy to transform your iPhone into a compact heater with the help of some ML algorithms.
  • Some models take a long time to train. On the server, you can let your neural networks train for weeks; but on a mobile device, even minutes are too long. iOS applications can run and process some data in background mode if they have some good reasons, like playing music. Unfortunately, ML is not on the list of good reasons, so most probably, you will not be able to run it in background mode.
  • Some models take a long time to run. You should think in terms of frames per second and good user experience.
  • Memory restrictions. Some models grow during the training process, while others remain a fixed size.
  • Model size restrictions. Some trained models can take hundreds of megabytes or even gigabytes. But who wants to download your application from the App Store if it is so huge?
  • Locally stored data is mostly restricted to different types of users' personal data, meaning that you will not be able to aggregate the data of different users and perform large-scale ML on mobile devices.
  • Many open source ML libraries are built on top of interpreted languages, like Python, R, and MATLAB, or on top of the JVM, which makes them incompatible with iOS.

Those are only the most obvious challenges. You'll see more as we start to develop real ML apps. But don't worry, there is a way to eat this elephant piece by piece. The effort pays off in a great user experience and users' love. Platform restrictions are not unique to mobile devices. Developers of autonomous devices (like drones), IoT developers, wearable device developers, and many others face the same problems and deal with them successfully.

Many of these problems can be addressed by training the models on powerful hardware, and then deploying them to mobile devices. You can also choose a compromise with two models: a smaller one on a device for offline work, and a large one on the server. For offline work you can choose models with fast inference, then compress and optimize them for parallel execution; for instance, on GPU. We'll talk more about this in Chapter 12, Optimizing Neural Networks for Mobile Devices.