Getting to know your data


For many years, researchers argued about what is more important: data or algorithms. Nowadays, the importance of data over algorithms is generally accepted among ML specialists: in most cases, whoever has the better data beats whoever has the more advanced algorithms. Garbage in, garbage out: this rule holds true in ML more than anywhere else. To succeed in this domain, you not only need to have data, but you also need to know your data and know what to do with it.

ML datasets are usually composed of individual observations, called samples, cases, or data points. In the simplest case, each sample has several features.

Features

When we talk about features in the context of ML, we mean some characteristic property of the object or phenomenon we are investigating.

Note

Other names for the same concept you'll see in some publications are explanatory variable, independent variable, and predictor.

Features are used to distinguish objects from each other and to measure the similarity between them.

For instance:

  • If the objects of our interest are books, features could be the title, page count, author's name, year of publication, genre, and so on
  • If the objects of interest are images, features could be intensities of each pixel
  • If the objects are blog posts, features could be language, length, or presence of some terms
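
To make the similarity idea concrete, here is a minimal sketch (the function and the sample values are ours, purely for illustration) that compares two samples by the Euclidean distance between their feature vectors:

```swift
/// Euclidean distance between two feature vectors of equal length.
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "Feature vectors must have the same dimensionality.")
    var sum = 0.0
    for (x, y) in zip(a, b) {
        sum += (x - y) * (x - y)
    }
    return sum.squareRoot()
}

// Two hypothetical book samples described by [pages, year, review score]:
let bookA = [354.0, 2018.0, 3.9]
let bookB = [124.0, 2021.0, 4.7]
print(euclideanDistance(bookA, bookB)) // the smaller the distance, the more similar the books
```

Note how the page count dominates the result simply because of its scale; this is one reason real pipelines usually normalize features before measuring similarity.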

Note

It's useful to imagine your data as a spreadsheet table. In this case, each sample (data point) would be a row, and each feature would be a column. For example, Table 1.1 shows a tiny dataset of books consisting of four samples where each has eight features.

Table 1.1: an example of an ML dataset (dummy books):

| Title | Author's name | Pages | Year | Genre | Average readers' review score | Publisher | In stock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Learn ML in 21 Days | Machine Learner | 354 | 2018 | Sci-Fi | 3.9 | Untitled United | False |
| 101 Tips to Survive an Asteroid Impact | Enrique Drills | 124 | 2021 | Self-help | 4.7 | Vacuum Books | True |
| Sleeping on the Keyboard | Jessica's Cat | 458 | 2014 | Non-fiction | 3.5 | JhGJgh Inc. | True |
| Quantum Screwdriver: Heritage | Yessenia Purnima | 1550 | 2018 | Sci-Fi | 4.2 | Vacuum Books | True |

Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. These are similar to enumerations without raw values in Swift, with one difference: they have levels instead of cases. Importantly, you can't order them or say that one level is bigger than another.
  • Binary: The presence or absence of something, just true or false. In our case, the In stock feature.
  • Real numbers: Page count, year, average readers' review score. These can be represented as a Float or Double.

There are others, but these are by far the most common.
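
To ground these types in the book's language, here is a minimal sketch of how one sample from the dummy books dataset could be modeled in Swift (the type and property names are ours, not from any library):

```swift
/// One sample (row) of the dummy books dataset.
struct Book {
    /// A categorical feature: an enumeration without raw values,
    /// whose cases play the role of levels. No ordering is implied.
    enum Genre { case sciFi, selfHelp, nonFiction }

    let title: String          // categorical (free-form)
    let author: String         // categorical
    let pages: Int             // real-valued
    let year: Int              // real-valued
    let genre: Genre           // categorical
    let averageScore: Double   // real-valued
    let publisher: String      // categorical
    let inStock: Bool          // binary
}

let sample = Book(title: "Learn ML in 21 Days", author: "Machine Learner",
                  pages: 354, year: 2018, genre: .sciFi,
                  averageScore: 3.9, publisher: "Untitled United", inStock: false)
```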

The most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (feature vector), and all samples have the same number of features. The simplest (but not the best) way of translating categorical features into real numbers is by replacing them with numerical codes (Table 1.2).

Table 1.2: the dummy books dataset after simple preprocessing:

| Title | Author's name | Pages | Year | Genre | Average readers' review score | Publisher | In stock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 0.0 | 354.0 | 2018.0 | 0.0 | 3.9 | 0.0 | 0.0 |
| 1.0 | 1.0 | 124.0 | 2021.0 | 1.0 | 4.7 | 1.0 | 1.0 |
| 2.0 | 2.0 | 458.0 | 2014.0 | 2.0 | 3.5 | 2.0 | 1.0 |
| 3.0 | 3.0 | 1550.0 | 2018.0 | 0.0 | 4.2 | 1.0 | 1.0 |

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.
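
A minimal sketch of this kind of preprocessing, assuming each categorical level simply gets an integer code in order of first appearance (the `LabelEncoder` type is ours, not part of any Swift library):

```swift
/// Maps categorical levels to numeric codes in order of first appearance.
struct LabelEncoder {
    private var codes: [String: Double] = [:]

    mutating func encode(_ level: String) -> Double {
        if let code = codes[level] { return code }
        let code = Double(codes.count)
        codes[level] = code
        return code
    }
}

var genreEncoder = LabelEncoder()
var publisherEncoder = LabelEncoder()

// (genre, publisher, pages, year, score, inStock) for the first two books:
let rawBooks: [(String, String, Double, Double, Double, Bool)] = [
    ("Sci-Fi", "Untitled United", 354, 2018, 3.9, false),
    ("Self-help", "Vacuum Books", 124, 2021, 4.7, true),
]

let featureVectors: [[Double]] = rawBooks.map { book in
    [genreEncoder.encode(book.0), publisherEncoder.encode(book.1),
     book.2, book.3, book.4, book.5 ? 1.0 : 0.0]
}
print(featureVectors)
// [[0.0, 0.0, 354.0, 2018.0, 3.9, 0.0], [1.0, 1.0, 124.0, 2021.0, 4.7, 1.0]]
```

Keep in mind that such codes impose an ordering that isn't really there (nothing makes Self-help "bigger" than Sci-Fi), which is one reason this simple scheme is not the best; one-hot encoding is a common alternative.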

Choosing a good set of features

For ML purposes, it's necessary to choose a reasonable set of features, not too many and not too few:

  • If you have too few features, the information may not be sufficient for your model to achieve the required quality. In this case, you want to construct new features from the existing ones, or extract more features from the raw data.
  • If you have too many features, you want to select only the most informative and discriminative ones, because the more features you have, the more complex your computations become.

How do you tell which features are most important? Sometimes common sense helps. For example, if you are building a model that recommends books, the genre and average rating of a book are probably more important features than its page count and year of publication. But what if your features are just the pixels of a picture and you're building a face recognition system? For a black-and-white image of size 1024 x 768, you'd get 786,432 features. Which pixels are the most important? In this case, you have to apply algorithms that extract meaningful features. In computer vision, for example, edges, corners, and blobs are more informative features than raw pixels, so there are plenty of algorithms to extract them (Figure 1.1). By passing your image through some filters, you can get rid of unimportant information and reduce the number of features significantly: from hundreds of thousands to hundreds, or even tens. Techniques that select the most important subset of existing features are known as feature selection, while feature extraction techniques create new features:

Figure 1.1: Edge detection is a common feature extraction technique in computer vision. You can still recognize the object in the right image, even though it contains significantly less information than the left one.
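
For a flavor of what such a filter does, here is a hedged sketch of the classic 3x3 Sobel edge detector in pure Swift, operating on a grayscale image given as rows of intensities (a real app would more likely use vImage or Core Image; this just shows the arithmetic):

```swift
/// Gradient magnitude of the Sobel operator for a grayscale image.
/// `image` is a row-major grid of intensities; borders are left at zero.
func sobelEdges(_ image: [[Double]]) -> [[Double]] {
    let gx = [[-1.0, 0, 1], [-2, 0, 2], [-1, 0, 1]]   // horizontal-gradient kernel
    let gy = [[-1.0, -2, -1], [0, 0, 0], [1, 2, 1]]   // vertical-gradient kernel
    let height = image.count, width = image[0].count
    var out = Array(repeating: Array(repeating: 0.0, count: width), count: height)
    for y in 1..<height - 1 {
        for x in 1..<width - 1 {
            var sx = 0.0, sy = 0.0
            for ky in 0..<3 {
                for kx in 0..<3 {
                    let pixel = image[y + ky - 1][x + kx - 1]
                    sx += gx[ky][kx] * pixel
                    sy += gy[ky][kx] * pixel
                }
            }
            out[y][x] = (sx * sx + sy * sy).squareRoot()  // edge strength at (x, y)
        }
    }
    return out
}
```

Thresholding the result leaves only the pixels lying on edges, which is exactly the reduction from hundreds of thousands of raw-pixel features to a much smaller set of informative ones.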

Feature extraction, selection, and combination form a kind of art known as feature engineering, and it requires not only hacking and statistical skills but also domain knowledge. We will see some feature engineering techniques while working on practical applications in the following chapters. We will also step into the exciting world of deep learning: a set of techniques that give a computer the ability to extract high-level abstract features from low-level features.

The number of features you have for each sample (the length of the feature vector) is usually referred to as the dimensionality of the problem. Many problems are high-dimensional, with hundreds or even thousands of features. Even worse, some of those problems are sparse; that is, for each data point, most of the features are zero or missing. This is a common situation in recommender systems. For instance, imagine building a dataset of movie ratings: the rows are movies, the columns are users, and each cell holds the rating given by that user to that movie. The majority of the cells will remain empty, as most users never watch most of the movies. The opposite situation, where most values are in place, is called dense. Many problems in natural language processing and bioinformatics are high-dimensional, sparse, or both.
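
A common way to cope with sparsity is to store only the non-zero entries, for instance in a dictionary keyed by index. A minimal sketch, with a made-up `SparseVector` type:

```swift
/// A vector that stores only its non-zero entries; all others are implicitly 0.
struct SparseVector {
    var nonZero: [Int: Double] = [:]   // index -> value
    let dimension: Int

    subscript(index: Int) -> Double {
        get { nonZero[index] ?? 0.0 }
        set { nonZero[index] = (newValue == 0.0) ? nil : newValue }
    }
}

// One movie's ratings across a million users, only three of whom rated it:
var ratings = SparseVector(dimension: 1_000_000)
ratings[42] = 4.5
ratings[99_731] = 3.0
ratings[512_008] = 5.0
print(ratings[7])            // 0.0 -- this user never rated the movie
print(ratings.nonZero.count) // 3 stored values instead of a million
```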

Feature selection and extraction help to decrease the number of features without a significant loss of information, which is why they are also known as dimensionality reduction algorithms.
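
One of the simplest feature selection heuristics fits in a few lines: drop the features whose values barely vary across the dataset, since a near-constant column carries almost no discriminative information. A sketch, assuming the samples are already encoded as equal-length vectors of `Double` (the function name and threshold are ours):

```swift
/// Returns the indices of features whose variance across all samples
/// exceeds `threshold`; near-constant features are filtered out.
func informativeFeatures(in samples: [[Double]], threshold: Double) -> [Int] {
    guard let dimension = samples.first?.count else { return [] }
    let n = Double(samples.count)
    return (0..<dimension).filter { j in
        let column = samples.map { $0[j] }
        let mean = column.reduce(0, +) / n
        let variance = column.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / n
        return variance > threshold
    }
}

// The "year" column (index 1) is constant here, so it is dropped:
let samples = [[354.0, 2018.0, 3.9],
               [124.0, 2018.0, 4.7],
               [458.0, 2018.0, 3.5]]
print(informativeFeatures(in: samples, threshold: 0.0)) // [0, 2]
```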

Getting the dataset

Datasets can be obtained from different sources. The ones important for us are:

  • Classical datasets such as Iris (botanical measurements of flowers composed by R. Fisher in 1936), MNIST (60,000 handwritten digits published in 1998), Titanic (personal information of Titanic passengers from Encyclopedia Titanica and other sources), and others. Many classical datasets are available as part of Python and R ML packages. They represent some classical types of ML tasks and are useful for demonstrating algorithms. Meanwhile, there is no similar library for Swift; implementing one would be straightforward, and it is low-hanging fruit for anyone who wants to earn some stars on GitHub.
  • Open and commercial dataset repositories. Many institutions release their data for everyone's needs under different licenses. You can use such data for training production models or while collecting your own dataset.

Note

Lists of public dataset repositories can be found at KDnuggets: http://www.kdnuggets.com/datasets/index.html, and at Wikipedia: https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research.

  • Data collection (acquisition) is required if no existing data can help you solve your problem. This approach can be costly in both resources and time if you have to collect the data ad hoc; however, in many cases, you have data as a byproduct of some other process, and you can compose your dataset by extracting useful information from it. For example, text corpora can be composed by crawling Wikipedia or news sites. iOS automatically collects some useful data: HealthKit is a unified database of the user's health measurements, and Core Motion allows you to get historical data on the user's motion activity. The ResearchKit framework provides standardized routines to assess the user's health conditions, and the CareKit framework standardizes surveys. In some cases, useful information can also be obtained by mining app logs.
    • In many cases, collecting the data is not enough, as raw data doesn't suit many ML tasks well, so the next step after data collection is data labeling. For example, once you have collected a dataset of images, you have to attach a label to each of them: which category does this image belong to? This can be done manually (often at considerable expense), automatically (sometimes impossible), or semi-automatically. Manual labeling can be scaled by means of crowdsourcing platforms, like Amazon Mechanical Turk.
  • Random data generation can be useful for quickly checking your ideas, or in combination with a TDD approach. Sometimes, adding controlled randomness to your real data can also improve the results of learning; this approach is known as data augmentation. For instance, this approach was taken to build the optical character recognition feature in the Google Translate mobile app. To train their model, the engineers needed a lot of real-world photos of letters in different languages, which they didn't have. The team bypassed this problem by creating a large dataset of letters with artificial reflections, smudges, and all kinds of corruption on them. This improved the recognition quality significantly.
  • Real-time data sources, such as inertial sensors, GPS, camera, microphone, elevation sensor, proximity sensor, touch screen, force touch, and Apple Watch sensors can be used to collect a standalone dataset or to train a model on the fly.

Note

Real-time data sources are especially important for a special class of ML models called online ML, which allows a model to incorporate new data as it arrives. A good example of such a situation is spam filtering, where the model should dynamically adapt to new data. It's the opposite of batch learning, where the whole training dataset must be available from the very beginning.
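
To make the contrast concrete, here is a minimal sketch of an online learner: a classic perceptron whose weights are nudged after every single observation, rather than fitted once to a complete batch (the type and the feature encoding are made up for illustration):

```swift
/// A perceptron classifier updated one sample at a time (online learning).
struct OnlinePerceptron {
    var weights: [Double]
    var bias = 0.0
    let learningRate = 0.1

    /// Predicts the class of a feature vector: true (e.g., spam) or false.
    func predict(_ x: [Double]) -> Bool {
        zip(weights, x).reduce(bias) { $0 + $1.0 * $1.1 } > 0
    }

    /// Incorporates a single newly labeled sample into the model.
    mutating func update(x: [Double], label: Bool) {
        let desired = label ? 1.0 : -1.0
        let predicted = predict(x) ? 1.0 : -1.0
        let error = desired - predicted   // 0 when the prediction was right
        for i in weights.indices {
            weights[i] += learningRate * error * x[i]
        }
        bias += learningRate * error
    }
}

var spamFilter = OnlinePerceptron(weights: [0.0, 0.0])
// Each incoming message immediately updates the model:
spamFilter.update(x: [1.0, 0.0], label: true)   // hypothetical "contains FREE!!!" feature
spamFilter.update(x: [0.0, 1.0], label: false)  // hypothetical "from a known contact" feature
print(spamFilter.predict([1.0, 0.0]))           // true
```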

Data preprocessing

The useful information in the data is usually referred to as the signal. On the other hand, the pieces of data that represent errors of different kinds and irrelevant data are known as noise. Errors can occur in the data due to measurement faults, information transmission, or human mistakes. The goal of data cleansing procedures is to increase the signal-to-noise ratio. During this stage, you will usually transform all data into one format, delete entries with missing values, and check suspicious outliers (they can be both noise and signal). It is widely believed among ML engineers that the data preprocessing stage usually consumes 90% of the time allocated to an ML project; then, algorithm tweaking consumes another 90%. This statement is only partially a joke (about 10% of it). In Chapter 13, Best Practices, we are going to discuss common problems with the data and how to fix them.
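
As a tiny illustration of such a cleansing pass, the sketch below (data and threshold are made up) drops missing values and flags suspicious outliers by their z-score; whether a flagged value is noise or signal still has to be judged case by case:

```swift
// Review scores with one missing value and one suspicious entry:
let rawScores: [Double?] = [3.9, nil, 4.7, 3.5, 42.0, 4.2]

// 1. Delete entries with missing values.
let scores = rawScores.compactMap { $0 }

// 2. Flag outliers by z-score; the 1.5 threshold is a judgment call.
let mean = scores.reduce(0, +) / Double(scores.count)
let variance = scores.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / Double(scores.count)
let std = variance.squareRoot()
let outliers = scores.filter { abs(($0 - mean) / std) > 1.5 }
print(outliers) // [42.0] -- a typo, or a genuinely remarkable book? Check before deleting.
```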