Machine Learning for Mobile

By: Revathi Gopalakrishnan, Avinash Venkateswarlu

Overview of this book

Machine learning presents an entirely unique opportunity in software development. It allows smartphones to produce an enormous amount of useful data that can be mined, analyzed, and used to make predictions. This book will help you master machine learning for mobile devices with easy-to-follow, practical examples. You will begin with an introduction to machine learning on mobiles and grasp the fundamentals so you become well-acquainted with the subject. You will master supervised and unsupervised learning algorithms, and then learn how to build a machine learning model using mobile-based libraries such as Core ML, TensorFlow Lite, ML Kit, and Fritz on Android and iOS platforms. In doing so, you will also tackle some common and not-so-common machine learning problems with regard to Computer Vision and other real-world domains. By the end of this book, you will have explored machine learning in depth and implemented on-device machine learning with ease, thereby gaining a thorough understanding of how to run, create, and build real-time machine-learning applications on your mobile devices.

Types of learning


There are some variations in how the types of machine learning algorithms are defined. The most common categorization is based on the learning type of the algorithm, as follows:

  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Reinforcement learning

Supervised learning

Supervised learning is a type of learning in which the model is given labeled examples, that is, inputs paired with the correct outputs, so that, based on what it has learned, it can predict the outcome for a new dataset.

Here, the model is trained under supervision, much like a student supervised by a teacher. We feed the model enough training data containing the inputs (predictors) and show it the correct answers (outputs). Based on this, it learns and becomes capable of predicting the output for unseen data that may come in the future.

A classic example of this is the standard Iris dataset. It covers three species of iris and, for each sample, gives the sepal length, sepal width, petal length, and petal width. For each specific combination of these four parameters, a label states which species the sample belongs to. With this learning in place, the model can predict the label (in this case, the iris species) from the feature set (in this case, the four parameters).

Supervised learning algorithms try to model the relationships and dependencies between the target output and the input features, so that the output values for new data can be predicted from the relationships learned from previous datasets.
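As a rough illustration of learning from labeled examples, the sketch below classifies an iris by a majority vote among its nearest labeled neighbors (k-nearest neighbors, one of the supervised algorithms listed later in this section). The handful of rows, the function name `predict_species`, and the choice of `k=3` are assumptions made for this example; a real model would be trained on the full dataset.

```python
import math
from collections import Counter

# Illustrative rows in the style of the Iris dataset (an assumption for this sketch):
# (sepal length, sepal width, petal length, petal width) in cm, plus the species label.
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def predict_species(features, training_data=TRAINING_DATA, k=3):
    """Return the majority label among the k nearest labeled rows."""
    # Sort the labeled rows by Euclidean distance to the new sample.
    by_distance = sorted(training_data, key=lambda row: math.dist(features, row[0]))
    # Let the k closest rows vote on the label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# An unseen flower with short petals lands next to the setosa rows.
print(predict_species((5.0, 3.4, 1.5, 0.2)))  # prints "setosa"
```

The labels in `TRAINING_DATA` play the role of the supervisor: without them, the function would have nothing to vote on.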

The following diagram will give you an idea of what supervised learning is. The data with labels is given as input to build the model through supervised learning algorithms. This is the training phase. Then the model is used to predict the class label for any input data without the label. This is the testing phase:

In supervised learning algorithms, the predicted output can be a discrete (categorical) value or a continuous value, depending on the scenario and the dataset under consideration. If the predicted output is a discrete or categorical value, the algorithm is a classification algorithm; if the predicted output is a continuous value, it is a regression algorithm.

Suppose you have a set of emails and want to learn from them which emails belong to the spam category and which belong to the non-spam category. The algorithm to use for this purpose is a supervised learning algorithm of the classification type. Here, you need to feed the model a set of emails along with enough knowledge about the attributes on which it should sort each email into either the spam category or the non-spam category. The predicted output is therefore a categorical value, that is, spam or non-spam.

Now take the use case where, based on a given set of parameters, we need to predict the price of a house in a given area. This cannot be a categorical value. It is going to be a range or a continuous value, and it is also subject to change on a regular basis. In this problem too, the model needs to be provided with sufficient knowledge on which to base its price prediction. This type of algorithm belongs to the regression category of supervised learning algorithms.
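To make the spam example concrete, here is a minimal sketch of a word-count Naive Bayes classifier with Laplace smoothing. The four training emails, the whitespace tokenization, and the function names `train` and `classify` are assumptions made for illustration; a usable spam filter would need far more data and better text handling.

```python
import math
from collections import Counter

# Hypothetical labeled emails (the labels are the "knowledge" fed to the model).
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("project status and meeting notes", "non-spam"),
]


def train(emails):
    """Count word occurrences per class and the number of emails per class."""
    word_counts = {"spam": Counter(), "non-spam": Counter()}
    class_counts = Counter()
    for text, label in emails:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Return the class with the highest log-probability for the text."""
    vocabulary = set().union(*word_counts.values())
    total_emails = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the class prior, then add one term per word.
        score = math.log(class_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, class_counts = train(TRAINING_EMAILS)
print(classify("claim your free prize", word_counts, class_counts))  # prints "spam"
```

The output is one of two categories, which is what makes this a classification problem rather than a regression problem.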
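The house price case can be sketched with the simplest regression model, a straight line fitted by ordinary least squares. The single feature (floor area), the listings, and the function name `fit_line` are assumptions for this example; the prices were chosen to lie exactly on a line so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical listings: floor area in square meters, price in thousands.
areas = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]  # exactly 3 * area, by construction

slope, intercept = fit_line(areas, prices)

# Predict a continuous value for an area the model has not seen.
predicted_price = slope * 110 + intercept
print(round(predicted_price, 2))  # prints 330.0
```

Unlike the spam classifier, the prediction here can take any value along the fitted line, not one of a fixed set of labels.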

There are various algorithms belonging to the supervised category of the machine learning family:

  • K-nearest neighbors
  • Naive Bayes
  • Decision trees
  • Linear regression
  • Logistic regression
  • Support vector machines
  • Random forest

Unsupervised learning

In this learning pattern, the model is not supervised as it learns. It learns by itself from the data fed to it and presents us with the patterns it has found. It doesn't predict a discrete categorical value or a continuous value, but rather reports the patterns it has understood by looking at the data. The training data is unlabeled, so it carries no correct answers for the model to learn from.

Here, there's no supervision at all; in fact, the model might be able to teach us new things after it learns from the data. These algorithms are very useful where a feature set is too large and the human user doesn't know what to look for in the data.

This class of algorithms is mainly used for pattern detection and descriptive modeling. Descriptive modeling summarizes the relevant information from the data and presents a summary of what has already occurred, whereas predictive modeling summarizes the data and presents a summary of what can occur.

Unsupervised learning algorithms can be used for both of these kinds of modeling. They use the input data to come up with different patterns, a summary of the data points, and insights that are not visible to human eyes. They come up with meaningful derived data or patterns of data that are helpful for end users.

The following diagram will give you an idea of what unsupervised learning is. The data without labels is given as input to build the model through unsupervised learning algorithms. This is the training phase. Then the model is used to predict the proper patterns for any input data without the label. This is the testing phase:

Within this family, two common categories of algorithms emerge, depending on the input data fed to the model and the method the model adopts to infer patterns in the dataset. These are clustering and association rule learning algorithms.

Clustering analyzes the input dataset and groups similar data items into the same cluster. It produces different clusters, and each cluster holds data items that are more similar to each other than to items belonging to other clusters. There are various mechanisms that can be used to create these clusters.

Customer segmentation is one example of clustering. We have a huge dataset of customers and capture all their features. The model could come up with interesting cluster patterns of customers that may not be obvious to the human eye. Such clusters could be very helpful for targeted campaigns and marketing.
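Customer segmentation can be sketched with k-means, a centroid-based clustering algorithm. The customer features (age and monthly spend), the sample values, and the simplification of using the first `k` points as starting centroids are assumptions for this example; practical implementations usually pick starting centroids more carefully and scale the features first.

```python
import math


def k_means(points, k, iterations=10):
    """Lloyd's algorithm, seeded with the first k points as centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers as (age, monthly spend); note that no labels are given.
customers = [(22, 30), (45, 210), (25, 35), (24, 28), (48, 190), (50, 205)]
centroids, clusters = k_means(customers, k=2)

for index, cluster in enumerate(clusters):
    print(index, sorted(cluster))
```

With this data, the younger low-spend customers and the older high-spend customers end up in separate clusters, even though the algorithm was never told that such groups exist.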

On the other hand, association rule learning is a model to discover relations between variables in large datasets. A classic example would be market basket analysis. Here, the model tries to find strong relationships between different items in the market basket. It predicts relationships between items and determines how likely or unlikely it is for a user to purchase a particular item when they also purchase another item. For example, it might predict that a user who purchases bread will also purchase milk, or a user who purchases wine will also purchase diapers, and so on.
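The strength of a rule such as "bread implies milk" is commonly measured by its support (how often the items appear together) and its confidence (how often the consequent appears given the antecedent). The sketch below computes both for a few hypothetical baskets; the basket contents and the function names are assumptions for this example.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in the itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets with the antecedent that also hold the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


print(support({"bread", "milk"}, baskets))       # prints 0.6
print(confidence({"bread"}, {"milk"}, baskets))  # prints 0.75
```

Here, three of the five baskets contain both bread and milk, and three of the four baskets with bread also contain milk, so the rule holds in 75% of the cases where it applies.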

The algorithms belonging to this category include the following:

  • Clustering algorithms:
    • Centroid-based algorithms
    • Connectivity-based algorithms
    • Density-based algorithms
    • Probabilistic
    • Dimensionality reduction
    • Neural networks/deep learning
  • Association rule learning algorithm

Semi-supervised learning

In the previous two types, labels are either present for all the observations in the dataset or absent for all of them. Semi-supervised learning falls in between these two. In many practical situations, the cost of labeling is quite high, since it requires skilled human experts. So, if labels are absent for the majority of the observations but present for a few, semi-supervised algorithms are the best candidates for building the model.

Speech analysis is one example of a semi-supervised learning model. Labeling audio files is very costly and requires a very high level of human effort. Applying semi-supervised learning models can really help to improve traditional speech analytic models.

As with supervised learning, the algorithms in this class fall into the classification or regression family, depending on whether the predicted output is categorical or continuous.
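One simple way to use a few labels together with many unlabeled observations is self-training: the model labels the unlabeled point it is most sure about, adds it to the labeled set, and repeats. The sketch below uses nearest-neighbor distance as the measure of certainty. This is only one of several semi-supervised approaches; the data points and the function name `self_train` are assumptions for this example.

```python
import math


def self_train(labeled, unlabeled):
    """Grow the labeled set one point at a time, closest pair first."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the unlabeled point that lies closest to any labeled point.
        point, (_, label) = min(
            ((candidate, neighbor) for candidate in remaining for neighbor in labeled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        # Adopt the neighbor's label and treat the point as labeled from now on.
        labeled.append((point, label))
        remaining.remove(point)
    return labeled


# Two expensive hand-made labels and five cheap unlabeled observations.
labeled = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
unlabeled = [(2.0, 1.5), (3.0, 2.5), (8.5, 8.5), (7.0, 7.0), (4.0, 3.5)]

result = dict(self_train(labeled, unlabeled))
for point, label in sorted(result.items()):
    print(point, label)
```

The point `(4.0, 3.5)` is far from both original labels, but it ends up in class "A" because the labels spread to it step by step through its already-labeled neighbors.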

Reinforcement learning

Reinforcement learning is goal-oriented learning based on interactions with the environment. A reinforcement learning algorithm (called the agent) continuously learns from the environment in an iterative fashion. In the process, the agent learns from its experiences of the environment until it explores the full range of possible states and is able to reach the target state.

Let's take the example of a child learning to ride a bicycle. The child learns by trying. They may fall, and in the process they work out how to balance, how to keep moving without falling, and how to sit so that their weight does not shift to one side. They also study the surface and plan their actions according to its slope, hills, and so on. In this way, the child encounters the scenarios and states needed to learn to ride the bicycle. A fall may be considered negative feedback, and the ability to ride along in stride may be a positive reward for the child. This is classic reinforcement learning, and it is what the model does to determine the ideal behavior within a specific context in order to maximize its performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.
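The loop of acting, receiving a reward, and adjusting can be sketched with tabular Q-learning, a common reinforcement learning algorithm that the text does not name. The five-position corridor environment, the reward of 1 at the target, and the values chosen for `alpha`, `gamma`, and the number of episodes are all assumptions made for this example.

```python
import random

N_STATES = 5        # positions 0..4 on a line; position 4 is the target state
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Move within the corridor; reaching the target yields a reward of 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from random exploration of the corridor."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward = step(state, action)
            # The reinforcement signal: reward plus discounted future value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned behavior: the best-valued action in every non-target state.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # prints [1, 1, 1, 1]
```

The agent is never told that moving right is correct. It only receives the reward on arrival, and the preference for stepping right propagates backward from the target through the value table.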

The following diagram summarizes the types of learning algorithms we have seen, so that you have a handy reference point when choosing an algorithm for a given problem statement:

Challenges in machine learning

Some of the challenges we face in machine learning are as follows:

  • Lack of a well-defined machine learning problem. If the problem is not defined clearly, with the criteria it must satisfy, the machine learning effort is likely to fail.
  • Feature engineering. This covers every activity relating to the data and its features, which are essential to the success of the machine learning problem.
  • No clarity between the training set and test set. Often the model performs well in the training phase but fails miserably in the field, because the training set did not cover the kinds of data the model meets there. This must be addressed for the model to succeed in the field.
  • The right choice of algorithm. There is a wide range of algorithms available, but which one suits our problem best? The algorithm and its parameters should be chosen carefully and refined over successive iterations.
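One common safeguard for the training set and test set challenge above is to hold out part of the labeled data and evaluate the model only on rows it never saw during training. The sketch below shows such a split; the function name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions for this example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    # The first test_size rows are held out; the rest are used for training.
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(10))  # stand-in for ten labeled examples
train_rows, test_rows = train_test_split(rows)

print(len(train_rows), len(test_rows))  # prints 8 2
```

A held-out test set only reveals problems that are present in the collected data, so it complements, rather than replaces, gathering training data that reflects what the model will meet in the field.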