AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide
a) Unsupervised learning
b) Reinforcement learning
c) Supervised learning
d) DL
Answer
b, Since there is no labeled data and the agent needs to learn from experience, reinforcement learning is the most appropriate approach for this use case. Another important clue in the question is that the agent is rewarded for good decisions.
a) Unsupervised learning
b) Reinforcement learning
c) Supervised learning
d) DL
Answer
a, Clustering (which is an unsupervised learning approach) is the most common type of algorithm for segmenting data into groups (clusters).
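As a rough illustration of what clustering does, the sketch below runs a minimal one-dimensional k-means in plain Python. The customer spend values, the starting centroids, and the function name `kmeans_1d` are assumptions made up for this example; they do not come from the question.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on 1-D data.

    Returns (labels, centroids). No labels are needed as input:
    the groups are discovered from the values alone.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical yearly spend (in thousands) for six customers.
spend = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
labels, centers = kmeans_1d(spend, centroids=[0.0, 10.0])
print(labels, centers)
```

With these made-up values, the low spenders and the high spenders end up in separate segments without any labeled data being provided.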
a) Unsupervised learning
b) Reinforcement learning
c) Supervised learning
d) DL
Answer
c, Forecasting aims to predict a numerical value from historical, labeled observations; hence, it can be framed as a regression problem, which is a type of supervised learning.
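To make the regression framing concrete, here is a minimal sketch that fits a straight line to past values and extrapolates one step ahead. The monthly sales figures and the helper `fit_line` are hypothetical, and a real forecasting problem would usually need a richer model than a single line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month index (feature) and sales (numeric label).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(forecast_month_6)
```

The known past sales play the role of labels, which is what makes this a supervised problem.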
a) Unsupervised learning.
b) Reinforcement learning.
c) Supervised learning.
d) ML is not required.
Answer
d, ML is everywhere, but not everything needs ML. In this case, there is no need to use ML, since the company should be able to collect the costs from each stage of the production chain and sum them up.
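The point can be seen in a few lines of ordinary code. The stage names and amounts below are invented for illustration only.

```python
# Hypothetical per-unit costs collected from each stage of the production chain.
stage_costs = {
    "raw_materials": 12.50,
    "assembly": 4.25,
    "packaging": 1.75,
    "shipping": 3.00,
}

# A deterministic sum answers the question; no model has to be trained.
total_cost = sum(stage_costs.values())
print(total_cost)
```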
a) Unsupervised learning
b) Reinforcement learning
c) Supervised learning
d) DL
Answer
d, DL has provided state-of-the-art algorithms in the field of natural language processing.
a) Make sure the algorithm used is able to handle binary classification models.
b) Take a look at the proportion of data of each class and make sure they are balanced.
c) Shuffle the dataset before starting working on it.
d) Make sure you are using the right hyperparameters of the chosen algorithm.
Answer
c, Data scientists must be skeptical about their work. Do not make assumptions about the data without prior validation. At this point in the book, you might not be aware of the specifics of neural networks, but you know that ML models are very sensitive to the data they are trained on. You should double-check the assumptions that were passed to you before making any other decisions. By the way, shuffling your training data is the first thing you should do. A question like this is likely to appear in the exam.
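The sketch below shows one way to check why shuffling matters, using a small hypothetical dataset that arrives sorted by label. The row layout `(row_id, label)`, the 80/20 split, and the seed are all assumptions for this example.

```python
import random
from collections import Counter

# Hypothetical dataset of (row_id, label) pairs that arrives sorted by label.
rows = [(i, 0) for i in range(8)] + [(i, 1) for i in range(8, 10)]


def class_proportions(rows):
    """Return the share of rows belonging to each label."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return {label: count / total for label, count in counts.items()}


# Without shuffling, taking the last 20% as a test set yields a single class.
unshuffled_test_labels = {label for _, label in rows[-2:]}

# Shuffle a copy with a seeded generator so the run is reproducible.
rng = random.Random(42)
shuffled = rows[:]
rng.shuffle(shuffled)

# Shuffling reorders the rows but does not change the class balance.
proportions = class_proportions(shuffled)
print(unshuffled_test_labels, proportions)
```

Shuffling only changes the order of the rows, so the class proportions should still be inspected separately.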
a) The training and testing sets do not follow the same distribution.
b) The training set used to create this model does not represent the real environment where the model was deployed.
c) The algorithm used in the final solution could not generalize enough to identify fraud cases in production.
d) Since all ML models contain errors, we can't infer their performance in production systems.
Answer
b, Data sampling is very challenging, and you should always make sure your training data represents the production data as precisely as possible. In this case, there was no evidence that the training and testing sets were invalid, since the model was able to perform well and consistently on both sets of data. Since the problem appears only in production, the most likely cause is a systematic issue in how the training data was sampled, so that it does not reflect the real environment.
a) Reduce the number of features.
b) Add extra features.
c) Implement cross-validation during the training process.
d) Select another algorithm.
Answer
a, c, This is clearly an overfitting issue. To solve this type of problem, you could reduce the excessive number of features (which will reduce the complexity of the model and make it less dependent on the training set). You could also implement cross-validation during the training process.
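For reference, k-fold cross-validation splits can be sketched in plain Python as follows. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it cuts contiguous folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.

    Each sample lands in the validation fold exactly once and in the
    training portion for the remaining k - 1 rounds.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Training on each `train_idx` and scoring on the matching `val_idx` gives k performance estimates; a large gap between training and validation scores is a sign of overfitting.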
a) Use a machine with a CPU that implements multi-thread processing.
b) Use a machine with GPU processing.
c) Increase the amount of RAM of the machine.
d) Use a machine with SSD storage.
Answer
b, Although you might get some benefit from multi-thread processing and large amounts of RAM, using a GPU to train a neural network will give you the best performance. You will learn much more about neural networks in later chapters of this book, but you already know that they perform a lot of matrix calculations during training, which are better supported by a GPU than by a CPU.
a) Cross-validation is a data resampling technique that helps to avoid overfitting during model training.
b) Bootstrapping is a data resampling technique often embedded in ML models that need resampling capabilities to estimate the target function.
c) The parameter k in k-fold cross-validation specifies how many splits (folds) will be created.
d) Bootstrapping works without replacement.
Answer
d, All the statements about cross-validation and bootstrapping are correct except option d, since bootstrapping works with replacement (the same observation might appear more than once, and in different samples).
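The difference between sampling with and without replacement can be sketched with the standard library. The ten-element dataset and the seed are arbitrary choices for this example.

```python
import random

rng = random.Random(0)
data = list(range(10))

# Bootstrap sample: draw len(data) observations WITH replacement,
# so the same observation may be drawn more than once.
bootstrap_sample = rng.choices(data, k=len(data))

# For contrast, sampling WITHOUT replacement at full size is just a
# permutation: every observation appears exactly once.
permutation = rng.sample(data, k=len(data))

# Observations that were never drawn form the "out-of-bag" set.
out_of_bag = sorted(set(data) - set(bootstrap_sample))
print(bootstrap_sample, out_of_bag)
```

Every value in the bootstrap sample comes from the original data, but some observations are typically repeated while others are left out entirely.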