Hands-On Artificial Intelligence for IoT - Second Edition

By: Amita Kapoor
Overview of this book

There are many applications that use data science and analytics to gain insights from terabytes of data. These apps, however, do not address the challenge of continually discovering patterns for IoT data. In Hands-On Artificial Intelligence for IoT, we cover various aspects of artificial intelligence (AI) and its implementation to make your IoT solutions smarter. This book starts by covering the process of gathering and preprocessing IoT data gathered from distributed sources. You will learn different AI techniques such as machine learning, deep learning, reinforcement learning, and natural language processing to build smart IoT systems. You will also leverage the power of AI to handle real-time data coming from wearable devices. As you progress through the book, techniques for building models that work with different kinds of data generated and consumed by IoT devices such as time series, images, and audio will be covered. Useful case studies on four major application areas of IoT solutions are a key focal point of this book. In the concluding chapters, you will leverage the power of widely used Python libraries, TensorFlow and Keras, to build different kinds of smart AI models. By the end of this book, you will be able to build smart AI-powered IoT apps with confidence.
Table of Contents (20 chapters)

Policy gradients


In the Q-learning-based methods, we generated a policy after estimating a value/Q-function. In policy-based methods, such as the policy gradient, we approximate the policy directly.

As before, we use a neural network to approximate the policy. In its simplest form, the neural network learns a policy that selects actions so as to maximize rewards, adjusting its weights by gradient ascent, hence the name policy gradients.

In policy gradients, the policy is represented by a neural network whose input is a representation of the state and whose output is a set of action-selection probabilities. The weights of this network are the policy parameters that we need to learn. A natural question arises: how should we update the weights of this network? Since our goal is to maximize rewards, it makes sense for the network to maximize the expected reward per episode:

J(θ) = E_π_θ [Σ_t r_t]

Here, we've taken a parametrized stochastic policy π—that is, the policy determines the...
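The idea above can be sketched in a few lines of plain NumPy. This is a minimal, hypothetical illustration of one REINFORCE-style policy-gradient update, not the book's own implementation: the state size, action count, and the constant episode return are placeholder assumptions, and a linear softmax policy stands in for the neural network.

```python
# Minimal policy-gradient (REINFORCE) sketch with NumPy.
# Hypothetical sizes and reward; a linear softmax policy stands in
# for the neural network described in the text.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))  # policy parameters (the "network weights")

def policy(state):
    """Softmax over action preferences: the action-selection probabilities."""
    logits = state @ theta
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def grad_log_pi(state, action):
    """Gradient of log pi(action|state) w.r.t. theta for a linear softmax policy."""
    probs = policy(state)
    grad = -np.outer(state, probs)  # -E[features] term for every action
    grad[:, action] += state        # +features for the action actually taken
    return grad

# One REINFORCE update: ascend the (sampled) expected return.
alpha = 0.1                              # learning rate
state = rng.random(n_states)             # placeholder state observation
action = rng.choice(n_actions, p=policy(state))
episode_return = 1.0                     # placeholder return for the episode
theta += alpha * episode_return * grad_log_pi(state, action)
```

Because the update scales grad log pi by the return, actions that led to higher rewards become more probable, which is exactly the gradient-ascent behaviour described above.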