Index
A
- action / Use of experience replay
- activation functions, for deep learning
- sigmoid function / The sigmoid function
- tanh function / The tanh function
- softmax function / The softmax function
- rectified linear unit function / The rectified linear unit function
- inferences / How to choose the right activation function
- actor-critic algorithms / Actor-critic algorithms
- Adam optimizer / Deep Q-network for mountain car problem in OpenAI gym
- advantage function / Asynchronous advantage actor critic
- advertisers
- bidding strategies / Bidding strategies of advertisers
- Adwords / Adwords
- agent
- programming, OpenAI Gym environment used / Programming an agent using an OpenAI Gym environment
- agent learning pong
- with policy gradients / Agent learning pong using policy gradients
- AlexNet model / The AlexNet model
- AlphaGo
- about / Minimax and game trees, AlphaGo – mastering Go
- Monte Carlo Tree Search (MCTS) / Monte Carlo Tree Search
- architecture / Architecture and properties of AlphaGo
- properties / Architecture and properties of AlphaGo
- energy consumption analysis / Energy consumption analysis – Lee Sedol versus AlphaGo
- training process / Training process in AlphaGo Zero
- AlphaGo program / The AlphaGo program
- AlphaGo Zero
- about / AlphaGo Zero
- architecture / Architecture and properties of AlphaGo Zero
- properties / Architecture and properties of AlphaGo Zero
- value representation / Architecture and properties of AlphaGo Zero
- policy vector / Architecture and properties of AlphaGo Zero
- training process / Training process in AlphaGo Zero
- asynchronous advantage actor-critic (A3C) / Asynchronous advantage actor-critic, Asynchronous advantage actor critic
- asynchronous methods
- need for / Why asynchronous methods?
- asynchronous n-step Q-learning / Asynchronous n-step Q-learning
- asynchronous one-step Q-learning / Asynchronous one-step Q-learning
- asynchronous one-step SARSA / Asynchronous one-step SARSA
- Atari Breakout / Deep Q-network for Atari Breakout in OpenAI gym
- autonomous driving
- reinforcement learning / Reinforcement learning for autonomous driving
- proposed framework / Proposed frameworks for autonomous driving
- autonomous driving agents
- creating / Creating autonomous driving agents
- recognizing / Creating autonomous driving agents
- predicting / Creating autonomous driving agents
- planning / Creating autonomous driving agents
B
- baseline function
- used, for reducing variance / Using a baseline to reduce variance
- basic computations, TensorFlow / Basic computations in TensorFlow
- Bellman equations
- about / The Bellman equations
- solving / Solving the Bellman equation to find policies
- value iteration example / An example of value iteration using the Bellman equation
- bidding strategies
- of advertisers / Bidding strategies of advertisers
- binary classification / Logistic regression as a neural network
- blank slate learning / AlphaGo Zero
- BLEU / BLEU
- BLEU score / What is BLEU score and what does it do?
- brevity penalty (BP) / What is BLEU score and what does it do?
- business models
- for advertising / Business models used in advertising
C
- Cartpole / Deep Q-network for Cartpole problem in OpenAI gym
- chess game
- versus Go game / Go versus chess
- computational advertising
- computational graph / The computational graph
- continuous action space algorithms
- about / Continuous action space algorithms
- trust region policy optimization (TRPO) / Trust region policy optimization
- deterministic policy gradients / Deterministic policy gradients
- convolutional neural network (CNN)
- about / The neural network model, Convolutional neural networks, Deep Q-networks, AlphaGo – mastering Go
- LeNet-5 convolutional neural network / The LeNet-5 convolutional neural network
- AlexNet model / The AlexNet model
- VGG-Net model / The VGG-Net model
- Inception model / The Inception model
- Convolution Neural Networks (CNNs) / Creating autonomous driving agents
- cost function / The cost function
D
- deep-Q learner / Why reinforcement learning?
- deep autoencoder / Deep autoencoder
- DeepBlue
- defeat Garry Kasparov / How did DeepBlue defeat Garry Kasparov?
- deep learning
- about / Deep learning
- problem statements / Deep learning
- activation functions / Activation functions for deep learning
- limitations / Limitations of deep learning
- vanishing gradient problem / The vanishing gradient problem
- exploding gradient problem / The exploding gradient problem
- limitations, overcoming / Overcoming the limitations of deep learning
- deep Q-network (DQN)
- about / The Q-learning approach to reinforcement learning, Deep Q-networks, Planning
- for mountain car problem, in OpenAI Gym / Deep Q-network for mountain car problem in OpenAI gym
- for Cartpole problem, in OpenAI Gym / Deep Q-network for Cartpole problem in OpenAI gym
- for Atari Breakout, in OpenAI Gym / Deep Q-network for Atari Breakout in OpenAI gym
- convolution neural network, used instead of single layer neural network / Using a convolution neural network instead of a single layer neural network
- experience replay, used / Use of experience replay
- target network, to compute target Q-values / Separate target network to compute the target Q-values
- advancements / Advancements in deep Q-networks and beyond
- DeepTraffic
- reference link / DeepTraffic – MIT simulator for autonomous driving
- about / DeepTraffic – MIT simulator for autonomous driving
- delayed adaptation / Online case-based planning
- deterministic environment / Why reinforcement learning?
- deterministic policy gradients / Deterministic policy gradients
- display advertising
- real-time bidding, by reinforcement learning / Real-time bidding by reinforcement learning in display advertising
- Double DQN (DDQN) / Advancements in deep Q-networks and beyond, Double DQN
- Dueling DQN / Advancements in deep Q-networks and beyond, Dueling DQN
- dynamic coattention network (DCN)
- about / Text question answering, Mixed objective and deep residual coattention for Question Answering
- deep residual coattention encoder / Deep residual coattention encoder
- mixed objective, with self-critical policy learning / Mixed objective using self-critical policy learning
E
- Epsilon-Greedy approach / The Epsilon-Greedy approach, Deep Q-network for mountain car problem in OpenAI gym
- experience replay
- using / Use of experience replay
- about / Use of experience replay
- exploding gradient problem / The exploding gradient problem
- exploitation dilemma / The exploration exploitation dilemma
- exploration dilemma / The exploration exploitation dilemma
F
- Faster R-CNN / Fast R-CNN, Faster R-CNN
- financial portfolio management
- problem definition / Problem definition
- data preparation / Data preparation
- reinforcement learning / Reinforcement learning
- further improvements / Further improvements
- FrozenLake-v0 environment
- reference / Programming an agent using an OpenAI Gym environment
- training, MDP used / Training the FrozenLake-v0 environment using MDP
- Frozen Lake environment
- example / Understanding an OpenAI Gym environment
G
- game tree approach
- unsuitability for Go game / Why is the game tree approach no good for Go?
- Gated Recurrent Units (GRUs) / Creating autonomous driving agents
- general artificial intelligence / Why reinforcement learning?
- Generative Adversarial Networks (GANs) / Creating autonomous driving agents
- Go game
- about / What is Go?
- versus chess game / Go versus chess
- Google DeepMind / Google DeepMind
- gradient descent algorithm / The gradient descent algorithm
H
- hierarchical object detection model
- with deep reinforcement learning / Hierarchical object detection with deep reinforcement learning
- about / Hierarchical object detection model
- state / State
- actions / Actions
- reward / Reward
- training / Model and training, Training specifics
- hybrid learning objective
- about / Text summarization, Hybrid learning objective
- supervised learning, with teacher forcing / Supervised learning with teacher forcing
I
- Inception model / The Inception model
- Intersection over Union (IoU) / Reward
- intra-temporal attention / Text summarization
- inverse reinforcement learning / Challenges in robot reinforcement learning
L
- LeNet-5 convolutional neural network / The LeNet-5 convolutional neural network
- Libratus / Libratus
- logistic regression
- with gradient descent / Steps to solve logistic regression using gradient descent
- logistic regression, as neural network
- about / Logistic regression as a neural network
- notation / Notation
- objective / Objective
- cost function / The cost function
- gradient descent algorithm / The gradient descent algorithm
- computational graph / The computational graph
- Long Short-Term Memory networks (LSTMs) / Long Short Term Memory Networks, Creating autonomous driving agents
M
- machine learning
- for autonomous driving / Machine learning for autonomous driving
- about / Machine learning for autonomous driving
- sensor fusion / Machine learning for autonomous driving
- environment / Machine learning for autonomous driving
- trajectory planning / Machine learning for autonomous driving
- control strategy / Machine learning for autonomous driving
- driver model / Machine learning for autonomous driving
- machine learning algorithms
- graphical representation, of data versus performance / Deep learning
- Markov decision process
- about / Markov decision processes
- Markov property / The Markov property
- S state set / The S state set
- actions / Actions
- transition model / Transition model
- rewards / Rewards
- policy / Policy
- sequence of rewards / The sequence of rewards - assumptions
- Bellman equations / The Bellman equations
- used, for training FrozenLake-v0 environment / Training the FrozenLake-v0 environment using MDP
- max pooling / Convolutional neural networks
- metrics, in computational advertising
- minimax algorithm / Minimax and game trees
- model based learning / Model based learning and model free learning, Evolution of reinforcement learning
- model free and off-policy learner / Evolution of reinforcement learning
- model free learning
- about / Model based learning and model free learning
- Monte Carlo learning / Monte Carlo learning
- temporal difference learning / Temporal difference learning
- off-policy learning / On-policy and off-policy learning
- on-policy learning / On-policy and off-policy learning
- Monte Carlo learning / Monte Carlo learning
- Monte Carlo policy gradient / The Monte Carlo policy gradient
- Monte Carlo Tree Search (MCTS) / The Monte Carlo tree search algorithm, Monte Carlo Tree Search, Architecture and properties of AlphaGo
- Monte Carlo tree search algorithm
- about / The Monte Carlo tree search algorithm, The Monte Carlo Tree Search
- game tree / Minimax and game trees
- minimax algorithm / Minimax and game trees
- mountain car / Q-learning for the mountain car problem in OpenAI gym
N
- neural intra-attention model
- about / Text summarization, Neural intra-attention model
- intra-temporal attention, on input sequence / Intra-temporal attention on input sequence while decoding
- intra-decoder attention / Intra-decoder attention
- token generation / Token generation and pointer
- pointer / Token generation and pointer
- mixed training objective function / Mixed training objective function
- neural network model
- about / The neural network model
- recurrent neural networks (RNNs) / Recurrent neural networks
- convolutional neural network (CNN) / Convolutional neural networks
- neural networks, AlphaGo
- policy network / AlphaGo – mastering Go
- value network / AlphaGo – mastering Go
- next state / Use of experience replay
O
- off-policy learning / On-policy and off-policy learning
- on-policy learning / On-policy and off-policy learning
- online case-based planning
- about / Reinforcement learning and other approaches, Online case-based planning
- expansion / Online case-based planning
- execution / Online case-based planning
- OpenAI Gym
- about / Introduction to TensorFlow and OpenAI Gym, An introduction to OpenAI Gym
- gym open-source library / An introduction to OpenAI Gym
- reference / An introduction to OpenAI Gym
- downloading / The OpenAI Gym
- installing / The OpenAI Gym
- OpenAI Gym environment
- about / Understanding an OpenAI Gym environment
- used, for programming agent / Programming an agent using an OpenAI Gym environment
- OpenAI Gym service / An introduction to OpenAI Gym
- optimality criteria, reinforcement learning
- value function / The value function for optimality
- policy model / The policy model for optimality
P
- Partially observable Markov decision processes
- about / Partially observable Markov decision processes
- state estimation / State estimation
- value iteration / Value iteration in POMDPs
- Pay Per Acquisition (PPA) / Business models used in advertising
- Pay Per Click (PPC) / Business models used in advertising
- policy gradients
- about / Policy gradients
- Monte Carlo policy gradient / The Monte Carlo policy gradient
- actor-critic algorithms / Actor-critic algorithms
- vanilla policy gradient / Vanilla policy gradient
- agent learning pong / Agent learning pong using policy gradients
- policy gradient theorem / Policy Gradient Theorem
- policy iteration / Policy iteration
- policy objective functions / Policy objective functions
- policy optimization method
- about / The policy optimization method
- components / The policy optimization method
- advantages / Why policy optimization methods?
- disadvantages / Why policy optimization methods?
- pooling layer / Convolutional neural networks
- proposed framework, for autonomous driving
- about / Proposed frameworks for autonomous driving
- spatial aggregation / Spatial aggregation
- recurrent temporal aggregation / Recurrent temporal aggregation
- planning / Planning
Q
- Q-learning
- approach, to reinforcement learning / The Q-learning approach to reinforcement learning
- reinforcement learning agent, programming / Q-Learning
- about / Q-learning
- exploitation dilemma / The exploration exploitation dilemma
- exploration dilemma / The exploration exploitation dilemma
- mountain car problem, in OpenAI Gym / Q-learning for the mountain car problem in OpenAI gym
- Q-Network
- using, for real-world applications / Using the Q-Network for real-world applications
- question answering task / Text question answering
R
- real-time bidding, by reinforcement learning
- in display advertising / Real-time bidding by reinforcement learning in display advertising
- real-time strategy (RTS) gaming
- about / Real-time strategy games
- drawbacks / Drawbacks to real-time strategy games
- Recall-Oriented Understudy for Gisting Evaluation (ROUGE) / ROUGE
- receptive field / Convolutional neural networks
- rectified linear unit function / The rectified linear unit function
- recurrent neural networks (RNNs) / The neural network model, Recurrent neural networks, Reinforcement learning for autonomous driving
- recurrent temporal aggregation / Recurrent temporal aggregation
- Region-based convolution neural networks (R-CNN) / Region-based convolution neural networks
- Region Proposal Network (RPN) / Faster R-CNN
- reinforcement learning
- about / Reinforcement learning, Why reinforcement learning?, Reinforcement learning and other approaches, Why reinforcement learning?, How is reinforcement learning better?
- agent / Basic terminologies and conventions
- environment / Basic terminologies and conventions
- state / Basic terminologies and conventions
- rewards / Basic terminologies and conventions
- actions / Basic terminologies and conventions
- SAR triple / Basic terminologies and conventions
- episode / Basic terminologies and conventions
- optimality criteria / Optimality criteria
- pioneers / The pioneers and breakthroughs in reinforcement learning, Pieter Abbeel
- online case-based planning / Online case-based planning
- in RTS gaming / Reinforcement learning in RTS gaming
- for autonomous driving / Reinforcement learning for autonomous driving
- consideration / Why reinforcement learning?
- evolution / Evolution of reinforcement learning
- reinforcement learning, in robotics
- about / Reinforcement learning in robotics
- challenges / Challenges in robot reinforcement learning
- applications / Challenges in robot reinforcement learning
- high dimensionality problem / High dimensionality problem
- real-world challenges / Real-world challenges
- model uncertainty issue / Issues due to model uncertainty
- open questions / Open questions and practical challenges, Open questions
- practical challenges / Practical challenges for robotic reinforcement learning
- reward / Use of experience replay
S
- same padding / Convolutional neural networks
- SARSA algorithm
- about / The SARSA algorithm
- for mountain car problem, in OpenAI Gym / SARSA algorithm for mountain car problem in OpenAI gym
- scoring mechanisms, NLP
- BLEU / Scoring mechanism in sequential models in NLP, BLEU
- ROUGE / ROUGE
- search-advertisement management / Search-advertisement management
- sensor fusion / Sensor fusion
- sequence of rewards, Markov decision process
- infinite horizons / The infinite horizons
- utility of sequences / Utility of sequences
- sequential intra-attention model / Text summarization
- sigmoid function / The sigmoid function
- single shot detector (SSD) / Single Shot Detector
- softmax function / The softmax function
- spatial aggregation
- about / Spatial aggregation
- sensor fusion / Sensor fusion
- spatial features / Spatial features
- spatial features / Spatial features
- Spatial Pyramid Pooling networks (SPP-net) / Spatial pyramid pooling networks
- sponsored-search advertisements / Sponsored-search advertisements
- Stanford Question Answering Dataset (SQuAD) / Text question answering
- state / Use of experience replay
- State–Action–Reward–State–Action (SARSA) / The SARSA algorithm
- stochastic environment / Why reinforcement learning?
- stochastic policy
- need for / Why stochastic policy?
- example / Example 1 - rock, paper, scissors, Example 2 - state aliased grid-world
- stride / Convolutional neural networks
- summarization algorithms
- extractive summarization / Text summarization
- abstractive summarization / Text summarization
- supervised learning
- about / Deep learning, Why reinforcement learning?
- in neural networks / Deep learning
T
- tabula rasa learning / AlphaGo Zero
- tanh function / The tanh function
- TD(λ) rule / TD(λ) rule
- TD(0) rule / TD(0) rule
- TD(1) rule / TD(1) rule
- temporal difference (TD) / Temporal difference rule
- temporal difference learning / Temporal difference learning
- TensorFlow
- about / Introduction to TensorFlow and OpenAI Gym
- basic computations / Basic computations in TensorFlow
- installation link / Basic computations in TensorFlow
- tensors / Basic computations in TensorFlow
- text summarization / Text summarization
- trust region policy optimization (TRPO) / Trust region policy optimization
- two-layer neural networks / The neural network model
U
- unsupervised learning / Why reinforcement learning?
V
- valid padding / Convolutional neural networks
- value iteration / Solving the Bellman equation to find policies
- vanilla policy gradient / Vanilla policy gradient
- vanishing gradient problem / The vanishing gradient problem
- variance
- reducing, baseline function used / Using a baseline to reduce variance
- VGG-Net model / The VGG-Net model
- Visual Geometry Group (VGG) / The VGG-Net model
X
- Xavier Initialization
Y
- You Only Look Once (YOLO) / You Only Look Once