Index
A
- Acrobot
- settings / The classic control tasks
- agent, reinforcement learning / The agent
- algorithmic tasks / Algorithmic tasks
- AlphaGo
- about / AlphaGo
- supervised learning policy networks / Supervised learning policy networks
- reinforcement learning policy networks / Reinforcement learning policy networks
- value network / Value network
- neural networks and MCTS, combining / Combining neural networks and MCTS
- AlphaGo Zero
- about / AlphaGo Zero, Putting everything together
- training / Training AlphaGo Zero
- implementing / Implementing AlphaGo Zero
- policy and value networks / Policy and value networks
- preprocessing.py module / preprocessing.py
- features.py module / features.py
- network.py module / network.py
- alphagozero_agent.py / alphagozero_agent.py
- controller.py / controller.py
- train.py / train.py
- features.py module / Helper methods
- asynchronous advantage actor-critic (A3C) algorithm
- about / Asynchronous advantage actor-critic algorithm
- implementing / Implementation of A3C
- experiments / Experiments
- Atari 2600 games
- references / Introduction to Atari games
- unsolved issues / Demonstrating basic Q-learning algorithm
- Atari emulator
- building / Building an Atari emulator, Getting started
- implementation / Implementation of the Atari emulator
- implementing, gym used / Atari simulator using gym
- Atari games
- playing / Atari
- about / Introduction to Atari games
- data preparation / Data preparation
B
- backpropagation / Backpropagation, Update
- backpropagation through time (BPTT) / DPG algorithm
- basic elements, reinforcement learning
- state / Basic elements of reinforcement learning
- reward function / Basic elements of reinforcement learning
- policy function / Basic elements of reinforcement learning
- value function / Basic elements of reinforcement learning
- basic Q-learning algorithm
- demonstrating / Demonstrating basic Q-learning algorithm
- Bellman equation / Demonstrating basic Q-learning algorithm
- bias / Neural networks
- board state / Go and other board games
- build method
- defining / build method
C
- CartPole
- about / Running an environment, CartPole
- specifications / The classic control tasks
- chatbot
- background, issues / The background problem
- dataset / Dataset
- step-by-step guide / Step-by-step guide
- data parser / Data parser
- data reader / Data reader
- helper methods / Helper methods
- model / Chatbot model
- data, training / Training the data
- testing / Testing and results
- results / Testing and results
- classic control tasks / The classic control tasks
- control tasks
- about / Introduction to control tasks, Getting started
- classic control tasks / The classic control tasks
- convolutional neural network (CNN)
- about / Convolutional neural networks
- advantages / Advantages of neural networks
- implementing, in TensorFlow / Implementing a convolutional neural network in TensorFlow
- network, building / Building the network
- methods, for building network / Methods for building the network
D
- data preparation, Atari games / Data preparation
- deep learning
- about / Deep learning
- neural networks / Neural networks
- backpropagation / Backpropagation
- convolutional neural networks / Convolutional neural networks
- convolutional neural network, implementing in TensorFlow / Implementing a convolutional neural network in TensorFlow
- deep Q-learning
- about / Deep Q-learning
- basic elements, of reinforcement learning / Basic elements of reinforcement learning
- basic Q-learning algorithm, demonstrating / Demonstrating basic Q-learning algorithm
- deep Q-learning algorithm (DQN)
- about / Deep Q-learning
- implementing / Implementation of DQN
- experiments / Experiments
- deterministic policy gradient (DPG)
- about / Deterministic policy gradient
- actor-critic architecture / Deterministic policy gradient
- theory / The theory behind policy gradient
- algorithm / DPG algorithm
- implementing / Implementation of DDPG
- experiments / Experiments
F
- Fashion-MNIST dataset / The Fashion-MNIST dataset
- financial market
- background, issues / Background problem
- data used / Data used
- step-by-step guide / Step-by-step guide
- actor script / Actor script
- critic script / Critic script
- agent script / Agent script
- helper script / Helper script
- data, training / Training the data
- final result / Final result
- fit method
- defining / fit method
- frame-skipping technique / Data preparation
- fully-connected layers / Neural networks
G
- Go
- about / A brief introduction to Go
- and other board games / Go and other board games
- and AI research / Go and AI research
- GridWorld game
- reference / Experiments
H
- Hidden Markov model / Markov models
M
- Markov decision process (MDP) / Markov decision process (MDP)
- Markov models
- about / Markov models
- CartPole / CartPole
- Massively Multiplayer Online Role-Playing Games (MMORPGs) / Multi-agent reinforcement learning
- mean-squared error (MSE) / Value network
- methods, for building network
- build method / build method
- fit method / fit method
- Minecraft environment
- about / Introduction to the Minecraft environment
- data preparation / Data preparation
- model, reinforcement learning / Model
- model-free / Demonstrating basic Q-learning algorithm
- Monte Carlo tree search
- about / Monte Carlo tree search
- selection / Selection
- expansion / Expansion
- simulation / Simulation
- update step / Update
- mcts.py / mcts.py
- MuJoCo
- about / MuJoCo
- reference / Introduction to control tasks
- multi-agent reinforcement learning / Multi-agent reinforcement learning
- multilayer perceptrons (MLP) / Neural networks
N
- NAS, implementing
- about / Implementing NAS
- child_network.py module / child_network.py
- cifar10_processor.py / cifar10_processor.py
- controller.py module / controller.py
- controller generating, ways / Method for generating the Controller
- child network generating, controller used / Generating a child network using the Controller
- train_controller method / train_controller method
- ChildCNN, testing / Testing ChildCNN
- config.py module / config.py
- train.py module / train.py
- exercises / Additional exercises
- advantages / Advantages of NAS
- neural architecture search
- about / Neural Architecture Search
- child networks, generating / Generating and training child networks
- child networks, training / Generating and training child networks
- controller, training / Training the Controller
- algorithm, training / Training algorithm
- neural network
- about / Neural networks
- fully-connected layers / Neural networks
- multilayer perceptrons (MLP) / Neural networks
- no operation (NOOP) action / Data preparation
O
- OpenAI
- about / OpenAI Gym
- Gym / Gym
- OpenAI Five / Multi-agent reinforcement learning
- OpenAI Gym
- installation / Installation
- environment, running / Running an environment
- Atari / Atari
- algorithmic tasks / Algorithmic tasks
- MuJoCo / MuJoCo
- Robotics / Robotics
- reference / Introduction to control tasks
P
- Pendulum
- specifications / The classic control tasks
- playout / Simulation
- policy, reinforcement learning / Policy
- PolicyValueNetwork
- and MCTS, combining / Combining PolicyValueNetwork and MCTS
- alphagozero_agent.py / alphagozero_agent.py
- Python reinforcement learning
- expectations / Expectations
- hardware requisites / Hardware and software requirements
- software requisites / Hardware and software requirements
- packages, installing / Installing packages
R
- rectifier nonlinearity (ReLU) / Demonstrating basic Q-learning algorithm
- recurrent deterministic policy gradient algorithm (RDPG) / DPG algorithm
- reinforcement learning
- about / What is reinforcement learning?
- agent / The agent
- policy / Policy
- value function / Value function
- model / Model
- Markov decision process (MDP) / Markov decision process (MDP)
- basic elements / Basic elements of reinforcement learning
- shortcomings / The shortcomings of reinforcement learning
- resource efficiency / Resource efficiency
- reproducibility / Reproducibility
- explainability/accountability / Explainability/accountability
- attacks, susceptibility to / Susceptibility to attacks
- limitations, addressing / Addressing the limitations
- reinforcement learning, developments
- about / Upcoming developments in reinforcement learning
- transfer learning / Transfer learning
- multi-agent reinforcement learning / Multi-agent reinforcement learning
- REINFORCE method / Neural Architecture Search, Training the Controller
- Robotics / Robotics
- rollout / Simulation
S
- SGF (Smart Game Format) / alphagozero_agent.py
- supervised learning / What is reinforcement learning?
T
- TensorFlow / TensorFlow
- terminal state / What is reinforcement learning?
- TMUX
- about / Implementation of A3C
- reference / Implementation of A3C
- trust region policy optimization (TRPO) algorithm
- about / Trust region policy optimization, TRPO algorithm
- theory / Theory behind TRPO
- experiments, on MuJoCo tasks / Experiments on MuJoCo tasks
U
- unsupervised learning / What is reinforcement learning?
- Upper Confidence Bound 1 Applied to Trees (UCT) / Selection
V
- value function, reinforcement learning / Value function