Index
A
- A2C, with ACKTR
- about / A2C using ACKTR
- implementing / Implementation
- results / Results
- A2C agent / Adding an extra A to A2C
- A2C baseline
- about / A2C baseline
- results / Results
- videos, recording / Videos recording
- A2C on Pong
- about / A2C on Pong
- results / A2C on Pong results
- A3C parallelization / A3C – data parallelism
- results / Results
- parallelism of gradients / A3C – gradients parallelism, Results
- action space
- about / Action space
- actor-critic / Actor-critic
- Actor-Critic (A2C) / Self-play
- Actor-Critic (A2C) method
- about / The Actor-Critic (A2C) method
- implementation / Implementation
- results / Results
- models, using / Using models and recording videos
- videos, recording / Using models and recording videos
- actor-critic parallelization
- approaches / Adding an extra A to A2C
- agent
- anatomy / The anatomy of the agent
- AgentNet
- reference / The PyTorch Agent Net library
- AlphaGo Zero method
- overview / Overview
- MCTS / Monte-Carlo Tree Search
- self-play / Self-play
- training / Training and evaluation
- evaluation / Training and evaluation
- Asynchronous Advantage Actor-Critic (A3C)
- about / Proximal Policy Optimization
- Asynchronous Advantage Actor-Critic (A3C) agent / Model imperfections
- Asynchronous Advantage Actor-Critic (A3C) method / PG on Pong, Why a continuous space?
- Atari transformations
- used by RL researchers / Wrappers
B
- bar
- about / Data
- baseline agent
- training / The baseline agent
- Baselines
- reference / Wrappers
- basic DQN
- about / Basic DQN
- Bellman equation
- Bilingual evaluation understudy (BLEU) score / Bilingual evaluation understudy (BLEU) score
- black-box methods
- about / Black-box methods
- properties / Black-box methods
- board games
- about / Board games
- branching factor / Monte-Carlo Tree Search
- browser automation
- and RL / Browser automation and RL
C
- candlestick chart
- about / Data
- CartPole variance / CartPole variance
- categorical DQN
- about / Categorical DQN
- implementing / Implementation
- results / Results
- chatbot example
- about / The chatbot example
- structure / The example structure
- cornell.py file / Modules: cornell.py and data.py
- data.py file / Modules: cornell.py and data.py
- BLEU score / BLEU score and utils.py
- utils.py module / BLEU score and utils.py
- model / Model
- cross-entropy method / Training: cross-entropy
- training code / Running the training
- data, checking / Checking the data
- trained model / Testing the trained model
- SCST training / Training: SCST
- SCST training, running / Running the SCST training
- results / Results
- Telegram bot / Telegram bot
- chatbots
- overview / Chatbots overview
- entertainment human-mimicking / The chatbot example
- goal-oriented / The chatbot example
- Connect4 bot
- about / Connect4 bot
- game model / Game model
- MCTS implementation / Implementing MCTS
- model / Model
- training process / Training
- testing / Testing and comparison
- comparison / Testing and comparison
- results / Connect4 results
- continuous space
- need for / Why a continuous space?
- convolutional model / Models
- convolution model / The convolution model
- Cornell Movie-Dialogs Corpus
- reference / The example structure
- correlation / Correlation and sample efficiency
- Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
- about / Evolution strategies
- cross-entropy
- on CartPole / Cross-entropy on CartPole
- on FrozenLake / Cross-entropy on FrozenLake
- theoretical background / Theoretical background of the cross-entropy method
- curriculum learning / Log-likelihood training
- custom layers
- about / Custom layers
D
- data
- about / Data
- decoder / Encoder-Decoder
- deep deterministic policy gradients (DDPG)
- about / Deterministic policy gradients
- deep GA
- about / Deep GA
- deep learning (DL) / Chatbots overview, Hardware and software requirements
- DeepMind Control Suite / Things to try
- deep NLP basics
- about / Deep NLP basics
- RNNs / Recurrent Neural Networks
- embeddings / Embeddings
- Encoder-Decoder / Encoder-Decoder
- deep Q-learning
- about / Deep Q-learning
- interaction, with environment / Interaction with the environment
- SGD optimization / SGD optimization
- correlation, between steps / Correlation between steps
- Markov property / The Markov property
- deep Q-network (DQN) method / Why a continuous space?
- deterministic policy gradients
- about / Deterministic policy gradients
- exploration / Exploration
- implementation / Implementation
- results / Results
- videos, recording / Recording videos
- Dilbert Reward Process (DRP)
- about / Markov reward process
- distributional policy gradients
- about / Distributional policy gradients
- architecture / Architecture
- implementation / Implementation
- results / Results
- Docker
- reference / Installation
- double DQN
- about / Double DQN
- implementing / Implementation
- results / Results
- DQN improvements
- combining / Combining everything
- implementation / Implementation
- results / Results
- DQN model / DQN model
- DQN on Pong
- about / DQN on Pong
- wrappers / Wrappers
- training / Training
- running / Running and performance
- performance / Running and performance
- working / Your model in action
- DQN training
- about / The final form of DQN training
- dueling DQN
- about / Dueling DQN
- implementing / Implementation
- results / Results
E
- ELIZA
- reference / Chatbots overview
- EM weights
- training / Training EM weights
- encoder / Encoder-Decoder
- Encoder-Decoder / Encoder-Decoder
- entropy / Theoretical background of the cross-entropy method
- environment
- about / The anatomy of the agent
- environment model (EM) / Imagination-augmented agent
- environments
- about / Environments
- MuJoCo / Environments
- PyBullet / Environments
- ES, on CartPole
- about / ES on CartPole
- results / Results
- ES, on HalfCheetah
- about / ES on HalfCheetah
- results / Results
- evolution strategies (ES)
- about / Evolution strategies
F
- factorized Gaussian noise / Noisy networks
- feed-forward model / The feed-forward model
- fitness function / Black-box methods
- FrozenLake
- value iteration method / Value iteration in practice
- Q-learning / Q-learning for FrozenLake
G
- GA, on CartPole
- about / GA on CartPole
- results / Results
- GA, on Cheetah
- about / GA on Cheetah
- results / Results
- GAN on Atari images
- example / Example – GAN on Atari images
- GA tweaks
- about / GA tweaks
- deep GA / Deep GA
- novelty search / Novelty search
- generative adversarial networks (GANs) / Example – GAN on Atari images
- genetic algorithms (GA)
- about / Genetic algorithms
- GPU tensors / GPU tensors
- gradients / Gradients, Tensors and gradients
- Gym / Hardware and software requirements
H
- hardware requisites / Hardware and software requirements
- hidden state / Recurrent Neural Networks
- human demonstrations
- about / Human demonstrations
- recording / Recording the demonstrations
- recording format / Recording format
- training, with demonstrations / Training using demonstrations
- results / Results
- TicTacToe problem / TicTacToe problem
- hyperparameter tuning
- about / Tuning hyperparameters
- learning rate (LR) / Learning rate
- entropy beta / Entropy beta
- count of environments / Count of environments
- batch size / Batch size
I
- I2A, on Atari Breakout
- about / I2A on Atari Breakout
- baseline A2C agent / The baseline A2C agent
- EM training / EM training
- imagination agent / The imagination agent
- implementing / The I2A model
- rollout encoder / The Rollout encoder
- training process / Training of I2A
- I2A model
- training with / Training with the I2A model
- imagination-augmented agent
- about / Imagination-augmented agent
- environment model / The environment model
- rollout policy / The rollout policy
- rollout encoder / The rollout encoder
- paper results / Paper results
- imagination path / Imagination-augmented agent
- independent Gaussian noise
- about / Noisy networks
K
- Kaitai binary parser language
- reference / Recording format
- key decisions / Problem statements and key decisions
- Kullback-Leibler (KL) divergence / PG on CartPole, RL in seq2seq, Theoretical background of the cross-entropy method
L
- loss functions
- about / Final glue – loss functions and optimizers, Loss functions
- nn.MSELoss / Loss functions
- nn.BCELoss / Loss functions
- nn.CrossEntropyLoss / Loss functions
- nn.NLLLoss / Loss functions
M
- machine learning (ML) / Chatbots overview
- Markov chain
- about / Markov process
- Markov decision process (MDP)
- about / Markov decision process, Markov decision processes, Issues with simple clicking
- Markov process
- about / Markov process
- Markov property
- about / Markov process
- Markov reward process
- about / Markov reward process
- mean squared error (MSE) / EM training, Training and evaluation
- mean squared error (MSE) loss
- about / The Actor-Critic (A2C) method
- minimax
- about / Board games
- Mini World of Bits (MiniWoB) / Mini World of Bits benchmark
- Mini World of Bits benchmark / Mini World of Bits benchmark
- model-based approach
- versus , model-free approach / Model-based versus model-free
- model imperfections / Model imperfections
- models / Models
- Monitor / Monitor
- Monte-Carlo Tree Search (MCTS) / Overview
- MuJoCo
- URL / Environments
- about / Environments
- multiprocessing
- in Python / Multiprocessing in Python
N
- N-step DQN
- about / N-step DQN
- implementing / Implementation
- natural language / Chatbots overview
- neural network
- building blocks / NN building blocks
- neural network (NN) / Problem statements and key decisions, Monte-Carlo Tree Search
- neural networks (NNs)
- about / Deterministic policy gradients
- noisy networks
- about / Noisy networks
- implementing / Implementation
- results / Results
- notebook gradients / Gradients
- novelty search
- about / Novelty search
- implementing / Novelty search
- NumPy / Hardware and software requirements
O
- OpenAI
- reference / OpenAI Gym API
- OpenAI Gym API
- about / OpenAI Gym API
- action space / Action space
- observation space / Observation space
- environment / The environment
- environment, creating / Creation of the environment
- CartPole session / The CartPole session
- OpenAI Universe
- reference / Creation of the environment, OpenAI Universe
- about / OpenAI Universe
- installing / Installation
- actions / Actions and observations
- observations / Actions and observations
- environment creation / Environment creation
- MiniWoB stability / MiniWoB stability
- OpenCV Python bindings / Hardware and software requirements
- optimality
- about / Value, state, and optimality
- optimizers
- about / Final glue – loss functions and optimizers, Optimizers
- SGD / Optimizers
- RMSprop / Optimizers
- Adagrad / Optimizers
- Ornstein-Uhlenbeck (OU) process
- about / Implementation
P
- partially observable Markov decision process (POMDP)
- about / Implementation, The Markov property
- PG method, on CartPole
- about / PG on CartPole
- results / Results
- PG method, on Pong
- about / PG on Pong
- results / Results
- policy / Values and policy
- need for / Why policy?
- representing / Policy representation
- policy-based method
- versus value-based method / Policy-based versus value-based methods
- policy gradient (PG) / Training of I2A
- policy gradients / Policy gradients
- PPO
- about / Proximal Policy Optimization
- implementing / Implementation
- results / Results
- practical cross-entropy / Practical cross-entropy
- prioritized replay buffer
- about / Prioritized replay buffer
- implementing / Implementation
- results / Results
- problem statements / Problem statements and key decisions
- Ptan
- reference / Hardware and software requirements
- PyBullet
- about / Environments
- Python
- multiprocessing module / Multiprocessing in Python
- PyTorch / Hardware and software requirements
- about / ES on HalfCheetah
- PyTorch Agent Net library
- about / The PyTorch Agent Net library
- design principles / The PyTorch Agent Net library
- agent entity / Agent
- agents experience / Agent's experience
- experience buffer / Experience buffer
- gym env wrappers / Gym env wrappers
- PyTorch documentation
- reference / Tensor operations
Q
- Q-learning, for FrozenLake
- about / Q-learning for FrozenLake
R
- random CartPole agent / The random CartPole agent
- real-life value iteration / Real-life value iteration
- recurrent neural network (RNN) / The Rollout encoder
- reinforcement learning
- about / Learning – supervised, unsupervised, and reinforcement
- formalisms / RL formalisms and relations
- relations / RL formalisms and relations
- reward / Reward
- agent / The agent
- environment / The environment
- actions / Actions
- observations / Observations
- in seq2seq / RL in seq2seq
- REINFORCE method
- about / The REINFORCE method
- CartPole example / The CartPole example
- results / Results
- issues / REINFORCE issues, Full episodes are required, High gradients variance, Exploration, Correlation between samples
- Remote Framebuffer Protocol (RFB) / Recording format
- reference / Recording format
- results
- feed-forward model / The feed-forward model
- convolution model / The convolution model
- RL methods
- taxonomy / Taxonomy of RL methods
- Roboschool
- about / Roboschool
- installation link / Roboschool
S
- sample efficiency / Value iteration in practice, Correlation and sample efficiency
- scalar tensors / Scalar tensors
- seq2seq
- reinforcement learning / RL in seq2seq
- seq2seq model
- about / Encoder-Decoder
- training / Training of seq2seq
- log-likelihood training / Log-likelihood training
- Bilingual evaluation understudy (BLEU) score / Bilingual evaluation understudy (BLEU) score
- self-critical sequence training / Self-critical sequence training
- simple clicking approach
- about / Simple clicking approach
- grid actions / Grid actions
- example overview / Example overview
- model / Model
- training code / Training code, Starting containers
- starting containers / Starting containers
- training process / Training process
- learned policy, checking / Checking the learned policy
- issues, with simple clicking / Issues with simple clicking
- software requisites / Hardware and software requirements
- stochastic
- about / Deterministic policy gradients
- stochastic gradient descent (SGD) / Deep Q-learning, Log-likelihood training
- about / Deterministic policy gradients, ES on HalfCheetah
- supervised learning / Learning – supervised, unsupervised, and reinforcement
T
- tabular Q-learning
- about / Tabular Q-learning
- teacher forcing / Log-likelihood training
- Telegram bot
- about / Telegram bot
- reference / Telegram bot
- TensorBoard
- monitoring with / Monitoring with TensorBoard
- plotting stuff / Plotting stuff
- tensorboard-pytorch
- reference / TensorBoard 101
- TensorBoard 101 / TensorBoard 101
- tensors
- about / Tensors
- creating / Creation of tensors
- scalar tensors / Scalar tensors
- operations / Tensor operations
- GPU tensors / GPU tensors
- and gradients / Tensors and gradients
- text description
- adding / Adding text description
- results / Results
- TicTacToe
- game tree / Monte-Carlo Tree Search
- trading
- about / Trading
- trading environment / The trading environment
- training code / Training code
- tree pruning
- about / Board games
- TRPO
- about / Trust Region Policy Optimization
- implementing / Implementation
- results / Results
- trust region policy optimization (TRPO) / Model imperfections
U
- unsupervised learning / Learning – supervised, unsupervised, and reinforcement
V
- value
- about / Value, state, and optimality
- calculating / Value, state, and optimality
- value-based method
- versus policy-based method / Policy-based versus value-based methods
- value iteration method
- about / The value iteration method
- working, for FrozenLake / Value iteration in practice
- reward table / Value iteration in practice
- transitions table / Value iteration in practice
- value table / Value iteration in practice
- value of action
- about / Value of action
- value of state
- about / Value, state, and optimality
- values / Values and policy
- variance reduction / Variance reduction
- VcXsrv
- reference / Monitor
- Virtual Network Computing (VNC)
- reference / Mini World of Bits benchmark
W
- web navigation
- about / Web navigation
- word2vec / Embeddings
- word embeddings / Embeddings
- wrappers / Wrappers
- wrappers, OpenAI Baselines project
- reference / Gym env wrappers
X
- Xvfb (X11 virtual framebuffer) / Monitor