# Distributional policy gradients

As the last method of this chapter, we will take a look at the recent paper by Gabriel Barth-Maron, Matthew W. Hoffman, and others, called *Distributed Distributional Deterministic Policy Gradients*, published in 2018 (https://arxiv.org/abs/1804.08617).

The full name of the method is **distributed distributional deep deterministic policy gradients**, or **D4PG** for short. The authors proposed several modifications to the DDPG method to improve its stability, convergence, and sample efficiency.

First of all, they adopted the distributional representation of the Q-value proposed in the paper by Marc G. Bellemare and others called *A Distributional Perspective on Reinforcement Learning*, published in 2017 (https://arxiv.org/abs/1707.06887). We discussed this approach in *Chapter 8*, *DQN Extensions*, when we talked about DQN improvements, so refer to it or to the original Bellemare paper for details. The core idea is to replace the critic's single Q-value with a probability distribution over possible returns.
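To make the distributional representation concrete, here is a minimal NumPy sketch of the idea: instead of emitting one scalar Q(s, a), the critic emits a categorical distribution over a fixed support of atoms, and a scalar Q-value is recovered as the expectation over that support. The atom count and value range below are illustrative assumptions, not the paper's exact hyperparameters, and the random logits stand in for a real critic network's output.

```python
import numpy as np

# Illustrative settings (assumed, not the paper's exact values):
# a fixed support of 51 atoms spanning the expected return range.
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
support = np.linspace(V_MIN, V_MAX, N_ATOMS)

def softmax(logits):
    # Numerically stable softmax turning raw logits into probabilities
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Stand-in for the critic's output for one (state, action) pair:
# logits over the atoms rather than a single scalar value.
rng = np.random.default_rng(0)
logits = rng.normal(size=N_ATOMS)
probs = softmax(logits)

# A scalar Q-value, when needed (e.g. for the actor update),
# is the expectation of the distribution over the support.
q_value = float((probs * support).sum())
```

The benefit of this representation is that the critic captures the full spread of possible returns, not just their mean, which in practice gives a richer training signal.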