Here is a list of things you can do to improve your understanding of the topic:
In the D4PG code, I used a simple replay buffer, which was enough to get a good improvement over DDPG. You can try switching the example to the prioritized replay buffer, in the same way as we did in Chapter 7, DQN Extensions, and check the effect.
There are many interesting and challenging environments available. For example, you can start with the other PyBullet environments, but there is also the DeepMind Control Suite (a paper published at the beginning of 2018 compared the A3C, DDPG, and D4PG methods on it), the MuJoCo-based environments in Gym, and many others.
You can request a trial license for MuJoCo and compare its stability, performance, and resulting policies with PyBullet's.
Play with the very challenging Learning to Run competition from NIPS 2017, in which you are given a simulator of the human body and your agent needs to figure out how to move it around.
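For the first suggestion, a minimal sketch of a proportional prioritized replay buffer may help you get started. This is an illustrative pure-Python version (class and method names are my own, and defaults such as alpha=0.6 and beta=0.4 are common choices, not taken from the book's code); the Chapter 7 implementation differs in details, and a production version would use a sum tree for O(log N) sampling instead of the linear scan here:

```python
import random


class PrioritizedReplayBuffer:
    """Sketch of proportional prioritized replay.

    New transitions get the current maximum priority, sampling
    probability is p_i**alpha / sum_k p_k**alpha, and
    importance-sampling weights (N * P(i))**-beta correct the
    bias that non-uniform sampling introduces.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []       # stored transitions
        self.priorities = []   # one priority per transition
        self.pos = 0           # next write position (circular)

    def add(self, transition):
        # Give fresh samples the max priority so they are seen at least once
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = random.choices(range(len(self.buffer)),
                                 weights=probs, k=batch_size)
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize to [0, 1]
        samples = [self.buffer[i] for i in indices]
        return samples, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-5):
        # Priority is |TD error| plus a small constant to keep it nonzero
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```

The training loop would call sample(), compute TD errors for the batch (scaled by the returned weights in the loss), and feed the errors back through update_priorities(), so that transitions with large errors are replayed more often.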