The cross-entropy method falls into the model-free and policy-based category of methods. These notions are new, so let's spend some time exploring them. All RL methods can be classified along several dimensions:
Model-free or model-based
Value-based or policy-based
On-policy or off-policy
There are other ways to taxonomize RL methods, but for now we're interested in the preceding three. Let's define them, as the specifics of your problem can influence your choice of method.
The term model-free means that the method doesn't build a model of the environment or reward; it just directly connects observations to actions (or values that are related to actions). In other words, the agent takes the current observations, performs some computations on them, and the result is the action it should take. In contrast, model-based methods try to predict what the next observation and/or reward will be. Based on this prediction, the agent tries to choose the best possible action to take.
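To make the distinction concrete, here is a minimal toy sketch of the two approaches. All class and function names here (`ModelFreeAgent`, `ModelBasedAgent`, `toy_model`) are hypothetical illustrations, not any particular library's API, and the decision rules are deliberately trivial stand-ins for learned policies and models:

```python
# Model-free agent: maps the current observation directly to an action,
# with no internal model of how the environment responds.
class ModelFreeAgent:
    def act(self, observation):
        # In practice this would be a learned policy (e.g., a neural network);
        # here a fixed threshold rule stands in for it.
        return 0 if observation < 0.5 else 1


# Model-based agent: uses a model to predict the outcome of each candidate
# action, then picks the action with the best predicted reward.
class ModelBasedAgent:
    def __init__(self, model):
        # model(observation, action) -> (predicted_next_obs, predicted_reward)
        self.model = model

    def act(self, observation, actions=(0, 1)):
        # One-step lookahead over the model's predictions.
        return max(actions, key=lambda a: self.model(observation, a)[1])


# Hypothetical environment model: action 1 pays off for large observations,
# action 0 for small ones.
def toy_model(obs, action):
    reward = obs if action == 1 else 1.0 - obs
    return obs, reward


free_agent = ModelFreeAgent()
based_agent = ModelBasedAgent(toy_model)
print(free_agent.act(0.8))   # direct observation -> action mapping
print(based_agent.act(0.8))  # action chosen via predicted rewards
```

Note that both agents can end up choosing the same action; the difference is *how* they get there: the model-free agent computes the action directly from the observation, while the model-based agent first predicts outcomes and then decides.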