RL-Picker

To select appropriate reinforcement-learning algorithms, answer as many of the following questions as possible; less preferred algorithms will be marked yellow. The questions cover:

- the environment dynamics, consisting of the reward function r and the state-transition probability p(s'|s,a);
- action sequences;
- an expert;
- computational resources for running several agents in parallel during training;
- whether you need to estimate the "risk" of taking certain actions;
- which of several criteria (e.g., rewards) is more important;
- whether the optimal policy is probably stochastic;
- whether estimating the benefit (value) of each possible action is considerably more difficult than selecting the best action;
- the duration of an episode;
- the action space;
- important information from past observations.
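Under the hood, such a picker can be thought of as filtering a table of method properties against your answers. Below is a minimal sketch of that idea in Python; the `Method` class, the `rank` function, and all field names are hypothetical illustrations, not the tool's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Method:
    """One row of the property table below (field names are illustrative)."""
    name: str
    on_policy: bool    # on-policy (True) vs. off-policy (False)
    family: str        # "value-based", "policy-based", or "actor-critic"
    estimator: str     # e.g., "TD", "TD(n)", "MC", "GAE(lambda)"
    distributed: bool  # can exploit several agents running in parallel
    imitation: bool    # can exploit expert demonstrations

METHODS = [
    Method("tabular off-policy TD", False, "value-based", "TD", False, False),
    Method("tabular on-policy TD", True, "value-based", "TD", False, False),
    # ... one entry per row of the table below
]

def rank(methods, *, parallel_agents: bool, expert_data: bool):
    """Split methods into 'preferred' and 'less preferred' (the ones the
    interactive page marks yellow) according to the user's answers."""
    preferred, yellow = [], []
    for m in methods:
        ok = True
        if parallel_agents and not m.distributed:
            ok = False  # cannot exploit the available parallel agents
        if expert_data and not m.imitation:
            ok = False  # cannot exploit the available expert data
        (preferred if ok else yellow).append(m)
    return preferred, yellow
```

Answering more questions simply adds more such checks; a method that fails any check ends up in the "less preferred" group.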

Unlike the questions above, which concern properties dictated by the environment, the following question concerns your planned choice of method properties:

What target policy will you choose?

For selecting a parametric probability distribution for actions, see Section 3 of the full paper.
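As a minimal illustration of what such a parametric distribution looks like (one common choice, not necessarily the paper's recommendation): for a continuous action space, a stochastic policy is often parameterized as a diagonal Gaussian whose mean is produced by a network. The network sizes below are made up:

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Illustrative diagonal-Gaussian action distribution."""
    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mean = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent std

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(obs)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

policy = GaussianPolicy()
dist = policy(torch.randn(1, 8))
action = dist.sample()            # a stochastic target policy samples here
log_prob = dist.log_prob(action)  # per-dimension log-density, used by policy-gradient losses
```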

Model-free methods (one row per algorithm):

| Policy | Method type | Return estimator | Regularization | Distributional | Distributed | Hierarchical | Imitation learning |
|---|---|---|---|---|---|---|---|
| Off-policy | Tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Tabular value-based, exact max. | Q(λ) | None | No | No | No | No |
| On-policy | Tabular value-based, exact max. | TD | None | No | No | No | No |
| On-policy | Tabular value-based, exact max. | SARSA(λ) | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | No | Yes |
| Off-policy | Non-tabular value-based, exact max. | TD(n) | None | No | No | No | Yes |
| Off-policy | Non-tabular value-based, exact max. | TD | None | No | No | Yes | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | Yes | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | Yes | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD(n) | None | Yes | No | No | No |
| Off-policy | Non-tabular value-based, approx. max. (fixed search) | TD | None | No | No | No | No |
| Off-policy | Non-tabular value-based, approx. max. (learned search) | Q(λ) | Per-state entropy | No | Yes | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | Yes | No | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD | None | Yes | No | No | No |
| Off-policy | Non-tabular value-based, approx. max. (fixed search) | TD | None | Yes | Yes | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD(n) | None | No | Yes | No | No |
| Off-policy | Non-tabular value-based, exact max. | Retrace(λ) | None | No | Yes | No | No |
| Off-policy | Non-tabular value-based, exact max. | TD(n) | None | No | Yes | No | No |
| On-policy | Policy-based | MC | Per-state entropy | No | No | No | No |
| Off-policy | Actor-critic | TD(n) | Soft Q-learning | No | No | No | Yes |
| Off-policy | Actor-critic | GTD(λ) | None | No | No | No | No |
| Off-policy | Actor-critic | Retrace(λ) | Per-state entropy | Yes | Yes | No | No |
| On-policy | Actor-critic | GAE(λ) | Soft Q-learning | No | Yes | No | No |
| Off-policy | Actor-critic | Retrace(λ) | KL divergence | No | Yes | No | No |
| Off-policy | Actor-critic | TD | Soft Q-learning | No | No | No | No |
| Off-policy | Actor-critic | TD | Per-state entropy | No | No | No | No |
| Off-policy | Actor-critic | TD(λ) | KL divergence | No | No | No | No |
| Off-policy | Actor-critic | TD | None | No | No | No | No |
| Off-policy | Actor-critic | TD | None | No | No | No | No |
| Off-policy | Actor-critic | TD(n) | None | No | Yes | No | No |
| Off-policy | Actor-critic | TD(n) | None | Yes | Yes | No | No |
| On-policy | Actor-critic | TD(n) | KL divergence | No | No | No | No |
| On-policy | Actor-critic | GAE(λ) | Per-state entropy + KL divergence | No | Yes | No | No |
| On-policy | Actor-critic | GAE(λ) | KL divergence | No | Yes | Yes | No |
| Off-policy | Actor-critic | TD | None | No | No | Yes | No |
| On-policy | Actor-critic | TD(n) | Mutual information | No | No | Yes | No |
| On-policy | Actor-critic | TD(n) | Per-state entropy | No | Yes | No | No |
| Off-policy | Actor-critic | Retrace(λ) | Per-state entropy + KL divergence | No | Yes | No | No |
| On-policy | Actor-critic | LSTD-Q(λ) | None | No | No | No | No |
| On-policy | Actor-critic | TD(n) | Per-state entropy + KL divergence | No | No | No | No |
| Off-policy | Actor-critic | TD | None | No | No | No | No |
| Off-policy | Actor-critic | TD | Soft Q-learning | No | No | No | No |
| Off-policy | Actor-critic | TD | Soft Q-learning | No | No | No | Yes |
| Off-policy | Actor-critic | TD | Soft Q-learning | No | No | No | No |
| Off-policy | Actor-critic | TD | Soft Q-learning | Yes | Yes | No | No |
| On-policy | Actor-critic | GAE(λ) | KL divergence | No | No | No | No |
| Off-policy | Actor-critic | V-trace(n) | Per-state entropy | No | Yes | No | No |
| Off-policy | Actor-critic | V-trace(n) | Per-state entropy | No | Yes | No | No |
| Off-policy | Actor-critic | TD(n) | None | No | Yes | No | No |
| Off-policy | Actor-critic | GAE(λ) | KL divergence | No | No | No | No |
| On-policy | Actor-critic | MC | Per-state entropy | No | No | No | No |
| On-policy | Actor-critic | TD | Per-state entropy | No | No | No | No |
| Off-policy | Actor-critic | MC | None | No | No | No | No |
| Off-policy | Actor-critic | TD | None | No | No | No | No |
| On-policy | Actor-critic | TD | None | No | No | No | No |
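To make the table's policy and estimator columns concrete: an off-policy tabular entry with a one-step TD estimator bootstraps through an exact max over actions (as in Q-learning), whereas its on-policy counterpart bootstraps through the action the behavior policy actually takes (as in SARSA). A minimal sketch, with all sizes and constants chosen arbitrarily:

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))  # tabular action-value estimates
alpha, gamma = 0.1, 0.99             # step size and discount factor

def q_learning_update(s, a, r, s_next):
    """Off-policy one-step TD target with exact maximization over actions."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy one-step TD target: bootstrap through the action actually taken."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```

The other estimator entries (TD(n), MC, eligibility-trace variants such as Q(λ) and Retrace(λ)) differ only in how the target is formed from observed rewards and bootstrapped values.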
Model-based methods

If you find this overview helpful, please cite the detailed version as Bongratz et al. (2024).