To select appropriate reinforcement-learning algorithms, answer as many of the following questions as possible:
Less preferred algorithms will be marked yellow.
Unlike the questions above regarding what is dictated by the environment, the following question is about your planned choice of method properties:
For selecting a parametric probability distribution for actions, see Section 3 in the full paper.
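One way to picture how the answers narrow the list: treat each algorithm entry below as a record of properties, drop entries that conflict with an answer dictated by the environment, and only flag (rather than drop) entries that conflict with a preference. The sketch below is an illustration only. The `Entry` fields, the `select` helper, and the choice of two sample entries are assumptions made for this example, not the selection tool's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    on_policy: bool
    tabular: bool
    distributed: bool


# Two sample entries whose properties are copied from the list below.
ENTRIES = [
    Entry("SARSA with TD", on_policy=True, tabular=True, distributed=False),
    Entry("DQfD with TD", on_policy=False, tabular=False, distributed=False),
]


def select(entries, required, preferred):
    """Return (name, status) pairs.

    Entries that violate a required property are dropped.
    Entries that only violate a preferred property are kept
    but flagged as 'less preferred'.
    """
    results = []
    for entry in entries:
        if any(getattr(entry, key) != value for key, value in required.items()):
            continue
        soft_miss = any(
            getattr(entry, key) != value for key, value in preferred.items()
        )
        results.append((entry.name, "less preferred" if soft_miss else "ok"))
    return results


# Example: a non-distributed setup is required, off-policy is merely preferred.
result = select(
    ENTRIES,
    required={"distributed": False},
    preferred={"on_policy": False},
)
```

In this example both entries survive the hard constraint, and the on-policy entry is the one flagged as less preferred.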
Model-free
Off-policy
Tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Q-learning [Watkins & Dayan 1992] with Q(λ)
Model-free
Off-policy
Tabular value-based with exact maximization
Q(λ) 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 
SARSA [Rummery et al. 1994] with TD 
Model-free
On-policy
Tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 
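The off-policy versus on-policy distinction between the tabular Q-learning and SARSA entries above shows up directly in the one-step TD target. The following is a minimal sketch under simple assumptions: the Q-table is a list of per-state lists of action values, and the function names, `alpha` (step size), and `gamma` (discount factor) are chosen for this illustration only.

```python
def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy (maximal) action value,
    # regardless of which action the behavior policy takes next.
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action actually selected next.
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])


# Tiny Q-table with two states and two actions.
q = [[0.0, 0.0], [1.0, 3.0]]

target_q = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
target_s = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
td_update(q, state=0, action=0, target=target_q, alpha=0.5)
```

The two targets differ whenever the next action taken is not the greedy one, which is the case in this example.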
SARSA [Rummery et al. 1994] with SARSA(λ) 
Model-free
On-policy
Tabular value-based with exact maximization
SARSA(λ) 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 
Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

DQfD [Hester et al. 2018] with TD 
Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Imitation learning 
DQfD [Hester et al. 2018] with TD(n) 
Model-free
Off-policy
Non-tabular value-based with exact maximization
TD(n) 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Imitation learning 
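Several entries differ only in whether they use TD or TD(n). Assuming TD(n) here denotes the usual n-step bootstrapped return (an assumption, since the list itself does not define it), the target can be sketched as below. The function name and arguments are hypothetical; `bootstrap_value` stands for whatever value estimate the algorithm uses at the state reached after n steps.

```python
def n_step_target(rewards, bootstrap_value, gamma):
    """Discounted sum of n rewards plus a discounted bootstrap value.

    rewards: the n rewards observed along the trajectory, in order.
    bootstrap_value: value estimate at the state reached after n steps.
    With a single reward this reduces to the one-step TD target.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


one_step = n_step_target([1.0], bootstrap_value=8.0, gamma=0.5)
three_step = n_step_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5)
```

A larger n relies more on observed rewards and less on the bootstrap estimate, whose contribution shrinks by a factor of gamma with each additional step.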
Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Not distributional 
Not distributed 
Hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD(n) 
No entropy regularization 
Distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with approximate maximization and fixed search procedure
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with approximate maximization and learned search procedure
Q(λ)
Per-state entropy regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD 
No entropy regularization 
Distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with approximate maximization and fixed search procedure
TD 
No entropy regularization 
Distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD(n) 
No entropy regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
Retrace(λ) 
No entropy regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Non-tabular value-based with exact maximization
TD(n) 
No entropy regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Policy-based
MC
Per-state entropy regularization
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD(n)
Soft Q-learning
Not distributional 
Not distributed 
Not hierarchical 
Imitation learning 

Model-free
Off-policy
Actor-critic
GTD(λ) 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
Retrace(λ)
Per-state entropy regularization
Distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
GAE(λ)
Soft Q-learning
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
Retrace(λ) 
Kullback–Leibler divergence regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD
Soft Q-learning
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD
Per-state entropy regularization
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD(λ) 
Kullback–Leibler divergence regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD(n) 
No entropy regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD(n) 
No entropy regularization 
Distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
TD(n) 
Kullback–Leibler divergence regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
GAE(λ)
Per-state entropy and Kullback–Leibler divergence regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
GAE(λ) 
Kullback–Leibler divergence regularization 
Not distributional 
Distributed 
Hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
TD(n)
Mutual-information regularization
Not distributional 
Not distributed 
Hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
TD(n)
Per-state entropy regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
Retrace(λ)
Per-state entropy and Kullback–Leibler divergence regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
LSTDQ(λ) 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
TD(n)
Per-state entropy and Kullback–Leibler divergence regularization
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD
Soft Q-learning
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD
Soft Q-learning
Not distributional 
Not distributed 
Not hierarchical 
Imitation learning 

Model-free
Off-policy
Actor-critic
TD
Soft Q-learning
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD
Soft Q-learning
Distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
GAE(λ) 
Kullback–Leibler divergence regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
V-trace(n)
Per-state entropy regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
V-trace(n)
Per-state entropy regularization
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
TD(n) 
No entropy regularization 
Not distributional 
Distributed 
Not hierarchical 
Not imitation learning 

Model-free
Off-policy
Actor-critic
GAE(λ) 
Kullback–Leibler divergence regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
MC
Per-state entropy regularization
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

Model-free
On-policy
Actor-critic
TD
Per-state entropy regularization
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

REDQ [Chen et al. 2021] with MC 
Model-free
Off-policy
Actor-critic
MC 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 
REDQ [Chen et al. 2021] with TD 
Model-free
Off-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 
Model-free
On-policy
Actor-critic
TD 
No entropy regularization 
Not distributional 
Not distributed 
Not hierarchical 
Not imitation learning 

For model-based algorithms, see, e.g., the survey papers Moerland et al. (2020b), Moerland et al. (2020a), Wang et al. (2019), Hamrick et al. (2020), and Plaat et al. (2020).
Model-based