Comment by porridgeraisin 8 days ago

Yep. Offline RL is especially full of these types of papers too. The sheer number of alternatives to the KL divergence for keeping the learned policy from drifting too far from the distribution of the collected data... There's probably one method for each person on earth.