Comment by macleginn
So this looks essentially like continuous prompting (see prefix tuning) with RL-driven selection of what to present as tokens and what as continuous inputs (embeddings).
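
To make the comparison concrete, here is a rough sketch of what I mean by mixing the two kinds of input. It is only a toy under my own assumptions, not the method under discussion: `build_inputs`, `sample_selection`, and all the numbers are hypothetical, and the fixed probabilities stand in for whatever learned policy an RL setup would actually use.

```python
import random


def build_inputs(token_ids, embedding_table, soft_vectors, use_soft):
    """Build one input sequence that mixes discrete and continuous positions.

    Position i gets soft_vectors[i] when use_soft[i] is True, otherwise the
    ordinary embedding-table row for token_ids[i].
    """
    return [
        soft_vectors[i] if use_soft[i] else embedding_table[token_ids[i]]
        for i in range(len(token_ids))
    ]


def sample_selection(probs, rng):
    """Sample a per-position token/embedding choice from Bernoulli probabilities.

    Assumption: in an RL setup these probabilities would come from a learned
    policy; here they are fixed numbers for illustration.
    """
    return [rng.random() < p for p in probs]


# Toy setup (all values hypothetical): 3-token vocabulary, embedding dimension 2.
embedding_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
token_ids = [1, 2, 1]
# Free continuous vectors, the part that prefix tuning would train directly.
soft_vectors = [[0.5, 0.5], [0.3, -0.2], [0.9, 0.1]]

# Position 0 is fed as a continuous input, positions 1 and 2 as ordinary tokens.
use_soft = [True, False, False]
inputs = build_inputs(token_ids, embedding_table, soft_vectors, use_soft)
```

With `use_soft` all `True` this reduces to plain continuous prompting, and with it all `False` to ordinary token input; the selection mask is the only new ingredient.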