Meta-DiffuB
A contextualized sequence-to-sequence text diffusion model with meta-exploration. Chuang et al., 2024.
One schedule for every sentence
Standard sequence-to-sequence (S2S) diffusion models such as DiffuSeq add noise on a fixed square-root (√) schedule. A trivial paraphrase and a hard open-domain reply are corrupted exactly the same way.
But sentences differ in difficulty, so a schedule that ignores the input sentence (non-contextualized noise) leaves performance on the table.
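For reference, the square-root schedule from Diffusion-LM, which DiffuSeq inherits, defines the cumulative signal level as ᾱ(t) = 1 − √(t/T + s) for a small offset s. The sketch below turns that into per-step βₜ values in the style of the public improved-diffusion code. It clips each βₜ at 0.999. The constants s = 1e-4 and max_beta = 0.999 are assumptions taken from those public codebases, not from Meta-DiffuB. Nothing in the function reads the input sentence, and that is the limitation described above.

```python
import math


def sqrt_alpha_bar(t_frac: float, s: float = 1e-4) -> float:
    """Cumulative signal level alpha_bar(t) = 1 - sqrt(t/T + s), with t_frac = t/T.

    s = 1e-4 is an assumed offset (Diffusion-LM-style), not a Meta-DiffuB value.
    """
    return 1.0 - math.sqrt(t_frac + s)


def sqrt_betas(num_steps: int, max_beta: float = 0.999) -> list[float]:
    """Per-step betas from ratios of consecutive alpha_bar values, clipped at max_beta.

    The same list comes back for every input sentence: the schedule is fixed.
    """
    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        beta = 1.0 - sqrt_alpha_bar(t2) / sqrt_alpha_bar(t1)
        betas.append(min(beta, max_beta))  # the last step overshoots and is clipped
    return betas


betas = sqrt_betas(2000)
print(len(betas), round(betas[0], 5), betas[-1])
```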
A network that reads the sentence
The Scheduler Bψ — a small Seq2Seq model — reads the conditioning sentence wˣ and emits a sequence of Meta-Instructions ι, each labelled True or False.
A ‘skipping’ rule turns those labels into a noise schedule: a True step raises the noise to the next level, and a False step holds it where it is.
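This section does not give the exact skipping rule, so the following is only a hypothetical sketch of the idea as stated. It keeps a ladder of increasing noise levels, climbs one rung on each True, and stays on the current rung on each False. The names `skip_schedule`, `ladder`, and `flags` are illustrative, and the ladder values are made up.

```python
from typing import Sequence


def skip_schedule(ladder: Sequence[float], flags: Sequence[bool]) -> list[float]:
    """Hypothetical 'skipping' rule: True climbs one rung, False holds.

    ladder[0] is the starting noise level; the result has one entry per flag.
    """
    rung = 0
    schedule = []
    for flag in flags:
        if flag and rung < len(ladder) - 1:
            rung += 1                  # True: step the noise up to the next rung
        schedule.append(ladder[rung])  # False: reuse the current rung
    return schedule


ladder = [0.0, 0.1, 0.2, 0.3, 0.4]     # made-up noise levels
easy = skip_schedule(ladder, [True, True, True, True])
hard = skip_schedule(ladder, [True, False, False, True])
print(easy)  # [0.1, 0.2, 0.3, 0.4]
print(hard)  # [0.1, 0.1, 0.1, 0.2]
```

Under this assumed rule, a sentence labelled mostly False stays on low rungs and gets a pointwise lower schedule. All of the contextual judgement lives in how the scheduler assigns the labels.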
Less noise for hard sentences
The result is a per-sentence schedule βˣ that bends away from the fixed baseline. Harder sentences get less noise to preserve signal; easier ones get more to boost diversity.
This is the move that non-contextualized schedulers can't make.
Generate with the scheduled noise
The Exploiter Dθ — the S2S-diffusion model — diffuses and denoises using βˣ, recovering z₀ step by step, then rounds it back into a discrete target sentence ŷ.
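The sketch below covers only the two exploiter pieces that need no trained network. It assumes the standard Gaussian embedding-space formulation used by Diffusion-LM and DiffuSeq:

- The forward step samples z_t from N(√ᾱ_t z₀, (1 − ᾱ_t)I), where ᾱ_t is the running product of (1 − β_s) over the sentence's own schedule βˣ.
- Rounding maps each vector to the nearest word embedding.

The learned denoiser Dθ is omitted. The vocabulary embeddings and the schedule values are random or made-up placeholders.

```python
import numpy as np


def alpha_bar_from_betas(betas):
    """Cumulative signal level alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    return np.cumprod(1.0 - np.asarray(betas, dtype=float))


def q_sample(z0, t, alpha_bar, rng):
    """Forward noising: z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def round_to_tokens(z, emb):
    """Map each row of z (L, d) to the id of its nearest embedding in emb (V, d)."""
    dist2 = ((z[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)  # shape (L, V)
    return dist2.argmin(axis=1)


rng = np.random.default_rng(0)
emb = rng.standard_normal((10, 16))    # placeholder vocabulary of 10 tokens
tokens = np.array([3, 1, 4])
z0 = emb[tokens]                       # clean target embeddings
beta_x = [0.01, 0.01, 0.05, 0.2]       # made-up per-sentence schedule
alpha_bar = alpha_bar_from_betas(beta_x)
z_t = q_sample(z0, len(beta_x) - 1, alpha_bar, rng)
print(round_to_tokens(z0, emb))        # clean embeddings round back exactly: [3 1 4]
```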
Generation quality teaches the scheduler
How much did the exploiter improve? Compare its BLEU score before (r) and after (r′) an update: the meta-reward R_β = r′ − r flows back through a policy gradient to train the scheduler.
The scheduler learns how to noise — never touching the generator's loss directly.
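The meta-reward update can be sketched as plain REINFORCE, with one deliberate simplification. Instead of the Seq2Seq scheduler Bψ, the policy here is a hypothetical vector of per-step logits, and each True/False label is an independent Bernoulli sample. The BLEU scores r and r′ are placeholder numbers. For a label a with probability p = σ(l), ∂ log π(a)/∂l = a − p. A positive meta-reward therefore makes the sampled labels more likely, and a negative one makes them less likely.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def meta_reward(bleu_before, bleu_after):
    """R_beta = r' - r: how much the exploiter's BLEU moved after its update."""
    return bleu_after - bleu_before


def reinforce_step(logits, flags, reward, lr=0.1):
    """One REINFORCE ascent step on independent Bernoulli logits (hypothetical policy)."""
    p = sigmoid(logits)
    grad_log_prob = flags.astype(float) - p  # d/dlogits of log Bernoulli(flags | p)
    return logits + lr * reward * grad_log_prob


logits = np.zeros(4)                          # p = 0.5 for every step
flags = np.array([True, False, True, True])   # labels the scheduler sampled
R = meta_reward(20.0, 21.5)                   # placeholder BLEU before / after
new_logits = reinforce_step(logits, flags, R)
print(R, new_logits)
```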
State of the art, plug-and-play
Meta-DiffuB reports state-of-the-art results on four Seq2Seq benchmarks, outperforming prior diffusion models and fine-tuned pretrained language models (PLMs).
Better still, the trained scheduler drops into existing models like DiffuSeq as a plug-and-play module — no fine-tuning required.