VAR
VAR: AR via Next-Scale Prediction
Next-Scale Prediction
VAR reconceptualizes autoregressive modeling on images by shifting from a "next-token prediction" strategy to a "next-scale prediction" strategy.
The autoregressive unit is an entire token map, rather than a single token.
- interpolate → resize: $f$ and $z_k$ are resized to $(h_k, w_k)$
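The resize step can be pictured with a small sketch. It assumes a nearest-neighbour rule and a hypothetical helper named `resize`; the notes only say that `interpolate` resizes to $(h_k, w_k)$, so the interpolation mode, the toy feature map, and the scale list are illustrative choices rather than the method's actual settings.

```python
def resize(grid, h, w):
    """Nearest-neighbour resize of a 2D list to h x w.

    Stand-in for `interpolate`; the interpolation mode is an assumption.
    """
    src_h, src_w = len(grid), len(grid[0])
    return [
        [grid[i * src_h // h][j * src_w // w] for j in range(w)]
        for i in range(h)
    ]


# Toy 4x4 feature map f (values are arbitrary).
f = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Hypothetical scale schedule (h_k, w_k), coarse to fine.
scales = [(1, 1), (2, 2), (4, 4)]
pyramid = [resize(f, h_k, w_k) for h_k, w_k in scales]

for (h_k, w_k), level in zip(scales, pyramid):
    print((h_k, w_k), level)
```

Each entry of `pyramid` is the same map seen at a different resolution, which is the coarse-to-fine view that next-scale prediction works over.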
VAR Transformer
- $r_k$: token map (each $r_k$ can be understood as containing a group of tokens)
  - e.g. $r_3$ has 9 tokens (all 9 tokens are predicted in parallel)
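The idea that one autoregressive step emits a whole token map can be sketched as a loop over scales. This is a minimal sketch, not the real model: `dummy_predict_map` is a hypothetical stand-in for the VAR Transformer that samples random token ids, and the scale schedule and vocabulary size are assumptions chosen so that $r_3$ is a 3 x 3 map with 9 tokens.

```python
import random


def dummy_predict_map(prefix, h_k, w_k, rng, vocab_size=16):
    """Hypothetical stand-in for the VAR Transformer.

    A real model would condition on `prefix` (r_1 .. r_{k-1}); this one
    ignores it and samples random token ids, just to show the loop shape.
    """
    return [[rng.randrange(vocab_size) for _ in range(w_k)] for _ in range(h_k)]


def generate(scales, predict_map, seed=0):
    """Run one autoregressive step per scale; each step emits a whole map."""
    rng = random.Random(seed)
    prefix = []  # token maps generated so far: r_1, ..., r_{k-1}
    for h_k, w_k in scales:
        r_k = predict_map(prefix, h_k, w_k, rng)  # all h_k * w_k tokens at once
        prefix.append(r_k)
    return prefix


scales = [(1, 1), (2, 2), (3, 3)]  # assumed schedule; r_3 is 3 x 3
maps = generate(scales, dummy_predict_map)
token_counts = [sum(len(row) for row in r_k) for r_k in maps]
print(token_counts)  # [1, 4, 9]
```

With this schedule, three steps produce 14 tokens in total, whereas next-token prediction would need 14 separate steps for the same tokens.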
Time Complexity
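As a rough illustration of why predicting whole scales is cheaper, the sketch below counts attention cost for both strategies. The cost model is an assumption made here for simplicity: a step that attends over $m$ tokens costs $m^2$, with no caching. The function names `ar_cost` and `var_cost` and the doubling side lengths are also hypothetical choices for the example.

```python
def ar_cost(n):
    """Next-token prediction on an n x n map.

    There are n * n steps; step i attends over i tokens (assumed cost i^2).
    """
    return sum(i * i for i in range(1, n * n + 1))


def var_cost(sides):
    """Next-scale prediction over square maps with the given side lengths.

    Step k attends over every token of scales 1..k (assumed cost m^2).
    """
    total, seen = 0, 0
    for side in sides:
        seen += side * side   # tokens visible after adding this scale
        total += seen * seen  # attention cost of this step
    return total


for n, sides in [(8, [1, 2, 4, 8]), (16, [1, 2, 4, 8, 16])]:
    print(n, ar_cost(n), var_cost(sides))
```

Under this model, doubling $n$ from 8 to 16 multiplies the next-token cost by about 63 and the next-scale cost by about 16, which matches growth on the order of $n^6$ versus $n^4$.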