VAR
VAR: AR via Next-Scale Prediction

Next-Scale Prediction
VAR reconceptualizes autoregressive modeling on images by shifting from a “next-token prediction” strategy to a “next-scale prediction” strategy.
The autoregressive unit is an entire token map, rather than a single token.
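Written as likelihood factorizations, following the VAR paper's notation, the two strategies differ only in what a single factor conditions on. Next-token prediction conditions each token $x_t$ on all previous tokens. Next-scale prediction conditions each whole token map $r_k$ on all coarser maps $r_1, \dots, r_{k-1}$:

$$
p(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, x_2, \dots, x_{t-1})
$$

$$
p(r_1, r_2, \dots, r_K) = \prod_{k=1}^{K} p(r_k \mid r_1, r_2, \dots, r_{k-1})
$$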


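- interpolate → resize: $f$ is resized to $(h_k, w_k)$ before being quantized into $r_k$, and $z_k$ is resized back to the final resolution $(h_K, w_K)$ before it is subtracted from $f$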
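Below is a minimal NumPy sketch of the multi-scale residual quantization loop that the `interpolate` note refers to (VAR's multi-scale VQVAE encoding). It rests on simplifying assumptions. Nearest-neighbor resizing stands in for `interpolate`, the per-scale convolution $\phi_k$ is taken as the identity, and the helper names `resize_nearest`, `quantize` and `multiscale_encode` are hypothetical. It illustrates the idea and is not the official implementation.

```python
import numpy as np

def resize_nearest(x, h, w):
    """Nearest-neighbor resize of an (H, W, C) array to (h, w, C).
    Assumption: stands in for VAR's `interpolate`, which may use area/bicubic."""
    H, W = x.shape[:2]
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return x[rows][:, cols]

def quantize(x, codebook):
    """Replace each C-dim vector of an (h, w, C) map by the index of its
    nearest codebook entry (codebook has shape (V, C))."""
    d = ((x[..., None, :] - codebook) ** 2).sum(-1)  # (h, w, V) squared distances
    return d.argmin(-1)                               # (h, w) integer token map

def multiscale_encode(f, codebook, scales):
    """Residual multi-scale quantization: one token map r_k per (h_k, w_k).
    Assumption: phi_k is the identity, so z_k is subtracted directly."""
    H, W = f.shape[:2]
    f = f.copy()
    token_maps = []
    for h, w in scales:
        r_k = quantize(resize_nearest(f, h, w), codebook)  # h*w tokens at scale k
        token_maps.append(r_k)
        z_k = codebook[r_k]               # lookup: (h, w, C) embeddings
        z_k = resize_nearest(z_k, H, W)   # upsample back to (h_K, w_K)
        f = f - z_k                       # later scales encode what is left
    return token_maps, f
```

Each scale only has to encode the residual that the coarser scales left unexplained, which is why the token maps can be generated coarse to fine.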
VAR Transformer
- $r_k$: token map (each $r_k$ can be thought of as a group of tokens), e.g. $r_3$ has 9 tokens, and all 9 are predicted in parallel.
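The transformer gets "all tokens of $r_k$ in parallel" by flattening $r_1, \dots, r_K$ into one sequence. It then applies a block-wise causal attention mask, so every token of $r_k$ may attend to all tokens of $r_1, \dots, r_k$ but to nothing in later scales. The sketch below builds such a mask in NumPy. `block_causal_mask` is a hypothetical helper, and `True` is assumed to mean "may attend".

```python
import numpy as np

def block_causal_mask(scales):
    """Boolean (N, N) mask for the flattened sequence r_1, ..., r_K,
    where N = sum(h * w). mask[i, j] is True iff token i may attend to token j."""
    # scale index k of every token, in flattening order
    scale_id = np.concatenate([np.full(h * w, k) for k, (h, w) in enumerate(scales)])
    # token i sees token j iff j is in the same scale or an earlier (coarser) one
    return scale_id[None, :] <= scale_id[:, None]
```

For scales $1\times1, 2\times2, 3\times3$ the sequence has $1 + 4 + 9 = 14$ tokens. The 9 tokens of $r_3$ share the same mask row, which lets them all be predicted in one step.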
Time Complexity
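The VAR paper argues that raster-order next-token generation of an $n \times n$ token map needs $n^2$ autoregressive steps and costs $O(n^6)$, while next-scale generation costs $O(n^4)$. The sketch below reproduces that counting argument with simple cost proxies. It assumes each step recomputes full self-attention over its context (cost about the square of the context length, no KV cache) and that the next-scale side lengths are $1, a, a^2, \dots, n$ with $n$ a power of $a$. The function names are hypothetical.

```python
def num_steps(n, a=2):
    """(#steps for next-token AR, #steps for next-scale AR) on an n x n map."""
    steps, side = 0, 1
    while side <= n:  # one step per scale 1, a, a**2, ..., n
        steps += 1
        side *= a
    return n * n, steps

def next_token_cost(n):
    """Proxy cost of raster-order AR: step i attends over i tokens (~i**2)."""
    return sum(i * i for i in range(1, n * n + 1))

def next_scale_cost(n, a=2):
    """Proxy cost of next-scale AR: step k attends over all tokens of r_1..r_k."""
    cost, seen, side = 0, 0, 1
    while side <= n:
        seen += side * side   # the new token map r_k joins the context
        cost += seen * seen   # full attention over everything seen so far
        side *= a
    return cost
```

Going from $n = 32$ to $n = 64$ multiplies the first proxy by roughly 64 and the second by roughly 16, which matches the $O(n^6)$ versus $O(n^4)$ scaling.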
