VAR

VAR: AR via Next-Scale Prediction

Next-Scale Prediction

VAR reconceptualizes autoregressive modeling on images by shifting from a “next-token prediction” strategy to a “next-scale prediction” strategy.

The autoregressive unit is an entire token map, rather than a single token.

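  • interpolate → resizes a feature map to a target resolution: $f$ is downsampled to ($h_k, w_k$) before quantization, and $z_k$ is upsampled back to the final resolution ($h_K, w_K$)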
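
The role of interpolate in the multi-scale tokenizer can be sketched in NumPy as below. This is a simplified illustration, not the reference implementation: the helper names (`resize_nearest`, `quantize`, `encode_multiscale`, `decode_multiscale`) are hypothetical, nearest-neighbour resizing stands in for whatever interpolation the real model uses, and the per-scale convolution applied to $z_k$ is omitted (treated as the identity).

```python
import numpy as np

def resize_nearest(x, h, w):
    """Nearest-neighbour resize of an (H, W, C) array to (h, w, C)."""
    H, W = x.shape[:2]
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return x[rows][:, cols]

def quantize(x, codebook):
    """Index of the nearest codebook vector at each position of x."""
    # x: (h, w, C), codebook: (V, C) -> squared distances: (h, w, V)
    d = ((x[:, :, None, :] - codebook[None, None]) ** 2).sum(-1)
    return d.argmin(-1)

def encode_multiscale(f, codebook, scales):
    """Turn a feature map f into token maps r_1..r_K, coarse to fine."""
    H, W = f.shape[:2]
    residual = f.copy()
    token_maps = []
    for h_k, w_k in scales:
        # Downsample the current residual to (h_k, w_k), then quantize.
        r_k = quantize(resize_nearest(residual, h_k, w_k), codebook)
        token_maps.append(r_k)
        # Look up embeddings z_k and upsample them to full resolution.
        z_k = resize_nearest(codebook[r_k], H, W)
        # Assumption: the per-scale convolution on z_k is skipped here.
        residual = residual - z_k
    return token_maps

def decode_multiscale(token_maps, codebook, out_hw):
    """Sum the upsampled embeddings of every scale to rebuild f."""
    H, W = out_hw
    f_hat = np.zeros((H, W, codebook.shape[1]))
    for r_k in token_maps:
        f_hat += resize_nearest(codebook[r_k], H, W)
    return f_hat
```

For example, with `scales = [(1, 1), (2, 2), (4, 4)]` the encoder returns three token maps of those shapes, and the decoder sums their upsampled embeddings into one 4×4 feature map.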

VAR Transformer

  • $r_k$: token map (each $r_k$ can be thought of as containing a group of tokens)
    • e.g. $r_3$ has 9 tokens (all 9 tokens are predicted in parallel)
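
To make the "group of tokens" idea concrete, here is a small sketch that counts the tokens in each $r_k$ and builds a block-wise causal attention mask, in which every token of $r_k$ may attend to all tokens of $r_1, \dots, r_k$. The function names are hypothetical, and the scale list (1×1, 2×2, 3×3) is assumed only so that $r_3$ has 9 tokens as in the example above.

```python
import numpy as np

def tokens_per_scale(scales):
    """Number of tokens in each token map r_k of shape (h_k, w_k)."""
    return [h * w for h, w in scales]

def block_causal_mask(scales):
    """Boolean mask: entry [i, j] is True if token i may attend to token j.

    Tokens are laid out scale by scale. A token attends to every token
    of its own scale and of all coarser scales, never to finer ones.
    """
    counts = tokens_per_scale(scales)
    # scale_id[i] is the index of the scale that token i belongs to.
    scale_id = np.repeat(np.arange(len(counts)), counts)
    return scale_id[:, None] >= scale_id[None, :]

scales = [(1, 1), (2, 2), (3, 3)]   # assumed example resolutions
counts = tokens_per_scale(scales)   # [1, 4, 9]
mask = block_causal_mask(scales)    # shape (14, 14)
```

Because the mask is constant within a block, all tokens of one scale share the same context, which is what lets them be predicted in a single parallel step.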

Time Complexity
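
As a rough illustration of why next-scale prediction is cheaper, the sketch below counts attention cost under a simplified model. The assumptions are mine: each generation step costs the square of the sequence length seen so far (no KV caching), the final token map is $n \times n$, and scale side lengths grow geometrically as $1, a, a^2, \dots, n$. The function names `ar_cost` and `var_cost` are hypothetical.

```python
def ar_cost(n):
    """Next-token AR over an n x n grid: step i attends over i tokens."""
    return sum(i * i for i in range(1, n * n + 1))

def var_cost(n, a=2):
    """Next-scale prediction; n is assumed to be a power of a."""
    cost, seen, side = 0, 0, 1
    while side <= n:
        seen += side * side   # tokens in all scales up to this one
        cost += seen * seen   # one parallel step over the whole prefix
        side *= a
    return cost

# Doubling n multiplies the AR cost by roughly 2**6 and the VAR cost
# by roughly 2**4 under this cost model.
ar_ratio = ar_cost(32) / ar_cost(16)
var_ratio = var_cost(32) / var_cost(16)
```

Under these assumptions the counts grow like $n^6$ for next-token prediction and like $n^4$ for next-scale prediction, and VAR needs only about $\log_a n + 1$ sequential steps instead of $n^2$.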



