CFG

Classifier-Free Guidance (CFG)

Classifier Guidance

At each timestep $t$, we are interested in the conditional score function $\nabla_{x_t} \log p(x_t | y)$:

\[\begin{aligned} \nabla_{x_t} \log p(x_t | y) &= \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y | x_t) \\ &= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \varepsilon_\theta ({x}_t, t) + \nabla_{x_t} \log p(y | x_t) \\ &= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \left( \varepsilon_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t}\, \nabla_{x_t} \log p(y | x_t) \right) \end{aligned}\]

Here \({\varepsilon}_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t}\, \nabla_{x_t} \log p(y | x_t)\) plays the role of the new, guided noise prediction.

Update the noise prediction as follows:

\[\bar{\varepsilon}_\theta ({x}_t, t) = {\varepsilon}_\theta ({x}_t, t) - \textcolor{red}{w} \sqrt{1 - \bar{\alpha}_t}\, \nabla_{x_t} \log p_\phi (y|x_t) \]
  • The strength of classifier guidance is controlled by the weight $w \geq 1$

Requires an additional classifier $p_\phi(y | x_t)$, which must be trained as a separate network on the noisy inputs $x_t$
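The guided-noise update above can be sketched as follows. The Gaussian "classifier" with an analytic score is purely an illustrative assumption; in practice $\nabla_{x_t} \log p_\phi(y | x_t)$ would come from autograd on a trained classifier:

```python
import numpy as np

def classifier_guided_eps(eps, grad_log_p, alpha_bar_t, w=1.0):
    """Classifier-guided noise update:
        eps_bar = eps - w * sqrt(1 - alpha_bar_t) * grad_log_p
    where grad_log_p = grad_{x_t} log p_phi(y | x_t) is the classifier
    gradient, supplied here directly instead of via autograd."""
    return eps - w * np.sqrt(1.0 - alpha_bar_t) * grad_log_p

# Toy illustration (an assumption, not from the text): a Gaussian
# "classifier" p(y | x) ∝ exp(-||x - mu_y||^2 / 2), whose score is
# analytically grad_x log p(y | x) = mu_y - x.
x_t = np.array([0.5, -1.0])
mu_y = np.array([1.0, 1.0])          # class-y mean, chosen arbitrarily
grad = mu_y - x_t                    # analytic classifier gradient
eps = np.array([0.2, 0.3])           # model's unguided noise prediction
eps_bar = classifier_guided_eps(eps, grad, alpha_bar_t=0.9, w=2.0)
```

Larger $w$ subtracts a bigger multiple of the classifier gradient, pushing the denoised sample harder toward regions the classifier assigns to $y$; $w = 0$ recovers the unguided prediction.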

Classifier-Free Guidance

Does the same thing as classifier guidance, but without an additional classifier network: the extra input $y$ is fed directly into the diffusion model.

Very easy to implement: only an encoder for the condition needs to be added so it can be fed into the network.

\[\hat{\varepsilon}_\theta(\mathbf{x}_t, t) \ \rightarrow \ \hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t)\]
  • $\mathbf{y}$ is the condition: class labels, a text description, or the null label $\textcolor{red}{\emptyset}$
  • Set $w \geq 0$ to extrapolate the conditional noise from the null-conditional noise:
    • $\hat{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) = (1+w)\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t) - w\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\emptyset}, t)$
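The extrapolation step can be sketched as follows; the function and variable names are placeholders (an assumption), and both noise predictions come from the same network, evaluated once with the condition and once with the null label:

```python
import numpy as np

def cfg_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance:
        eps_hat = (1 + w) * eps_cond - w * eps_uncond
    eps_cond   ~ eps_theta(x_t, y, t)  (condition y fed in)
    eps_uncond ~ eps_theta(x_t, ∅, t)  (null label fed in)"""
    return (1.0 + w) * eps_cond - w * eps_uncond

eps_cond = np.array([0.4, -0.2])     # conditional prediction, illustrative
eps_uncond = np.array([0.1, 0.1])    # null-conditional prediction, illustrative
eps_hat = cfg_eps(eps_cond, eps_uncond, w=3.0)
```

Note that $w = 0$ reduces to the plain conditional prediction, and the same formula can be read as $\hat{\varepsilon}_{\emptyset} + (1+w)(\hat{\varepsilon}_{y} - \hat{\varepsilon}_{\emptyset})$, i.e. stepping from the unconditional prediction in the direction of the condition.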

It is equivalent to the condition enhancement in Classifier Guidance:

\[\begin{aligned} \hat{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) &= \hat{\varepsilon}_{\theta} ({x}_t, \textcolor{red}{\emptyset}, t) + \textcolor{blue}{(1+w)} \left( \hat{\varepsilon}_{\theta} ({x}_t, {y}, t) - \hat{\varepsilon}_{\theta} ({x}_t, \textcolor{red}{\emptyset}, t) \right) \\ &= -\sqrt{1 - \bar{\alpha}_t} \left( \nabla_{x_t} \log p({x}_t) + \textcolor{blue}{(1+w)} \left( \nabla_{x_t} \log p({x}_t | {y}) - \nabla_{x_t} \log p({x}_t) \right) \right) \\ &= -\sqrt{1 - \bar{\alpha}_t}\, \nabla_{x_t} \log \left( p({x}_t) \left( \frac{p({x}_t | {y})}{p({x}_t)} \right)^{\textcolor{blue}{(1+w)}} \right) \\ &= -\sqrt{1 - \bar{\alpha}_t}\, \nabla_{x_t} \log \left( p({x}_t) \left( p({y} | {x}_t)\right)^{\textcolor{blue}{(1+w)}} \right) \end{aligned}\]

The last step uses Bayes' rule, $p(x_t | y)/p(x_t) = p(y | x_t)/p(y)$, and drops $p(y)$, which does not depend on $x_t$.
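For the null-conditional term $\hat{\varepsilon}_\theta(x_t, \emptyset, t)$ to be available at sampling time, the single network is typically trained with condition dropout: the condition is randomly replaced by the null label for a small fraction of training examples. A minimal sketch, where the sentinel value and the dropout rate are illustrative assumptions:

```python
import numpy as np

NULL_LABEL = -1  # stand-in for ∅ (assumption; often a learned null embedding)

def drop_condition(y, p_uncond=0.1, rng=None):
    """Randomly replace conditions with the null label so one network
    learns both the conditional and the null-conditional noise prediction.
    p_uncond around 0.1-0.2 is a common but task-dependent choice."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.array(y, copy=True)
    mask = rng.random(y.shape[0]) < p_uncond   # which samples go unconditional
    y[mask] = NULL_LABEL
    return y

y_batch = drop_condition(np.arange(8), p_uncond=0.5)
```

Each element of `y_batch` is either the original label or `NULL_LABEL`; at sampling time the same network is then queried with both to form the CFG combination.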


