CFG
Classifier-Free Guidance (CFG)
Classifier Guidance
At each timestep $t$, we are interested in the score function $\nabla_{x_t} \log p(x_t, y)$, which equals the conditional score $\nabla_{x_t} \log p(x_t | y)$ because $p(y)$ does not depend on $x_t$:
\[\nabla_{x_t} \log p(x_t, y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y | x_t) \\= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \varepsilon_\theta ({x}_t, t) + \nabla_{x_t} \log p(y | x_t) \\= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \left( \varepsilon_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p(y | x_t) \right)\]
Here ${\varepsilon}_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p(y | x_t)$ plays the role of the new noise prediction.
Update the noise prediction as follows:
\[\bar{\varepsilon}_\theta ({x}_t, t) = {\varepsilon}_\theta ({x}_t, t) - \textcolor{red}{\omega} \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p_\phi (y | x_t) \]
- The strength of classifier guidance can be controlled by the weight $\omega \geq 1$.
- Needs an additional classifier $p_\phi(y | x_t)$, which has to be trained as a separate network on the noisy inputs $x_t$.
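As a concrete illustration, the update above can be implemented roughly as follows. This is a minimal PyTorch-style sketch under stated assumptions: `diffusion_model`, `classifier`, and `alpha_bar_t` are hypothetical placeholders (a noise-prediction network, a classifier trained on noisy inputs that returns label logits, and the cumulative $\bar{\alpha}_t$ of the noise schedule), not a specific library API.

```python
import torch

def classifier_guided_eps(diffusion_model, classifier, x_t, t, y, alpha_bar_t, w=1.0):
    # Unconditional noise prediction eps_theta(x_t, t)
    eps = diffusion_model(x_t, t)

    # grad_{x_t} log p_phi(y | x_t), computed with autograd through the classifier
    x_in = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
    selected = log_probs[torch.arange(x_t.shape[0]), y].sum()
    grad = torch.autograd.grad(selected, x_in)[0]

    # eps_bar = eps_theta(x_t, t) - w * sqrt(1 - alpha_bar_t) * grad log p_phi(y | x_t)
    return eps - w * (1 - alpha_bar_t) ** 0.5 * grad
```

The guided noise $\bar{\varepsilon}_\theta$ is then plugged into the usual sampling step in place of $\varepsilon_\theta$.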
Classifier-Free Guidance
Does the same thing as classifier guidance, but without an additional classifier network: the extra input $y$ is incorporated into the diffusion model directly.
Very easy to implement: we only need to add an encoder for the condition so it can be fed into the network (a code sketch is given at the end of this section).
\[\hat{\varepsilon}_\theta(\mathbf{x}_t, t) \ \rightarrow \ \hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t)\]
- $\mathbf{y}$ is the condition: it can be a class label or a text description, and it can also be the null label $\textcolor{red}{\emptyset}$.
- We can also set a weight $w \geq 0$ and extrapolate the conditional noise prediction away from the null-conditional one:
- $\bar{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) = (1+w)\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t) - w\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\emptyset}, t)$

This extrapolation is equivalent to the condition enhancement in classifier guidance:
\[\bar{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) = \hat{\varepsilon}_{\theta} (\mathbf{x}_t, \textcolor{red}{\emptyset}, t) + \textcolor{blue}{(1+w)} \left(\hat{\varepsilon}_{\theta} (\mathbf{x}_t, \mathbf{y}, t) - \hat{\varepsilon}_{\theta} (\mathbf{x}_t, \textcolor{red}{\emptyset}, t)\right)\\ = - \sqrt{1 - \bar{\alpha}_t} \left( \nabla_{x_t} \log p(\mathbf{x}_t) + \textcolor{blue}{(1+w)} \left( \nabla_{x_t} \log p(\mathbf{x}_t | \mathbf{y}) - \nabla_{x_t} \log p(\mathbf{x}_t) \right) \right)\\ = - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log \left( p(\mathbf{x}_t) \left( \frac{p(\mathbf{x}_t | \mathbf{y})}{p(\mathbf{x}_t)} \right)^{\textcolor{blue}{(1+w)}} \right)\\ = - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log \left( p(\mathbf{x}_t)\, p(\mathbf{y} | \mathbf{x}_t)^{\textcolor{blue}{(1+w)}} \right)\]
The last step uses Bayes' rule $p(\mathbf{x}_t | \mathbf{y}) / p(\mathbf{x}_t) = p(\mathbf{y} | \mathbf{x}_t) / p(\mathbf{y})$ and drops the factor $p(\mathbf{y})^{-(1+w)}$, whose gradient with respect to $\mathbf{x}_t$ is zero.
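To make this concrete, here is a minimal PyTorch-style sketch of how classifier-free guidance is typically wired up: during training the condition is randomly replaced by the null label $\emptyset$ so that one network learns both $\hat{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t)$ and $\hat{\varepsilon}_\theta(\mathbf{x}_t, \emptyset, t)$; at sampling time the two predictions are combined with the weight $w$. The model interface `model(x_t, y, t)`, the dropout probability, and the choice of a reserved class index as the null label are assumptions for illustration.

```python
import torch

NULL_LABEL = 0  # assumed convention: one reserved class index plays the role of the null label

def drop_condition(y, p_uncond=0.1):
    # Training-time condition dropout: with probability p_uncond, replace y by the
    # null label so the same network also learns the unconditional prediction.
    mask = torch.rand(y.shape[0], device=y.device) < p_uncond
    return torch.where(mask, torch.full_like(y, NULL_LABEL), y)

def cfg_eps(model, x_t, t, y, w=2.0):
    # Classifier-free guided noise: (1 + w) * eps(x_t, y, t) - w * eps(x_t, null, t)
    eps_cond = model(x_t, y, t)
    eps_uncond = model(x_t, torch.full_like(y, NULL_LABEL), t)
    return (1 + w) * eps_cond - w * eps_uncond
```

In practice the conditional and null-conditional inputs are often concatenated into a single batch so the model is evaluated only once per sampling step.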