CFG
Classifier-Free Guidance (CFG)
Classifier Guidance
At each timestep $t$, we are interested in the score function $\nabla_{x_t} \log p(x_t, y)$, which equals the conditional score $\nabla_{x_t} \log p(x_t | y)$ because $p(y)$ does not depend on $x_t$:
\[\nabla_{x_t} \log p(x_t, y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y | x_t) \\= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \varepsilon_\theta ({x}_t, t) + \nabla_{x_t} \log p(y | x_t) \\= -\frac{1}{\sqrt{1 - \bar{\alpha}_t}} \left( \varepsilon_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p(y | x_t) \right)\]
Here ${\varepsilon}_\theta ({x}_t, t) - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p(y | x_t)$ plays the role of the new noise prediction.
Update the noise prediction as follows:
\[\bar{\varepsilon}_\theta ({x}_t, t) = {\varepsilon}_\theta ({x}_t, t) - \textcolor{red}{\omega} \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log p_\phi (y | x_t) \]
- The strength of classifier guidance can be controlled by the weight $\omega \geq 1$.
- Needs an additional classifier $p_\phi(y | x_t)$, which has to be trained as a separate network on the noisy inputs $x_t$.
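As a concrete illustration, the update above can be implemented roughly as follows. This is a minimal PyTorch-style sketch under stated assumptions: `diffusion_model`, `classifier`, and `alpha_bar_t` are hypothetical placeholders (a noise-prediction network, a classifier trained on noisy inputs that returns label logits, and the cumulative $\bar{\alpha}_t$ of the noise schedule), not a specific library API.

```python
import torch

def classifier_guided_eps(diffusion_model, classifier, x_t, t, y, alpha_bar_t, w=1.0):
    # Unconditional noise prediction eps_theta(x_t, t)
    eps = diffusion_model(x_t, t)

    # grad_{x_t} log p_phi(y | x_t), computed with autograd through the classifier
    x_in = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
    selected = log_probs[torch.arange(x_t.shape[0]), y].sum()
    grad = torch.autograd.grad(selected, x_in)[0]

    # eps_bar = eps_theta(x_t, t) - w * sqrt(1 - alpha_bar_t) * grad log p_phi(y | x_t)
    return eps - w * (1 - alpha_bar_t) ** 0.5 * grad
```

The guided noise $\bar{\varepsilon}_\theta$ is then plugged into the usual sampling step in place of $\varepsilon_\theta$.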
Classifier-Free Guidance
Does the same thing as classifier guidance, but without an additional classifier network: the extra input $y$ is incorporated into the diffusion model directly.
Very easy to implement: we only need to add an encoder for the condition so it can be fed into the network (a code sketch is given at the end of this section).
\[\hat{\varepsilon}_\theta(\mathbf{x}_t, t) \ \rightarrow \ \hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t)\]
- $\mathbf{y}$ is the condition: it can be a class label or a text description, and it can also be the null label $\textcolor{red}{\emptyset}$.
- We can also set a weight $w \geq 0$ and extrapolate the conditional noise prediction away from the null-conditional one:
- $\bar{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) = (1+w)\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\mathbf{y}}, t) - w\,\hat{\varepsilon}_\theta(\mathbf{x}_t, \textcolor{red}{\emptyset}, t)$

This extrapolation is equivalent to the condition enhancement in classifier guidance:
\[\bar{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t) = \hat{\varepsilon}_{\theta} (\mathbf{x}_t, \textcolor{red}{\emptyset}, t) + \textcolor{blue}{(1+w)} \left(\hat{\varepsilon}_{\theta} (\mathbf{x}_t, \mathbf{y}, t) - \hat{\varepsilon}_{\theta} (\mathbf{x}_t, \textcolor{red}{\emptyset}, t)\right)\\ = - \sqrt{1 - \bar{\alpha}_t} \left( \nabla_{x_t} \log p(\mathbf{x}_t) + \textcolor{blue}{(1+w)} \left( \nabla_{x_t} \log p(\mathbf{x}_t | \mathbf{y}) - \nabla_{x_t} \log p(\mathbf{x}_t) \right) \right)\\ = - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log \left( p(\mathbf{x}_t) \left( \frac{p(\mathbf{x}_t | \mathbf{y})}{p(\mathbf{x}_t)} \right)^{\textcolor{blue}{(1+w)}} \right)\\ = - \sqrt{1 - \bar{\alpha}_t} \nabla_{x_t} \log \left( p(\mathbf{x}_t)\, p(\mathbf{y} | \mathbf{x}_t)^{\textcolor{blue}{(1+w)}} \right)\]
The last step uses Bayes' rule $p(\mathbf{x}_t | \mathbf{y}) / p(\mathbf{x}_t) = p(\mathbf{y} | \mathbf{x}_t) / p(\mathbf{y})$ and drops the factor $p(\mathbf{y})^{-(1+w)}$, whose gradient with respect to $\mathbf{x}_t$ is zero.
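To make this concrete, here is a minimal PyTorch-style sketch of how classifier-free guidance is typically wired up: during training the condition is randomly replaced by the null label $\emptyset$ so that one network learns both $\hat{\varepsilon}_\theta(\mathbf{x}_t, \mathbf{y}, t)$ and $\hat{\varepsilon}_\theta(\mathbf{x}_t, \emptyset, t)$; at sampling time the two predictions are combined with the weight $w$. The model interface `model(x_t, y, t)`, the dropout probability, and the choice of a reserved class index as the null label are assumptions for illustration.

```python
import torch

NULL_LABEL = 0  # assumed convention: one reserved class index plays the role of the null label

def drop_condition(y, p_uncond=0.1):
    # Training-time condition dropout: with probability p_uncond, replace y by the
    # null label so the same network also learns the unconditional prediction.
    mask = torch.rand(y.shape[0], device=y.device) < p_uncond
    return torch.where(mask, torch.full_like(y, NULL_LABEL), y)

def cfg_eps(model, x_t, t, y, w=2.0):
    # Classifier-free guided noise: (1 + w) * eps(x_t, y, t) - w * eps(x_t, null, t)
    eps_cond = model(x_t, y, t)
    eps_uncond = model(x_t, torch.full_like(y, NULL_LABEL), t)
    return (1 + w) * eps_cond - w * eps_uncond
```

In practice the conditional and null-conditional inputs are often concatenated into a single batch so the model is evaluated only once per sampling step.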