Comment by BoredPositron 6 months ago
You can disable CLIP L on Flux without a loss in quality. You are also making a mountain out of a molehill. CLIP is used everywhere.
The truth is that the CLIP conditioning in Flux works well for Dreambooth-style fine-tuning, where the tokenization bugs can be acute, but those bugs are not severe enough to explain the low impact of CLIP on their dev model. It is likely more impactful on their pro / max models, but only BFL could say.
Okay, well, there are a few things that are known to be true:

1. CLIP's tokenizer is buggy in diffusers, in the reference source in BFL's repo, and in OpenAI's repo.
2. Many CLIP prompts are observed to have a low impact in the Flux dev and schnell models.

And a few things that are very likely true:

1. The tokenizer in BFL's reference source and OpenAI's repo does not match the tokenizer used to train OpenAI's CLIP, or the text conditioning for any of the Flux checkpoints.
2. The guidance and timestep distillation play a role in weakening CLIP's influence.
3. It is practical to fine-tune CLIP on more image-caption pairs.

If you care about fine-tuning, the tokenization bugs matter. Everything else is hard to prove.
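
For anyone who wants to see tokenizer discrepancies for themselves, one quick check is to run the same prompts through more than one CLIP tokenizer implementation and diff the token ids. This is only a sketch: it compares the slow and fast Hugging Face tokenizers for the usual CLIP-L checkpoint (not BFL's reference code or OpenAI's original repo), the prompts are arbitrary, and whether any given prompt diverges depends on the versions installed.

```python
# Minimal sketch: diff two CLIP tokenizer implementations on the same prompts.
# Assumes only the Hugging Face "openai/clip-vit-large-patch14" checkpoint
# (the CLIP-L that Flux and the SD-family models load).
from transformers import CLIPTokenizer, CLIPTokenizerFast

slow = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
fast = CLIPTokenizerFast.from_pretrained("openai/clip-vit-large-patch14")

prompts = [
    "a photo of a cat",
    "a  photo\twith   odd whitespace",
    "emoji 🤖 and accents: café, naïve",
]

for p in prompts:
    ids_slow = slow(p, truncation=True, max_length=77).input_ids
    ids_fast = fast(p, truncation=True, max_length=77).input_ids
    match = "OK" if ids_slow == ids_fast else "MISMATCH"
    print(f"{match}: {p!r}")
    if ids_slow != ids_fast:
        # Decode back to BPE tokens so any divergence is visible token by token.
        print("  slow:", slow.convert_ids_to_tokens(ids_slow))
        print("  fast:", fast.convert_ids_to_tokens(ids_fast))
```

The same loop works against open_clip or the BFL reference tokenizer if you swap one side of the comparison.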
Consider another interpretation: CLIP L in Flux can be disabled without a loss in quality because the way it is used is buggy!
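
If someone wants to test the "disable CLIP L" claim rather than argue about it, a rough ablation is to encode the prompt normally and then zero out the CLIP contribution before sampling. The sketch below assumes the diffusers FluxPipeline API, in which CLIP-L only supplies the pooled embedding (`pooled_prompt_embeds`) and T5 supplies the per-token embeddings (`prompt_embeds`); parameter names come from diffusers, not BFL's reference code, and the prompt and step count are arbitrary.

```python
# Rough ablation sketch, assuming the diffusers FluxPipeline API (not BFL's
# reference code). CLIP-L feeds the pooled embedding; T5 feeds the sequence.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "a corgi wearing a tiny wizard hat, studio lighting"

# Encode once with both text encoders, then zero the CLIP pooled vector.
prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
    prompt=prompt, prompt_2=prompt
)
zeroed_pooled = torch.zeros_like(pooled_prompt_embeds)

generator = torch.Generator("cuda").manual_seed(0)
with_clip = pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    num_inference_steps=28,
    generator=generator,
).images[0]

generator = torch.Generator("cuda").manual_seed(0)
without_clip = pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=zeroed_pooled,
    num_inference_steps=28,
    generator=generator,
).images[0]

with_clip.save("flux_with_clip.png")
without_clip.save("flux_without_clip.png")
```

Zeroing the pooled vector is only one possible ablation (an empty CLIP prompt is another); whether the two outputs differ meaningfully is exactly what's being argued here.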