Comment by jatins
> It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.
Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?
My guess the weights from HF are used as initial state for the model because full training is too expensive. Then small dataset is used to train in further for short time. Which is fine tuning. Together it shows that model is 1) compatible 2) trainable. In theory it can be trained from scratch on big dataset. I didn't look in the code yet so the questions are: 1) can it be trained in parallel? 2) resources required for training?
Anyway, I may try to train it on limited specialized dataset...