Comment by yousef_g 2 days ago

The dataset is for trying out fine-tuning of the diffusion model. It's a reimplementation of SD3, with the code written from scratch, but the weights are taken from HuggingFace due to hardware constraints on my part.

reedlaw 2 days ago

So this implements SD3 inference and fine-tuning?

jatins 2 days ago

> It's a reimplementation of SD3, with the code written from scratch, but the weights are taken from HuggingFace due to hardware constraints on my part.

Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?

  • MoonGhost 2 days ago

    My guess is the weights from HF are used as the initial state for the model because full training is too expensive. Then a small dataset is used to train it further for a short time, which is fine-tuning. Together it shows that the model is 1) compatible, 2) trainable. In theory it can be trained from scratch on a big dataset. I haven't looked at the code yet, so my questions are: 1) can it be trained in parallel? 2) what resources are required for training?

    Anyway, I may try to train it on a limited, specialized dataset...
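
As a rough illustration of the workflow described above: the architecture is built in code, the HuggingFace weights are loaded as the initial state, and a short fine-tuning loop runs over a small dataset. This is only a minimal PyTorch-style sketch; the module, dataset, checkpoint path, and loss-helper names are hypothetical placeholders, not the repo's actual API.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical from-scratch implementation; the real repo's class names may differ.
from my_sd3 import SD3Transformer, SmallImageTextDataset

model = SD3Transformer()                  # architecture defined in code
state = torch.load("sd3_weights.pt")      # pretrained weights, e.g. exported from HuggingFace
model.load_state_dict(state)              # use them as the initial state

loader = DataLoader(SmallImageTextDataset("my_small_dataset/"),
                    batch_size=4, shuffle=True)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for epoch in range(2):                    # short fine-tuning run on the small dataset
    for batch in loader:
        loss = model.training_loss(batch) # hypothetical loss helper
        opt.zero_grad()
        loss.backward()
        opt.step()
```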

  • elbear a day ago

    The model consists of its architecture, which is expressed as code, and its knowledge, which is gained through training.
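
To make that split concrete, here is a toy sketch (the names are illustrative, not taken from the repo): the class definition is the "architecture" part, and the numbers loaded into it are the "knowledge" part.

```python
import torch
import torch.nn as nn

# Architecture: the structure of the model, expressed as code.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()

# Knowledge: the numbers stored in the weights. Swapping in a different
# state_dict changes what the same architecture "knows".
trained = TinyModel()  # stand-in for a model that has already been trained
model.load_state_dict(trained.state_dict())

y = model(torch.randn(1, 16))
```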

  • montebicyclelo a day ago

    > if the weights are taken from HF then what's the implementation for

    The weights are essentially a bunch of floating point numbers (grouped into tensors). The code says what operations to do with the weights. E.g. say you load a matrix W from the weights; you could do `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc.
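
For instance, a tiny runnable version of that point, using toy tensors rather than actual SD3 weights:

```python
import torch

# The "weights": just floating point numbers grouped into a tensor.
W = torch.randn(4, 4)
x = torch.randn(4)

# The code decides what operations those numbers take part in; each of
# these is a different implementation using the exact same weights.
y1 = W @ x        # linear map
y2 = W.T @ x      # transpose first
y3 = W @ W @ x    # apply it twice
```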