Comment by yousef_g
The dataset is for trying out fine-tuning of the diffusion model. It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.
> It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.
Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?
My guess is that the weights from HF are used as the initial state of the model because full training is too expensive. Then a small dataset is used to train it further for a short time, which is fine-tuning. Together this shows that the model is 1) compatible and 2) trainable. In theory it could be trained from scratch on a big dataset. I haven't looked at the code yet, so my questions are: 1) can it be trained in parallel? 2) what resources are required for training?
Anyway, I may try to train it on a limited, specialized dataset...
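If that guess is right, the workflow would look roughly like the sketch below. This is only an illustration of "pretrained weights as initial state, then brief fine-tuning on a small dataset"; the model class, data, and training loop here are stand-ins, not code from the repo:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model: the repo's actual SD3 reimplementation would go here.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.proj(x)

model = TinyModel()

# 1) Load pretrained weights as the initial state (here a freshly created
#    state_dict stands in for weights downloaded from HuggingFace).
pretrained_state = TinyModel().state_dict()
model.load_state_dict(pretrained_state)

# 2) Fine-tune briefly on a small, specialized dataset (random data here).
data = TensorDataset(torch.randn(32, 16), torch.randn(32, 16))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
for x, target in DataLoader(data, batch_size=8):
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```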
> if the weights are taken from HF then what's the implementation for
The weights are essentially a bunch of floating point numbers (grouped into tensors). The code says what operations to do with the weights. E.g. say you load a matrix W from the weights; you could do `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc.
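To make that concrete, here is a minimal sketch (the names and shapes are made up; the point is just that the checkpoint only supplies numbers, while the code decides what to compute with them):

```python
import torch

# A weight matrix as it would come out of a checkpoint, and some input.
W = torch.randn(4, 4)
x = torch.randn(4)

# The code decides what those numbers mean:
y1 = W @ x        # use W as a linear map
y2 = W.T @ x      # or its transpose
y3 = W @ (W @ x)  # or apply it twice
```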
So this implements SD3 inference and fine-tuning?