Comment by jatins 2 days ago

> It's a reimplementation of SD3 by writing the code from scratch again, but the weights are taken from HuggingFace due to hardware constraints on my part.

Could you clarify what you mean by this part -- if the weights are taken from HF then what's the implementation for?

MoonGhost a day ago

My guess is that the weights from HF are used as the initial state for the model because full training is too expensive. A small dataset is then used to train it further for a short time, which is fine-tuning. Together this shows that the model is 1) compatible with the published weights and 2) trainable. In theory it can be trained from scratch on a big dataset. I haven't looked at the code yet, so the questions are: 1) can it be trained in parallel? 2) what resources are required for training?

Anyway, I may try to train it on a limited, specialized dataset...
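
To make the "pretrained weights as initial state, then a short training run" idea concrete, here is a toy sketch. It is not the SD3 code: the model is a single scalar weight in `y = w * x`, and `pretrained_w`, `small_dataset`, and `fine_tune` are hypothetical names made up for illustration.

```python
def fine_tune(w, data, lr=0.1, steps=20):
    """Run a few gradient-descent steps on mean squared error for y = w * x."""
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Stand-in for a weight downloaded from a model hub (assumed value).
pretrained_w = 2.0

# Small specialized dataset whose true relationship is y = 2.5 * x.
small_dataset = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]

tuned_w = fine_tune(pretrained_w, small_dataset)
print(tuned_w)  # moves from 2.0 toward 2.5
```

Starting from `pretrained_w` instead of a random value is the whole trick: the short run only has to close a small gap, which is why it is far cheaper than training from scratch.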

elbear a day ago

The model consists of its architecture, which is expressed as code, and its knowledge, which is gained through training and stored in the weights.

montebicyclelo a day ago

> if the weights are taken from HF then what's the implementation for

The weights are essentially a bunch of floating-point numbers (grouped into tensors). The code says what operations to do with the weights. E.g. say you load matrix W from the weights: you could do `y = W @ x`, or `y = W.T @ x`, or `y = W @ W @ x`, etc. Same numbers, different results, so the weights alone don't define the model.
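
A minimal runnable sketch of that point, using plain Python lists so it has no dependencies. The helpers `matvec` and `transpose` and the values of `W` and `x` are made up for illustration; with NumPy or PyTorch you would write the `@` expressions directly.

```python
def matvec(M, v):
    """Matrix-vector product, the plain-Python equivalent of M @ v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Swap rows and columns, the equivalent of M.T."""
    return [list(col) for col in zip(*M)]


# Pretend these numbers were loaded from a weights file.
W = [[1.0, 2.0],
     [3.0, 4.0]]
x = [1.0, 1.0]

y_a = matvec(W, x)              # y = W @ x
y_b = matvec(transpose(W), x)   # y = W.T @ x
y_c = matvec(W, matvec(W, x))   # y = W @ W @ x

print(y_a)  # [3.0, 7.0]
print(y_b)  # [4.0, 6.0]
print(y_c)  # [17.0, 37.0]
```

All three "models" share the exact same weights, yet each produces a different output. A reimplementation has to get the operations right for the published weights to produce sensible results.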