Comment by criemen
> It is well-known that even the biggest SOTA models require only 100-200 good samples for fine-tuning.
As someone who hasn't heard of this before, do you have a link for this? Is this LoRA fine-tuning only? Fine-tuning during model training, or fine-tuning a checkpoint released by a model provider? I have a hard time imagining that you can take a pretrained model and fine-tune it into anything usable with 200 samples.
It's a general heuristic for any task.
https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-...
> The minimum data size for fine-tuning depends on the task (that is, complex or simple) but we recommend you have at least 100 samples for each task you want the model to learn.
https://platform.openai.com/docs/guides/supervised-fine-tuni...
> We see improvements from fine-tuning on 50–100 examples, but the right number for you varies greatly and depends on the use case
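For concreteness, a small supervised fine-tuning run against the OpenAI API looks roughly like the sketch below. This is illustrative only: the file name is a placeholder, the model snapshot is an assumption (check the currently supported list), and the only requirement on the data is ~100-200 chat-formatted JSONL examples.

    # train.jsonl: one chat-formatted example per line, e.g. 100-200 lines like
    # {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
    from openai import OpenAI

    client = OpenAI()

    # Upload the (small) training set
    training_file = client.files.create(
        file=open("train.jsonl", "rb"),  # placeholder file name
        purpose="fine-tune",
    )

    # Start a supervised fine-tuning job on a snapshot that supports fine-tuning
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",  # assumed snapshot; substitute a current one
    )
    print(job.id, job.status)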
https://pmc.ncbi.nlm.nih.gov/articles/PMC11140272/
> Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large.
> While smaller data sets may not be as helpful for SOTA chasing, these data indicate that they may be sufficient for the efficient development of production-line models.