Comment by PaulHoule

As I see it three letter organizations have been using frameworks like Apache UIMA to build information extraction pipelines that are manual at worst and hybrid at best. Before BERT the models we had for this sucked, only useful for certain things, and usually requiring training sets of 20,000 or so examples.

Today the range of things for which the models are tolerable to "great" has greatly expanded. In arXiv papers you tend to see people getting tepid results with 500 examples, I get better results with 5000 examples and diminishing returns past 15k.

For a lot of people it begins and ends with "prompt engineering" of commercial decoder models and evaluation isn't even an afterthought For information extraction, classification and such though you get often good results with encoder models (e.g. BERT) put together with serious eval, calibration and model selection. Still the system looks like the old systems if your problem is hard and has to be done in a scalable way, but sometimes you can make something that "just works" without trying too hard, keeping your train/eval data in a spreadsheet.