Comment by versteegen
Comment by versteegen 3 days ago
Define "by definition".
Because this statement really makes no sense. Transformers are perfectly capable (and capable of perfectly) learning mathematical functions, given the necessary working-out space, e.g. for long division or for algebraic manipulation. And they can learn to generalise from their training data very well (although very data-inefficiently). That's their entire strength!