Comment by lukev a day ago

Think about it this way: they are encoding whole "thoughts" or "ideas" as single tokens.

It's effectively a multimodal model, which handles "concept" tokens alongside "language" tokens and "image" tokens.

A really big conceptual step, actually, IMO.