AI Horseless Carriages

864 points by petekoomen 3 months ago

dvt 3 months ago

What we need, imo, is:

1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.

2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.

3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.

4. Smart tool invocation. It's obvious that LLMs suck at logic/data/number crunching, but we have plenty of tools (like calculators or wikis) that don't. The fact that tool invocation is still in its infancy is a mistake. It should be at the forefront of every AI product.

5. Finally, we need PRODUCTS, not FEATURES; and this is exactly Pete's point. We need things that re-invent what it means to use AI in your product, not weirdly tacked-on features. Who's going to be the first team that builds an AI-powered operating system from scratch?

I'm working on this (and I'm sure many other people are as well). Last year, I worked on an MVP called Descartes[1][2] which was a spotlight-like OS widget. I'm re-working it this year after I had some friends and family test it out (and iterating on the idea of ditching the chat interface).

[1] https://vimeo.com/931907811

[2] https://dvt.name/wp-content/uploads/2024/04/image-11.png

Reply View 22 replies

hermitShell 3 months ago

Agreed, our whole computing paradigm needs to shift at a fundamental level in order to let AI be 'magic', not just token prediction. Chatbots will provide some linear improvements, but ultimately I very much agree with you and the article that we're trapped in an old mode of thinking.
You might be interested in this series: https://www.youtube.com/@liber-indigo
In the same way that Microsoft and the 'IBM clones' brought us the current computing paradigm built on the desktop metaphor, I believe there will have to be a new OS built on a new metaphor. It's just a question of when those perfect conditions arise for lightning to strike on the founders who can make it happen. And just like Xerox and IBM, the actual core ideas might come from the tech giants (FAANG et al.) but they may not end up being the ones to successfully transition to the new modality.

Reply View | 0 replies
nthingtohide 3 months ago

Feature Request: Can we have dark mode for videos? An AI OS should be able to understand and satisfy such a usecases.
E.g. Scott Aaronson | How Much Math Is Knowable?
https://youtu.be/VplMHWSZf5c
The video slides could be converted into a dark mode for night viewing.

Reply View | 0 replies
jonahx 3 months ago

> 3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.
I've wondered about this. Perhaps the concern is saved data will eventually overwhelm the context window? And so you must judicious in the "background knowledge" about yourself that gets remembered, and this problem is harder than it seems?
Btw, you can ask ChatGPT to "remember this". Ime the feature feels like it doesn't always work, but don't quote me on that.

Reply View | 3 replies
- dvt 3 months ago
  
  Yes, but this should be trivially done with an internal `MEMORY` tool the LLM calls. I know that the context can't grow infinitely, but this shouldn't prevent filling the context with relevant info when discussing topic A (even a lazy RAG approach should work).
  
  Reply View | 2 replies
  
  otabdeveloper4 3 months ago
  
  What you're describing is just RAG, and it doesn't work that well. (You need a search engine for RAG, and the ideal search engine is an LLM with infinite context. But the only way to scale LLM context is by using RAG. We have infinite recursion here.)
  
  Reply View | 0 replies
  
  nthingtohide 3 months ago
  
  You are asking for a feature like this. Future advances will help in this.
  https://youtu.be/ZUZT4x-detM
  
  Reply View | 0 replies
sanderjd 3 months ago

On the tool-invocation point: Something that seems true to me is that LLMs are actually too smart to be good tool-invokers. It may be possible to convince them to invoke a purpose-specific tool rather than trying to do it themselves, but it feels harder than it should be, and weird to be limiting capability.
My thought is: Could the tool-routing layer be a much simpler "old school" NLP model? Then it would never try to do math and end up doing it poorly, because it just doesn't know how to do that. But you could give it a calculator tool and teach it how to pass queries along to that tool. And you could also give it a "send this to a people LLM tool" for anything that doesn't have another more targeted tool registered.
Is anyone doing it this way?

Reply View | 13 replies
- dvt 3 months ago
  
  > Is anyone doing it this way?
  I'm working on a way of invoking tools mid-tokenizer-stream, which is kind of cool. So for example, the LLM says something like (simplified example) "(lots of thinking)... 1+2=" and then there's a parser (maybe regex, maybe LR, maybe LL(1), etc.) that sees that this is a "math-y thing" and automagically goes to the CALC tool which calculates "3", sticks it in the stream, so the current head is "(lots of thinking)... 1+2=3 " and then the LLM can continue with its thought process.
  
  Reply View | 12 replies
  
  namaria 3 months ago
  
  Cold winds are blowing when people look at LLMs and think "maybe an expert system on top of that?".
  
  Reply View | 10 replies
  
  sanderjd 3 months ago
  
  Definitely an interesting thought to do this at the tokenizer level!
  
  Reply View | 0 replies
erklik 3 months ago

> 1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.
> 2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.
and not to "dunk" on you or anything of the sort but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.

Reply View | 1 reply
- dvt 3 months ago
  
  > and not to "dunk" on you or anything of the sort but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.
  Not at all, you're totally correct; I'm re-imagining it this year from scratch, it was just a little experiment I was working on (trying to combine OS + AI). Though, to be clear, it's built in rust & it fully runs models locally, so it's not really a ChatGPT wrapper in the "I'm just calling an API" sense.
  
  Reply View | 0 replies

minimaxir 3 months ago

AI-generated prefill responses is one of the use cases of generative AI I actively hate because it's comically bad. The business incentive of companies to implement it, especially social media networks, is that it reduces friction for posting content, and therefore results in more engagement to be reported at their quarterly earnings calls (and as a bonus, this engagement can be reported as organic engagement instead of automated). For social media, the low-effort AI prefill comments may be on par than the median human comment, but for more intimate settings like e-mail, the difference is extremely noticeable for both parties.

Despite that, you also have tools like Apple Intelligence marketing the same thing, which are less dictated by metrics, in addition to doing it even less well.

Reply View 2 replies

bluGill 3 months ago

The prefill makes things worse. I can type "thank you" in seconds, knowing that someone might have just clicked instead says they didn't think enough about me to take those seconds to type the words.

Reply View | 0 replies
mberning 3 months ago

I agree. They always seem so tone deaf and robotic. Like you could get an email letting you know someone died and the prefill will be along the lines of “damn that’s crazy”.

Reply View | 0 replies

nonameiguess 3 months ago

The proposed alternative doesn't sound all that much better to me. You're hand crafting a bunch of rule-based heuristics, which is fine, but you could already do that with existing e-mail clients and I did. All the LLM is adding is auto-drafting of replies, but this just gets back to the "typing isn't the bottleneck" problem. I'm still going to spend just as long reading the draft and contemplating whether I want to send it that way or change it. It's not really saving any time.

A feature that seems to me would truly be "smart" would be an e-mail client that observes my behavior over time and learns from it directly. Without me prompting or specifying rules at all, it understands and mimics my actions and starts to eventually do some of them automatically. I suspect doing that requires true online learning, though, as in the model itself changes over time, rather than just adding to a pre-built prompt injected to the front of a context window.