AI Horseless Carriages
(koomen.dev)
771 points by petekoomen a day ago
One of the interesting things I've noticed is that the best experiences I've had with AI are with simple applications that don't do much to get in the way of the model, e.g. chatgpt and cursor/windsurf.
I'm hopeful that as devs figure out how to build better apps with AI we'll have more and more "cursor moments" in other areas of our lives
Perhaps the real takeaway is that there really is only one product, two if you count image generation.
Perhaps the only reason Cursor is so good is because editing code is so similar to the basic function of an LLM without anything wrapped around it.
Like, someone prove me wrong by linking 3 transformative AI products that:
1. Have nothing to do with "chatting" to a thin wrapper (couldn't just be done inside a plain LLM with a couple of file uploads added for additional context)
2. Don't involve traditional ML that has existed for years and isn't part of the LLM "revolution."
3. Have nothing to do with writing code
For example, I recently used an AI chatbot that was supposed to help me troubleshoot a consumer IoT device. It basically regurgitated steps from the manual and started running around in circles because my issue was simply not covered by documentation. I then had to tell it to send me to a human. The human had more suggestions that the AI couldn't think of but still couldn't help because the product was a piece of shit.
Or just look at Amazon Q. Ask it a basic AWS question and it'll just give you a bogus "sorry I can't help with that" answer where you just know that running over to chatgpt.com will actually give you a legitimate answer. Most AI "products" seem to be castrated versions of ChatGPT/Claude/Gemini.
That sort of overall garbage experience seems to be what is most frequently associated with AI. Basically, a futile attempt to replace low-wage employees that didn't end up delivering any value to anyone, especially since any company interested in eliminating employees just because "fuck it why not" without any real strategy probably has a busted-ass product to begin with.
Putting me on hold for 15 minutes would have been more effective at getting me to go away and no compute cycles would have been necessary.
Outside of coding, Google's NotebookLM is quite useful for analysing complex documentation - things like standards and complicated API specs.
But yes, an AI chatbot that can't actually take any actions is effectively just regurgitating documentation. I normally contact support because the thing I need help with is either not covered in documentation, or requires an intervention. If AI can't make interventions, it's just a fancy kind of search with an annoying interface.
I don’t deny that LLMs are useful; I'm merely saying that they represent one product that does a small handful of things well, and that the industry-specific applications don't really involve a whole lot of extra features beyond "feed in data, then chat with the LLM and get stuff back."
Imagine if during the SaaS or big data or containerization technology "revolutions" the application being run just didn't matter at all. That's kind of what's going on with LLMs. Almost none of the products are all that much better than going to ChatGPT.com, dumping your data into the text box/file uploader, and seeing what you get back.
Perhaps an analogy to describe what I mean would be if you were comparing two SaaS apps, like let’s say YNAB and the Simplifi budget app. In the world of the SaaS revolution, the capabilities of each application would be competitive advantages. I am choosing one over the other for the UX and feature list.
But in the AI LLM world, the difference between competing products is minimal. Whether you choose Cursor or Copilot or Firebase Studio you’re getting the same results because you’re feeding the same data to the same AI models. The companies that make the AI technologies basically don’t have a moat themselves, they’re basically just PaaS data center operators.
Everything where structured output is involved, from filling in forms based on medical interview transcripts / court proceedings / calls, to an augmented chatbot that can do things for you (think hotel reservations over the phone), to directly generating forms / dashboards / pages in your system.
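That pattern is easy to sketch. Here is a minimal example of transcript-to-form extraction, assuming the OpenAI Python SDK; the model name and form fields are invented for illustration, not any real product's schema:

    # Sketch: fill a structured intake form from a raw transcript.
    # Field names and model choice are illustrative.
    import json
    from openai import OpenAI

    client = OpenAI()

    def extract_intake_form(transcript: str) -> dict:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},  # force valid JSON
            messages=[
                {"role": "system", "content": (
                    "Extract a JSON object with keys: patient_name, "
                    "symptoms (list), medications (list), "
                    "follow_up_needed (boolean). Use null for anything "
                    "not stated in the transcript."
                )},
                {"role": "user", "content": transcript},
            ],
        )
        return json.loads(resp.choices[0].message.content)

The "product" here is mostly the schema and the prompt; the model does the reading.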
Two off the top of my head:
There are a lot of tools in the sales space which fit your criteria.
Granola is the exact kind of product I’m criticizing as being extremely basic and barely more than a wrapper. It’s just a meeting transcriber/summarizer that barely provides more functionality than leaving the OpenAI voice mode on during a call and then copying and pasting your written notes into ChatGPT at the end.
Clay was founded 3 years before GPT 3 hit the market so I highly doubt that the majority of their core product runs on LLM-based AI. It is probably built on traditional machine learning.
I have used LLMs for some simple text generation for what I’m going to call boilerplate, eg why $X is important at the start of a reference architecture. But maybe it saved me an hour or two in a topic I was already fairly familiar with. Not something I would have paid a meaningful sum for. I’m sure I could have searched and found an article on the topic.
> Perhaps the only reason Cursor is so good is because editing code is so similar to the basic function of an LLM without anything wrapped around it.
I think this is an illusion. Firstly, code generation is a big field - it includes code completion, generating entire functions, and even agentic coding and the newer vibe-coding tools, which are mixes of all of these. Which of these is "the natural way LLMs work"?
Secondly, a ton of work goes into making LLMs good for programming. Lots of RLHF on it, lots of work on extracting code structure / RAG on codebases, many tools.
So, I think there are a few reasons that LLMs seem to work better on code:
1. A lot of work has been done on it, for many reasons, mostly the monetary potential and the fact that the people who build these systems are programmers.
2. We here tend to have a lot more familiarity with these tools (and this goes to your request above which I'll get to).
3. There are indeed many ways in which LLMs are a good fit for programming. This is a valid point, though I think it's dwarfed by the above.
Having said all that, to your request, I think there are a few products and/or areas that we can point to that are transformative:
1. Deep Research. I don't use it a lot personally (yet) - I have far more familiarity with the software tools, because I'm also a software developer. But I've heard from many people now that these are exceptional. And they are not just "thin wrappers on chat", IMO.
2. Anything to do with image/video creation and editing. It's arguable how much these count as part of the LLM revolution - the models that do these are often similar-ish in nature but geared towards images/videos. Still, the interaction with them often goes through natural language, so I definitely think these count. These are a huge category all on their own.
3. Again, not sure if these "count" in your estimate, but AlphaFold is, as I understand it, quite revolutionary. I don't know much about the model or the biology, so I'm trusting others that it's actually interesting. It is some of the same underlying architecture that makes up LLMs so I do think it counts, but again, maybe you want to only look at language-generating things specifically.
1. Deep Research (if you are talking about the OpenAI product) is part of the base AI product, which means that everything built on top of it is still a wrapper. In other words, nobody besides the people making the base AI technology is adding any value. An analogy for how pathetic the AI market is: imagine if, during the SaaS revolution, nobody had needed to buy applications because using AWS PaaS products like RDS directly gave very similar results to buying SaaS software. OpenAI/Gemini/Claude/etc. are basically as good as a full-blown application that leverages their technology, and there's very limited need to buy wrappers that go around them.
2. Image/video creation is cool but what value is it delivering so far? Saving me a couple of bucks that I would be spending on Fiverr for a rough and dirty logo that isn’t suitable for professional use? Graphic designers are already some of the lowest paid employees at your company so “almost replacing them but not really” isn’t a very exciting business case to me. I would also argue that image generation isn’t even as valuable as the preceding technology, image recognition. The biggest positive impact I’ve seen involves GPU performance for video games (DLSS/FSR upscaling and frame generation).
3. Medical applications are the most exciting application of AI and ML. This example is something that demonstrates what I mean with my argument: the normal steady pace of AI innovation has been “disrupted” by LLMs that have added unjustified hype and investment to the space. Nobody was so unreasonably hyped up about AI until it was packaged as something you can chat with since finance bro investors can understand that, but medical applications of neural networks have been developing since long before ChatGPT hit the scene. The current market is just a fever dream of crappy LLM wrappers getting outsized attention.
LLMs make all sorts of classification problems vastly easier and cheaper to solve.
Of course, that isn't a "transformative AI product", just a regular old product that improves your boring old business metrics. Nothing to base a hype cycle on, sadly.
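For instance, a zero-shot classifier is now a dozen lines and needs no training set. A sketch, assuming the OpenAI SDK; the labels and model name are illustrative:

    # Sketch: zero-shot ticket classification, no labeled training data.
    from openai import OpenAI

    client = OpenAI()
    LABELS = ["billing", "bug_report", "feature_request", "spam"]

    def classify(ticket: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content":
                    "Classify the ticket. Reply with exactly one of: "
                    + ", ".join(LABELS)},
                {"role": "user", "content": ticket},
            ],
        )
        label = resp.choices[0].message.content.strip()
        return label if label in LABELS else "unclassified"  # guard drift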
Agree 100%.
We built a very niche business around data extraction & classification of a particular type of documents. We did not have access to a lot of sample data. Traditional ML/AI failed spectacularly.
LLMs have made this super easy and the product is very successful thanks to it. Customers love it. It is definitely transformative for them.
Is Cursor actually good though? I get so frustrated at how confidently it spews out the completely wrong approach.
When I ask it to spit out Svelte config files or something like that, I end up having to read the docs myself anyway because it can't be trusted; for instance, it will spew out tons of lines configuring every parameter to something that looks like the default, when all it needs to do is follow the documentation, which just uses the defaults.
And it goes out of its way to "optimise" things, actually picking the wrong options versus the defaults, which are fine.
This challenge is a little unfair. Chat is an interface not an application.
LLMs in data pipelines enable all sorts of “before impossible” stuff. For example, this creates an event calendar for you based on emails you have received:
https://www.indexself.com/events/molly-pepper
(that’s mine, and is due a bugfix/update this week. message me if you want to try it with your own emails)
I have a couple more LLM-powered apps in the works, like next few weeks, that aren’t chat or code. I wouldn’t call them transformative, but they meet your other criteria, I think.
> This demo uses AI to read emails instead of write them
LLMs are so good at summarizing that I should basically only ever read one email—from the AI:
You received 2 emails today that need your direct reply from X and Y. 1 is still outstanding from two days ago, _would you like to send an acknowledgment_? You received 6 emails from newsletters you didn’t sign up for but were enrolled after you bought something _do you want to unsubscribe from all of them_ (_make this a permanent rule_).
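As a sketch of what could sit under that digest (assuming the OpenAI SDK and that the day's emails are already fetched; the prompt wording is purely illustrative):

    # Sketch: the "one email from the AI" daily digest.
    from openai import OpenAI

    client = OpenAI()

    DIGEST_PROMPT = (
        "You triage my inbox. Group today's emails into: (1) needs my "
        "direct reply, (2) outstanding from previous days, (3) unsolicited "
        "newsletters. For each group, list sender and a one-line gist, "
        "and suggest an action (reply / acknowledge / unsubscribe)."
    )

    def daily_digest(emails: list[str]) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": DIGEST_PROMPT},
                {"role": "user", "content": "\n\n---\n\n".join(emails)},
            ],
        )
        return resp.choices[0].message.content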
I have fed LLMs PDF files, asked about the content and gotten nonsense. I would be very hesitant to trust them to give me an accurate summary of my emails.
One of our managers uses AI to summarize everything. Too bad it missed important caveats in an offer. Well, we burned an all-nighter to correct the offer, but at least he read one page instead of twenty...
If I get a technical email I read it myself. The summary just needs to say technical email from X with priority Y about problem Z
> LLMs are so good at summarizing that I should basically only ever read one email—from the AI
This could get really fun with some hidden text prompt injection. Just match the font and background color.
Maybe these tools should be doing the classic air gap approach of taking a picture of the rendered content and analyzing that.
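That air-gap idea is buildable today. A sketch, assuming Playwright for rendering and a vision-capable model via the OpenAI SDK; hidden white-on-white injection text never makes it into the pixels:

    # Sketch: "air gap" an email by rendering it and analyzing pixels.
    # All names are illustrative; this is not any product's actual flow.
    import base64
    from playwright.sync_api import sync_playwright
    from openai import OpenAI

    def screenshot_email(html: str) -> bytes:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.set_content(html)
            png = page.screenshot(full_page=True)
            browser.close()
        return png

    def summarize_rendered(html: str) -> str:
        b64 = base64.b64encode(screenshot_email(html)).decode()
        client = OpenAI()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": [
                {"type": "text", "text": "Summarize this email."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ]}],
        )
        return resp.choices[0].message.content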
What system are you using to do this? I do think that this would provide value for me. Currently, I barely read my emails, which I'm not exactly proud of, but it's just the reality. So something that summarized the important things every day would be nice.
I fed an LLM the record of a chat between me and a friend, and asked it to summarize the times that we met in the past 3 months.
Every time it gave me different results, and not once did it actually get it all right.
LLMs are horrible for summarizing things. Summarizing is the art of turning low information density text into high information density text. LLMs can’t deal in details, so they can never accurately summarize anything.
What is the reason to unsub ever in that world? Are you saying the LLM can't skip emails? Seems like an arbitrary rule
I enjoy Claude as a general purpose "let's talk about this niche thing" chat bot, or for general ideation. Extracting structured data from videos (via Gemini) is quite useful as well, though to be fair it's not a super frequent use case for me.
That said, coding and engineering is by far the most common use case I have for gen AI.
Oh, I'm sorry if it wasn't clear. I use Claude and ChatGPT to talk to about a ton of topics. I'm mostly referring to AI features being added to existing SaaS or software products. I regularly find that moving the conversation to ChatGPT or Claude is much better than trying to use anything that they may have built into their existing product.
I think the other application besides code copiloting that is already extremely useful is RAG-based information discovery a la Notion AI. This is already a giant improvement over "search google docs, and slack, and confluence, and jira, and ...".
Just integrated search over all the various systems at a company was an improvement that did not require LLMs, but I also really like the back and forth chat interface for this.
I find that ChatGPT o3 (and the other advanced reasoning models) are decently good at answering questions with a "but".
Google is great at things like "Top 10 best rated movies of 2024", because people make lists of that sort of thing obsessively.
But Google is far less good at queries like "Which movies look visually beautiful but have been critically panned?". For that sort of thing I have far more luck with chatgpt because it's much less of a standard "top 10" list.
o3 has been a big improvement on Deep Research IMHO. o1 (or whatever model I originally used with it) was interesting but the results weren't always great. o3 has done some impressive research tasks for me and, unlike the last model I used, when I "check its work" it has always been correct.
I wonder sometimes if this is why there is such an enthusiasm gap over AI between tech people and the general public. It's not just that your average person can't program; it's that they don't even conceptually understand what programming could unlock.
I like perplexity when I need a quick overview of a topic with references to relevant published studies. I often use it when researching what the current research says on parenting questions or education. It's not perfect but because the answers link to the relevant studies it's a good way to get a quick overview of research on a given topic
Have you ever been cooking and asked Siri to set a timer? That's basically the most used AI feature outside of "coding" I can think of.
Setting a timer and setting a reminder. Occasionally converting units of measure. That's all I can rely on Siri (or Alexa) for, and even then Siri sometimes doesn't make it clear whether it did the thing. Most importantly, with "set a reminder": it shows the text, and then the UI disappears; sometimes the reminder was created, sometimes not. It's maddening, since I'm normally asking to be reminded about something important that I need to get recorded/tracked so I can "forget" it.
The number of times I've had 2 reminders fire back-to-back because I asked Siri again to create one since I was _sure_ it didn't create the first one.
Siri is so dumb and it's insane that more heads have not rolled at Apple because of it (I'm aware of the recent shakeup, it's about a decade too late). Lastly, whoever decided to ship the new Siri UI without any of the new features should lose their job. What a squandered opportunity and effectively fraud IMHO.
More and more it's clear that Tim Cook is not the person that Apple needs at the helm. My mom knows Siri sucks, why doesn't the CEO and/or why is he incapable of doing anything to fix it. Get off your Trump-kissing, over-relying-on-China ass and fix your software! (Siri is not the only thing rotten)
Honestly I don't even enjoy coding AI features. The only value I get out of AI is translation (which I take with a grain of salt because I don't know the other language and can't spot hallucinations, but it's the best tool I have), and shitposting (e.g. having chatGPT write funny stories about my friends and sending it to them for a laugh). I can't say there's an actual productive use case for me personally.
I've anecdotally tested translations by ripping the video with subtitles and having whisper subtitle it, and also asking several AI to translate the .srt or .vtt file (subtotext I think does this conversion if you don't wanna waste tokens on the metadata)
Whisper large-v3, the largest model I have, is pretty good, producing translations nearly identical to ChatGPT's or Google's default speech-to-text. The fun stuff is when you ask LLMs for text-to-text translations.
I did a real small writeup with an example but I don't have a place to publish nor am I really looking for one.
I used whisper to transcribe nearly every "episode" of the Love Line syndicated radio show from 1997-2007 or so. It took, iirc, several days. I use it to grep the audio, as it were. I intend to do the same with my DVDs and such, just so I never have to Google "what movie / tv show is that line from?" I also have a lot of art bell shows, and a few others to transcribe.
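For anyone wanting to replicate this, the open-source whisper package makes the transcribe-then-grep loop only a few lines (paths and model size here are illustrative; ffmpeg must be installed):

    # Sketch: transcribe a folder of mp3s with Whisper, then grep the text.
    import pathlib
    import whisper

    model = whisper.load_model("large-v3")

    for mp3 in pathlib.Path("loveline").glob("*.mp3"):
        result = model.transcribe(str(mp3))
        mp3.with_suffix(".txt").write_text(result["text"])

    # Later, from a shell: grep -ril "search phrase" loveline/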
> I used whisper to transcribe nearly every "episode" of the Love Line syndicated radio show from 1997-2007 or so.
Yes - second this. I found 'Whisper' great for that type of scenario as well.
A local monastery had about 200 audio talks (mp3). Whisper converted them all to text and GPT did a small 'smoothing' of the output to make it readable. It was about half a million words and only took a few hours.
The monks were delighted - they can distribute their talks in small pamphlets / PDFs now and it's extra income for the community.
Years ago as a student I did some audio transcription manually and something similar would have taken ages...
I actually was asked by Vermin Supreme to hand-caption some videos, and I instantly regretted besmirching the existing subtitles. I was correct, the subtitles were awful, but boy, the thought of hand-transcribing something with Subtitle Edit had me walking that back pretty quick - and this was for a 4 minute video - however it was lyrical over music, so AI barely gave a starting transcription.
According to the Common Voice 15 graph on OpenAI's github repository, Albanian is the single worst performance you could have had: https://github.com/openai/whisper
But for what it's worth, I tried putting the YouTube video of Tom Scott presenting at the Royal Institution into the model, and even then the results were only "OK" rather than "good". When even a professional presenter and a professional sound recording in a quiet environment produce errors, the model is not really good enough to bother with.
I really like my speech-to-text program, and I find using ChatGPT to look up things and answer questions is a much superior experience to Google, but otherwise, I completely agree with you.
Companies see that AI is a buzzword that makes your stock go up. So they start looking at it as an answer to the question "How can I make my stock go up?" instead of "How can I create a better product?" and letting the stock go up from creating a better product.
> Auto completing a sentence for the next word in Gmail/iMessage is one example
Interestingly, I despise that feature. It breaks the flow of what is actually a very simple task. Now I'm reading and reconsidering, over and over, whether the offered completion is the same thing I wanted.
The fact that I know this and spend time repeatedly disabling the damned things is awfully tiresome (but my fault for not paying for my own email etc etc)
I've been using Fastmail in lieu of gmail for ten or eleven years. If you have a domain and control the DNS, I recommend it. At least you're not on Google anymore, and you're paying for fastmail, so it feels better - less like something is reading your emails.
Strava employees claim that casual users like the AI activity summaries. Supposedly users who don't know anything about exercise physiology didn't know how to interpret the various metrics and charts. I don't know if I believe that but it's at least plausible.
Personally I wish I could turn off the AI features, it's a waste of space.
Anytime someone from a company says that users like the super trendy thing they just made I take it with a sizeable grain of salt. Sometimes it's true, and maybe it is true for Strava, but I've seen enough cases where it isn't to discount such claims down to ~0.
At this point, "we aren't adding any AI features" is a selling point for me. I've gotten real tired of AI slop and hype.
I use AI chatbots for 2+ hours a day but the Garmin thing was too much for me. The day they released their AI Garmin+ subscription, I took off my Forerunner and put it in a drawer. The whole point of Garmin is that it feels emotionally clean to use. Garmin adding a scammy subscription makes the ecosystem feel icky, and I'm not going to wear a piece of clothing that makes me feel icky. I don't think I'll buy a Garmin watch again.
(Since taking off the watch, I miss some of the data but my overall health and sleep haven't changed.)
> I’m actually having a really hard time thinking of an AI feature other than coding AI feature that I actually enjoy.
If you attend a lot of meetings, having an AI note-taker take notes for you and generate a structured summary, follow-up email, to-do list, and more will be an absolute game changer.
(Disclaimer, I'm the CTO of Leexi, an AI note-taker)
The catch is: does anyone actually read this stuff? I've been taking meeting notes for meetings I run (without AI) for around 6 months now and I suspect no one other than myself has looked at the notes I've put together. I've only looked back at those notes once or twice.
A big part of the problem is even finding this content in a modern corporate intranet (i.e. Confluence) and having a bunch of AI-generated text in there as well isn't going to help.
When I was a founding engineer at a(n ill-fated) startup, we used an AI product to transcribe and summarize enterprise sales calls. As a dev it was usually a waste of my time to attend most sales meetings, but it was highly illustrative to read the summaries after the fact. In fact many, many of the features we built were based on these action items.
If you're at the scale where you have a corporate intranet like Confluence, then yeah, AI note summarizing will feel redundant, because you probably have the headcount to transcribe important meetings (e.g. an enterprise sales staff large enough that transcribing meeting notes is part of their job description, rather than a small staff stretched thin because you're on vanishing runway at a small startup). Then the natural next question arises: do you really need that headcount?
What is the problem?
Notes are valuable for several reasons.
I sometimes take notes myself just to keep myself from falling asleep in an otherwise boring meeting where I might need to know something shared (but probably not). It doesn't matter if nobody reads these as the purpose wasn't to be read.
I have often wished for notes from some past meeting, because I know we had good reasons for our decisions but now, when questioned, I cannot remember them. Most meetings this doesn't happen, but if there were automatic notes that were easy to search years later, that would be good.
Of course at this point I must remind you that the above may be bad. If there is a record of meeting notes then courts can subpoena them. This means meetings with notes have to be kept at a higher level, where people are not comfortable sharing whatever it is they are thinking; even if a bad idea is rejected, the courts may still see you as a jerk for coming up with it.
I agree, and my vision of this is that instead of notes, the meeting minutes would be catalogued into a vector store, indexed by all relevant metadata. And then instead of pre-generated notes, you'll get what you want on the fly, with the LLM being the equivalent of chatting with that coworker who's been working there forever and has context on everything.
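A sketch of that minutes-as-vector-store idea, using OpenAI embeddings and brute-force similarity in place of a real vector database; all names and the sample minutes are illustrative:

    # Sketch: index meeting minutes as embeddings, retrieve by question.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    minutes = [
        "2024-03-01: decided to defer the billing rewrite because ...",
        "2024-03-08: chose Postgres over Dynamo because ...",
    ]
    index = [(m, embed(m)) for m in minutes]

    def ask(question: str, k: int = 2) -> str:
        q = embed(question)
        # embeddings are unit-normalized, so dot product ~ cosine sim
        scored = sorted(index, key=lambda pair: -float(np.dot(q, pair[1])))
        context = "\n".join(m for m, _ in scored[:k])
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Answer using only these minutes:\n" + context},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content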
Is Leexi's AI note-taker able to raise its hand in a meeting (or otherwise interrupt) and ask for clarification?
As a human note-taker, I find the most impactful result of real-time synthesis is the ability to identify and address conflicting information in the moment. That ability is reliant on domain knowledge and knowledge of the meeting attendees.
But if the AI could participate in the meeting in real time like I can, it'd be a huge difference.
If you are attending the meeting as well as using an AI note-taker, then you should be able to ask the clarifying question(s). If you understand the content, then you should understand the AI notes (hopefully), and if you ask for clarification, then the AI should add those notes too.
Your problem really only arises if someone is using the AI to stand in for them at the meeting vs. use it to take notes.
I'll pretend you asked a few questions instead of explaining my work to me without understanding.
1. "Why can't you look at the AI notes during the meeting?" The AI note-takers that I've seen summarize the meeting transcript after the meeting. A human note-taker should be synthesizing the information in real-time, allowing them to catch disagreements in real-time. Not creating the notes until after the meeting precludes real-time intervention.
2. "Why not use [AI Note-taker whose notes are available during the meeting]?" Even if there were a real-time synthesis by AI, I would have to keep track of that instead of the meeting in order to catch the same disagreements a human note-taker would catch.
3. "What problem are you trying to solve?" My problem is that misunderstandings are often created or left uncorrected during meetings. I think this is because most people are thinking about the meeting topics from their perspective, not spending time synthesizing what others are saying. My solution to this so far has been human note-taking by a human familiar with the meeting topic. This is hard to scale though, so I'm curious to see if this start-up is working on building a note-taking AI with the benefits I've mentioned seem to be unique to humans (for now).
But that isn't writing for me, it is taking notes for me. There is a difference. I don't need something to write for me - I know how to write. What I need is someone to clean up grammar, fact check the details, and otherwise clean things up. I have dysgraphia - a writing disorder - so I need help more than most, but I still don't need something to write my drafts for me: I can get that done well enough.
In my company we have a few "summaries" made by Zoom's neural net, which we share for memes in the joke chats; they are so hilariously bad. No one uses that functionality seriously. I don't know about your app, but I've yet to see a working note-taker in the wild.
I've used multiple of these types of services and I'll be honest, I just don't really get the value. I'm in a ton of meetings and I run multiple teams but I just take notes myself in the meetings. Every time I've compared my own notes to the notes that the AI note taker took, it's missing 0-2 critical things or it focuses on the wrong thing in the meeting. I've even had the note taker say essentially the opposite of what we decided on because we flip-flopped multiple times during the meeting.
Every mistake the AI makes is completely understandable, but it's only understandable because I was in the meeting and I am reviewing the notes right after the meeting. A week later, I wouldn't remember it, which is why I still just take my own notes in meetings. That said, having a recording of the meeting and/or some AI summary notes can be very useful. I just have not found that I can replace my note-taking with an AI just yet.
One issue I have is that there doesn't seem to be a great way to "end" the meeting for the note taker. I'm sure this is configurable, but some people at work use Supernormal and I've just taken to kicking it out of meetings as soon as it tries to join. Mostly this is because I have meetings that run into another meeting, and so I never end the Zoom call between the meetings (I just use my personal Zoom room for all meetings). That means that the AI note taker will listen in on the second meeting and attribute it to the first meeting by accident. That's not the end of the world, but Supernormal, at least by default, will email everyone who was part of the meeting a rundown of what happened in the meeting. This becomes a problem when you have a meeting with one group of people and then another group of people, and you might be talking about the first group of people in the second meeting (i.e. management issues). So far I have not been burned badly by this, but I have had meeting notes sent out to people that covered subjects that weren't really something they needed to know about or shouldn't know about in some cases.
Lastly, I abhor people using an AI notetaker in lieu of joining a meeting. As I said above, I block AI note takers from my zoom calls but it really frustrates me when an AI joins but the person who configured the AI does not. I'm not interested in getting messages "You guys talked about XXX but we want to do YYY" or "We shouldn't do XXX and it looks like you all decided to do that". First, you don't get to weigh in post-discussion, that's incredibly rude and disrespectful of everyone's time IMHO. Second, I'm not going to help explain what your AI note taker got wrong, that's not my job. So yeah, I'm not a huge fan of AI note takers though I do see where they can provide some value.
We've had the built-in Teams summary AI for a while now and it absolutely misses important details and nuance that causes problems later.
You do you.
I attend a lot of meetings and I have reviewed the results of an AI note taker maybe twice ever. Getting an email with a todo-list saves a bit of time of writing down action items during a meeting, but I'd hardly consider it a game changer. "Wait, what'd we talk about in that meeting" is just not a problem I encounter often.
My experience with AI note takers is that they are useful for people who didn't attend the meeting and people who are being onboarded and want to be able to review what somebody was teaching them in the meeting and much much much less useful for other situations.
I'm not a CTO so maybe your world is not my world, but for me the advantage of taking the notes myself is that only I know what's important to me, or what was news to me. Teams Premium - you can argue it's so much worse than your product - takes notes like "they discussed the advantages of ABC", but maybe exactly those advantages are what's important to know, right? And so on. Then, like others said, I will review my notes once to see if there's a follow-up, or a topic to research, and off they go to the bin. I have yet to need the meeting notes of last year. In short: notes apps are, to me, a solution in search of a problem.
At the end of the day, it comes down to one thing: knowing what you want. And AI can’t solve that for you.
We’ve experimented heavily with integrating AI into our UI, testing a variety of models and workflows. One consistent finding emerged: most users don’t actually know what they want to accomplish. They struggle to express their goals clearly, and AI doesn’t magically fill that gap—it often amplifies the ambiguity.
Sure, AI reduces the learning curve for new tools. But paradoxically, it can also short-circuit the path to true mastery. When AI handles everything, users stop thinking deeply about how or why they’re doing something. That might be fine for casual use, but it limits expertise and real problem-solving.
So … AI is great—but the current diarrhea of “let’s just add AI here” without thinking through how it actually helps might be a sign that a lot of engineers have outsourced their thinking to ChatGPT.
> They struggle to express their goals clearly, and AI doesn’t magically fill that gap—it often amplifies the ambiguity.
One surprising thing I've learned is that a fast feedback loop like this:
1. Write a system prompt
2. Watch the agent do the task, observe what it gets wrong
3. Update the system prompt to improve the instructions
is remarkably useful in helping people write effective system prompts. Being able to watch the agent succeed or fail gives you realtime feedback about what is missing in your instructions in a way that anyone who has ever taught or managed professionally will instantly grok.
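A sketch of the harness that makes this loop fast; `run_agent` here is just a bare chat call standing in for whatever agent you're actually building, and the test tasks are illustrative:

    # Sketch: run the prompt over a few test cases, eyeball failures,
    # edit prompt.txt, run again.
    from openai import OpenAI

    client = OpenAI()

    def run_agent(system_prompt: str, task: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": task}])
        return resp.choices[0].message.content

    test_cases = ["Label this email: ...", "Draft a reply to: ..."]

    system_prompt = open("prompt.txt").read()
    for task in test_cases:
        print("TASK:", task)
        print("AGENT:", run_agent(system_prompt, task))
        print("---")
    # Read the transcript, spot what the instructions failed to convey,
    # edit prompt.txt, and run it again.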
What I've found with agents is that they stray from the task and even start to flip flop on implementations, going back and forth on a solution. They never admit they don't know something and just brute force a solution even though the answer cannot be found without trial and error or actually studying the problem. I repeatedly fall back to reading the docs and just finishing the job myself as the agent just does not know what to do.
I think you're missing step 3! A key part of building agents is seeing where they're struggling and improving performance through either the prompting or the environment.
There are a lot of great posts out there about how to structure an effective prompt. One thing they all agree on is to break down reasoning steps the agent should follow relevant to your problem area. I think this is relevant to what you said about brute forcing a solution rather than studying the problem.
In the agent's environment there's a fine balance to achieve between providing enough tools and information to solve any appropriate task, and providing so many that the agent frequently gets lost down the wrong path and fails to come up with a solution. This is also something you'll iteratively improve by observing the agent's behavior and adapting.
I have also experienced this in the specific domain of well-learned idiots finding pseudo-explanations for why a technical choice should be taken, despite not knowing anything about the topic.
I have witnessed a colleague look up a component datasheet on ChatGPT and repeat whatever it told him (even though the points it made weren't related to our use case). The knowledge monopoly in about 10 years, when the old-guard programming crowd finally retires and/or unfortunately dies, will be in the hands of people who know what they don't know and can fill the gaps using appropriate information sources (including language models). The rest will probably resemble Idiocracy on a spectrum from frustrating to hilarious.
In the process of finding out what customers or a PM/PO wants, developers ask clarifying questions given an ambiguous start. An AI could be made to also ask these questions. It may do this reasonably better than some engineers by having access to a ton of questions in its training data.
By using an AI, you might be making a reasonable guess that your problem has been solved before, but maybe not the exact details. This is true for a lot of technical tasks as I don't need to reinvent database access from first principles for every project. I google ORMs or something in my particular language and consider the options.
Even if the AI doesn't give you a direct solution, it's still a prompt for your brain as if you were in a conversation.
Just want to say the interactive widgets being actually hooked up to an LLM was very fun.
To continue bashing on gmail/gemini, the worst offender in my opinion is the giant "Summarize this email" button, sitting on top of a one-liner email like "Got it, thanks". How much more can you possibly summarize that email?
Thank you! @LewisJEllis and I wrote a little framework for "vibe writing" that allows for writing in markdown and adding vibe-coded react components. It's a lot of fun to use!
It was mind blowing seeing the picture I had in my head appear on the page for e.g. this little prompt diagram:
https://koomen.dev/essays/horseless-carriages/#system-prompt...
MDX & claude are remarkably useful for expressing ideas. You could turn this into a little web app and it would instantly be better than any word processor ever created.
Here's the code btw https://github.com/koomen/koomen.dev
Very nice example of an actually usefully interactive essay.
It is indeed a working demo, hitting
https://llm.koomen.dev/v1/chat/completions
in the OpenAI API format, and it responds to any prompt without filtering. Free tokens, anyone?
More seriously, I think the reason companies don't want to expose the system prompt is because they want to keep some of the magic alive. Once most people understand that the universal interface to AI is text prompts, then all that will remain is the models themselves.
That's right. llm.koomen.dev is a cloudflare worker that forwards requests to openai. I was a little worried about getting DDOSed but so far that hasn't been an issue, and the tokens are ridiculously cheap.
Blog author seems smart (despite questionable ideas about how much real-world users would want to interact with any of his elaborate feature concepts); you hope he's actually just got a bunch of responses cached and you're getting a random one each time from that endpoint, and that freely sent content doesn't actually hit OpenAI's APIs.
It's like the memes where people in the future will just grunt and gesticulate at the computer instead.
I used that button in Outlook once and the summary was longer than the original email
A lot of people assume that AI naturally produces this predictable style writing but as someone who has dabbled in training a number of fine tunes that's absolutely not the case.
You can improve things with prompting but can also fine tune them to be completely human. The fun part is it doesn't just apply to text, you can also do it with Image Gen like Boring Reality (https://civitai.com/models/310571/boring-reality) (Warning: there is a lot of NSFW content on Civit if you click around).
My pet theory is the BigCo's are walking a tightrope of model safety and are intentionally incorporating some uncanny valley into their products, since if people really knew that AI could "talk like Pete" they would get uneasy. The cognitive dissonance doesn't kick in when a bot talks like a drone from HR instead of a real person.
> My pet theory is the BigCo's are walking a tightrope of model safety and are intentionally incorporating some uncanny valley into their products, since if people really knew that AI could "talk like Pete" they would get uneasy. The cognitive dissonance doesn't kick in when a bot talks like a drone from HR instead of a real person.
FTR, Bruce Schneier (famed cryptologist) is advocating for such an approach:
We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic. Over the last few decades, we have become accustomed to robotic voices, simply because text-to-speech systems were good enough to produce intelligible speech that was not human-like in its sound. Now we can use that same technology to make robotic speech that is indistinguishable from human sound robotic again. — https://www.schneier.com/blog/archives/2025/02/ais-and-robot...
Reminds me of the robot voice from The Incredibles[1]. It had an obviously-robotic cadence where it would pause between every word. Text-to-speech at the time already knew how to make words flow into each other, but I thought the voice from The Incredibles sounded much nicer than the contemporaneous text-to-speech bots, while also still sounding robotic.
That doesn't sound like ring modulation in a musical sense (IIRC it has a modulator above 30 Hz, or inverts the signal instead of attenuating?), so much as crackling, cutting in and out, or an overdone tremolo effect. I checked in Audacity and the signal only gets cut out, not inverted.
> but can also fine tune them to be completely human
what does this mean? that it will insert idiosyncratic modifications (typos, idioms etc)?
I think a big problem is that the most useful AI agents essentially go unnoticed.
The email labeling assistant is a great example of this. Most mail services can already do most of this, so the best-case scenario is using AI to translate your human speech into a suggestion for whatever format the service's rules engine uses. Very helpful, not flashy: you set it up once and forget about it.
Being able to automatically interpret the "Reschedule" email and suggest a diff for an event in your calendar is extremely useful, as it'd reduce it to a single click - but it won't be flashy. Ideally you wouldn't even notice there's a LLM behind it, there's just a "confirm reschedule button" which magically appears next to the email when appropriate.
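As a sketch of how that could work (the JSON schema and the confirm-button wiring are hypothetical; the point is that the model only proposes the diff and never touches the calendar):

    # Sketch: turn a "can we move our call?" email into a calendar diff.
    import json
    from openai import OpenAI

    client = OpenAI()

    SYSTEM = (
        "If this email asks to reschedule a meeting, return JSON: "
        '{"event_hint": "<name of the meeting>", "new_time": "<ISO-8601>"}. '
        'Otherwise return {"event_hint": null}.'
    )

    def propose_reschedule(email_body: str) -> dict | None:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": email_body}],
        )
        diff = json.loads(resp.choices[0].message.content)
        return diff if diff.get("event_hint") else None

    # The mail client renders the diff as a single "Confirm reschedule"
    # button; clicking it is what actually edits the calendar event.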
Automatically archiving sales offers? That's a spam filter. A really good one, mind you, but hardly something to put on the frontpage of today's newsletters.
It can all provide quite a bit of value, but it's simply not sexy enough! You can't add a flashy wizard staff & sparkles icon to it and charge $20 / month for that. In practice you might be getting a car, but it's going to look like a horseless carriage to the average user. They want Magic Wizard Stuff, not invest hours into learning prompt programming.
> Most mail services can already do most of this
I'll believe this when I stop spending so much time deleting email I don't want to read.
Yeah, but I'm looking forward to the point where this is no longer about trying to be flashy and sexy, but just quietly using a new technology for useful things it's good at. I think things are headed that direction pretty quickly now, though! Which is great.
Honestly? I think the AI bubble will need to burst first. Making the rescheduling of appointments and dozens of tasks like that slightly more convenient isn't a billion-dollar business.
I don't have a lot of doubt that it is technically doable, but it's not going to be economically viable when it has to pay back hundreds of billions of dollars of investments into training models and buying shiny hardware. The industry first needs to get rid of that burden, which means writing off the training costs and running inference on heavily-discounted supernumerary hardware.
I cannot remember which blogging platform shows you the "most highlighted phrase", but this would be mine:
> The email I'd have written is actually shorter than the original prompt, which means I spent more time asking Gemini for help than I would have if I'd just written the draft myself. Remarkably, the Gmail team has shipped a product that perfectly captures the experience of managing an underperforming employee.
This paragraph makes me think of the old Joel Spolsky blog post that he probably wrote 20+ years ago about his time in the Israeli Defence Forces, explaining to readers how showing is more impactful than telling. I feel like this paragraph is similar. When you have a low performer, you wonder to yourself, in the beginning: why does it seem like I spend more time explaining the task than the low performer spends completing it!?
Loved the fact that the interactive demos were live.
You could even skip the custom system prompt entirely and just have it analyze a randomized but statistically-significant portion of the corpus of your outgoing emails and their style, and have it replicate that in drafts.
You wouldn't even need a UI for this! You could sell a service that you simply authenticate to your inbox, and it could do all this from the backend.
It would likely end up being close enough to the mark that the uncanny valley might get skipped and you would mostly just be approving emails after reviewing them.
Similar to reviewing AI-generated code.
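A sketch of that backend-only service, assuming a hypothetical `load_sent_emails()` IMAP helper (not shown); the sample size and prompts are illustrative:

    # Sketch: sample your sent mail, distill a style guide, draft replies.
    import random
    from openai import OpenAI

    client = OpenAI()
    sent = load_sent_emails()  # hypothetical IMAP helper
    sample = random.sample(sent, min(50, len(sent)))

    style_guide = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content":
                   "Describe this writer's style (tone, length, greetings, "
                   "sign-offs) as instructions for imitating it."},
                  {"role": "user", "content": "\n---\n".join(sample)}],
    ).choices[0].message.content

    def draft_reply(incoming: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content":
                       "Draft a reply in this style:\n" + style_guide},
                      {"role": "user", "content": incoming}])
        return resp.choices[0].message.content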
The question is, is this what we want? I've already caught myself asking ChatGPT to counterargue as me (but with less inflammatory wording) and it's done an excellent job which I've then (more or less) copy-pasted into social-media responses. That's just one step away from having them automatically appear, just waiting for my approval to post.
Is AI just turning everyone into a "work reviewer" instead of a "work doer"?
honestly you could try this yourself today. Grab a few emails, paste them into chatgpt, and ask it to write a system prompt that will write emails that mimic your style. Might be fun to see how it describes your style.
to address your larger point, I think AI-generated drafts written in my voice will be helpful for mundane, transactional emails, but not for important messages. Even simple questions like "what do you feel like doing for dinner tonight" could only be answered by me, and that's fine. If an AI can manage my inbox while I focus on the handful of messages that really need my time and attention, that would be a huge win in my book.
It all depends on how you use it, doesn't it?
A lot of work is inherently repetitive, or involves critical but burdensome details. I'm not going to manually write dozens of lines of code when I can do `bin/rails generate scaffold User name:string`, or manually convert decimal to binary when I can access a calculator within half a second. All the important labor is in writing the prompt, reviewing the output, and altering it as desired. The act of generating the boilerplate itself is busywork. Using a LLM instead of a fixed-functionality wizard doesn't change this.
The new thing is that the generator is essentially unbounded and silently degrades when you go beyond its limits. If you want to learn how to use AI, you have to learn when not to use it.
Using AI for social media is distinct from this. Arguing with random people on the internet has never been a good idea and has always been a massive waste of time. Automating it with AI just makes this more obvious. The only way to have a proper discussion is going to be face-to-face, I'm afraid.
The live demos were neat! I was playing around with "The Pete System Prompt", and one of the times, it signed the email literally "Thanks, [Your Name]" (even though Pete was still right there in the prompt).
Just a reminder that these things still need significant oversight or very targeted applications, I suppose.
It's what we want, though, isn't it? AI should make our lives easier, and it's much easier (and more productive) to review work already done than to do it yourself. Now, if that is a good development morally/spiritually for the future of mankind is another question... Some would argue industrialization was bad in that respect and I'm not even sure I fully disagree
> and it's much easier (and more productive) to review work already done than to do it yourself
This isn't the tautology you imagine it to be.
Consider the example given here of having AI write one-line draft responses to emails. To validate such a response, you have to: (1) read the original email, (2) understand it, (3) decide what you want to communicate in your reply, then (4) validate that the suggested draft communicates the same.
If the AI gave a correct answer, you saved yourself from typing one sentence, which you probably already formulated in your head in step (3). A minor help, at best.
But if the AI was wrong, you now have to write that reply yourself.
To get positive expected utility from the above scenario, you'd need the probability of the AI to be correct extremely high, and even then, the savings would be small.
A task that requires more effort to turn ideas into deliverables would have better expectation, but complex tasks often have results that are not simple nor easy to check, so the savings may not be as meaningful as you naively assume.
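To put rough numbers on it (purely illustrative, but the shape of the trade-off holds): say reading and deciding (steps 1-3) costs r seconds either way, typing the reply costs w, and validating a draft costs v. With the AI your expected time is r + v + (1-p)*w, where p is the chance the draft is right; without it, r + w. The draft only pays off when p > v/w, so for a one-sentence reply where checking the draft takes about as long as typing it (v close to w), even a near-perfect model saves essentially nothing.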
What is the point? The effort to write the email is equal to the effort to ask the AI to write the email for you. Only when the AI turns your unprofessional style into something professional is any effort saved - but the "professional" sounding style is most of the time wrong and should get dumped into junk.
Yeah, I'm with you on this one. Surely in most instances it is easier to just bash out the email, plus you get the added bonus of exercising your own mind: vocabulary, typing skills, articulating concepts, defining appropriate etiquette. As the years roll by I'm aiming to be more conscious and diligent with my own writing and communication, not less. If one extrapolates on the use of AI for such basic communication, is there a risk some of us lose our ability to meaningfully think for ourselves? The information space of the present day already feels like it is devolving: shorter and shorter content, lack of nuance, reductive messaging. Sling AI in as a mediator for one-to-one communication too and it feels perilous for social cohesion.
I've been doing something similar to the email automation examples in the post for nearly a decade. I have a much simpler statistical model categorize my emails, and for certain categories also draft a templated reply (for example, a "thanks but no thanks" for cold calls).
I can't take credit for the idea: I was inspired by Hilary Mason, who described a similar system 16 (!!) years ago[0].
Where AI improves is by making it more accessible: building my system required me knowing how to write code, how to interact with IMAP servers, a rudimentary understanding of statistical learning, and then I had to spend a weekend coding it, and even more hours since tinkering with it and duct-taping it. None of that effort was required to build the example in the post, and this is where AI really makes a difference.
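The LLM version of that weekend project is now almost trivially short. A sketch (the categories and the canned reply are illustrative, and the IMAP plumbing is still elided):

    # Sketch: categorize an email and draft a templated reply for one class.
    from openai import OpenAI

    client = OpenAI()
    CATEGORIES = ["cold_call", "newsletter", "personal", "urgent"]

    def triage(email_body: str) -> tuple[str, str | None]:
        category = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content":
                       "Answer with one of: " + ", ".join(CATEGORIES)},
                      {"role": "user", "content": email_body}],
        ).choices[0].message.content.strip()
        draft = None
        if category == "cold_call":
            draft = ("Thanks for reaching out, but this isn't something "
                     "we're looking into right now.")
        return category, draft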
I tread carefully with anyone who by default augments their (however utilitarian or conventionally bland) messages with language models, passing them off as their own. Prompting the agent to be as concise as you are, or as extensive, takes just as much time in the former case, and lacks the underlying specificity of your experience/knowledge in the latter.
If these were some magically private models that have insight into my past technical explanations or the specifics of my work, this would be a much easier bargain to accept, but usually, nothing that has been written in an email by Gemini could not have been conceived of by a secretary in the 1970s. It lacks control over the expression of your thoughts. It's impersonal, it separates you from expressing your thoughts clearly, and it separates your recipient from having a chance to understand you the person thinking instead of you the construct that generated a response based on your past data and a short prompt. And also, I don't trust some misandric f*ck not to sell my data before piping it into my dataset.
I guess what I'm trying to say is: when messaging personally, summarizing short messages is unnecessary, expanding on short messages generates little more than semantic noise, and everything in between those use cases is a spectrum deceived by the lack of specificity that agents usually present. Changing the underlying vague notions of context is not only a strangely contortionist way of making a square peg fit an umbrella-shaped hole, it pushes around the boundaries of information transfer in a way that is vaguely stylistic, but devoid of any meaning, removed fluff or added value.
Agreed! As I mentioned in the piece, I don't think LLMs are very useful for original writing, because instructing an agent to write anything from scratch inevitably takes more time than writing it yourself.
Most of the time I spend managing my inbox is not spent on original writing, however. It's spent on mundane tasks like filtering, prioritizing, scheduling back-and-forths, introductions etc. I think an agent could help me with a lot of that, and I dream of a world in which I can spend less time on email and finally be one of those "inbox zero" people.
On that topic I’m the founder of inbox zero: https://getinboxzero.com
May help you get halfway there
The counterargument is that some people are terrible at writing. Millions of people sit at the bottom of any given bell curve.
I’d never trust a summary from a current-generation LLM for something as critical as my inbox. Some hypothetical drastically improved future AI, sure.
Smarter models aren't going to somehow magically understand what is important to you. If you took a random smart person you'd never met and asked them to summarize your inbox without any further instructions they would do a terrible job too.
You'd be surprised at how effective current-gen LLMs are at summarizing text when you explain how to do it in a thoughtful system prompt.
For the case of writing emails, I tend to agree though I think creative writing is an exception. Pairing with an LLM really helps overcome the blank page / writer's block problem because it's often easier to identify what you don't want and then revise all the flaws you see.
instructing an agent to write anything from scratch inevitably takes more time than writing it yourself
But you can reuse your instructions with zero additional effort. I have some instructions that I wrote for a 'Project' in Claude (and now a 'Gem' in Gemini). The instructions give writing guidelines for a children's article about a topic. So I just write 'write an article about cross-pollination' and a minute later I have an article I can hand to my son.
Even if I had the subject matter knowledge, it would take me much longer to write an article with the type of style and examples that I want.
(Because you said 'from scratch', I deliberately didn't choose an example that used web search or tools.)
Why can’t the LLM just learn your writing style from your previous emails to that person?
Or your more general style for new people.
It seems like Google at least should have a TONNE of context to use for this.
Like in his example emails about being asked to meet: it should be checking the calendar for you and filling in whether you can or can't make it, or suggesting an alternative time when you're free.
If it can’t actually send emails without permission there’s less harm with giving an LLM more info to work with - and it doesn’t need to get it perfect. You can always edit.
If it deals with the 80% of replies that don’t matter much then you have 5X more time to spend on the 20% that do matter.
> Why can’t the LLM just learn your writing style from your previous emails to that person?
It totally could. For one thing you could fine-tune the model, but I don't think I'd recommend that. For this specific use case, imagine an addition to the prompt that says:

    """
    To help you with additional context and writing style, here are
    snippets of recent emails Pete wrote to {recipient}:
    ---
    {recent_email_snippets}
    """
I mean, everyone knows Google reads all your emails already right?
Writing an email with AI and having the recipient summarize it with AI is basically all the fun of jpeg compression, but more bandwidth instead of less.
>As I mentioned above, however, a better System Prompt still won't save me much time on writing emails from scratch.
>The thing that LLMs are great at is reading text and transforming it, and that's what I'd like to use an agent for.
Interestingly, the OP agrees with you here and noted in the post that the LLMs are better at transforming data than creating it.
I reread those paragraphs. I find the transformative effect of the email missing from the whole discussion. The end result of the inbox examples is to change some internal information in the mind of the recipient. An agent working within the context of the email has very little to contribute, because it does not know the OP's schedule, dinner plans, whether he has time for the walk-and-talk, or whether he broke his ankle last week... I'd personally be afraid to have something rummaging in my social interface that can send invites, timetables, love messages etc. in my name (and let's be honest, idiots will Ctrl-A + auto-reply their whole inboxes). It has too many lemmas that need to be fulfilled before it can be assumed competent, and none of those are very well demonstrated. It's cold-fusion technology: feasible, would be nice if it worked, but it would really be a disappointment if someone used it in its current state.
I have a large part of that already, though. The computer (Outlook today) just schedules meeting rooms for me, ensuring there aren't multiple different meetings in the same room at the same time. I can book my own flights.
When I first started working, the company rolled out the first version of meeting scheduling (it wasn't Outlook), and all the other engineers loved it - finally we could schedule our own meetings instead of having the secretary do it. Apparently the old system was some mainframe-based thing other programmers couldn't figure out (I never worked with it, so I can't comment on how it was). Likewise, booking a plane ticket involved calling travel agents and spending a lot of time on hold.
If you are a senior executive you still have a secretary. By the 1970s, however, the secretary for most of us would have been a department secretary who handled 20-40 people, not just our needs, and thus wasn't in tune with all those details. And most of us today don't have any needs that aren't better handled by a computer.
I would too, but I would have to trust the AI at least as much as a 1970s secretary not to mess up basic facts about me or needlessly embellish/summarize my conversations with known correspondents. Comparing agents with past office clichés was not meant to imply that agents do this and that it's stupid; I'm saying agents claim to do it, but don't.
Aside from saving time, I'm bad at writing. Especially emails. I often open ChatGPT, paste in the whole email chain, write out the bullets of the points I want to make and ask it to draft a response which frames it well.
There's a whole lot of people who struggle to write professionally, or whenever there's any sort of conflict (even telling your boss you won't come to work). Trying to find the right wording can be crippling and certainly takes far longer than writing a prompt. AI is incredible for these people. They were never going to express their true feelings anyway; they were just struggling to write "properly", in a way that doesn't lead to misunderstandings. If you can just smash out good emails without a second thought, you wouldn't need it.
AI for writing or research is useful like a dice roll. Terence Tao famously showed how talking to an LLM gave him an idea/approach to a proof that he hadn't immediately thought of (though he probably would have considered it eventually). The other day I wrote an unusual, four-word neologism that I'm pretty sure no one has ever seen, and the AI immediately drew the correct connection to more standard terminology and arguments, so I didn't even have to expand on or explain it myself.
I don't know, but I am considering the possibility that even for everyday tasks, this kind of exploratory shortcut can be a simple convenience. Furthermore, it is precisely the lack of context that enables LLMs to make these non-human, non-specific connective leaps; their weakness is also their strength. In this sense, they could become a new kind of discursive common ground: if human conversants are saying things that an LLM can easily catch, then LLMs could serve as the lowest common denominator for laying out arguments, disagreements, talking past each other, etc. But that's in principle; in practice it's too idealistic, as long as these are built and owned as capitalist IP.
I really don't get why people would want AI to write their messages for them. If I can write a concise prompt with all the required information, why not save everyone time and just send that instead? And especially for messages to my close ones, I feel like the actual words I choose are meaningful, and the process of writing them is an expression of our living interaction; I certainly would not like to find out that the messages from my wife were written by an AI. On the other end of the spectrum, of course I sometimes need to be more formal, but those are usually cases where the precise wording matters, and typing the message is not the time-consuming part.
> If I can write a concise prompt with all the required information, why not save everyone time and just send that instead?
This point is made multiple times in the article (which is very good; I recommend reading it!):
> The email I'd have written is actually shorter than the original prompt, which means I spent more time asking Gemini for help than I would have if I'd just written the draft myself. Remarkably, the Gmail team has shipped a product that perfectly captures the experience of managing an underperforming employee.
> As I mentioned above, however, a better System Prompt still won't save me much time on writing emails from scratch. The reason, of course, is that I prefer my emails to be as short as possible, which means any email written in my voice will be roughly the same length as the User Prompt that describes it. I've had a similar experience every time I've tried to use an LLM to write something. Surprisingly, generative AI models are not actually that useful for generating text.
People like my dad, who can't read, write, or spell to save his life, but was a very, very successful CPA, would love to use this. It would have replaced at least one of his office staff I bet. Too bad he's getting up there in age, and this newfangled stuff is difficult for him to grok. But good thing he's retired now and will probably never need it.
Well, you know, this employment crisis all started when the wheel was invented and put all the porters out of work. Then tech came for lamplighters, ice cutters, knocker-uppers, switchboard operators, telegraph operators, human computers, video store clerks, bowling alley pinsetters, elevator operators, film developers, coopers, wheelwrights, candle makers, weavers, plowmen, farriers, street sweepers. It's a wonder anyone still has a job, really.
Let's just put an AI in charge of the IRS and have it send us an actual bill, which is apparently something that's just too complicated for the current and past IRS to do. /s
Edit: added /s because it wasn't apparent this was sarcastic
Shorter emails are better 99% of the time. No one's going to read a long email, so you should keep your email to just the most important points. Expanding out these points to a longer email is just a waste of time for everyone involved.
My email inbox is already filled with a bunch of automated emails that provide me no info and waste my time. The last thing I want is an AI tool that makes it easier to generate even more crap.
Definitely. Also, another thing that wastes time is when requests don't provide the necessary context for people to understand what's being asked for and why, causing them to spend hours on the wrong thing. Or when the nuance is left out of a nuanced good idea, causing it to get misinterpreted and pattern-matched to a similar-sounding-but-different bad idea, leading to endless back-and-forth misunderstandings and escalation.
Emails sent company-wide need to be especially short, because so many person-hours are spent reading them. They also need to provide the most background context to be understood, because most of those readers won't already share the common ground needed to decode a compressed message, increasing the risk of miscommunication.
This is why messages need to be extremely brief, but also not.
There was an HN topic less than a month or so ago where somebody wrote a blog post speculating that you end up with some people using AI to inflate short prompts into lengthy emails of perfect polite form, while other people use AI to summarize those blown-up emails back into the essence of the message. Side effect: since the two transformations are imperfect, meaning will be lost or altered.
This is a plot point in a sci-fi story I'd read recently, though I cannot place what it was. Possibly in Cloud Atlas, or something by Liu Cixin.
In other contexts, someone I knew had written a system to generate automated emails in response to various online events. They later ran into someone who'd written automated processing systems to act on those emails. This made the original automater quite happy.
(Context crossed organisational / institutional boundaries, there was no explicit coordination between the two.)
It was more than a month ago, but perhaps this one:
https://news.ycombinator.com/item?id=42712143
How is AI in email a good thing?!
There's a cartoon going around where in the first frame, one character points to their screen and says to another: "AI turns this single bullet point list into a long email I can pretend I wrote".
And in the other frame, there are two different characters, one of them presumably the receiver of the email sent in the first frame, who says to their colleague: "AI makes a single bullet point out of this long email I can pretend I read".
The cartoon itself is the one posted above by PyWoody.
If that's the case, you can simply keep writing the messages to your wife yourself.
But for the 99 other messages, especially things that mundanely convey information like "My daughter has the flu and I won't be in today", "Yes 2pm at Shake Shack sounds good", it will be much faster to read over drafts that are correct and then click send.
The only reason this wouldn't be faster is if the drafts are bad. And that is the point of the article: the models are good enough now that AI drafts don't need to be bad. We are just used to AI drafts being bad due to poor design.
I don't understand. Why do you need an AI for messages like "My daughter has the flu and I won't be in today" or "Yes 2pm at Shake Shack sounds good"? You just literally send that.
Do you really run these things through an AI to burden your reader with pointless additional text?
> But for the 99 other messages, especially things that mundanely convey information like "My daughter has the flu and I won't be in today", "Yes 2pm at Shake Shack sounds good", it will be much faster to read over drafts that are correct and then click send.
It takes me all of 5 seconds to type messages like that (I timed myself typing it). Where exactly is the savings from AI? I don't care, at all, if a 5s process can be turned into a 2s process (which I doubt it even can).
How would an AI know if "2pm at Shake Shack" works for me? I still need to read the original email and make a decision. Actually writing out the response takes me basically no time whatsoever.
An AI could read the email and check my calendar and then propose 2pm. Bonus if the AI works with his AI to figure out that 2pm works for both of us. A lot of time is wasted with people going back and forth trying to figure out when they can meet. That is also a hard problem even before you note the privacy concerns.
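As a toy sketch of that mutual-availability idea (assuming both agents can already export free windows; the hard privacy-negotiation part is skipped entirely):

    # Toy sketch: intersect two people's free windows and propose the first
    # slot long enough for the meeting. Real agents would pull these windows
    # from calendar APIs and negotiate how much to reveal to each other.
    from datetime import datetime, timedelta

    def first_mutual_slot(free_a, free_b, length=timedelta(hours=1)):
        for a_start, a_end in free_a:
            for b_start, b_end in free_b:
                start = max(a_start, b_start)
                end = min(a_end, b_end)
                if end - start >= length:
                    return start
        return None

    day = datetime(2025, 5, 1)
    mine = [(day.replace(hour=13), day.replace(hour=15))]
    theirs = [(day.replace(hour=14), day.replace(hour=17))]
    print(first_mutual_slot(mine, theirs))  # 14:00, i.e. "2pm works for both"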
I sometimes use AI to write messages to colleagues. For example, I had a colleague who was confused about something in Zendesk. When they described the issue I knew it was because they (reasonably) didn't understand that 'views' aren't the same as 'folders'.
I could have written them a message saying "Zendesk has views, not folders [and figure out what I mean by that]", but instead I asked AI something like:
My colleague is confused about why assigning a ticket in Zendesk adds it to a view but doesn't remove it from a different view. I think they think the views are folders. Please write an email explaining this.
The clear, detailed explanation I got was useful for my colleague, and required little effort from me (after the initial diagnosis).
Totally agree, for myself.
However, I do know people who are not native speakers, or who didn't do an advanced degree that required a lot of writing, and they report loving the ability to have it clean up their writing in professional settings.
This is fairly niche, and already had products targeting it, but it is at least one useful thing.
Cleaning up writing is very different from writing it. A lawyer who represents himself has a fool for a client. I can write a novel, or I can edit someone else's novel - but I am not nearly as good at editing my own novels as I would be at editing someone else's. (I don't write novels, but I could. As for editing: you should get a better editor than me, but I'd still be better than you at editing your own writing.)
When it's a simple data transfer, like "2pm at Shake Shack sounds good", it's less useful. It's when we're doing messy human shit, with deep feelings evoking strong emotions, that it shines. When you get to the point where you're trading shitty emails with someone that you, at one point, loved, but are now just getting all up in there and writing some horrible shit. Writing that horrible shit helps you feel better, and you really want to send it; you know it's not gonna be good, but you send it anyway. OR - you tell ChatGPT the situation, have it edit that email before you send it, have it take out the shittiness, and you can have a productive, useful conversation instead.
The important point of communicating is to get the other person to understand you. If my own words fall flat for whatever reason, and there are better words to use, I'd prefer to use those instead.
"fuck you, pay me" isn't professional communication with a client. a differently worded message might be more effective (or not). spending an hour agonizing over what to say is easier spent when you have someone help you write it
The reason so many of these AI features are "horseless carriage"-like is the way they were incentivized internally. AI is "hot", and just by adding a useless AI feature, most established companies see high usage growth for their "AI-enhanced" projects. So internally there's a race to shove AI in as quickly as possible and juice growth numbers by cashing in on the hype. It's unclear to me whether these businesses will afterwards build more durable, well-thought-out, actually sticky product offerings using AI.
(This is based on my knowledge of the internal workings of a few well-known tech companies.)
That sounds about right to me. Massive opportunity for startups to reimagine how software should work in just about every domain.
Totally. I think the comparison between the two is actually very interesting and illustrative.
In my view there is significantly more there there with generative AI. But there is a huge amount of nonsense hype in both cases. So it has been fascinating to witness people in one case flailing around to find the meat on the bones while almost entirely coming up blank, while in the other case progressing on these parallel tracks where some people are mostly just responding to the hype while others are (more quietly) doing actual useful things.
To be clear, there was a period where I thought I saw a glimmer of people being on the "actual useful things" track in the blockchain world as well, and I think there have been lots of people working on that in totally good faith, but to me it just seems to be almost entirely a bust and likely to remain that way.
This happens whenever something hits the peak of the Gartner Hype Cycle. The same thing happened in the social network era (one could even say that the beloved Google Plus was just this for Google), the same thing happened in the mobile app era (Twitter was all about sending messages using SMS lol), and of course it happened during Blockchain as well. The question is whether durable product offerings emerge or whether these products are the throwaway me-too horseless carriages of the AI era.
Meta is a behemoth. Google Plus, a footnote. The goal is to be Meta here and not Google Plus.
For me, posts like these go in the right direction but stop midway.
Sure, at first you will want an AI agent to draft emails that you review and approve before sending. But later you will get bored of approving AI drafts and want another agent to review them automatically. And then - you are no longer replying to your own emails.
Or take another example, where I've seen people excited about video generation and thinking they will use it to create their own movies and video games. But if AI is advanced enough, why would someone go see a movie that you generated instead of generating a movie for himself? Just go with "AI - create an hour-long action movie that is set in ancient Japan, has a love triangle between the main characters, contains some light horror elements, and a few unexpected twists in the story". And then watch that yourself.
Seems like many, if not all, AI applications, when taken to the limit, reduce the need of interaction between humans to 0.
Do you want an LLM writing and sending important messages for you? I don't, and I don't know anyone who does. I want to reduce time I spend managing my inbox, archiving stuff I don't need to read, endless scheduling back-and-forths, etc. etc.
> Sure, at first you will want an AI agent to draft emails that you review and approve before sending. But later you will get bored of approving AI drafts and want another agent to review them automatically.
This doesn't seem to me like an obvious next step. I would definitely want my reviewing step to be as simple as possible, but removing yourself from the loop entirely is a qualitatively different thing.
As an analogue, I like to cook dinner but I am only an okay cook -- I like my recipes to be as simple as possible, and I'm fine with using premade spice mixes and such. Now the simplest recipe is zero steps: I order food from a restaurant, but I don't enjoy that as much because it is (similar to having AI approve and send your emails without you) a qualitatively different experience.
> I order food from a restaurant, but I don't enjoy that as much because it is (similar to having AI approve and send your emails without you) a qualitatively different experience.
What do you like less about it? Is it the smells of cooking, the family checking on the food as it cooks, the joy of realizing your own handiwork?
For me, I think it's the act of control and creation -- I can put the things I like together and try new thing and experiment with techniques or ingredients, whereas ordering from a restaurant I'll only be seeing the end results from someone else's experimentation or experience.
I don't dislike restaurants, to be clear -- I love a dinner out. It just scratches a different itch than cooking a meal at home.
So here's where this all feels a bit "build me a better horse" to me.
You're telling an AI agent to communicate specific information on your behalf to specific people. "Tell my boss I can't come in today", "Talk to comcast about the double billing".
That's not abstracted away enough.
"My daughter's sick, rearrange my schedule." Let the agent handle rebooking appointments and figuring out who to notify and how. Let their agent figure out how to convey that information to them. "Comcast double-billed me." Resolve the situation. Communicate with Comcast, get it fixed, if they don't get it fixed, communicate with the bank or the lawyer.
If we're going to have AI agents, they should be AI agents, not AI chatbots playing a game of telephone over email with other people and AI chatbots.
Exactly. To be a useful assistant, it has to be more proactive than they're currently able to be.
Someone posted here about an AI assistant he wrote that sounded really cool. But when I looked at it, he had written a bunch of scripts that fetched things like his daily calendar appointments and the weather forecast, fed them to an AI to be worded in a particular way, and then emailed the results to him. So his scripts were doing all the work except wording the messages differently. That's a neat toy, but it's not really an assistant.
An assistant could be told, "Here's a calendar. Track my appointments, enter new ones I tell you about, and remind me of upcoming ones." I can script all that, but then I don't need the AI. I'm trying to figure out how to leverage AI to do something actually new in that area, and not having much luck yet.
Short reply:
I agree, it only goes half-way.
Elaboration:
I like the "horseless carriage" metaphor for the transitionary or hybrid periods between the extinction of one way of doing things and the full embrace of the new way of doing things. I use a similar metaphor: "Faster horses," which is exactly what this essay shows: You're still reading and writing emails, but the selling feature isn't "less email," it's "Get through your email faster."
Rewinding to the 90s, Desktop Publishing was a massive market that completely disrupted the way newspapers, magazines, and just about every other kind of paper was produced. I used to write software for managing classified ads in that era.
Of course, Desktop Publishing was horseless carriages/faster horses. Getting rid of paper was the revolution, in the form of email over letters, memos, and facsimiles. And this thing we call the web.
Same thing here. The better interface is a more capable faster horse. But it isn't an automobile.
> You're still reading and writing emails, but the selling feature isn't "less email," it's "Get through your email faster."
The next logical step is not using email (the old horse and carriage) at all.
You tell your AI what you want to communicate with whom. Your AI connects to their AI and their AI writes/speaks a summary in the format they prefer. Both AIs can take action on the contents. You skip the Gmail/Outlook middleman entirely at the cost of putting an AI model in the middle. Ideally the AI model is running locally not in the cloud, but we all know how that will turn out in practice.
Contact me if you want to invest some tens of millions in this idea! :)
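To make the idea concrete, a purely hypothetical sketch of that handoff - none of this is a real protocol, and every field name here is made up:

    # Hypothetical sketch: the sender's agent ships structured intent instead
    # of prose; the recipient's agent renders it however its owner prefers.
    import json

    intent = {
        "from": "alice@example.com",
        "to": "bob@example.com",
        "purpose": "reschedule",
        "details": {"meeting": "quarterly review", "proposed": "2025-05-02T10:00"},
    }

    def render_for_owner(msg: dict, preference: str) -> str:
        # The recipient's agent, not the sender, decides the presentation.
        if preference == "one-liner":
            d = msg["details"]
            return f"{msg['from']} wants to move the {d['meeting']} to {d['proposed']}."
        return json.dumps(msg, indent=2)  # raw view, for the skeptical

    print(render_for_owner(intent, "one-liner"))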
Taking this a step farther: both AIs also deeply understand and advocate for their respective 'owners', so rather than simply exchanging a formatted message, they're evaluating the purpose and potential fit of the relationship writ large (for review by the 'owner', of course...). Sort of a preliminary discussion between executive assistants or sales reps - all non-binding, but skipping ahead to the heart of the communication, not just a single message.
> > Seems like many, if not all, AI applications, when taken to the limit, reduce the need of interaction between humans to 0.
> Same thing here. The better interface is a more capable faster horse. But it isn't an automobile.
I'm over here in "diffusion / generative video" corner scratching my head at all the LLM people making weird things that don't quite have use cases.
We're making movies. Already the AI does things that used to cost too much or take too much time. We can make one-minute videos of scale, scope, and consistency in just a few hours. We're in pretty much the sweet spot for applying this tech. This essay doesn't even apply to us; in fact, it feels otherworldly alien to our experience.
Some stuff we've been making with gen AI to show you that I'm not bullshitting:
- https://www.youtube.com/watch?v=Tii9uF0nAx4
- https://www.youtube.com/watch?v=7x7IZkHiGD8
- https://www.youtube.com/watch?v=_FkKf7sECk4
The diffusion world is magical, and the AI over here feels like we've been catapulted 100 years into the future. It's literally earth-shattering, and none of the industry will remain the same. We're going to have mocap and lip-sync where anybody can act as a fantasy warrior, a space alien, Arnold Schwarzenegger - literally whatever you can dream up. It's as if improv theater became real and super high-definition.
But maybe the reason for the stark contrast with LLMs in B2B applications is that we're taking the outputs and integrating them into things we'd be doing ordinarily. The outputs are extremely suitable as a drop-in to what we already do. I hope there's something from what we do that can be learned from the LLM side, but perhaps the problems we have are just so wholly different that the office domain needs entirely reinvented tools.
Naively, I'd imagine an AI PowerPoint generator or an AI "design doc with figures" generator would be much more useful than an email draft tool. And those are incremental additions that save a tremendous amount of time.
But anyway, sorry about the "horseless carriages". It feels like we're on a rocket ship on our end and I don't understand the public "AI fatigue" because every week something new or revolutionary happens. Hope the LLM side gets something soon to mimic what we've got going. I don't see the advancements to the visual arts stopping anytime soon. We're really only just getting started.
You make some very strong claims and have presented material to back them up, so I hope I am not out of line if I give you my sincere opinion. I am not doing this to be mean, to put you down, or to be snarky, but the argument you're making warrants this response, in my opinion.
The examples you gave as "magical", "100 years into the future", "literally earth shattering" are very transparently low-effort. The writing is pedestrian, the timing is amateurish, and the jokes just don't land. The inflating teacup with the magically floating plate and the cardboard teabag are... bad. These are bad, man. At best recycled material. I am sorry, but as examples of why automatically generated art is worth wanting, they make the opposite argument from the one you think you're making.
I categorically do not want more of this. I want to see crafted content where talent shines through. Not low effort, automatically generated stuff like the videos in these links.
> Seems like many, if not all, AI applications, when taken to the limit, reduce the need of interaction between humans to 0.
This seems to be the case for most technology. Technology increasingly mediates human interactions until it becomes the middleman between humans. We have let our desire for instant gratification drive the wedge of technology between human interactions. We don't want to make small talk about the weather, we want our cup of coffee a few moments after we input our order (we don't want to relay our orders via voice because those can be lost in translation!). We don't want to talk to a cab driver we want a car to pick us up and drop us off and we want to mindlessly scroll in the backseat rather than acknowledge the other human a foot away from us.
Related short story: the whispering earring http://web.archive.org/web/20121008025245/http://squid314.li...
Great suggestion, thank you. It's appropriately short and more fitting than I anticipated, especially the part about brain atrophy.
> AI applications, when taken to the limit, reduce the need of interaction between humans to 0.
> But if AI is advanced enough - why would someone go see a movie that you generated instead of generating a movie for himself.
I would be the first to pay if we had a GenAI that could do that.
For a long time I had an issue with something I eventually found out was normal for other people: the concept of dreaming.
For years I did not know what dreaming was about, or what it is like to have dreams at night, due to a light CWS. I would really love to have something in that regard: some kind of hyper-personalized movie that I could watch in a virtual reality setting, to help me learn what it is like to dream, even in some kind of awake mode.
I'm not sure? Are humans - at least sometimes - more creative?
Many sci-fi novels feature non-humans, but their cultures are all either very shallow (all orcs are violent - there is no variation at all in what any orc wants), or they are just humans with a different name and some slight body variation (even the intelligent birds are just humans that fly). Can AI do better, or will it be even worse, because AI won't even explore what the orcs' love of violence means for the rest of their cultures and nations?
The one movie set in Japan might be good, but I want some other settings once in a while. Will AI do that?
> Will AI do that?
No, never. AI is built on maximum likelihood under the hood, and "maximum likelihood" is another name for "stereotypes and cliches".
> Or to take another example where I've seen people excited about video-generation and thinking they will be using that for creating their own movies and video games. But if AI is advanced enough - why would someone go see a movie that you generated instead of generating a movie for himself
This seems like the real agenda/endgame of where this kind of AI is meant to go. The people pushing it and making the most money from it disdain the artistic process and artistic expression, because art is not, by default, corporate-friendly. An artist might get an idea that society is not fair to everyone - we can't have THAT!
The people pushing this and making the most money off of it feel that by making art and creation a commodity, and by owning the tools that permit such expression, they can exert force to keep it within the bounds of what they (either personally or as a corporation) feel is acceptable to both the bottom line and their future business interests.
There are different agendas. Some want money or power from upending the existing process, by making production cheaper.
There are people who want this because it would let them make things currently unavailable to them - taboo topics like casting your sister's best friend in your own X-rated movie.
There are groups who want to restrict this technology to match their worldview: all AI movies must have a diverse cast, or must be Christian-friendly.
Not sure how this will play out.
I'm sure the oil paint crowd thought that photography was anti-artist cheating too.
This is just another tool, and it will be used by good artists to make good art, and bad artists to make bad art. The primary difference being that even the bad art will be better than before this tool existed.
> I'm sure the oil paint crowd thought that photography was anti-artist cheating too.
The difference is that the camera company didn't have editorial control over what you could take pictures of, unlike with AI which gives all of that power to the creator of the model.
> The primary difference being that even the bad art will be better than before this tool existed.
[citation needed]
Lmao, re modern media: every script that human 'writers' produce is now the same old copy-paste slop with the exact same tropes.
It's very rare to see something that isn't completely derivative. Even though I enjoyed Flow immensely, it's just Homeward Bound with no dialogue. Why do we pretend that humans are magical creativity machines when we're clearly machines ourselves?
> when we're clearly machines ourselves
Well, speak for yourself.
I could not agree more with this. 90% of AI features feel tacked-on and useless, and that's before you get to the price. Some of the services out here want to charge 50% to 100% more for their SaaS just to enable "AI features".
I'm actually having a really hard time thinking of an AI feature, other than coding AI features, that I actually enjoy. Copilot/Aider/Claude Code are awesome, but I'm struggling to think of another tool I use that LLMs have improved. Auto-completing the next word in a sentence in Gmail/iMessage is one example, but that existed before LLMs.
I have not once used the features in Gmail to rewrite my email to sound more professional or anything like that. If I need help writing an email, I’m going to do that using Claude or ChatGPT directly before I even open Gmail.