Comment by echelon 11 hours ago

5 replies

We now have capacity to program and automate in the optics, signals, and spatial domains.

As someone in the film space, here's just one example: we are getting extremely close to being able to make films with only AI tools.

Nano Banana makes it easy to create character- and location-consistent shots that adhere to film language and the rules of storytelling. This still isn't "one shot", and considerable effort still needs to be put in by humans. It's not unlike AI assistance in IDEs, which still requires a human engineer as pilot.

We're entering the era of two-person film studios. You'll undoubtedly start seeing AI short films next year. I had one art school professor tell me that film seems like it's turning into animation, and that "photorealism" is just style transfer or an aesthetic choice.

The film space is hardly the only space where these models have utility. There are so many domains. News, shopping, gaming, social media, phone and teleconference, music, game NPCs, GIS, design, marketing, sales, pitching, fashion, sports, all of entertainment, consumer, CAD, navigation, industrial design, even crazy stuff like VTubing, improv, and LARPing. So much of what we do as humans is non-text based. We haven't had effective automation for any of this until this point.

This is a huge percentage of the economy. This is actually the beating heart of it all.

wild_egg 3 hours ago

Been thinking about this. Curious why you positioned it as Nano Banana having more utility than agents, when it seems like the next level would be Nano Banana with agents?

The two are kind of orthogonal concepts.

yunwal 11 hours ago

> we are getting extremely close to being able to make films with only AI tools

AI still can’t reliably write text on background details. It can’t get shadows right. If you ask it to shoot things from a head-on perspective, for example a bookshelf, it fails to keep proportions accurate enough. The bookshelf will not have parallel shelves. The books won’t have text. If in a library, the labels will not be in Dewey Decimal order.

It still lacks a huge amount of the understanding of how the world works that is necessary to make a film. It has its uses, but pretending like it can make a whole movie is laughable.

  • wild_egg 9 hours ago

    I don't think they're suggesting AI could one-shot a whole movie. It would be iterative, just like programming.

    • echelon 3 hours ago

      Exactly. You can still open the generations in Photoshop.

      I'd say the image and video tools are much further along and much more useful than AI code gen (not to dunk on code autocomplete). They save so much time and are quite incredible at what they can do.

  • gabriel666smith 9 hours ago

    I don't think equating "extremely close" with "pretending like it can" is a fair way to frame the sentiment of the comment you were replying to. Saying something is close to doing something is not the same as saying it already can.

    In terms of cinema tech, it took us arguably until the early 1940s to achieve "deep focus in artificial light". About 50 years!

    Looking at the last couple of years of development in generative video, it seems to me that the tech is improving more quickly than the tech it is mimicking did. This seems unsurprising - one was definitely a hardware problem, and the other is most likely a mixture of hardware and software problems.

    Your complaints (or analogous technical complaints) would have been acceptable issues - things one had to work around - for a good deal of cinema history.

    We've already reached people complaining about "these book spines are illegible", which feels very close to "it's difficult to shoot in focus, indoors". Will that take four or five decades to achieve, based on the last 3-5 years of development?

    The tech certainly isn't there yet, nor am I pretending like it is, and nor was the comment you replied to. To call it close is not laughable, though, in the historical context.

    The much more interesting question is: At what point is there an audience for the output? That's the one that will actually matter - not whether it's possible to replicate Citizen Kane.