vunderba 2 days ago

Is that a joke? Because 4o image generation (assuming you click "Image Generate", which uses gpt-image-1) EASILY handles rendering hands with the proper number of fingers, even without specifying something like that.

If anything, it's actually MORE difficult to generate hands with an improper number of fingers. Apologies to Count Rugen.

https://imgur.com/a/hIp5DQO

  • kachapopopow a day ago

    The tweet announcing the release of gpt-4o's image generation capabilities had a hand with 4 fingers.

  • washadjeffmad 2 days ago

    If you consider finger count to be the lowest bar, then it's still a problem. People just aren't generally creative enough to find the cracks.

    Think in terms of hands, their components, and their function, and test again. Be specific.

    • vunderba a day ago

      So? My point wasn't that they were capable of EVERYTHING. It was addressing what looked to me (and likely to any casual observer) like factually incorrect information.

      You're also really talking to the wrong person about potential deficiencies in GenAI for images.

      I run an entire site where I compare a multitude of prompts I CREATED specifically to test the major state-of-the-art generative image providers (Imagen4, gpt-image-1, Flux Kontext, etc.), so I'm all too aware of their shortcomings.

      https://genai-showdown.specr.net

      • washadjeffmad a day ago

        Didn't mean to wind you up! Totally wasn't my intent. Looking over your site, I feel like my point is pretty strongly reflected by your work, though.

        While models have been trained to deliver high-level impressions (with increasing attention to detailed problem domains), one-shot control is still relatively poor, and they lack the fundamental skills of a trained artist. There are chasms between what you think you're prompting, what the text encoder understands, and how the model interprets that input. The result is like a professional musician intentionally playing badly... hands not excepted.

        For instance, in "Mermaid Disciplinary Committee" on your site, every hand has a deformity or a finger-count inconsistency. In "Spheron", the hands have no variation and suffer from cross-subject cloning (even 4o; look at the shield-carrier).

        That's what I meant about creativity and being specific. Try prompting for three people holding up certain fingers on one or both hands. Start with the index finger and progress to the pinky. Ask it to show you a hand gripping things, rotated in different orientations. Prompt for a hand with 3 fingers, then 6 fingers, then no fingers. Ask for gang signs or shadow puppets, a hand pinching something, or hands with fewer or extra digits. The illusion breaks down quickly.

        This is a space I'm working in: retraining text encoders and diffusion models to understand the same things first-year art students learn. Given how limited and poisoned most models are, it's been a huge effort.