Comment by ai_critic a day ago

Reality check:

Your extra data is a big JSON blob. Okay, fine.

File formats dating back to Targa (https://en.wikipedia.org/wiki/Truevision_TGA) support arbitrary text blobs if you're weird enough.

PNG itself has both EXIF data and a more general text chunk mechanism (both compressed and uncompressed, https://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.An... , section 4.2.3, you probably want iTXt chunks).
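For illustration, here is a minimal sketch of the iTXt route using Pillow's `PngInfo.add_itxt`. The helper names (`embed_json`, `extract_json`) and the `ai_metadata` key are made up for this example and are not taken from the project under discussion.

```python
import io
import json

from PIL import Image
from PIL.PngImagePlugin import PngInfo


def embed_json(img, key, payload):
    """Save img as PNG bytes with payload stored as JSON in a compressed iTXt chunk."""
    info = PngInfo()
    info.add_itxt(key, json.dumps(payload), zip=True)  # zip=True compresses the text
    buf = io.BytesIO()
    img.save(buf, format="PNG", pnginfo=info)
    return buf.getvalue()


def extract_json(png_bytes, key):
    """Read the JSON payload back out of the PNG's text chunks."""
    with Image.open(io.BytesIO(png_bytes)) as img:
        return json.loads(img.text[key])


# Hypothetical metadata; the real JSON would come from the project's own tooling.
metadata = {"label": "cat", "boxes": [[10, 20, 110, 220]]}
png_bytes = embed_json(Image.new("RGB", (4, 4)), "ai_metadata", metadata)
restored = extract_json(png_bytes, "ai_metadata")
```

The result is still an ordinary PNG that any viewer can open; readers that don't care about the text chunk simply skip it.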

exiftool will already let you do all of this, by the way. There's no reason to summon a non-standard file format into the world (especially when you're just making a weird version of PNG that won't survive resizing or quantization properly).

ai_critic a day ago

Here, two incantations:

> exiftool -config exiftool.config -overwrite_original -z '-_custom1<=meta.json' cat.png

and

> exiftool -config exiftool.config -G1 -Tag_custom1 cat.png

You can (with AI help no less) figure out what `exiftool.config` should look like. `meta.json` is just your JSON from github.

Now go draw the rest of the owl. :)

kuberwastaken a day ago

Hi! Thanks for checking it out, means a lot :)

Yes, it is a big JSON blob atm, haha, and it's definitely still a POC, but the idea is to avoid having a separate JSON file that adds to the complexity. While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.

There's definitely a ton of work left to do, but I see a lot of potential in something like this (also, nice username)

  • ai_critic a day ago

    > While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

    That's why I mentioned that you can put anything, including binary data--which includes images--into the chunks in a PNG. I think Pillow even supports this (there are some PRs, like https://github.com/python-pillow/Pillow/pull/4292 , that suggest this).

    Your problem domain is:

    * Have something that looks like a PNG...

    * ...that doesn't need supporting files outside itself...

    * ...that can also store textual data (e.g., that JSON blob of bounding boxes and whatnot)...

    * ...and can also store image data (e.g., attention maps and saliency regions).

    What I'm telling you is that the PNG file format already supports all of this stuff; you just need to be smart enough to read the spec and apply the affordances it gives you.
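As a rough sketch of the binary half of that list, the standard-library code below appends a private ancillary chunk holding a second image to an existing PNG and reads it back. The chunk name `atMp` and every function name here are invented for illustration; only the chunk layout (length, type, data, CRC) comes from the PNG spec.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one chunk: 4-byte length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_private_chunk(png, chunk_type, payload):
    """Return a copy of png with one extra chunk placed just before IEND."""
    out = [PNG_SIGNATURE]
    for ctype, data in iter_chunks(png):
        if ctype == b"IEND":
            out.append(make_chunk(chunk_type, payload))
        out.append(make_chunk(ctype, data))
    return b"".join(out)


def tiny_gray_png(width, height, pixels):
    """Build a minimal 8-bit grayscale PNG (filter type 0 on every row)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    rows = b"".join(b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height))
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(rows)) + make_chunk(b"IEND", b""))


# The payload is itself a PNG, standing in for an attention map.
base = tiny_gray_png(2, 2, bytes([0, 64, 128, 255]))
attention_map = tiny_gray_png(2, 2, bytes([10, 20, 30, 40]))
combined = add_private_chunk(base, b"atMp", attention_map)  # hypothetical chunk name
chunk_order = [ctype for ctype, _ in iter_chunks(combined)]
recovered = dict(iter_chunks(combined))[b"atMp"]
```

The lowercase first letter marks the chunk as ancillary and the lowercase second letter marks it as private, so a decoder that doesn't recognize it can ignore it and still display the image.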

    > I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.

    In the 90s, we'd already spent vast sums of gold and blood and tears solving the "holy shit, how do we encode multiple things in images so that they can survive an image pipeline, be extensible to end users, and be compressed reliably?" problem.

    None of this has been new for three decades. Nothing you are going to do is going to be a value add over correctly using the file format you already have.

    I promise that you aren't going to see anything particularly new or exciting in this AI goldrush that isn't an isomorphism of something that much smarter, much better-paid people solved back when image formats were still a novel problem domain (again, in the 1990s).

  • vunderba a day ago

    > it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

    Why not, exactly? ComfyUI encodes an absolutely bonkers amount of information (all arbitrary JSON) into workflow PNG files without any issues.

    • ai_critic a day ago

      Indeed. And character cards for chatbots (like in SillyTavern) have supported this for years.