Comment by ai_critic a day ago

> While EXIF data works pretty well for most basic stuff, it's not enough for everything one might need for AI specific stuff, especially for things like attention maps and saliency regions.

That's why I mentioned that you can put anything, including binary data--which includes images--into the chunks of a PNG. I think Pillow even supports this (there are some PRs, like https://github.com/python-pillow/Pillow/pull/4292 , that suggest this).

Your problem domain is:

* Have something that looks like a PNG...

* ...that doesn't need supporting files outside itself...

* ...that can also store textual data (e.g., that JSON blob of bounding boxes and whatnot)...

* ...and can also store image data (e.g., attention maps and saliency regions).

What I'm telling you is that the PNG file format already supports all of this stuff. You just need to read the spec and apply the affordances it gives you.
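A minimal sketch of the idea, using only the Python standard library rather than Pillow so the chunk layout is visible. It relies on two things from the PNG spec: every chunk is a length, a 4-byte type, the data, and a CRC-32 over type plus data; and the case of the type letters marks a chunk as ancillary, private, and safe to copy. The `atMp` chunk type, the `ai-metadata` keyword, and the helper names are made up for illustration, and the 2x2 images are placeholders.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk in a PNG byte string, checking each CRC."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {chunk_type!r}")
        yield chunk_type, data
        pos += 12 + length


def tiny_gray_png(width: int, height: int, pixels: bytes) -> bytes:
    """Encode raw 8-bit grayscale pixels as a minimal, valid PNG."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then defaults.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (none).
    rows = b"".join(
        b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height)
    )
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(rows))
        + make_chunk(b"IEND", b"")
    )


def add_chunks(png: bytes, extra) -> bytes:
    """Insert extra (type, data) chunks just before the closing IEND chunk."""
    iend = make_chunk(b"IEND", b"")
    if not png.endswith(iend):
        raise ValueError("expected PNG to end with IEND")
    body = b"".join(make_chunk(t, d) for t, d in extra)
    return png[:-len(iend)] + body + iend


# The visible image and a second image (a stand-in for an attention map).
base = tiny_gray_png(2, 2, bytes([0, 64, 128, 255]))
attention = tiny_gray_png(2, 2, bytes([255, 0, 0, 255]))

# Textual metadata goes in a standard tEXt chunk: keyword, NUL, Latin-1 text.
boxes = {"boxes": [[0, 0, 1, 1]]}
text_chunk = b"ai-metadata\x00" + json.dumps(boxes).encode("latin-1")

# Binary data goes in a private ancillary chunk (hypothetical type "atMp").
combined = add_chunks(base, [(b"tEXt", text_chunk), (b"atMp", attention)])

found = dict(iter_chunks(combined))
recovered_boxes = json.loads(found[b"tEXt"].split(b"\x00", 1)[1])
recovered_map = found[b"atMp"]
```

The result is still one self-contained file that starts with the PNG signature, and the JSON blob and the embedded image both round-trip. Whether a given tool keeps unknown chunks when it re-encodes the image is up to that tool, so check your pipeline.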

> I'm currently working on redundancy and error correction to deal with the resizing problem. Having a separate file format, even if it's a headache and adds another one to the list (well, another cute-sounding one at least), gives more customization options and makes it easier to associate the properties directly.

In the 90s, we'd already spent vast sums of gold and blood and tears solving the "holy shit, how do we encode multiple things in images so that they can survive an image pipeline, be extensible by end users, and be compressed reliably?" problem.

None of this has been new for three decades. Nothing you are going to do is going to be a value add over correctly using the file format you already have.

I promise that you aren't going to see anything particularly new or exciting in this AI gold rush that isn't an isomorphism of something that much smarter, much better-paid people solved back when image formats were still a novel problem domain (again, in the 1990s).