Comment by AshamedCaptain 4 days ago

Samsung at least does this "dog" cataloguing & searching entirely on-device, as can be trivially checked by disabling all network connectivity and taking a picture. It may ping home for several other reasons, though.

llm_nerd 4 days ago

Apple also does the vast majority of photo categorization on device, and has for years, across multiple major releases: foods, drinks, many types of animals (including specific breeds), OCRing all text in the image even when it's massively distorted, etc.

This feature is some new "landmark" detection, and it feels like a trial balloon, because it makes little sense unless the set of things they classify as landmarks is enormous. The example is always the Eiffel Tower, but the data needed to identify most of the world's major landmarks is small relative to what the device can already detect. Such lookups don't even need image recognition at all: simple location data and nearby POIs could be used (and in fact already are, and long have been) for that kind of metadata tagging.
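
A minimal sketch of that location-based approach, assuming the photo's GPS coordinates have already been read from its EXIF data, and using a tiny made-up POI list in place of a real landmark database:

    import math

    # Hypothetical, illustrative POI list; a real implementation would query a database.
    POIS = [
        ("Eiffel Tower", 48.8584, 2.2945),
        ("Statue of Liberty", 40.6892, -74.0445),
    ]

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two coordinates, in kilometres.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * 6371.0 * math.asin(math.sqrt(a))

    def nearby_landmarks(lat, lon, radius_km=1.0):
        # Tag the photo with any POI close to where it was taken.
        return [name for name, plat, plon in POIS
                if haversine_km(lat, lon, plat, plon) <= radius_km]

    print(nearby_landmarks(48.8588, 2.2950))  # -> ['Eiffel Tower']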

The landmarks thing is just the beginning; I feel like they want it to be much more detailed: every piece of art, every model of car, and so on, including as they change with new releases.

TeMPOraL 4 days ago

Does it, or doesn't it? You can't really tell if or when it does any cataloguing; the best I've managed to observe is that you can increase the chances of it happening by keeping your phone plugged into a charger for extended periods of time.

That's the problem with all those implementations: no feedback of any kind. No list of recognized tags. No information about what has been or will be processed. No nothing. Just magic that doesn't work.

  • reaperman 4 days ago

    With embeddings, there might not be tags to display. Instead of labeling the photo with a tag of “dog”, it might just check whether the embedding of each photo is within some vector distance of the embedding of your search text.
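
    A rough sketch of that idea, where embed_text stands in for a hypothetical CLIP-style encoder that maps text and images into the same vector space (not any vendor's actual API):

        import numpy as np

        def cosine_similarity(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        def search_photos(query, photo_embeddings, embed_text, threshold=0.25):
            # photo_embeddings: {photo_id: image_vector}, precomputed on-device.
            # A photo matches if its embedding is close enough to the query text's.
            q = embed_text(query)  # e.g. embed_text("dog")
            return [photo for photo, vec in photo_embeddings.items()
                    if cosine_similarity(q, vec) >= threshold]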

    • TeMPOraL 3 days ago

      Yes and no. Embeddings can be used in both directions: if you can find the images closest to a search text, you can also identify the tokens or phrases closest in embedding space to any image or cluster of images, and output those. It's a long-solved problem, with many different approaches, including but not limited to:

      https://github.com/pythongosssss/ComfyUI-WD14-Tagger

      which uses specific models to generate proper booru tags out of any image you pass to it.
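
      A sketch of that reverse direction, again assuming a shared text/image embedding space; embed_text and the tag vocabulary below are illustrative placeholders:

          import numpy as np

          TAGS = ["dog", "cat", "sunset", "city", "outdoors", "room"]

          def suggest_tags(image_vec, embed_text, k=3):
              # Score every candidate phrase against the image embedding
              # and return the k closest ones as suggested tags.
              def sim(a, b):
                  return float(np.dot(a, b) /
                               (np.linalg.norm(a) * np.linalg.norm(b)))
              scored = sorted(((sim(image_vec, embed_text(t)), t) for t in TAGS),
                              reverse=True)
              return [t for _, t in scored[:k]]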

      More importantly, I know for sure they have this capability in practice, because if you tap the right way in the right app, when the Moon is in just the right phase, both Samsung Gallery and OneDrive Photos do (or, in the case of OneDrive, used to):

      - Provide occasional completions and suggestions for predefined categories, like "sunset", "outerwear", "people", etc.;

      - Auto-tag photos with some subset of those (OneDrive, which also sometimes records them in the metadata), or, if you use the "edit tag" option, suggest the best-fitting tags (Samsung);

      - Have a semi-random list of "Things" to choose from to categorize your photos, such as "Sunsets", "City", "Outdoors", "Room", etc. Google Photos does that one too.

      This shows they do maintain a list of correct and recommended classifications. They just choose to keep it hidden.

      With regard to face recognition, it's even worse. There are zero controls and zero information, other than an occasionally matched (and often mismatched) face under photo properties, which you can sometimes delete.