Comment by _wire_ 5 days ago

Solid overview of applied color theory for video, so worth watching.

As to what was to be debunked, the presentation not only fails to set out a thesis in the introduction, it doesn't even pose a question, so you've got to watch for hours to get to the point: SDR and HDR are two measurement systems which, when correctly used, must for most cases (legacy and conventional content) produce the same visual result. The increased fidelity of HDR makes it possible to expand the sensory response and achieve some very realistic new looks that were impossible with SDR, but the significance and value of any look is still up to the creativity of the photographer.

This point could be more easily conveyed by the presentation if the author explained that, across the history of reproduction technology, human visual adaptation exposes a moment-by-moment contrast window of about 100:1, which constantly adjusts over time based on average luminance to create a much larger window of perception of billions:1(+) that allows us to operate under the luminance conditions found on earth. But until recently, we haven't expected electronic display media to be used in every condition on earth, and even if they can work there, you don't pick everywhere as your reference environment for system alignment.

(+) Regarding the difference between numbers such as 100 and billions, don't let your common sense about big or small values faze your thinking about differences: perception is logarithmic; it's the ratios that matter more than the absolute magnitude of the numbers. As a famous acoustics engineer (Paul Klipsch) said about where to focus design optimization of the response traits of reproduction systems: "If you can't double it or halve it, don't worry about it."
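
To put rough numbers on that, here is a quick back-of-the-envelope conversion of those ratios into doublings (the figures are only illustrative):

    import math

    # Express the contrast ratios above as doublings, since perception tracks
    # ratios rather than absolute magnitudes.
    instantaneous_window = 100            # ~100:1 moment-to-moment window
    adapted_window = 1_000_000_000        # ~billions:1 across adaptation

    print(math.log2(instantaneous_window))   # ~6.6 doublings
    print(math.log2(adapted_window))         # ~30 doublings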

dperfect a day ago

It's hard to boil it down to a simple thesis because the problem is complicated. He admits this in the presentation and points to it being part of the problem itself; there are so many technical details that have been met with marketing confusion and misunderstanding that it's almost impossible to adequately explain the problem in a concise way. Here's my takeaway:

- It was clearly a mistake to define HDR transfer functions using absolute luminance values. That mistake has created a cascade of additional problems

- HDR is not what it was marketed to be: it's not superior in many of the ways people think it is, and in some ways (like efficiency) it's actually worse than SDR

- The fundamental problems with HDR formats have resulted in still more problems: proprietary formats like Dolby Vision attempting to patch over some of the issues (while being more closed and expensive, yet failing to fully solve the problem); consumer devices forced to render things worse than they might be in SDR, because it's literally impossible to implement the spec 100% (they have to make assumptions that can be very wrong); endless issues with format conversions leading to inaccurate color representation and/or color banding; and lower-quality streaming at a given bit rate due to HDR's reliance on higher bit depths to achieve the same tonal gradation as SDR

- Not only is this a problem for content delivery, but it's also challenging in the content creation phase as filmmakers and studios sometimes misunderstand the technology, changing their process for HDR in a way that makes the situation worse

Being somewhat of a film nerd myself and dealing with a lot of this first-hand, I completely agree with the overall sentiment and really hope it can get sorted out in the future with a more pragmatic solution that gives filmmakers the freedom to use modern displays more effectively, while not pretending that they should have control over things like the absolute brightness of a person's TV (when they have no idea what environment it might be in).

  • adrian_b 7 hours ago

    While HDR has the problems described by you, in practice, whenever possible, I choose the HDR version of a movie over its SDR version.

    The reason is not HDR itself, but the fact that the HDR movies normally use the BT.2020 color space, while the SDR movies normally use the BT.709 color space.

    The color spaces based on the limitations of the first color CRT tubes (limitations that are no longer relevant today), i.e. sRGB, BT.709 and the like, are really unacceptable from my point of view, because they cannot reproduce many of the more saturated colors in the red-orange region, which are frequently encountered in nature and in manufactured objects, and which are also located in a region of the color space where human vision is most sensitive.

    While no cheap monitor can reproduce the full BT.2020, many cheap monitors can reproduce the full DCI-P3 color space, which provides adequate improvements in the red-orange corner over sRGB/BT.709.
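
    For a rough sense of scale, here is a small comparison I put together using the published CIE xy primaries of these gamuts (triangle area in xy space is only a crude proxy for perceptual coverage, so treat the numbers as illustrative):

        # CIE xy chromaticities of the red, green and blue primaries.
        SRGB_709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
        DCI_P3   = [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)]
        BT_2020  = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

        def gamut_area(primaries):
            # Shoelace formula for the area of the gamut triangle in xy space.
            (x1, y1), (x2, y2), (x3, y3) = primaries
            return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

        for name, p in [("sRGB/BT.709", SRGB_709), ("DCI-P3", DCI_P3), ("BT.2020", BT_2020)]:
            print(name, round(gamut_area(p), 4))
        # DCI-P3 comes out roughly a third larger than sRGB/BT.709,
        # and BT.2020 nearly twice as large.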

strogonoff 2 days ago

Regardless of whether it is HDR or SDR, when processing raw data for display spaces one must throw out 90%+ of the information captured by the sensor (which is itself often a small fraction of what was available at the scene). There can simply be no objectivity; it is always about what you saw and what you want others to see, an inherently creative task.

  • ttoinou 2 days ago

    90%, really? What information gets thrown out, exactly? For the sensor part, are you talking about the fact that the photosites don't cover the whole surface? Or that we only capture a narrow band of wavelengths? Or that the lens only focuses rays onto specific points, making the rest blurry, so we lose the 3D?

    • sansseriff 2 days ago

      Cameras capture linear brightness data, proportional to the number of photons that hit each pixel. Human eyes (film cameras too) basically process the logarithm of brightness data. So one of the first things a digital camera can do to throw out a bunch of unneeded data is to take the log of the linear values it records, and save that to disk. You lose a bunch of fine gradations of lightness in the brightest parts of the image. But humans can't tell.

      Gamma encoding, which has been around since the earliest CRTs, was a very basic solution to this. Nowadays it's silly for any high-dynamic-range image recording format not to encode data in a log format, because it's so much more representative of human vision.
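
      As a rough illustration (not any real camera's pipeline), here is how 12-bit linear sensor values might be squeezed into 8 bits with a log curve versus a classic gamma curve:

          import numpy as np

          linear = np.arange(1, 4096)   # 12-bit linear code values, proportional to photons

          # Log encoding: spend the 8-bit codes on ratios, not absolute steps.
          log8 = np.round(255 * np.log2(linear) / np.log2(4095)).astype(np.uint8)

          # Classic gamma encoding (~2.2) approximates the same idea.
          gamma8 = np.round(255 * (linear / 4095) ** (1 / 2.2)).astype(np.uint8)

          # Thousands of distinct highlight values collapse onto shared 8-bit codes;
          # those are the fine gradations of lightness humans can't tell apart.
          print(len(np.unique(log8)), len(np.unique(gamma8)))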

      • ttoinou 2 days ago

        Ok so similar to the other commenter then, thanks. According to that metric it's much more than 90% we're throwing out then (:

    • Uehreka a day ago

      Third blind man touching the elephant here: the other commenters are wrong! It's not about bit depth or linear-to-gamma; it's the fact that the human eye can detect way more “stops” (the word doesn’t make sense, you just have to look it up) of brightness (I guess you could say “a wider range of brightness”, but photography people all say “stops”) than the camera, and the camera can detect more stops of brightness than current formats can properly represent!

      So you have to decide whether to lose the darker parts of the image or the brighter parts of the image you’re capturing. Either way, you’re losing information.

      (In reality we’re all kind of right)
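
      A toy sketch of that trade-off (the numbers are made up purely for illustration):

          import numpy as np

          # A scene spanning ~20 stops, recorded by a camera that holds ~14.
          scene_stops = np.linspace(0, 20, 9)   # luminance of scene patches, in stops
          camera_stops = 14

          # Expose to protect the highlights: deep shadows get crushed to black.
          protect_highlights = np.clip(scene_stops, 20 - camera_stops, 20)

          # Expose to protect the shadows: bright highlights clip to white.
          protect_shadows = np.clip(scene_stops, 0, camera_stops)

          print(protect_highlights)   # detail below 6 stops is gone
          print(protect_shadows)      # detail above 14 stops is gone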

      • strogonoff 8 hours ago

        This was what I meant primarily.

        A camera sensor can capture <1% of what we can see, and any display medium (whether paper or screen, SDR or HDR, etc.) can show <1% of what the camera sensor captures.

        (That 1% figure is very rough and will vary with scene conditions, but it is not far off.)

        On top of that, what each of us sees is always subjective and depends on our prior experience as well as shared cultural baggage.

        As a result, it is a creative task. We selectively amplify and suppress aspects of the raw data according to what fits the display space, what we think should be seen, and what our audience would expect to see.

        People in this thread claiming there is some objective standard reference process for compressing/discarding extra data for the display space completely miss this fundamental aspect of perception. There is no reference process for even the basic task of determining what counts as neutral grey.

        (As a bonus point, think about how, as more and more of our visual input from the youngest ages comes from looking at bland JPEGs on shining rectangles with tiny dynamic ranges, this shapes our common perception of reality, making it less subjective and more universal. Compare with how, before photography, we really had no equivalent of a supposedly “standard” representation of reality we must all adhere to; it isn't really standard, but we mistake it for such.)

        • ttoinou 5 hours ago

          Ok I get it but I doubt photographers have full control over that 1%, so it’s not just a creative task, we’re constrained by physics too

    • michaelt a day ago

      A 4K 30 fps video sensor capturing an 8-bit-per-pixel (Bayer pattern) image is capturing about 2 gigabits per second. That same 4K 30 fps video on YouTube will be 20 megabits per second or less.

      Luckily, it turns out relatively few people need to record random noise, so when we lower the data rate by 99% we get away with it.
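
      A quick sanity check of those figures (assuming a UHD frame and the stated settings):

          # 3840 x 2160 pixels, 8 bits per Bayer-pattern pixel, 30 frames per second
          width, height, bits_per_pixel, fps = 3840, 2160, 8, 30

          sensor_rate = width * height * bits_per_pixel * fps   # bits per second
          print(sensor_rate / 1e9)     # ~1.99 Gbit/s off the sensor
          print(20e6 / sensor_rate)    # a 20 Mbit/s stream keeps ~1% of that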

      • strogonoff 8 hours ago

        1. I believe in modern cameras it’s 10+ bits per pixel, undebayered, but I'm willing to be corrected. Raw-capable cameras capture 12+ bits of usable range. Data rates far exceed 5 gigabits per second.

        2. Your second paragraph is a misunderstanding. Unless you really screw up the shooting settings, it is not random noise but pretty useful scene data, available for mapping to a narrow display space in whatever way you see fit.

    • adgjlsfhk1 2 days ago

      Presumably they're referring to the fact that most cameras capture ~12-14 bits of brightness vs. the 8 bits that (non-HDR) displays show.

      • ttoinou 2 days ago

        Oh, that's normal then. There are mandatory steps of dynamic range reduction in the video editing / color grading pipeline (like a compressor in audio production). So the information isn't entirely lost, but the precision / details can be, yes. But that's a weird definition; there are so many photons in a daylight capture that you could just as easily say we really need a minimum of 21 bits per channel (light intensity of sun / light intensity of moon).
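
        For what it's worth, that 21-bit figure roughly checks out if you read it as scene illuminance, assuming direct sunlight at ~100,000 lux and full moonlight at ~0.05 lux (both ballpark values):

            import math

            sun, moon = 100_000, 0.05
            print(math.log2(sun / moon))   # ~20.9, i.e. about 21 doublings of intensity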