Comment by jaccola
They are not reliable at all unless paired with a physical measurement (LiDAR, or an object of known size in the scene).
This is probably an interesting use for a pretrained model: estimating scale from common items seen in scenes (cars, door frames, trees, etc.).
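To make the known-size-object idea concrete, here is a rough sketch, not anyone's actual pipeline. It assumes the reconstruction is in arbitrary units and that you can pick two points spanning an object whose real length you know. The function names, the coordinates, and the reference lengths are all made up for illustration; a pretrained detector would supply the lengths from typical object sizes.

```python
import math
import statistics

def metric_scale(p_a, p_b, known_length_m):
    """Factor converting reconstruction units to metres, from one reference object."""
    measured = math.dist(p_a, p_b)  # length of the object in reconstruction units
    if measured == 0:
        raise ValueError("reference endpoints coincide")
    return known_length_m / measured

def robust_scale(references):
    """Median of per-object estimates, so one bad reference does not dominate.

    references: iterable of (p_a, p_b, known_length_m) tuples.
    """
    return statistics.median(metric_scale(a, b, length) for a, b, length in references)

def apply_scale(points, scale):
    """Rescale every point of the reconstruction into metres."""
    return [tuple(c * scale for c in p) for p in points]

# Hypothetical references: (endpoint, endpoint, assumed real length in metres).
references = [
    ((0.0, 0.0, 0.0), (0.0, 0.5, 0.0), 2.0),   # door frame, assumed 2.0 m tall
    ((1.0, 0.0, 0.0), (2.0, 0.0, 0.0), 4.2),   # car, assumed 4.2 m long
    ((0.0, 0.0, 1.0), (0.0, 0.0, 1.1), 1.0),   # mislabelled object, acts as an outlier
]

scale = robust_scale(references)
points_m = apply_scale([(1.0, 1.0, 1.0)], scale)
```

The median is one simple choice here. With a model guessing sizes from object categories, individual estimates will be noisy (cars and trees vary a lot), so some kind of robust aggregation seems necessary.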