Comment by orochimaaru 6 months ago

Metrics usually add minimal overhead. Traces need to be sampled, and logs need to be sampled at error/critical levels. You also need to be able to change sampling rates and log levels dynamically.

100% traces are a mess. I didn’t see where he set up sampling.

phillipcarter 6 months ago

The post didn't cover sampling, which indeed significantly reduces overhead in OTel, because spans that aren't sampled are never created when you head-sample at the SDK level. Overhead is more of a concern when doing tail-based sampling only, since you'll want to trace every request and offload export to a sidecar so that export concerns are handled outside your app, which then routes to a sampler elsewhere in your infrastructure.

FWIW at my former employer we had some fairly loose guidelines for folks around sampling: https://docs.honeycomb.io/manage-data-volume/sample/guidelin...

There are outliers, but the general idea is that implementing sampling also carries a high cost (especially for nontrivial setups), and if your volume isn't terribly high, you'll probably spend more in engineering time than you'd pay for the extra data you may not actually need.

  • nikolay_sivko 6 months ago

    As suggested, I measured the overhead at various sampling rates:

    No instrumentation (otel is not initialized): CPU=2.0 cores

    SAMPLING 0% (otel initialized): CPU=2.2 cores

    SAMPLING 10%: CPU=2.5 cores

    SAMPLING 50%: CPU=2.6 cores

    SAMPLING 100%: CPU=2.9 cores

    Even with 0% sampling, OpenTelemetry still adds overhead due to context propagation, span creation, and instrumentation hooks.