Comment by Ciantic 3 days ago

10 replies

The utility of tracing is great, I've been using Azure Application Insights with NodeJS (and of course in .NET). This is relatively simple because it monkey patches itself everywhere if you go through the "classic" SDK route. Then adding your own data to logs is just a few simple functions trackTrace, trackException, trackEvent, etc.

However, if you want to figure out how it works, you might be scared: it is not lightweight. I just spent a few days digging through the Azure Application Insights NodeJS code base, which integrates with OpenTelemetry packages. It's an utter mess, with a huge amount of abstraction. Adding it to the project brought in 100 MB and around 40 extra packages.
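For readers unfamiliar with the "monkey patches itself everywhere" approach, here is a minimal sketch of the general technique in Python. It is not the Application Insights SDK's code: `instrument` and the `RECORDED` list are hypothetical names, and the list stands in for a real telemetry exporter.

```python
import functools
import time

# Hypothetical in-memory sink standing in for a telemetry exporter.
RECORDED = []


def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records call durations."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised.
            RECORDED.append({
                "name": f"{module.__name__}.{func_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, func_name, wrapper)
    return original  # returned so the caller can restore it later
```

Calling `instrument(json, "dumps")` once at startup would make every later `json.dumps` call append a record, with no changes at the call sites. That convenience is also why this style of SDK ends up touching so many libraries.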

the_duke 2 days ago

This isn't just a problem in JS.

In every language I looked at, the OTel libraries were a bloated, over-abstracted, and resource-hungry mess.

I think that's partially because it is actually difficult and complex to implement, and partially because the implementations are often written by devs without a long history of implementing similar systems.

  • mcronce 2 days ago

    It's been a bit since I've added it to an existing project, but at least as of a year or so ago, the Rust implementation (tracing + tracing-opentelemetry + opentelemetry-jaeger specifically for that project) was similar.

    The impact on compile time and code size wasn't bad (for a project that was large and already pulling in a lot of crates), but it had a huge runtime cost, mostly allocator pressure in the form of temp allocations from what I could see. For a mostly I/O-bound workload, it more than doubled CPU utilization and ballooned OS-measured memory consumption by >30%.

  • deathanatos 2 days ago

    The OpenTelemetry spec is a mess. There's so much … abstract blah blah blah? … and very little actual details.

    If I actually go to the part of the spec that I think gets down to "here is how to concretely write OpenTelemetry stuff" [1], that seems to have the various attributes camelCased, for example, whereas the article has named them "spanID" and "traceID".

    AFAICT the "spec" also just links you to the implementation. "Just" read this protobuf definition, translate that to JSON in your mind's eye. I "POST" this to a hard-coded path tacked onto a URL… but do I post individual traces/logs? Can I batch them? I'm sure there's a gRPC thing I could start guessing from…

    But it seems like the JSON stuff is a second-class citizen compared to the gRPC interface. Unless that's just as bad, too…
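A minimal sketch of what the OTLP/HTTP JSON encoding looks like in practice, as far as the page linked as [1] and the trace protobuf definition describe it: field names are lowerCamelCase, trace and span IDs are hex strings, 64-bit integers are sent as decimal strings, and the body is POSTed to `/v1/traces`. Because `resourceSpans` and `spans` are lists, one request can carry a batch. The helper names, the `manual-sketch` scope name, and the collector address `http://localhost:4318` are assumptions for illustration.

```python
import json
import secrets
import time
import urllib.request


def make_span(name, trace_id, parent_span_id=None, attributes=None):
    """Build one span dict in the OTLP JSON shape."""
    now = time.time_ns()
    span = {
        "traceId": trace_id,                      # 16 bytes, hex-encoded
        "spanId": secrets.token_hex(8),           # 8 bytes, hex-encoded
        "name": name,
        "kind": 1,                                # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now),            # 64-bit ints travel as strings
        "endTimeUnixNano": str(now + 1_000_000),  # pretend the span took 1 ms
        "attributes": [
            {"key": k, "value": {"stringValue": str(v)}}
            for k, v in (attributes or {}).items()
        ],
    }
    if parent_span_id:
        span["parentSpanId"] = parent_span_id
    return span


def make_export_request(service_name, spans):
    """Wrap a batch of spans in the resource/scope envelope."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}}
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-sketch"},  # hypothetical scope name
                "spans": spans,
            }],
        }]
    }


def post_traces(payload, base_url="http://localhost:4318"):
    """POST the batch to an OTLP/HTTP receiver (assumed local collector)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/traces",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Usage would be something like `post_traces(make_export_request("demo", [root, child]))`, where both spans share one `trace_id` and the child carries the root's `spanId` as its `parentSpanId`.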

    Actually getting set up in Python isn't too terrible, though there are a few classes that you're like "what's the point of this?" and most of them are apparently just undoc'd. (E.g., [2], ^F TraceProvider, get nothing.)

    It is a bit depressing how this seems to be becoming The Chosen Spec.

    I also sort of hate 64-bit integers for span IDs (TFA never mentions it, but AFAICT this is required by spec). I'd much rather have "/span/ids/are/a/tree" span IDs, as this integrates much better with any logging system: I can easily ask my log viewer to filter to a specific span (span_id == "/spans/a/b/c") or to a subtree (span_id regex-matches /^\/spans\/a\/.*/).
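A small sketch of the two filters described here, assuming log records are dicts with a path-style `span_id` field. This is the commenter's proposed scheme, not anything OpenTelemetry defines, and the function names are made up.

```python
import re


def in_span(record, span_id):
    """True if the record belongs to exactly this span."""
    return record["span_id"] == span_id


def in_subtree(record, root):
    """True if the record belongs to `root` or any span nested under it."""
    # Escape the path so it is matched literally, and require a "/" before
    # any suffix so that "/spans/a" does not match "/spans/ab".
    pattern = re.compile(r"^" + re.escape(root.rstrip("/")) + r"(/.*)?$")
    return pattern.match(record["span_id"]) is not None
```

Unlike the regex in the comment, `in_subtree` also matches the root span itself; dropping the `?` would restrict it to descendants only.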

    (And the spec bizarrely focuses on some sort of language-abstract API, instead of … actual data types / media types?)

    [1]: https://opentelemetry.io/docs/specs/otlp/#otlphttp

    [2]: https://opentelemetry-python.readthedocs.io/en/latest/api/tr...

  • snuxoll 2 days ago

    The .NET implementation is about as clean as it can get, but a lot of that has to do with Microsoft caring very deeply about this kind of performance data for a very long time (thus having the entire System.Diagnostics namespace).

    There’s certainly still some gratuitous abstraction, but it’s better than most of the architecture-astronaut code I’ve seen targeting the CLR.

chucklenorris 2 days ago

Yes, this is exactly my impression too. The code for opentelemetry-js is over-engineered and adds a scary amount of dependency code. There are quite a few libraries where I'm not sure what they do or in which scenarios I might need them. The documentation is not very helpful either. I look forward to someone implementing an opentelemetry-nano package with only the minimum needed, one that lets me choose extra support for my dependencies or gives me an easy way of adding my own wrappers.

  • pimeys 2 days ago

    Also badly documented. If you try to implement something non-standard with it, good luck. I once needed to write code where a trace started in Node and continued inside a Node-API native library. Getting these two traces to connect must be one of the most frustrating things I've worked on.
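One common way to connect a trace across a boundary like this is to hand the callee a W3C Trace Context `traceparent` string and have it create its spans under the parsed IDs. Whether that would have worked in this particular Node-API case is not something the comment says; the sketch below only shows the version `00` header format, with hypothetical helper names.

```python
import re
import secrets

# version "00", 16-byte trace ID, 8-byte parent span ID, 1 byte of flags,
# all lowercase hex, joined by dashes.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def format_traceparent(trace_id, span_id, sampled=True):
    """Serialize the current span's identity for the other side to pick up."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header):
    """Return the parent context as a dict, or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

The caller would pass `format_traceparent(...)` as an ordinary string argument, and the native side would start its spans with the same `trace_id` and the received span ID as parent, so a backend can join the two halves.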

    At least on the Rust side you have types to help you out, but it is still quite complex and the crates have bugs open for years, impossible to solve with the current architecture.

lastartes 2 days ago

I had a lot of fun wading through that mess in the past trying to determine why something wasn't working. A fun fact that I just learned is that the node sdk is now just a shim over https://www.npmjs.com/package/@azure/monitor-opentelemetry. It seems like the future is just using that package directly which hopefully improves the situation. One benefit is you can extend it with OTel instrumentation packages.

wordofx 2 days ago

What are your plans for the Application Insights sunsetting?

  • MuffinFlavored 2 days ago

    https://azure.microsoft.com/en-us/updates/we-re-retiring-cla...

    Do you have a link for what you are speaking of?

    • wordofx 2 days ago

      There’s no public announcement yet, but from what reps say to customers and what people working on Azure say, App Insights is more or less being wound down in favor of building out open source solutions, because that’s more favourable and takes less maintenance and dev work than building out their own solution. Think more OTEL/Grafana. Basically, word on the inside is that MS doesn’t want to pay to build out App Insights.