jcgrillo 11 hours ago

Can you expand upon this? Why would I have more than one point per timestamp? Or am I misunderstanding?

Let's say I'm measuring some quantity, maybe the execution time of an HTTP request handler, and that handler is firing roughly every millisecond and taking a certain amount of time to complete. I'd have about a thousand measurements per second, each with its own timestamp--which to be clear can be aliased if two things happen in the same nanosecond! It's totally fine to have a delta of zero. But the point is this value is scalar--it's represented by a single point.

But it seems like you're suggesting vector-valued measurements are a common thing as well--e.g. I should expect to send multiple points per measurement? I'm struggling to think of an application where I'd want this... I guess it would be easy enough to add more columns, e.g. values0, values1, ...

EDIT: oh, I see, I think you're saying I should locally aggregate the measurements with some aggregation function and publish the aggregated values... Yeah, that's something I'd really prefer to avoid if possible. By aggressively aggregating to a coarse timestamp we throw away lots of interesting frequency information. But either way, I don't think that really affects this format much. You could totally use it for an aggregated measurement as well. And yeah, each of these Measurements objects represents a timeseries of measurements--we'd simultaneously append to a bunch of them, one for each timeseries. I probably should have called it "Timeseries" instead.

EDIT2: It might be worth spelling out a bit why this format is efficient. It has to do with the details of how Google Protocol Buffers are encoded[1]. In particular, the timestamps are very cheap so long as the delta is small, and the values can also be cheap if they're small numbers--e.g. also deltas--which is usually the case for phenomena that vary ~continuously and sufficiently slowly. Moreover, packed repeated fields[2] (packing is already the default for scalar numeric types in proto3, so specifying it is unnecessary, but I included it here to be explicit about what I mean) are more efficient still because they omit the per-element tags and are just a length followed by a bunch of variable-width encoded values. So this is leaning on packing, varints, and delta-encoding to be as compact as possible.

[1] https://protobuf.dev/programming-guides/encoding/
[2] https://protobuf.dev/programming-guides/encoding/#packed
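
To make the size argument concrete, here is a rough sketch that computes how many bytes a packed repeated field would take on the wire, with and without delta-encoding. It does not use the protobuf library; it reimplements the varint and ZigZag rules described in [1], and the helper names (encode_varint, packed_size, deltas, and so on) are made up for illustration. It assumes nanosecond timestamps spaced about a millisecond apart and a field number below 16, so the tag fits in one byte. One caveat it illustrates: with a plain int64 field a negative delta is encoded as a 10-byte varint, so signed deltas are only cheap if the field is declared sint64 (ZigZag) instead.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def int64_as_varint(n: int) -> bytes:
    """How an int64 field is encoded: two's complement, then varint."""
    return encode_varint(n & MASK64)


def zigzag64(n: int) -> int:
    """ZigZag mapping used by sint64: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def sint64_as_varint(n: int) -> bytes:
    """How a sint64 field is encoded: ZigZag, then varint."""
    return encode_varint(zigzag64(n))


def packed_size(values, encode) -> int:
    """Wire size of one packed repeated field: 1 tag byte + length varint + payload."""
    payload = b"".join(encode(v) for v in values)
    return 1 + len(encode_varint(len(payload))) + len(payload)


def deltas(xs):
    """Keep the first element, then store successive differences."""
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])]


if __name__ == "__main__":
    # Hypothetical data: 1000 timestamps in nanoseconds, one per millisecond.
    base = 1_700_000_000_000_000_000
    timestamps = [base + i * 1_000_000 for i in range(1000)]

    print("raw timestamps:  ", packed_size(timestamps, int64_as_varint), "bytes")
    print("delta timestamps:", packed_size(deltas(timestamps), int64_as_varint), "bytes")

    # A negative delta under int64 vs. sint64.
    print("int64  -1:", len(int64_as_varint(-1)), "bytes")
    print("sint64 -1:", len(sint64_as_varint(-1)), "bytes")
```

With these assumptions the delta-encoded timestamps come out at roughly a third of the raw size, and aliased timestamps (a delta of zero) cost a single byte each.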

  • bboreham 10 hours ago

    Take a step back. You wrote:

        repeated int64 values 
    
    I’m saying that in most cases there will only be one value, hence ‘repeated’ is unnecessary.

    I didn’t say anything about aggregation, but yes, when things are happening a thousand times per second one counts them rather than sending all the detail. The OTel signal to use if you want detail is traces, not metrics.

    • jcgrillo 9 hours ago

      > Take a step back.

      Excuse me? Modify your tone, read what I wrote again, and this time make an effort to understand it. I'd be happy to answer any questions you might have.

      I'm sorry if this sounds harsh but I truly cannot tell if you're trolling or what.. I think I made a serious effort to understand what you were talking about, and it seems like you haven't done the same.