Comment by bboreham
Generally you only have one point to send per series. You send all the points for ‘now’, then in N seconds you send them all again.
Take a step back. You wrote:
repeated int64 values
I’m saying that in most cases there will only be one value, hence ‘repeated’ is unnecessary. I didn’t say anything about aggregation, but yes, one counts things going at a thousand per second rather than sending all the detail. The OTel signal if you want detail is traces, not metrics.
> Take a step back.
Excuse me? Modify your tone, read what I wrote again, and this time make an effort to understand it. I'd be happy to answer any questions you might have.
I'm sorry if this sounds harsh but I truly cannot tell if you're trolling or what. I think I made a serious effort to understand what you were talking about, and it doesn't seem like you've done the same.
Can you expand upon this? Why would I have more than one point per timestamp? Or am I misunderstanding?
Let's say I'm measuring some quantity, maybe the execution time of an http request handler, and that handler is firing roughly every millisecond and taking a certain amount of time to complete. I'd have about a thousand measurements per second, each with its own timestamp--which, to be clear, can be aliased if something happens in the same nanosecond! It's totally fine to have a delta of zero. But the point is this value is scalar--it's represented by a single point.
But it seems like you're suggesting vector-valued measurements are a common thing as well--e.g. I should expect to send multiple points per measurement? I'm struggling to think of an application where I'd want this. I guess it would be easy enough to add more columns, e.g. values0, values1, ...
EDIT: oh, I see, I think you're saying I should locally aggregate the measurements with some aggregation function and publish the aggregated values. Yeah, that's something I'd really prefer to avoid if possible. By aggressively aggregating to a coarse timestamp we throw away lots of interesting frequency information. But either way, I don't think that really affects this format much. You could totally use it for an aggregated measurement as well. And yeah, each of these Measurements objects represents a timeseries of measurements--we'd simultaneously append to a bunch of them, one for each timeseries. I probably should have called it "Timeseries" instead.
EDIT2: It might be worth spelling out a bit why this format is efficient. It has to do with the details of how Google Protocol Buffers are encoded[1]. In particular, the timestamps are very cheap so long as the delta is small, and the values can also be cheap if they're small numbers--e.g. also deltas--which for ~continuously and sufficiently slowly varying phenomena is usually the case. Moreover, packed repeated fields[2] (which are unnecessary to specify in proto3, but which I included here to be explicit about what I mean) are further efficient because they omit per-element tags and are just a length followed by a bunch of variable-width encoded values. So this is leaning on packing, varints, and delta-encoding to be as compact as possible.
[1] https://protobuf.dev/programming-guides/encoding/
[2] https://protobuf.dev/programming-guides/encoding/#packed
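To make the size argument concrete, here's a rough sketch (plain Python, not using the protobuf library; the field names and the 1 kHz nanosecond-timestamp workload are illustrative assumptions, not part of any real schema) of what delta-encoding into a packed sint64 field looks like on the wire:

```python
# Sketch of the packed-varint body for a hypothetical field like
# `repeated sint64 timestamps`, delta-encoded before packing.

def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def zigzag(n: int) -> int:
    """ZigZag-map signed ints so small negative deltas stay small (sint64)."""
    return (n << 1) ^ (n >> 63)

def pack_deltas(values):
    """Delta-encode, ZigZag, varint-encode, and concatenate.

    This is the body of a packed repeated field: no per-element tags,
    just varints back to back (the length prefix is omitted here).
    """
    body = bytearray()
    prev = 0
    for v in values:
        body += encode_varint(zigzag(v - prev))
        prev = v
    return bytes(body)

# ~1 kHz measurements with nanosecond timestamps: the raw absolute values
# would each cost 9 varint bytes, but after delta-encoding only the first
# element is large; every subsequent delta (~1_000_000 ns) fits in 3 bytes.
timestamps = [1_700_000_000_000_000_000 + i * 1_000_000 for i in range(1000)]
packed = pack_deltas(timestamps)
print(len(packed))  # → 3006 (one 9-byte varint + 999 three-byte varints)
```

So for this workload the deltas bring the cost to ~3 bytes per point instead of ~9, and the same trick applies to the values themselves when the measured quantity varies slowly.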