Comment by bboreham

Can you expand upon this? Why would I have more than one point per timestamp? Or am I misunderstanding?

Let's say I'm measuring some quantity, maybe the execution time of an HTTP request handler, and that handler fires roughly every millisecond and takes some amount of time to complete. I'd have about a thousand measurements per second, each with its own timestamp, and to be clear those timestamps can collide if two events happen in the same nanosecond! A delta of zero is totally fine. But the point is that each value is a scalar: it's represented by a single point.

But it seems like you're suggesting vector-valued measurements are a common thing as well, e.g. that I should expect to send multiple points per measurement? I'm struggling to think of an application where I'd want this. I guess it would be easy enough to add more columns, e.g. values0, values1, ...

EDIT: oh, I see, I think you're saying I should locally aggregate the measurements with some aggregation function and publish the aggregated values. Yeah, that's something I'd really prefer to avoid if possible: by aggressively aggregating to a coarse timestamp we throw away lots of interesting frequency information. But either way, I don't think that really affects this format much; you could totally use it for an aggregated measurement as well. And yeah, each of these Measurements objects represents one timeseries of measurements, and we'd append to a bunch of them simultaneously, one per timeseries. I probably should have called it "Timeseries" instead.
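To make the "one object per timeseries" idea concrete, here is a minimal Python sketch of a hypothetical in-memory builder that delta-encodes timestamps and values as points are appended, with one builder per series. The class name `Timeseries`, the helper `record`, and the field names are illustrative assumptions, not the actual Measurements message; values are assumed to be integers (e.g. microseconds) so the deltas stay exact.

```python
from dataclasses import dataclass, field


@dataclass
class Timeseries:
    """Hypothetical builder for one delta-encoded series.

    Field names are illustrative, not taken from the real Measurements message.
    """
    timestamp_deltas: list[int] = field(default_factory=list)
    value_deltas: list[int] = field(default_factory=list)
    _last_ts: int = 0
    _last_value: int = 0

    def append(self, timestamp_ns: int, value: int) -> None:
        # Store differences from the previous point; a zero timestamp delta
        # (two samples in the same nanosecond) is perfectly valid.
        self.timestamp_deltas.append(timestamp_ns - self._last_ts)
        self.value_deltas.append(value - self._last_value)
        self._last_ts = timestamp_ns
        self._last_value = value

    def decode(self) -> list[tuple[int, int]]:
        # Running sums undo the delta encoding exactly.
        points, ts, value = [], 0, 0
        for dt, dv in zip(self.timestamp_deltas, self.value_deltas):
            ts += dt
            value += dv
            points.append((ts, value))
        return points


# One builder per metric name; each sample is appended to its own series.
series: dict[str, Timeseries] = {}


def record(name: str, timestamp_ns: int, value: int) -> None:
    series.setdefault(name, Timeseries()).append(timestamp_ns, value)
```

Decoding is just a running sum, so nothing is lost; the first delta in each series is simply the absolute timestamp and value.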

EDIT2: It might be worth spelling out a bit why this format is efficient. It comes down to the details of how Google Protocol Buffers are encoded[1]. Integers are written as varints, so the timestamps are very cheap as long as the deltas are small, and the values can be cheap too if they're small numbers (e.g. also deltas), which is usually the case for ~continuously and sufficiently slowly varying phenomena. Moreover, packed repeated fields[2] (packing is the default in proto3, so specifying it is unnecessary, but I included it here to be explicit about what I mean) are even more efficient because they omit per-element tags: the whole field is a single tag and a length followed by the concatenated variable-width values. So this leans on packing, varints, and delta encoding to be as compact as possible.
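A rough sketch of the wire-level arithmetic, in Python, following the protobuf encoding guide: base-128 varints, ZigZag mapping for signed values, and a packed field written as one tag, one length, then the concatenated varints. The function names are hypothetical helpers, not part of any protobuf library. One caveat worth knowing when the values are deltas: a negative number in a plain `int64` field is encoded as a 10-byte varint, while `sint64` applies ZigZag first, so small negative deltas also stay at one or two bytes.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF  # two's-complement view of an int64


def zigzag(n: int) -> int:
    """ZigZag-map a signed integer (sint64 style): 0->0, -1->1, 1->2, -2->3."""
    return 2 * n if n >= 0 else -2 * n - 1


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, MSB set on all but the last byte."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_packed(field_number: int, values: list[int], signed: bool = False) -> bytes:
    """Encode a packed repeated integer field as a single LEN record.

    signed=True mimics sint64 (ZigZag); otherwise values are treated like
    int64, where a negative number becomes its 64-bit two's complement.
    """
    payload = b"".join(
        encode_varint(zigzag(v) if signed else v & MASK64) for v in values
    )
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = LEN
    return tag + encode_varint(len(payload)) + payload
```

For example, a thousand timestamp deltas of about 1 ms (1,000,000 ns) cost 3 bytes each, so `encode_packed(1, [1_000_000] * 1000)` comes to 3003 bytes including tag and length, versus 8000 bytes for the same timestamps as fixed 64-bit integers. The value deltas `[5, -7]` take 4 bytes with `signed=True` but 13 bytes without, because of the 10-byte negative varint.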

[1] https://protobuf.dev/programming-guides/encoding/
[2] https://protobuf.dev/programming-guides/encoding/#packed