Comment by saghm a day ago

I often tell younger engineers that the human brain is the slowest, lowest-memory, and most error-prone runtime for a program. If they're stuck trying to figure out a bug, one of the most effective things they can do is validate their assumptions about what's happening, because there wouldn't be a bug if everything was happening exactly according to expectations.

alphazard a day ago

> wouldn't be a bug if everything was happening exactly according to expectations

This isn't quite true, especially concerning distributed systems. It's relatively common for a software system to be broken by design. It's not that the developer didn't know how to use the programming language to get the computer to do what they want. It's that what the developer wanted reflects a poor model of the world, a logical inconsistency, or just a behavior which is confusing to users.

  • saghm a day ago

    Keep in mind I said that this is advice I give junior engineers specifically; they shouldn't be the ones responsible for designing distributed systems in the first place. For someone at that point in their career, this advice is meant to help them learn the skills they need to solve the problems they're dealing with. It's not intended to be universal to all circumstances, just a useful thing to keep in mind.

  • monocasa a day ago

    That sounds distinctly like an expectation that didn't hold.

    • Stefan-H a day ago

      "a poor model of the world, a logical inconsistency, or just a behavior which is confusing to users" I expect when I pull from the queue (but it was designed non-atomically) that I will be guaranteed to only grab the item once and only once, even after a failure. That expectation is wrong, but the developer may have implemented their intent perfectly, they just didn't understand that there are error cases they didn't account for.

wruza a day ago

That’s why I learned to log literally everything to stdout, unless a process is time-sensitive, it’s deep in production, it has passed the point where bugs and insights come up no more than once a month, and there’s zero chance of someone asking me what exactly happened with X around Y-ish last Friday afternoon.

The obvious exception is recursive number-fiddling algos, which would spam gigabytes of output due to big N.

This way I can just read assumptions and see branches taken and what’s wrong as if it was written in plain text.

When I see klocs without a single log statement, to me it’s readonly and not worth touching. If you’re stuck with a bug, log everything and you’ll see it right there.
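
For the curious, a rough Go sketch of what that reads like (the handler and all the names are made up, just to show one log line per branch):

    package main

    import "log"

    // Toy handler in the "log every branch" style: each decision point says
    // what it saw, so the stdout trace reads like the path through the code.
    func handle(id string, cached bool) {
        log.Printf("handle id=%s cached=%v", id, cached)
        if cached {
            log.Printf("handle id=%s: cache hit, skipping fetch", id)
            return
        }
        log.Printf("handle id=%s: cache miss, fetching", id)
        if id == "" {
            log.Printf("handle id=%s: empty id, giving up", id)
            return
        }
        log.Printf("handle id=%s: done", id)
    }

    func main() {
        handle("a1", true)
        handle("", false)
    }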

  • jimbokun a day ago

    For large systems the cost of maintaining all of those logs in a searchable system can be prohibitive.

    • lanstin a day ago

      Just reduce how long you keep the logs until you can afford it. Also, as he mentioned, once a system is hitting bugs infrequently, you can lower the log level. My standard is to have a log msg for each branch in the code. In C, I would use macros to also keep a count of every fmt string the log package encountered (so I still got a sort of profile of the logic flows, without the sprintf expense), but I haven't figured out an efficient way to do that in Go yet (i.e. without using introspection).
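
      Purely as a sketch of one way to approximate that macro trick in Go (a sync.Map keyed by the fmt string, counted before the level check; whether that per-call lookup is cheap enough is exactly the open question above):

        package main

        import (
            "fmt"
            "sync"
            "sync/atomic"
        )

        // Count every fmt string the logger sees, even when the line is
        // suppressed, so a profile of which branches ran survives at a low
        // log level without paying for the formatting itself.
        var (
            siteCounts sync.Map // fmt string -> *uint64
            verbose    bool
        )

        func logf(format string, args ...any) {
            c, _ := siteCounts.LoadOrStore(format, new(uint64))
            atomic.AddUint64(c.(*uint64), 1)
            if verbose {
                fmt.Printf(format+"\n", args...)
            }
        }

        func main() {
            for i := 0; i < 3; i++ {
                logf("worker: picked item %d", i)
            }
            logf("worker: queue drained")
            // Dump the profile: hit count per fmt string.
            siteCounts.Range(func(k, v any) bool {
                fmt.Printf("%6d  %s\n", atomic.LoadUint64(v.(*uint64)), k)
                return true
            })
        }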

nfw2 a day ago

This is one of the reasons I think that the push towards server-side UI is misguided. It's much easier to walk through the runtime of a program running locally than it is to step through a render that's distributed across a network.