Comment by neRok a day ago

I agree that the first example in the article is "bad"...

  fn frobnicate(walrus: Option<Walrus>)
but the rest makes no sense to me!

  // GOOD
  frobnicate_batch(walruses)
  // BAD
  for walrus in walruses {
    frobnicate(walrus)
  }
It doesn't follow through with the "GOOD" example though...

  fn frobnicate_batch(walruses) {
    for walrus in walruses { frobnicate(walrus) }
  }
What did that achieve?

And the next example...

  // GOOD
  if condition {
    for walrus in walruses { walrus.frobnicate() }
  } else {
    for walrus in walruses { walrus.transmogrify() }
  }
  // BAD
  for walrus in walruses {
    if condition { walrus.frobnicate() }
    else { walrus.transmogrify() }
  }
What good is that when...

  walruses = get_5_closest_walruses()
  // "GOOD"
  if walruses.has_hungry() { feed_them_all() }
  else { dont_feed_any() }
  // "BAD"
  for walrus in walruses {
    if walrus.is_hungry() { feed() }
    else { dont_feed() }
  }

magicalhippo a day ago

> What did that achieve?

An interface where the implementation can later be changed to do something more clever.

At work we have a lot of legacy code written the BAD way, i.e., the caller loops, which means that if we want to improve performance we have to change dozens of call sites rather than just one implementation.

This makes it significantly more difficult than it could have been.
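
To make that concrete, here's a minimal sketch (hypothetical names, Python for brevity): the batch function can start out as exactly the loop the caller would have written, and later be reimplemented without touching any call site.

    def frobnicate(walrus):
        ...  # per-item work (stub)

    def frobnicate_batch(walruses):
        # v1: exactly the loop the caller would otherwise write
        for walrus in walruses:
            frobnicate(walrus)

    # Later, only this one body changes, e.g. to a parallel version,
    # while every call site keeps calling frobnicate_batch(...):
    #
    # def frobnicate_batch(walruses):
    #     with concurrent.futures.ThreadPoolExecutor() as pool:
    #         list(pool.map(frobnicate, walruses))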

  • lblume a day ago

    Two counterpoints.

    Firstly, in many cases the function needs to serve both purposes: being called on a single item and being called on a sequence of them. A batch-only function would have to be called with a one-element sequence or iterator, which is both unergonomic and can have performance implications.

    Second, the caller might have more information than the callee about how to optimize the loop. Consider a function that is computationally expensive for some inputs but negligible for others: the caller, knowing which is which, could parallelize the expensive inputs while letting the cheap ones vectorize or inline. This would be very hard, or at least considerably more complicated, if it were the callee's responsibility.
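
    A sketch of that second point (is_expensive and frobnicate are hypothetical and passed in by the caller): the caller, knowing which inputs are costly, can split the work in a way a generic batch callee could not.

      import concurrent.futures

      def frobnicate_all(inputs, is_expensive, frobnicate):
          # caller-side knowledge: parallelize the costly inputs,
          # run the cheap ones in a plain (inlinable) loop
          expensive = [x for x in inputs if is_expensive(x)]
          cheap = [x for x in inputs if not is_expensive(x)]
          with concurrent.futures.ThreadPoolExecutor() as pool:
              list(pool.map(frobnicate, expensive))
          for x in cheap:
              frobnicate(x)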

jerf a day ago

I think the "push for loops down" advice is missing a bit of detail about the why. The author alludes to "superior performance" but doesn't, I think, make it clear how that can happen.

Vectorization is a bit obscure and a lot of coders aren't worried about whether their code vectorizes, but there's a much more common example that I have seen shred the performance of a lot of real-world code bases and HTTP APIs, which is functions (including APIs) that take only a single thing when they should take the full list.

Suppose we have posts in a database, like for a forum or something. Consider the difference between:

    posts = {}
    for id in postIDs:
        posts[id] = fetchPost(id)
versus

    posts = fetchPosts(postIDs)
fetchPost and fetchPosts both involve hitting the database. The singular version means that the resulting SQL will, by necessity, only have the one ID in it, and as a result, a full query will be made per post. This is a problem because it's pretty likely here that fetching a post is a very fast (indexed) operation, so the per-query overhead is going to hit you hard.

The plural "fetchPosts", on the other hand, has all the information necessary to query the DB in one shot for all the posts, which is going to be much faster. An architecture based on fetching one post at a time is intrinsically less performant in this case.
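
A minimal sketch of the plural version (Python with sqlite3; the table and column names are invented): one round trip with a placeholder per ID, instead of one query per post.

    import sqlite3

    def fetch_posts(conn, post_ids):
        # one round trip: build "IN (?, ?, ...)" with a placeholder per id
        post_ids = list(post_ids)
        placeholders = ", ".join("?" for _ in post_ids)
        rows = conn.execute(
            f"SELECT id, body FROM posts WHERE id IN ({placeholders})",
            post_ids,
        )
        return {post_id: body for post_id, body in rows}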

This opens up even more in the HTTP API world, where a single query is generally of even higher overhead than a DB query. I think the most frequent mistake I see in HTTP API design (at least, ignoring quibbling about which method and error code scheme to use) is providing APIs that operate on one thing at a time when the problem domain naturally lends itself to operating on arrays (or map/objects/dicts) at a time. It's probably a non-trivial part of the reason why so many web sites and apps are so much slower than they need to be.
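
As a hedged illustration (the batch endpoint shape here is hypothetical; real APIs vary), the client-side difference is one request versus N:

    import requests

    post_ids = [1, 2, 3]

    # N round trips, one per post:
    posts = [requests.get(f"https://api.example.com/posts/{pid}").json()
             for pid in post_ids]

    # one round trip for the whole set, given a batch endpoint:
    posts = requests.post("https://api.example.com/posts:batchGet",
                          json={"ids": post_ids}).json()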

I find it is often easy to surprise other devs with how fast your system works. This is one of my "secrets" (please steal it!): you avoid as many "per-thing" penalties as possible by keeping sets of things together as long as possible. The "per-thing" penalties can really sneak up on you. Like nested for loops, they can easily start stacking up if you're not careful, as the inability to fetch all the posts at once cascades into, say, fetching user avatars one-by-one in some other loop, and then a series of other individual queries. The best part is, profiling may make it look like the problem is the DB ("the DB is taking a long time to serve this"), because profiles are not always good at revealing that your problem is per-item overhead rather than the amount of real work being done.
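
Continuing the earlier sketch (fetch_avatars is hypothetical, and fetch_posts is assumed here to return rows that include a user_id), the fix composes: each stage stays batched, so the query count stays constant instead of multiplying.

    posts = fetch_posts(conn, post_ids)         # one query, not N
    user_ids = {p["user_id"] for p in posts}    # collect across the whole set
    avatars = fetch_avatars(conn, user_ids)     # one more query, not N more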

  • mnahkies a day ago

    The worst / most amusing example of this I've seen in the wild was a third party line-of-business application that triaged "pending tasks" sequentially, one at a time, assigning each a priority and a worker.

    Our cloud provider had an aircon/overheating incident in the region we were using, and after it was resolved, network latency between the database and the application was a few milliseconds higher. Turns out that if you multiply those milliseconds by a few million tasks arriving at a fast rate, you get a significant amount of time, and the pending-task queue backs up, delaying the high-priority tasks.

    Based on the traces we had it looked like a classic case of "ORM made it easy to do it this way, and it works fine until it doesn't" but was unfortunately out of our control being a third party product.

    If they'd fetched/processed batches of tasks from the database instead I'm confident it wouldn't have been an issue.
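
    A hedged sketch of that batched alternative (the schema and names are invented): the round-trip latency is paid once per batch rather than once per task, so a few extra milliseconds no longer multiplies by the arrival rate.

      def claim_pending_tasks(conn, batch_size=100):
          # one round trip fetches a whole batch, amortizing network
          # latency over batch_size tasks instead of paying it per task
          return conn.execute(
              "SELECT id, payload FROM tasks WHERE status = 'pending' "
              "ORDER BY priority DESC LIMIT ?",
              (batch_size,),
          ).fetchall()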