Comment by xg15

Comment by xg15 a day ago

1 reply

Doesn't the second rule already imply some counterexamples for the first?

When I work with batches of data, I often end up with functions like this:

  function process_batch(batch) {
    stuff = setUpNeededHelpers(batch);
    results = [];
    for (item in batch) {
      result = process_item(item, stuff);
      results.add(result);
    }
    return results;
  }
Where "stuff" might be various objects, such as counters, lists or dictionaries to track aggregated state, opened IO connections, etc etc.

So the setUpNeededHelpers() section, while not extremely expensive, can have nontrivial cost.

I usually add a clause like

  if (batch.length == 0) {
    return [];
  }
at the beginning of the function to avoid this initialization cost if the batch is empty anyway.

Also, sometimes the initialization requires to access one element from the batch, e.g. to read metadata. Therefore the check also ensures there is at least one element available.

Wouldn't this violate the rule?

Jtsummers a day ago

> Wouldn't this violate the rule?

The article is offering a heuristic, not a hard rule (rule of thumb = heuristic, not dogma). It can't be applied universally without considering your circumstances.

Following his advice to the letter (and ignoring his hedging where he says "consider if"), you'd move the `if (batch.length == 0)` into the callers of `setUpNeededHelpers`. But now you have to make every caller aware that calling the function could be expensive even if there are no contents in `batch` so they have to include the guard, which means you have this scattered throughout your code:

  if (batch.length == 0) { return default }
  setup(batch)
Now it's a pair of things that always go together, which makes more sense to put into one function so you'd probably push it back down.

The advice really is contingent on the surrounding context (non-exhaustive):

1. Is the function with the condition called in only one place? Consider moving it up.

2. Is the function with the condition called in many places and the condition can't be removed (it's not known to be called safely)? Leave the condition in the function.

3. Is the function with the condition called only in places where the guard is redundant? In your example, `batch.length == 0` can be checked in `process_batch`. If all calls to `setup` are in similar functions, you can remove the condition from `setup` and move it up.

4. If it's causing performance concerns (measured), and in many but not all cases the check is unneeded then remove the guard from `setup` and add it back to only those call-sites where it cannot be safely removed. If this doesn't get you any performance improvements, you probably want to move it back down for legibility.

Basically, apply your judgment. But if you can, it's probably (but not always) a good idea to move the ifs up.