Push Ifs Up and Fors Down

Waterluvian 7 months ago

My weird mental model: You have a tree of possible states/program flow. Conditions prune the tree. Prune the tree as early as possible so that you have to do work on fewer branches.

Don’t meticulously evaluate and potentially prune every single branch, only to find you have to prune the whole limb anyways.

Or even weirder: conditionals are about figuring out what work doesn’t need to be done. Loops are the “work.”

Ultimately I want my functions to be about one thing: walking the program tree or doing work.

Reply View 22 replies

igregoryca 7 months ago

This aligns nicely with how things look in the "small-step" flavour of PL theory / lambda calculus.
In the lingo, expressions are evaluated by repeatedly getting "rewritten", according to rules called reduction rules. e.g., (1 + 2) + 4 might get rewritten to 3 + 4, which would then get rewritten to 7.
There are two sorts of these rules. There are "congruence" rules, which direct where work is to be done ("which subexpression to evaluate next?"); and then there are "computation" rules (as Pierce [1] calls them), which actually rewrite the expression, and thus change the program state.
"Strict"/"non-lazy" languages (virtually every popular general-purpose language? except Haskell) are full of congruence rules – all subexpressions must be fully evaluated before a parent expression can be evaluated. The important exceptions are special constructs like conditionals and indefinite loops.
For conditionals in particular, a computation rule will kick in before congruence rules direct all subexpressions to be evaluated. This prunes the expression tree, now in a very literal sense.
[1]: Benjamin C. Pierce, Types and Programming Languages (recommended!)

Reply View | 0 replies
BoorishBears 7 months ago

My mental model: align with the world the very specific code I'm writing lives in. From domain specifics, to existing patterns in the codebase, to the stage in the data pipeline I'm at, performance profile, etc.
I used to try and form these kinds of rules and heuristics for code constructs, but eventually accepted they're at the wrong level of abstraction to be worth keeping around once you write enough code.
It's telling they tend to resort to made up function names or single letters because at that point you're setting up a bit of a punching bag with an "island of code" where nothing exists outside of it, and almost any rule can make sense.
-
Perfect example is the "redundancies and dead conditions" mentioned: we're making the really convenient assumption that `g` is the only caller of `h` and will forever be the only caller of `h` in order to claim we exposed a dead branch using this rule...
That works on the island, but in an actual codebase there's typically a reason why `g` and `h` weren't collapsed into each other to start.

Reply View | 4 replies
- jonahx 7 months ago
  
  I feel this kind of critique, which I see often as a response to articles like this, is so easy as to be meaningless. How is one supposed to ever talk about general principles without using simplified examples?
  Aren't you just saying "Real code is more complicated than your toy example"?
  Well sure, trivially so. But that's by design.
  > Perfect example is the "redundancies and dead conditions" mentioned: we're making the really convenient assumption that `g` is the only caller of `h` and will forever be the only caller of `h` in order to claim we exposed a dead branch using this rule...
  Not really. He's just saying that when you push conditional logic "up" into one place, it's often more readable and sometimes you might notice things you otherwise wouldn't. And then he created the simplest possible example (but that's a good thing!) to demonstrate how that might work. It's not a claim that it always will work that way or that real code won't be more complicated.
  
  Reply View | 3 replies
  
  BoorishBears 7 months ago
  
  Well I guess some comments need to be considered in totality, rather contextomies that enforce whatever point you're trying to make :)
  I spelled out the problem pretty clearly.
  > I used to try and form these kinds of rules and heuristics for code constructs, but eventually accepted they're at the wrong level of abstraction to be worth keeping around once you write enough code.
  It's the wrong level of abstraction to form (useful) principles at, and the example chosen is just a symptom of that.
  I'm not sure why we're acting like I said the core problem with this article is that it uses simple examples.
  
  Reply View | 2 replies
0xWTF 7 months ago

Can I float an adjacent model? Classes are nouns, functions are verbs.

Reply View | 13 replies
- BobbyJo 7 months ago
  
  I like to think of it completely differently: Functions are where you hide things, Classes are where you expose things.
  Functions to me are more about scoping things down than about performing logic. The whole program is about performing logic.
  
  Reply View | 0 replies
- acbart 7 months ago
  
  And then at some point someone shows you how Classes can be verbs, and functions can be nouns, and your brain hurts for a while. You overuse that paradigm for a while, and eventually learn to find the appropriate balance of ideas.
  
  Reply View | 6 replies
  
  2muchcoffeeman 7 months ago
  
  Writing code is like writing though. None of these ideas for structuring code are the be all and end all of coding. Things evolve, sometimes old idea are good, sometimes new.
  Like how the phrase “to boldly go where no man has gone before” will bring out pendants.
  
  Reply View | 1 reply
  
  AStonesThrow 7 months ago
  
  I don't believe that anyone wears pendants much on that show, unless you mean the communicators people wear in TNG. I did have a Romulan keychain once, though.
  
  Reply View | 0 replies
  
  nailer 7 months ago
  
  Haven’t seen that yet after 25 years. It just always seems like lazy naming when this isn’t followed. Maybe I missed something.
  
  Reply View | 2 replies
  
  kiviuq 7 months ago
  
  Example: Object Algebra pattern represents data types ("nouns") as functions.
  
  Reply View | 0 replies
- pshc 7 months ago
  
  http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom...
  
  Reply View | 0 replies
- kjkjadksj 7 months ago
  
  Working with python for a while and I still don’t bother with classes. Only when I “borrow” other code do I mess with them. It just seems like a flowery way to organize functions. I prefer to just write the functions. Maybe its because my first languages lacked classes that I don’t much like to reach for them.
  I don’t even like loops and prefer to functionalize them and run in parallel if sensible.
  I know this makes me a bit of a python heathen but my code runs fast as a result.
  
  Reply View | 0 replies
- Waterluvian 7 months ago
  
  Didn’t the Apollo guidance computers work with VERB and NOUN?
  
  Reply View | 0 replies
- slipnslider 7 months ago
  
  I remember being taught that in CS101 and still use it today 15 years later. It's a good and simple and easy to follow pattern
  
  Reply View | 0 replies
- [removed] 7 months ago
  
  [deleted]
  
  Reply View | 0 replies
nagaiaida 7 months ago

it's not that weird, this taken to its logical conclusion is effectively prolog's execution model

Reply View | 0 replies
Brian_K_White 7 months ago

perfectly good models

Reply View | 0 replies

andyg_blog 7 months ago

A more general rule is to push ifs close to the source of input: https://gieseanw.wordpress.com/2024/06/24/dont-push-ifs-up-p...

It's really about finding the entry points into your program from the outside (including data you fetch from another service), and then massaging in such a way that you make as many guarantees as possible (preferably encoded into your types) by the time it reaches any core logic, especially the resource heavy parts.

Reply View 20 replies

sharno 7 months ago

That's almost the same thing as parse don't validate: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

Reply View | 0 replies
dataflow 7 months ago

Doesn't this obfuscate what assumptions you can make when trying to understand the core logic? You prefer to examine all the call chains everywhere?

Reply View | 18 replies
- fmbb 7 months ago
  
  The ”core logic” of a program is what output it yields for a given input.
  If you find a bug, you find it because you discover that a given input does not lead to the expected output.
  You have to find all those ifs in your code because one of them is wrong (probably in combination with a couple of others).
  If you push all your conditionals up as close to the input as possible, your hunt will be shorter, and fixing will be easier.
  
  Reply View | 0 replies
- avianlyric 7 months ago
  
  This is why we invented type systems. No need to examine call chains, just examine input types. The types will not only tell you what assumptions you can make, but the compiler will even tell you if you make an invalid assumption!
  
  Reply View | 13 replies
  
  dataflow 7 months ago
  
  You can't shove every single assumption into the type system...
  
  Reply View | 12 replies
- furyofantares 7 months ago
  
  The idea and examples are that the type system takes care of it. The rule of thumb is worded overly generally, it's more just about stuff like null checks if you have non-nullable types available.
  
  Reply View | 0 replies
- geysersam 7 months ago
  
  No I don't think so because if you make your assumptions early then the same assumptions exist in the entire program and that makes them easy to reason about
  
  Reply View | 0 replies
- setr 7 months ago
  
  If you’ve massaged and normalized the data at entry, then the assumptions at core logic should be well defined — it’s whatever the rules of the normalized output are.
  You don’t need to know all of the call chains because you’ve established a “narrow waist” where ideally all things have been made clear, and errors have been handled or scoped. So you only need to know the call chain from entry point to narrow waist, and separately narrow waist till end.
  
  Reply View | 0 replies

kazinator 7 months ago

> If there’s an if condition inside a function, consider if it could be moved to the caller instead

This idle conjecture is too rife with counterexamples.

- If the function is called from 37 places, should they all repeat the if statement?

- What if the function is getaddrinfo, or EnterCriticalSection; do we push an if out to the users of the API?

I think that we can only think about this transformation for internal functions which are called from at most two places, and only if the decision is out of their scope of concern.

Another idea is to make the function perform only the if statement, which calls two other helper functions.

If the caller needs to write a loop where the decision is to be hoisted out of the loop, the caller can use the lower-level "decoded-condition helpers". Callers which would only have a single if, not in or around a loop, can use the convenience function which hides the if. But we have to keep in mind that we are doing this for optimization. Optimization often conflicts with good program organization! Maybe it is not good design for the caller to know about the condition; we only opened it up so that we could hoist the condition outside of the caller's loop.

These dilemmas show up in OOP, where the "if" decision that is in the callee is the method dispatch: selecting which method is called.

Techniques to get method dispatch out of loops can also go against the grain of the design. There are some patterns for it.

E.g. wouldn't want to fill a canvas object with a raster image by looping over the image and calling canvas.putpixel(x, y, color). We'd have some method for blitting an image into a canvas (or a rectangular region thereof).

Reply View 26 replies

neoden 7 months ago

> If the function is called from 37 places, should they all repeat the if statement?
the idea here is probably that in this case we might be able to split our function into two implementing true and false branches and then call them from 21 and 16 places respectively

Reply View | 1 reply
- kazinator 7 months ago
  
  That's possible only if the condition is constant-foldable.
  You can achieve it by turning the if part into an inline function.
  Before:
  function(cond, arg) { if (cond) { true logic } else { false logic } }
  after:
  inline function(cond, arg) { cond ? function_true(arg) : function_false(arg) }
  Now you don't do anything to those 37 places. The function is inlined, and the conditional disappears due to cond being constant.
  
  Reply View | 0 replies
panstromek 7 months ago

The keyword here is `consider`. The article targets a somewhat specific design problem where this comes up, especially when you use tagged unions or something similar.

Reply View | 0 replies
PaulRobinson 7 months ago

If the function is called from 37 places, you need to refactor your code, but to answer your question on that point: it depends. DRY feels like the right answer, but I think we'd have to review an actual code example to decide.
On examples where you're talking about a library function, I think you have to accept that as a library you're in a special place: you're on an ownership boundary. Data is moving across domains. You're moving across bounded contexts, in DDD-speak. So, no, you look after your own stuff.
EnterCriticalSection suggests a code path where strong validation on entry - including if conditions - makes sense, and it should be thought of as a domain boundary.
But when you're writing an application and your regular application functions have if statements, you can safely push them out. And within a library or a critical code section you can move the `if` up into the edges of it safely, and not down in the dregs. Manage your domain, don't make demands of other people's and within that domain move your control flow to the edge. Seems a reasonable piece of advice.
However, as ever, idioms are only that, and need to be evaluated in the real world by people who know what they're doing and who can make sensible decisions about that context.

Reply View | 21 replies
- kenjackson 7 months ago
  
  Refactoring due to being called more than N times seems very function dependent. As the prior author noted, I’d expect to call a lock function in some programs a lot. Likewise, memcpy. In fact I’d argue that well factored functionality is often called at many different call sites.
  
  Reply View | 0 replies
- CJefferson 7 months ago
  
  I can't imagine a large program where no function is useful enough to be called more than 37 times. Memory allocation? Printing? Adding a member to a list? Writing to a file?
  I'm guessing you mean something else, or do you feel useful functions can't be called many times in the same program?
  
  Reply View | 0 replies
- jovial_cavalier 7 months ago
  
  Pray tell, how many places is appropriate to call the same function? Is 5 too many? How about 6? When I hit 7, I have to refactor everything, right?
  
  Reply View | 8 replies
  
  cakealert 7 months ago
  
  This only applies to a situation where you have a function that requires dynamic checks for preconditions. I would suggest that such a function (or how it's being used) is likely a blight already, but tolerable with very few call sites. In which case checking at the call site is the right move. And as you continue to abuse the function perhaps the code duplication will prompt you to reconsider what you are doing.
  
  Reply View | 6 replies
  
  tylersmith 7 months ago
  
  You don't need an explicit rule, you just need to be smarter than than the average mid-curve tries-too-hard-to-feel-right hn poster and realize when you're repeating a calling convention too much.
  
  Reply View | 0 replies
- worik 7 months ago
  
  > If the function is called from 37 places, you need to refactor your code,
  Really?
  I do not have to think hard before I have a counter exampl: authentication
  I call authenticate() is some form from every API
  All 37 of them
  
  Reply View | 9 replies
  
  bognition 7 months ago
  
  If you are explicitly calling authenticate() for each api, you’re doing it “wrong”. At that point you want implied authentication not explicit authentication. Why not move it to some middleware that gets called in every api call?
  
  Reply View | 7 replies
  
  kazinator 7 months ago
  
  The strongest interpretation of the remark is not that you need to refactor because you have a function called 37 times (which is likely a good thing) but rather that if you think you need to move an if statement into or out of it, you face refactoring.
  
  Reply View | 0 replies
[removed] 7 months ago

[deleted]

Reply View | 0 replies

layer8 7 months ago

The example listed as “dissolving enum refactor” is essentially polymorphism, i.e. you could replace the match by a polymorphic method invocation on the enum. Its purpose is to decouple the point where a case distinction is established (the initial if) from the point where it is acted upon (the invocation of foo/bar). The case distinction is carried by the object (enum value in this case) or closure and need not to be reiterated at the point of invocation (if the match were replaced by polymorphic dispatch). That means that if the case distinction changes, only the point where it is established needs to be changed, not the points where the distinct actions based on it are triggered.

This is a trade-off: It can be beneficial to see the individual cases to be considered at the points where the actions are triggered, at the cost of having an additional code-level dependency on the list of individual cases.