Comment by SigmundA 2 days ago

8 replies

>I am not quite following, why would we drop multi-processing with isolated tiny heaps with a few KBs each and move to multi-threading with a large multi-megabyte stacks per thread and all sharing and writing to the same heap? It seems like a step backward.

Performance, typically. It's hard to work in parallel on a large amount of data performantly without multiple threads sharing a heap, and you typically don't need a large number of threads because you don't actually have that many real cores to run them on.

Lots of little share-nothing processes are great conceptually, but that does create significant overhead.

rdtsc 2 days ago

> Lots of little share-nothing processes are great conceptually, but that does create significant overhead.

It doesn't, really? I have clusters running 1M+ Erlang processes comfortably per node.

> you typically don't need a large number of threads

Exactly, that's why Erlang spawns just the right number of threads: one scheduler thread per CPU, then a bunch of long-running CPU task threads (the same number as CPUs as well), plus some to do IO (10-20), and that's it.

  • SigmundA 2 days ago

    Erlang is not as performant for heavy computational loads; this is borne out in many benchmarks. That's not what it's good at.

    Message-passing share-nothing adds overhead when trying to reference data between processes, because the data must be copied. How would you do multithreaded processing of a large amount of data without a shared heap in a performant way? Only one thread can work on a heap at a time, so what do you do? Chop up the data, copy it around, then piece it back together afterward? That's overhead vs. just working on a single heap in a lockless way. As far as I can tell, the main Erlang image processing libraries just call out to C libraries that do that kind of work.
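    (A hypothetical sketch of the shared-heap alternative being described, in Go rather than Erlang or C, purely for illustration: workers operate in place on disjoint ranges of one large buffer, so nothing is chopped up, copied around, or pieced back together, and no locks are needed.)

```go
package main

import (
	"fmt"
	"sync"
)

// Double every element of a large shared slice in parallel.
// Each worker owns a disjoint range of the same underlying array,
// so the goroutines share one heap, copy nothing, and need no locks.
func parallelDouble(data []int, workers int) {
	var wg sync.WaitGroup
	chunk := (len(data) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > len(data) {
			hi = len(data)
		}
		if lo >= hi {
			continue
		}
		wg.Add(1)
		go func(part []int) { // part aliases data; nothing is copied
			defer wg.Done()
			for i := range part {
				part[i] *= 2
			}
		}(data[lo:hi])
	}
	wg.Wait()
}

func main() {
	data := []int{1, 2, 3, 4, 5, 6, 7, 8}
	parallelDouble(data, 3)
	fmt.Println(data) // [2 4 6 8 10 12 14 16]
}
```

    In a share-nothing model each worker would instead receive a copy of its chunk and send a copy of the result back, which is exactly the extra work being debated here.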

    Yes, Erlang offloads computation to an OS thread pool; multiplexing all those little Erlang processes onto real threads creates scheduling overhead. Those threads cannot work on the same data at the same time unless they call out to libraries written in another language like C to do the heavy lifting.

    .NET does similar things for, say, web server implementations: it uses a thread pool to execute many concurrent requests, and if you use async it can yield those threads back to the pool while waiting on a DB call to complete. You would not create a thread per HTTP connection, so the 4 MB stack size is not an issue, just like it's not with Erlang's thread pool.

    • rdtsc 2 days ago

      > Erlang is not as performant for heavy computational loads

      Well, sure, don't use it for heavy computational loads if it doesn't work for you. It works great for our use case, and I think Node-RED is also not for heavy computational workloads but rather for event programming, where lots of events happen concurrently.

      > how would you do multithreaded processing of a large amount of data without a shared heap in a performant way?

      That's a specific workload; it's not universal, for sure. I haven't worked on cases where there is a single large data structure and all million clients have to manipulate it concurrently. Not saying it's implausible, I can see a game server with an arena maybe being like that, but plenty of other systems don't work like that.

      • Towaway69 2 days ago

        > I think Node-RED is also not for heavy computational workload

        That is correct. Node-RED is designed for routing many thousands of smallish data packets through some computational units. Each computational unit is basically a node with a message queue, and each message is handled and then passed on.

        This means that if a node takes too long, its queue will potentially overflow, causing failure. However, if there are fewer messages, then nodes can take longer; there is nothing ingrained in Node-RED that prevents having long-running processes and fewer messages.

        As long as message queues don't overflow and messages don't get too large, causing massive memory usage, Node-RED will happily just work.
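        (A minimal sketch of that overflow failure mode, in Go purely for illustration since Node-RED itself is JavaScript: a node's mailbox as a bounded queue, where a full queue means new messages are dropped rather than blocking the sender.)

```go
package main

import "fmt"

// node models the mailbox of a Node-RED-style computational unit:
// a bounded message queue. When the queue is full, offer drops the
// message, which is the overflow failure mode described above.
type node struct {
	queue   chan string
	dropped int
}

func newNode(capacity int) *node {
	return &node{queue: make(chan string, capacity)}
}

// offer tries to enqueue; a full queue means the message is lost.
func (n *node) offer(msg string) bool {
	select {
	case n.queue <- msg:
		return true
	default:
		n.dropped++
		return false
	}
}

func main() {
	n := newNode(2) // a slow node with room for only 2 messages
	for i := 0; i < 5; i++ {
		n.offer(fmt.Sprintf("msg-%d", i))
	}
	fmt.Println(len(n.queue), n.dropped) // 2 3
}
```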

        Node-RED's main use cases are controlling many small devices (home automation) and (I)IoT: collecting datapoints from 100s or 1000s of devices and routing them to wherever the data is needed.

      • SigmundA a day ago

        >Well, sure don't use it for heavy computational loads if it doesn't work for you.

        >why would we drop multi-processing with isolated tiny heaps with a few KBs each and move to multi-threading with a large multi-megabyte stacks per thread and all sharing and writing to the same heap?

        This is the question I was answering, and I gave you an example. Then you proceeded to say there is no overhead in Erlang's process model, which now you admit exists.

        >That's a specific workload, it's not universal, for sure. I haven't worked on cases there is a single large data structure and all million clients have to manipulate it concurrently.

        That's not the workload. Something as simple as a single client doing heavy processing of a large image or video: the multithreaded shared heap wins due to the overhead of Erlang's processing model, and that's why you give it up, which answers your original question again.

        Geez, this is a tough crowd. First Erlang has no overhead, then it's "well, that's not what it's good at," which is what I said. Then it's "well, those are specific workloads"; yeah, serving little messages to a million clients is a specific workload too. I like Erlang; it's very good at specific things, but it sacrifices quite a bit of performance to do that, like many good abstractions. That can help overall reliability and performance on certain workloads, but other modern languages following the shared-heap multithreading model will probably outperform it if care is taken in the design, because they more closely match the underlying OS and hardware execution model of today, with less indirection and overhead, and they can also handle other workloads that Erlang will never be good at with its share-nothing green thread/process design.

    • marci 2 days ago

      Nobody chooses Erlang for heavy computational loads, but for ease of use and a simple mental model. You do heavy compute on it mostly if you don't want to be bothered. Erlang's multi-process model isn't meant to be efficient at compute; it's meant to be efficient at crash recovery and distributed compute. It was there before multi-core/multithreading was a thing.