Show HN: Pyper – Concurrent Python Made Simple
(github.com) 149 points by pyper-dev 6 days ago
Hello and happy new year!
We're excited to introduce the Pyper package for concurrency & parallelism in Python. Pyper is a flexible framework for concurrent / parallel data processing, following the functional paradigm.
Source code can be found on [github](https://github.com/pyper-dev/pyper)
Key features:
Intuitive API: Easy to learn, easy to think about. Implements clean abstractions to seamlessly unify threaded, multiprocessed, and asynchronous work.
Functional Paradigm: Python functions are the building blocks of data pipelines. Lets you write clean, reusable code naturally.
Safety: Hides the heavy lifting of underlying task execution and resource clean-up. No more worrying about race conditions, memory leaks, or thread-level error handling.
Efficiency: Designed from the ground up for lazy execution, using queues, workers, and generators.
Pure Python: Lightweight, with zero sub-dependencies.
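To make "lazy execution using queues, workers, and generators" concrete, here is a minimal standard-library sketch of one such stage. It is not Pyper's API or implementation; `threaded_stage` and its parameters are hypothetical names chosen for illustration. A bounded input queue applies back-pressure to the producer, one sentinel per worker signals shutdown, and worker exceptions are forwarded to the consumer instead of dying silently inside a thread.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_DONE = object()  # sentinel marking the end of a stream


def threaded_stage(func: Callable[[T], R], items: Iterable[T],
                   workers: int = 4, maxsize: int = 16) -> Iterator[R]:
    """Hypothetical helper (not part of Pyper): lazily apply func to items
    using worker threads. Results come out in completion order, not input
    order. An Exception raised by func is re-raised in the consuming thread."""
    inq: queue.Queue = queue.Queue(maxsize=maxsize)  # bounded: back-pressure
    outq: queue.Queue = queue.Queue()

    def feed() -> None:
        for item in items:
            inq.put(item)  # blocks while the queue is full
        for _ in range(workers):
            inq.put(_DONE)  # one sentinel per worker

    def work() -> None:
        while True:
            item = inq.get()
            if item is _DONE:
                outq.put(_DONE)
                return
            try:
                outq.put(("ok", func(item)))
            except Exception as exc:  # forward the error, keep draining input
                outq.put(("err", exc))

    # Nothing starts until the caller begins iterating (generator laziness).
    threading.Thread(target=feed, daemon=True).start()
    for _ in range(workers):
        threading.Thread(target=work, daemon=True).start()

    finished = 0
    while finished < workers:
        msg = outq.get()
        if msg is _DONE:
            finished += 1
            continue
        kind, value = msg
        if kind == "err":
            raise value
        yield value
```

Because results arrive in completion order, a consumer that needs input order would have to tag items with an index and re-sort.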
We'd love to hear any feedback on this project!
Nice work! There is a gap when it comes to writing single-machine, concurrent CPU-bound Python code. Ray is too big, pykka is threads only, and the builtins are poorly abstracted. The syntax is also very nice!
But I'm not sure I can use this even though I have a specific use-case that feels like it would work well (high-performance pure Python downloading from cloud object storage). The examples are a bit too simple and I don't understand how I can do more complicated things.
I chunk up my work, run it in parallel and then I need to do a fan-in step to reduce my chunks - how do you do that in Pyper?
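One common way to express this fan-out/fan-in with only the standard library is to submit each chunk to an executor and fold the partial results as they complete. This is a sketch that assumes the combine step is associative and commutative. `map_then_reduce` and `count_bytes` are hypothetical names, and this says nothing about how Pyper itself handles reduction.

```python
import concurrent.futures as cf
from functools import reduce
from typing import Callable, Iterable, TypeVar

C = TypeVar("C")  # chunk type
P = TypeVar("P")  # partial-result type


def map_then_reduce(mapper: Callable[[C], P], combine: Callable[[P, P], P],
                    chunks: Iterable[C], initial: P, executor: cf.Executor) -> P:
    """Fan out mapper over chunks, then fan in by folding partial results.

    Partials are combined in completion order, so combine should be
    associative and commutative (addition, set union, merging counters...).
    """
    futures = [executor.submit(mapper, chunk) for chunk in chunks]
    # as_completed yields futures as they finish; result() re-raises worker errors.
    return reduce(combine, (f.result() for f in cf.as_completed(futures)), initial)


def count_bytes(chunk: bytes) -> int:
    """Hypothetical per-chunk work; module-level so worker processes can unpickle it."""
    return len(chunk)


if __name__ == "__main__":
    chunks = [b"x" * n for n in (10, 20, 30)]
    with cf.ProcessPoolExecutor() as pool:
        # Only count_bytes is sent to workers; the lambda runs in the parent.
        total = map_then_reduce(count_bytes, lambda a, b: a + b, chunks, 0, pool)
    print(total)  # 60
```

The same function works with a `ThreadPoolExecutor` when the per-chunk work is I/O-bound, since it only depends on the `Executor` interface.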
Can the processes have state? Pure functions are nice, but if I'm reaching for multiprocess, I need performance and if I need performance, I'll often want a cache of some sort (I don't want to pickle and re-instantiate a cloud client every time I download some bytes for instance).
How do exceptions work? Observability? Logs/prints?
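On the exception question, `concurrent.futures` re-raises a worker's exception in the caller when `Future.result()` is called. The sketch below, with hypothetical `parse` and `run_all` helpers, records and logs failures per item instead of aborting the whole batch. It illustrates one possible error policy, not Pyper's behavior.

```python
import concurrent.futures as cf
import logging
from typing import Callable, Dict, Hashable, Iterable, Tuple

log = logging.getLogger(__name__)


def parse(text: str) -> int:
    """Hypothetical work item that fails on bad input."""
    return int(text)


def run_all(func: Callable, items: Iterable[Hashable],
            executor: cf.Executor) -> Tuple[Dict, Dict]:
    """Run func over items and return (results, errors), both keyed by item.

    A failing item is logged and recorded rather than stopping the batch.
    """
    results: Dict = {}
    errors: Dict = {}
    future_to_item = {executor.submit(func, item): item for item in items}
    for fut in cf.as_completed(future_to_item):
        item = future_to_item[fut]
        try:
            results[item] = fut.result()  # re-raises the worker's exception here
        except Exception as exc:
            log.warning("item %r failed: %r", item, exc)
            errors[item] = exc
    return results, errors
```

With a `ProcessPoolExecutor`, `func` must be a module-level function and the raised exception must be picklable, since it travels back to the parent process.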
Then there's stuff that is probably asking too much from this project, but I get it if I write my own Python pipeline, so it matters to me: rate limiting WIP, cancellation, progress bars.
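Capping work in progress, progress reporting, and best-effort cancellation can all be sketched on top of `concurrent.futures.wait`. This assumes "rate limiting WIP" means capping the number of in-flight tasks rather than time-based rate limiting; `bounded_map` and `on_progress` are hypothetical names. Note that `Future.cancel()` only succeeds for tasks that have not started running yet.

```python
import concurrent.futures as cf
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_map(func: Callable[[T], R], items: Iterable[T], executor: cf.Executor,
                max_in_flight: int = 8,
                on_progress: Callable[[int], None] = lambda done: None) -> Iterator[R]:
    """Hypothetical helper: yield results in completion order while keeping at
    most max_in_flight tasks submitted, so a huge input never floods the
    executor. on_progress(n) is called after each completion (e.g. to update
    a progress bar). If the consumer stops early or an error propagates,
    tasks that have not started yet are cancelled."""
    it = iter(items)
    pending: set = set()
    done_count = 0
    try:
        while True:
            # Top up the in-flight window from the input iterator.
            while len(pending) < max_in_flight:
                try:
                    pending.add(executor.submit(func, next(it)))
                except StopIteration:
                    break
            if not pending:
                return
            finished, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in finished:
                done_count += 1
                on_progress(done_count)
                yield fut.result()
    finally:
        for fut in pending:
            fut.cancel()  # no effect on tasks already running
```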
But if some of these problems are/were solved and it offers an easy way to use multiprocessing in python, I would probably use it!