Comment by that_guy_iain 7 days ago

This looks very interesting!

My suggestion for the self-hosting is to create Docker images and use docker-compose. Self-hosting is currently a bit of effort to set up.
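For illustration, a compose file for a stack like this might look roughly as follows. This is purely a sketch: the service names, images, and ports are my assumptions, based only on the stack described downthread (PHP app, Redis, ClickHouse).

```yaml
# Hypothetical docker-compose sketch -- not the project's actual config.
services:
  app:
    build: .                 # assumes a Dockerfile installing PHP-FPM + the app
    ports:
      - "8080:80"
    depends_on:
      - redis
      - clickhouse
  redis:
    image: redis:7
  clickhouse:
    image: clickhouse/clickhouse-server:latest
```

With something like this, `docker compose up` would bring up the whole stack in one command.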

I also wonder if PHP is a good language for this. For the UI, yeah, that's fine and makes sense. But the log processor is going to need to handle high throughput, which PHP just isn't good at. For the same resources, you can have Go doing thousands of requests per second vs PHP doing hundreds of requests per second.

hipadev23 7 days ago

> PHP doing hundreds of requests per second.

You may want to update your understanding of PHP's and Go's speed. Both of your estimates are off by a couple of orders of magnitude on commodity hardware. There are also numerous ways to make PHP extremely fast today (e.g. Swoole, ngx_php, or FrankenPHP) instead of the 1999 best practice of Apache with mod_php.

Go is absolutely an excellent choice, but your opinion on PHP is quite dated. Here are benchmarks for numerous Go (green) and PHP (blue) web frameworks: https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...

  • kgeist 7 days ago

    Sure, PHP can process logs of any volume, but it would require 5–10 times more servers to handle the same workload as something like Go. Not to mention that Go just works out of the box, while for PHP you must set up all those additional daemons you listed and make sure they work -- more machinery to maintain, and usually with quite a lot of footguns, too. For example, our website recently went down at just 60 RPS because of a bad interaction between PHP-FPM (and its max worker count settings) and Symfony's session file locks. For Go on a similar machine, 60 RPS is nothing, but PHP can already barely process it unless you're a guru of process manager settings.
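    For context, the knobs in question live in the PHP-FPM pool config. A minimal sketch (the directive names are real PHP-FPM settings; the values are illustrative, not recommendations):

```ini
; php-fpm pool config (path varies by distro, e.g. pool.d/www.conf)
pm = dynamic
pm.max_children = 50        ; hard cap on workers; one blocked request pins one worker
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
; kill stuck workers so a hanging upstream can't pin them forever
request_terminate_timeout = 30s
```

    The failure mode described above is exactly `pm.max_children` being exhausted by requests blocked on a session file lock.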

    In a different PHP project, we have a bunch of background jobs which process large amounts of data, and they routinely go OOM because PHP stores data in a very inefficient way compared to Go. In Go, it's trivial to load hundreds of thousands of objects into memory to quickly process them, but PHP already starts falling apart before we hit 100k. So we have to use smaller batches (= make more API calls), and the processing itself is much slower as well. And you can't easily parallelize without lots of complex tricks or additional daemons (which you need to set up and maintain). It's just more effort, more wasted time, and more RAM/CPU for no particular gain.

    • Implicated 7 days ago

      > In Go, it's trivial to load hundreds of thousands of objects into memory to quickly process them, but PHP already starts falling apart before we hit 100k.

      I'm not going to argue that PHP is _better_ than Go. Just starting off with that.

      But if your background jobs are going OOM when processing large amounts of data, there are likely better ways to do what you're trying to do. It's true that it's easy to be lazy with memory/resources in PHP due to the assumption that it'll be used in a throwaway fashion (serve request -> die -> serve request -> die) - but it's also perfectly capable of running long-lived/daemonized processes without memory issues, fairly trivially.

    • Implicated 7 days ago

      This isn't a PHP problem, this is a configuration problem. You shouldn't be using the filesystem to handle your sessions in a production application.

      • kgeist 7 days ago

        Anything that unexpectedly blocks a process can bring down your entire PHP server, because you will run out of worker processes. For example, imagine you experience a spike in requests while another server you're trying to call is timing out. You can't set the maximum worker count to a very high value because the operating system has an upper limit. Since the limit must remain relatively low, you can quickly run out of worker processes.

        In contrast, Go can efficiently manage thousands of such blocked goroutines without issue. Sure, you can address this problem in PHP, but you need to:

        - understand PHP-FPM (or whatever you use) configs and their footguns

        - understand NGINX configs and their footguns

        - fiddle with PHP configs and optimize your code to fit within PHP's maximum limits

        - rent larger servers to have the same throughput

    • [removed] 7 days ago
      [deleted]
  • that_guy_iain 7 days ago

    What you're talking about is generally not considered production-ready. While you can use these tools, you will almost certainly run into problems. I know this because, as an active PHP developer for over a decade, I'm very much paying attention to that area of PHP.

    What we see here is a classic case of benchmarks saying one thing when the reality of production code says something else.

    Also, I used Go as a generic example of compiled languages. But what we see is production-grade Go frameworks outperforming non-production-ready, experimental PHP tooling.

    And if we go to look at all of them https://www.techempower.com/benchmarks/#hw=ph&test=fortune&s...

    We'll see that even the experimental PHP solution ranks 43rd and is beaten out by compiled languages.

    • Implicated 7 days ago

      > ... you can have Go doing thousands of requests per second vs PHP doing hundreds of requests per second.

      > I know this because as an active PHP developer for over a decade I'm very much paying attention to that field of PHP.

      <insert swaggyp meme here>

      As an active PHP developer as well it sounds like you have no idea what you're talking about.

      > While you can use these tools you will almost certainly run into problems.

      Which tools are "generally not considered production-ready"? From what I'm seeing on the linked list of benchmarks...

      - vanilla php
      - workerman
      - ubiquity
      - webman
      - swoole

      I'd venture to bet all of these are battle tested and production ready - years ago now.

      As someone who has built a handful of services that ingest data in high volume through long-running PHP processes... it's stupidly easy and bulletproof. Might not be as fast as go, but to say these libraries or tech isn't production-ready is rather naive.

      • that_guy_iain 4 days ago

        Having read your post:

        * Vanilla PHP can't reach anywhere near the same RPS as the others

        * Using those means giving up the ability to use a large amount of the ecosystem, while if you used the correct language you would be able to use its entire ecosystem.

        * In my opinion, if you're using workerman or Swoole you've already realised the limitations of PHP and should be using another language.

        This seems like a classic case of "if all you have is a hammer, everything looks like a nail".

        > Might not be as fast as go, but to say these libraries or tech isn't production-ready is rather naive.

        This is a strawman argument. Firstly, you admit my original point. Secondly, those aren't the tech in question, and I notice you left off the tech in question: RoadRunner, FrankenPHP, etc. All the tooling that can make your average PHP app go faster.

    • hipadev23 7 days ago

      Nobody is suggesting PHP beats compiled. We’re arguing with you about your utter lack of expertise in the language, knowledge of the ecosystem and “production-ready” status of the many options, and your overall coding ability when it comes to PHP.

      • that_guy_iain 6 days ago

        > Nobody is suggesting PHP beats compiled.

        Actually, there seem to be people arguing that.

        > We’re arguing with you about your utter lack of expertise in the language, knowledge of the ecosystem and “production-ready” status of the many options, and your overall coding ability when it comes to PHP.

        If you're doing that with benchmarks you're doing a shitty job. My numbers came from experience in production environments with production workloads.

        Not to mention that you're citing experimental tooling as examples. I've literally seen multiple companies try to use FrankenPHP. Not one even made it to QA, i.e. it broke during dev testing.

        • hipadev23 6 days ago

          Again, you don't have the slightest clue what you're talking about. There are numerous production-ready choices that myself and others have mentioned.

  • p_ing 7 days ago

    As soon as you add C#, ASP.NET Core shoots to the top of the Fortune stack.

    • [removed] 7 days ago
      [deleted]
ryanianian 7 days ago

PHP trivially scales up to multiple nodes behind an LB. You're really only limited by your backend storage connection count and throughput.

Go and friends may make for more efficient resource utilization, but it will be marginal in the grand scheme of things unless there are plans to do massively different things.

As it is this code is very simple. I haven't used PHP in 15 years and I was able to trace through this from front-end to back-end in less than 3 minutes.

To me it looks like a really great level of complexity for the problem it solves.

Keep it up, OP.

  • that_guy_iain 6 days ago

    You can, but that costs more money...

    > Keep it up, OP.

    Live in the real world. No one wants to have a fleet of servers for their logging infra when there are options to run it on a single server.

williebeek 7 days ago

Thanks for the tip, I will check if inserting rows with Go is any faster. For context, inserting a log takes three steps: first the log data is stored in a Redis Stream (memory), then a number of logs are taken from the stream and saved to disk, and finally they are inserted in batches into ClickHouse. I've created it so you can take the ClickHouse server offline without losing any data (it will be inserted later).

For reference, moving about 4k logs from memory to disk takes less than 0.1 second. This is a real log from one of the webservers:

Start new cron loop: 2024-12-18 08:11:16.397...stored 3818 rows in /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc (0.0652 seconds).

Storing this data in ClickHouse takes a bit more than 0.1 second:

Start new cron loop: 2024-12-18 08:11:17.124...parsing file /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc

* Inserting 3818 row(s) on database server 1...0.137 seconds (approx. 3021.15 KB).

* Removed /var/www/txtlog/txtlog/tmp/txtlog.rows.2024-12-18_081116397_ES2gnY3fVc

As for Docker, I'm too much of a Docker noob but I appreciate the suggestion.

herbst 7 days ago

On the other hand, some people (me) are happy to have an actual self-hosting setup and not be forced into a Docker setup with unknown overhead.

  • xinu2020 7 days ago

    Why not both? It's not much trouble to publish a Dockerfile while still documenting a normal installation.

    • herbst 7 days ago

      It's not, but more often than not it's just a Dockerfile

majkinetor 7 days ago

It uses ClickHouse, though, which should be extremely fast for this.

  • that_guy_iain 7 days ago

    Yes. But PHP still needs to process it before it goes to ClickHouse. PHP is the bottleneck.

    • axelthegerman 7 days ago

      If that "bottleneck" is thousands of requests per second, then it doesn't really matter for smaller deployments, does it? (Which seems to be the target audience, not FAANG.)

      I'm not a big fan when folks call out languages as bottlenecks when they have no proof on the actual overhead and how much faster it would be in another language.

      • that_guy_iain 7 days ago

        To tweak a PHP deployment to handle hundreds of requests per second, which is very realistic for basic logging on a mid-sized application, you're looking at a very beefy server setup.

        Most PHP deployments barely reach a hundred per server.

        And since this is an open-source project, it should be designed to handle basic production workloads, which it could, but it'll cost you a bunch more than if you used the correct language.

        > I'm not a big fan when folks call out languages as bottlenecks when they have no proof on the actual overhead and how much faster it would be in another language.

        Honestly, I thought it was so obvious that an interpreted language is not good for high throughput endpoints that it didn't need to be proven. I also thought it was obvious that a logging system is going to handle lots and lots of data.

        It could easily be proven by doing a bunch of work, but obviously there is no point in me proving it.

    • robocat 7 days ago

      Are you sure PHP is the bottleneck?

      The author writes that Clickhouse takes 0.1s for an example request: https://news.ycombinator.com/item?id=42666703

      PHP would need to be adding 0.1s CPU time for processing the request for the PHP code to become the bottleneck. That seems unlikely.

      • thunky 7 days ago

        That 0.1s is to write 4k rows to clickhouse, not per (log write) request.
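        Plugging in the author's own numbers from upthread (3818 rows, 0.0652 s to disk, 0.137 s into ClickHouse), the implied throughput of the batch path is easy to estimate. This is a back-of-envelope only, and it ignores the per-request ingest hop:

```python
rows = 3818
disk_write_s = 0.0652   # memory -> disk step
insert_s = 0.137        # disk -> ClickHouse step

# Implied rows/second for each stage of the batch path.
print(round(rows / disk_write_s))  # prints 58558
print(round(rows / insert_s))      # prints 27869
```

        So the batch path itself moves tens of thousands of rows per second; whether PHP is the bottleneck hinges on the per-request ingest side, not this step.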

withinboredom 7 days ago

PHP is arguably the best solution here. If a log ingestion process breaks, no other logs are harmed (a shared-nothing architecture by default). Using something like Go, C#, etc. might be "faster" but less resilient -- or more complex to handle the resiliency.

> But for the log processor that's going to need to handle a high throughput which PHP just isn't good at.

I'm sorry, but wut? PHP is probably one of the fastest languages out there if you can ignore frameworks. It's backed by some of the most tuned C code out there and should be just about as fast as C for most tasks. The only reason it is not is the function call overhead -- which is by far the slowest aspect of PHP.

> you can have Go doing thousands of requests per second vs PHP doing hundreds of requests per second.

This is mostly due to nginx and friends ... There is FrankenPHP (a frontend for PHP running in Caddy, which is written in Go), which can easily handle 80k+ requests per second.

  • that_guy_iain 7 days ago

    I'm going to have to also reply with, sorry but what?!

    PHP is one of the fastest interpreted languages. But compiled languages are going to be faster than interpreted ones pretty much every time. It loses benchmarks against every language. That's not to mention it's slowed down by the fact that it has to rebuild everything per request.

    As a PHP developer for 15+ years, I can tell you what PHP is good at and what PHP is not good at. High throughput API endpoints such as log ingestion are not a good fit for PHP.

    As for your argument that if it breaks it's fine: who wants a log system that will only log some of your logs? No one. It's not mission-critical, but it's pretty important to keep it working if you want to keep your system working. And in some places it is in fact a legal requirement.

    • hipadev23 7 days ago

      How have you worked with PHP for 15 years and have absolutely no idea how it works, or even baseline performance metrics?

      • Implicated 7 days ago

        He's not being truthful. Literally, there's no way that what he's saying is true. Either about various aspects of modern PHP or his experience with it.

    • withinboredom 7 days ago

      > It loses benchmarks against every language.

      Every language loses benchmarks against every other language. That's not surprising. Since you didn't provide a specific benchmark, it's hard to say why it lost.

      > High throughput API endpoints such as log ingestion are not a good fit for PHP.

      I disagree; but ultimately, it depends on how you're doing it. You can beat or exceed compiled languages in some cases. PHP allows some low-level stuff directly implemented in C and also the high-level stuff you're used to in interpreted languages.

majkinetor 7 days ago

No benefit to using Go over C#, IMO, and I am also baffled by the switch.