Comment by diath

Comment by diath 17 hours ago

23 replies

> Hopefully, this post helps illustrate the unreasonable effectiveness of SQLite as well as the challenges you can run in with Amdahl's law and network databases like postgres.

No, it does not. This article first says that normally you would run an application and the database on separate servers and then starts measuring the performance of a locally embedded database. If you have to keep the initial requirement for your software, then SQLite is completely out of equation. If you can change the requirement, then you can achieve similar performance by tuning the local PGSQL instance -- and then it also becomes a valuation of features and not just raw throughput. I'm not saying SQLite is not an option either, but this article seems confusing in that it compares two different problems/solutions.

sethev 16 hours ago

Right - but SQLite handily beats the case where postgres is on the same box as well. And it's completely reasonable to test technology in the configuration in which it would actually run.

As an industry, we seem to have settled on patterns that actually are quite inefficient. There's no problem that requires the solution of doing things inefficiently just because someone said databases should run on a different host.

  • stickfigure 15 hours ago

    If you're going to run on more than one piece of hardware, something is going to be remote to your single writer database.

    As an industry, we've generally decided against "one big box", for reasons that aren't necessarily performance related.

    • janandonly 4 hours ago

      I sometimes dream of a local-first world in which all software works with local DB and only writes to the cloud as an afterthought, maybe as a backup or a way to pick up work on another machine. It just boggles my mind that more software nowadays relies on an always on internet connection for no good reason other then the design itself.

      • andersmurphy 3 hours ago

        I think people's reaction to cloud vendors is to go local first. But, there's a middle ground VPS, rented server, even self hosting.

        My problem with local first is it's fine for solo apps with the occasional sync. But doesn't work for medium to large datasets and the stuff I work in is generally real-time and collaborative. To me multiplayer is one of the strengths of the web.

  • [removed] 16 hours ago
    [deleted]
BugsJustFindMe 16 hours ago

> If you have to keep the initial requirement for your software, then SQLite is completely out of equation.

No it isn't? You can run a thin sqlite wrapping process on another server just fine. Ultimately all any DB service is, PostgreSQL included, is a request handler and a storage handler. SQLite is just a storage handler, but you can easily put it behind a request handler too.

Putting access to sqlite behind a serial request queue used to be the standard way of implementing multi-threaded writes. That's only spitting distance away from also putting it behind TCP.

  • gunnarmorling 14 hours ago

    You could do that, but you'd run into exactly the same bottleneck the author describes with a remote Postgres instance. The workload exposes high contention on hot rows. If transactions are kept open for several milliseconds due to this being a remote network call between client and DB server, throughput will be equally limited also when using SQLite.

    • andersmurphy 5 hours ago

      Yeah this is a very good point. Any sort of network drive will have similar issues with SQLite. You're very much wanting attached NVME.

  • Barathkanna 15 hours ago

    Exactly. People forget that “SQLite can’t do X” often really means “SQLite doesn’t ship with X built in.” If you wrap it with a lightweight request handler or a queue, you essentially recreate the same pattern every other DB uses. The fact that PostgreSQL bundles its own coordinator doesn’t make SQLite fundamentally incapable. It just means you choose whether you want that layer integrated or external.

  • chasil 14 hours ago

    As long as WAL mode is not enabled, connections over NFS/SMB or other file sharing protocols will work.

    I'm not saying that this is a good idea, and it could fail in a spectacular manner, but it can be done. DML over this is just asking for trouble.

  • formerly_proven 16 hours ago

    Well that's just dqlite/rqlite.

    • benjiro 13 hours ago

      > Well that's just dqlite.

      Far from it, as now your not just dealing with network but also with raft consensus... So each write is not just a network trip, its also 2x acknowledging. And your reads go over the leader, what can mean if somebody accessed node 1 app but node 2 is the leader, well, ...

      Its slower on reads and writes, then just replications that PostgreSQL does. And i do not mean async but even sync PostgreSQL will be faster.

      The reason dqlite exists is because canonical needed something to synchronize their virtualization cluster (lxd), and they needed a db with raft consensus, that is a lib (as not a full blown server install like postgres). Performance was not the focus and its usage is totally different then most people needs here.

    • BugsJustFindMe 16 hours ago

      Dqlite and Rqlite are primarily for buildling fault-tolerant clusters. But if you just take the network access part, then ok sure, but also so what?

      • otoolep 16 hours ago

        rqlite[1] creator here.

        Nit: dqlite is a library, it is not a network-exposed database like rqlite is. Sure, it requires connecting to other nodes over the network, but local access is via in-process. In contrast one connects with rqlite over the network - HTTP specifically.

        [1] https://rqlite.io

andersmurphy 17 hours ago

Paradoxically, raw throughput matters a lot more if you are going to scale on a single box. SQLite is 10x PG on a single box in this example. Considering databases tend to be the bottle neck that can take you an order of magnitude further. PG on the same server will also be slower the more complex the transaction as unix sockets are still going to be considerably slower than a function call.

The other thing to point out is in this article is that the PG network example CANNOT scale horizontally due to the power law. You can throw a super cluster at the problem and still fundamentally do around 1000 TPS.

  • prisenco 16 hours ago

    Also important is just how fast cheap hardware has gotten which means vertical scaling is extremely effective. People could get a lot farther with sqlite in wal mode on a single box with an nvme drive than they imagine. Feels like our intuition has not caught up with the material reality of current hardware.

    And now that there are solid streaming backup systems, the only real issue is redundancy not scaling.

  • barfoure 16 hours ago

    > Paradoxically, raw throughput matters a lot more if you are going to scale on a single box.

    There’s absolutely nothing paradoxical about any of this.

embedding-shape 17 hours ago

> If you have to keep the initial requirement for your software, then SQLite is completely out of equation.

It'd be a very short article if so, don't you think? Full article would be something like: "Normally you'd have a remote connection to the database, and since we're supposed to test SQLite's performance, and SQLite is embedded, it doesn't compare. Fin"

  • stonemetal12 17 hours ago

    The table of data at the end of the article has 7 lines, only one has data for both DBs. What was the point of setting up the comparison if there is no comparison made?

    • andersmurphy 17 hours ago

      Because it shows that a network RDBS database cannot get you out of this predicament.

fulafel 17 hours ago

What is says first is: "SQLite is for phones and mobile apps (and the occasional airliner)! For web servers use a proper database like Postgres!"

Though I'd say it's for a broader set of applications than that (embedded apps, desktop apps, low-concurrency server apps etc).

Phones and mobile apps installations of course outnumber web app deployments, and it doesn't say what you paraphrased about servers.