Comment by asa400 2 days ago

In SQLite, transactions by default start in “deferred” mode. This means they do not take a write lock until they attempt to perform a write.

You get SQLITE_BUSY when transaction #1 starts in read mode, transaction #2 starts in write mode, and then transaction #1 attempts to upgrade from read to write mode while transaction #2 still holds the write lock.

The fix is to set a busy_timeout and to begin any transaction that does a write (any write, even if it is not the first operation in the transaction) in “immediate” mode rather than “deferred” mode.

https://zeroclarkthirty.com/2024-10-19-sqlite-database-is-lo...
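
A minimal sketch of that recipe in Go, assuming the mattn/go-sqlite3 driver (its _busy_timeout and _txlock DSN parameters map onto the busy_timeout pragma and BEGIN IMMEDIATE; the file and table names are illustrative):

  package main

  import (
      "database/sql"
      "log"

      _ "github.com/mattn/go-sqlite3"
  )

  func main() {
      // _busy_timeout=5000: wait up to 5s for a lock instead of failing
      // immediately with SQLITE_BUSY.
      // _txlock=immediate: transactions take the write lock at BEGIN,
      // so no deferred read->write upgrade can fail mid-transaction.
      db, err := sql.Open("sqlite3",
          "file:app.db?_journal_mode=WAL&_busy_timeout=5000&_txlock=immediate")
      if err != nil {
          log.Fatal(err)
      }
      defer db.Close()

      tx, err := db.Begin() // runs BEGIN IMMEDIATE, per _txlock above
      if err != nil {
          log.Fatal(err)
      }
      if _, err := tx.Exec("UPDATE counters SET n = n + 1 WHERE id = 1"); err != nil {
          tx.Rollback()
          log.Fatal(err)
      }
      if err := tx.Commit(); err != nil {
          log.Fatal(err)
      }
  }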

simonw 2 days ago

Yeah I read the OP and my first instinct was that this is SQLITE_BUSY. I've been collecting posts about that here: https://simonwillison.net/tags/sqlite-busy/

  • gwking 2 days ago

    One tidbit that I don't see mentioned here yet is that ATTACH requires a lock. I just went looking for the documentation about this and couldn't find it, especially for WAL mode (https://www.sqlite.org/lockingv3.html mentions the super-journal, but the WAL docs do not mention ATTACH at all).

    I have a python web app that creates a DB connection per request (not ideal I know) and immediately attaches 3 auxiliary DBs. This is a low traffic site but we have a serious reliability problem when load increases: the ATTACH calls occasionally fail with "database is locked". I don't know if this is because the ATTACH fails immediately without respecting the normal 5 second database timeout or what. To be honest I haven't implemented connection pooling yet because I want to understand what exactly causes this problem.

    • sgbeal 18 hours ago

      > I have a python web app that creates a DB connection per request (not ideal I know)

      FWIW, "one connection per request is bad" (for SQLite) is FUD, plain and simple. SQLite's own forum software creates one connection per request (it creates a whole forked process per request, for that matter) and we do not have any problems whatsoever with that approach.

      Connection pools (with SQLite) are a solution looking for a problem, not a solution to a real problem.

  • apitman a day ago

    The necessity of this sort of tribal knowledge kills a lot of the simplicity of sqlite for me. Honestly it seems to have a lot of footguns. I've tried to understand proper concurrent use of sqlite with Golang about 5 times and never come away feeling like I actually get it.

    • nasretdinov a day ago

      With Go it's quite straightforward actually: use WAL mode + two connection pools, one for reads and the other, with MaxOpenConns set to 1, for writes. This way you should never encounter any concurrency issues, and Go will serialise writes for you too

      • johannes1234321 21 hours ago

        I have no experience in Go, but won't that be a full database lock and prevent transactions with more than one operation?

        • nasretdinov 6 hours ago

          Setting MaxOpenConns to 1 essentially limits the number of concurrently running (write) transactions to 1, which is exactly what we want. Whenever a concurrent thread wants to open a new transaction it'll have to wait.

          Note that the application needs to be aware that there are two pools — one for write operations and one for reads (the latter with no, or a high, connection limit). The separation can be enforced at the SQLite level too, by adding ?_query_only=1 to the connection parameters or setting the respective pragma in the read-only pool.
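
          A sketch of that split, again assuming the mattn/go-sqlite3 driver's DSN parameter names (the file name is illustrative):

            // Read pool: many connections, enforced read-only via _query_only.
            reads, err := sql.Open("sqlite3",
                "file:app.db?_journal_mode=WAL&_busy_timeout=5000&_query_only=1")
            if err != nil {
                log.Fatal(err)
            }

            // Write pool: a single connection, so Go itself serialises
            // writers; _txlock=immediate avoids deferred lock upgrades.
            writes, err := sql.Open("sqlite3",
                "file:app.db?_journal_mode=WAL&_busy_timeout=5000&_txlock=immediate")
            if err != nil {
                log.Fatal(err)
            }
            writes.SetMaxOpenConns(1)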

  • BinaryIgor a day ago

    Exactly - it's also strange that they didn't find the simple solution of setting PRAGMA busy_timeout=N; if you search for "SQLITE_BUSY: database is locked" there are plenty of solid results describing the problem and the solution

summarity 2 days ago

I've always tried to avoid situations that could lead to SQLITE_BUSY. SQLITE_BUSY is an architecture smell. For standard SQLite in WAL, I usually structure an app with a read "connection" pool, and a single-entry write connection pool. Making the application aware of who _actually_ holds the write lock gives you the ability to proactively design access patterns, not try to react in the moment, and to get observability into lock contention, etc.
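
One hedged sketch of that shape in Go (the structure and names are illustrative, not from a real pool manager): route every write through one place and time the wait, which is where the lock-contention observability comes from.

  // (imports assumed: database/sql, log, sync, time)
  type writer struct {
      mu sync.Mutex
      db *sql.DB
  }

  // withTx is the only path to the write connection, so the application
  // always knows who holds the write lock and how long callers wait.
  func (w *writer) withTx(fn func(*sql.Tx) error) error {
      start := time.Now()
      w.mu.Lock()
      defer w.mu.Unlock()
      log.Printf("waited %v for write lock", time.Since(start))

      tx, err := w.db.Begin()
      if err != nil {
          return err
      }
      if err := fn(tx); err != nil {
          tx.Rollback()
          return err
      }
      return tx.Commit()
  }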

  • simonw 2 days ago

    Even with that pattern (which I use too) you still need to ensure those write operations always start a transaction at the beginning in order to avoid SQLITE_BUSY.

    • summarity 2 days ago

      Yes, indeed. In my apps, which are mostly Nim, my pool manager ensures this always happens - along with a host of other optimizations. I often start with barebones SQLite and then later switch to LiteSync (distributed SQLite with multi-master replication), so I keep the lock management at the app level to adapt to whatever backend I'm using.

      • probst 2 days ago

        I am really curious about LiteSync. Any chance you could share a bit on your experiences with it (recognising it’s somewhat off-topic…). Do you run with multiple primaries? What sort of use cases do you reach to it for? Conflict resolution seems a bit simplistic at first glance (from the perspective of someone very into CRDTs), have you experienced any issues as a result of that?

  • andersmurphy a day ago

    Yes! This is the way.

    Honestly, it's the key to getting the most out of sqlite. It also allows for transaction batching and various other forms of batching that can massively improve write throughput.
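
    For instance, a sketch of transaction batching on the single writer, assuming writes arrive on a channel (names illustrative, error handling elided):

      // batchWriter folds queued writes into one transaction, so a
      // single commit (one fsync) covers many statements.
      func batchWriter(db *sql.DB, writes <-chan string) {
          for stmt := range writes {
              tx, err := db.Begin()
              if err != nil {
                  continue
              }
              tx.Exec(stmt)
          drain: // fold in whatever else is already queued, up to a cap
              for i := 0; i < 99; i++ {
                  select {
                  case next, ok := <-writes:
                      if !ok {
                          break drain
                      }
                      tx.Exec(next)
                  default:
                      break drain
                  }
              }
              tx.Commit()
          }
      }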

  • mickeyp 2 days ago

    I mean, you're not wrong, and that is one way to solve it, but the whole point of a sensibly-designed WAL -- never mind database engine -- is that you do not need to commit to some sort of actor model to get your db to serialise writes.

    • sethev 2 days ago

      These are performance optimizations. SQLite does serialize writes. Avoiding concurrent writes to begin with just avoids some overhead on locking.

      • mickeyp 2 days ago

        "performance optimisation" --- yeees, well, if you don't care about data integrity between your reads and writes. Who knows when those writes you scheduled really get written? And what of rollbacks due to constraint violations? There's a reason we co-locate transactions with code: they are intertwined. But yes, a queue-writer is fine for a wide range of tasks, just not everything.

        It's that we need to contort our software to make sqlite not suck at writes that is the problem.

      • ncruces 2 days ago

        SQLite, for the most part, uses polling locks. That means it checks if a lock is available to be taken, and if it's not, it sleeps for a bit, then checks again, until this times out.

        This becomes increasingly inefficient as contention increases, as you can easily get into a situation where everyone is sleeping, waiting for others, for a few milliseconds.

        Ensuring all, or most, writes are serialized improves this.
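
        Roughly this shape (an illustrative sketch in Go, not SQLite's actual code):

          // Poll the lock: try, sleep, retry, until the busy timeout
          // expires. Under contention, every caller may be asleep at once.
          func acquire(tryLock func() bool, timeout time.Duration) bool {
              deadline := time.Now().Add(timeout)
              for !tryLock() {
                  if time.Now().After(deadline) {
                      return false // surfaces as SQLITE_BUSY
                  }
                  time.Sleep(time.Millisecond)
              }
              return true
          }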

BinaryIgor a day ago

Also worth mentioning - it happens more often when you set journal_mode=WAL, which is not the default.

The default is DELETE mode, where the rollback journal is deleted at the conclusion of each transaction. What's more - in this (non-WAL) mode, readers can coexist with each other, but they block the writer (of which there is always at most one) and the writer blocks readers - concurrency is highly limited.

In WAL mode - which you should pretty much always set - there's also at most one writer, but the writer can coexist with readers.
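
Enabling it is a one-time operation, since journal_mode=WAL persists in the database file; a minimal example from Go:

  // Switch to WAL once; the mode then sticks for all future connections.
  var mode string
  if err := db.QueryRow("PRAGMA journal_mode=WAL").Scan(&mode); err != nil {
      log.Fatal(err)
  }
  // mode is now "wal"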

mickeyp 2 days ago

Indeed. Everyone who uses sqlite will get burnt by this one day and spend a lot of time chasing down errant write-upgraded transactions that cling on for a little bit longer than intended.

  • BinaryIgor a day ago

    SQLite has its quirks, but in this particular case all you need is to set PRAGMA busy_timeout=<a few seconds> and the problem is solved; if you google it, it's a widely known issue with this well-documented solution.

    It's just weird that it's set to 0 by default rather than something reasonable like 3000 or 5000 ms.

liuliu a day ago

Note that busy_timeout does not apply in this case (the SQLITE_BUSY is issued immediately; there is no wait).

Also, this is because of WAL mode (and I believe only WAL mode, since there are really no concurrent reads in the other modes).

The reason is that in WAL mode, pages are appended to a single log file. Hence, if you read something inside a BEGIN transaction and later want to mutate something else, another page could already have been appended, which would interfere with WAL mode's strict serializable guarantee. So SQLite has to fail at the point of the lock upgrade.

Immediate mode solves this problem because the write lock is acquired at BEGIN time, hence no page can be appended between the read and the write, unlike in deferred mode.
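
A sketch that reproduces this in Go, assuming db1 and db2 are two *sql.DB handles on the same WAL-mode file (table t is illustrative):

  func demo(db1, db2 *sql.DB) {
      tx1, _ := db1.Begin() // deferred BEGIN on connection 1
      var n int
      tx1.QueryRow("SELECT n FROM t WHERE id = 1").Scan(&n) // read snapshot taken

      // Connection 2 commits a write; tx1's snapshot is now stale.
      db2.Exec("UPDATE t SET n = n + 1 WHERE id = 1")

      // tx1 tries to upgrade read -> write. SQLite cannot serialise this
      // after the newer commit, so it fails immediately with SQLITE_BUSY
      // (SQLITE_BUSY_SNAPSHOT), no matter what busy_timeout is set to.
      _, err := tx1.Exec("UPDATE t SET n = ? WHERE id = 1", n+1)
      fmt.Println(err) // "database is locked"
      tx1.Rollback()
  }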

tlaverdure 2 days ago

Yes, these are both important points. I didn't see any mention of SQLITE_BUSY in the blog post, and wonder if a busy_timeout was ever configured. Something that people miss quite often.

chasil 2 days ago

In an Oracle database, there is only one process allowed to write to tablespace datafiles: the DBWR (or its slaves). Running transactions can write only to RAM buffers and the redo logs.

A similar design for SQLite would allow only one writer, with all other processes passing their SQL to it.

kijin 2 days ago

Wouldn't that "fix" make the problem worse on the whole, by making transactions hold onto write locks longer than necessary? (Not trying to disagree, just curious about potential downsides.)

  • asa400 2 days ago

    It’s a reasonable question!

    In WAL mode, writers and readers don’t interfere with each other, so you can still do pure read queries in parallel.

    Only one writer is allowed at a time no matter what, so writers queue up and you have to take the write lock at some point anyway.

    In general, it's hard to say without benchmarking your own application. It will, however, get rid of SQLITE_BUSY errors firing immediately in the read/write/upgrade-read-to-write scenario I described. You'd be retrying the transactions that fail with SQLITE_BUSY anyway, so that retry path is what you'd need to benchmark against.

    It’s a subtle problem, but I’d rather queue up writes than have to write the code that retries failed transactions that shouldn’t really be failing.

BobbyTables2 2 days ago

That's the best explanation I've seen of this issue.

However, it screams of a broken implementation.

Imagine if Linux PAM logins randomly failed if someone else was concurrently changing their password or vice versa.

In no other application would random failures due to concurrency be tolerated.

SQLite is broken by design; the world shouldn’t give them a free pass.

  • asa400 2 days ago

    SQLite is a truly remarkable piece of software that is a victim both of its own success and its unwavering commitment to backward compatibility. It has its quirks. There are definitely things we can learn from it.