Comment by dv35z 2 days ago
Curious if anyone has strategies for performing parallel writes to an SQLite database using Python's `multiprocessing` Pool.
I am using it to loop through a database of 11,000 words, hit an HTTP API (ChatGPT) for each, and generate an example sentence for the word. I would love to launch these API calls asynchronously and have each one update its database row when the response comes back, but I'm not sure how to handle the database getting hit by all these writes from (as I understand it) multiple instances of the same Python program/function.
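For concreteness, here is a minimal sketch of the setup described above. The `words` table layout and the `generate_sentence()` helper are assumptions standing in for the real schema and the real ChatGPT call; each worker opens its own connection, since `sqlite3` handles can't be shared across processes:

```python
import sqlite3
from multiprocessing import Pool

DB_PATH = "words.db"  # assumed schema: words(word TEXT PRIMARY KEY, sentence TEXT)

def generate_sentence(word: str) -> str:
    # Placeholder for the real HTTP call to the ChatGPT API.
    return f"Example sentence using {word}."

def process_word(word: str) -> None:
    sentence = generate_sentence(word)
    # Each worker opens its own connection. The timeout (seconds) is
    # SQLite's busy timeout: a blocked writer waits instead of failing
    # immediately with "database is locked".
    conn = sqlite3.connect(DB_PATH, timeout=30)
    try:
        conn.execute("UPDATE words SET sentence = ? WHERE word = ?",
                     (sentence, word))
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    conn = sqlite3.connect(DB_PATH)
    words = [row[0] for row in
             conn.execute("SELECT word FROM words WHERE sentence IS NULL")]
    conn.close()
    with Pool(processes=8) as pool:
        pool.map(process_word, words)
```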
Technically SQLite can only have one writer at any given moment, but it can appear to work across multiple writers if you let it serialize the calls for you.
By default, SQLite will not do what you want out of the box. You have to turn on some feature flags (PRAGMAs) to get it to behave. You need WAL mode, etc.; read these, and see the sketch after the links:
* https://kerkour.com/sqlite-for-servers
* https://zeroclarkthirty.com/2024-10-19-sqlite-database-is-lo...
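A minimal sketch of the PRAGMAs those posts recommend, applied from Python's `sqlite3` module (the specific values are assumptions to tune for your workload):

```python
import sqlite3

conn = sqlite3.connect("words.db", timeout=30)
# WAL mode lets readers proceed while a single writer commits. It is
# persisted in the database file, so it only needs to be set once.
conn.execute("PRAGMA journal_mode=WAL")
# busy_timeout (milliseconds) makes a blocked writer retry instead of
# raising "database is locked" immediately. This one is per-connection.
conn.execute("PRAGMA busy_timeout=30000")
# synchronous=NORMAL is a common pairing with WAL: durable enough for
# most applications, noticeably faster than FULL.
conn.execute("PRAGMA synchronous=NORMAL")
```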
My larger question: why multiprocessing? This looks like an IO-heavy workload, not a CPU-bound one, so Python asyncio or Python threads would probably serve you better.
multiprocessing is for when your resource hog is the CPU (roughly one Python process per CPU core), not IO-bound work.
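To illustrate, a sketch of the asyncio shape of this job, with all HTTP calls in flight concurrently and a single connection doing the writes. `aiohttp`, the endpoint URL, the response shape, and the table layout are all assumptions here; any async HTTP client would do:

```python
import asyncio
import sqlite3

import aiohttp  # assumption: substitute your preferred async HTTP client

DB_PATH = "words.db"
API_URL = "https://api.example.com/generate"  # hypothetical endpoint

async def fetch_sentence(session: aiohttp.ClientSession,
                         sem: asyncio.Semaphore,
                         word: str) -> tuple[str, str]:
    # The semaphore caps in-flight requests so we don't hammer the API.
    async with sem:
        async with session.post(API_URL, json={"word": word}) as resp:
            data = await resp.json()
            return word, data["sentence"]  # assumed response shape

async def main() -> None:
    conn = sqlite3.connect(DB_PATH)
    words = [row[0] for row in
             conn.execute("SELECT word FROM words WHERE sentence IS NULL")]

    sem = asyncio.Semaphore(20)
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch_sentence(session, sem, w) for w in words))

    # One process, one connection, one writer: SQLite's single-writer
    # limitation never comes into play.
    conn.executemany("UPDATE words SET sentence = ? WHERE word = ?",
                     [(sentence, word) for word, sentence in results])
    conn.commit()
    conn.close()

if __name__ == "__main__":
    asyncio.run(main())
```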