Comment by ajnin
Comment by ajnin 2 days ago
The batch jobs don't take 13 hours. They're just scheduled to run some time at night where the old offices used to be closed and the jobs could be ran with some expectations regarding data stability over the period. There are probably many jobs scheduled to run at 1AM then 2AM, etc, all depending on the previous to be finished so there is some large delay to ensure that a job does not start before the previous one is finished.
As to your "not a complex system" remark, when a system is built for 60 years, piling up new rules to implement new legislation and needs over time, you tend to end up with a tangled mess of services all interdependent that are very difficult to replace piece-wise with a new shiny architecturally pure one. This is closer to a distributed monolith than a microservices architecture. In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.
Anyone who doesn't understand what's so difficult should read this:
https://wiki.c2.com/?WhyIsPayrollHard
Its from a different domain, but it gives you a flavour of the headaches you encounter. These systems always look simple from the outside, but once you get inside you find endless reams of interrelated and arbitrary business rules that have accumulated. There is probably no complete specification (unless you count the accumulated legal, regulatory and procedural history of the DVLA), and the old code will have little or no accurate documentation (if you are lucky there will be comments).