Comment by abigail95 2 days ago
Something is missing here: why do the batch jobs take 13 hours? If this thing was started on an old mainframe, why isn't the downtime just 5 minutes at 3:39 AM?
Exactly how much data is getting processed?
Edit: Why does rebuilding take a decade or more? This is not a complex system. It doesn't need to solve any novel engineering challenges to operate efficiently. The article does not give much insight into why this particular task couldn't be fixed in 3 months.
The batch jobs don't take 13 hours. They're just scheduled to run at night, when the offices used to be closed and the jobs could be run with some expectation that the data would stay stable over the period. There are probably many jobs scheduled to run at 1 AM, then 2 AM, etc., each depending on the previous one having finished, so a large delay is built in between them to ensure that a job does not start before the previous one is done.
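To make the arithmetic concrete, here is a rough Python sketch. The job names, runtimes, and one-hour slots are all made up for illustration; nothing here comes from the article or the real system. It compares how long the overnight window looks when jobs start at fixed clock times versus how long the work would take if each job started the moment the previous one finished.

```python
from datetime import datetime, timedelta

# Hypothetical job chain: (name, actual runtime in minutes).
# These numbers are invented purely for illustration.
JOBS = [("extract", 12), ("validate", 7), ("post", 25), ("report", 5)]


def fixed_slot_schedule(jobs, start, slot):
    """Start each job at a fixed clock time, one slot apart,
    no matter when the previous job actually finished."""
    schedule = []
    for i, (name, runtime) in enumerate(jobs):
        begin = start + i * slot
        end = begin + timedelta(minutes=runtime)
        schedule.append((name, begin, end))
    return schedule


def chained_schedule(jobs, start):
    """Start each job as soon as the previous one finishes."""
    schedule = []
    begin = start
    for name, runtime in jobs:
        end = begin + timedelta(minutes=runtime)
        schedule.append((name, begin, end))
        begin = end
    return schedule


def window(schedule):
    """Elapsed time from the first job's start to the last job's end."""
    return schedule[-1][2] - schedule[0][1]


start = datetime(2024, 1, 1, 1, 0)  # arbitrary date, 1 AM
fixed = fixed_slot_schedule(JOBS, start, timedelta(hours=1))
chained = chained_schedule(JOBS, start)

print("fixed-slot window:", window(fixed))
print("chained window:   ", window(chained))
```

With these invented numbers, the fixed-slot window is several times longer than the actual compute time, which is the point: the length of the nightly window says more about the padding between jobs than about how much data is being processed.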
As to your "not a complex system" remark: when a system has been built up over 60 years, piling on new rules to implement new legislation and new needs, you tend to end up with a tangled mess of interdependent services that are very difficult to replace piece by piece with a shiny, architecturally pure new one. This is closer to a distributed monolith than a microservices architecture. In my experience you can't rebuild such a thing "in 3 months". People who believe that tend not to realize the complexity and the extraordinary number of specifics and special cases baked into the system. Any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.