Comment by dgacmu
This is exactly why you have it write code instead of analyzing the data. You can have tests, you can inspect the code, and you know that the process will be deterministic. The chatbot LLMs are a bad match for bulk data analysis on regular, structured data. But they're often quite decent at writing code.
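
To make that concrete, here is a minimal sketch of the kind of thing I mean. The column names (`category`, `amount`), the sample data, and the function name are all made up for illustration; the point is only that the analysis lives in a small function you can read and test, rather than in the model's head.

```python
import csv
import io
from collections import defaultdict


def total_by_category(csv_text: str) -> dict[str, float]:
    """Sum the 'amount' column per 'category' (hypothetical column names)."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += float(row["amount"])
    return dict(totals)


# Tiny hand-checkable fixture (made-up data, not from any real dataset).
SAMPLE = """category,amount
food,12.50
rent,900
food,7.25
"""


def test_total_by_category() -> None:
    # Known answer worked out by hand: 12.50 + 7.25 = 19.75.
    assert total_by_category(SAMPLE) == {"food": 19.75, "rent": 900.0}
    # Same input, same output: rerunning gives an identical result.
    assert total_by_category(SAMPLE) == total_by_category(SAMPLE)
```

Once the test passes on a fixture you checked by hand, you can run the same function over the full file and get the same answer every time.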