Comment by jandrewrogers

I've done a lot of advanced research in this domain. It is far more difficult than people expect for a few reasons.

The biggest issue is that the basic data model for population behavior is a sparse metastable graph with many non-linearities. How to even represent these types of data models at scale is a set of open problem in computer science. Using existing "big data" platforms is completely intractable, they are incapable of expressing what is needed. These data models also tend to be quite large, 10s of PB at a bare minimum.

You cannot use population aggregates like census data. Doing so produces poor models that don't ground truth in practice for reasons that are generally understood. It requires having distinct behavioral models of every entity in the simulation i.e. a basic behavioral profile of every person. It is very difficult to get entity data sufficient to produce a usable model. Think privileged telemetry from mobile carrier backbones at country scales (which is a lot of data -- this can get into petabytes per day for large countries).

Current AI tech is famously bad at these types of problems. There is an entire set of open problems here around machine learning and analytic algorithms that you would need to research and develop. There is negligible literature around it. You can't just throw tensorflow or LLMs at the problem.

This is all doable in principle, it is just extremely difficult technically. I will say that if you can demonstrably address all of the practical and theoretical computer science problems at scale, gaining access to the required data becomes much less of a problem.