Comment by 1wd
Does anyone predict economy/population/... by simulating individual people based on real census information? Monte carlo simulation of major events (births, death, ...) based on known statistics based on age, economic background, location, education, profession, etc.? It seems there are not that many people that this would be computationally infeasible, and states and companies have plenty of data to feed into such systems. Is it not needed because other alternatives give better results, or is it already being done?
I've done a lot of advanced research in this domain. It is far more difficult than people expect for a few reasons.
The biggest issue is that the basic data model for population behavior is a sparse metastable graph with many non-linearities. How to even represent these types of data models at scale is a set of open problem in computer science. Using existing "big data" platforms is completely intractable, they are incapable of expressing what is needed. These data models also tend to be quite large, 10s of PB at a bare minimum.
You cannot use population aggregates like census data. Doing so produces poor models that don't ground truth in practice for reasons that are generally understood. It requires having distinct behavioral models of every entity in the simulation i.e. a basic behavioral profile of every person. It is very difficult to get entity data sufficient to produce a usable model. Think privileged telemetry from mobile carrier backbones at country scales (which is a lot of data -- this can get into petabytes per day for large countries).
Current AI tech is famously bad at these types of problems. There is an entire set of open problems here around machine learning and analytic algorithms that you would need to research and develop. There is negligible literature around it. You can't just throw tensorflow or LLMs at the problem.
This is all doable in principle, it is just extremely difficult technically. I will say that if you can demonstrably address all of the practical and theoretical computer science problems at scale, gaining access to the required data becomes much less of a problem.