Comment by counters 4 days ago

No one is claiming that there is "new knowledge" here.

The entire class of deep-learning or AI-based weather models involves a very specific and simple modeling task. You start with a very large training set which is effectively a historical sequence of "4D pictures" of the atmosphere. Here, "4D" means that you have "pixels" for latitude, longitude, altitude, and time. You have many such pictures for the relevant atmospheric variables: temperature, pressure, winds, and so on. These sequences are produced by highly sophisticated weather models run in what's called a "reanalysis" task, where they consume a vast array of observations and try to create the 4D sequence of pictures that is most consistent with both the physics in the weather model and the various observations.
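
To make that concrete, here is a rough sketch of how such a training set can be laid out as an array of atmospheric "pictures". All dimensions, variable counts, and names are made up for illustration, not taken from any real reanalysis product:

    import numpy as np

    # Hypothetical dimensions, just to illustrate the layout of a reanalysis-style
    # training set: a time sequence of gridded snapshots of the atmosphere.
    n_times = 1000        # e.g. 6-hourly snapshots (real datasets span decades)
    n_vars = 5            # temperature, pressure, u-wind, v-wind, humidity, ...
    n_levels = 13         # pressure/altitude levels
    n_lat, n_lon = 181, 360

    # dataset[t] is one "picture" of the atmosphere at time t.
    dataset = np.zeros((n_times, n_vars, n_levels, n_lat, n_lon), dtype=np.float32)

    # One training example: given the picture at time t, predict the picture 6 hours later.
    x = dataset[10]       # input state (optionally stack dataset[9] as well)
    y = dataset[11]       # target state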

The foundation of AI weather models is taking that 4D picture sequence and asking the model to "predict" the next picture in the sequence, given the past 1 or 2 pictures. If you can predict the picture for 6 hours from now, then you can feed that output back into the model and predict the next 6 hours, and so on. AI weather models are trained such that this process is mostly stable, i.e. the small errors you begin to accumulate don't "blow up" the model.
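
As a sketch of that feedback loop, the forecast is just repeated application of a one-step predictor. The "model" below is a toy stand-in (simplified to a single input picture), not any real forecast network:

    import numpy as np

    def rollout(model, initial_state, n_steps):
        # Autoregressive forecast: each prediction is fed back in as the next input.
        states = [initial_state]
        for _ in range(n_steps):
            states.append(model(states[-1]))
        return np.stack(states)

    # Toy stand-in "model" (damped persistence), just to exercise the loop.
    toy_model = lambda state: 0.99 * state
    state0 = np.random.rand(5, 13, 181, 360).astype(np.float32)
    forecast = rollout(toy_model, state0, n_steps=20)   # 20 x 6 h = a 5-day forecast
    print(forecast.shape)                               # (21, 5, 13, 181, 360)

Because each step's output becomes the next step's input, any error made early on is carried forward through the rest of the forecast.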

Traditionally, you'd use a physics-based model to accomplish this task: using the current 3D weather state as your input, you integrate the physics equations forward in time to make the prediction. In many ways, today's AI weather models can be thought of as a black box or emulator that reproduces what those physics-based models do, but without needing to be told much, if any, of the underlying physics. Depending on your "flavor" of AI weather model, the architecture might draw some analogies to the underlying physics. For example, NVIDIA's models use Fourier Neural Operators, so you can think of them as learning families of equations which can be combined to approximate the state of the atmosphere (I'm _vastly_ over-simplifying here). Google DeepMind's GraphCast tries to capture both local and non-local relationships between fields through its graph-based message passing. Microsoft's Aurora (and Silurian's model, by provenance, assuming it's the same general type of model) tries to capture local relationships through sliding windows passed over the input fields.
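
For a feel of the Fourier Neural Operator flavor, here is a heavily simplified single "Fourier layer" in PyTorch. The channel and mode counts are arbitrary, and I drop details a real implementation keeps (e.g. the negative-frequency corner modes and the pointwise skip path), so treat it as a sketch of the idea rather than any particular model:

    import torch
    import torch.nn as nn

    class SpectralConv2d(nn.Module):
        # Minimal sketch of a 2D Fourier layer: FFT the field, apply a learned
        # linear map to the lowest `modes` frequencies, and transform back.
        def __init__(self, channels, modes):
            super().__init__()
            self.modes = modes
            scale = 1.0 / channels
            self.weights = nn.Parameter(
                scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat)
            )

        def forward(self, x):                      # x: (batch, channels, lat, lon)
            x_ft = torch.fft.rfft2(x)              # to Fourier space
            out_ft = torch.zeros_like(x_ft)
            m = self.modes
            # Mix channels for the retained low-frequency modes only.
            out_ft[:, :, :m, :m] = torch.einsum(
                "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weights
            )
            return torch.fft.irfft2(out_ft, s=x.shape[-2:])  # back to grid space

    layer = SpectralConv2d(channels=5, modes=12)
    field = torch.randn(1, 5, 181, 360)            # one coarse gridded state
    print(layer(field).shape)                      # torch.Size([1, 5, 181, 360])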

So again - no new knowledge or physics. Just a surprisingly effective way of applying traditional DL/AI tools to a specific problem (weather forecasting) that ends up working quite well in practice.

hwhwhwhhwhwh 4 days ago

Thanks for the explanation. I am still a bit confused about how this takes care of the errors. I can see how the weather prediction for tomorrow might have fewer errors, but shouldn't the errors accumulate as you feed the predicted weather back into the model as input? Wouldn't the results start diverging from reality pretty soon? Isn't that the reason why the current limit is close to 6 days? How exactly does this model fix this issue?

  • counters 4 days ago

    It doesn't take care of the errors. They still "accumulate" over time, leading to the same divergence that traditional physics-based weather models experience. In fact, the hallmark that these AI models are _doing things right_ is that they show realistic modes of error growth when compared with those physics-based models - and there is already early peer-reviewed literature suggesting this is the case.

    This _class_ of models (not Aurora or Silurian's model specifically) can potentially improve on this a bit by incorporating forecast error at longer lead times into the core training loss. This is already done in practice for some major models like GraphCast and Stormer. But these models are almost certainly not a magical silver bullet for 10x'ing forecast accuracy.
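
    The idea is roughly the following (a minimal sketch with a made-up model and shapes; real setups weight variables and lead times far more carefully): unroll the model during training and penalize the error at every lead time, not just the first step.

        import torch
        import torch.nn as nn

        def rollout_loss(model, x0, targets):
            # Multi-step ("rollout") training loss: unroll the model autoregressively
            # and accumulate the error at every lead time.
            loss, state = 0.0, x0
            for target in targets:              # true states at +6h, +12h, ...
                state = model(state)            # prediction is fed back in as input
                loss = loss + torch.mean((state - target) ** 2)
            return loss / len(targets)

        # Toy usage: a linear "model" and random states, just to exercise the loop.
        toy_model = nn.Linear(16, 16)
        x0 = torch.randn(8, 16)
        targets = [torch.randn(8, 16) for _ in range(4)]
        rollout_loss(toy_model, x0, targets).backward()  # gradients flow through all steps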