Comment by esperent 12 hours ago

Unless marketing blogs from any company specifically say what model they are talking about, we should always assume they're hiding/conflating/mislabeling/misleading in every way possible. This is corporate media literacy 101.

The burden of proof is on Google here. If they've reduced Gemini 2.5 energy use by 33x, they need to state that clearly. Otherwise we should assume they're fudging the numbers, for example:

A) they've chosen one particular tiny model for this number

B) it's a median across all models including the tiny one they use for all search queries

EDIT: I've read over the report and it's B) as far as I can see

Without more info, any other reading of this is a failing on the reader's part, or wishful thinking if they want to feel good about their AI usage.

We should also be ready to change these assumptions if Google or another reputable party does confirm this applies to large models like Gemini 2.5, but should assume the least impressive possible reading until that missing info arrives.

Even more useful info would be how much electricity Google uses per month, and whether that has gone down or continued to grow in the period following this announcement. Because total energy use across their whole AI product range, including training, is the only number that really matters.

mquander 12 hours ago

You should not assume that "they've chosen one particular tiny model", or "it's a median across all models including the tiny one they use for all search queries" because those are totally made up assumptions that have nothing to do with what they say they measured. They measured the Gemini Apps product that completes text prompts. They also provided a chart showing that the thing they are measuring scores comparably to GPT-4o on LM Arena.

penteract 11 hours ago

From the report:
> To calculate the energy consumption for the median Gemini Apps text prompt on a given day, we first determine the average energy/prompt for each model, and then rank these models by their energy/prompt values. We then construct a cumulative distribution of text prompts along this energy-ranked list to identify the model that serves the 50-th percentile prompt.
They are measuring more than one model. I assume this statement describes how they chose which model to report the LM Arena score for, and it's a ridiculous way to do so: the LM Arena score calculated this way could change dramatically from day to day.
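To make the quoted procedure concrete, here is a minimal sketch of it in Python, assuming one reading of the passage: rank models by average energy per prompt, accumulate prompt counts along that ranking, and report the model that serves the 50th-percentile prompt. The function name, the model names, and all numbers are hypothetical and are not taken from the report.

```python
from typing import Dict, Tuple


def median_prompt_model(stats: Dict[str, Tuple[float, int]]) -> Tuple[str, float]:
    """Return (model, energy) for the model serving the 50th-percentile prompt.

    stats maps a model name to (average energy per prompt, prompts served that day).
    This is an illustrative reading of the quoted methodology, not Google's code.
    """
    total = sum(count for _, count in stats.values())
    if total <= 0:
        raise ValueError("no prompts served")

    # Rank models from least to most energy per prompt.
    ranked = sorted(stats.items(), key=lambda item: item[1][0])

    # Walk the cumulative distribution of prompts along the ranked list.
    cumulative = 0
    for name, (energy, count) in ranked:
        cumulative += count
        if cumulative >= total / 2:
            return name, energy
    raise AssertionError("unreachable: cumulative count always reaches total")


# Hypothetical traffic on two consecutive days (energy in Wh, counts in prompts).
day_one = {"small": (0.05, 49), "medium": (0.25, 30), "large": (1.20, 21)}
day_two = {"small": (0.05, 51), "medium": (0.25, 28), "large": (1.20, 21)}

print(median_prompt_model(day_one))
print(median_prompt_model(day_two))
```

With these made-up numbers, a two-point shift in traffic moves the reported model from "medium" to "small", which illustrates why the model identified this way can change from one day to the next.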

mgraczyk 11 hours ago

> total energy use across their whole AI product range, including training, is the only number that really matters.

What if they are serving more requests?

mgraczyk 12 hours ago

They did specifically say so in the linked report.

esperent 12 hours ago

Here's the report. Could you tell me where in it you found a 33x reduction (or any large reduction) tied to any specific non-tiny model? Because all I can find is lots of references to "median Gemini". In fact, I would say they're being extremely careful in this paper not to mention any particular Google models with regard to energy reduction.
https://services.google.com/fh/files/misc/measuring_the_envi...

- mgraczyk 11 hours ago
  
  Figure 4
  I think you are assuming we are talking about swapping API usage from one model to another. That is not what happened. A specific product doing a specific thing uses less energy now.
  To clarify: the way models become more efficient is usually by training a new one with a new architecture, quantization, etc.
  This is analogous to making a computer more efficient by putting a new CPU in it. It would be completely normal to say that you made the computer more efficient, even though you've actually swapped out the hardware.
  
  sigilis 11 hours ago
  
  Don’t they call all their LLM models Gemini? The paper indicates that they specifically used all the AI models to come up with this figure when they describe the methodology. It looks like they even include classification and search models in this estimate.
  I’m inclined to believe that they are issuing a misleading figure here, myself.
  
  esperent 11 hours ago
  
  > Figure 4: Median Gemini Apps text prompt emissions over time—broken down by Scope 2 MB emissions (top) and Scope 1+3 emissions (bottom). Over 12 months, we see that AI model efficiency efforts have led to a 47x reduction in the Scope 2 MB emissions per prompt, and 36x reduction in the Scope 1+3 emissions per user prompt—equivalent to a 44x reduction in total emissions per prompt.
  Again, it's talking about "median Gemini" while being very careful not to name any specific numbers for any specific models.
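As a side note on the arithmetic in that caption, the sketch below shows how two per-scope reduction factors combine into one total factor. It assumes the total is simply the sum of the two scopes; the function names are hypothetical, and the baseline Scope 2 share it prints is an inference from the rounded 47x, 36x, and 44x figures, not a number stated in the caption.

```python
def combined_reduction(scope2_share: float, scope2_factor: float, scope13_factor: float) -> float:
    """Total reduction factor when each scope shrinks by its own factor.

    scope2_share is the fraction of baseline emissions that was Scope 2 (assumed).
    """
    remaining = scope2_share / scope2_factor + (1.0 - scope2_share) / scope13_factor
    return 1.0 / remaining


def implied_scope2_share(scope2_factor: float, scope13_factor: float, total_factor: float) -> float:
    """Solve share/f2 + (1 - share)/f13 = 1/total for the baseline Scope 2 share."""
    return (1.0 / total_factor - 1.0 / scope13_factor) / (
        1.0 / scope2_factor - 1.0 / scope13_factor
    )


# Figures quoted in the caption: 47x (Scope 2 MB), 36x (Scope 1+3), 44x (total).
share = implied_scope2_share(47.0, 36.0, 44.0)
print(f"implied baseline Scope 2 share: {share:.3f}")
print(f"combined reduction at that share: {combined_reduction(share, 47.0, 36.0):.1f}x")
```

The point is only that the three quoted factors are mutually consistent if Scope 2 made up roughly three quarters of baseline emissions; it says nothing about which models sit behind the median.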
  