Comment by ychen306 2 days ago
It's orders of magnitude cheaper to serve requests with conventional methods than directly with an LLM. My back-of-envelope calculation says that, optimistically, it takes more than 100 GFLOPs to generate just 10 tokens with a 7-billion-parameter LLM. There are better ways to use electricity.
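For what it's worth, here is a minimal sketch of that estimate, assuming the common rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense transformer forward pass (ignoring attention and KV-cache overhead):

    # Back-of-envelope cost of generating 10 tokens with a 7B-parameter model.
    # Assumes ~2 FLOPs (one multiply + one add) per weight per token.
    PARAMS = 7e9          # 7 billion parameters
    FLOPS_PER_PARAM = 2   # rough forward-pass cost per parameter per token
    TOKENS = 10

    total_flops = PARAMS * FLOPS_PER_PARAM * TOKENS
    print(f"{total_flops / 1e9:.0f} GFLOPs for {TOKENS} tokens")  # ~140 GFLOPs

That lands around 140 GFLOPs, which is where the "more than 100 GFLOPs" figure comes from.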
I work in enterprise IT and sometimes wonder whether we should also count the equivalent energy cost of the human effort - both productive and unproductive - that underlies these "output/cost" comparisons.
I realize it sounds inhuman, but so is working in enterprise IT! :)