phillipcarter 2 days ago

The issue I see is that this is pretty much the final boss for AI systems. Not because the tasks are inherently too difficult or whatever, but because the integration of data, and the quality of that data, is so variable that you just can't get anything done reliably.

Compare this to codebase AI, where much of the data you need lies in your codebase or repo. Even then, most of these coding tools aren't even close to automating meaningful coding tasks in practice, and while that doesn't mean they can't in the future, it's a long ways off!

Now in the ops world, there's little to no guarantee that you'll have the relevant diagnostic data coming out of a system that you need to diagnose it. That weird way you're using Kafka right now? The reason for it is passed down via oral tradition on the team. Runbooks? Oh, those things we don't bother looking at since they're out of date? ...and so on.

The challenge here is the effective collection of quality data and context, not the AI models themselves, and that's precisely what's so hard about operations engineering in the first place.

ern 2 days ago

> Even then, most of these coding tools aren't even close to automating meaningful coding tasks in practice, and while that doesn't mean they can't in the future, it's a long ways off!

Not related to your main point, but I've introduced GitHub Copilot to my teams, and, surprisingly, two of our strongest developers reached out to me independently to tell me it's been a huge boost to their productivity: one in refactoring legacy code, the other in writing some non-trivial components. I thought the primary use would be as a crutch for less capable developers, so I was surprised by this.

As a middle-manager whose day job previously robbed me of the opportunity to write code, I've used ChatGPT 4o to write complex log queries against legacy systems that would have been nearly impossible for me otherwise (and would have taken a lot of effort from my teams), and to turn out small but meaningful tasks, including learning Android dev from scratch to unblock another group. These are worthwhile things that keep my team from being distracted and able to deliver.

I guess there's a "no true Scotsman" fallacy hiding there, about what constitutes "meaningful coding tasks in practice", but to me, investing in these tools has been money well spent.

  • phillipcarter 2 days ago

    Oh, I completely agree with using tools like this. For example, the latest models are very good at being passed a description of a problem and its inputs, expected outputs, and some sample test cases, and then generating a very diverse set of additional cases that likely account for some edge cases you might have missed. Hugely productive for things like that.
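
    A minimal sketch of the workflow I mean, using the OpenAI Python SDK (the problem description, sample cases, and model name here are all just placeholders):

        from openai import OpenAI

        client = OpenAI()

        # Hypothetical problem description: inputs, expected outputs,
        # and a couple of sample cases to anchor the model.
        prompt = (
            "Problem: parse ISO-8601 durations into seconds.\n"
            "Inputs: duration strings. Outputs: integer seconds.\n"
            "Sample cases: ('PT1M', 60), ('PT1H30M', 5400).\n"
            "Generate 20 additional edge cases in the same tuple format."
        )

        resp = client.chat.completions.create(
            model="gpt-4o",  # whichever current model you prefer
            messages=[{"role": "user", "content": prompt}],
        )
        print(resp.choices[0].message.content)  # review before committing!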

    However, these same coding assistants lack so much! For example, I can't have a CSV sitting in the same directory as a Jupyter notebook and just start prompting+coding; I first have to call df.head() myself so those results get burned into the notebook file. The CSV is sitting right there! These tools should be able to detect that kind of context, but they can't right now. That's the sort of thing I mean when I say we have a long way to go.
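
    To spell out the manual step I'm describing, here's a minimal sketch, assuming pandas and a hypothetical file name:

        import pandas as pd

        # The CSV sitting right next to the notebook.
        df = pd.read_csv("data.csv")  # hypothetical file name

        # Until this cell is executed and its output is saved into the
        # notebook, the assistant has no idea what columns or dtypes exist.
        df.head()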

    • salomonk_mur 2 days ago

      But still, a huge productivity boost. I think we can say that, as of the latest models, AI pair programmers are pretty great and save a ton of time.

  • fumeux_fume 2 days ago

    My experience with Copilot has been the opposite of your devs'. It frequently screws up routine tasks, has a poor UI, and writes buggy code I'd expect from someone brand new to programming. Sometimes it'll chance on a nice solution that needs no modifications, but not often enough to fool me into any glowing reviews! I think I have higher standards than most, though.

clvx 2 days ago

One thing I don’t trust about this approach: when using coding assistants, the generated code might not be what you need at first, so you keep iterating or take what’s useful from the output. In ops, that same approach can make things worse, burning more money and trust.

  • shombaboor 2 days ago

    I've definitely gotten into prompt cycles where I ask myself whether it would have been shorter to just write it myself. So far it can't do anything I can't do myself; it's a time saver for the most common boilerplate blocks, function definitions, etc.

beoberha 2 days ago

I agree completely. SRE/Ops/Livesite is an incredibly hard problem, and it's very easy to make shiny demos for products that won't reproduce those results when you need them most.

The article talks about moving past “copilots” and going straight to “agents”. There’s probably some semantics to decipher there, but we haven’t even gotten copilots to work well! At their core, they’re essentially the same problem, but I feel a lot safer with a chatbot suggesting a mitigation than just going and performing it.
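
A toy sketch of the difference, in Python. The mitigation command is made up; the point is the explicit confirmation gate that a "copilot" has and an "agent" skips:

    import subprocess

    def suggest_mitigation(alert: str) -> str:
        # In a real system this would come from the model; hardcoded here.
        return "kubectl rollout undo deployment/checkout"  # hypothetical

    def copilot_flow(alert: str) -> None:
        cmd = suggest_mitigation(alert)
        print(f"Alert: {alert}")
        print(f"Suggested mitigation: {cmd}")
        # The copilot stops here; an agent would just run it unprompted.
        if input("Execute? [y/N] ").strip().lower() == "y":
            subprocess.run(cmd.split(), check=True)

    copilot_flow("checkout error rate > 5%")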

arminiusreturns 2 days ago

Ops dude here. I agree. I recently did some digging on the real future of AI in dev/ops, and I found that the higher the complexity, the less capable the AI (oh god, I've turned into one of those people who says AI instead of ML/DL). Operations is the height of big-picture complexity, exactly what AI is not good at.

That said, I think it could do a lot to assist with finding anomalies that get missed in the flood of data. I've done some fun log stuff on this using Z values before, but it took mental effort! So I do think it could help a lot with queries/searches, but it's unlikely to do "whole system" observability across DCs/stacks very well in its current iteration.
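
Something like the log trick I'm referring to, sketched in Python with made-up numbers, assuming you've already bucketed log events into counts per minute:

    from statistics import mean, stdev

    # Hypothetical per-minute error counts from a quiet baseline window.
    baseline = [12, 9, 11, 10, 13, 11, 10, 12, 11, 12]
    mu, sigma = mean(baseline), stdev(baseline)

    # The window you're actually watching.
    current = {"14:01": 11, "14:02": 13, "14:03": 94}

    # Z value per bucket; anything past 3 sigma is worth a look.
    for minute, count in current.items():
        z = (count - mu) / sigma
        if abs(z) > 3:
            print(f"{minute}: count={count}, z={z:.1f}  <- anomaly")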

PS: I hate how many "agents" already have to run on systems, especially when the prod stuff is core-starved already. I can't tell you how many times I've found an agent (like CrowdStrike) causing some strange cascade issue!