Comment by andsoitis

Comment by andsoitis 2 days ago

Why did NYC release it in the first place? Did they not QA it?

Or was it perhaps one of those cases where they found issues, but the only way to really know for sure that the deleterious impact is significant enough by pushing it to prod?

drillsteps5 2 days ago

>Why did NYC release it in the first place? Did they not QA it?

How do you QA black box non-deterministic system? I'm not being facetious, seriously asking.

EDIT: Formatting

Reply View 13 replies

pegasus 2 days ago

The same way you test any system - you find a sampling of test subjects, have them interact with the system and then evaluate those interactions. No system is guaranteed to never fail, it's all about degree of effectiveness and resilience.
The thing is (and maybe this is what parent meant by non-determinism, in which case I agree it's a problem), in this brave new technological use-case, the space of possible interactions dwarfs anything machines have dealt with before. And it seems inevitable that the space of possible misunderstandings which can arise during these interactions will balloon similarly. Simply because of the radically different nature of our AI interlocutor, compared to what (actually, who) we're used to interacting with in this world of representation and human life situations.

Reply View | 10 replies
- drillsteps5 2 days ago
  
  Does knowing the system architecture not help you with defining things like happy path vs edge case testing? I guess it's much less applicable for overall system testing, but in "normal" systems you test components separately before you test the whole thing, which is not the case with LLMs.
  By "non-deterministic" I meant that it can give you different output for the same input. Ask the same question, get a different answer every time, some of which can be accurate, some... not so much. Especially if you ask the same question in the same dialog (so question is the same but the context is not so the answer will be different).
  EDIT: More interestingly, I find an issue, what do I even DO? If it's not related to integrations or your underlying data, the black box just gave nonsensical output. What would I do to resolve it?
  
  Reply View | 1 reply
  
  bhadass a day ago
  
  >EDIT: More interestingly, I find an issue, what do I even DO? If it's not related to integrations or your underlying data, the black box just gave nonsensical output. What would I do to resolve it?
  Lots of stuff you could do. Adjust the system prompt, add guardrails/filters (catching mistakes and then asking the LLM loop again), improve the RAG (assuming they have one), fine tune (if necessary), etc.
  
  Reply View | 0 replies
- datsci_est_2015 2 days ago
  
  > The same way you test any system - you find a sampling of test subjects, have them interact with the system and then evaluate those interactions.
  That’s not strictly how I test my systems. I can release with confidence because of a litany of SWE best practices learned and borrowed from decades of my own and other people’s experiences.
  > No system is guaranteed to never fail, it's all about degree of effectiveness and resilience.
  It seems like the product space for services built on generative AI is diminishing by the day with respect to “effectiveness and resilience”. I was just laughing with some friends about how terrible most of the results are when using Apple’s new Genmoji feature. Apple, the company with one of the largest market caps in the world.
  I can definitely use LLMs and other generative AI directly, and understand the caveats, and even get great results from them. But so far every service I’ve interacted with that was a “white label” repackaging of generative AI has been absolute dogwater.
  
  Reply View | 0 replies
- [removed] 2 days ago
  
  [deleted]
  
  Reply View | 0 replies
- themafia 2 days ago
  
  > radically different nature of our AI interlocutor
  It's the training data that matters. Your "AI interlocutor" is nothing more than a lossy compression algorithm.
  
  Reply View | 5 replies
  
  pegasus 2 days ago
  
  Yet it won't be easy not to anthropomorphize it, expecting it to just know what we mean, as any human would. And most of the time it will, but once in a while it will betray its unthinking nature, taking the user by surprise.
  
  Reply View | 1 reply
  
  themafia 2 days ago
  
  > taking the user by surprise.
  And surprise is really what you want in computing. ;)
  
  Reply View | 0 replies
  
  sebastiennight 2 days ago
  
  Most AI Chatbots do not rely on their training data, but on the data that is passed to them through RAG. In that sense they are not compressing the data, just searching and rewording it for you.
  
  Reply View | 2 replies
[removed] 2 days ago

[deleted]

Reply View | 0 replies
kylehotchkiss 2 days ago

temperature 0 and 10,000,000 mischievous prompts

Reply View | 0 replies

thedanbob 2 days ago

> Why did NYC release it in the first place? Did they not QA it?

Considering Louis Rossmann's videos on his adventures with NYC bureaucracy (e.g. [0]), the QAers might not have known the laws any better than the chat bot.

[0] https://www.youtube.com/watch?v=yi8_9WGk3Ok

Reply View 1 reply

direwolf20 2 days ago

Considering the previous mayor's relationship with the law, it could be on purpose.

Reply View | 0 replies

cheald 2 days ago

Remember that many people are heavily are happy-path biased. They see a good result once and say "that's it, ship it!"

I'm sure they QA'd it, but QA was probably "does this give me good results" (almost certainly 'yes' with an LLM), not "does this consistently not give me bad results".

Reply View 4 replies

cyrusradfar 2 days ago

Agreed, I just read this paper by AWS' Ahmed El-Deeb
https://dl.acm.org/doi/epdf/10.1145/3780063.3780066 (PDF loads slow....)

Reply View | 0 replies
themafia 2 days ago

> almost certainly 'yes' with an LLM
LLMs can handle search because search is intentionally garbage now and because they can absorb that into their training set.
Asking highly specific questions about NYC governance, which can change daily, is almost certainly 'not' going to give you good results with an LLM. The technology is not well suited to this particular problem.
Meanwhile if an LLM actually did give you good results it's an indication that the city is so bad at publishing information that citizens cannot rightfully discover it on their own. This is a fundamental problem and should be solved instead of layering a $600k barely working "chat bot" on top the mess.

Reply View | 2 replies
- [removed] a day ago
  
  [deleted]
  
  Reply View | 0 replies
- Imustaskforhelp 2 days ago
  
  I use Duckduckgo so I don't see really garbage search imo but not sure about people who google.
  But as you say that LLM's cant handle search. One of the things that I can't understand and I hope you help in is that this doesn't have to be this way.
  Kagi exists (I think I like the product/product idea even though I haven't bought it but I have tried it). Kagi's assistants can actually use Kagi search engine itself which is really customizable and you can almost have a lot of search settings filtered and Kagi overall is considered by many people as giving good search.
  Not to be a sponsor of kagi or anything but if this is such a really big problem assuming that NYC literally had to kill a bot because of it & the reason you mention is the garbage in garbage out problem of search happening.
  I wonder if Kagi could've maybe helped in it. I think that they are B-corp so they would've really appreciated the support itself if say NYC would've used them as a search layer.
  
  Reply View | 0 replies

pibaker 2 days ago

The chatbot was released under the Eric Adams administration. The same Eric Adams, as soon as his term finished, went to Dubai and launched a cryptocurrency.

https://apnews.com/article/eric-adams-crypto-meme-coin-942ba...

I think he is simply not very bright, and got mesmerized by all the shiny promises AI and crypto makes without the slightest understanding of how it actually works. I do not understand how he got into office in the first place.

Reply View 0 replies

elgenie 2 days ago

QA efforts can whack-a-mole some issues, but the mismatch of problem and solution is inherent in any situation in which a generator of plausible-sounding text gets pointed at an area where correctness matters.

Reply View 0 replies

rsynnott a day ago

It’s an LLM. The dirty little secret of LLMs is that they cannot be used for anything important, unless the output is checked by an expert (which typically rather defeats the purpose).

There’s no amount of qa that could save this.

Reply View 0 replies

freejazz 2 days ago

Have you heard of Eric Adams?

Reply View 0 replies

erxam 2 days ago

> Why did NYC release it in the first place?

Perhaps a big fat check was involved.

Reply View 3 replies

Eric_WVGG 2 days ago

Yeah… no offense, but only a person who didn't know anything about Mayor Eric Adams would ask a question like that.
Just days out of office, he made a few million off a crypto scam. Buffoonishly corrupt. https://finance.yahoo.com/news/eric-adams-promoted-memecoin-...

Reply View | 1 reply
- worthless-trash 2 days ago
  
  Not op. That wasn't a question.
  
  Reply View | 0 replies
kevin_thibedeau 2 days ago

Usually it's a manila envelope.

Reply View | 0 replies

fragmede 2 days ago

Why do you think OpenAI let a red team loose on GPT-5 for six months before releasing it to the public?

Reply View 1 reply

bluGill 2 days ago

For the image. There is no way a red team can find all the issues in 6 months. They can find some of the biggest, but even getting all the issues fixed in 6 months seems unlikely.

Reply View | 0 replies

JohnTHaller a day ago

It was implemented by our scammy, grifting, Republican in a Democratic lawmaker suit former mayor Eric Adams who should probably be in prison but who made a deal with Trump to not be prosecuted.

Reply View 0 replies