Part of the problem with Google is its use of retrieval-augmented generation (RAG), where it’s not just the LLM answering: the LLM searches for information, apparently through its Reddit database from that deal, and serves that up as the answer. The tip-off is that the absurd answers are exact copies of the Reddit comments, whereas if the model were just trained on Reddit data and responding on its own, it wouldn’t reproduce the comments verbatim (or shouldn’t; that kind of memorization is a symptom of overfitting, which the training process tries to avoid). The Gemini LLM on its own would probably give a better answer.
The problem here seems to be Google trying to make the answers more trustworthy through RAG, but they didn’t bother to scrub the Reddit data they’re relying on well enough, so joke and shit answers are getting mixed in. This is more a data-scrubbing problem than an accuracy problem.
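To make the retrieval point concrete, here’s a toy sketch of why unscrubbed RAG can hand back a joke comment word for word. Everything in it is made up for illustration: the two comments, the vote scores, the word-overlap matching, and the `min_score` threshold are all assumptions, and it says nothing about how Google’s actual pipeline works.

```python
# Toy illustration only: the comments, vote scores, and threshold are invented.
COMMENTS = [
    {"text": "Add glue to the sauce so the cheese sticks.", "score": -5},
    {"text": "Let the pizza rest a few minutes so the cheese sets.", "score": 40},
]


def tokens(text):
    """Lowercase the text, drop basic punctuation, and split into a set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(query, corpus):
    """Return the stored comment that shares the most words with the query."""
    query_words = tokens(query)
    return max(corpus, key=lambda c: len(query_words & tokens(c["text"])))


def scrub(corpus, min_score=0):
    """Crude data scrubbing: keep only comments at or above a vote threshold."""
    return [c for c in corpus if c["score"] >= min_score]


def answer(query, corpus):
    """Retrieval step with no model in between: the stored text comes back verbatim."""
    return retrieve(query, corpus)["text"]


if __name__ == "__main__":
    question = "How do I get the cheese to stick to the pizza sauce?"
    print("unscrubbed:", answer(question, COMMENTS))
    print("scrubbed:  ", answer(question, scrub(COMMENTS)))
```

With the raw data the joke comment wins because it happens to share the most words with the question; after the filter, only the sensible comment is left to retrieve. A real system would need a far better signal than vote count, but the failure mode is the same shape.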
But overall I generally agree with your point.
One thing I think people overlook, though, is that for a lot of things, maybe most things, there isn’t a “correct” answer. Expecting LLMs to reach some arbitrary level of “accuracy” is silly. What we do need is intelligence and wisdom in these systems. I think the camera jam example is the best illustration of that. Opening the back of the camera and removing the film is technically a correct way to fix the jam, but it ruins the film, so it’s not an ideal solution most of the time. It takes intelligence and wisdom to understand that.
Just a guess, but it’s probably a combination of two things. First, if we say a self-driving car is going to hit an edge case it can’t resolve once in every, say, 100,000 miles, the number of Teslas and other self-driving cars on the roads now means more total miles driven, which means those edge cases are going to occur more often. Second, people are becoming over-reliant on self-driving - they are (incorrectly) trusting it more and paying less attention, meaning less chance of human intervention when those edge cases occur. So the self-driving itself is probably better overall, but the total number of accidents is still increasing.
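A quick back-of-the-envelope version of that arithmetic, as a sketch. Only the 100,000-mile figure comes from the guess above; the fleet mileage, the improved edge-case rate, and the fraction of edge cases a driver fails to catch are hypothetical numbers picked to show the effect, not real statistics.

```python
def expected_edge_cases(total_miles, miles_per_edge_case):
    """Expected number of unresolved edge cases over a given total mileage."""
    return total_miles / miles_per_edge_case


def expected_accidents(edge_cases, missed_fraction):
    """Edge cases become accidents when the human fails to intervene."""
    return edge_cases * missed_fraction


# Hypothetical "earlier" scenario: smaller fleet, attentive drivers.
earlier_cases = expected_edge_cases(total_miles=1_000_000_000, miles_per_edge_case=100_000)
earlier_accidents = expected_accidents(earlier_cases, missed_fraction=0.25)

# Hypothetical "now" scenario: the system is twice as good per mile,
# but the fleet drives five times as far and drivers miss twice as often.
now_cases = expected_edge_cases(total_miles=5_000_000_000, miles_per_edge_case=200_000)
now_accidents = expected_accidents(now_cases, missed_fraction=0.5)

if __name__ == "__main__":
    print("earlier:", earlier_cases, "edge cases,", earlier_accidents, "accidents")
    print("now:    ", now_cases, "edge cases,", now_accidents, "accidents")
```

Under these made-up inputs the per-mile rate halves, yet the accident count still goes up, because both total miles and missed interventions grow faster than the system improves.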