Every actual AI researcher worth their salt says this. Rant incoming...
Toddler-like understanding: GPT is a language model. All it does is predict the next word in a sentence (or, in ChatGPT's case, a conversation), based on a model it has built from a ginormous text corpus. The reason this works is that letting the "AI" optimize for predicting words makes it build a structure where "math on words" becomes possible in a multidimensional space - a space where 'Guybrush' minus 'Monkey Island' plus 'Day of the Tentacle' ends up in the vicinity of 'Bernard' (but also relatively close to 'Hoagie'). This is called "word embedding", and it's probably the number one principle of current "AI".
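To make the "math on words" concrete, here's a toy Python sketch - the vectors are completely made up (real embeddings have hundreds of dimensions, learned from text), but the arithmetic works the same way:

```python
import numpy as np

# Invented 4-dimensional "embeddings" - purely illustrative numbers.
embeddings = {
    "guybrush":            np.array([0.9, 0.1, 0.8, 0.1]),
    "monkey_island":       np.array([0.1, 0.1, 0.8, 0.1]),
    "day_of_the_tentacle": np.array([0.1, 0.1, 0.1, 0.9]),
    "bernard":             np.array([0.9, 0.1, 0.1, 0.9]),
    "hoagie":              np.array([0.8, 0.2, 0.1, 0.9]),
}

def cosine(a, b):
    # Similarity between two word vectors (1.0 = same direction).
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'Guybrush' - 'Monkey Island' + 'Day of the Tentacle' = ?
query = (embeddings["guybrush"]
         - embeddings["monkey_island"]
         + embeddings["day_of_the_tentacle"])

# Rank every word by similarity to the query vector.
for word, vec in sorted(embeddings.items(), key=lambda kv: -cosine(query, kv[1])):
    print(f"{word:20s} {cosine(query, vec):.3f}")
# 'bernard' comes out on top, with 'hoagie' close behind.
```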
The thing is, knowing that the most likely continuation of the sentence "The main character of The Secret of Monkey Island is..." is "Guybrush Threepwood" is not the same as knowing that Guybrush Threepwood is the main character of The Secret of Monkey Island.
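A caricature of that difference: a next-word predictor is, at its core, continuation statistics. The probabilities below are invented, but notice that there's no fact database anywhere - only "which string tends to follow which":

```python
# Invented continuation statistics - no fact store, just string frequencies.
next_word_probs = {
    "The main character of The Secret of Monkey Island is": {
        "Guybrush": 0.92,  # by far the most common continuation in the corpus
        "LeChuck":  0.05,
        "Elaine":   0.03,
    },
}

def predict(context):
    # Pick the statistically most likely continuation - nothing more.
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print(predict("The main character of The Secret of Monkey Island is"))
# -> "Guybrush": a statistical continuation, not a retrieved fact.
```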
Another thing is that "AI" in general has a "utility function" - the scoring mechanism for whether it's doing a good job or not. The ideal utility function for an AI is usually hard or impossible to actually implement, so researchers go for something easier. For example, you might think that "speak the truth" is the ideal utility function for ChatGPT. But "truth" is hard to quantify - you could hire experts to train it, scoring it on whether its output is actually correct, but you'd need a lot of experts to train it sufficiently. So OpenAI settles (like all AI developers must) for less - in this case, simply a subjective ranking of which of multiple outputs the reviewer likes the most. Of course, the reviewer won't be an expert on all matters, so they'll tend to rate based on which response is the most pleasing to read, the most convincing, and so on.
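That ranking-based training is typically done by fitting a "reward model" on pairwise comparisons. A minimal sketch of the idea, using the standard pairwise (Bradley-Terry) preference loss - the feature vectors and the linear "reward network" here are invented stand-ins for the real thing:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8) * 0.01  # stand-in for a reward network's parameters

def reward(answer_features):
    # Scalar score: "how much will a reviewer like this answer?"
    return w @ answer_features

def preference_loss(chosen, rejected):
    # -log sigmoid(r_chosen - r_rejected): minimized by ranking the
    # reviewer-preferred answer above the other. Truth never enters into it.
    margin = reward(chosen) - reward(rejected)
    return np.log1p(np.exp(-margin))

# One fake comparison: two candidate answers as feature vectors.
chosen, rejected = rng.normal(size=8), rng.normal(size=8)
print(preference_loss(chosen, rejected))
```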
In other words, you're not training the AI to pick its words to be truthful - you're training it to pick words that sound authoritative on the matter. AI studies of recent years have shown that the larger the corpus, the more training, and the more processing power you throw at an "AI", the more it will indeed increase its score according to its actual utility function - although we've already reached the point of diminishing returns. At the same time, you also reach a point - and we've already reached it for the large AIs - where its score according to its ideal utility function drops steeply, and even goes below zero: the algorithm will "actively" work directly against its ideal utility function (e.g. "truth") while still scoring high on its actual utility function (e.g. "a good-sounding answer"). A classic example is that ChatGPT 3 would happily give people a random old-sounding poem when asked to write in the style of a Shakespeare sonnet. Why? Because most humans can't tell the difference anyway.
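That proxy-versus-ideal divergence is easy to illustrate. Both functions below are invented - the shape is the point: keep pushing the measurable score up, and the ideal score peaks, falls, and goes negative:

```python
def proxy(x):   # what's actually measured and optimized ("sounds convincing")
    return x

def ideal(x):   # what we wish were optimized ("is true") - peaks at x = 1.0
    return 1.0 - (x - 1.0) ** 2

x = 0.0
for _ in range(6):
    x += 0.5  # naive hill-climbing on the proxy
    print(f"x={x:.1f}  proxy={proxy(x):+.2f}  ideal={ideal(x):+.2f}")
# The proxy climbs forever; the ideal score peaks, then drops below zero.
```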
This may sound like "lying like a human". But none of this reflects any kind of understanding on the part of the AI - it just reflects the humans who are training it.
Some people will realize all of this and still claim that "the evolution of AI is going so fast that in just a few years...". Thing is, the evolution isn't going fast. The vast majority of breakthroughs in the field happened between 1960 and 1989. For example, back-propagation - a major component of the learning in any "AI" - was described in 1962 and implemented before the end of that decade. Word embeddings as described above were first realized and implemented in the mid-1980s.
(Almost) all that's happened in the past 10 years is throwing more data and computing power at the problem - both resources that are finally approaching their breaking point. On the computing-power side, ChatGPT (pre-4) requires a server with 8 GPUs (and we're not talking gamer GPUs here). That server is dedicated to just you for the time it takes to send you a full response to a prompt, and in that time it will devour about the same amount of power as a couple of old washing machines. It's a hugely inefficient way to solve most of the problems people use it for - and in most cases, it's also very ill-suited for those problems.
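Back-of-envelope on that power claim - every number here is a rough assumption (a datacenter GPU draws somewhere around 400 W under load; an old washing machine around 2 kW mid-cycle):

```python
gpus = 8
watts_per_gpu = 400          # rough assumption for a datacenter GPU under load
server_kw = gpus * watts_per_gpu / 1000
washing_machine_kw = 2.0     # rough assumption for an old machine mid-cycle
print(f"~{server_kw:.1f} kW while generating, "
      f"about {server_kw / washing_machine_kw:.1f} washing machines' worth")
```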
Rant over. Here's a bit of SCUMM...