Why can’t ChatGPT understand ‘STRAWBERRY’?
If you’re an AI enthusiast like me, or just a casual scroller on Instagram, you might have seen the recent controversy that took the internet by storm. Someone asked ChatGPT to count how many ‘r’s there are in ‘strawberry’, and the answer it gave was 2.
For anyone who has used ChatGPT for everything from boring, repetitive chores to difficult, complex tasks, this must come as quite a shock. After the advent of generative AI and GPTs, we thought AI might take over the world, yet here it is struggling to count the number of ‘r’s! Stranger still, when asked to verify the answer with simple code, it finally gives the correct result. How can an LLM with approximately 1.8 trillion parameters fail where a simple two-line Python script succeeds?
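For reference, the kind of trivial script that gets this right might look like the following. Counting characters is easy for ordinary code because it scans the string one character at a time, with no tokenization involved:

```python
# Count the occurrences of 'r' in the word, character by character.
word = "strawberry"
print(word.count("r"))  # prints 3
```

The contrast is the whole point: when ChatGPT writes and runs this code, it succeeds, because the code sees individual characters while the model itself does not.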
The Transformer Architecture
The GPT in ChatGPT stands for Generative Pre-trained Transformer, a family of neural network models that use the transformer architecture to generate natural language and code, answer questions, and even summarize text. Now, for the uninitiated out there, these are not the same as the ones in that Megan Fox movie. These transformers, originally introduced in the Google paper “Attention Is All You Need” for the purpose of machine translation, are now used in almost all LLMs. Don’t worry though, I won’t go deep into what this architecture actually is. All you need to know is how it works.
Vectors and Tokens
First off, we need to understand that these models don’t have brains. In fact, the common misconception that artificial neural networks are modelled after our brain is just that: a misconception. A neural network is not a model of the brain, merely inspired by it. Creating a computational model that captures all the intricacies of the human brain is something technology hasn’t achieved yet.
So how do these transformers work? Just as computers don’t actually understand words or commands, only bits (‘0’ and ‘1’), an AI model cannot understand words, images or audio directly. Instead, it relies on numbers to make sense of its input. Everything, from text to images and even audio, is converted into numerical representations called vectors. The transformer then performs operations on these vectors to implicitly capture the meaning behind the input.
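As a toy sketch of that pipeline (this is not GPT’s real tokenizer or vocabulary, just a made-up illustration): text is split into tokens, each token is mapped to an integer ID, and each ID looks up a row in an embedding table, producing the vector the model actually operates on.

```python
import random

# Toy vocabulary: made up for illustration, not from any real model.
vocab = {"straw": 0, "berry": 1, "count": 2, "the": 3}

def tokenize(text):
    """Greedily match the longest known piece at the start of the text."""
    ids = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(vocab[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError("out-of-vocabulary text")
    return ids

# Embedding table: one small random vector per token ID (4 dimensions
# here; real models use hundreds or thousands).
random.seed(0)
embeddings = [[random.uniform(-1, 1) for _ in range(4)] for _ in vocab]

ids = tokenize("strawberry")
vectors = [embeddings[i] for i in ids]
print(ids)  # [0, 1] -> "straw" + "berry", two tokens, not ten letters
```

Notice that ‘strawberry’ ends up as two tokens rather than ten characters, which is exactly why character-level questions are hard for the model: by the time it “sees” the word, the individual letters are gone.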