Ok, I have been doing some reading on AI lately, and here is my understanding, rambled out below. I'll try to put it in a simple way, like telling a story.
The Origins:
Since time immemorial (i.e. the advent of computing and computers), humans have tried to make computers smart. In that sense, the real victory would be when computers could communicate with humans without sounding like computers. They should be able to make cohesive statements, stay true to the context, understand the tone of a conversation and, at times, even make mistakes like misunderstanding something and apologizing. Computers would need the ability to learn and understand, just like humans, before saying something sensible.
To make this happen, humans tried to make something similar to a human brain, using computers. Let's call that brain an
Artificial Neural Network or simply Neural Network.
For simplicity, a hypothetical computer creature using this 'brain' would be called a 'model'. Since it dealt with processing words, the creature was called
'Large Language Model' or LLM.
Now, a neural network in computers is a lot of mathematics and statistics crammed with a lot of scientific jargon, which we will not go into. But just know that this Neural Network is the starting point of everything AI-related that we talk about.
The Problem:
As time progressed, Neural Networks (i.e. the brain) made a lot of advancements, and so did LLMs. They could learn things on their own and give reasonable answers to humans. But there was still one problem they could not solve.
All these creatures (models) were not doing a great job of making cohesive statements at length. Think about a guy/girl you go on a date with who can just go on rambling for hours, or you sitting in a boring classroom where the professor just talks continuously, or you having a conversation with a friend after a long time, sitting in a bar and telling long stories.
These LLMs were just not able to do that. They would falter after two or three statements, lose context and start sounding dorky.
The Breakthrough
That is, until 2017, when a bunch of scientists at Google published a paper called 'Attention Is All You Need'.
It talked about a new type of Neural Network Architecture i.e. a new computer brain structure, which:
- had the ability to process input (words, for example) in parallel, instead of in sequence.
- had a mechanism called 'attention' that lets every word in the input look at every other word, so the model can keep track of multiple contexts at the same time and learn faster (not getting into the jargon again, for simplicity)
Neither of these qualities had been seen together in earlier neural network architectures.
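To make the 'parallel' and 'attention' parts a bit more concrete, here is a minimal, hypothetical sketch in Python. The words and the little 4-number vectors are invented for illustration; a real Transformer uses learned 'query', 'key' and 'value' projections and vectors with thousands of dimensions, but the core trick, every word looking at every other word in one go, is the same:

```python
import numpy as np

# Each word is represented as a small vector of numbers.
# These values are made up; real models learn them during training.
words = ["the", "ball", "was", "thrown"]
vectors = np.array([
    [0.1, 0.0, 0.2, 0.1],   # "the"
    [0.9, 0.1, 0.3, 0.0],   # "ball"
    [0.0, 0.4, 0.1, 0.2],   # "was"
    [0.8, 0.2, 0.1, 0.3],   # "thrown"
])

# Attention: every word is compared with every other word in one matrix
# multiplication, i.e. all words are processed in parallel, not one by one.
scores = vectors @ vectors.T
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax

# Each word's new representation is a weighted mix of ALL the words,
# so "thrown" can directly pay attention to "ball" even if they sit far apart.
new_vectors = weights @ vectors

for word, w in zip(words, weights):
    print(word, "attends to:", dict(zip(words, w.round(2))))
```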
They called this new type of computer brain a 'Transformer'; reportedly the authors simply liked the sound of the name, and the network does literally 'transform' one sequence of representations into another.
The Transformer architecture, used inside the creature we called an LLM, solved the problem of making lengthy conversations, rambling within context and giving out answers based on what it had learnt. At its core, it became excellent at one thing: accurately predicting the next word (token) in a sequence, over and over again.
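To see what 'predicting the next word, over and over' means in practice, here is a toy sketch with a hand-written probability table. A real LLM computes these probabilities with billions of learned parameters instead of a tiny dictionary, but the generation loop is conceptually the same:

```python
import random

# Hypothetical, hand-written "model": for each word, the possible next words
# and how likely each one is. A real LLM learns these probabilities from data.
next_word_probs = {
    "the":   {"ball": 0.6, "phone": 0.4},
    "ball":  {"was": 0.7, "bounced": 0.3},
    "phone": {"was": 0.5, "rang": 0.5},
    "was":   {"thrown": 0.8, "red": 0.2},
}

def generate(start_word, max_words=5):
    """Generate text by repeatedly sampling a likely next word."""
    sentence = [start_word]
    word = start_word
    for _ in range(max_words):
        if word not in next_word_probs:
            break  # no idea what comes next, stop
        choices = next_word_probs[word]
        word = random.choices(list(choices), weights=list(choices.values()))[0]
        sentence.append(word)
    return " ".join(sentence)

print(generate("the"))   # e.g. "the ball was thrown"
```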
In the near future, the learnings from these models would go further, meaning they could take in not just words, but images, sounds and other forms of data (DNA sequences, for example) as input and generate outputs. Models built on this architecture, 'generative' because they generate output and 'pre-trained' because they are first trained on huge piles of data, would be called Generative Pre-trained Transformers, or
GPT
By 2018, humans had created architectures, mathematical models, that handled language far more capably than earlier models; loosely inspired by the brain, even if nowhere near an actual copy of it.
The Birth of Foundation Models
To put this theory of Transformers to the test, big companies started implementing it in computers.
Now, to use a Transformer in a model, a computer needed:
- Fast parallel processing capabilities (*nVidia CEO smiling in the corner)
- Loads of training data (*companies having user data: Google, Meta, Microsoft popping champagne bottles. OpenAI, not just yet..)
Google put this into computers with parallel processing abilities and came up with a functional model named BERT in 2018. The results astonished and scared them, so Google quietly started incorporating it into their search engine without making a lot of noise.
OpenAI put this into their own models, the GPT series. GPT-3 arrived around 2020, trained largely on Wikipedia and other freely available data from the internet, and in late 2022 they wrapped it into a chat product named ChatGPT. Even with that, the masses were blown away by what it could do. Microsoft had also invested heavily in OpenAI, possibly giving OpenAI access to computing power and data that Microsoft owned.
Google then came out of its shell and made its model available to the general public too, first as Bard; the product is now known as Gemini.
This is the point where Generative AI exploded onto the scene for general users.
Other companies started coming up with 'models' which would use some form or other of GPT or BERT. These models came to be known as 'Foundation Models'. They are used for (but not limited to):
- Text input to text output (ChatGPT, Gemini, etc.)
- Text input to image output (DALL-E)
- Image to image
- Text to sound
... and so on
You can create your own Foundation model, or use the ones already available.
For example: Apple integrated ChatGPT instead of coming up with their own.
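If you just want to use an already-available model, the open-source route is only a few lines. Here is a sketch assuming the Hugging Face transformers library is installed; GPT-2 is a small, freely downloadable model, nowhere near ChatGPT in quality, but the mechanics are the same:

```python
# pip install transformers torch
from transformers import pipeline

# Download a small, freely available pretrained model (GPT-2) and use it for
# text generation. Bigger foundation models work the same way in principle,
# they are just far larger and usually sit behind a paid API.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Once upon a time, computers could not hold a conversation,",
    max_new_tokens=40,
)
print(result[0]["generated_text"])
```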
To create your own Foundation Model, you would need:
- Loads of money
- A good legal team (?)
because, firstly, the hardware is exorbitantly priced: GPUs are expensive, and you need a lot of computing power to train a model;
and secondly, training data can be proprietary, so you either need to pay for it or take it without permission, as OpenAI is alleged to have done for ChatGPT.
The Advancement
GPTs and these foundation models work, in principle, somewhat like humans and the brains they come equipped with. The training mechanism, too, can be loosely compared with how humans learn.
A human brain samples data based on experience and makes associations that reside in the brain. Example: a baby looks at a ball for the first time; his dad says the word 'ball'. The baby registers the object's name as 'ball'. Then the dad throws the ball, the baby watches the act and associates the action of 'throwing' with the object 'ball'. He then 'throws' his dad's cell phone (an object he is not yet aware of). Dad says 'No', and the baby associates 'No' with 'throwing' a 'cell phone', but 'Yes' with 'throwing' a 'ball'.
A computer model works on similar logic. The 'ball', the 'phone' and the 'act of throwing' are all part of the training data, and the attributes of that data can be 'object name', 'action', 'color' etc. These attributes and the related information are stored in the form of vectors (or matrices, arrangements of numbers on which you can also do mathematical operations).
So a computer brain will become smarter as it gets more attributes for association and more data for training. Sounds similar to how humans learn?
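To make the 'stored in the form of vectors' idea concrete, here is a hypothetical sketch. The numbers and the meaning of each position are invented; real models learn their own vectors with hundreds or thousands of dimensions:

```python
import numpy as np

# Invented example vectors: each object is a list of numbers whose positions
# loosely mean things like [is_round, is_throwable, is_fragile, is_electronic].
embeddings = {
    "ball":   np.array([0.9, 0.9, 0.1, 0.0]),
    "phone":  np.array([0.2, 0.3, 0.9, 0.9]),
    "orange": np.array([0.8, 0.7, 0.4, 0.0]),
}

def similarity(a, b):
    """Cosine similarity: how 'related' two concepts are in vector space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The model "associates" a ball with an orange (both roundish and throwable)
# far more than with a phone, purely from the numbers.
print("ball vs orange:", round(similarity(embeddings["ball"], embeddings["orange"]), 2))
print("ball vs phone: ", round(similarity(embeddings["ball"], embeddings["phone"]), 2))
```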
As the processing power of computers increases, the ability of computers to process these attributes keeps rising. The early GPT models had parameters (the technical term for these learned attributes) numbering in the hundreds of millions and would make plenty of mistakes while framing output; GPT-3 already had about 175 billion parameters it had trained itself on, and later versions are believed to be even larger. It has now become aware of the 'tone' of sentences and paragraphs, and can write songs and compose music nicely. As it keeps getting trained on more and more data, it will get more and more accurate. That is to say, at some point in time, it will become smarter than some humans in general interaction, at least as far as language models are concerned.
Common Pitfalls
People are using the word AI anywhere and everywhere these days. Don't get awestruck by that. Wait. Wait till they show actual, real results on real data. If someone has no idea about Foundation Models or Neural Networks and is making tall claims about AI, you have every right to look at him/her with suspicion.
Not everyone can afford GenAI at this point; building your own GenAI models for homegrown solutions requires a lot of money and time. It is expensive. Training may need access to data that nobody is willing to give you due to copyright or confidentiality. So if someone says 'we are using AI-powered chatbots on our platform', you might want to have a look at the company's finances and the data science and legal teams they have. They might simply be selling the small-language-model-based chatbots which we have all frustratingly used in banking apps.
My car has voice command recognition; that does not mean I should post on LinkedIn that my car is AI-enabled.
Ending Comments
Is AI really a threat to some jobs? Definitely Yes.
Will AI replace humans completely: No, but it will drastically reduce the workforce in some sectors.
Is Generative AI simple to implement: No, it involves a lot of money and resources, legal stuff and all. People have already died in mysterious ways.
Should humans be worried ?
Yes
After all, this GenAI thing comes very close to the human brain.
In a world full of gullible humans who do not have the ability to think and act properly, this Generative AI thing has the ability to do some real damage. Just see how videos are morphed to perfection or how audio clips were 'generated' to spread false propaganda during recent elections.