KNOW IT IF YOU DON'T

This blog represents my opinions on matters around the globe.

How Does ChatGPT Work?

Hi readers!

You now know a lot about ChatGPT, so you must be wondering how it actually works.

So, this fourth blog will describe how it does.

As you know, ChatGPT is famous for letting users refine and steer a conversation towards a desired length, format, style, level of detail, and language. It allows users to have human-like conversations through a natural language processing (NLP) tool driven by artificial intelligence (AI) technology.

The language model can respond to questions and compose various kinds of written content, including articles, social media posts, essays, code, and emails. It is based on generative AI, which enables users to quickly generate new content from a variety of inputs, with outputs that can include text, images, sounds, animation, 3D models, etc.

Generative AI runs on high-performance infrastructure for cloud computing, data analytics, and machine learning, and uses transformer models that learn from large datasets to generate unique outputs. In ChatGPT's case, the process produces a human-language text response to some input, and that text can also be converted into speech through text-to-speech services. ChatGPT is trained with Reinforcement Learning from Human Feedback (RLHF) and reward models.

RLHF is a machine learning technique that trains a “reward model” directly from human feedback and then uses that model as the reward signal to optimize an agent’s policy with reinforcement learning (RL). It combines the power of human guidance with machine learning algorithms.
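To make the idea of a reward model a little more concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how ChatGPT is actually built: each response is summarised by two made-up features (helpfulness and rudeness), and the human feedback is a list of pairs in which an imaginary labeller preferred one response over the other. The names `PREFERENCES`, `score`, and `train_reward_model` are hypothetical.

```python
import math

# Hypothetical data: each response is summarised by two made-up features,
# (helpfulness, rudeness). Each pair is (preferred, rejected), as chosen
# by an imaginary human labeller.
PREFERENCES = [
    ((0.9, 0.1), (0.2, 0.8)),
    ((0.7, 0.0), (0.6, 0.9)),
    ((0.8, 0.2), (0.1, 0.3)),
    ((0.5, 0.1), (0.5, 0.7)),
]


def score(weights, features):
    """Reward model: a weighted sum of the response's features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(pairs, steps=500, learning_rate=0.5):
    """Fit weights so preferred responses score higher than rejected ones.

    Uses gradient ascent on the log-likelihood of a pairwise logistic
    (Bradley-Terry style) preference model.
    """
    weights = [0.0, 0.0]
    for _ in range(steps):
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            # Probability the model assigns to the human's actual choice.
            p_correct = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log(p_correct) with respect to each weight.
            for i in range(len(weights)):
                weights[i] += learning_rate * (1.0 - p_correct) * (
                    preferred[i] - rejected[i]
                )
    return weights


weights = train_reward_model(PREFERENCES)
polite_reward = score(weights, (0.8, 0.1))  # helpful and polite
rude_reward = score(weights, (0.8, 0.9))    # equally helpful but rude
```

After training, the toy model gives the polite response a higher reward than the rude one, purely because that is what the human comparisons favoured. In RLHF, a learned reward like this then stands in for the human when the agent's policy is optimized.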

ChatGPT, therefore, works through:

The Agent: the learner or decision-maker in the RL process,

The Environment: the external system with which the agent interacts, and

The Goal: the objective the agent aims to achieve.

A reinforcement learning task comprises three things:

States, which are representations of the current world or environment of the task,

Actions, which are things an RL agent can do to change these states, and

Rewards.

A reward is the utility the agent receives for performing the “right” actions.

So, the states tell the agent what situation it is currently in, and the rewards signal which states it should be aspiring towards.

The aim is to learn a “policy”: something that tells the agent which action to take from each state so as to maximize the rewards.

For example, when we drive a car:

The state is the position of the car (the way it is going and its speed with respect to neighboring cars),

The actions are what we do to the car (turning the steering wheel and pressing the accelerator or brake), and

The reward depends on how quickly we reach our destination while respecting traffic rules.

The agent takes actions in the defined environment to achieve maximum rewards over time.

Our action is driving, which results in a reward signal if we drive carefully, at the stipulated speed, and respecting the driving and traffic rules. The reward will be positive if we reach our destination in appropriate time, without an accident, while following the driving and traffic rules.

To maximize the rewards, our policy should be to drive safely, within the prescribed speed and the traffic rules, while reaching the destination as soon as possible without an accident.
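For readers who like to see things written down, the driving example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the speed limit, the safe gap, the point values, and the names `CarState`, `reward`, and `policy`. The policy is written by hand, whereas an RL agent would have to learn it from the rewards.

```python
from dataclasses import dataclass

SPEED_LIMIT = 50  # km/h, a made-up limit for this sketch
SAFE_GAP = 20     # metres, a made-up safe following distance


@dataclass
class CarState:
    speed: int             # current speed in km/h
    gap_to_car_ahead: int  # distance to the next car in metres


ACTIONS = ("accelerate", "hold", "brake")


def reward(state: CarState) -> int:
    """Score a state: progress is good, speeding or tailgating is bad."""
    points = state.speed // 10       # faster travel reaches the destination sooner
    if state.speed > SPEED_LIMIT:
        points -= 10                 # penalty for breaking the speed limit
    if state.gap_to_car_ahead < SAFE_GAP:
        points -= 10                 # penalty for driving too close
    return points


def policy(state: CarState) -> str:
    """A hand-written policy: which action to take from each state."""
    if state.gap_to_car_ahead < SAFE_GAP or state.speed > SPEED_LIMIT:
        return "brake"
    if state.speed < SPEED_LIMIT:
        return "accelerate"
    return "hold"


print(policy(CarState(speed=30, gap_to_car_ahead=100)))
print(reward(CarState(speed=70, gap_to_car_ahead=100)))
```

Driving at the limit with a safe gap earns more points than speeding, which is exactly the behaviour the policy steers towards.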

Now, consider a robot (the agent) navigating a network.

When the robot moves closer to the exit of the network, it receives a positive reward, which tells it that it is on the way to exiting the network successfully. If the robot hits a wall, it might receive a negative reward. The robot learns from such rewards to improve its behavior (which means its software improves).
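Here is a rough sketch of how such a robot could learn, assuming a much simpler world than a real network: a straight corridor of five cells with a wall at the left end and the exit at the right end. The learning method is tabular Q-learning, a standard RL algorithm chosen here purely for illustration, and the reward values, the function names `step` and `train`, and all the settings are hypothetical.

```python
import random

N_CELLS = 5            # corridor cells 0..4; the exit is the last cell
EXIT = N_CELLS - 1
ACTIONS = ("left", "right")


def step(position, action):
    """Move the robot and return (new_position, reward, reached_exit)."""
    if action == "right":
        new_position = position + 1
        return new_position, 1.0, new_position == EXIT  # moved closer to the exit
    if position == 0:
        return position, -1.0, False                    # bumped into the wall
    return position - 1, -1.0, False                    # moved away from the exit


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn the value of each action in each cell."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(EXIT) for a in ACTIONS}
    for _ in range(episodes):
        position = 0
        for _ in range(200):  # safety cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random action
            else:
                # exploit: take the action currently believed to be best
                action = max(ACTIONS, key=lambda a: q[(position, a)])
            new_position, reward, done = step(position, action)
            best_next = 0.0 if done else max(q[(new_position, a)] for a in ACTIONS)
            # Nudge the estimate towards reward plus discounted future value.
            q[(position, action)] += alpha * (
                reward + gamma * best_next - q[(position, action)]
            )
            position = new_position
            if done:
                break
    return q


q_table = train()
learned_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(EXIT)}
print(learned_policy)
```

Nobody tells the robot that "right" is the correct direction. It works that out from the positive and negative rewards alone, and the learned policy ends up choosing "right" in every cell.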

Thus, in RL, rewards represent the feedback that an agent receives based on its actions, thereby guiding it toward better decision-making.

It’s a fundamental concept that drives the learning process.

This feedback helps ChatGPT improve its future responses through machine learning. Machine learning tackles problems for which having human programmers develop the algorithm would be too expensive an exercise. Instead of explicitly telling the machine what to do with human-developed algorithms, it helps computers “discover their own” algorithms.

Cumulative reward is a term used in RL to describe the sum of the reward for the current step and the rewards carried over from the steps before it. The learning agent learns the best sequence of actions to solve a problem, where “best” means the sequence of actions that has the maximum cumulative reward.
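A tiny Python sketch shows the arithmetic. The two routes and their per-step rewards are made-up numbers, and the names `CANDIDATES`, `cumulative_reward`, and `running_totals` are hypothetical.

```python
from itertools import accumulate

# Hypothetical per-step rewards for two candidate action sequences.
CANDIDATES = {
    "careful route": [1, 1, 1, 1, 5],
    "reckless route": [3, 3, -10, 1, 5],
}


def cumulative_reward(rewards):
    """Total reward collected over the whole sequence of actions."""
    return sum(rewards)


def running_totals(rewards):
    """Cumulative reward after each step: this step's reward plus all earlier ones."""
    return list(accumulate(rewards))


# The "best" sequence is the one with the maximum cumulative reward.
best = max(CANDIDATES, key=lambda name: cumulative_reward(CANDIDATES[name]))

print(running_totals(CANDIDATES["careful route"]))  # [1, 2, 3, 4, 9]
print(best)                                         # careful route
```

The reckless route starts with bigger rewards, but one large penalty drags its total down to 2, against 9 for the careful route, so the careful route is the "best" sequence.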

ChatGPT operates through:

  • The integration of AI technologies such as Machine Learning and Deep Learning, utilizing neural networks (NN) for response generation and sentiment analysis,
  • Leveraging neural networks, which allow ChatGPT to comprehend the tone (nuance) of the user's input, enabling it to produce clear, comprehensible, and contextually relevant responses. These neural networks are designed to interpret text data and generate meaningful outputs by recognizing patterns and relationships within the language,
  • Nuances: the system analyzes sentiment by deciphering the emotional tone (nuance) of the text, allowing ChatGPT to tailor its responses accordingly,
  • Application of Machine Learning principles, which lets ChatGPT's models be refined over time based on user interactions, progressively enhancing its conversational accuracy and adaptability,
  • Automation capabilities, through which ChatGPT efficiently assists users in completing tasks by streamlining processes and providing timely responses, and
  • Creating immersive (deeply engaging) experiences through human-like dialogue, by which ChatGPT enhances user interaction, leading to more effective communication and problem-solving.

Integrating these features allows a more personalized and efficient user experience, ultimately improving productivity and satisfaction.

That’s all for now dear readers. Hope it is not too technical.

See you next week. Take care, bye.
