Hi readers! Do you remember Generative Artificial Intelligence (generative AI)?
It is a type of artificial intelligence that can create new content, such as text, images, music, audio, and video, using generative models that learn the patterns and structure of their input data and then generate new data with similar characteristics.
A generative model captures the distribution of the data itself, so it can tell you how likely any given example is.
Working this way, it can produce many forms of original content, including images, videos, music, speech, text, software code, and product designs, using complex mathematics and enormous computing power.
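To make that idea concrete, here is a minimal sketch in Python (my own toy illustration, not any real system): the simplest possible generative model, a one-dimensional Gaussian, "trained" by fitting it to data. Once fitted, it can do both jobs described above: score how likely a given example is, and generate new examples with similar characteristics.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=170.0, scale=8.0, size=1_000)  # e.g., heights in cm

    # "Training": estimate the distribution's parameters from the input data.
    mu, sigma = data.mean(), data.std()

    def likelihood(x):
        """Density of x under the learned Gaussian: how plausible is this example?"""
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    print(likelihood(172.0))  # an ordinary example scores relatively high
    print(likelihood(250.0))  # an implausible example scores near zero

    # "Generation": draw brand-new examples from the learned distribution.
    print(rng.normal(mu, sigma, size=5))

Real generative AI systems replace this single Gaussian with neural networks holding billions of parameters, but the two abilities, scoring and sampling, are the same.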
Generative AI has now become an integral part of a society in which people and machines work together.
Its benefits include faster product development, an enhanced customer experience, and improved employee productivity; however, generative AI can produce artifacts that are inaccurate or biased, so human validation remains essential.
Dear reader!
A senior contributing writer at ZDNET wrote on Oct. 2, 2023, that
"generative AI will surpass what ChatGPT can do"
because generative AI's added multi-modality will move it beyond the ChatGPT-style demo toward interpersonal collaboration, advanced robotics, and perhaps the AI dream of continuous learning
(you can read the article at https://www.zdnet.com/article/generative-ai-will-far-surpass-what-chatgpt-can-do-heres-everything-you-need-to-know-how-the-tech-advances/).
ChatGPT became the fastest-growing software program in history, reaching a hundred million users in January 2023 (less than two months after its first public appearance), and it inspired many rivals, such as Google's Bard and the University of California, Berkeley's Koala.
The excitement has prompted a race among Microsoft, Google, and their peers, and a surge in business for the AI chip maker Nvidia.
Currently, ChatGPT, and image programs such as Stability AI's Stable Diffusion and OpenAI's DALL-E, can reproduce a world from a paragraph, a picture, or even the skeleton of a computer program. They are being regarded as a mirror of society, through which people can trace their own inner reflections in a way that would not be possible without taking the experience and viewpoints of their peers into account.
Such mirroring will increase dramatically: by the end of this year, today's generative AI programs will look primitive compared with the powers of the programs that will then be widespread and dominant. Many different things could be produced, but because large language models are one-dimensional and see the world through text alone, multimodality will be the most sought-after feature: the ability to communicate a message that combines text, images, motion, video, and/or audio.
“Modalities refer to the nature of the input and output data, such as text, image, or video. A variety of modalities are possible and have been explored with increasing diversity, because the same basic concepts that drive ChatGPT can be applied to any type of input. The availability of models for every such modality, and their combination, will therefore produce amazing results.”
In a large language model (the heart of ChatGPT), text is turned into tokens, which are quantified mathematical representations. The machine can then work out what is missing, whether masked parts of a sentence, an entire phrase, or the concluding part of a phrase, and recreate the paragraph.
It is this act of recreation that brings about the paragraphs that ChatGPT spits out.
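Here is a toy-sized sketch of that recipe (assuming a tiny made-up corpus and a simple bigram model, vastly cruder than a real large language model): words become numeric tokens, the model counts which token tends to follow which, and it then fills in the concluding part of a phrase.

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the mat .".split()

    # Tokenization: each distinct word becomes a numeric token ID.
    vocab = {word: i for i, word in enumerate(dict.fromkeys(corpus))}
    tokens = [vocab[w] for w in corpus]

    # "Training": count which token follows which (a bigram model).
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1

    def complete(phrase):
        """Append the token most likely to follow the last word of the phrase."""
        last_id = vocab[phrase.split()[-1]]
        predicted_id = following[last_id].most_common(1)[0][0]
        id_to_word = {i: w for w, i in vocab.items()}
        return phrase + " " + id_to_word[predicted_id]

    print(complete("the dog sat on the"))  # -> "the dog sat on the mat"

A real model predicts with a neural network over tens of thousands of token types and long contexts rather than counts over adjacent pairs, but the task, recreating the missing continuation, is the same.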
Similarly, in the case of images, the widely used diffusion process corrupts images with noise, and the act of recreating the original image trains a neural network to generate highly reliable images. This process of recovering the missing parts of a phrase or of a corrupted image is being applied across numerous modalities, or types of data.
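The corruption half of that process is simple enough to sketch (a minimal illustration of a standard forward diffusion step, with an arbitrary noise schedule; a real system would then train a neural network to undo this corruption, and would generate new images by running that learned denoiser in reverse from pure noise):

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.uniform(0.0, 1.0, size=(8, 8))  # a stand-in for a real image

    T = 1000                             # number of corruption steps
    betas = np.linspace(1e-4, 0.02, T)   # a commonly used linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)  # cumulative fraction of signal kept

    def corrupt(x0, t):
        """Jump straight to step t: keep some signal, replace the rest with noise."""
        noise = rng.normal(size=x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

    print(corrupt(image, 10).std())   # early step: still mostly the image
    print(corrupt(image, 999).std())  # late step: essentially pure noise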
For example, in a recent issue of Nature, a University of Washington biologist and his team corrupted the amino acid sequences of proteins using RFdiffusion (built on RoseTTAFold), an open-source method for structure generation that trains a neural network to produce novel synthetic proteins possessing the desired properties. Such synthesis can cut down the number of proteins that must be invented and tested before we arrive at novel antibodies for a specific disease.
The combination of multiple modalities has started building a richer picture of the world for the neural network. Scientists at Databricks cite the neuroscience concept of "stereognosis," meaning knowing the world by the sense of touch: if someone asks how much change you have in your pocket, you can feel the coins and tell by size and weight without seeing them.
Stereognosis is the ability to perceive and recognize the form of an object in the absence of visual and auditory information, using only tactile cues from texture, size, spatial properties, and temperature. It most often refers to perceiving the form of solid objects by touch, a mental perception of depth or three-dimensionality through the senses.
The idea that different senses flesh out understanding is echoed in the multi-modal experiments being carried out.
Research is active into how to build the so-called "backbone" neural networks that can mix and match a dizzying array of modalities (a collection of data types that includes text, images, motion, video, and/or audio) and produce intriguing results.
Scholars at Carnegie Mellon University recently offered what they call a "High-Modality Multimodal Transformer," which combines not just text, image, video, and speech but also database tables and time-series data. Lead author Paul Pu Liang and colleagues reported a crucial pattern in their 10-modality neural network (a deep-learning system that uses interconnected nodes, or neurons, in a layered structure resembling the human brain): performance continued to improve with every addition of a new modality, and the approach carried over to entirely new modalities and tasks.
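To give a feel for the backbone idea, here is a hedged sketch in PyTorch (my own simplification with made-up feature sizes, not the CMU group's actual code): each modality is projected into tokens of one shared width, the tokens are concatenated, and a single shared Transformer layer attends across all of them at once.

    import torch
    import torch.nn as nn

    d_model = 64  # shared token width (an arbitrary choice for illustration)

    # Per-modality projections into the shared token space.
    text_proj = nn.Linear(300, d_model)   # e.g., 300-dim word vectors
    image_proj = nn.Linear(512, d_model)  # e.g., 512-dim image-patch features
    audio_proj = nn.Linear(128, d_model)  # e.g., 128-dim spectrogram frames

    # One shared backbone layer serves every modality.
    backbone = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

    # Dummy inputs shaped (batch, sequence length, feature dim).
    text = torch.randn(1, 12, 300)
    image = torch.randn(1, 16, 512)
    audio = torch.randn(1, 20, 128)

    # Mix and match: one token sequence spanning all three modalities.
    tokens = torch.cat([text_proj(text), image_proj(image), audio_proj(audio)], dim=1)
    fused = backbone(tokens)  # attention now runs across modalities
    print(fused.shape)        # torch.Size([1, 48, 64])

Adding a new modality then requires only a new projection; the shared backbone stays the same, which is what lets such systems keep absorbing new data types.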
Do you see, dear readers, the wonders being produced by this tsunami of technology? Get to know it and prepare yourself to face it, because "In learning you will teach, and in teaching you will learn."
See you next week.
Take care,
Bye