Hi readers, as expected, lots of people asked about data and Artificial Intelligence. It appeared that their concept of the various types of data is not clear and, besides, most of them are unable to comprehend what Artificial Intelligence is. Don’t worry, you are not alone. There are many, especially in the developing world, who are not clear about all this. So, a new series of blogs is here: “Data and Artificial Intelligence”.
This will be a series of 4-5 blogs that will hopefully clarify these concepts for you, and I will be extremely happy to do that because it will be helpful to many. So, let’s start with the first blog:
Data and artificial intelligence: what are these entities, and what is the nexus between them?
But before I do that, let me tell you how I developed an interest in this subject.
I was given an assignment to document the scientific programs and activities of institutions located in four different provinces, where lots of data was supposedly being generated. I knew everyone involved in generating that data, as well as their published work and periodic reports, from which I could easily have obtained it. But I realized that I should not lean too heavily on my experience and intuition, which could lead to unconscious bias, so I decided to be “data driven”. I therefore asked for the data that the scientists of the four centers had generated and already published or reported, the ways and means used to generate it, and all the work done on the basis of that data, including ancillary activities. I emphasized this request in the expectation that the data would be not only structured but also clean, so that it could be shared and joined with the data generated by the other centers. That was not the case.
I then looked at other factors, such as the environment in which the scientists and centers were working, what the scientists at centers in remote areas and their competitors at the organization's well-established centers were doing, how their work was being compared, and how they were viewed by the higher-ups. I therefore decided to be data informed as well as data driven, and used something in between. But I was shocked when two of the centers refused to provide their data, without which the documentation was not possible. Why did they do that? Readers will find the answer in the course of this blog and will realize where we stand.
Now, the purpose of this long proem is to familiarize you with a few terms and types of data sets that will come up now and then in this series of blogs, and which it is imperative to understand before trying to understand Artificial Intelligence.
Have you ever wondered why big businesses like Facebook, Twitter, Google, Toyota, Honda, and Sony have progressed so well? These organizations made progress by adopting data-driven technologies and making decisions on the basis of solid data. In developing countries, decisions are mostly made on the basis of personal experience or observation, instinct, hype, dogma, beliefs, and perceptions. Since perceptions often matter more than realities, in developing countries and perhaps elsewhere too, it is imperative that important decisions be supported by relevant, quality data, so that the results are more reliable and credible than those based on instincts, assumptions, and perceptions.
William Edwards Deming, an influential 20th-century statistician and thinker in the field of quality, once said,
“Without data, you’re just another person with an opinion”, which means
“if you don’t have data, you have an opinion”.
In developing countries, such opinions are commonly expressed everywhere with an air of “authenticity”, and they most often serve as the basis for decision making.
Now the question is, what type of data is he talking about?
Deming highlighted “quality data” because it enhances the scope of achievement. Equally important is the availability of adequately trained and technically equipped people who can ask the right questions, extract the right data and metrics, and analyze and transform data into competitive insights that drive business decisions and actions using people, processes, and technologies.
“Without these essentials, the data would be just numerical values with no context”
It is on this vital point that most organizations, especially public sector organizations in developing countries, cannot compete with the private sector, which hires competent, skilled people for specific tasks who can deliver.
Public sector organizations hire untrained youngsters just after they graduate, and these recruits are trained on the job. Usually, they are trained in the subject their bosses work on, and if the boss does not believe in data-driven technologies, he will not encourage others to learn them.
Can organizations be influenced through “Data & Analytics”, and do they want to be influenced? These are two different questions, and they prompted me to write this long “proem”. Readers will find the answers along the way and will realize why developing countries are still developing, and will probably remain so until they realize the power of quality data and of data sharing.
The question is: what do we mean by data, quality data, and data types, and what are the related concepts?
Data are individual facts, statistics, or items of information, often numeric. They form a set of values of qualitative or quantitative variables about one or more persons or objects.
Data-informed describes decision-making in which the collection and analysis of data guide the decision and improve its chances of success, alongside experience and an understanding of the context, rather than dictating it.
Data driven describes making decisions based on hard data rather than on intuition, observation, or guesswork. It demands a high degree of intellectual honesty: an unbiased, honest approach to problem solving that data-driven people pursue persistently.
Structured data follows a standardized, predefined format. On the web, for example, it is a standardized format for providing information about a page and classifying the content on that page: a recipe page must show the ingredients, cooking time, temperature, calories, and other required details so that the recipe can be followed easily. Structured data is often quantitative data, whose objective and predefined nature allows us to easily count, measure, and express it in numbers.
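For readers who like to see an idea in concrete form, here is a minimal sketch in Python of what “structured” means for the recipe example. The `Recipe` class, its field names, and the two sample recipes are all invented for illustration; they are assumptions, not a standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recipe:
    """A structured record: every recipe carries the same predefined fields."""
    name: str
    ingredients: List[str]
    cooking_time_min: int
    temperature_c: int
    calories: int


# Two made-up recipes, used only to illustrate the idea.
recipes = [
    Recipe("Lentil soup", ["lentils", "onion", "garlic"], 40, 100, 230),
    Recipe("Flatbread", ["flour", "water", "salt"], 15, 220, 180),
]

# Because the fields are predefined, counting and measuring is straightforward.
total_time = sum(r.cooking_time_min for r in recipes)
quick = [r.name for r in recipes if r.cooking_time_min <= 20]

print(total_time, quick)
```

The same recipes written as free-flowing paragraphs would hold the same information, but a program could not add up the cooking times without first working out where each number sits in the text.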
Unstructured data means datasets or large collections of data files that aren’t stored in a structured database format. Such data has an internal structure, but that structure is not predefined through data models. It may be human generated or machine generated, in a textual or a non-textual format. Examples of unstructured data range from imagery and text files such as PDF documents to video and audio files.
Clean data is free of corrupt or inaccurate records in tables or databases (and it is not cooked or extrapolated data). Data cleaning is the process of identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting them. The aim is to give the data accuracy, which means it is close to the true values; completeness, which means all the required data is known; consistency, which means it is dependable; integrity, which means it is reliable; timeliness, which means it is appropriate and relevant; and uniformity, which means it is homogeneous.
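The cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`center`, `year`, `yield_t`), and the rule that a negative value is impossible are all invented, and deleting bad rows is just one of the options (replacing or modifying them are the others).

```python
# Hypothetical raw records from one center; every value here is made up.
raw_records = [
    {"center": "A", "year": 2020, "yield_t": 3.2},
    {"center": "A", "year": 2020, "yield_t": 3.2},   # exact duplicate
    {"center": "a ", "year": 2021, "yield_t": 3.5},  # inconsistent label
    {"center": "A", "year": 2022, "yield_t": None},  # incomplete
    {"center": "A", "year": 2023, "yield_t": -4.0},  # impossible value
]


def clean(records):
    """Return records that are complete, plausible, uniform, and unique."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["yield_t"] is None or rec["yield_t"] < 0:
            continue  # delete incomplete or inaccurate rows
        # Uniformity: write the center label the same way everywhere.
        fixed = {**rec, "center": rec["center"].strip().upper()}
        key = (fixed["center"], fixed["year"], fixed["yield_t"])
        if key in seen:
            continue  # delete duplicates
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


clean_records = clean(raw_records)
print(clean_records)
```

Note that nothing is “cooked” here: rows are either kept, tidied, or dropped, and no values are invented to fill the gaps.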
Siloed data is a collection of data held or understood by one group that is not easily or fully accessible to other groups in the same organization. Organizational silos are business divisions that operate independently and avoid sharing information (this is what I experienced during the documentation assignment).
Mundane data is very ordinary data of no particular interest in itself. It lets us focus the analysis on the ordinary sites of everyday life where digital data is lively and leaky, that is, sites where data becomes part of, and open to, the constantly changing configurations of things and processes through which life continues.
Redundant, obsolete, and trivial (ROT) data is information that isn’t necessary to store. It provides no value to the organization because it is surplus, outdated, or unimportant.
Quality data must be:
- Accurate, timely, relevant, and accessible, as well as queryable by anyone who wishes to use it,
- Joinable to the data generated by other organizations, and
- Sharable within the organization and outside it, so that data generated on similar aspects can be joined.
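To make “joinable” concrete, here is a small Python sketch. It assumes two hypothetical centers that both record a `year` field; the figures, the field names, and the helper `join_on` are invented for illustration.

```python
# Hypothetical yearly figures from two centers that share the "year" field.
center_a = [
    {"year": 2021, "yield_t": 3.5},
    {"year": 2022, "yield_t": 3.8},
]
center_b = [
    {"year": 2021, "rainfall_mm": 410},
    {"year": 2022, "rainfall_mm": 365},
    {"year": 2023, "rainfall_mm": 390},
]


def join_on(left, right, key):
    """Inner join: keep only rows whose key value appears in both datasets."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


joined = join_on(center_a, center_b, "year")
print(joined)
```

The join only works because both centers record the year in the same way. If one center wrote “2021” and the other “FY 21-22”, or if one refused to share at all, there would be nothing to join on.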
Big data is too large and complex to be processed by the human mind. Its magnitude can be gauged by considering that in one single minute:
277,777 Instagram stories are posted, 511,200 tweets are sent, 4,500,000 YouTube videos are watched, 4,497,420 searches are conducted, and 18,100,000 texts and 18,800,000 emails are sent.
It is thus beyond the human mind to analyze the volume, velocity, variety, variability, and veracity of the data collected in one minute, let alone the data collected round the clock.
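A quick back-of-the-envelope calculation in Python shows what “round the clock” means. It uses only the per-minute figures quoted in this post and assumes, for simplicity, that the rate stays the same all day; the dictionary keys are just labels chosen for this sketch.

```python
# Per-minute figures exactly as quoted in this post.
per_minute = {
    "instagram_stories": 277_777,
    "tweets": 511_200,
    "youtube_videos_watched": 4_500_000,
    "searches": 4_497_420,
    "texts": 18_100_000,
    "emails": 18_800_000,
}

MINUTES_PER_DAY = 24 * 60  # assumes a constant rate around the clock

per_minute_total = sum(per_minute.values())
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}
per_day_total = per_minute_total * MINUTES_PER_DAY

print(f"{per_minute_total:,} events per minute")
print(f"{per_day_total:,} events per day")
```

Even with these six activities alone, the total comes to roughly 46.7 million events a minute and about 67 billion a day.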
To use such huge and complex data, it is imperative to have a strategy: a plan and a specific design for acquiring, storing, managing, sharing, and using that data within and outside the organization.
Artificial Intelligence (AI) is precisely the method that has produced wonders through the use of “Big Data”.
So, dear readers, now you can see why knowing about data, its quality, access, and storage is imperative for understanding Artificial Intelligence.
See you next week with all about Artificial Intelligence. Take care, and bye.