The Answer is 42. On Data, Information and Knowledge
A recent discussion with some colleagues on the differences between data, knowledge and information made me realize that there still is a lot of confusion when it comes to the use of terms; confusion that goes well beyond my earlier blog post on indicators, measures and metrics.
In this blog post I'll discuss the differences between data, information and knowledge by using an example of counting cattle from space.
The Answer is 42
People who have either read or seen the comic science fiction series "The Hitchhiker's Guide to the Galaxy" know that 42 is the "Answer to the Ultimate Question of Life, The Universe, and Everything", calculated by supercomputer Deep Thought.
It took the computer 7.5 million years to compute and check the answer, which turns out to be 42. But when the instruction for the calculation was given, there wasn't actually a clear picture of what the question was. The question behind the instruction only developed over time.
That makes it the perfect analogy to start a discussion about data, information and knowledge in the light of decision-making with the use of evaluative evidence.
I think we can all agree that 42 in itself is data. Even with the question "What is the answer to life?" the response being 42 is still data. But when we add the information that this number has been calculated, does that change the typology of the answer? Does it then become information? Not in this example, because we do not know what type of calculation took place and what the original data looked like that was used for the calculation.
Counting Cattle From Space
But let's say we are counting cattle from space. I know, still pretty spacey. But hang in there for a moment; it is not as far-fetched as it sounds. Let's say we are using satellite imagery to do a census on cattle grazing as part of the M&E of a protected area intervention. The aim is to identify the drivers of habitat degradation in that protected area. By means of a computer model we identify cattle, based on a hierarchical object-based classification method of both cattle and cattle pens and on-the-ground data on the average number of animals per pen.
In a pilot calculation, the answer from our computer at the GEF Independent Evaluation Office is 42. The question, however, is "What is the average number (per acre) of cattle grazing in the protected area?" Is that data, or is that information? An extensive calculation and a data triangulation process have been applied to the raw data. Some would say this is still data; others would say this is information.
What if the question is "What has been the decline in the average number (per acre) of cattle grazing in the area since it became a protected area?" By using a reference point (the moment the area became protected; let's say there were 100 cows grazing per acre at that point in time) and by looking at the development through time, the answer 42 becomes information. A relationship is being analyzed: the relationship between the area being protected and the decline in grazing within that area over time. We could even compare this development against a business-as-usual scenario by taking into account the data from a comparable area that has not become a protected area.
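To make that step from data to information concrete, here is a minimal sketch of the arithmetic. It assumes that 42 is read as the current density (cattle per acre), as in the previous question, and that 100 is the reference point mentioned above. The figures for the comparison area are purely hypothetical and only illustrate the business-as-usual comparison; none of the function or variable names come from an actual evaluation tool.

```python
def decline(baseline: float, current: float) -> float:
    """Absolute decline in cattle per acre relative to the reference point."""
    return baseline - current


def percent_decline(baseline: float, current: float) -> float:
    """Decline expressed as a percentage of the reference point."""
    return 100.0 * (baseline - current) / baseline


# Figures from the example in the text.
protected_baseline = 100.0  # cattle per acre when the area became protected
protected_current = 42.0    # cattle per acre in the pilot calculation (assumed reading)

# Hypothetical comparison area that never became protected (invented numbers).
comparison_baseline = 100.0
comparison_current = 90.0

protected_decline = decline(protected_baseline, protected_current)
comparison_decline = decline(comparison_baseline, comparison_current)

# Decline in the protected area beyond what happened anyway in the comparison area.
decline_beyond_business_as_usual = protected_decline - comparison_decline

print("Decline in protected area (per acre):", protected_decline)
print("Decline as % of reference point:", percent_decline(protected_baseline, protected_current))
print("Decline beyond business-as-usual:", decline_beyond_business_as_usual)
```

On its own, the number 42 says very little. It only starts to answer a question once it is set against the reference point and, ideally, against a comparison area.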
The DIKW Pyramid
What makes data become information? And what makes information become knowledge? The DIKW pyramid is a model for representing functional relationships between data, information, knowledge, and wisdom. There are some who reject the DIKW pyramid because it is difficult to explain and leads to bad labels. But difficulty or complexity has never stopped me from pursuing something, so let's have a look at the version below of the DIKW Pyramid, which is my interpretation of how these levels interact and inform real-world decision-making.
Data comes in the form of raw observations and measurements. I tend to see data both as raw facts, or chunks of facts, about the state of the real world and as symbols that attempt to capture the true picture of a real event.
Information is created by analyzing relationships and connections between the data. It is capable of answering simple "Who/What/Where/How many/When/Why is" style questions. Information is a message with an (implied) audience and a purpose. Quite often, when we talk about ‘data science’ or ‘data-driven decision-making’, it is information and not data that feeds into the actual decision-making.
Knowledge is perhaps the hardest concept to define. Definitions may refer to information that has been processed, organized or structured in some way, or to information that has been applied or put into action. One view is that knowledge is the product of a synthesis in the human mind and exists only in someone's thoughts. This would mean that knowledge can only be shared as information, which then becomes knowledge again in someone else's brain. ‘Knowledge management’ under such a definition would basically be thought management.
My feeling is that knowledge (explicit as well as tacit) is created by using the information for action. Knowledge answers the "How" question. Knowledge is contextualized; a local practice or relationship that works, and can be shared by properly sharing the context that makes the information become knowledge. And interesting point made
Wisdom is created through use of knowledge, through knowledge users' communication, and through reflection, i.e. by embedding values, beliefs and experience into knowledge. Wisdom answers the "Why do" question as it relates to actions. In a sense it is what helps us make a better informed decision between two seemingly similar choices, or what helps us to apply knowledge toward the attainment of a common or higher good.
All of these terms are relative concepts, and knowledge can be considered as information (data) on a higher, more abstract domain-of-application level. An example: when humans make decisions and use information for action, we tend to talk about knowledge. But these days computers make a lot of decisions on data and information without any human intervention, which raises the question of whether a computer can be knowledgeable.
Another point would be that the pyramid is not really a pyramid, but should perhaps look like an hourglass in which there are lines going up as well as down. Data can be derived from knowledge and information; the quantification step in the Most Significant Change technique is a good example of this type of reverse processing.
In the end I think we all agree that decisions are often not made on data alone, but on information, knowledge and wisdom, which are established or derived (directly or indirectly) in part from data. Through processes like evaluation, research, observation and feedback we generate new data, information and knowledge.
Questions for the Reader
Do you feel knowledge can be saved, or does it only exist in a person's thoughts? If a farmer shares his knowledge of historical rainfall patterns in his area, does it become information once he writes it down and explains the context, or is it still knowledge?
Which of these concepts informs real-world decision-making most: information, knowledge or wisdom?
Do you feel wisdom is the right concept to talk about? Or is it too esoteric and should we talk about understanding?