Review of the Book “Mind in Motion: How Action Shapes Thought” by Barbara Tversky, Basic Books, New York, 2019
What this book is about. This book by Barbara Tversky, President of the Association for Psychological Science, is an entertaining popular introduction to cognitive psychology. It has many observations and ideas, and I strongly encourage everyone to read it.
It is impossible to describe all these ideas in a short review – for this, you must read the book. What I will try to do in this review is to describe some of these ideas – namely, those that are most related to fuzzy techniques.
Knowledge is often described by imprecise words. A large part of our knowledge is precise: we have exact equations of motion, we have exact statistical models that describe many real-life phenomena.
However, it is well known that a significant part of our knowledge is not exact: it is usually described by using imprecise (“fuzzy”) words from natural language, like “small”. This is how we describe how we drive a car, this is how skilled doctors explain what treatment to prescribe for a patient, etc. We all know this very well – because the need to take this imprecise knowledge into account was Zadeh’s main motivation for inventing fuzzy techniques.
Fuzzy techniques have helped us design many successful systems. Some of these systems are used for control: they automatically generate the appropriate control values. Other systems generate recommendations – recommendations which are also formulated in terms of (imprecise) words from natural language.
Imprecise numbers vs. imprecise words: an interesting distinction. In fuzzy logic, we use the same technique to describe qualitative imprecise words like “big crowd” and more quantitative imprecise words like “in the hundreds” or “about a thousand”. Interestingly, according to psychologists, our brain processes these two types of imprecise words differently. Namely, in addition to the exact number system that we learn in school, we also have an intuitive approximate number system that allows us to perform reasoning and even simple arithmetic operations with imprecise numbers like “in the hundreds”; see Chapter 7.
Imprecise numbers vs. imprecise words: how can we use this distinction? Since there is such a distinction in our brain, maybe a good idea is to provide such a distinction in fuzzy techniques as well? Maybe it will be helpful to describe “fuzzy arithmetic” – a formalization of our intuitive approximate number system? Researchers have made such attempts, and while these attempts have not (yet) led to many interesting applications, the fact that such concepts are treated differently in our brain makes this one of the promising directions to follow.
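To make the idea of “fuzzy arithmetic” concrete, here is a minimal sketch in Python, assuming that an imprecise number is modeled as a triangular fuzzy number. The class name TriangularNumber and the particular values chosen for “in the hundreds” and “about a thousand” are illustrative assumptions and do not come from the book. For triangular numbers, addition under the standard extension principle reduces to adding the three defining points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularNumber:
    """Hypothetical triangular fuzzy number: support [low, high], peak at `peak`."""
    low: float
    peak: float
    high: float

    def membership(self, x: float) -> float:
        """Degree to which x is compatible with this imprecise number."""
        if x <= self.low or x >= self.high:
            # Outside the open support; only a degenerate peak on the border gets 1.
            return 1.0 if x == self.peak else 0.0
        if x <= self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)

    def __add__(self, other: "TriangularNumber") -> "TriangularNumber":
        # Sum of two triangular numbers: add the endpoints and the peaks.
        return TriangularNumber(
            self.low + other.low,
            self.peak + other.peak,
            self.high + other.high,
        )


# Illustrative (assumed) meanings of two imprecise numbers.
in_the_hundreds = TriangularNumber(100, 500, 900)
about_a_thousand = TriangularNumber(900, 1000, 1100)

total = in_the_hundreds + about_a_thousand
print(total)                   # support from 1000 to 2000, peak at 1500
print(total.membership(1250))  # partial degree of compatibility
```

The result is again a triangular number, which is in line with the intuition that adding two approximate quantities gives another approximate quantity, only a wider one.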
Different people assign different meaning to words: known fact and how we currently use it. In describing our ideas, our knowledge, in terms of natural-language words, it is important to take into account that different people assign somewhat different meaning to the same word: what is young to an older person may not sound that young to his grandson; what is tall to a short person does not sound that tall to a professional basketball player, etc.; see Chapter 1.
In fuzzy systems, we do take this into account when describing the expert’s rules: we determine the expert’s membership function; if there are several experts, we take into account the difference between their opinions – e.g., by using interval-valued (or, more generally, type-2) membership functions. This is all we need for fuzzy-based control systems.
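As a minimal illustration of the interval-valued approach, the following Python sketch assumes that each expert’s meaning of a word is given by a symmetric triangular membership function, and that the interval at a point x is simply the range between the smallest and the largest of the experts’ degrees. The function names and the two “experts” are hypothetical; this is one simple way of combining opinions, not the only one.

```python
def triangular(center, width):
    """Return a symmetric triangular membership function (assumed shape)."""
    def mu(x):
        return max(0.0, 1.0 - abs(x - center) / width)
    return mu


def interval_membership(experts, x):
    """Interval [min, max] of the degrees that different experts assign to x."""
    degrees = [mu(x) for mu in experts]
    return min(degrees), max(degrees)


# Two hypothetical experts who understand "warm" (in degrees Celsius) differently.
warm_experts = [triangular(20, 10), triangular(25, 10)]

print(interval_membership(warm_experts, 25))  # the experts disagree: a wide interval
print(interval_membership(warm_experts, 40))  # the experts agree: a degenerate interval
```

The width of the resulting interval shows how strongly the experts disagree about a given value.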
Different people assign different meaning to words: how else can we use this fact? However, for fuzzy-based recommendation systems – e.g., for a system that provides advice to a medical professional – we also need to take into account that the same natural-language recommendation can be interpreted differently by different users. Ideally, in addition to studying what the experts mean by different words – this we do – we also need to study, for each user, what this user means by different words. This way, instead of generating universal recommendations – that different users will interpret somewhat differently – we can personalize recommendations, making the words slightly different for different users, so that the resulting understanding will be exactly as desired.
Another important idea – described in Chapter 9 – is that it is important for the same person to take into account different possible interpretations of the same word. This is how the thought process of the superforecasters – people who are very good at forecasting events – works: they come up with an idea, then they switch to a possible opponent’s viewpoint, trying to find weak points in this idea, then switch again, etc. It would be great to be able to simulate this efficient process.
Our opinions change in time. People’s opinions are not fixed: most of us adjust our knowledge based on new facts. In machine learning terms, we learn from new facts. In some sense, our learning is very similar to how computers learn, but there is also a big difference. Machine learning systems, whether they use neural networks or adaptive fuzzy techniques, make a small modification every time a new fact is presented. Interestingly, according to Chapter 2, our opinions change discretely: a few new facts do not change our view, until enough new facts accumulate to force us to make a drastic change in our opinions. In science, this is known as a paradigm shift; in psychology, it is known as confirmation bias: we tend to stick to the same opinion even when we see some evidence to the contrary, until this contrary evidence becomes overwhelming. When we consider someone a friend and a good person, we continue viewing this person as a friend even when this “friend” does not behave very well towards us; we try to find an explanation for such instances of bad behavior, until these misbehaviors accumulate and we are forced to admit the obvious.
Many psychologists view such bias as irrational, as a limitation of human reasoning – but it actually makes sense. Changing our view of the world is a process that requires a lot of effort and many revisions. So, even if our model of the world turns out to be slightly inadequate and our resulting actions are not perfectly optimal, we can tolerate this sub-optimality as long as the resulting loss of efficiency is smaller than the effort needed to redo our reasoning system. So maybe something like this can be built into our machine learning algorithms – to make them more efficient?
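One possible way to build such behavior into a learning algorithm is sketched below in Python. This is only an illustration under strong assumptions: the “model” is a single number, the loss from each new fact is the absolute difference between the fact and the current estimate, and the parameter revision_cost (a hypothetical name) stands for the effort needed to redo the model. The estimate stays unchanged while the accumulated loss is below this cost, and it is revised in one step once the loss exceeds it.

```python
def lazy_revision(observations, initial_estimate, revision_cost):
    """Keep the current estimate until the accumulated loss exceeds revision_cost.

    Returns the estimate held after each observation and the number of revisions.
    """
    estimate = initial_estimate
    pending = []             # facts seen since the last revision
    accumulated_loss = 0.0   # total mismatch tolerated so far
    revisions = 0
    history = []
    for value in observations:
        pending.append(value)
        accumulated_loss += abs(value - estimate)
        if accumulated_loss > revision_cost:
            # A drastic change: re-estimate from all the facts collected so far.
            estimate = sum(pending) / len(pending)
            pending.clear()
            accumulated_loss = 0.0
            revisions += 1
        history.append(estimate)
    return history, revisions


history, revisions = lazy_revision([0, 0, 1, 1, 1, 1], 0.0, revision_cost=2.5)
print(history)    # the estimate stays at 0.0 and then jumps once
print(revisions)
```

In contrast to a usual incremental update, the first two contrary facts do not change the estimate at all; only the third one triggers a single large revision.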
But are words all there is? Many people naively think that our thought process, the way we generate and communicate ideas – all this can be described by words. According to this viewpoint, thoughts appear in terms of natural-language words, they get modified into other words, they are communicated as words – and sometimes, they get transformed into exact formulas. This type of thinking is especially natural for people in the fuzzy community: when we ask people how they drive a car, how they make medical recommendations, they describe their answer by using words from natural language – and when we use appropriate fuzzy techniques to translate this knowledge into numbers, we get reasonably good systems. So, for us, it is natural to think that words are all there is.
But psychologists found out that words are not all. Our thoughts often start as images – static or dynamic. When we communicate our thoughts, our ideas, we use images, we use gestures – and this helps. Even historically, images were used to communicate way before written language appeared: e.g., the first known map was designed in Spain about 13,600 years ago, predating written language.
According to the book, such “spatial thinking” is the foundation of abstract thinking (Chapter 3) – this is actually one of the main ideas of the book. As shown in Chapter 5, images and gestures help us think, they help us clarify our thoughts, they help us better communicate our thoughts (this part is clear: the gestures of a good lecturer help!). Even uncertainty is described by gestures – e.g., when we want to emphasize that something is approximately true, more or less true, we use waving hand gestures.
According to Chapter 4, even for success in STEM, success that requires processing exact numerical models, it turns out that spatial reasoning – the ability to think in terms of images, to process images, and to communicate images – is extremely important. For example, according to Chapter 9, students who drew visual explanations of what they learned did much better on the following test than students who only used words or numbers.
How can we describe all this? Fuzzy techniques describe knowledge expressed by using words from natural language. But, according to psychology, a significant part of our knowledge is described in terms of static and dynamic images. How to formalize this part is an important open question.
Interesting fact: images can be fuzzy. Images that we draw do not have to be clear. According to Chapter 9, we usually start with a fuzzy ambiguous image, an image not ready for a clear communication to others, and then it becomes clearer and clearer. According to the corresponding research, this is how the creative process starts: such initial ambiguity, fuzziness, is a key to creativity.
How can we describe this ambiguity, this fuzziness? How can we describe the creative process as first forming a fuzzy picture and then “defuzzifying” it? Is it similar to fuzzification and defuzzification in fuzzy control and decision making? These are interesting questions to study.
The most efficient way is combining images and words. Words are useful, images are useful, but, according to Chapter 8, the most efficient way to communicate is to combine images and words. For example, the most useful maps are the ones that have both images and labels describing important parts of these images.
How can we combine images with words? This is an important question.
Speculative conclusions. In addition to more direct relations between the book’s ideas and fuzzy techniques, we can make more speculative connections from the fact that the clearest and most intuitive way to describe something is to have an image, i.e., from the mathematical viewpoint, a function of two variables x and y that describes the intensity I(x, y) of the image at a point (x, y).
What conclusion can we make about representing a single notion like “small”? In the first approximation, in fuzzy techniques, this notion can be described by a usual (type-1) membership function that assigns, to each possible value x of the corresponding quantity, the degree μ(x) to which the value x has the corresponding property (e.g., the degree to which the value x is small). Of course, just like the expert cannot describe his/her knowledge by an exact value x, the same expert cannot describe his/her degree of confidence by a single number μ(x): this degree can also be viewed as a fuzzy number, by assigning, to each possible value d of this degree, the degree of confidence μ2(x, d) that d is an appropriate degree of the statement “x is small”. The corresponding function – known as a type-2 fuzzy set – is a function of two variables, exactly what we can represent as an image.
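The following Python sketch illustrates how a type-2 membership function μ2(x, d) can be stored literally as an image, i.e., as a two-dimensional array of intensities. The shapes used here are assumptions made only for illustration: the primary degree of “x is small” decreases linearly with x, and the secondary degree is a triangle of half-width spread around this primary degree. The function names and parameter values are hypothetical.

```python
def primary_small(x, scale=10.0):
    """Assumed type-1 degree to which x is small: linear decrease from 1 to 0."""
    return max(0.0, 1.0 - x / scale)


def secondary(x, d, spread=0.2):
    """Assumed type-2 degree mu2(x, d): how appropriate d is as the degree for x."""
    return max(0.0, 1.0 - abs(d - primary_small(x)) / spread)


def as_image(xs, ds):
    """Sample mu2 on a grid: rows correspond to degrees d, columns to values x."""
    return [[secondary(x, d) for x in xs] for d in ds]


image = as_image(xs=[0, 5, 10], ds=[0.0, 0.5, 1.0])
for row in image:
    print(row)
```

Each row of the printed array is one horizontal line of the picture; on a finer grid, the bright band of the image follows the graph of the primary membership function.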
But, of course, the expert cannot produce an exact degree μ2(x, d) either – so a seemingly natural idea is to consider this degree as a fuzzy number too, i.e., to consider, for each possible value d2 of this degree, the extent μ3(x, d, d2) to which d2 is appropriate. Such type-3 fuzzy numbers have indeed been theoretically proposed, but so far, in contrast to type-2 fuzzy sets, they have not led to practical applications – so maybe the reason is that only type-2 fuzzy numbers can be represented as an image and thus, only type-2 numbers are intuitive?
What if instead of a single notion like “small”, we have many different statements, with different degrees of confidence? If we describe each degree of confidence by a single number from the interval [0, 1] – as in traditional fuzzy logic – we can place all these degrees on a straight line. If we take into account the user’s uncertainty in assigning a number and allow the user to assign an interval of possible numbers instead, we will get two parameters to describe each statement – so all these statements can be placed into a planar image. If we try to make a more adequate description and use three or more parameters, we no longer have a clear image. Maybe this is the reason why in many applications, interval-valued fuzzy techniques work well, while several seemingly natural generalizations are not as efficient?
What if we have many different notions? Each notion needs to be characterized by the corresponding membership function. Then, according to the above idea, we should have a 2-parametric family of such membership functions – then each notion will be describable as a point on a plane, so the whole set of notions will be easily describable as an image. This probably explains the empirical success of symmetric triangular membership functions, which are indeed characterized by two parameters, center and width, and of a similar 2-parametric family of Gaussian membership functions.
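To illustrate what “each notion is a point on a plane” means, here is a short Python sketch. The two families are the usual symmetric triangular and Gaussian membership functions; the dictionary NOTIONS and its (center, width) values are made up for illustration and are not taken from the book.

```python
import math


def triangular(x, center, width):
    """Symmetric triangular membership function with 2 parameters."""
    return max(0.0, 1.0 - abs(x - center) / width)


def gaussian(x, center, width):
    """Gaussian membership function with the same 2 parameters."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Each notion is a point (center, width) on a plane; the values are assumed.
NOTIONS = {
    "small": (0.0, 10.0),
    "medium": (25.0, 10.0),
    "large": (50.0, 10.0),
}


def describe(x, family=triangular):
    """Degree to which x satisfies each notion, for a given 2-parametric family."""
    return {word: family(x, center, width)
            for word, (center, width) in NOTIONS.items()}


print(describe(5.0))             # triangular family
print(describe(25.0, gaussian))  # Gaussian family
```

Switching from one family to the other changes only the shape of the membership functions; the set of notions remains the same collection of points on the (center, width) plane.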
And if we allow dynamic images, which are described by functions of three variables I(x, y, t) (t is time), then we can use 3-parametric families – e.g., the family of all symmetric trapezoidal membership functions?
Back to the book. This review just scratched the surface. Read the book: I am sure many other ideas will come – first they will be fuzzy, then they will become clearer. Enjoy!