The Fallacy of Artificial General Intelligence: Microsoft's Recognition of the Limits of LLMs
Microsoft released a research paper last week [1] claiming that GPT-4's capabilities can be viewed as an early version of Artificial General Intelligence. The authors state: "Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."
To reach this conclusion, the researchers adopted the following definition of human intelligence: "a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience." According to the same paper, this definition was proposed in 1994 by a group of psychologists.
Interestingly, the authors of the paper [1] acknowledge that this definition of human intelligence is somewhat restrictive. They also acknowledge that some of its components are currently missing from GPT-4: "In fact, even within the restricted context of the 1994 definition of intelligence, it is not fully clear how far GPT-4 can go along some of those axes of intelligence, e.g., planning, and arguably it is entirely missing the part on “learn quickly and learn from experience” as the model is not continuously updating".
In my opinion, none of the components in the adopted definition are really exhibited by GPT-4, because no one truly understands what these components mean. We therefore have to question the validity of any attempt to define the term intelligence.
Intelligence is a very complex concept that we cannot define because we do not understand it, particularly its thinking and reasoning components. In his famous paper "Computing Machinery and Intelligence", Turing [3] notably abstained from defining the concept of thinking, which is an important component of the adopted definition.
Attempting to define intelligence is putting a limit on something that should not have one. As the Irish poet Oscar Wilde [2] once said, "to define is to limit". Therefore, no one should really try to define intelligence, nor claim to have created a technology that exhibits it. Our intelligence cannot be limited; it will constantly evolve along with the problems that our universe generates every day.
We can alternatively talk about how humans materialise intelligence today and build a set of benchmarks that allows the technology to advance until it outperforms the current materialisation of intelligence, such as current exam questions. However, we have to understand that humans can always develop other types of materialisation that machines cannot outperform.
Even if we assume that we understand the meaning of reasoning and thinking, we have to admit that they are intrinsically related to problems that we are trying to solve. However, the one who perceives the problem and understands the solution is the human, not the machine.
The intelligence exhibited (or that will be exhibited by the grandsons of ChatGPT) depends on how we ask the machines about the problems that we perceive. For example, imagine a journalist interviewing the most outstanding scientist in the world. Can the reader of the journalist's article benefit from the scientist's knowledge? It depends on how the journalist asks the questions. The journalist is us, and the scientist is the n-generation grandson of ChatGPT. No one will benefit from these tools if we do not ask the right questions.
If you still have doubts, consider this example shared in Microsoft's paper (it is normal if you do not understand the prompt or GPT-4's answer):
Prompt to GPT-4: Suppose g(x) = f⁻¹(x), g(0) = 5, g(4) = 7, g(3) = 2, g(7) = 9, g(9) = 6. What is f(f(f(6)))?

Answer of GPT-4: There are different ways to approach this problem, but one possible method is to use the fact that g(x) = f⁻¹(x) implies that f(g(x)) = x for any x in the domain of g. That is, applying f to both sides of the equation g(x) = f⁻¹(x) gives f(g(x)) = f(f⁻¹(x)) = x. This means that we can undo the effect of f by applying g, and vice versa. Using this idea, we can try to find f(6) by looking for the value of x such that g(x) = 6. From the given information, we see that g(9) = 6, so f(6) = f(g(9)) = 9. Similarly, we can find f(9) by looking for the value of x such that g(x) = 9. We see that g(7) = 9, so f(9) = f(g(7)) = 7. Finally, we can find f(7) by looking for the value of x such that g(x) = 7. We see that g(4) = 7, so f(7) = f(g(4)) = 4. Therefore, f(f(f(6))) = f(f(9)) = f(7) = 4.
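For readers who want to double-check the arithmetic, here is a minimal Python sketch (my own, not from the paper) that inverts the table given for g = f⁻¹ and applies f three times; it arrives at the same value of 4 that GPT-4 reports:

```python
# Minimal check of the example above (my own sketch, not from the paper).
# Since g = f^-1, inverting the table of g values recovers f on the points we need.

g = {0: 5, 4: 7, 3: 2, 7: 9, 9: 6}            # values of g given in the prompt
f = {value: key for key, value in g.items()}  # f(y) = x whenever g(x) = y

x = 6
for _ in range(3):                            # compute f(f(f(6)))
    x = f[x]

print(x)                                      # prints 4, matching GPT-4's answer
```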
Where is the problem? Most of us do not understand why we have to ask this question or how to interpret the solution. The only one who understands is the one who understands the PROBLEM, which the machine does not perceive.
The authors of the paper ask an interesting question: "Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent?". In my opinion, such a system is not necessarily intelligent. Many exams were designed before the advent of LLMs. The main problem with exams is that they ask specific questions and expect specific solutions from students. LLMs can easily answer specific questions, but they will really struggle to answer more open problems that are defined according to the knowledge of the human educator.
The authors acknowledge that LLMs are very limited in producing new science: "Perhaps the only real test of understanding is whether one can produce new knowledge, such as proving new mathematical theorems, a feat that currently remains out of reach for LLMs." In my opinion, not only are they unable to produce new science, they are also unable to easily acquire the knowledge and experience of human educators.
Simply passing an exam by memorizing the answers does not necessarily make a student intelligent. With training, the student could even learn to probabilistically generate different answers based on different prompts without truly comprehending the underlying concepts. While the student may be able to provide a perfect answer that matches the context of the question, this is not an actual demonstration of intelligence.
Our problem is that we currently consider anyone who memorises a large number of answers and returns them on demand to be intelligent. Intelligence is much broader than this restrictive view. A student proves their intelligence only when they are able to come up with new solutions to problems they were not trained on. So the fact that LLMs pass different exams based on knowledge that humans train them to "learn", or rather to "probabilistically memorise and predict", does not necessarily make them intelligent.
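To make this point concrete, here is a deliberately trivial Python sketch (the questions and the "student" are my own invention, purely for illustration): a student that only memorises question-answer pairs scores perfectly on the questions it has seen, yet fails as soon as the same problem is phrased in a new way.

```python
# A toy "student" that only memorises question-answer pairs (purely illustrative).
# It looks impressive on questions it has already seen, but no reasoning takes place.

memorised_answers = {
    "What is 2 + 2?": "4",
    "What is the capital of France?": "Paris",
}

def memorising_student(question: str) -> str:
    # Return the memorised answer verbatim, or give up on anything unseen.
    return memorised_answers.get(question, "I don't know")

print(memorising_student("What is 2 + 2?"))
# -> "4": a perfect answer, yet no demonstration of intelligence.

print(memorising_student("If I have two apples and buy two more, how many do I have?"))
# -> "I don't know": the same problem, rephrased, is already out of reach.
```

An LLM is of course far more sophisticated than a lookup table, but the argument here is that passing an exam, by itself, does not distinguish the two.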
References
- Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M. T., & Zhang, Y. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4 (arXiv:2303.12712). arXiv. http://arxiv.org/abs/2303.12712
- Oscar Wilde. Wikipedia. https://en.wikipedia.org/wiki/Oscar_Wilde
- Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, New Series, 59(236), 433–460.