4 Natural Language, Semantic Grounding, and Learning to Solve Problem Through Language


5 Causal Rules, Problem Solving, and Operational Representation

operational representations can be used for effective reasoning and problem

solving. These causal rules encode conceptual generalizations of the concepts

involved (such as Push, Momentum, and Attach). Therefore, in a sense what we

are showing is that knowing (the concepts involved) is knowing how to act (with

the concepts).

We have shown in Fig. 5.14 how a tool could be constructed to solve a certain

problem and the process of tool construction and application was worked out by

the system through a backward chaining problem solving process with causal

rules that have been learned earlier. Now, suppose a system (or a person) has

worked out this solution before. Could the system use natural language to communicate to another system the method to solve the problem without the second

system having to undergo a potentially involved problem solving process? In fact,

a vast proportion of human progress was achieved through learning from others

through natural human language about all kinds of solutions to all kinds of

problems. Of course, this is only possible because the recipients of the instructions in natural language form really understand what the strings of symbols

emitted by the instructors mean.

Using our framework, we illustrate how instruction through language is possible

because our representational framework captures meaning appropriately. In

Fig. 5.18 we show an example, derived from the same problem as that in

Fig. 5.14, of using English language instructions from an “instructor” to teach a

“student” how to solve the problem involved. One could roughly characterize the

meaning of a sentence as existing at two levels – the syntactic level and the lexical

level (e.g., Saeed 2009). For this example, we assume that there are some syntactic

transformational rules that are peculiar to each language (in this case, English) that

convert the syntactic structures to some deeper representations that capture the

syntactic level meaning of the sentences involved. This aspect of meaning

processing is complex in itself and we will not delve into the details there. What

we would like to illustrate with Fig. 5.18 is how the individual lexical items (the

“words”) can be associated with some ground level representations that provide the

meaning of the lexical items involved and this understanding allows the recipient of

the language instructions to carry out tool construction and application, illustrating

that the recipient really understands the strings of symbols emitted by the instructor

through language.

On the right side of Fig. 5.18 is the solution to the problem in Fig. 5.14 and on the

left side is a sequence of instructions in English instructing how a tool could be

constructed and used to solve the problem. To avoid clutter, we extract two

sentences from the left panel and place them right below the Solution picture and

show how each of the words in the sentences can be mapped onto an item, an

activity, or an action in the Solution. Again, to avoid clutter, we show only some of

the mappings. Also, there are linguistic devices such as the article “the” whose

function cannot be clearly identified with an external object or event and we omit

the mappings for these as well. Other than the words corresponding to some of the

concepts we have discussed above, we also show how the concepts of “first,”

“after,” and “then” can be mapped onto the temporal order of events in the Solution

[Fig. 5.18, left panel: Problem Solving Instructions in Language Form
1. First you prepare/materialize one mobile object.
2. Then you ... another mobile object.
3. Then you attach the two mobile objects together.
...
6. Then you push the constructed tool upward.
7. After the tool has penetrated the Wall, you attach the sharp end to Object 2 and the other end to yourself, and then you pull downward.
Right panel: the Solution picture, with sentences 1 and 7 repeated below it and their words mapped onto the Solution.]

Fig. 5.18 Semantic grounding of natural language sentences through grounding of the meaning of the lexical items in the sentences. This translates into the actions carried out according to the natural language instructions, illustrating the idea of knowing (some concepts) is knowing how to act (with the concepts)

picture. Through an appropriate “meaning construction” process using the

grounded understanding of each of the lexical items in the sentences and the

syntactic structure of the sentences, it is then possible for the recipient of the

instructions in natural language to carry out the appropriate actions accordingly.

We characterize this process of learning from language instruction as learning of

grounded concepts for rapid problem solving through language which constitutes

the bulk of learning for noological systems that have the language faculty, such as

human beings. The great strides made in the human community in problem

solving, as distinct from that of other animals, critically depend on this kind of learning.
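
The grounding of lexical items described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the sentences are taken as already parsed into (temporal word, verb, argument) triples, since the syntactic processing is not detailed in the text, and the names `GROUNDING`, `carry_out`, and the action tuples are hypothetical stand-ins for the grounded operational representations, not the system described in this chapter.

```python
# Hypothetical grounding table: each verb names an action the recipient can perform.
# The action names (Materialize, Attach, Push) are illustrative stand-ins only.
def materialize(target):
    return ("Materialize", target)


def attach(target):
    return ("Attach", target)


def push(target):
    return ("Push", target)


GROUNDING = {"materialize": materialize, "attach": attach, "push": push}


def carry_out(instructions):
    """Execute parsed (temporal_word, verb, argument) triples in their stated order."""
    performed = []
    for position, (when, verb, argument) in enumerate(instructions):
        # "first" and "then" are grounded in the temporal order of events.
        if when == "first" and position != 0:
            raise ValueError("'first' must introduce the first step")
        if when == "then" and position == 0:
            raise ValueError("'then' needs an earlier step to follow")
        action = GROUNDING[verb]  # lexical item -> grounded action
        performed.append(action(argument))
    return performed


# Steps 1, 3, and 6 of Fig. 5.18, already parsed (syntactic processing is assumed).
steps = [
    ("first", "materialize", "one mobile object"),
    ("then", "attach", "the two mobile objects"),
    ("then", "push", "the constructed tool upward"),
]
performed = carry_out(steps)
```

A recipient that lacks an entry in the grounding table for a verb fails with a `KeyError`, which mirrors the point that instruction through language only works when the recipient has a grounded understanding of each lexical item.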




5.5 Discussion

We would like to contrast our operational representation paradigm described in

both the previous chapter and this chapter with standard AI approaches in two

major aspects. One aspect concerns the representation of the temporal aspects of the

world. In many AI and neuroscience investigations and methods, it is deemed that


to capture temporal information, a recurrent neural network or a Markov chain

(Arbib 2003; Sutton and Barto 1998) would be the most natural choice as changes

in the state of the world from one time frame to another can be embedded in these

structures both for recognition and generation purposes. In neural networks, a

succession of the physical states of the world which change over time could be

learned and stored and later used to recognize novel instances with some generalizations (Arbib 2003). Recurrent neural networks can also generate a succession of

states representing action sequences to be taken in the world.

In Markov chains, the temporal nature of the physical world can be captured in

state transition probabilities linking one physical state to another, representing how

often the transitions have been observed in the real world. These transition probabilities can direct an intelligent system to output action sequences to achieve

certain goals. Even though in some sense the temporal aspects of the world are

captured in these representations, as these representations can recognize and generate states over time, the temporal aspect is implicit. Take, for example, the basic

movements of Move-Up and Move-Down as discussed in Chap. 4, Sect. 4.3.3. State

transition representations would link the state in the earlier time step to the later

time step, and perhaps assign a transition probability to the transition, but they do

not really capture the physical nature of the movements. The fact that, say, Move-Up and Move-Down are actually intimately physically related to each other and

related in various manners to other operations, is best represented and revealed

through the spatiotemporal representations in which the temporal dimension is

explicitly represented such as that we have advocated here – in the form of

operational representations. The participation of these operational representations,

in the construction of conceptual hierarchies and in various reasoning and problem

solving processes, becomes very natural and straightforward, as can be seen in

various examples described in both chapters. It would be difficult to see how a

neural network or Markov chain representation can support relatively complex

problem solving processes of Fig. 5.14 and also allow explicit linguistic descriptions and instructions such as that illustrated in Fig. 5.18.
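
The contrast between an implicit and an explicit temporal dimension can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: movements are reduced to one vertical coordinate, the function names are hypothetical, and time reversal is just one simple relation chosen to show how two movements can be compared once the time axis is explicit. It is not the operational representation of Chap. 4.

```python
def move(start_y, dy, steps):
    """Explicit spatiotemporal pattern: one (t, y) frame per time step."""
    return [(t, start_y + dy * t) for t in range(steps + 1)]


def reverse_time(pattern):
    """Replay the same positions backwards along the explicit time axis."""
    ys = [y for _, y in reversed(pattern)]
    return list(enumerate(ys))


def as_transition_table(pattern):
    """State-transition view: (earlier state, later state) pairs with a probability."""
    ys = [y for _, y in pattern]
    return {(a, b): 1.0 for a, b in zip(ys, ys[1:])}


move_up = move(start_y=0, dy=+1, steps=3)
move_down = move(start_y=3, dy=-1, steps=3)

# With an explicit time axis, the relation between the two movements is a
# direct operation on the representation (an illustrative choice of relation).
related = reverse_time(move_up) == move_down

# The two transition tables have no entry in common, so the relation is not
# visible in the tables themselves without further interpretation.
up_table = as_transition_table(move_up)
down_table = as_transition_table(move_down)
shared = set(up_table) & set(down_table)
```

In this toy setting `related` is true while `shared` is empty, which is one way to read the claim that a state-transition representation leaves the physical relationship between the movements implicit.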

Another aspect concerns a field of AI investigation known as qualitative or naïve

physics (Hobbs and Moore 1985; Weld and de Kleer 1990). In qualitative physics,

knowledge of the physical behavior of objects and causal agents is also encoded and

represented for reasoning and problem solving purposes. But the various methods

advanced are often haphazard and no standard methods exist for handling various

situations. Many methods are proposed to handle temporal information but none of

them is explicit and principled like what we propose here. Moreover, certain

approaches seem convoluted, unnatural, and difficult to extend to more general

domains, such as the representation of liquids using predicate logic as described by

Hayes (1985). Comparing Hayes’ scheme with what we have outlined for

representing fluids in the operational representational framework (Figs. 4.5, 4.11,

5.16, and 5.17), the current method is more straightforward, natural, and effective,

with the possibility that knowledge of the physical behavior of fluids is

learnable by observing and interacting with the environment directly through causal

learning as discussed in Chap. 2. The area of qualitative physics (Hobbs and Moore

1985; Weld and de Kleer 1990) does not address the issue of learning of causal rules.


A note should be made about the convergence of our method here and an area of

research known as cognitive linguistics as discussed in Chap. 4, Sect. 4.5 (e.g., the

work by Croft and Cruse 2009; Evans and Green 2006; Geeraerts 2006; Johnson

1987; Langacker 1987, 1991, 2000, 2002, 2008, 2009; Ungerer and Schmid 2006,

etc.). In cognitive linguistics, many concepts such as the concept of Arrive are

represented in a form with an explicit temporal axis such as illustrated in Fig. 4.28

(Langacker 1987). From the point of view of the cognitive linguists, the pictorial

representation is an adequate representation of the meaning of the concepts

involved. What we have achieved here goes beyond that. We provide a fully

computationalized account of a representational scheme that is based on explicit

temporal representation together with the successful demonstration of the power

and effectiveness of the representational scheme in noological tasks of learning,

reasoning, and problem solving. However, the convergence with the cognitive

linguistic approach further strengthens our belief that our representational scheme

does indeed provide the foundation for capturing the meaning of concepts adequately for noological processing.

A further point to note is that in a full-blown AI system that employs operational

representations to handle, in an epistemically grounded manner, various real world

objects, events, and processes that are 3D and of very high resolution, a massive

amount of information processing is expected. However, there is no short-cut for

creating truly intelligent systems. Human beings deal with the full complexity of

the physical world in a massively parallel processing manner with their large

number of neurons connected mostly in parallel in their nervous systems to achieve

rapid processing for the purpose of survival. If epistemically grounded knowledge

and characterization of the world is necessary for intelligent actions and survival,

there is no escape from addressing the massive nature of information processing

required. The structured and principled methods outlined in our operational representational scheme have the generality and power to scale up to drive noological

processing that addresses physically realistic situations in the real world.

In Sect. 5.4 we pointed out that a critical kind of learning, learning of grounded

concepts for problem solving through language, constitutes the most important kind

of learning for noological systems with the language faculty, which engenders rapid

learning of a huge amount of knowledge for survival. This critical mechanism

resulted in the rapid advancement of the human community compared to that of the

other animals. This kind of learning is only possible through grounded conceptual

representations such as the various operational representations discussed in the past

two chapters.

We would like to highlight the significance of the spatialization of time from the

point of view of another discipline – physics. As we know, one of the major breakthroughs

in physics in the past century was the discovery of the theory of relativity that

depends on the spatialization of time to achieve a more general formulation of

physical laws and hence a deeper understanding of reality (Lorentz et al. 1923).

Hence our explicit temporal representation of some grounded concepts here may

have the same significance to noology as the spatialization of time has to physics.



The problem in Fig. 5.18 is analogous to the “monkey-and-bananas” problem in

traditional AI (McCarthy 1968; Nilsson 1980) – a monkey/robot cannot reach some

bananas attached to a high ceiling and hence has to use a tool (such as a long stick)

or climb up one or more boxes to reach the bananas. The scenario in Fig. 5.18 could

be interpreted as a monkey constructing a tool (with a sharp end) and jumping up

and using the tool to poke through the ceiling to spear at and retrieve the banana(s)

on the other side of the ceiling. In contrast to the approach in traditional AI, when a

problem such as this (the left side of Fig. 5.18) is posed, a noologistically realistic

approach, as has been demonstrated in this chapter, is to tackle all the issues

involved from learning (of physical laws in this case) to problem solving. This

includes incremental chunking of knowledge (Sects. 3.5 and 5.2.2) and learning of

heuristics (Sect. 2.6.1). Using the 2D/3D representational scheme devised in the

problem at the end of Chap. 4, formulate a real/3D-world monkey-and-bananas

problem and its solution in the spirit of the noologistically realistic approach as

expounded in the foregoing discussions.


References

Arbib, M. A. (2003). The handbook of brain theory and neural networks (2nd ed.). Cambridge, MA: MIT Press.

Barr, A., & Feigenbaum, E. A. (Eds.). (1981). The handbook of artificial intelligence: Volume I. Los Altos: William Kaufmann.

Croft, W., & Cruse, D. A. (2009). Cognitive linguistics. Cambridge: Cambridge University Press.

Evans, V., & Green, M. (2006). Cognitive linguistics: An introduction. Mahwah: Lawrence Erlbaum Associates.

Geeraerts, D. (2006). Cognitive linguistics. Berlin: Mouton de Gruyter.

Gleitman, H., Gross, J., & Reisberg, D. (2010). Psychology (8th ed.). New York: W. W. Norton & Company.

Hayes, P. (1985). Naïve physics I: Ontology for liquids. In J. R. Hobbs & R. C. Moore (Eds.), Formal theories of the commonsense world. Norwood: Ablex Publishing.

Ho, S.-B. (2013). Operational representation – A unifying representation for activity learning and problem solving. In AAAI 2013 Fall Symposium Technical Reports-FS-13-02, Arlington, Virginia (pp. 34–40). Palo Alto: AAAI.

Hobbs, J. R., & Moore, R. C. (Eds.). (1985). Formal theories of the commonsense world. Norwood: Ablex Publishing.

Johnson, M. (1987). The body in the mind. Chicago: The University of Chicago Press.

Langacker, R. W. (1987). Foundations of cognitive grammar (Vol. I). Stanford: Stanford University Press.

Langacker, R. W. (1991). Foundations of cognitive grammar (Vol. II). Stanford: Stanford University Press.

Langacker, R. W. (2000). Grammar and conceptualization. Berlin: Mouton de Gruyter.

Langacker, R. W. (2002). Concept, image, and symbol: The cognitive basis of grammar. Berlin: Mouton de Gruyter.

Langacker, R. W. (2008). Cognitive grammar: A basic introduction. Oxford: Oxford University Press.

Langacker, R. W. (2009). Investigations in cognitive grammar. Berlin: Mouton de Gruyter.

Lorentz, H. A., Einstein, A., Minkowski, H., & Weyl, H. (1923). The principle of relativity: A collection of original memoirs on the special and general theory of relativity. London: Constable.

McCarthy, J. (1968). Programs with common sense. In M. Minsky (Ed.), Semantic information processing. Cambridge, MA: MIT Press.

Nilsson, N. J. (1980). Principles of artificial intelligence. Los Angeles: Morgan Kaufmann.

Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach. Upper Saddle River: Prentice Hall.

Saeed, J. (2009). Semantics. Malden: Wiley-Blackwell.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

Ungerer, F., & Schmid, H.-J. (2006). An introduction to cognitive linguistics. Harlow: Pearson.

Weld, D. S., & de Kleer, J. (Eds.). (1990). Readings in qualitative reasoning about physical systems. San Mateo: Morgan Kaufmann Publishers, Inc.

Chapter 6

The Causal Role of Sensory Information

Abstract This chapter investigates in a deep and principled manner how sensory

information contributes to a noological system’s learning and formulation of causal

rules that enable it to solve problems rapidly and effectively. The spatial movement to goal with obstacle problem is used to illustrate this process. The process is

one of deep thinking and quick learning – the causal learning process requires few

training instances, but extensive use is made of the many causal rules learned to

reason out a solution to the problem involved. Heuristic learning and generalization

emerge in the process as an aid to further accelerate problem solving.

Keywords Sensory information • Problem solving • Causal rule • Heuristic •

Heuristic generalization • Learning of heuristic • Script • Recovery from thwarted

script • Deep thinking • Quick learning

In Chap. 1, Sect. 1.2, using a simple and yet illustrative example, we showed how

sensory information contributes to the problem solving abilities of a noological

system. The role of sensory information is that of a “service function” to the

problem solving process as illustrated in Fig. 1.7. In Chap. 2, Sect. 2.6.1, we

demonstrated how certain parameters sensed and provided by the sensory system

allow an agent to learn useful heuristics that guide and vastly reduce the problem

solving search effort in the SMG problem.

Even though computer vision systems are improving in their visual abilities

(e.g., Forsyth and Ponce 2002; Shapiro and Stockman 2001; Szeliski 2010, etc.), by

far the most successful visual systems are found in biological systems. Humans and

primates lie at the apex of these abilities, with well-endowed visual cortices. What

do these visual systems do and what information can they provide for an agent to

assist with problem solving? Marr (1982) characterizes the visual system’s stages of

processing as including the stages of primal sketch, 2.5D sketch, and 3D model

representation. The 3D model representation is an object centric representation that

requires the use of some top-down models to construct as typically our visual

system cannot see the “backside” of objects from a particular vantage point.

Therefore, the information a visual system can derive from a particular vantage

point on a scene with object(s) inside it is a 2.5D sketch – a depiction of the scene in

terms of the surface orientation at every point of the surfaces in the scene, the

distance to every point on the surfaces, the discontinuities in the surface orientation,

and the discontinuities in the depth of the surfaces as viewed from the vantage point

(Fig. 6.1). In this 2.5D sketch the backsides of the various objects in the scene are

occluded and cannot be seen.

Fig. 6.1 A 2.5D sketch of a scene with an object. The scene consists of a cubical object floating in front of a “wall.” Information includes local surface orientation (shown as arrows; the background wall has arrows pointing toward the reader), distances of points on the surface from the viewer, discontinuities in surface orientation (shown as dotted lines), and the discontinuities in depth of surface points (shown as solid lines) (Marr 1982)

© Springer International Publishing Switzerland 2016

S.-B. Ho, Principles of Noology, Socio-Affective Computing 3,

DOI 10.1007/978-3-319-32113-4_6

The contour information in a 2.5D sketch basically implies that the agent has a

full map of the relative distances of every point in the visual scene from its position.

From this information, it is implied that the agent would also be able to compute the

relative distances between any two points in the scene. This level of characterization is necessary before full 3D information, such as a 3D

model of an object in the scene, can be characterized; that in turn leads to the next step, identifying the category of the object involved. Constructing 3D information from 2.5D information requires some prior knowledge, as may be gained from

earlier experiences (e.g., having seen the object involved from a different direction). Sometimes, a novel object or an object of no interest may not have been

assigned a label or category, but its 2.5D or 3D information can still be made

available from the visual input.
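
The step from distances to the agent to distances between any two points can be illustrated with the law of cosines. The sketch below is only an illustration under assumptions that are not in the text: the scene is treated as 2D, each perceived point is described by its distance from the agent and a viewing direction (here called a bearing, in degrees), and the function name `distance_between` is hypothetical.

```python
import math


def distance_between(d1, bearing1_deg, d2, bearing2_deg):
    """Distance between two perceived points, from their distances and bearings.

    Uses the law of cosines on the triangle formed by the agent and the two
    points: c^2 = d1^2 + d2^2 - 2 * d1 * d2 * cos(angle between the bearings).
    """
    angle = math.radians(bearing1_deg - bearing2_deg)
    squared = d1 ** 2 + d2 ** 2 - 2 * d1 * d2 * math.cos(angle)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


# A point 3 units away straight ahead and a point 4 units away at a right angle.
separation = distance_between(3.0, 0.0, 4.0, 90.0)
```

The example forms a 3-4-5 right triangle, so `separation` comes out at approximately 5.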

As we shall see in our subsequent sections in this chapter as well as in the rest of

the book, this 2.5D and/or 3D information is an important foundation upon which a

noological system builds its causal knowledge about the world which it then applies

to problem solving situations.


6.1 Information on Material Points in an Environment

Let us now consider the kind of rudimentary information that is captured by a visual

system that is then presented to the higher level causal learning processes. As the

concern of this book is not with vision per se, we shall use an idealized “super-visual” system, a visual system that provides more information than that of a natural

visual system or an artificial computer vision system. As will be described below,

what this super-visual system could “perceive” at any instant is not based only on

the information from one vantage point.

Fig. 6.2 A super-visual perception of the

Figure 6.2 shows an Agent embedded in an environment with an arbitrarily

shaped Fence and a couple of objects. We assume that the visual system of the

agent is “super-visual” and “all-sensing” in that as long as a material point exists, it

can sense it. Therefore, occlusion does not matter, and the Agent can see in all 360°

directions simultaneously – i.e., it has an unlimited viewing angle. In a realistic

visual system, points that are occluded can also be uncovered through exploration

and points that are currently not in view can be brought into view by the change of

direction of placement of the visual sensory organs – i.e., often by turning the body

or the head on which the organs rest, or by turning the sensory organ within its

supporting structure (e.g., turning an eye in an eye socket). Of course, if some

special technology is available, such as some kind of “penetrating radar,” an

artificial system may be able to perceive all this information about the objects in

the environment at one instant.

Therefore, in Fig. 6.2, arrows a, b, c, d, e represent relative distances to points on

the Fence that are perceived, whether they are occluded or not (the “Notch”

would occlude d in a normal visual situation). The Fence does not have any

thickness so there is only a linear series of material points on its “surface.” We

assume there is a finite number of points on a seemingly “continuous” object such

as the Fence that can be perceived and registered. Each point can be thought of as an

elemental object from which the Fence is made. (If there are material points that

exist beyond the Fence, they can also be sensed.) Arrow f is the relative distance to

an elemental object (characterized by a singular point). Arrows g and h are relative

distances to two elemental points on the surface of another object, from which the

local shape of the object can be constructed. We also assume that any occluded


point (arrow i) and any interior point (arrow j) of any object can be perceived. In

natural visual systems such as the human visual system, the occluded points cannot

be perceived directly but can be known cognitively based on earlier perception that

is currently available in some sort of visual memory. Similarly, interior points of

non-transparent objects are typically not perceived directly by natural visual systems but can be known cognitively through inferences made based on the surface

points perceived and/or earlier perceptual experiences with the objects involved.

In our real world, there are three ways an object that is originally not perceived

by a visual sensory organ that has a limited viewing angle can come into perceptual

existence. One way is, the object that is currently not within the viewing angle of

the sensory organ can be brought into view through the rotation of the body or the

head on which the sensory organ rests or rotation of the sensory organ within its

supporting structure (e.g., rotation of an eye in its socket). A second way is, the

object can come into perceptual existence through its movement or the movement

of the agent if it had earlier been occluded by other objects. The third way is, the

object had originally not existed and it comes into existence through materialization. (In our current real world, materialization is not common other than in particle

accelerators and the closest analog of materialization is the switching on of some


In any case, when an object comes into existence, some processes are triggered

in the visual system. Firstly, the existence of the object is registered. In a computer

system, this could correspond to the creation of a new entry in the memory system

that contains the following predicate logical representation: Exist(Object(X)). The

representation must of course be accompanied by the corresponding processes that

interpret the meaning of “Exist” and we posit that this is one of the fundamental

built-in processes of a noological system. (The concept Exist is grounded in the

spatiotemporal pattern as discussed in Chap. 4.) As a result of the existence of the

object, the relative distances to the surface points of the object are computed by the

visual system, and in our super-visual system, the distances to the interior points

also become available. As discussed in Chap. 4, an extended object is made up of a

number of material points existing at different locations. This process is shown in

Fig. 6.3.

In Fig. 6.3, the predicate Exist(Object(X), S = {x}) means that an Object(X) is

seen by the noological Agent in the environment that occupies the set of locations, S

= {x}. For simplicity, we use x to represent the usual x, y coordinate tuple [x, y] for

2D space. When that happens, assuming that the object is made up of a collection of

material Points, it implies that there is a material Point at every one of the locations

{x}. The following rule states this situation:

Exist(Object(X), S = {x}) → (∀x x ∈ S → Exist(Point(x)))


Note that this logical assertion asserts the internal mental representations that are

created within the agent’s “mental” processing machineries. Now, if a material

point exists at a certain location x, it also means that the visual system of the Agent

would compute a relative distance, RD, to it:

∀x Exist(Point(x)) → ExistPar(RD(Agent, Point(x)))

Fig. 6.3 Internal “mental” events of a noological system that take place when an Object(X) is observed by a visual system. For simplicity, x is used to represent the usual x, y coordinate tuple [x, y] for 2D space


ExistPar is a process by which the parameter concerned, in this case the relative

distance measure, RD, becomes available to the Agent, provided by the visual

system. Note that this logical assertion asserts the internal noological processes that

take place within the Agent’s mental processing machineries. The “Exist” operator

is used not only to describe the entities that exist “out there” in the external

environment but also entities that are created and in existence in the noological

system’s “mind” as a result of sensory processing and representations.

We also assume that the visual system can provide the relative distance

information, RD, between any two material points, whether they belong to the same

extended object or not:

∀x, y Exist(Point(x)) ∧ Exist(Point(y))

→ ExistPar(RD(Point(x), Point(y)))
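
The three rules above can be read as a small forward-chaining procedure. The following Python sketch is one possible reading, not the implementation behind the book: the function name `exist_object`, the dictionary-based memory, and the use of Euclidean distance for RD are all assumptions made for illustration, and for brevity the third rule is applied only to distinct points of a single object.

```python
import math
from itertools import combinations


def exist_object(name, locations, agent_at):
    """Apply the three rules to one observed object and return the new memory entries."""
    memory = {"Exist": [("Object", name)], "ExistPar": {}}

    # Rule 1: Exist(Object(X), S = {x}) implies Exist(Point(x)) for every x in S.
    points = sorted(set(locations))
    memory["Exist"] += [("Point", p) for p in points]

    # Rule 2: Exist(Point(x)) implies ExistPar(RD(Agent, Point(x))).
    for p in points:
        memory["ExistPar"][("RD", "Agent", p)] = math.dist(agent_at, p)

    # Rule 3: two existing points imply ExistPar(RD(Point(x), Point(y))).
    for p, q in combinations(points, 2):
        memory["ExistPar"][("RD", p, q)] = math.dist(p, q)

    return memory


# A hypothetical object occupying two locations, seen by an Agent at the origin.
memory = exist_object("X", [(3, 4), (0, 1)], agent_at=(0, 0))
```

In this sketch the super-visual assumption shows up only in that every location in S is processed, with no occlusion test before a point is registered.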


The role of the visual system is to provide information on possible causal agents

(inanimate or animate objects) in the environment for the noological system to take

advantage of for its problem solving purposes. As the simple example in Chap. 1,

Sect. 1.2 shows, if the physical existence of a certain kind of object in the

environment can provide alleviation to the agent’s hunger state, and the agent has

access to information on the parameters (such as location) of the object(s) involved,

it can take advantage of this information to quickly solve its possibly urgent

problem of hunger alleviation before its death state is reached.

Also, other than the visual system, there are other kinds of sensory systems (such

as one that operates with laser or sound waves, in robots or animals such as bats)

that can inform the noological system about relative distance information. Other

than relative distance information, typically there might also be information on
