14.4 Case Study: A Hybrid Visual Servo Controller Using Q Learning




grasp. Driven by six reversible open-loop servo motors, the tip of the closed fingers of the arm has a reach of 50 cm from the center of the rotating base. The color CCD camera is a pan-tilt-zoom (PTZ) vision system with 26× optical and 12× digital zoom for a wide range of applications.

In addition, the robot comes with ACTS™ color blob–tracking software, which is employed to track the image coordinates of the target object, shown in Figure 14.13 as a toothpaste tube with a red cap.

In addition to the Pioneer Arm and the CCD camera, the robot is equipped with an onboard computer and numerous sensors, such as a laser distance finder, sonar ring, compass, and a gyro. The onboard computational capability can support real-time performance of the visual servo control system.

Unlike the classical visual servo control approach [5], the control scheme developed in this section decouples the control objective between the degrees of freedom of the mobile robot base and the onboard 5DOF arm. Specifically, the control process is decoupled into two steps when the robot attempts to pick up an object of interest in its field of view. In the first step, by observing the visual error between the current image and the prerecorded image at the desired location and orientation, the mobile robot base adjusts the angular velocities of its wheels so as to change its position and heading, moving closer to the target object and aligning itself with the object. In the second step, when the mobile robot base has moved sufficiently close to the target object, the onboard arm reaches and grasps the object using the image-based eye-to-hand visual servo control scheme described in [5]. Because the approach used in the second step is well known, the present section focuses primarily on the development of the visual servo control law in the first step.

14.4.1 Vision-Based Mobile Robot Motion Control

14.4.1.1 Control Scheme

In this section, the classical image-based visual servoing (IBVS) approach for fixed-base manipulators is extended to the motion control of wheeled mobile robots (WMR), as schematically shown in Figure 14.14.

In Figure 14.14, a CCD camera is mounted on the mobile base and continuously captures live images of the target object. Each captured image is compared with the prerecorded desired image to determine the visual errors. Using the visual errors, the IBVS controller computes the desired speeds ω1, ω2 of the wheels of the mobile robot and sends them to the robot's low-level proportional–integral–derivative (PID) controller. The robot then rotates and translates continuously until it reaches the desired position and orientation, at which the camera observes the desired image.



FIGURE 14.14
Visual servoing scheme for motion control of wheeled mobile robots. (The desired and current images from the CCD camera yield the visual errors, which drive the IBVS controller producing the wheel speeds ω1, ω2 and hence the base velocities ν, ω.)
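The closed loop of Figure 14.14 can be summarized in code form. The following is a minimal sketch only; the helper names grab_image, track_feature, ibvs_wheel_speeds, and send_wheel_speeds are hypothetical stand-ins for the camera driver, the blob tracker, the control law derived in Section 14.4.1.4, and the robot's low-level PID interface, and do not come from the original implementation.

```python
import time

def visual_servo_loop(r_d, c_d, dt=0.1, tol=5.0):
    """Closed-loop IBVS structure of Figure 14.14 (illustrative sketch).

    r_d, c_d : pixel coordinates of the feature in the prerecorded desired image
    tol      : pixel-error threshold used here to decide when to stop (assumed value)
    """
    while True:
        image = grab_image()                      # current image from the CCD camera (hypothetical helper)
        r, c = track_feature(image)               # feature pixel coordinates, e.g., from blob tracking (hypothetical)
        err_r, err_c = r - r_d, c - c_d           # visual errors in pixel coordinates
        if abs(err_r) < tol and abs(err_c) < tol:
            break                                 # the camera now observes the desired image
        w1, w2 = ibvs_wheel_speeds(err_r, err_c)  # IBVS control law (Equation 14.22; hypothetical helper)
        send_wheel_speeds(w1, w2)                 # passed to the robot's low-level PID controller (hypothetical)
        time.sleep(dt)
```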






14.4.1.2 Kinematic Model

Four sets of coordinate frames are now defined, namely the robot frame, the camera frame, the image plane frame, and the pixel coordinate frame, in order to derive the IBVS control law for the mobile robot. The relationship between the first two coordinate frames is shown in Figure 14.15.

In Figure 14.15, the camera frame is rigidly attached to the camera while the robot frame is fixed to the mobile robot. The coordinate transformation between the two frames is given by

$$H_c^r = \begin{bmatrix} R_c^r & d_c^r \\ 0 & 1 \end{bmatrix} \qquad (14.11)$$



Here, $H_c^r$ represents the homogeneous transformation from the camera frame to the robot frame, $R_c^r$ is the rotation matrix, and $d_c^r = [d_x\ d_y\ d_z]^T$ is the position vector of the origin of the camera frame with respect to the robot frame. Because the camera is rigidly attached to the mobile robot base, the relationship between the camera velocity and the robot velocity is given by Equation 14.12 as [5]

 0



 0

 v

r

ξr =  0



 ω



 0















  r

 =  Rc

 0

  3× 3









( )



s drc R rc  c

ξ c = Gξ cc (14.12)

r



Rc





FIGURE 14.15
Robot frame and camera frame. (The camera coordinate frame (xc, yc, zc) is mounted on the mobile robot; the robot coordinate frame (xr, yr, zr) sits between the left and right wheels, which rotate at ω1 and ω2.)






Here $\xi_r^r$ is the robot velocity vector with respect to the robot frame. Also, v and ω are the translational and rotational velocities of the robot on the ground. In addition, $\xi_c^c = [v_x\ v_y\ v_z\ \omega_x\ \omega_y\ \omega_z]^T$ is the camera velocity vector with respect to the camera frame, and $S(d_c^r)$ is the skew-symmetric matrix of the vector $d_c^r$.
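As a concrete illustration of Equation 14.12, the 6 × 6 velocity transformation G can be assembled from $R_c^r$ and $d_c^r$ with the skew-symmetric matrix $S(d_c^r)$. The NumPy sketch below is illustrative only; the rotation matrix and offset vector shown are placeholder values, not the calibration of the actual robot-camera pair.

```python
import numpy as np

def skew(d):
    """Skew-symmetric matrix S(d), so that skew(d) @ w equals the cross product d x w."""
    return np.array([[0.0,  -d[2],  d[1]],
                     [d[2],  0.0,  -d[0]],
                     [-d[1], d[0],  0.0]])

def velocity_transform(R_cr, d_cr):
    """Matrix G of Equation 14.12, mapping the camera twist to the robot twist."""
    G = np.zeros((6, 6))
    G[:3, :3] = R_cr
    G[:3, 3:] = skew(d_cr) @ R_cr
    G[3:, 3:] = R_cr
    return G

# Placeholder extrinsics (illustrative values only, not the actual calibration).
R_cr = np.eye(3)                      # camera axes assumed aligned with robot axes
d_cr = np.array([0.0, 0.30, 0.10])    # camera origin offset in the robot frame, metres
G = velocity_transform(R_cr, d_cr)
xi_c = np.array([0.0, 0.0, 0.1, 0.0, 0.05, 0.0])   # an example camera twist
xi_r = G @ xi_c                                     # robot twist per Equation 14.12
```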



The translational and rotational velocities of the robot can be expressed in terms of the wheel speeds as

$$\begin{bmatrix} v \\ \omega \end{bmatrix} = \begin{bmatrix} D/2 & D/2 \\ D/l & -D/l \end{bmatrix} \begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix} \qquad (14.13)$$





Here, D is the wheel diameter, l is the distance between the two wheels, and ω1 and ω2 are the speeds of the wheels.

14.4.1.3 Camera Projection Model

The relationship between the camera frame, the image plane frame, and the pixel coordinate frame is represented in Figure 14.16.

In Figure 14.16, P is a point in the work environment with coordinates (x, y, z)c relative to the camera frame, and p is the projection of P on the image plane, with coordinates (u, v) relative to the image plane frame and coordinates (r, c) relative to the pixel coordinate frame. The distance between the origin of the camera frame and the image plane is denoted by λ, and the coordinates of the principal point are (or, oc) with respect to the pixel coordinate frame. The coordinate transformation between the frames is given by [5]

$$u = -s_x (r - o_r), \qquad v = -s_y (c - o_c) \qquad (14.14)$$



FIGURE 14.16
Relationship between the camera frame, the image coordinate frame, and the pixel coordinate frame. (The point P = (x, y, z) in the camera frame projects to p = (u, v) on the image plane at distance λ, with pixel coordinates (r, c) and principal point (or, oc).)








$$r = -f_x \frac{x_c}{z_c} + o_r, \qquad c = -f_y \frac{y_c}{z_c} + o_c \qquad (14.15)$$



Here, sx and sy are the horizontal and vertical dimensions, respectively, of a pixel, with $f_x = \lambda/s_x$ and $f_y = \lambda/s_y$.
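Equations 14.14 and 14.15 translate directly into code. In the sketch below, the values of λ, sx, sy, or, and oc are illustrative placeholders and not the calibration parameters of the actual camera.

```python
# Illustrative camera intrinsics (assumed values, not the actual calibration).
lam = 0.004                 # lambda: distance from camera origin to image plane, metres
s_x, s_y = 1.0e-5, 1.0e-5   # horizontal and vertical pixel dimensions, metres
o_r, o_c = 320.0, 240.0     # principal point in pixel coordinates
f_x, f_y = lam / s_x, lam / s_y

def pixel_to_image_plane(r, c):
    """Equation 14.14: pixel coordinates (r, c) -> image-plane coordinates (u, v)."""
    return -s_x * (r - o_r), -s_y * (c - o_c)

def project_to_pixels(x_c, y_c, z_c):
    """Equation 14.15: a camera-frame point -> pixel coordinates (r, c)."""
    r = -f_x * x_c / z_c + o_r
    c = -f_y * y_c / z_c + o_c
    return r, c
```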

Let p be a feature point on the image plane with coordinates (u, v). The moving velocity of p can be expressed in terms of the camera velocity using the interaction matrix L as [5]:



$$\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = L\,\xi_c^c = \begin{bmatrix} -\frac{\lambda}{z_c} & 0 & \frac{u}{z_c} & \frac{uv}{\lambda} & -\frac{\lambda^2 + u^2}{\lambda} & v \\ 0 & -\frac{\lambda}{z_c} & \frac{v}{z_c} & \frac{\lambda^2 + v^2}{\lambda} & -\frac{uv}{\lambda} & -u \end{bmatrix} \xi_c^c \qquad (14.16)$$





In the present work, the depth information zc in the interaction matrix (Equation 14.16) is estimated in real time using the onboard laser/sonar distance finder of the mobile robot.
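For reference, the interaction matrix of Equation 14.16 for a single feature point can be written as a small function, with the depth zc supplied by the range measurement. This is a sketch assuming the standard point-feature form of [5]:

```python
import numpy as np

def interaction_matrix(u, v, z_c, lam):
    """Interaction matrix L of Equation 14.16 for one image feature point.

    u, v : image-plane coordinates of the feature
    z_c  : feature depth along the camera optical axis (here taken from the
           onboard laser/sonar distance finder)
    lam  : distance from the camera origin to the image plane (lambda)
    """
    return np.array([
        [-lam / z_c, 0.0,        u / z_c, u * v / lam,           -(lam**2 + u**2) / lam,  v],
        [0.0,       -lam / z_c,  v / z_c, (lam**2 + v**2) / lam, -u * v / lam,           -u],
    ])
```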

14.4.1.4 Control Law

In this section, an image-based eye-in-hand visual servo control law is developed for motion control of a wheeled mobile robot. The underlying concept is that the controller will continuously adjust the wheel speeds of the mobile robot so that the coordinates (u, v) of the feature point are moved toward the desired position (ud, vd) on the image. In particular, the error vector of the image feature point is defined as











$$e = \begin{bmatrix} u - u_d \\ v - v_d \end{bmatrix} = \begin{bmatrix} -s_x (r - r_d) \\ -s_y (c - c_d) \end{bmatrix} \qquad (14.17)$$

$$\dot{e} = \begin{bmatrix} \dfrac{d(u - u_d)}{dt} \\ \dfrac{d(v - v_d)}{dt} \end{bmatrix} = \begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} \qquad (14.18)$$

By substituting Equations 14.16 and 14.12 into Equation 14.18, one obtains







 

e = Lξ cc = LG−1 ξ rr = M  v  (14.19)

ω 






Here, M is a 2 × 2 matrix, which is constituted by the third and the fifth columns of $LG^{-1}$. By substituting Equation 14.13 into Equation 14.19, we get

$$\dot{e} = M \begin{bmatrix} D/2 & D/2 \\ D/l & -D/l \end{bmatrix} \begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix} \qquad (14.20)$$



Equation 14.20 shows the relationship between the error rate defined on the image plane and the angular velocities of the wheels of the mobile robot. In view of Equation 14.20, assuming the error dynamics $\dot{e} = -ke$, a proportional controller based on the Lyapunov method is designed according to







$$\begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix} = \begin{bmatrix} D/2 & D/2 \\ D/l & -D/l \end{bmatrix}^{-1} M^{-1} \dot{e} = \begin{bmatrix} D/2 & D/2 \\ D/l & -D/l \end{bmatrix}^{-1} M^{-1} (-ke) \qquad (14.21)$$



Here, k is the scalar proportional gain, with k > 0. The control law is obtained by substituting Equation 14.17 into Equation 14.21, as







$$\begin{bmatrix} \omega_1 \\ \omega_2 \end{bmatrix} = -k \begin{bmatrix} D/2 & D/2 \\ D/l & -D/l \end{bmatrix}^{-1} M^{-1} \begin{bmatrix} -s_x (r - r_d) \\ -s_y (c - c_d) \end{bmatrix} \qquad (14.22)$$







In Equation 14.22, the pixel coordinates r and c can be directly measured from the current image using the available image-processing software. Therefore, according to Equation 14.22, the desired angular velocities of the two wheels of the mobile robot can be computed directly from the image measurements. Moreover, the developed controller guarantees asymptotic stability of the closed-loop system in view of $\dot{e} = -ke$.
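Combining Equations 14.12, 14.13, 14.16, and 14.22, the wheel-speed command can be computed in a few lines. The sketch below reuses the interaction_matrix and velocity_transform helpers sketched earlier; the gain and geometry values are placeholders, and the code is an illustration of the control law rather than the original implementation.

```python
import numpy as np

def ibvs_wheel_speeds(u, v, u_d, v_d, z_c, R_cr, d_cr, lam, D, l, k=0.5):
    """Proportional IBVS law of Equations 14.19 through 14.22 (sketch).

    u, v     : current image-plane coordinates of the feature (Equation 14.14)
    u_d, v_d : desired image-plane coordinates
    z_c      : feature depth, e.g., from the laser/sonar distance finder
    D, l     : wheel diameter and wheel separation; k > 0 is the gain (assumed value)
    Returns the commanded wheel speeds (omega_1, omega_2).
    """
    e = np.array([u - u_d, v - v_d])              # error vector, Equation 14.17

    L = interaction_matrix(u, v, z_c, lam)        # Equation 14.16
    G = velocity_transform(R_cr, d_cr)            # Equation 14.12
    M = (L @ np.linalg.inv(G))[:, [2, 4]]         # third and fifth columns of L G^-1 (Equation 14.19)

    W = np.array([[D / 2.0,  D / 2.0],            # wheel-speed map of Equation 14.13
                  [D / l,   -D / l]])

    omega = -k * np.linalg.inv(W) @ np.linalg.inv(M) @ e   # Equation 14.22
    return omega[0], omega[1]
```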

14.4.1.5 Experimental Results

In this section, the vision-based mobile grasping system is employed to validate the visual servo controller. In the experiment, the mobile robot employs its onboard CCD camera to continuously observe the position of the target object (the toothpaste tube with a red cap shown in Figure 14.13) with the ACTS color blob–tracking software and computes the visual error e on the image plane. Next, the IBVS controller determines the desired wheel speeds ω1 and ω2 using Equation 14.22 and accordingly commands the low-level controller of the mobile robot to gradually move the robot close to the target object. When the robot is sufficiently close to the object, it grasps the object with the assistance of the onboard 5DOF arm. The overall process of the experiment is shown as a series of still images in Figure 14.17.

The trajectory of the visual feature point on the image plane is shown in Figure 14.18. In Figure 14.18, the initial position of the visual feature is close to the top right corner of the image and then moves directly toward the desired position at the bottom left corner. The position and heading history of the robot in the entire process is shown in Figure 14.19, the pixel errors on the image plane are shown in Figure 14.20, and the control inputs are given in Figure 14.21.






FIGURE 14.17
Vision-based autonomous grasping using a wheeled mobile robot and the developed IBVS controller (still images (a) through (g)).






FIGURE 14.18
Trajectory of the visual feature point (object) on the image plane when the robot approaches the object to grasp it. (Axes: pixel coordinates r and c.)



FIGURE 14.19
Trajectory (position and heading) of the mobile robot in the physical environment for the mobile manipulation task. (Curves: x, y, and heading angle Theta versus time t.)






FIGURE 14.20
Visual errors on the image plane when the robot approaches the object to grasp it. (Curves: the pixel errors in r and c versus time t.)



FIGURE 14.21
Control inputs of the plant under the new visual servo controller. (Curves: v and ω versus time t.)






From Figures 14.17 through 14.21, it is clear that the developed IBVS controller is able to effectively drive a wheeled mobile robot toward a target object until its CCD camera observes the desired visual features.

14.4.2 Hybrid Controller for Robust Visual Servoing

14.4.2.1 Control Scheme

Although the experimental results in the previous section validate the proportional controller for visual servoing of a wheeled mobile robot, the approach still has some shortcomings. Most importantly, this controller cannot guarantee the retention of the visual features within the image plane. On one hand, in order to increase the speed of response and to reduce steady-state errors, the controller gain has to be sufficiently high. On the other hand, when the controller gain is too large, the control inputs (v and ω) increase correspondingly, and as a result the visual features can quickly move out of the image plane, which leads to failure of the controller. In view of these shortcomings, it is necessary to improve the robustness of the controller developed in the previous section. In the present section, a hybrid switching controller is developed to eliminate the main shortcomings of the previous controller [6]. The new control scheme is presented schematically in Figure 14.22.

In Figure 14.22, there are two control loops: the IBVS controller developed in the previous section and a new Q learning–based controller. The IBVS controller will drive the mobile robot toward the target object, and the Q-learning controller will observe the visual features on the image plane and select an optimal action (an appropriate rotational or translational movement) so that the visual features are pushed from the image edge to the center. In addition, the Q-learning controller is able to continuously learn and improve its action–selection policy online using its machine-learning algorithm.

As indicated in Figure 14.22, there is a rule-based arbitrator in the control system, which autonomously switches between the outputs of the two controllers so that the overall hybrid control system achieves a tradeoff between robustness and accuracy.
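The arbitration rule itself can be very simple, for example, switching to the Q-learning action whenever the tracked feature drifts close to the image border. The sketch below is one plausible realization under that assumption; the actual rule base of [6] is not reproduced here, and the 80-pixel margin is an assumed value.

```python
def arbitrate(r, c, ibvs_command, q_action, margin=80, width=640, height=480):
    """Rule-based arbitration between the two controllers (illustrative sketch).

    r, c         : current pixel position of the tracked visual feature
    ibvs_command : wheel speeds (omega_1, omega_2) proposed by the IBVS controller
    q_action     : discrete motion command selected by the Q-learning controller
    margin       : border width, in pixels, treated as "too close to the edge" (assumed)
    """
    near_edge = (r < margin or r > width - margin or
                 c < margin or c > height - margin)
    if near_edge:
        return "q_learning", q_action    # push the feature back toward the image center
    return "ibvs", ibvs_command          # continue servoing toward the target
```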



FIGURE 14.22
Hybrid control scheme for robust visual servoing. (Blocks: world state extraction, Q-table (knowledge base), Q-learning controller, IBVS controller, rule-based arbitrator, and CCD camera.)






The Q-learning controller can be trained offline to improve its performance. Once it is trained, the Q-learning controller can quickly select correct actions in real time (usually in less than 10 msec in the experiments).

14.4.2.2 Q-Learning Controller

The Q-learning controller indicated in Figure 14.22 is a customized controller that keeps the visual features within the image plane. It is based on the machine-learning approach of Q learning. The main advantage of the Q-learning controller is that it is able to autonomously learn the action–selection policy of the mobile robot and continuously improve the controller performance so that the visual features remain in the field of view of the CCD camera. Owing to the integration of the Q-learning controller with the IBVS controller developed in the previous section, the robustness of the resulting hybrid controller is improved.

14.4.2.2.1  States, Actions, and Rewards

In this section, a Q learning–based discrete event controller is developed to improve the robustness of a vision-based mobile manipulation system. In particular, the controller will continuously grab images from the CCD camera, compute the current world state based on the positions of the visual features in the current image, and employ the Q-learning algorithm to learn and select an optimal action for the robot under the current state. This optimal action (a command for desired translational or rotational motion) will be sent to the low-level controller of the mobile robot so that the visual features move toward a desirable area on the image plane. After the robot takes the action, a reward will be computed based on the new positions of the visual features on the image plane, and this reward will be used to update the Q value of the action under that world state.
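The learning step referred to above is the standard tabular Q-learning update. The sketch below shows a generic ε-greedy version; the learning rate, discount factor, and exploration rate are illustrative values, since the text does not list the parameter settings used in the experiments.

```python
import random
from collections import defaultdict

ACTIONS = [1, 2, 3, 4]         # the four motion commands defined later in this section
Q = defaultdict(float)          # Q-table keyed by (state, action)

def select_action(state, epsilon=0.1):
    """Epsilon-greedy action selection over the Q-table (exploration rate assumed)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update_q(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update after the robot has taken an action."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```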

The definition of the world states in the present Q-learning controller incorporates a discrete grid world of the image plane as shown in Figure 14.23.

In Figure 14.23, the 640 × 480 CCD image plane is divided into an 8 × 6 discrete grid world in which each cell of the grid has a length of 80 pixels.

FIGURE 14.23
Discrete grid world defined on the 640 × 480 CCD image plane. (The grid cells range from (0, 0) to (5, 7), with the desirable, safe, and dangerous areas marked.)






When an image is grabbed from the CCD camera, the position of the visual feature point on the image plane can be easily converted into the coordinates in the grid world as follows:

$$x = \mathrm{INT}(r/80), \qquad y = \mathrm{INT}(c/80) \qquad (14.23)$$



Here r = 0, 1, …, 639 and c = 0, 1, …, 479 are the pixel coordinates of the visual feature points on the image plane, x = 0, 1, …, 7 and y = 0, 1, …, 5 are the corresponding grid coordinates in the grid world, and INT() is a function that converts a floating-point number into an integer by discarding its decimal portion.

The world state in the Q-learning controller is made up of the grid coordinates of the visual feature and the current depth from the robot to the target object, as given by

$$s = (x, y, d) \qquad (14.24)$$

Here, d is a discrete index value of the current depth, which is computed according to







 0,



 1,

d= 

 2,

 3,





if depth < 40 cm

if 40 cm < depth <= 90 cm

if 90 cm < depth <= 140 cm



(14.25)



if depth > 140 cm
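Equations 14.23 through 14.25 combine into a short state-extraction routine. A minimal sketch:

```python
def world_state(r, c, depth_cm):
    """World state s = (x, y, d) of Equation 14.24.

    r, c     : pixel coordinates of the visual feature (r in 0..639, c in 0..479)
    depth_cm : measured distance from the robot to the target object, in centimetres
    """
    x = int(r // 80)        # grid column, Equation 14.23
    y = int(c // 80)        # grid row, Equation 14.23

    # Depth index d, Equation 14.25.
    if depth_cm < 40:
        d = 0
    elif depth_cm <= 90:
        d = 1
    elif depth_cm <= 140:
        d = 2
    else:
        d = 3
    return (x, y, d)
```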



Under each state, it is assumed that the mobile robot is able to select one of the following four actions:

• Action #1: Move forward 5 cm

• Action #2: Move backward 5 cm

• Action #3: Rotate 5°

• Action #4: Rotate −5°

After the robot takes an action, it will receive a reward r from the environment. This reward is computed based on the new position of the visual feature on the image plane, according to







 +20, if (x , y ) ∈ the desirable area



if ( x , y ) ∈ thee safe area

 0,

r=

(14.26)

 −10, if ( x , y ) ∈ the dangerous area

 −20, if ( x , y ) is out of the image





Here, (x, y) is the new position of the visual feature in the grid world after the robot takes an action, and the definitions of the desirable, safe, and dangerous areas are shown in Figure 14.23. From Equation 14.26, it is clear that the robot is encouraged to select correct


