1.5 Heuristics, Metaheuristics, and Hyper-Heuristics


problem. By searching over a large set of feasible solutions, metaheuristics can often find good solutions with less computational effort than calculus-based methods or simple heuristics can.

Metaheuristics can be single-solution-based or population-based. Single-solution-based metaheuristics maintain a single solution at any time and comprise local search-based metaheuristics such as SA, tabu search, iterated local search [40,42], guided local search [61], pattern search or random search [31], the Solis–Wets algorithm [54], and variable neighborhood search [45]. In population-based metaheuristics, a number of solutions are updated iteratively until the termination condition is satisfied. Population-based metaheuristics are generally categorized into EAs and swarm-based algorithms. Single-solution-based metaheuristics are regarded as more exploitation-oriented, whereas population-based metaheuristics are more exploration-oriented.

The idea of hyper-heuristics can be traced back to the early 1960s [23]. Hyper-heuristics can be thought of as heuristics to choose heuristics, or as search algorithms that explore the space of problem solvers. A hyper-heuristic is a heuristic search method that seeks to automate the process of selecting, combining, generating, or adapting several simpler heuristics to efficiently solve hard search problems. The low-level heuristics are simple local search operators or domain-dependent heuristics, which operate directly on the solution space for a given problem instance. Unlike metaheuristics, which search in a space of problem solutions, hyper-heuristics always search in a space of low-level heuristics.

Heuristic selection and heuristic generation are currently the two main methodologies in hyper-heuristics. In the first, the hyper-heuristic chooses heuristics from a set of known domain-dependent low-level heuristics. In the second, the hyper-heuristic evolves new low-level heuristics by utilizing the components of the existing ones. Hyper-heuristics can be based on genetic programming [11] or grammatical evolution [10], which are excellent candidates for heuristic generation.
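As a concrete illustration of heuristic selection, the sketch below applies the simplest possible selection hyper-heuristic to a toy bit-string problem: at each step a low-level heuristic is chosen at random and its result is accepted only if it does not worsen the incumbent. The onemax objective, the two low-level heuristics, and the acceptance rule are assumptions made for this example, not constructs taken from the text.

    # Minimal selection hyper-heuristic sketch on a toy bit-string problem.
    import random

    random.seed(1)

    def objective(s):                              # onemax: count of 1-bits (to be maximized)
        return sum(s)

    def flip_one(s):                               # low-level heuristic 1: flip a random bit
        s = s[:]; i = random.randrange(len(s)); s[i] ^= 1; return s

    def swap_two(s):                               # low-level heuristic 2: swap two positions
        s = s[:]; i, j = random.sample(range(len(s)), 2); s[i], s[j] = s[j], s[i]; return s

    low_level = [flip_one, swap_two]
    s = [random.randint(0, 1) for _ in range(30)]
    for _ in range(2000):
        h = random.choice(low_level)               # heuristic selection step
        candidate = h(s)
        if objective(candidate) >= objective(s):   # simple acceptance criterion
            s = candidate
    print(objective(s), s)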

Several Single-Solution-Based Metaheuristics

Search strategies that randomly generate initial solutions and perform a local search are also called multi-start descent search methods. However, randomly creating an initial solution and performing a local search often results in low solution quality, as the complete search space is searched uniformly and the search cannot focus on promising areas of the search space.

Variable neighborhood search [45] combines local search strategies with dynamic neighborhood structures that change subject to the search progress. The local search is an intensification step, focusing the search in the direction of high-quality solutions. Diversification results from changing neighborhoods, which allows the method to easily escape from local optima. With an increasing cardinality of the neighborhoods, diversification gets stronger, as the shaking steps can choose from a larger set of solutions and local search covers a larger area of the search space.
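A minimal sketch of the variable neighborhood search loop, assuming a one-dimensional continuous objective; the objective function, the three neighborhood radii, and the simple random-walk local search are illustrative choices only.

    # Variable neighborhood search sketch: shake in neighborhood k, run local search,
    # reset to the smallest neighborhood on improvement, enlarge it otherwise.
    import random

    def f(x):                                   # example objective to minimize (assumed)
        return (x - 3.0) ** 2 + 2.0 * abs(x) ** 0.5

    def local_search(x, step=0.01, iters=200):
        for _ in range(iters):
            cand = x + random.uniform(-step, step)
            if f(cand) < f(x):
                x = cand
        return x

    radii = [0.5, 2.0, 8.0]                     # neighborhood structures of increasing cardinality
    x = local_search(random.uniform(-10, 10))
    for _ in range(50):
        k = 0
        while k < len(radii):
            shaken = x + random.uniform(-radii[k], radii[k])   # shaking step in N_k
            cand = local_search(shaken)
            if f(cand) < f(x):
                x, k = cand, 0                  # improvement: return to the smallest neighborhood
            else:
                k += 1                          # no improvement: try a larger neighborhood
    print(x, f(x))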

Guided local search [61] uses a similar principle and dynamically changes the fitness landscape subject to the progress made during the search, so that local search can escape from local optima. The neighborhood structure remains constant. The method starts from a random solution x0 and performs a local search returning the local optimum x1. To escape the local optimum, a penalty is added to the fitness function f such that the resulting fitness function h allows local search to escape. A new local search is started from x1 using the modified fitness function h. Search continues until a termination criterion is met.
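A minimal sketch of the penalization mechanism on an assumed binary toy problem; the feature definition (a bit set to 1), the penalty weight λ, and the simplified rule of penalizing every feature present in a local optimum are illustrative choices, not the exact scheme of [61].

    # Guided local search sketch: local search runs on the augmented function
    # h = f + lambda * (penalties of active features); the true objective f is tracked separately.
    import random

    random.seed(2)
    n = 20
    weights = [random.uniform(-1, 1) for _ in range(n)]

    def f(s):                                   # objective to minimize (assumed)
        return sum(w for w, bit in zip(weights, s) if bit)

    penalty = [0] * n                           # one penalty counter per feature
    lam = 0.3

    def h(s):                                   # augmented objective used by local search
        return f(s) + lam * sum(p for p, bit in zip(penalty, s) if bit)

    def local_search(s, cost):
        improved = True
        while improved:
            improved = False
            for i in range(n):                  # greedy single-bit flips
                cand = s[:]; cand[i] ^= 1
                if cost(cand) < cost(s):
                    s, improved = cand, True
        return s

    s = [random.randint(0, 1) for _ in range(n)]
    best = s
    for _ in range(30):
        s = local_search(s, h)                  # local search on the penalized landscape
        if f(s) < f(best):
            best = s
        for i, bit in enumerate(s):             # penalize the features of the local optimum
            if bit:
                penalty[i] += 1
    print(f(best))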

Iterated local search [40,42] connects otherwise unrelated local search phases, as it creates initial solutions not randomly but based on solutions found in previous local search runs. If the perturbation steps are too small, the search cannot escape from a local optimum. If the perturbation is too strong, the search behaves like a multi-start descent search method. The modification step as well as the acceptance criterion can depend on the search history.
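A minimal sketch of iterated local search on an assumed one-dimensional multimodal function; the perturbation width and the improvement-only acceptance criterion are illustrative choices.

    # Iterated local search sketch: perturb the current local optimum (instead of
    # restarting at random), run local search, and accept the result if it improves.
    import random

    def f(x):                                   # multimodal example objective (assumed)
        return x * x + 10.0 * abs(1.0 - abs(x))

    def local_search(x, step=0.01, iters=300):
        for _ in range(iters):
            cand = x + random.uniform(-step, step)
            if f(cand) < f(x):
                x = cand
        return x

    x = local_search(random.uniform(-5, 5))
    for _ in range(100):
        perturbed = x + random.uniform(-1.0, 1.0)   # perturbation, not a random restart
        cand = local_search(perturbed)
        if f(cand) < f(x):                          # acceptance criterion
            x = cand
    print(x, f(x))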



1.6 Optimization

Optimization can generally be categorized into discrete or continuous optimization, depending on whether the variables are discrete or continuous. There may be limits or constraints on the variables. Optimization can be a static or a dynamic problem, depending upon whether the output is a function of time. Traditionally, optimization is solved by calculus-based methods, random search, or enumerative search. Heuristics-based optimization is the topic treated in this book.

Optimization techniques can generally be divided into derivative methods and nonderivative methods, depending on whether or not derivatives of the objective function are required for the calculation of the optimum. Derivative methods are calculus-based methods, which can be either gradient search methods or second-order methods. These methods are local optimizers. Gradient descent, also known as steepest descent, searches for a local minimum by taking steps along the negative direction of the gradient of the function. Examples of second-order methods are Newton's method, the Gauss-Newton method, quasi-Newton methods, the trust-region method, and the Levenberg-Marquardt method. Conjugate gradient and natural gradient methods can also be viewed as reduced forms of the quasi-Newton method.
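A minimal sketch of gradient (steepest) descent on an assumed convex quadratic; the matrix, the right-hand side, and the fixed step size are hypothetical values chosen so that the iteration converges.

    # Steepest descent on f(x) = (1/2) x^T A x - b^T x, stepping along the negative gradient.
    import numpy as np

    A = np.array([[3.0, 0.5], [0.5, 1.0]])      # symmetric positive definite (assumed)
    b = np.array([1.0, -2.0])

    def grad(x):
        return A @ x - b                        # gradient of the quadratic

    x = np.zeros(2)
    eta = 0.1                                   # fixed step size (assumed)
    for _ in range(200):
        x = x - eta * grad(x)
    print(x, np.linalg.solve(A, b))             # compare with the exact minimizer A^{-1} b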

Derivative methods can also be classified into model-based and metric-based

methods. Model-based methods improve the current point by a local approximating

model. Newton and quasi-Newton methods are model-based methods. Metric-based

methods perform a transformation of the variables and then apply a gradient search

method to improve the point. The steepest-descent, quasi-Newton, and conjugate

gradient methods belong to this latter category.

Methods that do not require gradient information to perform a search and sequentially explore the solution space are called direct search methods. They maintain

a group of points. They utilize some sort of deterministic exploration methods to

search the space and almost always utilize a greedy method to update the maintained






points. Simplex search and pattern search are two examples of effective direct search methods.

Figure 1.3 The landscape of the Rosenbrock function f(x) with two variables x1, x2 ∈ [−204.8, 204.8]. The spacing of the grid is set as 1. There are many local minima, and the global minimum 0 is at (1, 1).

Typical nonderivative methods for multivariable functions are random-restart hill-climbing, random search, many heuristic and metaheuristic methods, and their hybrids. Hill-climbing attempts to optimize a discrete or continuous function for a local optimum. When operating on continuous space, it is called gradient ascent. Other nonderivative search methods include univariate search parallel to an axis (i.e., the coordinate search method), the sequential simplex method, and acceleration methods in direct search such as the Hooke-Jeeves method, Powell's method, and Rosenbrock's method. Interior-point methods represent state-of-the-art techniques for solving linear, quadratic, and nonlinear optimization programs.

Example 1.1: The Rosenbrock function

f(x) = \sum_{i=1}^{n-1} [ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 ]

has the global minimum f(x) = 0 at x_i = 1, i = 1, ..., n. Our simulation is limited to the two-dimensional case (n = 2), with x1, x2 ∈ [−204.8, 204.8]. The landscape of this function is shown in Figure 1.3.
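A minimal sketch for reproducing a landscape plot like Figure 1.3; the coarser grid spacing and the logarithmic color scale are chosen here only to keep the example fast and readable.

    # Evaluate the 2-D Rosenbrock function on a grid over [-204.8, 204.8]^2 and plot it.
    import numpy as np
    import matplotlib.pyplot as plt

    def rosenbrock(x1, x2):
        return 100.0 * (x2 - x1**2)**2 + (1.0 - x1)**2

    grid = np.arange(-204.8, 204.8, 4.0)
    X1, X2 = np.meshgrid(grid, grid)
    F = rosenbrock(X1, X2)

    plt.contourf(X1, X2, np.log10(F + 1e-12), levels=50)   # log scale: values span many orders
    plt.colorbar(label='log10 f(x)')
    plt.xlabel('x1'); plt.ylabel('x2')
    plt.title('Rosenbrock landscape (global minimum 0 at (1, 1))')
    plt.show()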



1.6.1 Lagrange Multiplier Method

The Lagrange multiplier method can be used to analytically solve continuous function optimization problems subject to equality constraints [24]. By introducing the






Lagrangian formulation, the dual problem associated with the primal problem is

obtained, based on which the optimal values of the Lagrange multipliers can be

found.

Let f(x) be the objective function and h_i(x) = 0, i = 1, ..., m, be the constraints. The Lagrange function can be constructed as

L(x; λ_1, ..., λ_m) = f(x) + \sum_{i=1}^{m} λ_i h_i(x),    (1.1)

where λ_i, i = 1, ..., m, are called the Lagrange multipliers.

The constrained optimization problem is converted into an unconstrained optimization problem: optimize L(x; λ_1, ..., λ_m). By setting

∂L(x; λ_1, ..., λ_m)/∂x = 0,    (1.2)

∂L(x; λ_1, ..., λ_m)/∂λ_i = 0, i = 1, ..., m,    (1.3)

and solving the resulting set of equations, we can obtain the x position at the extremum of f(x) under the constraints.
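A worked sketch of (1.1)-(1.3) on an assumed toy problem, minimizing f(x, y) = x² + y² subject to h(x, y) = x + y − 1 = 0, with SymPy solving the stationarity conditions symbolically.

    # Lagrange multiplier method on a small equality-constrained problem (assumed example).
    import sympy as sp

    x, y, lam = sp.symbols('x y lambda', real=True)
    f = x**2 + y**2                      # objective
    h = x + y - 1                        # equality constraint h(x, y) = 0
    L = f + lam * h                      # Lagrange function, as in (1.1)

    # Stationarity conditions (1.2)-(1.3): dL/dx = dL/dy = dL/dlambda = 0.
    eqs = [sp.diff(L, v) for v in (x, y, lam)]
    sol = sp.solve(eqs, [x, y, lam], dict=True)
    print(sol)                           # [{x: 1/2, y: 1/2, lambda: -1}]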

To deal with inequality constraints, the Karush-Kuhn-Tucker (KKT) theorem, as a generalization of the Lagrange multiplier method, introduces a slack variable into each inequality constraint before applying the Lagrange multiplier method. The conditions derived from this procedure are known as the KKT conditions [24].



1.6.2 Direction-Based Search and Simplex Search

In direct search, the gradient information generally cannot be obtained; thus, it is impractical to take a step in the negative gradient direction for a minimization problem. However, when the objective values of a group of solutions are available, the best one can guide the search direction of the other solutions. Many direction-based search methods and EAs are inspired by this intuitive idea.

Some of the direct search methods use improvement direction information to

search the objective space. Thus, it is useful to embed these directions into an EA as

either a local search method or an exploration operator.

Simplex search [47], introduced by Nelder and Mead in 1965, is a well-known deterministic direction-based search method. MATLAB contains a direct search toolbox based on simplex search. Scatter search [26] includes an elitism mechanism in simplex search. Like simplex search, for a group of points, the algorithm finds new points, accepts the better ones, and discards the worse ones. Differential evolution (DE) [56] uses directional information from the current population. The mutation operator of DE needs three randomly selected different individuals from the current population for each individual, forming a simplex-like triangle.






Simplex Search

Simplex search is a group-based deterministic local search method capable of exploring the objective space very quickly. Thus, many EAs use simplex search as a local search method after mutation.

A simplex is a collection of n + 1 points in n-dimensional space. In an optimization problem involving n variables, the simplex method searches for an optimal solution by evaluating a set of n + 1 points. The method continuously forms new simplices by replacing the point having the worst performance in a simplex with a new point. The new point is generated by reflection, expansion, and contraction operations.

In a multidimensional space, the subtraction of two vectors, such as x2 − x1, yields a new vector starting at one vector and ending at the other; we often refer to such a subtraction as a direction. The addition of two vectors can be implemented in a triangular way, moving the start of one vector to the end of the other to form another vector. The expression x3 + (x2 − x1) can be regarded as the destination of a moving point that starts at x3 and travels with the length and direction of x2 − x1.

For every new simplex, the points are ranked according to their objective values. Then simplex search repeats the reflection, expansion, contraction, and shrinking operations in a very efficient and deterministic way. The vertices of the simplex move toward the optimal point, and the simplex becomes smaller and smaller. The stopping criterion can be a predetermined maximum number of iterations, the length of the edges, or the improvement rate of the best point.

Simplex search for minimization is shown in Algorithm 1.1. The coefficients for the reflection, expansion, contraction, and shrinking operations are typically selected as α = 1, β = 2, γ = −1/2, and δ = 1/2. The initial simplex is important: the search may easily get stuck if the initial simplex is too small, so it should be selected depending on the nature of the problem.
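Besides the pseudocode of Algorithm 1.1, a quick way to try simplex search in practice is an off-the-shelf Nelder-Mead implementation; the sketch below uses SciPy (an assumption of this example; the text itself only mentions MATLAB's toolbox) on the Rosenbrock function of Example 1.1.

    # Simplex (Nelder-Mead) search on the 2-D Rosenbrock function via SciPy.
    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(x):
        # f(x) = sum_i [100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2]
        return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

    x0 = np.array([-1.2, 1.0])                  # a commonly used starting point (assumed here)
    res = minimize(rosenbrock, x0, method='Nelder-Mead',
                   options={'xatol': 1e-8, 'fatol': 1e-8})
    print(res.x, res.fun)                       # should approach (1, 1) and 0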



1.6.3 Discrete Optimization Problems

A discrete optimization problem is also known as a combinatorial optimization problem (COP). Any problem that has a large set of discrete solutions and a cost function for rating those solutions relative to one another is a COP. COPs are known to be NP-complete.¹ The goal for COPs is to find an optimal solution or, sometimes, a nearly optimal solution. In COPs, the number of solutions grows exponentially with the problem size n, at O(n!) or O(e^n), such that no algorithm can find the global minimum solution in polynomial computational time.

¹ Namely, nondeterministic polynomial-time complete.

Definition 1.1 (Discrete optimization problem). A discrete optimization problem is denoted as (X, f, Ω), or as minimizing the objective function

min f(x), x ∈ X, subject to Ω,    (1.4)

where X ⊂ R^N is the search space defined over a finite set of N discrete decision variables x = (x1, x2, ..., xN)^T, f: X → R is the objective function, and Ω is the set of constraints on x. Space X is constructed according to all the constraints imposed on the problem.

Algorithm 1.1 (Simplex Search).
1. Initialize parameters.
   Randomize the set of individuals xi.
2. Repeat:
   a. Find the worst and best individuals as xh and xl.
      Calculate the centroid of all xi's, i ≠ h, as x̄.
   b. Enter reflection mode:
      xr = x̄ + α(x̄ − xh);
   c. if f(xl) < f(xr) < f(xh), xh ← xr;
      else if f(xr) < f(xl), enter expansion mode:
         xe = x̄ + β(x̄ − xh);
         if f(xe) < f(xl), xh ← xe;
         else xh ← xr;
         end
      else if f(xr) > f(xi), ∀i ≠ h, enter contraction mode:
         xc = x̄ + γ(x̄ − xh);
         if f(xc) < f(xh), xh ← xc;
         else enter shrinking mode:
            xi = xl + δ(xi − xl), ∀i ≠ l;
         end
      end
   until termination condition is satisfied.




Definition 1.2 (Feasible solution). A vector x that satisfies the set of constraints for

an optimization problem is called a feasible solution.

The traveling salesman problem (TSP) is perhaps the most famous COP. Given a set of points, either nodes on a graph or cities on a map, find the shortest possible tour that visits every point exactly once and then returns to its starting point. There are (n − 1)!/2 possible tours for an n-city TSP. TSP arises in numerous applications, from the routing of wires on a printed circuit board (PCB) and VLSI circuit design to fast food delivery.
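A minimal sketch that makes this factorial growth tangible: brute-force enumeration of the tours of a small random instance, feasible only for very small n; the instance itself is hypothetical.

    # Brute-force TSP on a tiny random instance; fixing city 0 as the start removes
    # rotations of the same tour (each tour and its reverse are still both evaluated).
    import itertools
    import math
    import random

    random.seed(0)
    n = 7
    cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(n)]

    def tour_length(tour):
        # Closed tour: return to the starting city.
        return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n))

    best = min(itertools.permutations(range(1, n)),
               key=lambda p: tour_length((0,) + p))
    print((0,) + best, tour_length((0,) + best))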

The multiple traveling salesmen problem (MTSP) generalizes TSP to more than one salesman. Given a set of cities and a depot, m salesmen must visit all cities subject to the constraints that the route formed by each salesman must start and end at the depot, that each intermediate city must be visited exactly once and by a single salesman, and that the total cost of the routes must be minimized. TSP with a time window is a variant of TSP in which each city must be visited within a given time window.

The vehicle routing problem concerns the transport of items between depots and customers by means of a fleet of vehicles. It can be used for logistics and public services, such as milk delivery, mail or parcel pick-up and delivery, school bus routing, solid waste collection, dial-a-ride systems, and job scheduling. Two well-known routing problems are TSP and MTSP.

The location-allocation problem is defined as follows. Given a set of facilities,

each of which serves a certain number of nodes on a graph, the objective is to place

the facilities on the graph so that the average distance between each node and its

serving facility is minimized.



1.6.4 P, NP, NP-Hard, and NP-Complete

An issue related to the efficiency and efficacy of an algorithm is how hard the problem itself is. The optimization problem is first transformed into a decision problem. Problems that can be solved using a polynomial-time algorithm are tractable. A polynomial-time algorithm has an upper bound O(n^k) on its running time, where k is a constant and n is the problem size (input size). Usually, tractable problems are easy to solve, as the running time increases relatively slowly with n. In contrast, problems are intractable if they cannot be solved by a polynomial-time algorithm and there is a lower bound Ω(k^n) on the running time, where k > 1 is a constant and n is the input size.

The complexity class P (standing for polynomial time complexity) is defined as

the set of decision problems that can be solved by a deterministic Turing machine

using an algorithm with worst-case polynomial time complexity. P problems are

usually easy as there are algorithms that solve them in polynomial time.

The class NP (standing for nondeterministic polynomial time complexity) is the

set of all decision problems that can be verified by a nondeterministic Turing machine

using a nondeterministic algorithm in worst-case polynomial time. Although nondeterministic algorithms cannot be executed directly on conventional computers, this

concept is important and helpful for the analysis of the computational complexity

of problems. All problems in P also belong to the class NP, i.e., P ⊆ NP. There are

also problems where correct solutions cannot be verified in polynomial time.

All decision problems in P are tractable. Those problems that are in NP but not in P are difficult, as no polynomial-time algorithms are known for them. There are problems in NP for which no polynomial algorithm is available and which can be transformed into one another with polynomial effort. A problem is said to be NP-hard if every problem in NP is polynomial-time reducible to it. Therefore, NP-hard problems are at least as hard as any other problem in NP, and they are not necessarily in NP.

The set of NP-complete problems is a subset of NP [14]. A decision problem A is said to be NP-complete if A is in NP and A is also NP-hard. NP-complete problems are the hardest problems in NP, and they all have the same complexity. They are difficult, as no polynomial-time algorithms are known for them. Decision problems that are not in NP are even more difficult. The relationship between these classes is illustrated in Figure 1.4.



Figure 1.4 The relationship between P, NP, NP-complete, and NP-hard classes.



Practical COPs are all NP-complete or NP-hard. Right now, no algorithm with

polynomial time complexity can guarantee that an optimal solution will be found.



1.6.5 Multiobjective Optimization Problem

A multiobjective optimization problem (MOP) requires finding a variable vector x

in the domain X that optimizes the objective vector f (x).

Definition 1.3 (Multiobjective optimization problem). MOP is to optimize a system with k conflicting objectives

min f(x) = (f_1(x), f_2(x), ..., f_k(x))^T, x ∈ X,    (1.5)

subject to

g_i(x) ≤ 0, i = 1, 2, ..., m,    (1.6)

h_i(x) = 0, i = 1, 2, ..., p,    (1.7)

where x = (x1, x2, ..., xn)^T ∈ R^n, the objective functions are f_i: R^n → R, i = 1, ..., k, and g_i, h_j: R^n → R, i = 1, ..., m, j = 1, ..., p, are the constraint functions of the problem.

Objectives are conflicting when improving the quality of one objective tends to simultaneously decrease the quality of another. The solution to an MOP is therefore not a single optimal solution, but a set of solutions representing the best trade-offs among the objectives.

In order to optimize a system with conflicting objectives, the weighted sum of these objectives is usually used as the compromise of the system:

F(x) = \sum_{i=1}^{k} w_i f̄_i(x),    (1.8)

where f̄_i(x) = f_i(x)/|max(f_i(x))| are the normalized objectives and \sum_{i=1}^{k} w_i = 1.
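A minimal sketch of the weighted-sum compromise (1.8) for two assumed objective functions; the sample-based normalization constants below merely stand in for |max(f_i(x))|.

    # Weighted-sum scalarization of two objectives (hypothetical functions and weights).
    import numpy as np

    def f1(x): return float(np.sum(x**2))              # example objective 1
    def f2(x): return float(np.sum((x - 2.0)**2))      # example objective 2

    # Crude normalization constants estimated from random samples of the domain.
    samples = [np.random.uniform(-5, 5, size=2) for _ in range(1000)]
    f1_max = max(abs(f1(s)) for s in samples)
    f2_max = max(abs(f2(s)) for s in samples)

    def F(x, w=(0.5, 0.5)):                            # compromise function (1.8)
        return w[0] * f1(x) / f1_max + w[1] * f2(x) / f2_max

    print(F(np.array([1.0, 1.0])))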

For many problems, there are difficulties in normalizing the individual objectives,

and also in selecting the weights. The lexicographic order optimization is based on

the ranking of the objectives in terms of their importance.






The Pareto method is a popular method for multiobjective optimization. It is based

on the principle of nondominance. The Pareto optimum gives a set of solutions for

which there is no way of improving one criterion without deteriorating another

criterion. In MOPs, the concept of dominance provides a means by which multiple

solutions can be compared and subsequently ranked.

Definition 1.4 (Pareto dominance). A variable vector x1 ∈ R^n is said to dominate another vector x2 ∈ R^n, denoted x1 ≺ x2, if and only if x1 is better than or equal to x2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(x2) ∧ ∃j: f_j(x1) > f_j(x2).

For two solutions x1 , x2 , if x1 is better in all objectives than x2 , x1 is said to

strongly dominate x2 . If x1 is not worse than x2 in all objectives and better in at least

one objective, x1 is said to dominate x2 . A nondominated set is a set of solutions that

are not weakly dominated by any other solution in the set.

Definition 1.5 (Nondominance). A variable vector x1 ∈ X ⊂ Rn is nondominated

with respect to X , if there does not exist another vector x2 ∈ X such that x2 ≺ x1 .

Definition 1.6 (Pareto optimality). A variable vector x* ∈ F ⊂ R^n (F is the feasible region) is Pareto optimal if it is nondominated with respect to F.

Definition 1.7 (Pareto optimal frontier). The Pareto optimal frontier P* is defined by the space in R^n formed by all Pareto optimal solutions: P* = {x ∈ F | x is Pareto optimal}.

The Pareto optimal frontier is a set of optimal nondominated solutions, which

may be infinite.

Definition 1.8 (Pareto front). The Pareto front PF* is defined by

PF* = {f(x) ∈ R^k | x ∈ P*}.    (1.9)



The Pareto front is the image set of the Pareto optimal frontier mapping into the

objective space.

Obtaining the Pareto front of an MOP is the main goal of multiobjective optimization. A good solution set must contain a limited number of points, which should be as close as possible to the exact Pareto front and should be spread uniformly so that no regions are left unexplored.
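A minimal sketch of a dominance test and a nondominated filter, written here for the minimization formulation of (1.5); the objective vectors are hypothetical.

    # Pareto dominance test and nondominated filtering (minimization convention).
    import numpy as np

    def dominates(f1, f2):
        # f1 dominates f2: no worse in every objective and strictly better in at least one.
        f1, f2 = np.asarray(f1), np.asarray(f2)
        return bool(np.all(f1 <= f2) and np.any(f1 < f2))

    def nondominated(points):
        # Keep the points not dominated by any other point in the list.
        return [p for p in points
                if not any(dominates(q, p) for q in points if q is not p)]

    pts = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.5, 2.5)]
    print(nondominated(pts))                  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]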

An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives is given in Figure 1.5. The upper border from points A to B of the domain X, denoted P*, contains all Pareto optimal solutions. The frontier from points f_A to f_B along the lower border of the domain Y, denoted PF*, is the Pareto front in the objective space. For two points a and b, their mapping f_a dominates f_b, denoted f_a ≺ f_b; hence, the decision vector x_a is a nondominated solution.



Figure 1.5 An illustration of Pareto optimal solutions for a two-dimensional problem with two objectives. X ⊂ R^n is the domain of x, and Y ⊂ R^m is the domain of f(x).



Figure 1.6 Different Pareto fronts. a Convex. b Concave. c Discontinuous.



Figure 1.6 illustrates that Pareto fronts can be convex, concave, or discontinuous.

Definition 1.9 (ε-dominance). A variable vector x1 ∈ R^n is said to ε-dominate another vector x2 ∈ R^n, denoted x1 ≺_ε x2, if and only if x1 is better than or equal to εx2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: f_i(x1) ≥ f_i(εx2) ∧ ∃j: f_j(x1) > f_j(εx2) [69].

If ε = 1, ε-dominance is the same as Pareto dominance; otherwise, the area dominated by x1 is enlarged or shrunk. Thus, ε-dominance relaxes the area of Pareto dominance by a factor of ε.



1.6.6 Robust Optimization

The robustness of a particular solution can be confirmed by resampling or by reusing

neighborhood solutions. Resampling is reliable, but computationally expensive. In






contrast, the method of reusing neighborhood solutions is cheap but unreliable. A confidence measure increases the reliability of the latter method. In [44], confidence-based operators are defined for robust metaheuristics. The confidence metric and five confidence-based operators are employed to design a confidence-based robust PSO and a confidence-based robust GA. History can also be utilized to help estimate the expected fitness of an individual, in order to produce more robust solutions in EAs.

The confidence metric defines the confidence level of a robust solution. The highest confidence is achieved when a large number of solutions, with the greatest diversity, is available within a suitable neighborhood around the solution in the parameter space. Mathematically, confidence C is expressed in [44] as a function of n, r, and σ (Equation (1.10)), where n is the number of sampled points in the neighborhood, r is the radius of the neighborhood, and σ is the distribution of the available points in the neighborhood.



1.7 Performance Indicators

For the evaluation of different EAs or iterative algorithms, one can use overall performance indicators and evolving performance indicators.

Overall Performance Indicators

The overall performance indicators provide a general description of performance. Overall performance can be compared according to efficacy, efficiency, and reliability on a benchmark problem over many runs.

Efficacy evaluates the quality of the results without caring about the speed of an algorithm. Mean best fitness (MBF) is defined as the average of the best fitness in the last population over all runs. The best fitness value found so far can be used as a more absolute measure of efficacy.

Reliability indicates the extent to which the algorithm can provide acceptable results. Success rate (SR) is defined as the percentage of runs terminated with success. A run is successful if the difference between the best fitness value in its last generation f* and a predefined target value f^o is below a predefined threshold ε.

Efficiency requires finding the global optimal solution rapidly. The average number of evaluations to a solution (AES) is defined as the average number of evaluations taken by the successful runs. If an algorithm has no successful runs, its AES is undefined.

A low SR and a high MBF may indicate that the algorithm converges slowly, while a high SR and a low MBF may indicate that the algorithm is basically reliable but may occasionally provide very bad results. It is desirable to have a smaller AES and a larger SR; thus, a small AES/SR criterion considers reliability and efficiency at the same time.
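A minimal sketch of computing MBF, SR, and AES from hypothetical run records of a minimizer, with an assumed target value f^o and threshold ε.

    # MBF, SR, and AES from per-run records (best final fitness, evaluations used).
    f_target = 0.0                    # predefined target value f^o (assumed)
    eps = 1e-3                        # success threshold
    runs = [(2.1e-4, 8400), (5.0e-2, 10000), (9.7e-4, 7600), (3.3e-4, 9100)]

    mbf = sum(f for f, _ in runs) / len(runs)                          # mean best fitness
    successes = [(f, n) for f, n in runs if abs(f - f_target) < eps]   # successful runs
    sr = len(successes) / len(runs)                                    # success rate
    aes = sum(n for _, n in successes) / len(successes) if successes else float('nan')

    print(f"MBF={mbf:.4g}, SR={sr:.0%}, AES={aes:.0f}")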


