Chapter 12. How Can Specialized Discrete and Convex Optimization Methods be Married
A.M. Geoffrion
programs.” We define this class, survey its applications, describe four promising
approaches to the development of applicable hybrid algorithms, and finally
conclude with an indication of attractive opportunities for further research.
1.1. Definition of discrete/convex programming
By a discrete/convex program we mean an optimization problem of the form
$$\min_{\delta,\,x}\; c_\delta + f_\delta(x) \quad \text{s.t.}\quad \delta \in \Delta,\; x \in X_\delta, \tag{DC}$$
where $\Delta$ is a finite set of possible discrete choices or logical designs $\delta$, and $X_\delta$ is a
convex set of possible continuous choices or activities $x$ associated with any given $\delta$.
The objective function distinguishes the direct cost of $\delta$, $c_\delta$, from the cost $f_\delta(x)$ of
the activities carried out under $\delta$. The asymmetry of the notation in $\delta$ and $x$ reflects
the fact that, in many of the applications we have in mind, the choice of $x$ is
predicated on the choice of $\delta$ but not conversely; that is, the very domain of $x$ may
depend on $\delta$ whereas the domain of $\delta$ can always be described independently of $x$.
More specifically, we presume that (DC) satisfies these two properties:
Property 1. For any fixed $\delta$ in $\Delta$, $f_\delta(\cdot)$ is convex on $X_\delta$ and its minimum can be
computed with reasonable efficiency by a known convex programming algorithm
(e.g., by LP, NLP, a network flow method, etc.).
Property 2. A reasonably efficient discrete or combinatorial optimization algorithm is known for some problem related to (and hopefully a reasonable
approximation of)
$$\min_{\delta \in \Delta}\; c_\delta + v(\delta), \quad \text{where } v(\delta) \triangleq \inf_{x \in X_\delta} f_\delta(x). \tag{D}$$
Problem (D) obviously is equivalent to (DC): it is infeasible or has unbounded
optimal value if and only if (DC) does; and if $\delta^0$ is optimal ($\varepsilon_1$-optimal) in (D) and
$x^0$ is optimal ($\varepsilon_2$-optimal) in the “inner” problem defining $v(\delta^0)$, then $(\delta^0, x^0)$ is
optimal ($(\varepsilon_1 + \varepsilon_2)$-optimal) in (DC).¹ Notice that Property 1 assures the relatively easy
evaluation of $v(\delta)$. Exactly which relative of (D) has a discrete or combinatorial algorithm available is deliberately left unspecified in Property 2. Usually $v(\cdot)$
must be approximated by a much simpler function in such an algorithm, and
sometimes $c_\delta$ or even $\Delta$ must also be approximated. The intent of Property 2 is
simply to focus on applications where the discrete aspect of the problem is tractable
provided suitable approximations are made to submerge the continuous aspect.
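The equivalence between (DC) and its projection (D) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two designs, their direct costs, the quadratic activity cost, the intervals standing in for $X_\delta$, and the use of ternary search as the "known convex programming algorithm" of Property 1. Brute-force enumeration over $\Delta$ is used only because the toy $\Delta$ is tiny; it is not the discrete algorithm Property 2 has in mind.

```python
def inner_min(f, lo, hi, iters=200):
    """Ternary search: minimize a convex function f on the interval [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2          # a minimizer lies in [lo, m2]
        else:
            lo = m1          # a minimizer lies in [m1, hi]
    x = (lo + hi) / 2
    return x, f(x)


def solve_by_projection(designs):
    """Solve (D): minimize c_delta + v(delta) over a finite set of designs.

    designs maps delta -> (c_delta, f_delta, lo, hi), where [lo, hi] plays
    the role of the convex set X_delta (hypothetical data layout).
    """
    best = None
    for delta, (c, f, lo, hi) in designs.items():
        x, v = inner_min(f, lo, hi)      # v(delta) = inf of f_delta over X_delta
        total = c + v
        if best is None or total < best[0]:
            best = (total, delta, x)
    return best


# Hypothetical instance: the "large" design costs more but allows larger x.
designs = {
    "small": (1.0, lambda x: (x - 3.0) ** 2, 0.0, 2.0),
    "large": (2.5, lambda x: (x - 3.0) ** 2, 0.0, 4.0),
}
total, delta, x = solve_by_projection(designs)
print(delta, round(x, 6), round(total, 6))
```

Note that the domain of x changes with the design, which is exactly the asymmetry between $\delta$ and $x$ described above.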
One further comment must be made about (DC): although $X_\delta$ will necessarily be
a subset of a finite-dimensional vector space, no such restriction need be imposed
on $\Delta$. In some applications $\delta$ will be a map of one finite set into another, or some
other combinatorial object, rather than a tuple of real numbers. It is not the
structure of the space in which $\Delta$ dwells, but rather the logical structure of $\Delta$ itself
(in addition to finiteness) which permits mathematical manipulations involving $\delta$ to
be carried out. Of course (DC) could always be reformulated so that $\delta$ is replaced
by integer-valued indicator variables. However, in most applications such an
artifice serves only to obscure the natural structure of $\Delta$ and to cause an excessive
increase in representational complexity or size or both.² It therefore seems wise not
to insist that (DC) be stated as a conventional mathematical programming problem
in real variables and equality or inequality constraints.
¹ See, e.g., [15, Theorem I] (where (D) would be called the “projection” of (DC) onto $\delta$).
2. Some applications
Here we survey briefly some of the principal types of applications which fall
within the domain of discrete/convex programming as defined above.
2.1. Production scheduling [21, 24, 28, 29]
Setup and sequence-dependent changeover costs, minimum batch sizes, precedence constraints, and crew integrity are some of the factors which remove many
production scheduling problems from the realm of ordinary linear or nonlinear
programming. The logical design $\delta$ typically determines which jobs are to be done
in what order on which machines (or machine configurations), and possibly which
crew will handle each setup. The activity vector $x$ then determines, for a given $\delta$,
the timing and quantities of each run, the allocation of divisible resources to job
activities, and so on.
An algorithm in keeping with Property 1 is likely to be of LP type, possibly with
some nonlinear costs, while combinatorial algorithms in keeping with Property 2
abound (but with only limited success) in the literature on machine/job shop
scheduling/sequencing [6, 7]. Example 1 describes a case where a successful
partnership was achieved between linear programming and a quadratic assignment
algorithm (see Section 3.1).
2.2. Network design [1, 2, 3, 4, 5, 12, 13, 32]
Many problems connected with the design or modification of communication
networks and transportation networks can be posed as discrete/convex programs.
The discrete design $\delta$ may select nodes for the installation of facilities: multiplexers, concentrators, or interface message processors in computer communication networks, junctions in pipeline networks, interchanges in highway networks,
and so on. A design $\delta$ may also select connecting links from a finite list of
possibilities, both in terms of which nodes are to be connected and in terms of the
capacity of the connection (there are standard transmission speeds for communication lines, standard sizes for gas and oil pipelines, only a few choices for the number
of lanes of a highway, etc.). The choice of discrete design requires that due
consideration be given to its impact on the flows in the network. Differences in unit
flow costs, delays due to congestion, and demand elasticity all tend to render flow
prediction a nontrivial problem even when $\delta$ is fixed (see [13] for a discussion of the
influence of cost and congestion on the utilization of store-and-forward communication networks, and [9, 33] for a discussion of equilibrium flows in transportation
networks). The activity vector $x$ represents, of course, the flows in a network.
² See Example 1 below, and think of the futility of attempting to express many realistic scheduling and
sequencing problems as integer linear programs.
Network flow algorithms are obviously the most natural choice for the task posed
by Property 1, particularly since their power has increased dramatically during the
last few years. Convex cost functions occur when congestion delays are taken as the
criterion [13]. A variety of discrete optimization algorithms have potential for
Property 2: minimum spanning trees [4] when the network must have a tree
structure, set covering [34] for emergency service networks, generalized assignment
[31] when peripheral facilities must be linked directly to fixed service facilities, and
so on. Example 2 describes an application where a multicommodity flow algorithm
can be combined with a knapsack algorithm (see Section 3.2).
2.3. Physical distribution system planning [19, 20, 25]
In distribution system planning problems the discrete design $\delta$ determines the
geographic location of plants and/or warehouses, and possibly also the all-or-nothing assignment of customers to these facilities for each integral bundle of
products. The activity vector $x$ corresponds to product flows. This class of models is
conceptually close to network design as discussed above, but has enough distinguishing characteristics (such as the absence of link capacities and the presence of
facility capacities and economies-of-scale) that separate treatment is warranted.
2.4. Facilities layout [10, 23]
Facilities layout problems occur on a hierarchy of scales. On a global scale, in
which cities should the various facilities of a firm be located? Within a given city,
which sub-facility should be located in each available building? Within a given
building, which department or operating unit should be located on each floor and in
each work area? Within a given work area, what should be the layout of the various
pieces of equipment? The problem appears to be a combinatorial one, but flows
and communications can be influenced by locational layout and often need to be
considered jointly. Locational layout would be specified by $\delta$ and $x$ would specify
flows and communications.
Example 3 describes an application where linear programming for
flow/communications is combined with a quadratic assignment algorithm for the
layout choices (see Section 3.3).
2.5. Other applications
There are many other applications which can be modeled as discrete/convex
programs. One interesting class is that of selecting and sequencing interdependent
capital investment projects (for hydroelectricity, manufacturing capacity expansion,
etc.). The logical design $\delta$ would determine which projects are selected and their
sequence of execution, while $x$ would determine the details of project timing and
how the system corresponding to a given $\delta$ is operated over time. A particularly
nice case is developed in [8], where a dynamic programming approach was derived
for (D) itself that can be used for a variety of different “operating cost submodels”
specified by $X_\delta$ and $f_\delta(x)$.
Another important class of applications for discrete/convex programming is
transport scheduling. The problem here is different from the transportation
network design problems discussed earlier because the major emphasis is on how
fleet vehicles (planes, ships, trains, pool trucks, etc.) should move over an
established transportation network in response to demands for transport. The
possible sequences of moves for each vehicle comprise the combinatorial aspect of
the problem, while the exact timing of the moves and the determination of
passenger/cargo patronage comprise the continuous aspect. It is usually essential
to consider both aspects together since patronage adjusts to the frequency and
timing of transport service. See, for instance, [30] for a treatment of the problem in
the context of airline routing; the evaluation of $v(\delta)$ is a linear programming
problem which determines the maximum profit loading of available passengers to
flights.
3. Computational approaches
We now describe four promising generic computational approaches to the
development of hybrid algorithms for discrete/convex programming. They are: (i)
combinatorial seeding with local convex enumeration, (ii) generalized branch-and-bound, (iii) cyclic marginal optimization over $\delta$ and $x$, and (iv) improving
approximations to (D).
3.1. Combinatorial seeding with local convex enumeration
By Property 2, a discrete optimization algorithm is available for some relative of
(D). Let $\delta^0$ be the resulting approximation to an optimal choice for $\delta$. Now use $\delta^0$ as
a “seed” to be improved, if possible, via “low order” changes evaluated by the
convex programming algorithm postulated by Property 1. What constitutes a low
order change depends on the structure of $\Delta$; for instance, if $\delta$ were a binary $n$-tuple
the order of change might be measured as the number of components whose values
are altered. It is helpful but not necessary for $\Delta$ to be a subset of a metrizable space.
Sometimes it is convenient to use the term “neighbor” for any modification of $\delta$
that qualifies as being of acceptably low order. The emphasis on low order changes
is designed, of course, to restrict the magnitude of the local enumeration task.
Generally one wants the allowable order of change to be sufficiently low that local
enumeration is computationally practical, yet sufficiently high that an improved
logical design will be found if one exists.
This approach is pictured informally in Fig. 1. It is understood that the seed is not
actually replaced as the incumbent until one of its neighbors proves to yield a
superior feasible solution of (DC). Termination occurs when no neighbor of the
current incumbent is superior; the higher the allowable order of change the
stronger the degree of local optimality at termination.
[Fig. 1. Discrete Problem: solve an approximation to (D) to obtain a “seed” $\delta^0$. Convex Problem: evaluate $c_\delta + v(\delta)$.]
A variant would be to generate several seeds from (D) rather than just one, as by
solving several approximations to (D) or by finding several suboptimal solutions to
a single approximation.
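The seeding procedure can be sketched as follows. This is a toy under stated assumptions, not the implementation used in any of the cited applications: $\delta$ is a binary tuple saying which of three hypothetical sites are open, the inner problem for fixed $\delta$ is solved by inspection (each customer is served from its cheapest open site), the seed comes from a crude approximation to (D) that ignores $v(\delta)$ entirely, and a "neighbor" differs from the incumbent in at most `order` components. All data values are invented.

```python
import math
from itertools import combinations

FIXED = [4.0, 3.0, 5.0]            # hypothetical direct cost of opening each site
DIST = [[1.0, 6.0, 9.0, 8.0],      # DIST[i][j]: unit cost of serving customer j from site i
        [7.0, 2.0, 8.0, 9.0],
        [9.0, 8.0, 1.0, 2.0]]
DEMAND = [1.0, 1.0, 1.0, 1.0]


def v(delta):
    """Inner problem for fixed delta: serve each customer from its cheapest open site."""
    open_sites = [i for i, d in enumerate(delta) if d]
    if not open_sites:
        return math.inf             # no open site: the inner problem is infeasible
    return sum(q * min(DIST[i][j] for i in open_sites) for j, q in enumerate(DEMAND))


def total_cost(delta):
    """c_delta + v(delta)."""
    return sum(f for f, d in zip(FIXED, delta) if d) + v(delta)


def neighbors(delta, order=1):
    """Designs that differ from delta in between 1 and `order` components."""
    for r in range(1, order + 1):
        for idx in combinations(range(len(delta)), r):
            nb = list(delta)
            for i in idx:
                nb[i] = 1 - nb[i]
            yield tuple(nb)


def local_enumeration(seed, order=1):
    """Replace the incumbent whenever a neighbor is strictly better; stop otherwise."""
    incumbent, best = seed, total_cost(seed)
    improved = True
    while improved:
        improved = False
        for nb in neighbors(incumbent, order):
            cost = total_cost(nb)
            if cost < best:
                incumbent, best, improved = nb, cost, True
                break
    return incumbent, best


# Seed from an approximation to (D): open only the site with the lowest direct cost.
cheapest = FIXED.index(min(FIXED))
seed = tuple(1 if i == cheapest else 0 for i in range(len(FIXED)))
design, cost = local_enumeration(seed)
print(seed, design, cost)
```

Raising `order` enlarges each neighborhood, which strengthens the local optimality reached at termination at the price of more convex evaluations per pass.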
This approach has familiar analogs in the literature on heuristic programming.
See [14, Chapter 9] and [27]. See also [32] for a highly successful application to gas
pipeline network design that has since been adapted and used extensively for
computer communication network design (e.g., [11]).
The author has had very satisfactory experience with this approach in the context
of scheduling parallel chemical reactors with product-dependent changeover costs.
This application is now briefly reviewed.
Example 1. A changeover scheduling problem [21]. Several independent continuous process facilities or flow shop production lines are arranged in parallel. Each
can make (process) some subset of products with production rates that may vary
from line to line, but that are reasonably proportional from line to line (as would be
the case when lines are similar except for their scale of implementation or their
basic cycle time). Each line has a linear production cost for each product it can
make, and a possibly different changeover cost between each pair of products. The
changeover cost matrices are reasonably proportional across lines. A number of
independent production orders are given, each of which specifies a minimum and
maximum production quantity, an earliest start date, and a due date. Violation of
either date incurs a per diem cost penalty. Splitting production orders is allowed. It
is desired to find a production schedule (which line produces how much of what
and when) that fills the production orders at minimum total cost over a scheduling
horizon of fixed (but somewhat flexible) length.
In this application, $\delta$ gives the sequence of production runs specified as to
product and line but not fully specified as to duration. Durations are given by $x$.
Property 1 holds because, when $\delta$ is fixed, the optimal choice of $x$ may be
determined by solving a linear program. The LP balances production costs
(exclusive of changeover charges) against penalties associated with any violations of
earliest start and due dates. Property 2 holds because (D) can be approximated
quite well by a quadratic assignment problem of reasonable size.
An LP code and quadratic assignment code were combined in the manner of
Figure 1. The definition used for “neighbor” was that any single production run
may be moved to another position on the same or another line, and any two
production runs may be interchanged.
A real application was made to the monthly scheduling of a complex of six
chemical reactors. A three month independent parallel test showed that the
program was able to achieve considerably better solutions than (experienced)
manual schedulers. The program has since been installed on the firm’s computer
and is being used routinely [21].
3.2. Generalized branch-and-bound
The essential concepts of branch-and-bound, currently the dominant approach to
integer programming, require very little mathematical structure and are quite
broad enough to encompass discrete/convex programming. The framework of [22]
will serve nicely with only the obvious notational changes to phrase it in terms of
(DC) rather than in terms of mixed integer linear programming. It is also advisable
to generalize the notion of “relaxation,” whence nearly all bounds are obtained in
branch-and-bound methods, to the following: a minimizing problem (PR) is said to
be a relaxation of a minimizing problem (P) if the feasible region of (PR) contains
that of (P) and if the objective function of (PR) is less than or equal to that of (P)
everywhere on the latter’s feasible region. This generalized definition requires an
obvious modification to property R3 and fathoming criterion FC3 in [22] in order to
reflect the fact that an optimal solution of (PR) is not optimal in (P) unless it is
feasible in (P) and yields the same objective function value for both problems
(although an $\varepsilon$-optimality statement can still be made if the very last condition
fails). [22] will be sufficiently accessible to most readers that the algorithmic
framework of Section II therein, as generalized to (DC), need not be given in detail
here.
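A minimal sketch may help fix the generalized notion of relaxation. It is not the framework of [22]; it is a depth-first branch-and-bound over partially specified binary designs, and every number in it is hypothetical. The relaxation of a candidate problem treats each still-free component optimistically (zero direct cost, full capacity), so its objective never exceeds the true objective on any completion, which is the defining property stated above.

```python
import math

COST = [4.0, 3.0, 5.0]        # hypothetical direct cost of selecting component i
CAP = [5.0, 4.0, 6.0]         # hypothetical capacity contributed by component i
DEMAND, PENALTY = 9.0, 2.0    # unmet demand is charged PENALTY per unit


def objective(delta):
    """c_delta + v(delta), with v a convex piecewise-linear shortage cost."""
    direct = sum(c for c, d in zip(COST, delta) if d)
    shortage = max(0.0, DEMAND - sum(k for k, d in zip(CAP, delta) if d))
    return direct + PENALTY * shortage


def relaxation_bound(partial):
    """Lower bound valid for every completion of the prefix `partial`.

    Free components cost nothing yet contribute their full capacity, so the
    relaxed objective is less than or equal to the true one everywhere.
    """
    k = len(partial)
    direct = sum(c for c, d in zip(COST, partial) if d)
    best_capacity = sum(c for c, d in zip(CAP, partial) if d) + sum(CAP[k:])
    return direct + PENALTY * max(0.0, DEMAND - best_capacity)


def branch_and_bound():
    incumbent, best = None, math.inf
    stack = [()]                          # candidate problems are partial designs
    while stack:
        partial = stack.pop()
        if relaxation_bound(partial) >= best:
            continue                      # fathomed: cannot beat the incumbent
        if len(partial) == len(COST):     # fully specified: evaluate exactly
            value = objective(partial)
            if value < best:
                incumbent, best = partial, value
            continue
        stack.append(partial + (0,))      # branch on the next component
        stack.append(partial + (1,))
    return incumbent, best


print(branch_and_bound())
```

Nothing here requires $\delta$ to live in a vector space: the bound is computed from the range of the cost and capacity functions, not by letting components of $\delta$ go fractional.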
So far, no use whatever has been made of Properties 1 and 2. The principal way
of doing so is to select a type of relaxation which permits advantage to be taken of
one or the other or both of these properties when trying to fathom the candidate
problems (alias node- or sub-problems). There are two major types of relaxations
used in mixed integer linear programming, both of which can be generalized to
apply to candidate problems derived from (DC) provided certain conditions hold:
relaxations based on direct convexification of the decision domain of the candidate
problem (as by allowing integer variables to take on continuous values), and
Lagrangean relaxation of selected constraints [18]. Suppose that candidate problems are derived from (DC) by partially specifying certain components of $\delta$ (we
presume, as seems permissible for most potential applications, that the structure of
$\Delta$ renders this prescription meaningful). An obvious difficulty with such candidate
problems is that the very notion of convexification in the domain of $\delta$ is not
meaningful unless $\delta$ inhabits a vector space, which definitely is not the case in many
applications of interest (e.g., Example 1). Moreover, the mathematical operation of
Lagrangean relaxation requires $X_\delta$ to be expressible at least partially in terms of
conventional real-valued equality or inequality constraints. The first difficulty can
be skirted if necessary by convexifying not in the domain of $\delta$, but rather in the
range spaces associated with $\delta$: the range of the real-valued function $c_{(\cdot)}$ and of
the point-to-set map $X_{(\cdot)}$. The second difficulty apparently cannot be skirted.
There is a striking relationship between the two types of relaxation just
discussed. It was shown in [18] that, for mixed integer linear programs, the best
possible Lagrangean relaxation is equivalent in a natural sense to a corresponding
convexification in the domain of the decision variables and also to a corresponding
convexification in the range space of the objective function and Lagrangeanized
constraints. The analysis can be generalized. Dropping the assumption that all
functions are linear invalidates the equivalence to convexification in domain space
but does not invalidate the equivalence to convexification in range space. The latter
equivalence even remains true when $\delta$ is no longer taken to dwell in a finite vector
space, and when the constraining conditions other than those being Lagrangeanized
are no longer expressible as conventional real-valued constraints. This is a
consequence of the fact that many basic results of Lagrangean duality theory
require virtually no assumptions at all on the domains of the functions (e.g., [16,
Lemmas 3, 4 and 5]). A formal proof of the basic equivalence between the best
Lagrangean relaxation and problem convexification in range space can be found in
[26, Lemma 2.2].
In particular applications one seeks to apply the convexification or Lagrangean
relaxation devices just discussed or possibly some other device, in order to obtain
candidate problem relaxations which Properties 1 and/or 2 render tractable. The
following example illustrates a situation in which this can be done.
Example 2. Network expansion with a budget constraint. This problem is a
capacitated version of the one treated in [2]. A conventional multicommodity
network is given with capacitated links, a known flow requirements matrix, and
linear flow costs. A number of possible new links have been proposed, each with a
given flow capacity, linear flow cost, and fixed capital cost. What is the optimal
subset of new links which reduces the total cost of the optimal flow as much as
possible without exceeding a given maximum authorized capital expenditure?
The problem can be stated mathematically as follows in an obvious notation
where $ij$ refers to the particular commodity which flows from the $i$th to the $j$th
node, A is the set of existing links, B is the set of possible new links, and D is the
capital budget.
$$x^{ij}_{kl} \ge 0 \quad \text{for all } ij \text{ and } kl \in A \cup B, \tag{5}$$
$$\delta_{kl} = 0 \text{ or } 1 \quad \text{for all } kl \in B. \tag{6}$$
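Since the discussion below refers repeatedly to constraints (3), (3)' and (4) and to the data $b_{kl}$ and $d_{kl}$, it may help to have one plausible full statement of (P) in view. The following is a reconstruction from the verbal description and from how those constraints are used later, not a quotation: the cost symbol $c_{kl}$, the form and numbering of (1) and (2), and the exact form of (3)' are assumptions.

```latex
\begin{aligned}
\text{(P)}\qquad \min_{x,\,\delta}\quad
  & \sum_{ij}\ \sum_{kl \in A \cup B} c_{kl}\, x^{ij}_{kl} \\
\text{s.t.}\quad
  & \text{flow conservation at every node for each commodity } ij, && (1) \\
  & \sum_{ij} x^{ij}_{kl} \le b_{kl} \quad \text{for all } kl \in A, && (2) \\
  & \sum_{ij} x^{ij}_{kl} \le b_{kl}\,\delta_{kl} \quad \text{for all } kl \in B, && (3) \\
  & \sum_{kl \in B} d_{kl}\,\delta_{kl} \le D, && (4)
\end{aligned}
% together with (5) and (6). Here b_{kl} is a link capacity and d_{kl} the
% fixed capital cost of new link kl. Under this reading, (3)' would be the
% bound obtained from (3) with \delta_{kl} \le 1:
%   \sum_{ij} x^{ij}_{kl} \le b_{kl} \quad \text{for all } kl \in B.
```

Under this reading, pricing (4) with a multiplier $\mu$ and substituting $\delta_{kl} = \sum_{ij} x^{ij}_{kl} / b_{kl}$ adds $\mu\, d_{kl}/b_{kl}$ to the unit flow cost of each new link, which agrees with the cost coefficients quoted later in this section.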
This is a mixed integer linear programming problem which, for reasonable
numbers of potential new links (not much more than a hundred, say), should be
tractable by branch-and-bound if the main candidate problem relaxation is chosen
suitably. The usual LP relaxation, obtained by allowing the free binary variables to
be fractional, is not a multicommodity flow problem; efficient specialized multicommodity flow algorithms cannot be used and one must fall back to general linear
programming algorithms. An attractive alternative to the usual LP relaxation is to
employ a “tandem” Lagrangean relaxation. This will be illustrated on the full
problem (P) as stated above since the candidate problems are of the same
mathematical form so long as conventional dichotomous branching is used.
Let $\mu^0 > 0$ be the analyst’s best guess concerning the marginal value to (P) of
increasing the budget $D$ by one dollar. Solve the relaxation of (P) which results
when (4) is Lagrangeanized using $\mu^0$ and (6) is convexified in the usual way. This is
equivalent to an ordinary classical multicommodity flow problem because the $\delta_{kl}$
variables can be eliminated analytically (solve for $\delta_{kl}$ from (3), which must hold with
equality in an optimal solution):
Let $x^0$ be an optimal solution, and let $\xi^0_{kl}$ be the optimal multipliers corresponding
to (3)'. It can be shown that
$$\lambda^0_{kl} \triangleq \xi^0_{kl} + \mu^0 \left( \frac{d_{kl}}{b_{kl}} \right) \quad \text{for all } kl \in B \tag{7}$$
is a set of optimal multipliers corresponding to (3) in the relaxed version of (P) prior
to analytic reduction to $(\mathrm{MF}_{\mu^0})$. Now solve a second relaxed version of (P) in which
(3)' is appended and $\lambda^0$ from (7) is used to Lagrangeanize (3):
Evidently this problem can be solved independently for $x$ and for $\delta$. It is easy to
show that $x^0$ from $(\mathrm{MF}_{\mu^0})$ is also optimal here, leaving just the binary knapsack
problem
$$\max_{\delta}\ \sum_{kl \in B} \lambda^0_{kl}\, b_{kl}\, \delta_{kl} \quad \text{subject to (4) and (6)} \tag{$\mathrm{K}_{\lambda^0}$}$$
as the only work necessary to solve the second relaxation $(\mathrm{PR}_{\lambda^0})$. Methods are
available which can solve $(\mathrm{K}_{\lambda^0})$ very efficiently even with several hundred binary
variables.
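For concreteness, a binary knapsack of this kind can be solved by textbook dynamic programming when the capital costs are integers. The sketch below is only an illustration and not one of the specialized methods alluded to above. Treating the value of candidate link $kl$ as $\lambda^0_{kl} b_{kl}$ is an assumption about the form of the objective, and all numbers are invented.

```python
def binary_knapsack(values, weights, budget):
    """Maximize sum(values[i] * delta[i]) subject to
    sum(weights[i] * delta[i]) <= budget with every delta[i] in {0, 1}.

    Dynamic programming over integer budget levels; weights and budget
    must be non-negative integers.
    """
    n = len(values)
    best = [0.0] * (budget + 1)                      # best[w]: best value within budget w
    take = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Go downward in w so that item i is used at most once.
        for w in range(budget, weights[i] - 1, -1):
            cand = best[w - weights[i]] + values[i]
            if cand > best[w]:
                best[w] = cand
                take[i][w] = True
    # Recover the selected items by walking the decisions backwards.
    delta, w = [0] * n, budget
    for i in range(n - 1, -1, -1):
        if take[i][w]:
            delta[i] = 1
            w -= weights[i]
    return best[budget], delta


# Hypothetical data for three candidate links.
lam = [0.5, 0.2, 0.4]          # multipliers (assumed)
cap = [10, 20, 5]              # flow capacities b_kl (assumed)
capital = [4, 3, 2]            # integer capital costs d_kl (assumed)
D = 5                          # capital budget

values = [l * b for l, b in zip(lam, cap)]
value, delta = binary_knapsack(values, capital, D)
print(value, delta)
```

The table has (number of links) x (D + 1) entries, so this simple scheme is practical only when the budget can be scaled to a modest integer range.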
In summary, a tandem relaxation of (P) has been proposed which requires the
solution of one ordinary multicommodity flow problem (cf. Property 1) and one
binary knapsack problem over the possible new links (cf. Property 2). Both
Properties 1 and 2 are exploited. An otherwise conventional branch-and-bound
procedure can be built around this tandem relaxation. How well such a procedure
would function depends on how good the resulting bounds are. This has not been
tested experimentally, but it can be observed from the known theory of Lagrangean
relaxation [18] that the lower bound produced by this tandem relaxation has the
potential of being superior to that provided by the usual LP relaxation (in which (6)
is convexified). It all depends on the choice of $\mu^0$. If $\mu^0$ happens to have the same
value as an optimal multiplier of (4) in the usual LP relaxation, then the bound
produced by $(\mathrm{MF}_{\mu^0})$ will coincide with that of the usual LP relaxation and the
second bound obtained with the help of $(\mathrm{K}_{\lambda^0})$ will usually be still better (it cannot be
worse).
It may be worthwhile to iterate on the choice of $\mu^0$. There are at least two
conspicuous ways to do this. One is to perform a one-dimensional (unimodal)
search for the value of $\mu$ which leads to the highest optimal value of $(\mathrm{MF}_{\mu})$. This is
particularly easy to do if a parametric multicommodity flow algorithm is available
which accommodates a single linear parameter in the objective function (the cost
coefficients of the links in set $B$ are $c_{kl} + \mu\, d_{kl}/b_{kl}$). This search is equivalent to
solving the partial dual of the usual LP relaxation in which only the budget
constraint (4) is dualized. The second way to find an improved $\mu$ is to feed back the
budget constraint multiplier from $(\mathrm{K}_{\lambda^0})$ with (6) convexified.
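The one-dimensional search can be sketched with a stand-in for the multicommodity flow code. In the toy below, `dual_value` is a hypothetical oracle: it returns the minimum over a handful of invented flow plans of flow cost plus $\mu$ times the budget overrun, which is concave and piecewise linear in $\mu$, as a Lagrangean dual function must be. A real implementation would call a flow algorithm at each trial value of $\mu$.

```python
# Hypothetical extreme flow plans: (flow cost, capital used).
PLANS = [(100.0, 0.0), (70.0, 4.0), (55.0, 9.0), (50.0, 15.0)]
BUDGET = 6.0


def dual_value(mu):
    """Stand-in for the optimal value of the mu-priced relaxation."""
    return min(cost + mu * (capital - BUDGET) for cost, capital in PLANS)


def unimodal_search(g, lo, hi, tol=1e-7):
    """Ternary search for the maximizer of a concave function g on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1      # a maximizer lies in [m1, hi]
        else:
            hi = m2      # a maximizer lies in [lo, m2]
    return (lo + hi) / 2


mu_star = unimodal_search(dual_value, 0.0, 10.0)
print(round(mu_star, 4), round(dual_value(mu_star), 4))
```

Each evaluation of `dual_value` corresponds to one flow solve, so in practice the number of trial values matters far more than the arithmetic of the search itself.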
3.3. Cyclic marginal optimization over $\delta$ and $x$
In some applications, Property 2 permits (DC) to be optimized with any fixed x.
Then it is natural to think of seeking an optimum of (DC) by first optimizing over $x$
with some fixed $\delta$, then optimizing over $\delta$ with the resulting $x$, then optimizing
over $x$ again with the new resulting $\delta$, and so on. A monotonically improving
succession of feasible solutions will be found by such a cyclic marginal optimization
approach until a “marginally optimal” solution is found after which the marginal
solutions in $x$ and $\delta$ begin to repeat. Marginal optimality is an obvious necessary
condition for global optimality, but whether it is sufficient depends upon the
structure of the problem.
This general approach is, of course, far from novel (e.g., [35, p. 1111).
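A minimal sketch of the cycle, written for a minimization problem, is given below. The instance is invented for illustration: $\delta$ assigns three facilities to three locations, $x$ is the level of a single activity bounded by 0 and 1, the convex step is a one-variable LP solved by inspection, and the discrete step is brute-force enumeration standing in for a quadratic assignment code. All data and function names are hypothetical.

```python
from itertools import permutations

ASSIGN = [[1, 2, 3], [2, 1, 3], [4, 3, 1]]   # ASSIGN[i][l]: cost of facility i at location l
TRAFFIC = {(0, 1): 3, (0, 2): 1, (1, 2): 1}  # traffic between facility pairs per unit of x
DIST = [[0, 4, 1], [4, 0, 1], [1, 1, 0]]     # unit traffic cost between locations
PROFIT = 20                                  # profit per unit of x before traffic costs


def traffic_cost(delta):
    return sum(t * DIST[delta[i]][delta[j]] for (i, j), t in TRAFFIC.items())


def objective(delta, x):
    """Total cost to be minimized: assignment cost plus x times net unit cost."""
    assign = sum(ASSIGN[i][loc] for i, loc in enumerate(delta))
    return assign + x * (traffic_cost(delta) - PROFIT)


def best_x(delta):
    """Convex step: a one-variable LP on 0 <= x <= 1, solved by inspection."""
    return 1.0 if traffic_cost(delta) < PROFIT else 0.0


def best_delta(x):
    """Discrete step: enumeration standing in for a quadratic assignment algorithm."""
    return min(permutations(range(3)), key=lambda d: objective(d, x))


def cyclic_marginal_optimization(delta0, max_cycles=100):
    delta = delta0
    x = best_x(delta)
    value = objective(delta, x)
    for _ in range(max_cycles):
        new_delta = best_delta(x)
        new_x = best_x(new_delta)
        new_value = objective(new_delta, new_x)
        if new_value >= value:
            break                      # no strict improvement: marginally optimal
        delta, x, value = new_delta, new_x, new_value
    return delta, x, value


print(cyclic_marginal_optimization((0, 1, 2)))
```

In this instance the cycle happens to reach the global optimum; in general the point returned is only marginally optimal, and different starting designs can end at different points.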
The following example illustrates a plausible application of this approach in
which the discrete and convex marginal optimization problems are, respectively, a
quadratic assignment problem and a linear program.
Example 3. A facility assignment problem. A firm has a number of indivisible
facilities and a number of distinct locations to which they could be assigned. The
firm carries on a number of different activities, each of which imposes its own
requirements for “traffic” between the facilities. These requirements are sufficiently dissimilar, and the traffic costs are sufficiently high, that the assignment of
facilities to locations materially influences the most profitable mix of activities. It is
therefore appropriate to optimize jointly the facility location assignments and
activity mix.
We adopt the following notations and assumptions:
$x_k$    the level of the $k$th activity of the firm,
$Ax \le b$, $x \ge 0$    the constraints specifying the set of possible activities
    (independent of the facility location assignments),
$p_k$    the net profit per unit of activity $k$ exclusive of traffic costs,
$q^k_{ij}$    the amount of traffic between facilities $i$ and $j$ incurred for each unit of
    activity $k$,
$c_{lm}$    the cost per unit of traffic between locations $l$ and $m$,
    the cost associated with assigning facility $i$ to location $l$ (can be $\infty$ to
    indicate an impossible assignment),
$\delta$    a mapping of facilities into locations; $\delta(i) = l$ means that $\delta$ assigns
    facility $i$ to location $l$.
Then the problem can be written:
$$\text{s.t.}\quad Ax \le b,\quad x \ge 0,\quad \delta \text{ a 1:1 mapping}.$$
It is evident that this is an ordinary quadratic assignment problem for fixed $x$ and an
ordinary linear program for fixed $\delta$, and hence a plausible candidate for cyclic
marginal optimization. This approach has not been tested computationally.
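One objective consistent with the notation listed for this example is sketched below. It is an assumption pieced together from the definitions, not a quotation: in particular, the symbol $a_{il}$ for the cost of assigning facility $i$ to location $l$ is hypothetical, and the problem could equally be stated as the minimization of the negated expression.

```latex
\max_{\delta,\,x}\quad
  \sum_{k} p_k\, x_k
  \;-\; \sum_{i} a_{i\,\delta(i)}
  \;-\; \sum_{k} \sum_{i,j} c_{\delta(i)\,\delta(j)}\; q^{k}_{ij}\; x_k
% For fixed x the traffic between facilities i and j is
%   t_{ij} = \sum_k q^{k}_{ij} x_k ,
% so the last two terms form a quadratic assignment objective in \delta.
% For fixed \delta every term is linear in x, giving a linear program
% over Ax \le b, x \ge 0.
```

Read this way, the two marginal problems are exactly the quadratic assignment problem and the linear program mentioned in the text.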
3.4. Improving approximations to (D)
The essential idea of this computational approach is to generate a sequence of
approximations to (D) which are improving in the sense that their solutions tend to
converge to an optimal solution of (D) itself. Property 1 comes into play in the
course of evaluating the performance $v(\delta^K)$ of the solution $\delta^K$ of the $K$th
approximation $(D)^K$. Of course, the form of $(D)^K$ must be compatible with the
scope of Property 2. A rule must be specified to prescribe how $(D)^K$ is to be
generated based on knowledge of $\delta^k$ and $x^k$ obtained from previous
($k = 1, \ldots, K - 1$) evaluations of $v(\delta^k)$ and $(D)^k$. See Fig. 2.
[Fig. 2. Select $\delta^1 \in \Delta$ and set $K = 1$. Evaluate $v(\delta^K)$; call the solution $x^K$. Approximation Generator: generate an approximation $(D)^{K+1}$ to (D) based on knowledge of $\delta^k$ and $x^k$ for $k = 1, \ldots, K$. Discrete Problem: solve $(D)^{K+1}$; call the solution $\delta^{K+1}$; increment $K$ by 1.]