Universitext

Jiří Matoušek, Bernd Gärtner
Understanding and Using Linear Programming

Jiří Matoušek
Department of Applied Mathematics, Charles University, Malostranské nám. 25, 118 00 Praha 1, Czech Republic. Email: matousek@kam.mff.cuni.cz

Bernd Gärtner
Institute of Theoretical Computer Science, ETH Zurich, CH-8092 Zurich, Switzerland. Email: gaertner@inf.ethz.ch

Mathematics Subject Classification (2000): 90C05
Library of Congress Control Number: 2006931795
ISBN-10: 3-540-30697-8 Springer Berlin Heidelberg New York
ISBN-13: 978-3-540-30697-9 Springer Berlin Heidelberg New York
© Springer-Verlag Berlin Heidelberg 2007

Preface

This is an introductory textbook of linear programming, written mainly for students of computer science and mathematics. Our guiding phrase is, "what every theoretical computer scientist should know about linear programming." The book is relatively concise, in order to allow the reader to focus on the basic ideas. For a number of topics commonly appearing in thicker books on the subject, we were seriously tempted to add them to the main text, but we decided to present them only very briefly in a separate glossary. At the same time, we aim at covering the main results with complete proofs and in sufficient detail, in a way ready for presentation in class.

One of the main focuses is applications of linear programming, both in practice and in theory. Linear programming has become an extremely flexible tool in theoretical computer science and in mathematics. While many of the finest modern applications are much too complicated to be included in an introductory text, we hope to communicate some of the flavor and excitement of such applications on simpler examples.

We present three main computational methods. The simplex algorithm is first introduced on examples, and then we cover the general theory, putting less emphasis on implementation details. For the ellipsoid method we give the algorithm and the main claims required for its analysis, omitting some technical details. From the vast family of interior point methods we concentrate on one of the most efficient versions known, the primal–dual central path method, and again we do not present the technical machinery in full. Rigorous mathematical statements are clearly distinguished from informal explanations in such parts.

The only real prerequisite to this book is undergraduate linear algebra. We summarize the required notions and results in an appendix. Some of the examples also use rudimentary graph-theoretic terminology,
and at several places we refer to notions and facts from calculus; all of these should be a part of standard undergraduate curricula.

Errors. If you find errors in the book, especially serious ones, we would appreciate it if you would let us know (email: matousek@kam.mff.cuni.cz, gaertner@inf.ethz.ch). We plan to post a list of errors at http://www.inf.ethz.ch/personal/gaertner/lpbook.

Acknowledgments. We would like to thank the following people for help, such as reading preliminary versions and giving us invaluable comments: Pierre Dehornoy, David Donoho, Jiří Fiala, Michal Johanis, Volker Kaibel, Edward Kim, Petr Kolman, Jesús de Loera, Nathan Linial, Martin Loebl, Helena Nyklová, Yoshio Okamoto, Jiří Rohn, Leo Rüst, Rahul Savani, Andreas Schulz, Petr Škovroň, Bernhard von Stengel, Tamás Terlaky, Louis Theran, Jiří Tůma, and Uli Wagner. We also thank David Kramer for thoughtful copyediting.

Prague and Zurich, July 2006
Jiří Matoušek, Bernd Gärtner

Contents

Preface
1 What Is It, and What For?
  1.1 A Linear Program
  1.2 What Can Be Found in This Book
  1.3 Linear Programming and Linear Algebra
  1.4 Significance and History of Linear Programming
2 Examples
  2.1 Optimized Diet: Wholesome and Cheap?
  2.2 Flow in a Network
  2.3 Ice Cream All Year Round
  2.4 Fitting a Line
  2.5 Separation of Points
  2.6 Largest Disk in a Convex Polygon
  2.7 Cutting Paper Rolls
3 Integer Programming and LP Relaxation
  3.1 Integer Programming
  3.2 Maximum-Weight Matching
  3.3 Minimum Vertex Cover
  3.4 Maximum Independent Set
4 Theory of Linear Programming: First Steps
  4.1 Equational Form
  4.2 Basic Feasible Solutions
  4.3 ABC of Convexity and Convex Polyhedra
  4.4 Vertices and Basic Feasible Solutions
5 The Simplex Method
  5.1 An Introductory Example
  5.2 Exception Handling: Unboundedness
  5.3 Exception Handling: Degeneracy
  5.4 Exception Handling: Infeasibility
  5.5 Simplex Tableaus in General
  5.6 The Simplex Method in General
  5.7 Pivot Rules
  5.8 The Struggle Against Cycling
  5.9 Efficiency of the Simplex Method
  5.10 Summary
6 Duality of Linear Programming
  6.1 The Duality Theorem
  6.2 Dualization for Everyone
  6.3 Proof of Duality from the Simplex Method
  6.4 Proof of Duality from the Farkas Lemma
  6.5 Farkas Lemma: An Analytic Proof
  6.6 Farkas Lemma from Minimally Infeasible Systems
  6.7 Farkas Lemma from the Fourier–Motzkin Elimination
7 Not Only the Simplex Method
  7.1 The Ellipsoid Method
  7.2 Interior Point Methods
8 More Applications
  8.1 Zero-Sum Games
  8.2 Matchings and Vertex Covers in Bipartite Graphs
  8.3 Machine Scheduling
  8.4 Upper Bounds for Codes
  8.5 Sparse Solutions of Linear Systems
  8.6 Transversals of d-Intervals
  8.7 Smallest Balls and Convex Programming
9 Software and Further Reading
Appendix: Linear Algebra
Glossary
Index

1 What Is It, and What For?

Linear programming, surprisingly, is not directly related to computer programming. The term was introduced in the 1950s, when computers were few and mostly top secret, and the word programming was a military term that, at that time, referred to plans or schedules for training, logistical supply, or deployment of men. The word linear suggests that feasible plans are restricted by linear constraints (inequalities), and also that the quality of the plan (e.g., costs or duration) is also measured by a linear function of the considered quantities. In a similar spirit, linear programming soon started to be used for planning all kinds of economic activities, such as transport of raw materials and products
among factories, sowing various crop plants, or cutting paper rolls into shorter ones in sizes ordered by customers. The phrase "planning with linear constraints" would perhaps better capture this original meaning of linear programming. However, the term linear programming has been well established for many years, and at the same time it has acquired a considerably broader meaning: Not only does it play a role in mathematical economy, it appears frequently in computer science and in many other fields.

1.1 A Linear Program

We begin with a very simple linear programming problem (or linear program for short):

Maximize the value of x1 + x2 among all vectors (x1, x2) ∈ R² satisfying the constraints
  x1 ≥ 0
  x2 ≥ 0
  x2 − x1 ≤ 1
  x1 + 6x2 ≤ 15
  4x1 − x2 ≤ 10.

For this linear program we can easily draw a picture. The set {x ∈ R² : x2 − x1 ≤ 1} is the half-plane lying below the line x2 = x1 + 1, and similarly, each of the remaining four inequalities defines a half-plane. The set of all vectors satisfying the five constraints simultaneously is a convex polygon.

[Figure: the convex polygon determined by the five constraints, with the direction vector (1, 1) drawn as an arrow and the vertex (3, 2) marked.]

Which point of this polygon maximizes the value of x1 + x2? The one lying "farthest in the direction" of the vector (1, 1), drawn by the arrow; that is, the point (3, 2). The phrase "farthest in the direction" is in quotation marks since it is not quite precise. To make it more precise, we consider a line perpendicular to the arrow, and we think of translating it in the direction of the arrow. Then we are seeking a point where the moving line intersects our polygon for the last time. Let us note that the function x1 + x2 is constant on each line perpendicular to the vector (1, 1), and as we move the line in the direction of that vector, the value of the function increases. See the next illustration:

[Figure: the same polygon with the level lines x1 + x2 = 2, x1 + x2 = 4, and x1 + x2 = 5, all perpendicular to the direction (1, 1).]

In a general linear program we want to find a vector x* ∈ Rⁿ maximizing (or minimizing) the value of a given linear function among all vectors x ∈ Rⁿ that satisfy a given system of linear equations and inequalities. The linear function to be maximized, or sometimes minimized, is called the objective function. It has the form cᵀx = c1x1 + c2x2 + ··· + cnxn, where c ∈ Rⁿ is a given vector.¹

The linear equations and inequalities in the linear program are called the constraints. It is customary to denote the number of constraints by m.

A linear program is often written using matrices and vectors, in a way similar to the notation Ax = b for a system of linear equations in linear algebra. To make such a notation simpler, we can replace each equation in the linear program by two opposite inequalities. For example, instead of the constraint x1 + 3x2 = 7 we can put the two constraints x1 + 3x2 ≤ 7 and x1 + 3x2 ≥ 7. Moreover, the direction of the inequalities can be reversed by changing the signs: x1 + 3x2 ≥ 7 is equivalent to −x1 − 3x2 ≤ −7. Thus we can assume that all inequality signs are "≤", say, with all variables appearing on the left-hand side. Finally, minimizing an objective function cᵀx is equivalent to maximizing −cᵀx, and hence we can always pass to a maximization problem. After such modifications, each linear program can be expressed as follows:

Maximize the value of cᵀx among all vectors x ∈ Rⁿ satisfying Ax ≤ b,

where A is a given m×n real matrix and c ∈ Rⁿ, b ∈ Rᵐ are given vectors. Here the relation "≤" holds for two vectors of equal length if and only if it holds componentwise.

Any vector x ∈ Rⁿ satisfying all constraints of a given linear program is a feasible solution. Each x* ∈ Rⁿ that gives the maximum possible value of cᵀx among all feasible x is called an optimal solution, or
optimum for short. In our linear program above we have n = 2, m = 5, and c = (1, 1). The only optimal solution is the vector (3, 2), while, for instance, (2, 3/2) is a feasible solution that is not optimal.

A linear program may in general have a single optimal solution, or infinitely many optimal solutions, or none at all. We have seen a situation with a single optimal solution in the first example of a linear program. We will present examples of the other possible situations.

¹ Here we regard the vector c as an n×1 matrix, and so the expression cᵀx is a product of a 1×n matrix and an n×1 matrix. This product, formally speaking, should be a 1×1 matrix, but we regard it as a real number. Some readers might wonder: If we consider c a column vector, why, in the example above, don't we write it as a column, or as (1, 1)ᵀ? For us, a vector is an n-tuple of numbers, and when writing an explicit vector, we separate the numbers by commas, as in c = (1, 1). Only if a vector appears in a context where one expects a matrix, that is, in a product of matrices, is it regarded as (or converted to) an n×1 matrix. However, sometimes we declare a vector to be a row vector, and then it behaves as a 1×n matrix.

If we change the vector c in the example to (1, 6), all points on the side of the polygon drawn thick in the next picture are optimal solutions:

[Figure: the same polygon, with the whole edge lying on the line x1 + 6x2 = 15 drawn thick; the arrow now shows the direction (1, 6).]

If we reverse the directions of the inequalities in the constraints x2 − x1 ≤ 1 and 4x1 − x2 ≤ 10 in our first example, we obtain a linear program that has no feasible solution, and hence no optimal solution either:

[Figure: the half-planes x2 − x1 ≥ 1 and 4x1 − x2 ≥ 10 together with the remaining constraints have empty intersection.]

Such a linear program is called infeasible.

Finally, an optimal solution need not exist even when there are feasible solutions. This happens when the objective function can attain arbitrarily large values; such a linear program is called unbounded. This is the case when we remove the constraints 4x1 − x2 ≤ 10 and x1 + 6x2 ≤ 15 from the initial example, as shown in the next picture:

[Figure: the unbounded region given by x1 ≥ 0, x2 ≥ 0, and x2 − x1 ≤ 1, in which x1 + x2 can grow without limit in the direction (1, 1).]

Let us summarize: We have seen that a linear program can have one or infinitely many optimal solutions, but it may also be unbounded or infeasible. Later we will prove that no other situations can occur.

We have solved the initial linear program graphically. It was easy, since there are only two variables. However, for a linear program with four variables we won't even be able to make a picture, let alone find an optimal solution graphically. A substantial linear program in practice often has several thousand variables, rather than two or four. A graphical illustration is useful for understanding the notions and procedures of linear programming, but as a computational method it is worthless. Sometimes it may even be misleading, since objects in high dimension may behave in a way quite different from what the intuition gained in the plane or in three-dimensional space suggests.

One of the key pieces of knowledge about linear programming that one should remember forever is this:

A linear program is efficiently solvable, both in theory and in practice.

In practice, a number of software packages are available. They can handle inputs with several thousands of variables and constraints. Linear programs with a special structure, for example, with a small number of nonzero coefficients in each constraint, can often be managed even with a much larger number of variables and constraints.
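Although the graphical method doesn't scale, handing a linear program like the one in Section 1.1 to software takes only a few lines. The following minimal sketch uses SciPy's linprog (our choice of solver, not something the text prescribes); since linprog minimizes, we negate the objective:

```python
from scipy.optimize import linprog

# Maximize x1 + x2, i.e., minimize -(x1 + x2).
c = [-1, -1]
# Inequality constraints in the solver's form A_ub @ x <= b_ub:
#   -x1 + x2 <= 1,   x1 + 6*x2 <= 15,   4*x1 - x2 <= 10
A_ub = [[-1, 1], [1, 6], [4, -1]]
b_ub = [1, 15, 10]
# The constraints x1 >= 0 and x2 >= 0 are expressed via bounds.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)     # -> approximately [3. 2.], the optimal solution
print(-res.fun)  # -> 5.0, the optimal value of x1 + x2
```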
In theory, algorithms have been developed that provably solve each linear program in time bounded by a certain polynomial function of the input size. The input size is measured as the total number of bits needed to write down all coefficients in the objective function and in all the constraints.

These two statements summarize the results of long and strenuous research, and efficient methods for linear programming are not simple.

In order that the above piece of knowledge will also make sense forever, one should not forget what a linear program is, so we repeat it once again:

A linear program is the problem of maximizing a given linear function over the set of all vectors that satisfy a given system of linear equations and inequalities. Each linear program can easily be transformed to the form: maximize cᵀx subject to Ax ≤ b.

1.2 What Can Be Found in This Book

The rest of Chapter 1 briefly discusses the history and importance of linear programming and connects it to linear algebra. For a large majority of readers, it can be expected that whenever they encounter linear programming in practice or in research, they will be using it as a "black box." From this point of view, Chapter 2 is crucial, since it describes a number of algorithmic problems that can be solved via linear programming. The closely related Chapter 3 discusses integer programming, in which one also optimizes a linear function over a set of vectors determined by linear constraints, but, moreover, the variables must attain integer values. In this context we will see how linear programming can help in approximate solutions of hard computational problems.

Chapter 4 brings basic theoretical results on linear programming and on the geometric structure of the set of all feasible solutions. Notions introduced there, such as convexity and convex polyhedra, are important in many other branches of mathematics and computer science as well. Chapter 5 covers the simplex method, which is a fundamental algorithm for linear programming. In full detail it is relatively complicated, and from the contemporary point of view, it is not necessarily the central topic in a first course on linear programming. In contrast, some traditional introductions to linear programming are focused almost solely on the simplex method. In Chapter 6 we will state and prove the duality theorem, which is one of the principal theoretical results in linear programming and an extremely useful tool for proofs. Chapter 7 deals with two other important algorithmic approaches to linear programming: the ellipsoid method and the interior point method. Both of them are rather intricate, and we omit some technical issues. Chapter 8 collects several slightly more advanced applications of linear programming from various fields, each with motivation and some background material. Chapter 9 contains remarks on software available for linear programming and on the literature.

Linear algebra is the main mathematical tool throughout the book. The required linear-algebraic notions and results are summarized in an appendix. The book concludes with a glossary of terms that are common in linear programming but do not appear in the main text. Some of them are listed to ensure that our index can compete with those of thicker books, and others appear as background material for the advanced reader.

Two levels of text. This book should serve mainly as an introductory text for undergraduate and early graduate students, and so we do not want to assume previous knowledge beyond the usual basic undergraduate courses. However, many of the key results in linear programming, which it would be a pity to omit, are not easy to prove, and
sometimes they use mathematical methods whose knowledge cannot be expected at the undergraduate level. Consequently, the text is divided into two levels. On the basic level, we are aiming at full and sufficiently detailed proofs.

The second, more advanced (and edifying) level is typographically distinguished like this. In such parts, intended chiefly for mathematically more mature readers, say graduate or PhD students, we include sketches of proofs and somewhat imprecise formulations of more advanced results. Whoever finds these passages incomprehensible may freely ignore them; the basic text should also make sense without them.

1.3 Linear Programming and Linear Algebra

The basics of linear algebra can be regarded as a theory of systems of linear equations. Linear algebra considers many other things as well, but systems of linear equations are surely one of the core subjects. A key algorithm is Gaussian elimination, which efficiently finds a solution of such a system, and even a description of the set of all solutions. Geometrically, the solution set is an affine subspace of Rⁿ, which is an important linear-algebraic notion.²

² An affine subspace is a linear subspace translated by a fixed vector x ∈ Rⁿ. For example, every point, every line, and R² itself are affine subspaces of R².

In a similar spirit, the discipline of linear programming can be regarded as a theory of systems of linear inequalities. In a linear program this is somewhat obscured by the fact that we do not look for an arbitrary solution of the given system of inequalities, but rather for a solution maximizing a given objective function. But it can be shown that finding an arbitrary feasible solution of a linear program, if one exists, is computationally almost equally difficult as finding an optimal solution.

Let us outline how one can gain an optimal solution provided that feasible solutions can be computed (a different and more elegant way will be described in Section 6.1). If we somehow know in advance that, for instance, the maximum value of the objective function in a given linear program lies between 0 and 100, we can first ask whether there exists a feasible x ∈ Rⁿ for which the objective function is at least 50. That is, we add to the existing constraints a new constraint requiring that the value of the objective function be at least 50, and we find out whether this auxiliary linear program has a feasible solution. If yes, we will further ask, by the same trick, whether the objective function can be at least 75, and if not, we will check whether it can be at least 25. A reader with computer-science-conditioned reflexes has probably already recognized the strategy of binary search, which allows us to quickly localize the maximum value of the objective function with great accuracy.
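Here is a small sketch of this bisection strategy in Python, assuming the optimum is known to lie in [lo, hi]. For simplicity we implement the feasibility test by running an LP solver with a zero objective, which would be circular as a serious algorithm, but it shows the logic of the reduction from optimization to feasibility:

```python
import numpy as np
from scipy.optimize import linprog

def max_objective_by_bisection(A, b, c, lo, hi, tol=1e-6):
    """Locate max{c@x : A@x <= b, x >= 0} by binary search,
    using only a feasibility test, as sketched in Section 1.3."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)

    def feasible(t):
        # Test feasibility of A@x <= b together with c@x >= t
        # (written as -c@x <= -t), under a zero objective.
        A_aug = np.vstack([A, -c])
        b_aug = np.append(b, -t)
        res = linprog(np.zeros(A.shape[1]), A_ub=A_aug, b_ub=b_aug)
        return res.status == 0   # status 0 means a feasible point was found

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid             # the value mid is attainable; search above
        else:
            hi = mid             # not attainable; search below
    return lo

# The linear program from Section 1.1; its optimum 5 lies between 0 and 100.
print(max_objective_by_bisection(
    A=[[-1, 1], [1, 6], [4, -1]], b=[1, 15, 10], c=[1, 1], lo=0, hi=100))
```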
Geometrically, the set of all solutions of a system of linear inequalities is an intersection of finitely many half-spaces in Rⁿ. Such a set is called a convex polyhedron, and familiar examples of convex polyhedra in R³ are a cube, a rectangular box, a tetrahedron, and a regular dodecahedron. Convex polyhedra are mathematically much more complex objects than vector subspaces or affine subspaces; we will return to this later. So actually, we can be grateful for the objective function in a linear program: It is enough to compute a single point x* ∈ Rⁿ as a solution, and we need not worry about the whole polyhedron.

In linear programming, a role comparable to that of Gaussian elimination in linear algebra is played by the simplex method. It is an algorithm for solving linear programs, usually quite efficient, and it also allows one to prove theoretical results. Let us summarize the analogies between linear algebra and linear programming in tabular form:

                      Basic problem                   Algorithm              Solution set
  Linear algebra      system of linear equations      Gaussian elimination   affine subspace
  Linear programming  system of linear inequalities   simplex method         convex polyhedron

1.4 Significance and History of Linear Programming

In a special issue of the journal Computing in Science & Engineering, the simplex method was included among the ten algorithms with the greatest influence on the development and practice of science and engineering in the 20th century.³ Although some may argue that the simplex method is only number fourteen, say, and although each such evaluation is necessarily subjective, the importance of linear programming can hardly be cast in doubt.

³ The remaining nine algorithms on this list are the Metropolis algorithm for Monte Carlo simulations, the Krylov subspace iteration methods, the decompositional approach to matrix computations, the Fortran optimizing compiler, the QR algorithm for computing eigenvalues, the Quicksort algorithm for sorting, the fast Fourier transform, the detection of integer relations, and the fast multipole method.

The simplex method was invented and developed by George Dantzig in 1947, based on his work for the U.S. Air Force. Even earlier, in 1939, Leonid Vitalyevich Kantorovich was charged with the reorganization of the timber industry in the USSR, and as a part of his task, he formulated a restricted class of linear programs and a method for their solution. As happens under such regimes, his discoveries went almost unnoticed, and nobody continued his work. Kantorovich, together with Tjalling Koopmans, received the Nobel Prize in Economics in 1975, for pioneering work in resource allocation. Somewhat ironically, Dantzig, whose contribution to linear programming is no doubt much more significant, was never awarded a Nobel Prize.

The discovery of the simplex method had a great impact on both theory and practice in economics. Linear programming was used to allocate resources, plan production, schedule workers, plan investment portfolios, and formulate marketing (and military) strategies. Even entrepreneurs and managers accustomed to relying on their experience and intuition were impressed when costs were cut by 20%, say, by a mere reorganization according to some mysterious calculation. Especially when such a feat was accomplished by someone who was not really familiar with the company, just on the basis of some numerical data. Suddenly, mathematical methods could no longer be ignored with impunity in a competitive environment.

Linear programming has evolved a great deal since the 1940s, and new types of applications have been found, by far not restricted to mathematical economics. In theoretical computer science it has become one of the fundamental tools in algorithm design. For a number of computational problems, the existence of an efficient (polynomial-time) algorithm was first established by general techniques based on linear programming. For other problems, known to be computationally difficult (NP-hard, if this term tells the reader anything), finding an exact solution is often hopeless. One looks for approximate algorithms, and linear programming is a key component of the most powerful known methods. Another surprising application of linear programming is theoretical: the duality theorem, which will be explained in Chapter 6, appears in proofs of numerous mathematical
statements, most notably in combinatorics, and it provides a unifying abstract view of many seemingly unrelated results. The duality theorem is also significant algorithmically. We will show examples of methods for constructing algorithms and proofs based on linear programming, but many other results of this kind are too advanced for a short introductory text like ours.

The theory of algorithms for linear programming itself has also grown considerably. As everybody knows, today's computers are many orders of magnitude faster than those of fifty years ago, and so it doesn't sound surprising that much larger linear programs can be solved today. But it may be surprising that this enlargement of manageable problems probably owes more to theoretical progress in algorithms than to faster computers. On the one hand, the implementation of the simplex method has been refined considerably, and on the other hand, new computational methods based on completely different ideas have been developed. This latter development will be described in Chapter 7.

2 Examples

Linear programming is a wonderful tool, but in order to use it, one first has to start suspecting that the considered computational problem might be expressible by a linear program, and then one has to really express it that way. In other words, one has to see linear programming "behind the scenes." One of the main goals of this book is to help the reader acquire skills in this direction. We believe that this is best done by studying diverse examples and by practice.

In this chapter we present several basic cases from the wide spectrum of problems amenable to linear programming methods, and we demonstrate a few tricks for reformulating problems that do not look like linear programs at first sight. Further examples are covered in Chapter 3, and Chapter 8 includes more advanced applications.

Once we have a suitable linear programming formulation (a model, in the mathematical programming parlance), we can employ general algorithms. From a programmer's point of view, this is very convenient, since it suffices to input the appropriate objective function and constraints into general-purpose software. If efficiency is a concern, this need not be the end of the story. Many problems have special features, and sometimes specialized algorithms are known, or can be constructed, that solve such problems substantially faster than a general approach based on linear programming. For example, the study of network flows, which we consider in Section 2.2, constitutes an extensive subfield of theoretical computer science, and fairly efficient algorithms have been developed. Computing a maximum flow via linear programming is thus not the best approach for large-scale instances. However, even for problems where linear programming doesn't ultimately yield the most efficient available algorithm, starting with a linear programming formulation makes sense for fast prototyping, case studies, and deciding whether developing problem-specific software is worth the effort.

2.1 Optimized Diet: Wholesome and Cheap?

...and when Rabbit said, "Honey or condensed milk with your bread?" he was so excited that he said, "Both," and then, so as not to seem greedy, he added, "But don't bother about the bread, please."
A. A. Milne, Winnie-the-Pooh

The Office of Nutrition Inspection of the EU recently found out that dishes served at the dining and beverage facility Bullneck's, such as herring, hot dogs, and house-style hamburgers, do not comport with the new nutritional regulations, and its report mentioned explicitly the lack of
vitamins A and C and dietary fiber. The owner and operator of the aforementioned facility is attempting to rectify these shortcomings by augmenting the menu with vegetable side dishes, which he intends to create from white cabbage, carrots, and a stockpile of pickled cucumbers discovered in the cellar. The following table summarizes the numerical data: the prescribed amount of the vitamins and fiber per dish, their content in the foods, and the unit prices of the foods.¹

  Food                   Carrot (raw)  White cabbage (raw)  Cucumber (pickled)  Required per dish
  Vitamin A (mg/kg)      35            0.5                  0.5                 0.5 mg
  Vitamin C (mg/kg)      60            300                  10                  15 mg
  Dietary fiber (g/kg)   30            20                   10                  4 g
  Price (€/kg)           0.75          0.5                  0.15*

  * Residual accounting price of the inventory (most likely unsaleable).

At what minimum additional price per dish can the requirements of the Office of Nutrition Inspection be satisfied? This question can be expressed by the following linear program:

Minimize 0.75x1 + 0.5x2 + 0.15x3
subject to
  x1 ≥ 0, x2 ≥ 0, x3 ≥ 0
  35x1 + 0.5x2 + 0.5x3 ≥ 0.5
  60x1 + 300x2 + 10x3 ≥ 15
  30x1 + 20x2 + 10x3 ≥ 4.

The variable x1 specifies the amount of carrot (in kg) to be added to each dish, and similarly for x2 (cabbage) and x3 (cucumber). The objective function expresses the price of the combination. The amounts of carrot, cabbage, and cucumber are always nonnegative, which is captured by the conditions x1 ≥ 0, x2 ≥ 0, x3 ≥ 0; if we didn't include them, an optimal solution might perhaps have the amount of carrot, say, negative, by which one would seemingly save money. Finally, the inequalities in the last three lines force the requirements on vitamins A and C and on dietary fiber.

¹ For those interested in healthy diet: The vitamin contents and other data are more or less realistic.
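For the record, here is how this little program can be fed to a solver; a minimal sketch using SciPy's linprog (an arbitrary choice of tool). The "at least" constraints are multiplied by −1 to fit the solver's A_ub x ≤ b_ub convention:

```python
from scipy.optimize import linprog

# Variables: kilograms of carrot, cabbage, and pickled cucumber per dish.
cost = [0.75, 0.5, 0.15]                  # prices in euros per kg
# Requirements, written as -content @ x <= -required:
A_ub = [[-35, -0.5, -0.5],                # vitamin A (mg/kg), at least 0.5 mg
        [-60, -300, -10],                 # vitamin C (mg/kg), at least 15 mg
        [-30, -20, -10]]                  # dietary fiber (g/kg), at least 4 g
b_ub = [-0.5, -15, -4]
res = linprog(cost, A_ub=A_ub, b_ub=b_ub)  # bounds default to x >= 0
print(res.x)    # -> roughly [0.0095, 0.038, 0.295] kg
print(res.fun)  # -> roughly 0.07 euros per dish
```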
the future will recommend hamburgers as the main ingredient of a healthy diet will almost surely be disappointed Even if it were known perfectly few people want and can formulate exactly what they expect from their dietapparently it is much easier to formulate such requirements for the diet of someone else Moreover there are nonlinear dependencies among the effects of various nutrients and so the dietary problem can never be captured perfectly by linear programming There are many applications of linear programming in industry agricul ture services etc that from an abstract point of view are variations of the diet problem and do not introduce substantially new mathematical tricks It may still be challenging to design good models for reallife problems of this kind but the challenges are not mathematical We will not dwell on 14 2 Examples such problems here many examples can be found in Chvatals book cited in Chapter 9 and we will present problems in which the use of linear program ming has different flavors 22 Flow in a Network An administrator of a computer network convinced his employer to purchase a new computer with an improved sound system He wants to transfer his music collection from an old computer to the new one using a local network The network looks like this 3 1 1 4 1 4 3 4 c e b a d 1 1 o n What is the maximum transfer rate from computer o old to computer n new The numbers near each data link specify the maximum transfer rate of that link in Mbits say We assume that each link can transfer data in either direction but not in both directions simultaneously So for example through the link ab one can either send data from a to b at any rate from 0 up to 1 Mbits or send data from b to a at any rate from 0 to 1 Mbits The nodes a b e are not suitable for storing substantial amounts of data and hence all data entering them has to be sent further immediately From this we can already see that the maximum transfer rate cannot be used on all links simultaneously consider node a for example Thus we have to find an appropriate value of the data flow for each link so that the total transfer rate from o to n is maximum For every link in the network we introduce one variable For example xbe specifies the rate by which data is transfered from b to e Here xbe can also be negative which means that data flow in the opposite direction from e to b And we thus do not introduce another variable xeb which would correspond to the transfer rate from e to b There are 10 variables xoa xob xoc xab xad xbe xcd xce xdn and xen We set up the following linear program 22 Flow in a Network 15 Maximize xoa xob xoc subject to 3 xoa 3 1 xob 1 1 xoc 1 1 xab 1 1 xad 1 3 xbe 3 4 xcd 4 4 xce 4 4 xdn 4 1 xen 1 xoa xab xad xob xab xbe xoc xcd xce xad xcd xdn xbe xce xen The objective function xoa xob xoc expresses the total rate by which data is sent out from computer o Since it is neither stored nor lost hopefully anywhere it has to be received at n at the same rate The next 10 constraints 3 xoa 3 through 1 xen 1 restrict the transfer rates along the individual links The remaining constraints say that whatever enters each of the nodes a through e has to leave immediately The optimal solution of this linear program is depicted below 2 1 1 1 1 2 2 3 c e b a d 1 1 o n The number near each link is the transfer rate on that link and the arrow determines the direction of the data flow Note that between c and e data has to be sent in the direction from e to c and hence xce 1 The optimum value of the objective function is 4 and 
this is the desired maximum transfer rate In this example it is easy to see that the transfer rate cannot be larger since the total capacity of all links connecting the computers o and a to the rest of the network equals 4 This is a special case of a remarkable theorem on maximum flow and minimum cut which is usually discussed in courses on graph algorithms see also Section 82 Our example of data flow in a network is small and simple In practice however flows are considered in intricate networks sometimes even with many source nodes and sink nodes These can be electrical networks current flows road or railroad networks cars or trains flow telephone networks voice or data signals flow financial money flows and so on There are also many lessobvious applications of network flowsfor example in image processing 16 2 Examples Historically the network flow problem was first formulated by American military experts in search of efficient ways of disrupting the railway system of the Soviet block see A Schrijver On the history of the transportation and max imum flow problems Math Programming Ser B 912002 437445 23 Ice Cream All Year Round The next application of linear programming again concerns food which should not be surprising given the importance of food in life and the diffi culties in optimizing sleep or love The ice cream manufacturer Icicle Works Ltd2 needs to set up a production plan for the next year Based on history extensive surveys and bird observations the marketing department has come up with the following prediction of monthly sales of ice cream in the next year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 100 200 300 400 500 600 700 sales tons Now Icicle Works Ltd needs to set up a production schedule to meet these demands A simple solution would be to produce just in time meaning that all the ice cream needed in month i is also produced in month i i 1 2 12 However this means that the produced amount would vary greatly from month to month and a change in the produced amount has significant costs Temporary workers have to be hired or laid off machines have to be adjusted 2 Not to be confused with a rock group of the same name The name comes from a nice science fiction story by Frederik Pohl and so on So it would be better to spread the production more evenly over the year In months with low demand the idle capacities of the factory could be used to build up a stock of ice cream for the months with high demand So another simple solution might be a completely flat production schedule with the same amount produced every month Some thought reveals that such a schedule need not be feasible if we want to end up with zero surplus at the end of the year But even if it is feasible it need not be ideal either since storing ice cream incurs a nontrivial cost It seems likely that the best production schedule should be somewhere between these two extremes production following demand and constant production We want a compromise minimizing the total cost resulting both from changes in production and from storage of surpluses To formalize this problem let us denote the demand in month i by di geq 0 in tons Then we introduce a nonnegative variable xi for the production in month i and another nonnegative variable si for the total surplus in store at the end of month i To meet the demand in month i we may use the production in month i and the surplus at the end of month i 1 xi si1 geq di for i 1 2 12 The quantity xi si1 di is exactly the surplus after month i and thus we have xi si1 si di for i 1 2 
12 Assuming that initially there is no surplus we set s0 0 if we took the production history into account s0 would be the surplus at the end of the previous year We also set s12 0 unless we want to plan for another year Among all nonnegative solutions to these equations we are looking for one that minimizes the total cost Let us assume that changing the production by 1 ton from month i 1 to month i costs 50 and that storage facilities for 1 ton of ice cream cost 20 per month Then the total cost is expressed by the function 50 sumi112 xi xi1 20 sumi112 si where we set x0 0 again history can easily be taken into account Unfortunately this cost function is not linear Fortunately there is a simple but important trick that allows us to make it linear at the price of introducing extra variables The change in production is either an increase or a decrease Let us introduce a nonnegative variable yi for the increase from month i 1 to month i and a nonnegative variable zi for the decrease Then xi xi1 yi zi and xi xi1 yi zi A production schedule of minimum total cost is given by an optimal solution of the following linear program Minimize 50 sumi112 yi 50 sumi112 zi 20 sumi112 si subject to xi si1 si di for i 1 2 12 xi xi1 yi zi for i 1 2 12 x0 0 s0 0 s12 0 xi si yi zi geq 0 for i 1 2 12 To see that an optimal solution s y z of this linear program indeed defines a schedule we need to note that one of yi and zi has to be zero for all i for otherwise we could decrease both and obtain a better solution This means that yi zi indeed equals the change in production from month i 1 to month i as required In the Icicle Works example above this linear program yields the following production schedule shown with black bars the gray background graph represents the demands Below is the schedule we would get with zero storage costs that is after replacing the 20 by 0 in the above linear program 24 Fitting a Line 19 The pattern of this example is quite general and many problems of opti mal control can be solved via linear programming in a similar manner A neat example is Moon Rocket Landing a oncepopular game for programmable calculators probably not sophisticated enough to survive in todays compe tition A lunar module with limited fuel supply is descending vertically to the lunar surface under the influence of gravitation and at chosen time inter vals it can flash its rockets to slow down the descent or even to start flying upward The goal is to land on the surface with almost zero speed before exhausting all of the fuel The reader is invited to formulate an appropriate linear program for determining the minimum amount of fuel necessary for landing given the appropriate input data For the linear programming for mulation we have to discretize time first in the game this was done anyway but with short enough time steps this doesnt make a difference in practice Let us remark that this particular problem can be solved analytically with some calculus or even mathematical control theory But in even slightly more complicated situations an analytic solution is out of reach 24 Fitting a Line The loudness level of nightingale singing was measured every evening for a number of days in a row and the percentage of people watching the principal TV news was surveyed by questionnaires The following diagram plots the measured values by points in the plane loudness level dB TV watchers 60 50 40 20 30 40 50 The simplest dependencies are linear and many dependencies can be well approximated by a linear function We thus want to find a 
line that best fits the measured points Readers feeling that this example is not sufficiently realistic can recall some measurements in physics labs where the measured quantities should actually obey an exact linear dependence How can one formulate mathematically that a given line best fits the points There is no unique way and several different criteria are commonly used for line fitting in practice The most popular one is the method of least squares which for given points x1y1 xnyn seeks a line with equation y ax b minimizing the expression i1n axi b yi2 In words for every point we take its vertical distance from the line square it and sum these squares of errors This method need not always be the most suitable For instance if a few exceptional points are measured with very large error they can influence the resulting line a great deal An alternative method less sensitive to a small number of outliers is to minimize the sum of absolute values of all errors i1n axi b yi By a trick similar to the one we have seen in Section 23 this apparently nonlinear optimization problem can be captured by a linear program Minimize e1 e2 en subject to ei axi b yi for i 12n ei axi b yi for i 12n The variables are a b and e1e2en while x1xn and y1yn are given numbers Each ei is an auxiliary variable standing for the error at the ith point The constraints guarantee that ei maxaxi b yi axi b yi axi b yi In an optimal solution each of these inequalities has to be satisfied with equality for otherwise we could decrease the corresponding ei Thus an optimal solution yields a line minimizing the expression 22 The following picture shows a line fitted by this method solid and a line fitted using least squares dotted 25 Separation of Points 21 In conclusion let us recall the useful trick we have learned here and in the previous section Objective functions or constraints involving absolute values can often be handled via linear programming by introducing extra variables or extra constraints 25 Separation of Points A computercontrolled rabbit trap Gromit RT 21 should be programmed so that it catches rabbits but if a weasel wanders in it is released The trap can weigh the animal inside and also can determine the area of its shadow These two parameters were collected for a number of specimens of rabbits and weasels as depicted in the following graph weight shadow area empty circles represent rabbits and full circles weasels Apparently neither weight alone nor shadow area alone can be used to tell a rabbit from a weasel One of the nextsimplest things would be a lin ear criterion distinguishing them That is geometrically we would like to separate the black points from the white points by a straight line if possi ble Mathematically speaking we are given m white points p1 p2 pm 22 2 Examples and n black points q1 q2 qn in the plane and we would like to find out whether there exists a line having all white points on one side and all black points on the other side none of the points should lie on the line In a solution of this problem by linear programming we distinguish three cases First we test whether there exists a vertical line with the required prop erty This case needs neither linear programming nor particular cleverness The next case is the existence of a line that is not vertical and that has all black points below it and all white points above it Let us write the equation of such a line as y ax b where a and b are some yet unknown real numbers A point r with coordinates xr and yr lies above this line if yr axr b and 
it lies below it if yr axr b So a suitable line exists if and only if the following system of inequalities with variables a and b has a solution ypi axpi b for i 1 2 m yqj axqj b for j 1 2 n We havent yet mentioned strict inequalities in connection with linear programming and actually they are not allowed in linear programs But here we can get around this issue by a small trick We introduce a new variable δ which stands for the gap between the left and right sides of each strict inequality Then we try to make the gap as large as possible Maximize δ subject to ypi axpi b δ for i 1 2 m yqj axqj b δ for j 1 2 n δ δ y ax b δ y ax b y ax b δ This linear program has three variables a b and δ The optimal δ is positive exactly if the preceding system of strict inequalities has a solution and the latter happens exactly if a nonvertical line exists with all black points below and all white points above 26 Largest Disk in a Convex Polygon 23 Similarly we can deal with the third case namely the existence of a non vertical line having all black points above it and all white points below it This completes the description of an algorithm for the line separation problem A plane separating two point sets in R3 can be computed by the same approach and we can also solve the analogous problem in higher dimensions So we could try to distinguish rabbits from weasels based on more than two measured parameters Here is another perhaps more surprising extension Let us imagine that separating rabbits from weasels by a straight line proved impos sible Then we could try for instance separating them by a graph of a quadratic function a parabola of the form ax2 bx c So given m white points p1 p2 pm and n black points q1 q2 qn in the plane we now ask are there coefficients a b c R such that the graph of fx ax2 bxc has all white points above it and all black points below This leads to the inequality system ypi axpi2 bxpi c for i 1 2 m yqj axqj2 bxqj c for j 1 2 n By introducing a gap variable δ as before this can be written as the following linear program in the variables a b c and δ Maximize δ subject to ypi axpi2 bxpi c δ for i 1 2 m yqj axqj2 bxqj c δ for j 1 2 n In this linear program the quadratic terms are coefficients and there fore they cause no harm The same approach also allows us to test whether two point sets in the plane or in higher dimensions can be separated by a function of the form fx a1ϕ1x a2ϕ2x akϕkx where ϕ1 ϕk are given functions possibly nonlinear and a1 a2 ak are real co efficients in the sense that fpi 0 for every white point pi and fqj 0 for every black point qj 26 Largest Disk in a Convex Polygon Here we will encounter another problem that may look nonlinear at first sight but can be transformed to a linear program It is a simple instance of a geometric packing problem Given a container in our case a convex polygon we want to fit as large an object as possible into it in our case a disk of the largest possible radius Let us call the given convex polygon P and let us assume that it has n sides As we said we want to find the largest circular disk contained in P For simplicity let us assume that none of the sides of P is vertical Let the ith side of P lie on a line ℓi with equation y ai x bi i 12n and let us choose the numbering of the sides in such a way that the first second up to the kth side bound P from below while the k1st through nth side bound it from above Let us now ask under what conditions does a circle with center s s1 s2 and radius r lie completely inside P This is the case if and 
only if the point s has distance at least r from each of the lines ℓ1ℓn lies above the lines ℓ1ℓk and lies below the lines ℓk1ℓn We compute the distance of s from ℓi A simple calculation using similarity of triangles and the Pythagorean theorem shows that this distance equals the absolute value of the expression s2 ai s1 biai2 1 Moreover the expression is positive if s lies above ℓi and it is negative if s lies below ℓi The disk of radius r centered at s thus lies inside P exactly if the following system of inequalities is satisfied s2 ai s1 biai2 1 r i 12k s2 ai s1 biai2 1 r i k1 k2n Therefore we want to find the largest r such that there exist s1 and s2 so that all the constraints are satisfied This yields a linear program Some might be frightened by the square roots but these can be computed in advance since all the ai are concrete numbers Maximize r subject to s2 ai s1 biai2 1 r for i 12k s2 ai s1 biai2 1 r for i k1 k2n There are three variables s1 s2 and r An optimal solution yields the desired largest disk contained in P A similar problem in higher dimension can be solved analogously For example in threedimensional space we can ask for the largest ball that can be placed into the intersection of n given halfspaces Interestingly another similarlooking problem namely finding the smallest disk containing a given convex ngon in the plane cannot be expressed by a linear program and has to be solved differently see Section 87 Both in practice and in theory one usually encounters geometric packing problems that are more complicated than the one considered in this section and not so easily solved by linear programming Often we have a fixed collection of objects and we want to pack as many of them as possible into a given container or several containers Such problems are encountered by confectioners when cutting cookies from a piece of dough by tailors or clothing manufacturers when making as many trousers say as possible from a large piece of cloth and so on Typically these problems are computationally hard but linear programming can sometimes help in devising heuristics or approximate algorithms 27 Cutting Paper Rolls Here we have another industrial problem and the application of linear programming is quite nonobvious Moreover we will naturally encounter an integrality constraint which will bring us to the topic of the next chapter A paper mill manufactures rolls of paper of a standard width 3 meters But customers want to buy paper rolls of shorter width and the mill has to cut such rolls from the 3 m rolls One 3 m roll can be cut for instance into two rolls 93 cm wide one roll of width 108 cm and a rest of 6 cm which goes to waste Let us consider an order of 97 rolls of width 135 cm 610 rolls of width 108 cm 395 rolls of width 93 cm and 211 rolls of width 42 cm What is the smallest number of 3 m rolls that have to be cut in order to satisfy this order and how should they be cut In order to engage linear programming one has to be generous in introducing variables We write down all of the requested widths 135 cm 108 cm 93 cm and 42 cm Then we list all possibilities of cutting a 3 m paper roll into rolls of some of these widths we need to consider only possibilities for which the wasted piece is shorter than 42 cm P1 2 135 P7 108 93 2 42 P2 135 108 42 P8 108 4 42 P3 135 93 42 P9 3 93 P4 135 3 42 P10 2 93 2 42 P5 2 108 2 42 P11 93 4 42 P6 108 2 93 P12 7 42 For each possibility Pj on the list we introduce a variable xj 0 representing the number of rolls cut according to that possibility We want 
to minimize the total number of rolls cut ie j112 xj in such a way that the customers are satisfied For example to satisfy the demand for 395 rolls of width 93 cm we require x3 2x6 x7 3x9 2x10 x11 395 27 Cutting Paper Rolls 27 For each of the widths we obtain one constraint For a more complicated order the list of possibilities would most likely be produced by computer We would be in a quite typical situ ation in which a linear program is not entered by hand but rather is generated by some computer program Moreadvanced techniques even generate the possibilities on the fly during the solution of the linear program which may save time and memory considerably See the entry column generation in the glossary or Chvatals book cited in Chapter 9 from which this example is taken The optimal solution of the resulting linear program has x1 485 x5 20625 x6 1975 and all other components 0 In order to cut 485 rolls according to the possibility P1 one has to unwind half of a roll Here we need more information about the technical possibilities of the paper mill Is cutting a fraction of a roll technically and economically feasible If yes we have solved the problem optimally If not we have to work further and somehow take into account the restriction that only feasible solutions of the linear program with integral xi are of interest This is not at all easy in general and it is the subject of Chapter 3 3 Integer Programming and LP Relaxation 31 Integer Programming In Section 27 we encountered a situation in which among all feasible so lutions of a linear program only those with all components integral are of interest in the practical application A similar situation occurs quite often in attempts to apply linear programming because objects that can be split into arbitrary fractions are more an exception than the rule When hiring workers scheduling buses or cutting paper rolls one somehow has to deal with the fact that workers buses and paper rolls occur only in integral quantities Sometimes an optimal or almostoptimal integral solution can be obtained by simply rounding the components of an optimal solution of the linear pro gram to integers either up or down or to the nearest integer In our paper cutting example from Section 27 it is natural to round up since we have to fulfill the order Starting from the optimal solution x1 485 x5 20625 x6 1975 of the linear program we thus arrive at the integral solution x1 49 x5 207 and x6 198 which means cutting 454 rolls Since we have found an optimum of the linear program we know that no solution whatsoever even one with fractional amounts of rolls allowed can do better than cutting 4525 rolls If we insist on cutting an integral number of rolls we can thus be sure that at least 453 rolls must be cut So the solution obtained by rounding is quite good However it turns out that we can do slightly better The integral solution x1 49 x5 207 x6 196 and x9 1 with all other components 0 requires cutting only 453 rolls By the above considerations no integral solution can do better In general the gap between a rounded solution and an optimal integral solution can be much larger If the linear program specifies that for most of 197 bus lines connecting villages it is best to schedule something between 01 and 03 buses then clearly rounding to integers exerts a truly radical influence The problem of cutting paper rolls actually leads to a problem with a lin ear objective function and linear constraints equations and inequalities but the variables are allowed to attain only integer 
The optimal solution of the resulting linear program has $x_1 = 48.5$, $x_5 = 206.25$, $x_6 = 197.5$, and all other components 0. In order to cut 48.5 rolls according to the possibility $P_1$, one has to unwind half of a roll. Here we need more information about the technical possibilities of the paper mill: is cutting a fraction of a roll technically and economically feasible? If yes, we have solved the problem optimally. If not, we have to work further and somehow take into account the restriction that only feasible solutions of the linear program with integral $x_i$ are of interest. This is not at all easy in general, and it is the subject of Chapter 3.

3 Integer Programming and LP Relaxation

3.1 Integer Programming

In Section 2.7 we encountered a situation in which, among all feasible solutions of a linear program, only those with all components integral are of interest in the practical application. A similar situation occurs quite often in attempts to apply linear programming, because objects that can be split into arbitrary fractions are more an exception than the rule: when hiring workers, scheduling buses, or cutting paper rolls, one somehow has to deal with the fact that workers, buses, and paper rolls occur only in integral quantities.

Sometimes an optimal or almost-optimal integral solution can be obtained by simply rounding the components of an optimal solution of the linear program to integers, either up or down or to the nearest integer. In our paper-cutting example from Section 2.7 it is natural to round up, since we have to fulfill the order. Starting from the optimal solution $x_1 = 48.5$, $x_5 = 206.25$, $x_6 = 197.5$ of the linear program, we thus arrive at the integral solution $x_1 = 49$, $x_5 = 207$, and $x_6 = 198$, which means cutting 454 rolls. Since we have found an optimum of the linear program, we know that no solution whatsoever, even one with fractional amounts of rolls allowed, can do better than cutting 452.25 rolls. If we insist on cutting an integral number of rolls, we can thus be sure that at least 453 rolls must be cut. So the solution obtained by rounding is quite good. However, it turns out that we can do slightly better: the integral solution $x_1 = 49$, $x_5 = 207$, $x_6 = 196$, and $x_9 = 1$, with all other components 0, requires cutting only 453 rolls. By the above considerations, no integral solution can do better.

In general, the gap between a rounded solution and an optimal integral solution can be much larger. If the linear program specifies that for most of 197 bus lines connecting villages it is best to schedule something between 0.1 and 0.3 buses, then clearly rounding to integers exerts a truly radical influence.

The problem of cutting paper rolls actually leads to a problem with a linear objective function and linear constraints (equations and inequalities), but the variables are allowed to attain only integer values. Such an optimization problem is called an integer program, and after a small adjustment we can write it in a way similar to that used for a linear program in Chapter 1.

An integer program:
Maximize $c^Tx$
subject to $Ax \le b$ and $x \in \mathbb{Z}^n$.

Here $A$ is an $m \times n$ matrix, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$; $\mathbb{Z}$ denotes the set of all integers, and $\mathbb{Z}^n$ is the set of all $n$-component integer vectors.

The set of all feasible solutions of an integer program is no longer a convex polyhedron, as was the case for linear programming, but rather it consists of separate integer points. The following picture illustrates a two-dimensional integer program with five constraints:

[Figure: a two-dimensional integer program with five constraints; feasible solutions are shown as solid dots, and the optimal solution is marked by a circle.]

Note that the optimal solution lies quite far from the optimum of the linear program with the same five constraints and the same objective function.

It is known that solving a general integer program is computationally difficult (more exactly, it is an NP-hard problem), in contrast to solving a linear program. Linear programs with many thousands of variables and constraints can be handled in practice, but there are integer programs with 10 variables and 10 constraints that are insurmountable even for the most modern computers and software. Adding the integrality constraints can thus change the difficulty of a problem in a drastic way indeed. This may not look so surprising anymore if we realize that integer programs can model yes/no decisions: an integer variable $x_j$ satisfying the linear constraints $0 \le x_j \le 1$ has possible values only 0 ("no") and 1 ("yes"). For those familiar with the foundations of NP-completeness it is thus not hard to model the problem of satisfiability of logical formulas by an integer program. In Section 3.4 we will see how an integer program can express the maximum size of an independent set in a given graph, which is also one of the basic NP-hard problems.

Several techniques have been developed for solving integer programs; in the literature some of them can be found under the headings "cutting planes," "branch and bound," and "branch and cut" (see the glossary). The most successful strategies usually employ linear programming as a subroutine for solving certain auxiliary problems. How to do this efficiently is investigated in a branch of mathematics called polyhedral combinatorics. The most widespread use of linear programming today, and the one that consumes the largest share of computer time, is most likely in auxiliary computations for integer programs.

Let us remark that there are many optimization problems in which some of the variables are integral while others may attain arbitrary real values; then one speaks of mixed integer programming. This is in all likelihood the most frequent type of optimization problem in practice.

We will demonstrate several important optimization problems that can easily be formulated as integer programs, and we will show how linear programming can (or cannot) be used in their solution. But it will be only a small sample from this area, which has recently developed extensively and which uses many complicated techniques and clever tricks.
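As a small illustration of these notions (ours, not from the book), the following sketch solves the LP relaxation of a made-up two-variable integer program and compares it with the integer optimum found by brute force; already in this tiny example the two optima differ.

```python
# A toy integer program (hypothetical data, for illustration only):
# maximize x1 + 2*x2 subject to -4*x1 + 6*x2 <= 9, x1 + x2 <= 4,
# x1, x2 >= 0 and integral.
import itertools
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[-4.0, 6.0], [1.0, 1.0]])
b = np.array([9.0, 4.0])

# LP relaxation: drop integrality; linprog minimizes, hence -c.
lp = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print("LP optimum:", lp.x, "value", -lp.fun)   # (1.5, 2.5), value 6.5

# Brute force over integer points, feasible only for tiny examples.
best = max((p for p in itertools.product(range(5), repeat=2)
            if np.all(A @ p <= b)), key=lambda p: c @ p)
print("IP optimum:", best, "value", c @ np.array(best))  # (2, 2), value 6
```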
3.2 Maximum-Weight Matching

A consulting company underwent a thorough reorganization in order to adapt to current trends, in which the department of Creative Accounting, with 7 employees, was closed down. But, flexibly enough, seven new positions have been created. The human resources manager, in order to assign the new positions to the seven employees, conducted interviews with them and gave them extensive questionnaires to fill out. Then he summarized the results in scores: each employee got a score between 0 and 100 for each of the positions she or he was willing to accept. The manager depicted this information in a diagram, in which an expert can immediately recognize a bipartite graph:

[Figure: a bipartite graph connecting the employees Amos, Boris, Colette, Devdatt, Eleanor, Fortunato, and Gudrun to the positions quality manager, representative appearance, secretary, trend analyst, undersecretary, vacation specialist, and webmaster; the edges are labeled with scores.]

For example, this diagram tells us that Boris is willing to accept the job in quality management, for which he achieved a score of 87, or the job of a trend analyst, for which he has score 70. Now the manager wants to select a position for everyone so that the sum of scores is maximized.

The first idea naturally coming to mind is to give everyone the position for which he/she has the largest score. But this cannot be done, since, for example, three people are best suited for the profession of webmaster: Eleanor, Gudrun, and Devdatt. If we try to assign the positions by a greedy algorithm, meaning that in each step we make an assignment of the largest possible score between a yet unoccupied position and a still unassigned employee, we end up filling only 6 positions:

[Figure: the greedy assignment in the same bipartite graph.]

The bold digits 1 through 6 indicate the order of making assignments by the greedy algorithm.

In the language of graph theory, we have a bipartite graph with vertex set $V = X \cup Y$ and edge set $E$; each edge connects a vertex of $X$ to a vertex of $Y$. Moreover, we have $|X| = |Y|$. For each edge $e \in E$ we are given a nonnegative weight $w_e$. We want to find a subset $M \subseteq E$ of edges such that each vertex of both $X$ and $Y$ is incident to exactly one edge of $M$ (such an $M$ is called a perfect matching) and the sum $\sum_{e \in M} w_e$ is the largest possible.

In order to formulate this problem as an integer program, we introduce variables $x_e$, one for each edge $e \in E$, that can attain values 0 or 1. They will encode the sought-after set $M$: $x_e = 1$ means $e \in M$, and $x_e = 0$ means $e \notin M$. Then $\sum_{e \in M} w_e$ can be written as $\sum_{e \in E} w_e x_e$, and this is the objective function. The requirement that a vertex $v \in V$ have exactly one incident edge of $M$ is expressed by having the sum of $x_e$ over all edges incident to $v$ equal to 1; in symbols, $\sum_{e \in E:\, v \in e} x_e = 1$. The resulting integer program is

maximize $\sum_{e \in E} w_e x_e$
subject to $\sum_{e \in E:\, v \in e} x_e = 1$ for each vertex $v \in V$, and
$x_e \in \{0, 1\}$ for each edge $e \in E$.    (3.1)

If we leave out the integrality conditions, i.e., if we allow each $x_e$ to attain all values in the interval $[0, 1]$, we obtain the following linear program:

maximize $\sum_{e \in E} w_e x_e$
subject to $\sum_{e \in E:\, v \in e} x_e = 1$ for each vertex $v \in V$, and
$0 \le x_e \le 1$ for each edge $e \in E$.

It is called an LP relaxation of the integer program (3.1): we have "relaxed" the constraints $x_e \in \{0, 1\}$ to the weaker constraints $0 \le x_e \le 1$.
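The following sketch (ours, not from the book) solves such an LP relaxation with SciPy for a small complete bipartite graph; the score matrix is made up and is not the one from the book's figure. As the text explains next, a simplex-based solver returns a 0/1 solution here, so we request SciPy's dual-simplex backend.

```python
# LP relaxation of the perfect-matching integer program (3.1) on a small
# complete bipartite graph with hypothetical scores.
import numpy as np
from scipy.optimize import linprog

w = np.array([[81., 60., 48.],    # w[i][j] = score of person i for job j
              [87., 70., 26.],
              [74., 57., 95.]])
n = w.shape[0]

# One variable x[i, j] per edge, flattened row by row.
A_eq = []
for i in range(n):                # each person gets exactly one job
    row = np.zeros(n * n); row[i*n:(i+1)*n] = 1; A_eq.append(row)
for j in range(n):                # each job gets exactly one person
    row = np.zeros(n * n); row[j::n] = 1; A_eq.append(row)

res = linprog(-w.ravel(), A_eq=np.array(A_eq), b_eq=np.ones(2 * n),
              bounds=[(0, 1)] * (n * n), method="highs-ds")
print(res.x.reshape(n, n))        # a 0/1 matrix: an optimal perfect matching
print("total score:", -res.fun)
```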
We can solve the LP relaxation, say, by the simplex method, and either we obtain an optimal solution $x^*$, or we learn that the LP relaxation is infeasible. In the latter case the original integer program must be infeasible as well, and consequently, there is no perfect matching.

Let us now assume that the LP relaxation has an optimal solution $x^*$. What can such an $x^*$ be good for? Certainly it provides an upper bound on the best possible solution of the original integer program (3.1). More precisely, the optimum of the objective function in the integer program (3.1) is bounded above by the value of the objective function at $x^*$. This is because every feasible solution of the integer program is also a feasible solution of the LP relaxation, and so we are maximizing over a larger set of vectors in the LP relaxation.

An upper bound can be very valuable. For example, if we manage to find a feasible solution of the integer program for which the value of the objective function is 98% of the upper bound, we can usually stop striving, since we know that we cannot improve by more than roughly 2%, no matter how hard we try. (Of course, it depends on what we are dealing with; if it is a state budget, even 2% is still worth some effort.)

A pleasant surprise awaits us in the particular problem we are considering here: the LP relaxation not only yields an upper bound, but it provides an optimal solution of the original problem. Namely, if we solve the LP relaxation by the simplex method, we obtain an optimal $x^*$ that has all components equal to 0 or 1, and thus it determines an optimal perfect matching. (If a better perfect matching existed, it would determine a better solution of the LP relaxation.) The optimal solution discovered in this way is drawn below:

[Figure: the optimal assignment found by solving the LP relaxation, drawn in the same bipartite graph.]

The following nontrivial theorem shows that things work this nicely for every problem of the considered type.

3.2.1 Theorem. Let $G = (V, E)$ be an arbitrary bipartite graph with real edge weights $w_e$. If the LP relaxation of the integer program (3.1) has at least one feasible solution, then it has at least one integral optimal solution. This is an optimal solution for the integer program (3.1) as well.

An interested reader can find a proof at the end of this section. The theorem doesn't say that every optimal solution of the LP relaxation is necessarily integral; however, the proof gives an easy recipe for producing an integral optimal solution from an arbitrary optimal solution. Moreover, it can be shown (see Section 8.2) that the simplex method always returns an integral optimal solution; using the terminology of Chapter 4, each basic feasible solution is integral.

An LP relaxation can be considered for an arbitrary integer program: the condition $x \in \mathbb{Z}^n$ is simply replaced by $x \in \mathbb{R}^n$. Cases in which we always get an optimal solution of the integer program from the LP relaxation, such as in Theorem 3.2.1, are rather rare, but LP relaxation can also be useful in other ways.

- The maximum of the objective function for the LP relaxation always provides an upper bound for the maximum of the integer program. Sometimes this bound is quite tight, but at other times it can be very bad (see Section 3.4). The quality of the upper bound from an LP relaxation has been studied for many types of problems.

- Sometimes one adds new linear constraints to the LP relaxation that are satisfied by all integral solutions (that is, all feasible solutions of the integer program satisfy these new constraints) but that exclude some of the nonintegral solutions. In this way the upper bound on the optimum of the integer program can often be greatly improved. If we continue adding suitable new constraints for long enough, then we even arrive at an optimal solution of the integer program. This is the main idea of the method of cutting planes.

- A nonintegral optimal solution of the LP relaxation can sometimes be converted to an approximately optimal solution of the integer program by appropriate rounding. We will see a simple example in Section 3.3 and a more advanced one in Section 8.3.

Proof of Theorem 3.2.1. Let $x^*$ be an optimal solution of the LP relaxation, and let $w(x^*) = \sum_{e \in E} w_e x^*_e$ be the value of the objective function at $x^*$. Let us denote the number of nonintegral components of the vector $x^*$ by $k(x^*)$.
If $k(x^*) = 0$, then we are done. For $k(x^*) > 0$ we describe a procedure that yields another optimal solution $\tilde{x}$ with $k(\tilde{x}) < k(x^*)$. We reach an integral optimal solution by finitely many repetitions of this procedure.

Let $x^*_{e_1}$ be a nonintegral component of the vector $x^*$, corresponding to some edge $e_1 = \{a_1, b_1\}$. Since $0 < x^*_{e_1} < 1$ and $\sum_{e \in E:\, b_1 \in e} x^*_e = 1$, there exists another edge $e_2 = \{a_2, b_1\}$, $a_2 \ne a_1$, with $x^*_{e_2}$ nonintegral. For a similar reason we can also find a third edge $e_3 = \{a_2, b_2\}$ with $0 < x^*_{e_3} < 1$. We continue in this manner and look for nonintegral components along a longer and longer path $a_1, b_1, a_2, b_2, \dots$. Since the graph $G$ has finitely many vertices, eventually we reach a vertex that we have already visited before. This means that we have found a cycle $C \subseteq E$ in which $0 < x^*_e < 1$ for all edges. Since the graph is bipartite, the length $t$ of the cycle $C$ is even. For simplicity of notation let us assume that the edges of $C$ are $e_1, e_2, \dots, e_t$ (although in reality the cycle found as above need not begin with the edge $e_1$).

Now for a small real number $\varepsilon$ we define a vector $\tilde{x}$ by
$$\tilde{x}_e = \begin{cases} x^*_e + \varepsilon & \text{for } e \in \{e_1, e_3, \dots, e_{t-1}\},\\ x^*_e - \varepsilon & \text{for } e \in \{e_2, e_4, \dots, e_t\},\\ x^*_e & \text{otherwise.}\end{cases}$$
It is easy to see that this $\tilde{x}$ satisfies all the conditions $\sum_{e \in E:\, v \in e} \tilde{x}_e = 1$, $v \in V$, since at the vertices of the cycle $C$ we have added $\varepsilon$ once and subtracted it once, while for all other vertices the variables of the incident edges haven't changed their values at all. For $\varepsilon$ sufficiently small the conditions $0 \le \tilde{x}_e \le 1$ are satisfied too, since all components $x^*_e$, $e \in C$, are strictly between 0 and 1. Hence $\tilde{x}$ is again a feasible solution of the LP relaxation, for all sufficiently small $\varepsilon$, positive or negative.

What happens with the value of the objective function? We have
$$w(\tilde{x}) = \sum_{e \in E} w_e \tilde{x}_e = w(x^*) + \varepsilon \sum_{i=1}^{t} (-1)^{i+1} w_{e_i} = w(x^*) + \varepsilon\Delta,$$
where we have set $\Delta = \sum_{i=1}^{t} (-1)^{i+1} w_{e_i}$. Since $x^*$ is optimal, necessarily $\Delta = 0$, for otherwise, we could achieve $w(\tilde{x}) > w(x^*)$, either by choosing $\varepsilon > 0$ for $\Delta > 0$ or by choosing $\varepsilon < 0$ for $\Delta < 0$. This means that $\tilde{x}$ is an optimal solution whenever it is feasible, i.e., for all $\varepsilon$ with a sufficiently small absolute value. Let us now choose the largest $\varepsilon > 0$ such that $\tilde{x}$ is still feasible. Then there has to exist $e \in \{e_1, e_2, \dots, e_t\}$ with $\tilde{x}_e \in \{0, 1\}$, and so $\tilde{x}$ has fewer nonintegral components than $x^*$. This concludes the proof.

A similar result holds when we do not insist on a perfect matching. Suppose that we look for a maximum-weight matching $M$ covering every vertex of $X$ in the considered bipartite graph; then, for vertices $v \in Y$, the condition $\sum_{e \in E:\, v \in e} x_e = 1$ (every vertex of $Y$ is incident to exactly one edge of $M$) is replaced by $\sum_{e \in E:\, v \in e} x_e \le 1$ (every vertex of $Y$ is incident to at most one edge of $M$). The claim of Theorem 3.2.1 remains valid: if the LP relaxation has a feasible solution, then it also has an integral optimal solution. The proof presented above can be modified to show this. We present a different and more conceptual proof in Section 8.2, which also yields an alternative proof of Theorem 3.2.1. We will also briefly discuss the nonbipartite case in Section 8.2.

3.3 Minimum Vertex Cover

The Internet had been expanding rapidly in the Free Republic of West Mordor, and the government issued a regulation, purely in the interest of improved security of the citizens, that every data link connecting two computers must be equipped with a special device for gathering statistical data about the traffic. An operator of a part of the network has to attach the government's monitoring boxes to some of his computers so that each link has a monitored computer on at least one end. Which computers should get boxes so that the total price is minimum? (Let us assume that there is a flat rate per box.)

It is again convenient to use graph-theoretic terminology. The computers in the network are vertices and the links are edges. So we have a graph $G = (V, E)$, and we want to find a subset $S \subseteq V$ of the vertices such that each edge has at least one end-vertex in $S$; such an $S$ is called a vertex cover, and we want $|S|$ to be as small as possible.
This problem can be written as an integer program:

minimize $\sum_{v \in V} x_v$
subject to $x_u + x_v \ge 1$ for every edge $\{u, v\} \in E$, and
$x_v \in \{0, 1\}$ for all $v \in V$.    (3.2)

For every vertex $v$ we have a variable $x_v$, which can attain values 0 or 1. The meaning of $x_v = 1$ is $v \in S$, and $x_v = 0$ means $v \notin S$. The constraint $x_u + x_v \ge 1$ guarantees that the edge $\{u, v\}$ has at least one vertex in $S$. The objective function is the size of $S$.

It is known that finding a minimum vertex cover is a computationally difficult (NP-hard) problem. We will describe an approximation algorithm based on linear programming that always finds a vertex cover with at most twice as many vertices as in the smallest possible vertex cover. An LP relaxation of the above integer program is

minimize $\sum_{v \in V} x_v$
subject to $x_u + x_v \ge 1$ for every edge $\{u, v\} \in E$, and
$0 \le x_v \le 1$ for all $v \in V$.    (3.3)

The first step of the approximation algorithm for vertex cover consists in computing an optimal solution $x^*$ of this LP relaxation, by some standard algorithm for linear programming. The components of $x^*$ are real numbers in the interval $[0, 1]$. In the second step we define the set
$$S_{\mathrm{LP}} = \{v \in V : x^*_v \ge \tfrac{1}{2}\}.$$
This is a vertex cover, since for every edge $\{u, v\}$ we have $x^*_u + x^*_v \ge 1$, and so $x^*_u \ge \tfrac{1}{2}$ or $x^*_v \ge \tfrac{1}{2}$.

Let $S_{\mathrm{OPT}}$ be some vertex cover of the minimum possible size (we don't have it, but we can theorize about it). We claim that $|S_{\mathrm{LP}}| \le 2\,|S_{\mathrm{OPT}}|$. To see this, let $\tilde{x}$ be an optimal solution of the integer program (3.2), which corresponds to the set $S_{\mathrm{OPT}}$, i.e., $\tilde{x}_v = 1$ for $v \in S_{\mathrm{OPT}}$ and $\tilde{x}_v = 0$ otherwise. This $\tilde{x}$ is definitely a feasible solution of the LP relaxation (3.3), and so it cannot have a smaller value of the objective function than the optimal solution $x^*$ of (3.3):
$$\sum_{v \in V} x^*_v \le \sum_{v \in V} \tilde{x}_v.$$
On the other hand, $|S_{\mathrm{LP}}| = \sum_{v \in S_{\mathrm{LP}}} 1 \le \sum_{v \in V} 2x^*_v$, since $x^*_v \ge \tfrac{1}{2}$ for each $v \in S_{\mathrm{LP}}$. Therefore,
$$|S_{\mathrm{LP}}| \le 2\sum_{v \in V} x^*_v \le 2\sum_{v \in V} \tilde{x}_v = 2\,|S_{\mathrm{OPT}}|.$$

This proof illustrates an important aspect of approximation algorithms: in order to assess the quality of the computed solution, we always need a bound on the quality of the optimal solution, although we don't know it. The LP relaxation provides such a bound, which can sometimes be useful, as in the example of this section. In other problems it may be useless, though, as we will see in the next section.

Remarks. A natural attempt at an approximate solution of the considered problem is again a greedy algorithm: select vertices one by one, and always take a vertex that covers the maximum possible number of yet uncovered edges. Although this algorithm may not be bad in most cases, examples can be constructed in which it yields a solution at least ten times worse, say, than an optimal solution (and 10 can be replaced by any other constant). Discovering such a construction is a lovely exercise.

There is another, combinatorial, approximation algorithm for the minimum vertex cover. First we find a maximal matching $M$, that is, a matching that cannot be extended by adding any other edge. We note that the set of all end-vertices of the edges of $M$ is then a vertex cover of size $2|M|$ (an edge with no end-vertex covered could be added to $M$), while every vertex cover, including an optimal one, must contain at least one end-vertex of each of the $|M|$ pairwise disjoint edges of $M$; hence this cover, too, is at most twice larger than the optimum.
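Here is a compact sketch of the LP-rounding algorithm described above (ours, not from the book; the graph is made up for illustration).

```python
# LP-rounding 2-approximation for vertex cover on a small hypothetical graph.
import numpy as np
from scipy.optimize import linprog

n = 5
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]   # made-up graph

# LP relaxation (3.3): minimize sum x_v s.t. x_u + x_v >= 1 per edge,
# 0 <= x_v <= 1.  linprog uses <=, so the edge constraints are negated.
A_ub = np.zeros((len(edges), n))
for i, (u, v) in enumerate(edges):
    A_ub[i, u] = A_ub[i, v] = -1.0
res = linprog(np.ones(n), A_ub=A_ub, b_ub=-np.ones(len(edges)),
              bounds=[(0, 1)] * n)

S_LP = [v for v in range(n) if res.x[v] >= 0.5]   # the rounding step
print(res.x, S_LP)  # |S_LP| <= 2 * |S_OPT|, by the argument above
```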
[...] the vertices, and so for such graphs, too, an optimal solution of the LP relaxation tells us almost nothing about the maximum independent set. It is even known that the size of a maximum independent set cannot be approximated well by any reasonably efficient algorithm whatsoever, provided that some widely believed but unproved assumptions hold (such as P ≠ NP). This result is from J. Håstad: Clique is hard to approximate within $n^{1-\varepsilon}$, Acta Mathematica 182 (1999), 105-142; moreover, http://www.nada.kth.se/~viggo/problemlist/compendium.html is a comprehensive website for inapproximability results.

4 Theory of Linear Programming: First Steps

4.1 Equational Form

In the introductory chapter we explained how each linear program can be converted to the form: maximize $c^Tx$ subject to $Ax \le b$. But the simplex method requires a different form, which is usually called the standard form in the literature. In this book we introduce a less common but more descriptive term: equational form. It looks like this:

Equational form of a linear program:
Maximize $c^Tx$
subject to $Ax = b$ and $x \ge 0$.

As usual, $x$ is the vector of variables, $A$ is a given $m \times n$ matrix, $c \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ are given vectors, and $0$ is the zero vector, in this case with $n$ components. The constraints are thus partly equations and partly inequalities of a very special form, $x_j \ge 0$, $j = 1, 2, \dots, n$, called nonnegativity constraints.

Warning: Although we call this form equational, it contains inequalities as well, and these must not be forgotten. Let us emphasize that all variables in the equational form have to satisfy the nonnegativity constraints.

In problems encountered in practice we often have nonnegativity constraints automatically, since many quantities, such as the amount of consumed cucumber, cannot be negative.

Transformation of an arbitrary linear program to equational form. We illustrate such a transformation for the linear program

maximize $3x_1 + 2x_2$
subject to $2x_1 + x_2 \le 4$, $x_1 + 3x_2 \ge 5$, $x_2 \ge 0$.

We proceed as follows:

1. In order to convert the inequality $2x_1 + x_2 \le 4$ to an equation, we introduce a new variable $x_3$, together with the nonnegativity constraint $x_3 \ge 0$, and we replace the considered inequality by the equation $2x_1 + x_2 + x_3 = 4$. The auxiliary variable $x_3$, which won't appear anywhere else in the transformed linear program, represents the difference between the right-hand side and the left-hand side of the inequality. Such an auxiliary variable is called a slack variable.

2. For the next inequality, $x_1 + 3x_2 \ge 5$, we first multiply by $-1$, which reverses the direction of the inequality. Then we introduce another slack variable $x_4$, with the nonnegativity constraint $x_4 \ge 0$, and we replace the inequality by the equation $-x_1 - 3x_2 + x_4 = -5$.

3. We are not finished yet: the variable $x_1$ in the original
linear program is allowed to attain both positive and negative values. We introduce two new nonnegative variables $y_1$ and $z_1$ ($y_1 \ge 0$, $z_1 \ge 0$), and we substitute for $x_1$ the difference $y_1 - z_1$ everywhere. The variable $x_1$ itself disappears. The resulting equational form of our linear program is

maximize $3y_1 - 3z_1 + 2x_2$
subject to
$2y_1 - 2z_1 + x_2 + x_3 = 4$,
$-y_1 + z_1 - 3x_2 + x_4 = -5$,
$y_1, z_1, x_2, x_3, x_4 \ge 0$.

So as to comply with the conventions of the equational form in full, we should now rename the variables to $x_1, x_2, \dots, x_5$.

The presented procedure converts an arbitrary linear program with $n$ variables and $m$ constraints into a linear program in equational form with at most $m + 2n$ variables and $m$ equations (and, of course, nonnegativity constraints for all variables).

Geometry of a linear program in equational form. Let us consider a linear program in equational form: maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$. As is derived in linear algebra, the set of all solutions of the system $Ax = b$ is an affine subspace $F$ of the space $\mathbb{R}^n$. Hence the set of all feasible solutions of the linear program is the intersection of $F$ with the nonnegative orthant, which is the set of all points in $\mathbb{R}^n$ with all coordinates nonnegative.¹ The following picture illustrates the geometry of feasible solutions for a linear program with $n = 3$ variables and $m = 1$ equation, namely, the equation $x_1 + x_2 + x_3 = 1$:

[Figure: the set of all solutions of $Ax = b$ is a plane; intersected with the nonnegative octant, the set of all feasible solutions is a triangle.]

¹ In the plane ($n = 2$) this set is called the nonnegative quadrant; in $\mathbb{R}^3$ it is the nonnegative octant, and the name orthant is used for an arbitrary dimension.

In interesting cases we usually have more than 3 variables, and no picture can be drawn.

A preliminary cleanup. Now we will be talking about solutions of the system $Ax = b$. By this we mean arbitrary real solutions, whose components may be positive, negative, or zero. So this is not the same as feasible solutions of the considered linear program, since a feasible solution has to satisfy $Ax = b$ and have all components nonnegative. If we change the system $Ax = b$ by some transformation that preserves the set of solutions, such as a row operation in Gaussian elimination, it influences neither feasible solutions nor optimal solutions of the linear program. This will be amply used in the simplex method.

Assumption. We will consider only linear programs in equational form such that the system of equations $Ax = b$ has at least one solution, and the rows of the matrix $A$ are linearly independent.

As an explanation of this assumption, we need to recall a few facts from linear algebra. Checking whether the system $Ax = b$ has a solution is easy, by Gaussian elimination, and if there is no solution, the considered linear program has no feasible solution either, and we can thus disregard it. If the system $Ax = b$ has a solution, and if some row of $A$ is a linear combination of the other rows, then the corresponding equation is redundant, and it can be deleted from the system without changing the set of solutions. We may thus assume that the matrix $A$ has $m$ linearly independent rows, and therefore rank $m$.

4.2 Basic Feasible Solutions

Among all feasible solutions of a linear program, a privileged status is granted to so-called basic feasible solutions. In this section we will consider them only for linear programs in equational form. Let us look again at the picture of the set of feasible solutions for a linear program with $n = 3$, $m = 1$: among the feasible solutions $p$, $q$, and $r$ shown there, only $r$ is basic. Expressed geometrically, and very informally, a basic feasible solution is a "tip" (corner, spike) of the set of feasible solutions. We will formulate this kind of geometric description of a basic feasible solution later; see Theorem 4.4.1. The definition that we present next turns out to be equivalent, but it looks rather different: it requires that, very roughly speaking, a basic feasible solution have sufficiently many zero components.

Before stating it, we introduce a new piece of notation. In this section $A$ is always a matrix with $m$ rows and $n$ columns, $n \ge m$, of rank $m$. For a subset $B \subseteq \{1, 2, \dots, n\}$ we let $A_B$ denote the matrix consisting of the columns of $A$ whose indices belong to $B$. For instance, for
$$A = \begin{pmatrix} 1 & 5 & 3 & 4 & 6\\ 0 & 1 & 3 & 5 & 6 \end{pmatrix} \quad\text{and}\quad B = \{2, 4\}, \quad\text{we have}\quad A_B = \begin{pmatrix} 5 & 4\\ 1 & 5 \end{pmatrix}.$$
We will use a similar notation for vectors; e.g., for $x = (3, 5, 7, 9, 11)$ and $B = \{2, 4\}$ we have $x_B = (5, 9)$. Now we are ready to state a formal definition.

A basic feasible solution of the linear program "maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$" is a feasible solution $x \in \mathbb{R}^n$ for which there exists an $m$-element set $B \subseteq \{1, 2, \dots, n\}$ such that the (square) matrix $A_B$ is nonsingular, i.e., the columns of $A$ indexed by $B$ are linearly independent, and $x_j = 0$ for all $j \notin B$.

For example, $x = (0, 2, 0, 1, 0)$ is a basic feasible solution for
$$A = \begin{pmatrix} 1 & 5 & 3 & 4 & 6\\ 0 & 1 & 3 & 5 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 14\\ 7 \end{pmatrix},$$
with $B = \{2, 4\}$. If such a $B$ is fixed, we call the variables $x_j$ with $j \in B$ the basic variables, while the remaining variables are called nonbasic. We can thus briefly say that all nonbasic variables are zero in a basic feasible solution. Let us note that the definition doesn't consider the vector $c$ at all, and so basic feasible solutions depend solely on $A$ and $b$.

For some considerations it is convenient to reformulate the definition of a basic feasible solution a little.
4.2.1 Lemma. A feasible solution $x$ of a linear program in equational form is basic if and only if the columns of the matrix $A_K$ are linearly independent, where $K = \{j \in \{1, 2, \dots, n\} : x_j > 0\}$.

Proof. One of the implications is obvious: if $x$ is a basic feasible solution and $B$ is the corresponding $m$-element set as in the definition, then $K \subseteq B$, and thus the columns of the matrix $A_K$ are linearly independent.

Conversely, let $x$ be feasible and such that the columns of $A_K$ are linearly independent. If $|K| = m$, then we can simply take $B = K$. Otherwise, for $|K| < m$, we extend $K$ to an $m$-element set $B$ by adding $m - |K|$ more indices, so that the columns of $A_B$ are linearly independent. This is a standard fact of linear algebra, which can be verified using the algorithm described next. We initially set the current $B$ to $K$ and repeat the following step: if $A$ has a column that is not in the linear span of the columns of $A_B$, we add the index of such a column to $B$. As soon as this step is no longer possible, that is, all columns of $A$ are in the linear span of the columns of $A_B$, it is easily seen that the columns of $A_B$ constitute a basis of the column space of $A$. Since $A$ has rank $m$, we have $|B| = m$, as needed.

4.2.2 Proposition. A basic feasible solution is uniquely determined by the set $B$. That is, for every $m$-element set $B \subseteq \{1, 2, \dots, n\}$ with $A_B$ nonsingular there exists at most one feasible solution $x \in \mathbb{R}^n$ with $x_j = 0$ for all $j \notin B$.

Let us stress right away that, on the other hand, a single basic feasible solution may be obtained from many different sets $B$.

Proof of Proposition 4.2.2. For $x$ to be feasible, we must have $Ax = b$. The left-hand side can be rewritten as $Ax = A_Bx_B + A_Nx_N$, where $N = \{1, 2, \dots, n\} \setminus B$. For $x$ to be a basic feasible solution, the vector $x_N$ of nonbasic variables must equal 0, and thus the vector $x_B$ of basic variables has to satisfy $A_Bx_B = b$. And here we use the fact that $A_B$ is a nonsingular square matrix: the system $A_Bx_B = b$ has exactly one solution $x_B$. If all components of this $x_B$ are nonnegative, then we have exactly one basic feasible solution for the considered $B$ (we amend $x_B$ by zeros), and otherwise we have none.
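The computation in this proof is easy to carry out numerically. Here is a sketch (ours, not from the book), using the running example with $B = \{2, 4\}$.

```python
# For B = {2, 4}, solve A_B x_B = b and check nonnegativity.
import numpy as np

A = np.array([[1., 5., 3., 4., 6.],
              [0., 1., 3., 5., 6.]])
b = np.array([14., 7.])
B = [1, 3]                          # 0-based indices of columns 2 and 4

x_B = np.linalg.solve(A[:, B], b)   # unique, since A_B is nonsingular
x = np.zeros(A.shape[1]); x[B] = x_B
print(x)                            # [0. 2. 0. 1. 0.]: the basic feasible solution
```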
We introduce the following terminology. We call an $m$-element set $B \subseteq \{1, 2, \dots, n\}$ with $A_B$ nonsingular a basis.² If, moreover, $B$ determines a basic feasible solution, or in other words, if the unique solution of the system $A_Bx_B = b$ is nonnegative, then we call $B$ a feasible basis.

² This is a shortcut: the index set $B$ itself is not a basis in the sense of linear algebra, of course. Rather, the set of columns of the matrix $A_B$ constitutes a basis of the column space of $A$.

The following theorem deals with the existence of optimal solutions, and moreover, it shows that it suffices to look for them solely among basic feasible solutions.

4.2.3 Theorem. Let us consider a linear program in equational form: maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$.

(i) (Optimal solutions may fail to exist only for "obvious" reasons.) If there is at least one feasible solution and the objective function is bounded from above on the set of all feasible solutions, then there exists an optimal solution.

(ii) If an optimal solution exists, then there is a basic feasible solution that is optimal.

A proof is not necessary for further reading, and we defer it to the end of this section. The theorem also follows from the correctness of the simplex method, which will be discussed in the next chapter.

The theorem just stated implies a finite, although entirely impractical, algorithm for solving linear programs in equational form. We consider all $m$-element subsets $B \subseteq \{1, 2, \dots, n\}$ one by one, and for each of them we check whether it is a feasible basis, by solving a system of linear equations (we obtain at most one basic feasible solution for each $B$, by Proposition 4.2.2). Then we calculate the maximum of the objective function over all basic feasible solutions found in this way. Strictly speaking, this algorithm doesn't work if the objective function is unbounded; formulating a variant of the algorithm that functions properly even in this case (i.e., one that reports that the linear program is unbounded) we leave as an exercise. Soon we will discuss the considerably more efficient simplex method, and there we show in detail how to deal with unboundedness.

We have to consider $\binom{n}{m}$ sets $B$ in the above algorithm.³ For example, for $n = 2m$ the function $\binom{2m}{m}$ grows roughly like $4^m$, i.e., exponentially, and this is too much even for moderately large $m$.

³ We recall that the binomial coefficient $\binom{n}{m} = \frac{n!}{m!\,(n-m)!}$ counts the number of $m$-element subsets of an $n$-element set.

As we will see in Chapter 5, the simplex method also goes through basic feasible solutions, but in a more clever way: it "walks" from one to another while improving the value of the objective function all the time, until it reaches an optimal solution.

Let us summarize the main findings of this section: A linear program in equational form has finitely many basic feasible solutions, and if it is feasible and bounded, then at least one of the basic feasible solutions is optimal. Consequently, any linear program that is feasible and bounded has an optimal solution.
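For very small instances the impractical algorithm can actually be run. Here is a sketch (ours, not from the book) using the matrix $A$ and vector $b$ of the running example and a made-up objective $c$; as noted above, this naive version does not detect unboundedness.

```python
# Enumerate all m-element index sets B, keep the feasible bases, and
# maximize c^T x over the resulting basic feasible solutions.
import itertools
import numpy as np

def enumerate_bases(A, b, c):
    m, n = A.shape
    best = (None, -np.inf)
    for B in itertools.combinations(range(n), m):
        A_B = A[:, list(B)]
        if abs(np.linalg.det(A_B)) < 1e-9:
            continue                       # columns dependent: not a basis
        x = np.zeros(n)
        x[list(B)] = np.linalg.solve(A_B, b)
        if (x >= -1e-9).all() and c @ x > best[1]:
            best = (x, c @ x)              # feasible basis, better objective
    return best                            # (None, -inf) if infeasible

A = np.array([[1., 5., 3., 4., 6.], [0., 1., 3., 5., 6.]])
b = np.array([14., 7.])
c = np.ones(5)                             # hypothetical objective
print(enumerate_bases(A, b, c))            # x = (8.4, 0, 0, 1.4, 0), value 9.8
```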
Proof of Theorem 4.2.3. We will use some steps that will reappear in the simplex method in a more elaborate form, and so the present proof is a kind of preparation. We prove the following statement: If the objective function of a linear program in equational form is bounded above, then for every feasible solution $x_0$ there exists a basic feasible solution $\tilde{x}$ with the same or larger value of the objective function, i.e., with $c^T\tilde{x} \ge c^Tx_0$.

How does this imply the theorem? If the linear program is feasible and bounded, then according to the statement, for every feasible solution there is a basic feasible solution with the same or larger objective function. Since there are only finitely many basic feasible solutions, some of them have to give the maximum value of the objective function, which means that they are optimal. We thus get both (i) and (ii) at once.

In order to prove the statement, let us consider an arbitrary feasible solution $x_0$. Among all feasible solutions $x$ with $c^Tx \ge c^Tx_0$ we choose one that has the largest possible number of zero components, and we call it $\tilde{x}$ (it need not be determined uniquely). We define an index set $K = \{j \in \{1, 2, \dots, n\} : \tilde{x}_j > 0\}$. If the columns of the matrix $A_K$ are linearly independent, then $\tilde{x}$ is a basic feasible solution as in the statement, by Lemma 4.2.1, and we are done.

So let us suppose that the columns of $A_K$ are linearly dependent, which means that there is a nonzero $|K|$-component vector $v$ such that $A_Kv = 0$. We extend $v$ by zeros in positions outside $K$ to an $n$-component vector $w$; so $w_K = v$ and $Aw = A_Kv = 0$. Let us assume for a moment that $w$ satisfies the following two conditions (we will show later why we can assume this):

(i) $c^Tw \ge 0$;
(ii) there exists $j \in K$ with $w_j < 0$.

For a real number $t \ge 0$, let us consider the vector $x(t) = \tilde{x} + tw$. We show that for some suitable $t_1 > 0$ the vector $x(t_1)$ is a feasible solution with more zero components than $\tilde{x}$. At the same time,
$$c^Tx(t_1) = c^T\tilde{x} + t_1c^Tw \ge c^T\tilde{x} \ge c^Tx_0,$$
and so we get a contradiction to the assumption that $\tilde{x}$ has the largest possible number of zero components.

We have $Ax(t) = b$ for all $t$, since $Ax(t) = A\tilde{x} + tAw = A\tilde{x} = b$, because $\tilde{x}$ is feasible. Moreover, for $t = 0$ the vector $x(0) = \tilde{x}$ has all components from $K$ strictly positive and all other components zero. For the $j$th component of $x(t)$ we have $x(t)_j = \tilde{x}_j + tw_j$, and if $w_j < 0$, as in condition (ii), we get $x(t)_j < 0$ for all sufficiently large $t > 0$. If we begin with $t = 0$ and let $t$ grow, then those $x(t)_j$ with $w_j < 0$ are decreasing, and at a certain moment $t_1$ the first of these decreasing components reaches 0. At this moment, obviously, $x(t_1)$ still has all components nonnegative, and thus it is feasible, but it has at least one extra zero component compared to $\tilde{x}$. This, as we have already noted, is a contradiction.

Now, what do we do if the vector $w$ fails to satisfy condition (i) or (ii)? If $c^Tw = 0$, then (i) holds, and (ii) can be arranged by changing the sign of $w$ if necessary (we have $w \ne 0$). So we assume $c^Tw \ne 0$, and again, after a possible sign change we can achieve $c^Tw > 0$, and thus (i). Now if (ii) fails, we must have $w \ge 0$. But this means that $x(t) = \tilde{x} + tw \ge 0$ for all $t \ge 0$, and hence all such $x(t)$ are feasible. The value of the objective function for $x(t)$ is $c^Tx(t) = c^T\tilde{x} + t\,c^Tw$, and it tends to infinity as $t \to \infty$. Hence the linear program is unbounded, contrary to our assumption. This concludes the proof.

4.3 ABC of Convexity and Convex Polyhedra

Convexity is one of the fundamental notions in all of mathematics, and in the theory of linear programming it is encountered very naturally. Here we recall the definition and present some of the most basic notions and results, which at the very least help in gaining a better intuition about linear programming. On the other hand, linear programming can be presented without these notions, and in concise courses there is usually no time for such material. Accordingly, this section and the next are meant as extending material, and the rest of the book should mostly be accessible without them.

A set $X \subseteq \mathbb{R}^n$ is convex if for every two points $x, y \in X$ it also contains the segment $xy$. Expressed differently, for every $x, y \in X$ and every $t \in [0, 1]$ we have $tx + (1-t)y \in X$. A word of explanation might
be in order: $tx + (1-t)y$ is the point on the segment $xy$ at distance $t$ from $y$ and distance $1-t$ from $x$, if we take the length of the segment as the unit distance. Here are a few examples of convex and nonconvex sets in the plane:

[Figure: examples of nonconvex and convex sets in the plane.]

The convex set at the bottom right in this picture (a "stadium") is worth remembering, since often it is a counterexample to statements about convex sets that may look obvious at first sight but are false.

In calculus one works mainly with convex functions. Both notions, convex sets and convex functions, are closely related. For instance, a real function $f\colon \mathbb{R} \to \mathbb{R}$ is convex if and only if its epigraph, i.e., the set $\{(x, y) \in \mathbb{R}^2 : y \ge f(x)\}$, is a convex set in the plane. In general, a function $f\colon X \to \mathbb{R}$ is called convex if for every $x, y \in X$ and every $t \in [0, 1]$ we have $f(tx + (1-t)y) \le t\,f(x) + (1-t)\,f(y)$. The function is called strictly convex if the inequality is strict for all $x \ne y$.

Convex hull and convex combinations. It is easily seen that the intersection of an arbitrary collection of convex sets is again a convex set. This allows us to define the convex hull: Let $X \subseteq \mathbb{R}^n$ be a set. The convex hull of $X$ is the intersection of all convex sets that contain $X$. Thus it is the smallest convex set containing $X$, in the sense that any convex set containing $X$ also contains its convex hull.

[Figure: a set $X$ and its convex hull.]

This is not a very constructive definition. The convex hull can also be described using convex combinations, in a way similar to the description of the linear span of a set of vectors using linear combinations. Let $x_1, x_2, \dots, x_m$ be points in $\mathbb{R}^n$. Every point of the form $t_1x_1 + t_2x_2 + \cdots + t_mx_m$, where $t_1, t_2, \dots, t_m \ge 0$ and $\sum_i t_i = 1$, is called a convex combination of $x_1, x_2, \dots, x_m$. A convex combination is thus a particular kind of linear combination, in which the coefficients are nonnegative and sum to 1. Convex combinations of two points $x$ and $y$ are of the form $tx + (1-t)y$, $t \in [0, 1]$, and as we said after the definition of a convex set, they fill exactly the segment $xy$. It is easy, but instructive, to show that all convex combinations of three points $x, y, z$ fill exactly the triangle $xyz$ (unless the points are collinear, that is).

4.3.1 Lemma. The convex hull $C$ of a set $X \subseteq \mathbb{R}^n$ equals the set
$$C' = \Bigl\{\textstyle\sum_{i=1}^m t_ix_i : m \ge 1,\ x_1, \dots, x_m \in X,\ t_1, \dots, t_m \ge 0,\ \sum_{i=1}^m t_i = 1\Bigr\}$$
of all convex combinations of finitely many points of $X$.

Proof. First we prove, by induction on $m$, that each convex combination has to lie in the convex hull $C$. For $m = 1$ it is obvious, and for $m = 2$ it follows directly from the convexity of $C$. Let $m \ge 3$, and let $x = t_1x_1 + \cdots + t_mx_m$ be a convex combination of points of $X$. If $t_m = 1$, then we have $x = x_m \in C$. For $t_m < 1$, let us put $t_i' = t_i/(1 - t_m)$, $i = 1, 2, \dots, m-1$. Then $x' = t_1'x_1 + \cdots + t_{m-1}'x_{m-1}$ is a convex combination of the points $x_1, \dots, x_{m-1}$ (the $t_i'$ sum to 1), and by the inductive hypothesis, $x' \in C$. So $x = (1 - t_m)x' + t_mx_m$ is a convex combination of two points of the convex set $C$, and as such it also lies in $C$. We have thus proved $C' \subseteq C$.

For the reverse inclusion it suffices to prove that $C'$ is convex, that is, to verify that whenever $x, y \in C'$ are two convex combinations and $t \in [0, 1]$, then $tx + (1-t)y$ is again a convex combination. This is straightforward, and we take the liberty of omitting further details.

Convex sets encountered in the theory of linear programming are of a special type, and they are called convex polyhedra.

Hyperplanes, half-spaces, polyhedra. We recall that a hyperplane in $\mathbb{R}^n$ is an affine subspace of dimension $n-1$. In other words, it is the set of all solutions of a single linear equation of the form $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$, where $a_1, a_2, \dots, a_n$ are not all 0. Hyperplanes in $\mathbb{R}^2$ are lines, and hyperplanes in $\mathbb{R}^3$ are ordinary planes. A hyperplane divides $\mathbb{R}^n$ into two half-spaces,
and it constitutes their common boundary. For the hyperplane with equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$, the two half-spaces have the following analytic expression:
$$\{x \in \mathbb{R}^n : a_1x_1 + a_2x_2 + \cdots + a_nx_n \le b\}$$
and
$$\{x \in \mathbb{R}^n : a_1x_1 + a_2x_2 + \cdots + a_nx_n \ge b\}.$$
More exactly, these are closed half-spaces; they contain their boundary.

A convex polyhedron is an intersection of finitely many closed half-spaces in $\mathbb{R}^n$.

A half-space is obviously convex, and hence an intersection of half-spaces is convex as well. Thus convex polyhedra bear the attribute "convex" by right. A disk in the plane is a convex set, but it is not a convex polyhedron, because, roughly speaking, a convex polyhedron has to be "edgy" (but try proving this formally!).

A half-space is the set of all solutions of a single linear inequality with at least one nonzero coefficient of some variable $x_j$. The set of all solutions of a system of finitely many linear inequalities, a.k.a. the set of all feasible solutions of a linear program, is geometrically the intersection of finitely many half-spaces, alias a convex polyhedron. (We should perhaps also mention that a hyperplane is the intersection of two half-spaces, and so the constraints can be both inequalities and equations.)

Let us note that a convex polyhedron can be unbounded, since, for example, a single half-space is also a convex polyhedron. A bounded convex polyhedron, i.e., one that can be placed inside some large enough ball, is called a convex polytope.

The dimension of a convex polyhedron $P \subseteq \mathbb{R}^n$ is the smallest dimension of an affine subspace containing $P$. Equivalently, it is the largest $d$ for which $P$ contains points $x_0, x_1, \dots, x_d$ such that the $d$-tuple of vectors $(x_1 - x_0, x_2 - x_0, \dots, x_d - x_0)$ is linearly independent. The empty set is also a convex polyhedron, and its dimension is usually defined as $-1$.

All convex polygons in the plane are two-dimensional convex polyhedra. Several types of three-dimensional convex polyhedra are taught at high schools and decorate mathematical cabinets, such as cubes, boxes, pyramids, or even regular dodecahedra, which can also be met as desktop calendars. Simple examples of convex polyhedra of an arbitrary dimension $n$ are:

- The $n$-dimensional cube $[-1, 1]^n$, which can be written as the intersection of $2n$ half-spaces (which ones?).

- The $n$-dimensional crosspolytope $\{x \in \mathbb{R}^n : |x_1| + |x_2| + \cdots + |x_n| \le 1\}$. For $n = 3$ we get the regular octahedron. For expressing the $n$-dimensional crosspolytope as an intersection of half-spaces we need $2^n$ half-spaces (can you find them?).

- The regular $n$-dimensional simplex, which can be defined in a quite simple and nice way as a subset of $\mathbb{R}^{n+1}$:
$$\{x \in \mathbb{R}^{n+1} : x_1, x_2, \dots, x_{n+1} \ge 0,\ x_1 + x_2 + \cdots + x_{n+1} = 1\}.$$
We note that this is exactly the set of all feasible solutions of the linear program with the single equation $x_1 + x_2 + \cdots + x_{n+1} = 1$ and nonnegativity constraints;⁴ see the picture in Section 4.1. In general, any $n$-dimensional convex polytope bounded by $n+1$ hyperplanes is called a simplex.

⁴ On the other hand, the set of feasible solutions of a linear program in equational form certainly isn't always a simplex. The simplex method is so named for a rather complicated reason, related to an alternative geometric view of a linear program in equational form, different from the one discussed in this book. According to this view, the $m$-tuple of numbers in the $j$th column of the matrix $A$, together with the number $c_j$, is interpreted as a point in $\mathbb{R}^{m+1}$. Then the simplex method can be interpreted as a walk through certain simplices with vertices at these points. It was this view that gave Dantzig faith in the simplex method and convinced him that it made sense to study it.

Many interesting examples of convex polyhedra are obtained as sets of feasible solutions of natural linear programs. For example, the LP relaxation of the problem of maximum-weight matching (Section 3.2) for a complete bipartite graph leads to the Birkhoff polytope. Geometric properties of such polyhedra
are often related to properties of combinatorial objects and to solutions of combinatorial optimization problems in an interesting way.

A nice book about convex polyhedra is G. M. Ziegler: Lectures on Polytopes, Springer-Verlag, Heidelberg 1994 (corrected 2nd edition 1998). The book B. Grünbaum: Convex Polytopes (second edition prepared by Volker Kaibel, Victor Klee, and Günter Ziegler), Springer-Verlag, Heidelberg 2003, is a new edition of a 1967 classic, with extensive updates on the material covered in the original book.

4.4 Vertices and Basic Feasible Solutions

A vertex of a convex polyhedron can be thought of as a "tip" or "spike." For instance, a three-dimensional cube has 8 vertices, and a regular octahedron has 6 vertices. Mathematically, a vertex is defined as a point where some linear function attains a unique maximum. Thus, a point $v$ is called a vertex of a convex polyhedron $P \subseteq \mathbb{R}^n$ if $v \in P$ and there exists a nonzero vector $c \in \mathbb{R}^n$ such that $c^Tv > c^Ty$ for all $y \in P \setminus \{v\}$. Geometrically, it means that the hyperplane $\{x \in \mathbb{R}^n : c^Tx = c^Tv\}$ touches the polyhedron $P$ exactly at $v$.

[Figure: the hyperplane $\{x : c^Tx = c^Tv\}$, with normal vector $c$, touching a polyhedron at the vertex $v$.]

Three-dimensional polyhedra have not only vertices but also edges and faces. A general polyhedron $P \subseteq \mathbb{R}^n$ of dimension $n$ can have vertices, edges, 2-dimensional faces, 3-dimensional faces, up to $(n-1)$-dimensional faces. They are defined as follows: a subset $F \subseteq P$ is a $k$-dimensional face of a convex polyhedron $P$ if $F$ has dimension $k$ and there exist a nonzero vector $c \in \mathbb{R}^n$ and a number $z \in \mathbb{R}$ such that $c^Tx = z$ for all $x \in F$ and $c^Tx < z$ for all $x \in P \setminus F$. In other words, there exists a hyperplane that touches $P$ exactly at $F$. Since such an $F$ is the intersection of a hyperplane with a convex polyhedron, it is a convex polyhedron itself, and its dimension is thus well defined. An edge is a 1-dimensional face, and a vertex is a 0-dimensional face.

Now we prove that vertices of a convex polyhedron and basic feasible solutions of a linear program are the same concept.

4.4.1 Theorem. Let $P$ be the set of all feasible solutions of a linear program in equational form (so $P$ is a convex polyhedron). Then the following two conditions for a point $v \in P$ are equivalent:

(i) $v$ is a vertex of the polyhedron $P$;
(ii) $v$ is a basic feasible solution of the linear program.

Proof. The implication (i) ⇒ (ii) follows immediately from Theorem 4.2.3, with $c$ being the vector defining $v$ as a vertex. It remains to prove (ii) ⇒ (i). Let us consider a basic feasible solution $v$ with a feasible basis $B$, and let us define a vector $c \in \mathbb{R}^n$ by $c_j = 0$ for $j \in B$ and $c_j = -1$ otherwise. We have $c^Tv = 0$ and $c^Tx \le 0$ for any $x \ge 0$, and hence $v$ maximizes the objective function $c^Tx$. Moreover, $c^Tx < 0$ whenever $x \ge 0$ has a nonzero component outside $B$. But by Proposition 4.2.2, $v$ is the only feasible solution with all nonzero components in $B$, and therefore $v$ is the only point of $P$ maximizing $c^Tx$.

Basic feasible solutions for arbitrary linear programs. A similar theorem is valid for an arbitrary linear program, not only for one in equational form. We will not prove it here, but we at least say what a basic feasible solution is for a general linear program.

4.4.2 Definition. A basic feasible solution of a linear program with $n$ variables is a feasible solution for which some $n$ linearly independent constraints hold with equality.

A constraint that is an equation
always has to be satisfied with equality, while an inequality constraint may be satisfied either with equality or with a strict inequality. The nonnegativity constraints satisfied with equality are also counted. The linear independence of constraints means that the vectors of the coefficients of the variables are linearly independent. For example, for $n = 4$ the constraint $3x_1 + 5x_3 + 7x_4 \le 10$ has the corresponding vector $(3, 0, 5, 7)$.

As is known from linear algebra, a system of $n$ linearly independent linear equations in $n$ variables has exactly one solution. Hence if $x$ is a basic feasible solution and it satisfies some $n$ linearly independent constraints with equality, then it is the only point in $\mathbb{R}^n$ that satisfies these $n$ constraints with equality. Geometrically speaking, the constraints satisfied with equality determine hyperplanes, $x$ lies on some $n$ of them, and these $n$ hyperplanes meet in a single point.

The definition of a basic feasible solution for the equational form looks quite different, but in fact it is a special case of the new definition, as we now indicate. For a linear program in equational form we have $m$ linearly independent equations, always satisfied with equality, and so it remains to satisfy with equality some $n - m$ of the nonnegativity constraints, and these must be linearly independent together with the equations. The coefficient vector of the nonnegativity constraint $x_j \ge 0$ is $e_j$, with 1 at position $j$ and with zeros elsewhere. If $x$ is a basic feasible solution according to the new definition, then there exists a set $N \subseteq \{1, 2, \dots, n\}$ of size $n - m$ such that $x_j = 0$ for all $j \in N$ and the rows of the matrix $A$ together with the vectors $e_j$, $j \in N$, constitute a linearly independent collection. This happens exactly if the matrix $A_B$ has linearly independent rows, where $B = \{1, 2, \dots, n\} \setminus N$, and we are back at the definition of a basic feasible solution for the equational form.

For a general linear program, none of the optimal solutions have to be basic, as is illustrated by the linear program: maximize $x_1 + x_2$ subject to $x_1 + x_2 \le 1$. This contrasts with the situation for the equational form (cf. Theorem 4.2.3), and it is one of the advantages of the equational form.

Vertices and extremal points. The intuitive notion of a "tip" of a convex set can be viewed mathematically in at least two ways. One of them is captured by the above definition of a vertex of a convex polyhedron: a tip is a point for which some linear function attains a unique maximum. The other one leads to a definition talking about points that cannot be "generated" by segments. These are called extremal points; thus, a point $x$ is an extremal point of a convex set $C \subseteq \mathbb{R}^n$ if $x \in C$ and there are no two points $y, z \in C$ different from $x$ such that $x$ lies on the segment $yz$. For a convex polyhedron it is not difficult to show that the extremal points are exactly the vertices. Hence we have yet another equivalent description of a basic feasible solution.

A convex polytope is the convex hull of its vertices. A general convex polyhedron need not have any vertices at all (consider a half-space). However, a convex polytope $P$, i.e., a bounded convex polyhedron, always has vertices, and even more is true: $P$ equals the convex hull of the set of its vertices. This may look intuitively obvious from examples in dimensions 2 and 3, but a proof is nontrivial; Ziegler's book cited in the previous section calls this the "Main Theorem" of polytope theory. Consequently, every convex polytope can be represented either as the intersection of finitely many half-spaces or as the convex hull of finitely many points.
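Definition 4.4.2 suggests a direct numerical test, sketched below (ours, not from the book): collect the constraints that are tight at a feasible point and check whether some $n$ of them are linearly independent, i.e., whether the matrix of tight coefficient vectors has rank $n$.

```python
# Test whether a feasible point is basic in the sense of Definition 4.4.2.
# Constraints are rows of "a x <= beta"; any nonnegativity constraints
# must be listed explicitly as rows -x_j <= 0.
import numpy as np

def is_basic(A_ub, b_ub, x, tol=1e-9):
    tight = [a for a, beta in zip(A_ub, b_ub) if abs(a @ x - beta) <= tol]
    return len(tight) >= len(x) and \
        np.linalg.matrix_rank(np.array(tight)) == len(x)

# The square 0 <= x1, x2 <= 1, written with four inequality rows:
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
print(is_basic(A, b, np.array([1., 1.])))   # True: two independent tight rows
print(is_basic(A, b, np.array([0.5, 1.])))  # False: only one tight row

# The example above (maximize x1 + x2 subject to x1 + x2 <= 1, x free):
print(is_basic(np.array([[1., 1.]]), np.array([1.]),
               np.array([0.5, 0.5])))       # False: no optimal solution is basic
```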
5 The Simplex Method

In this chapter we explain the simplex method for solving linear programs. We will make use of the terms "equational form" and "basic feasible solution" from the previous chapter.

Gaussian elimination in linear algebra has a fundamental theoretical and didactic significance, as a starting point for further developments, but in practice it has mostly been replaced by more complicated and more efficient algorithms. Similarly, the basic version of the simplex method that we discuss here is not commonly used for solving linear programs in practice. We do not put emphasis on the most efficient possible organization of the computations, but rather we concentrate on the main ideas.

5.1 An Introductory Example

We will first show the simplex method in action on a small concrete example, namely, on the following linear program:

maximize $x_1 + x_2$
subject to $-x_1 + x_2 \le 1$, $x_1 \le 3$, $x_2 \le 2$, $x_1, x_2 \ge 0$.    (5.1)

We intentionally do not take a linear program in equational form. The variables are nonnegative, but the inequalities have to be replaced by equations, by introducing slack variables. The equational form is

maximize $x_1 + x_2$
subject to $-x_1 + x_2 + x_3 = 1$, $x_1 + x_4 = 3$, $x_2 + x_5 = 2$, $x_1, x_2, \dots, x_5 \ge 0$,

with the matrix
$$A = \begin{pmatrix} -1 & 1 & 1 & 0 & 0\\ 1 & 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0 & 1 \end{pmatrix}.$$

In the simplex method we first express each linear program in the form of a simplex tableau. In our case we begin with the tableau
$$\begin{array}{l} x_3 = 1 + x_1 - x_2\\ x_4 = 3 - x_1\\ x_5 = 2 - x_2\\ \hline z \;=\; x_1 + x_2. \end{array}$$
The first three rows consist of the equations of the linear program, in which the slack variables have been carried over to the left-hand side and the remaining terms are on the right-hand side. The last row, separated by a line, contains a new variable $z$, which expresses the objective function.

Each simplex tableau is associated with a certain basic feasible solution. In our case we substitute 0 for the variables $x_1$ and $x_2$ from the right-hand side, and without calculation we see that $x_3 = 1$, $x_4 = 3$, $x_5 = 2$. This feasible solution is indeed basic, with $B = \{3, 4, 5\}$; we note that $A_B$ is the identity matrix. The variables $x_3, x_4, x_5$ from the left-hand side are basic, and the variables $x_1, x_2$ from the right-hand side are nonbasic. The value of the objective function $z = 0$ corresponding to this basic feasible solution can be read off from the last row of the tableau.

From the initial simplex tableau we will construct a sequence of tableaus of a similar form, by gradually rewriting them according to certain rules. Each tableau will contain the same information about the linear program, only written differently. The procedure terminates with a tableau that represents the information so that the desired optimal solution can be read off directly.

Let us go to the first step. We try to increase the value of the objective function by increasing one of the nonbasic variables $x_1$ or $x_2$. In the above tableau we observe that increasing the value of $x_1$ (i.e., making $x_1$ positive) increases the value of $z$. The same is true for $x_2$, because both variables have positive coefficients in the $z$-row of the tableau. We can choose either $x_1$ or $x_2$; let us decide (arbitrarily) for $x_2$. We will increase it, while $x_1$ will stay 0.

By how much can we increase $x_2$? If we want to maintain feasibility, we have to be careful not to let any of the basic variables $x_3, x_4, x_5$ go below zero. This means that the equations determining $x_3, x_4, x_5$ may limit the increment of $x_2$. Let us consider the first equation, $x_3 = 1 + x_1 - x_2$: together with the implicit constraint $x_3 \ge 0$, it lets us increase $x_2$ up to the value $x_2 = 1$ (while keeping $x_1 = 0$). The second equation, $x_4 = 3 - x_1$, does not
limit the increment of $x_2$ at all, and the third equation, $x_5 = 2 - x_2$, allows for an increase of $x_2$ up to $x_2 = 2$ before $x_5$ gets negative. The most stringent restriction thus follows from the first equation. We increase $x_2$ as much as we can, obtaining $x_2 = 1$ and $x_3 = 0$. From the remaining equations of the tableau we get the values of the other variables: $x_4 = 3 - x_1 = 3$, $x_5 = 2 - x_2 = 1$.

In this new feasible solution, $x_3$ became zero and $x_2$ nonzero. Quite naturally, we thus transfer $x_3$ to the right-hand side, where the nonbasic variables live, and $x_2$ to the left-hand side, where the basic variables reside. We do it by means of the most stringent equation, $x_3 = 1 + x_1 - x_2$, from which we express
$$x_2 = 1 + x_1 - x_3.$$
We substitute the right-hand side for $x_2$ into the remaining equations, and we arrive at a new tableau:
$$\begin{array}{l} x_2 = 1 + x_1 - x_3\\ x_4 = 3 - x_1\\ x_5 = 1 - x_1 + x_3\\ \hline z \;=\; 1 + 2x_1 - x_3. \end{array}$$
Here $B = \{2, 4, 5\}$, which corresponds to the basic feasible solution $x = (0, 1, 0, 3, 1)$ with the value of the objective function $z = 1$. This process of rewriting one simplex tableau into another is called a pivot step. In each pivot step some nonbasic variable (in our case $x_2$) enters the basis, while some basic variable (in our case $x_3$) leaves the basis.

In the new tableau we can further increase the value of the objective function by increasing $x_1$, while increasing $x_3$ would lead to a smaller $z$-value. The first equation does not restrict the increment of $x_1$ in any way, from the second one we get $x_1 \le 3$, and from the third one $x_1 \le 1$, so the strictest limitation is implied by the third equation. Similarly as in the previous step, we express $x_1$ from it and we substitute this expression into the remaining equations. Thereby $x_1$ enters the basis and moves to the left-hand side, and $x_5$ leaves the basis and migrates to the right-hand side. The tableau we obtain is
$$\begin{array}{l} x_1 = 1 + x_3 - x_5\\ x_2 = 2 - x_5\\ x_4 = 2 - x_3 + x_5\\ \hline z \;=\; 3 + x_3 - 2x_5, \end{array}$$
with $B = \{1, 2, 4\}$, basic feasible solution $x = (1, 2, 0, 2, 0)$, and $z = 3$. After one more pivot step, in which $x_3$ enters the basis and $x_4$ leaves it, we arrive at the tableau
$$\begin{array}{l} x_1 = 3 - x_4\\ x_2 = 2 - x_5\\ x_3 = 2 - x_4 + x_5\\ \hline z \;=\; 5 - x_4 - x_5, \end{array}$$
with basis $B = \{1, 2, 3\}$, basic feasible solution $x = (3, 2, 2, 0, 0)$, and $z = 5$.

In this tableau, no nonbasic variable can be increased without making the objective function value smaller, so we are stuck. Luckily, this also means that we have already found an optimal solution! Why? Let us consider an arbitrary feasible solution $x' = (x_1', \dots, x_5')$ of our linear program, with the objective function attaining some value $z'$. Now $x'$ and $z'$ satisfy all equations in the final tableau, which was obtained from the original equations of the linear program by equivalent transformations. Hence we necessarily have $z' = 5 - x_4' - x_5'$. Together with the nonnegativity constraints $x_4', x_5' \ge 0$, this implies $z' \le 5$. The tableau even delivers a proof that $x = (3, 2, 2, 0, 0)$ is the only optimal solution: if $z' = 5$, then $x_4' = x_5' = 0$, and this determines the values of the remaining variables uniquely.

A geometric illustration. For each feasible solution $(x_1, x_2)$ of the original linear program (5.1) with inequalities, we have exactly one corresponding feasible solution $(x_1, x_2, \dots, x_5)$ of the modified linear program in equational form, and conversely. The sets of feasible solutions are "isomorphic" in a suitable sense, and we can thus follow the progress of the simplex method narrated above in a planar picture for the original linear program (5.1):

[Figure: the feasible region of (5.1), bounded by $-x_1 + x_2 \le 1$, $x_1 \le 3$, $x_2 \le 2$, $x_1 \ge 0$, $x_2 \ge 0$, with the path of the simplex method from $(0, 0)$ through $(0, 1)$ and $(1, 2)$ to the optimum $(3, 2)$.]

We can see the simplex method moving along the edges from one feasible solution to another, while the value of the objective function grows, until it reaches the optimum. In the example we could also take a shorter route if we decided to increase $x_1$ instead of $x_2$ in the first step.
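The result can be checked with any LP solver; for instance (a sketch, ours, not from the book):

```python
# The introductory example (5.1) fed to SciPy:
# maximize x1 + x2 subject to -x1 + x2 <= 1, x1 <= 3, x2 <= 2, x >= 0.
from scipy.optimize import linprog

res = linprog(c=[-1, -1],                      # minimize -(x1 + x2)
              A_ub=[[-1, 1], [1, 0], [0, 1]], b_ub=[1, 3, 2],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                         # [3. 2.] and z = 5
```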
Potential troubles. In our modest example the simplex method has run smoothly, without any problems. In general we must deal with several complications. We will demonstrate them on examples in the next sections.

5.2 Exception Handling: Unboundedness

What happens in the simplex method for an unbounded linear program? We will show it on the example

maximize $x_1$ subject to $x_1 - x_2 \le 1$, $-x_1 + x_2 \le 2$, $x_1, x_2 \ge 0$,

illustrated in the picture below:

[Figure: the unbounded feasible region given by $x_1 - x_2 \le 1$, $-x_1 + x_2 \le 2$, $x_1, x_2 \ge 0$, with a semi-infinite ray drawn thick.]

After the usual transformation to equational form, by introducing slack variables $x_3, x_4$, we can use these variables as a feasible basis, and we obtain the initial simplex tableau
$$\begin{array}{l} x_3 = 1 - x_1 + x_2\\ x_4 = 2 + x_1 - x_2\\ \hline z \;=\; x_1. \end{array}$$
After the first pivot step, with entering variable $x_1$ and leaving variable $x_3$, the next tableau is
$$\begin{array}{l} x_1 = 1 + x_2 - x_3\\ x_4 = 3 - x_3\\ \hline z \;=\; 1 + x_2 - x_3. \end{array}$$
If we now try to introduce $x_2$ into the basis, we discover that none of the equations in the tableau restrict its increase in any way. We can thus take $x_2$ arbitrarily large, and we also get $z$ arbitrarily large: the linear program is unbounded.

Let us analyze this situation in more detail. From the tableau one can see that for an arbitrarily large number $t \ge 0$ we obtain a feasible solution by setting $x_2 = t$, $x_3 = 0$, $x_1 = 1 + t$, and $x_4 = 3$, with the value of the objective function $z = 1 + t$. In other words, the semi-infinite ray
$$\{(1, 0, 0, 3) + t\,(1, 1, 0, 0) : t \ge 0\}$$
is contained in the set of feasible solutions. It witnesses the unboundedness of the linear program, since the objective function attains arbitrarily large values on it. The corresponding semi-infinite ray for the original two-dimensional linear program is drawn thick in the picture above. A similar ray is the output of the simplex method for all unbounded linear programs.
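Off-the-shelf solvers report unboundedness instead of returning a ray. For example (a sketch, ours, not from the book), SciPy signals it via a status code:

```python
# The unbounded example of this section, fed to SciPy:
# maximize x1 subject to x1 - x2 <= 1, -x1 + x2 <= 2, x >= 0.
from scipy.optimize import linprog

res = linprog(c=[-1, 0],                       # minimize -x1
              A_ub=[[1, -1], [-1, 1]], b_ub=[1, 2],
              bounds=[(0, None), (0, None)])
print(res.status)    # 3 means "problem is unbounded" in scipy.optimize.linprog
print(res.message)
```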
It may even happen that some tableau is repeated in a sequence of degenerate pivot steps, and so the algorithm might pass through an infinite sequence of tableaus without any progress. This phenomenon is called cycling. An example of a linear program for which the simplex method may cycle can be found in Chvátal's textbook cited in Chapter 9; the smallest possible example has 6 variables and 3 equations, and we will not present it here.

If the simplex method doesn't cycle, then it necessarily finishes in a finite number of steps, because there are only finitely many possible simplex tableaus for any given linear program: namely, at most $\binom{n}{m}$, one for each feasible basis, as we will see in Section 5.5. How can cycling be prevented? This is a nontrivial issue, and it will be discussed in Section 5.8.

5.4 Exception Handling: Infeasibility

In order that the simplex method be able to start at all, we need a feasible basis. In the examples discussed up until now we got a feasible basis more or less for free. It works this way for all linear programs of the form: maximize $c^Tx$ subject to $Ax \le b$ and $x \ge 0$, with $b \ge 0$. Indeed, the indices of the slack variables introduced in the transformation to equational form can serve as a feasible basis.

However, in general, finding any feasible solution of a linear program is equally as difficult as finding an optimal solution (see the remark in Section 1.3). But computing an initial feasible basis can be done by the simplex method itself, if we apply it to a suitable auxiliary problem. Let us consider the linear program in equational form

maximize $x_1 + 2x_2$
subject to $x_1 + 3x_2 + x_3 = 4$
$2x_2 + x_3 = 2$
$x_1, x_2, x_3 \ge 0$.

Let us try to produce a feasible solution starting with $(x_1, x_2, x_3) = (0, 0, 0)$. This vector is nonnegative, but of course it is not feasible, since it does not satisfy the equations of the linear program. We introduce auxiliary variables $x_4$ and $x_5$ as "corrections" of infeasibility: $x_4 = 4 - x_1 - 3x_2 - x_3$ expresses by how much the original variables $x_1, x_2, x_3$ fail to satisfy the first equation, and $x_5 = 2 - 2x_2 - x_3$ plays a similar role for the second equation. If we managed to find nonnegative values of $x_1, x_2, x_3$ for which both of these corrections come out as zeros, we would have a feasible solution of the considered linear program. The task of finding nonnegative $x_1, x_2, x_3$ with zero corrections can be captured by a linear program:

maximize $-x_4 - x_5$
subject to $x_1 + 3x_2 + x_3 + x_4 = 4$
$2x_2 + x_3 + x_5 = 2$
$x_1, x_2, x_3, x_4, x_5 \ge 0$.

The optimal value of the objective function $-x_4 - x_5$ is 0 exactly if there exist values of $x_1, x_2, x_3$ with zero corrections, i.e., a feasible solution of the original linear program. So this is the right auxiliary linear program. The variables $x_4$ and $x_5$ form a feasible basis, with the basic feasible solution $(0, 0, 0, 4, 2)$. (Here we use the fact that the right-hand sides 4 and 2 are nonnegative, but since we deal with equations, this can always be achieved by sign changes.) Once we express the objective function using the nonbasic variables, that is, in the form $z = -6 + x_1 + 5x_2 + 2x_3$, we can start the simplex method on the auxiliary linear program.

The auxiliary linear program is surely bounded, since the objective function cannot be positive, and thus the simplex method computes a basic feasible solution that is optimal. As training, the reader can check that if we let $x_1$ enter the basis in the first pivot step and $x_3$ in the second, the final simplex tableau comes out as

$x_1 = 2 - x_2 - x_4 + x_5$
$x_3 = 2 - 2x_2 - x_5$
$z = -x_4 - x_5$

The corresponding optimal solution $(2, 0, 2, 0, 0)$ yields a basic feasible solution of the original linear program, namely $(x_1, x_2, x_3) = (2, 0, 2)$.
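As a sketch in Python (numpy assumed; the variable names are ours), the data of this auxiliary program and the rewritten objective can be produced as follows:

    import numpy as np

    A = np.array([[1.0, 3.0, 1.0], [0.0, 2.0, 1.0]])
    b = np.array([4.0, 2.0])
    A_aux = np.hstack([A, np.eye(2)])        # columns of x4, x5 appended
    c_aux = np.array([0, 0, 0, -1.0, -1.0])  # maximize -x4 - x5

    # Feasible start: x1 = x2 = x3 = 0, corrections x4 = 4, x5 = 2.
    x = np.array([0, 0, 0, 4.0, 2.0])
    assert np.allclose(A_aux @ x, b)
    # In the nonbasic variables, z = -x4 - x5 = -6 + x1 + 5x2 + 2x3:
    coef = c_aux[:3] + A.T @ np.ones(2)
    print(coef, -b.sum())                    # [1. 5. 2.] -6.0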
The initial simplex tableau for the original linear program can even be obtained from the final tableau of the auxiliary linear program, by leaving out the columns of the auxiliary variables $x_4$ and $x_5$ (it may happen that some auxiliary variables are zero but still basic in the final tableau of the auxiliary program, and then they cannot simply be left out; Section 5.6 discusses this easy issue) and by changing the objective function back to the original one, expressed in terms of the nonbasic variables:

$x_1 = 2 - x_2$
$x_3 = 2 - 2x_2$
$z = 2 + x_2$

Starting from this tableau, a single pivot step already reaches the optimum.

5.5 Simplex Tableaus in General

In this section and the next one we formulate in general, and mostly with proofs, what has previously been explained on examples. Let us consider a general linear program in equational form: maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$. The simplex method applied to it computes a sequence of simplex tableaus. Each of them corresponds to a feasible basis $B$, and it determines a basic feasible solution, as we will soon verify. Let us recall that a feasible basis is an $m$-element set $B \subseteq \{1, 2, \dots, n\}$ such that the matrix $A_B$ is nonsingular and the unique solution of the system $A_Bx_B = b$ is nonnegative.

Formally, we will define a simplex tableau as a certain system of linear equations of a special form, in which the basic variables and the variable $z$, representing the value of the objective function, stand on the left-hand side and are expressed in terms of the nonbasic variables.

A simplex tableau $T(B)$ determined by a feasible basis $B$ is a system of $m + 1$ linear equations in variables $x_1, x_2, \dots, x_n$ and $z$ that has the same set of solutions as the system $Ax = b$, $z = c^Tx$, and in matrix notation looks as follows:

$x_B = p + Qx_N$
$z = z_0 + r^Tx_N$

where $x_B$ is the vector of the basic variables, $N = \{1, 2, \dots, n\} \setminus B$, $x_N$ is the vector of nonbasic variables, $p \in \mathbb{R}^m$, $r \in \mathbb{R}^{n-m}$, $Q$ is an $m \times (n-m)$ matrix, and $z_0 \in \mathbb{R}$.

The basic feasible solution corresponding to this tableau can be read off immediately: it is obtained by substituting $x_N = 0$, that is, we have $x_B = p$. From the feasibility of the basis $B$ we see that $p \ge 0$. The objective function for this basic feasible solution has value $z_0 + r^T0 = z_0$.

The values of $p, Q, r, z_0$ can easily be expressed using $B$ and $A, b, c$:

5.5.1 Lemma. For each feasible basis $B$ there exists exactly one simplex tableau, and it is given by

$Q = -A_B^{-1}A_N$,   $p = A_B^{-1}b$,   $z_0 = c_B^TA_B^{-1}b$,   $r = c_N - (c_B^TA_B^{-1}A_N)^T$.

It is neither necessary nor very useful to remember these formulas; they are easily rederived if needed. The proof is not very exciting, and so we write it more concisely than other parts of the text, leaving some details to a diligent reader. We will proceed similarly with subsequent proofs of a similar kind.

Proof. First let us see how these formulas can be discovered. We rewrite the system $Ax = b$ to $A_Bx_B = b - A_Nx_N$, and we multiply it by the inverse matrix $A_B^{-1}$ from the left (these transformations preserve the solution set), which leads to

$x_B = A_B^{-1}b - A_B^{-1}A_Nx_N$.

We substitute the right-hand side for $x_B$ into the equation $z = c^Tx = c_B^Tx_B + c_N^Tx_N$, and we obtain

$z = c_B^T(A_B^{-1}b - A_B^{-1}A_Nx_N) + c_N^Tx_N = c_B^TA_B^{-1}b + (c_N^T - c_B^TA_B^{-1}A_N)x_N$.

Thus the formulas in the lemma do yield a simplex tableau, and it remains to verify the uniqueness. So let $p, Q, r, z_0$ determine a simplex tableau for a feasible basis $B$, and let $p', Q', r', z_0'$ do so as well. Since each choice of $x_N$ determines $x_B$ uniquely, the equality $p + Qx_N = p' + Q'x_N$ has to hold for all $x_N \in \mathbb{R}^{n-m}$. The choice $x_N = 0$ gives $p = p'$, and if we substitute the unit vectors $e_j$ of the standard basis for $x_N$, one by one, we also get $Q = Q'$. The equalities $z_0 = z_0'$ and $r = r'$ are proved similarly.
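The formulas of Lemma 5.5.1 translate directly into numpy (a sketch using 0-based column indices; explicit inversion is fine for illustration, though real implementations avoid it, as discussed in Section 5.6):

    import numpy as np

    def tableau(A, b, c, B):
        """Compute (p, Q, z0, r) of the simplex tableau T(B)."""
        N = [j for j in range(A.shape[1]) if j not in B]
        AB_inv = np.linalg.inv(A[:, B])
        Q  = -AB_inv @ A[:, N]
        p  = AB_inv @ b
        z0 = c[B] @ AB_inv @ b
        r  = c[N] - c[B] @ AB_inv @ A[:, N]
        return p, Q, z0, r

    # The program of Section 5.1 in equational form; basis {2, 4, 5}
    # corresponds to 0-based columns [1, 3, 4]:
    A = np.array([[-1.0, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1]])
    b = np.array([1.0, 3, 2]); c = np.array([1.0, 1, 0, 0, 0])
    print(tableau(A, b, c, B=[1, 3, 4]))  # p=(1,3,1), z0=1, r=(2,-1), as in the text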
5.6 The Simplex Method in General

Optimality. Exactly as in the concrete example in Section 5.1, we have the following criterion of optimality of a simplex tableau:

If $T(B)$ is a simplex tableau such that the coefficients of the nonbasic variables are nonpositive in the last row, i.e., if $r \le 0$, then the corresponding basic feasible solution is optimal.

Indeed, the basic feasible solution corresponding to such a tableau has the objective function equal to $z_0$, while for any other feasible solution $x$ we have $x_N \ge 0$, and thus $c^Tx = z_0 + r^Tx_N \le z_0$.

A pivot step: who enters and who leaves. In each step of the simplex method we go from an old basis $B$ and simplex tableau $T(B)$ to a new basis $B'$ and the corresponding simplex tableau $T(B')$. A nonbasic variable $x_v$ enters the basis and a basic variable $x_u$ leaves the basis, and hence $B' = (B \setminus \{u\}) \cup \{v\}$.

We always select the entering variable $x_v$ first. A nonbasic variable may enter the basis if and only if its coefficient in the last row of the simplex tableau is positive, since only incrementing such a nonbasic variable increases the value of the objective function. Usually there are several positive coefficients in the last row, and hence several possible choices of the entering variable; for the time being, the reader may think of this choice as arbitrary. We will discuss ways of selecting one of these possibilities in Section 5.7.

Once we decide that the entering variable is some $x_v$, it remains to pick the leaving variable. The leaving variable $x_u$ has to be such that its nonnegativity, together with the corresponding equation in the simplex tableau (the one with $x_u$ on the left-hand side), limits the increment of the entering variable $x_v$ most strictly. Expressed by a formula, this condition might look complicated because of some double indices, but the idea is simple and we have already seen it in the examples.

Let us write $B = \{k_1, k_2, \dots, k_m\}$, $k_1 < k_2 < \cdots < k_m$, and $N = \{\ell_1, \ell_2, \dots, \ell_{n-m}\}$, $\ell_1 < \ell_2 < \cdots < \ell_{n-m}$. Then the $i$th equation of the simplex tableau has the form

$x_{k_i} = p_i + \sum_{j=1}^{n-m} q_{ij}x_{\ell_j}$.

We now want to write the index $v$ of the chosen entering variable as $v = \ell_\beta$; in more detail, we define $\beta \in \{1, 2, \dots, n-m\}$ as the index for which $v = \ell_\beta$. Similarly, the index $u$ of the leaving variable, which hasn't been selected yet, will be written in the form $u = k_\alpha$.

Since all nonbasic variables $x_{\ell_j}$, $j \ne \beta$, should remain zero, the nonnegativity condition $x_{k_i} \ge 0$ limits the possible values of the entering variable $x_{\ell_\beta}$ by the inequality $p_i + q_{i\beta}x_{\ell_\beta} \ge 0$. If $q_{i\beta} \ge 0$, then this inequality doesn't restrict the increase of $x_{\ell_\beta}$ in any way, while for $q_{i\beta} < 0$ it yields the restriction $x_{\ell_\beta} \le -p_i/q_{i\beta}$. The leaving variable $x_{k_\alpha}$ is thus always such that $q_{\alpha\beta} < 0$ and

$-\frac{p_\alpha}{q_{\alpha\beta}} = \min\left\{-\frac{p_i}{q_{i\beta}} : q_{i\beta} < 0,\ i = 1, 2, \dots, m\right\}$.     (5.3)

That is, in the simplex tableau we consider only the rows in which the coefficient of $x_v$ is negative. In each such row we divide the component of the vector $p$ by this coefficient, we change the sign, and we seek a minimum among these ratios. If there is no row with a negative coefficient of $x_v$, i.e., if the minimum on the right-hand side of (5.3) is over an empty set, then the linear program is unbounded and the computation finishes.

For a proof that the simplex method really goes through a sequence of feasible bases we need the following lemma.

5.6.1 Lemma. If $B$ is a feasible basis, $T(B)$ is the corresponding simplex tableau, and the entering variable $x_v$ and the leaving variable $x_u$ have been selected according to the criteria described above (and otherwise arbitrarily), then $B' = (B \setminus \{u\}) \cup \{v\}$ is again a feasible basis. If no $x_u$ satisfies the criterion for a leaving variable, then the linear program is unbounded: for all $t \ge 0$ we obtain a feasible solution by substituting $t$ for $x_v$ and 0 for all other nonbasic variables, and the value of the objective function for these feasible solutions tends to infinity as $t \to \infty$.
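The ratio test (5.3) in code form, under the same toy conventions as before:

    import numpy as np

    def choose_leaving(p, Q, e):
        """Pivot row for entering column e, or None if the program is unbounded."""
        rows = [i for i in range(len(p)) if Q[i, e] < 0]
        if not rows:
            return None                       # no row restricts the increase
        return min(rows, key=lambda i: -p[i] / Q[i, e])

    # Second tableau of Section 5.1: x2 = 1 + x1 - x3, x4 = 3 - x1, x5 = 1 - x1 + x3.
    p = np.array([1.0, 3.0, 1.0])
    Q = np.array([[1.0, -1.0], [-1.0, 0.0], [-1.0, 1.0]])
    print(choose_leaving(p, Q, e=0))          # -> 2: the x5-row wins (x1 <= 1)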
The proof is one of those not essential for a basic understanding of the material.

Proof (sketch). We first need to verify that the matrix $A_{B'}$ is nonsingular. This holds exactly if $A_B^{-1}A_{B'}$ is nonsingular, since we assume nonsingularity of $A_B$. The matrix $A_{B'}$ agrees with $A_B$ in the $m - 1$ columns corresponding to the basic variable indices $B \setminus \{u\}$; for the basic variable with index $k_i$, $i \ne \alpha$, we get the unit vector $e_i$ in the corresponding column of $A_B^{-1}A_{B'}$. The negative of the remaining column of the matrix $A_B^{-1}A_{B'}$ occurs in the simplex tableau $T(B)$ as the column of the entering variable $x_v$, since $Q = -A_B^{-1}A_N$ by Lemma 5.5.1. There is a nonzero number $q_{\alpha\beta}$ in the row $\alpha$ corresponding to the leaving variable $x_u$, since we have selected $x_u$ that way, and the other columns of $A_B^{-1}A_{B'}$ have 0 in that row. Hence the matrix is nonsingular as claimed.

Next, we need to check the feasibility of the basis $B'$. Here we use the fact that the new basic feasible solution (that for $B'$) can be written in terms of the old one, and the nonnegativity conditions on its basic variables are exactly those used for choosing the leaving variable. In practically the same way one can show the part of the lemma dealing with unbounded linear programs. We omit further details.

A geometric view. As we saw in Section 4.4, basic feasible solutions are vertices of the polyhedron of feasible solutions. It is not hard to verify that a pivot step of the simplex method corresponds to a move from one vertex to another along an edge of the polyhedron, where an edge is a 1-dimensional face, i.e., a segment connecting the considered vertices (see Section 4.4). Degenerate pivot steps are an exception: there we stay at the same vertex, and only the feasible basis changes. A vertex of an $n$-dimensional convex polyhedron is generally determined by $n$ of the bounding hyperplanes (think of a 3-dimensional cube, say). Degeneracy can occur only if we have more than $n$ of the bounding hyperplanes meeting at a vertex; this happens for the 3-dimensional regular octahedron, for example.

Organization of the computations. Whenever we find a new feasible basis as above, we could compute the new simplex tableau according to the formulas from Lemma 5.5.1. But this is never done, since it is inefficient. For hand calculation, the new simplex tableau is computed from the old one; we have already illustrated one possible approach in the examples. We take the equation of the old tableau with the leaving variable $x_u$ on the left, and in this equation we carry the entering variable $x_v$ over to the left and $x_u$ to the right. The modified equation becomes the equation for $x_v$ in the new tableau. The right-hand side is then substituted for $x_v$ into all of the other equations, including the one for $z$ in the last row. This finishes the construction of the new tableau.

In computer implementations of the simplex method the simplex tableau is typically not computed in full. Rather, only the basic components of the basic feasible solution, i.e., the vector $p = A_B^{-1}b$, and the matrix $A_B^{-1}$ are maintained; the latter allows for a fast computation of the other entries of the simplex tableau when they are needed. Let us note that for the optimality test and for selecting the entering variable we need only the last row, and for selecting the leaving variable we need only $p$ and the column of the entering variable. With respect to efficiency and numerical accuracy, the explicit inverse $A_B^{-1}$ is not the best choice, and in practice it is often represented by an (approximate) LU-factorization of the matrix $A_B$, or by other devices that can easily be updated during a pivot step of the simplex method.
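A sketch of this idea in numpy: given only the basis $B$, one can recompute exactly the pieces of the tableau that a pivot step needs, solving systems with $A_B$ instead of storing $Q$. (A real implementation would keep an updatable factorization of $A_B$ rather than calling a dense solver each time; the function name is ours.)

    import numpy as np

    def priced_column(A, b, c, B, v):
        """p, the z-row coefficient of x_v, and x_v's tableau column, from B alone."""
        AB = A[:, B]
        p = np.linalg.solve(AB, b)            # basic solution: A_B p = b
        y = np.linalg.solve(AB.T, c[B])       # "prices": A_B^T y = c_B
        r_v = c[v] - y @ A[:, v]              # reduced cost of x_v (z-row entry)
        col = -np.linalg.solve(AB, A[:, v])   # column of Q belonging to x_v
        return p, r_v, col

    # Same data as before; basis {2, 4, 5} (columns [1, 3, 4]), pricing x1:
    A = np.array([[-1.0, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1]])
    b = np.array([1.0, 3, 2]); c = np.array([1.0, 1, 0, 0, 0])
    print(priced_column(A, b, c, B=[1, 3, 4], v=0))  # r_v = 2, column (1, -1, -1)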
Since an efficient implementation of the simplex method is not among our main concerns, we will not describe how these things are actually done. This computational approach is called the revised simplex method. For $m$ considerably smaller than $n$ it is usually much more efficient than maintaining the full simplex tableau. In particular, $O(m^2)$ arithmetic operations per pivot step are sufficient for maintaining an LU-factorization of $A_B$, as opposed to about $mn$ operations required for maintaining the whole simplex tableau.

Computing an initial feasible basis. If the given linear program has no obvious feasible basis, we look for an initial feasible basis by the procedure indicated in Section 5.4. For a linear program in the usual equational form, maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$, we first arrange for $b \ge 0$, by multiplying the equations with $b_i < 0$ by $-1$. Then we introduce $m$ new variables $x_{n+1}$ through $x_{n+m}$, and we solve the auxiliary linear program

maximize $-(x_{n+1} + x_{n+2} + \cdots + x_{n+m})$
subject to $\bar{A}\bar{x} = b$ and $\bar{x} \ge 0$,

where $\bar{x} = (x_1, \dots, x_{n+m})$ is the vector of all variables, including the new ones, and $\bar{A} = (A \mid I_m)$ is obtained from $A$ by appending the $m \times m$ identity matrix on the right.

The original linear program is feasible if and only if every optimal solution of the auxiliary linear program satisfies $x_{n+1} = x_{n+2} = \cdots = x_{n+m} = 0$. Indeed, it is clear that an optimal solution of the auxiliary linear program with $x_{n+1} = x_{n+2} = \cdots = x_{n+m} = 0$ yields a feasible solution of the original linear program. Conversely, any feasible solution of the original linear program provides a feasible solution of the auxiliary linear program that has the objective function equal to 0 and is thus optimal.

The auxiliary linear program can be solved by the simplex method directly, since the new variables $x_{n+1}$ through $x_{n+m}$ constitute an initial feasible basis. In this way we obtain some optimal solution. If it doesn't satisfy $x_{n+1} = x_{n+2} = \cdots = x_{n+m} = 0$, we are done: the original linear program is infeasible. So let us assume that the optimal solution of the auxiliary linear program has $x_{n+1} = x_{n+2} = \cdots = x_{n+m} = 0$.

The simplex method always returns a basic feasible solution. If none of the new variables $x_{n+1}$ through $x_{n+m}$ are in the basis for the returned optimal solution, then such a basis is a feasible basis for the original linear program too, and it allows us to start the simplex method on it.

In some degenerate cases it may happen that the basis returned by the simplex method for the auxiliary linear program contains some of the variables $x_{n+1}, \dots, x_{n+m}$, and such a basis cannot directly be used for the original linear program. But this is a cosmetic problem only: from the returned optimal solution one can get a feasible basis of the original linear program by simple linear algebra. Namely, the optimal solution has at most $m$ nonzero components, and their columns in the matrix $A$ are linearly independent. If these columns are fewer than $m$, we can add more linearly independent columns and thus get a basis; see the proof of Lemma 4.2.1.
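In code, setting up the auxiliary program is mechanical (a sketch; numpy assumed):

    import numpy as np

    def phase_one_data(A, b):
        """Data of the auxiliary program: flip rows with b_i < 0, append I_m,
        maximize -(x_{n+1} + ... + x_{n+m}); the new variables form a basis."""
        A, b = A.copy(), b.copy()
        neg = b < 0
        A[neg] *= -1.0; b[neg] *= -1.0           # now b >= 0
        m, n = A.shape
        A_bar = np.hstack([A, np.eye(m)])        # A_bar = (A | I_m)
        c_bar = np.concatenate([np.zeros(n), -np.ones(m)])
        basis = list(range(n, n + m))            # 0-based indices of new variables
        return A_bar, b, c_bar, basis

    A = np.array([[1.0, 3, 1], [0, 2, 1]]); b = np.array([4.0, 2])
    A_bar, b1, c_bar, basis = phase_one_data(A, b)
    print(A_bar.shape, c_bar, basis)   # (2, 5) [ 0. 0. 0. -1. -1.] [3, 4]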
5.7 Pivot Rules

A pivot rule is a rule for selecting the entering variable if there are several possibilities, which is usually the case. (Sometimes there may also be more than one possibility for choosing the leaving variable, and some pivot rules specify this choice as well, but this part is typically not so important.) The number of pivot steps needed for solving a linear program depends substantially on the pivot rule (see the example in Section 5.1). The problem is, of course, that we do not know in advance which choices will be good in the long run.

Here we list some of the common pivot rules. By an improving variable we mean any nonbasic variable with a positive coefficient in the $z$-row of the simplex tableau; in other words, a candidate for the entering variable.

Largest coefficient. Choose an improving variable with the largest coefficient in the row of the objective function $z$. This is the original rule suggested by Dantzig; it maximizes the improvement of $z$ per unit increase of the entering variable.

Largest increase. Choose an improving variable that leads to the largest absolute improvement in $z$. This rule is computationally more expensive than the largest coefficient rule, but it locally maximizes the progress.

Steepest edge. Choose an improving variable whose entering into the basis moves the current basic feasible solution in a direction closest to the direction of the vector $c$. Written as a formula, the ratio

$\frac{c^T(x_{\mathrm{new}} - x_{\mathrm{old}})}{\|x_{\mathrm{new}} - x_{\mathrm{old}}\|}$

should be maximized, where $x_{\mathrm{old}}$ is the basic feasible solution for the current simplex tableau and $x_{\mathrm{new}}$ is the basic feasible solution for the tableau that would be obtained by entering the considered improving variable into the basis. (We recall that $\|v\| = (v_1^2 + v_2^2 + \cdots + v_n^2)^{1/2} = \sqrt{v^Tv}$ denotes the Euclidean length of the vector $v$, and that the expression $u^Tv/(\|u\|\,\|v\|)$ is the cosine of the angle of the vectors $u$ and $v$.) The steepest edge rule is a champion among pivot rules in practice: according to extensive computational studies, it is usually faster than all of the other pivot rules described here, and many others. An efficient approximate implementation of this rule is discussed in the glossary under the heading Devex.

Bland's rule. Choose the improving variable with the smallest index, and if there are several possibilities for the leaving variable, also take the one with the smallest index. Bland's rule is theoretically very significant, since it prevents cycling, as we will discuss in Section 5.8.

Random edge. Select the entering variable uniformly at random among all improving variables. This is the simplest example of a randomized pivot rule, one where the choice of the entering variable uses random numbers in some way. Randomized rules are also very important theoretically, since they lead to the current best provable bounds for the number of pivot steps of the simplex method.
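Two of these rules are easy to state in code (a sketch under the tableau conventions of Section 5.5, where $r$ is the $z$-row and nonbasic[j] is the index of the $j$th nonbasic variable):

    import numpy as np

    def entering_largest_coefficient(r, nonbasic):
        """Dantzig's rule: improving variable with the largest r_j."""
        j = int(np.argmax(r))
        return j if r[j] > 0 else None          # None: the tableau is optimal

    def entering_bland(r, nonbasic):
        """Bland's rule: improving variable with the smallest variable index."""
        improving = [j for j in range(len(r)) if r[j] > 0]
        return min(improving, key=lambda j: nonbasic[j]) if improving else None

    r = np.array([1.0, 2.0])
    print(entering_largest_coefficient(r, [2, 5]),  # -> 1 (coefficient 2 wins)
          entering_bland(r, [2, 5]))                # -> 0 (x2 has the smaller index)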
5.8 The Struggle Against Cycling

As we have already mentioned, it may happen that for some linear programs the simplex method cycles, and theoretically this is the only possibility of how it may fail. Such a situation is encountered very rarely in practice, if at all, and thus many implementations simply ignore the possibility of cycling. There are several ways that provably avoid cycling. One of them is the already mentioned Bland's rule; we prove below that the simplex method never cycles if Bland's rule is applied consistently. Unfortunately, regarding efficiency, Bland's rule is one of the slowest pivot rules, and it is almost never used in practice.

Another possibility can be found in the literature under the heading lexicographic rule, and here we only sketch it. Cycling can occur only for degenerate linear programs, where degeneracy may lead to ties in the choice of the leaving variable. The lexicographic method breaks these ties as follows. Suppose that we have a set $S$ of row indices such that for all $\alpha \in S$ we have $q_{\alpha\beta} < 0$ and

$-\frac{p_\alpha}{q_{\alpha\beta}} = \min\left\{-\frac{p_i}{q_{i\beta}} : q_{i\beta} < 0,\ i = 1, 2, \dots, m\right\}$;

in other words, all indices in $S$ are candidates for the row of the leaving variable. We then choose the index $\alpha \in S$ for which the vector

$\left(\frac{q_{\alpha 1}}{q_{\alpha\beta}}, \frac{q_{\alpha 2}}{q_{\alpha\beta}}, \dots, \frac{q_{\alpha,n-m}}{q_{\alpha\beta}}\right)$

is the smallest in the lexicographic ordering. (We recall that a vector $x \in \mathbb{R}^k$ is lexicographically smaller than a vector $y \in \mathbb{R}^k$ if $x_1 < y_1$, or if $x_1 = y_1$ and $x_2 < y_2$, etc.; in general, if there is an index $j \le k$ such that $x_1 = y_1, \dots, x_{j-1} = y_{j-1}$ and $x_j < y_j$.) Since the matrix $A$ has rank $m$, it can be checked that any two of those vectors indeed differ at some index, and so we can resolve the ties among any set $S$ of rows. The chosen row index determines the leaving variable.

It can be shown that under the lexicographic rule cycling is impossible. In very degenerate cases the lexicographic rule can be quite costly, since it may have to compute many components of the aforementioned vectors before the ties are eventually broken.

Geometrically, the lexicographic rule has the following interpretation. For linear programs in equational form, degeneracy means that the set $F$ of solutions of the system $Ax = b$ contains a point with more than $n - m$ zero components, and thus it is not in general position with respect to the coordinate axes. The lexicographic rule has essentially the same effect as a well-chosen perturbation of the set $F$, achieved by changing the vector $b$ a little: this brings $F$ into general position and therefore resolves all ties, while the optimal solution changes only by very little. The lexicographic rule simulates the effects of a suitable infinitesimal perturbation.
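A sketch of the tie-breaking step in Python (exact float comparison is used to detect ties, which is fine for illustration but not for serious numerics; it assumes at least one candidate row exists):

    import numpy as np

    def lexicographic_leaving(p, Q, e):
        """Among the tied candidate rows for column e, pick the one whose
        scaled row Q[i] / Q[i, e] is lexicographically smallest."""
        cand = [i for i in range(len(p)) if Q[i, e] < 0]
        t = min(-p[i] / Q[i, e] for i in cand)
        S = [i for i in cand if -p[i] / Q[i, e] == t]       # the tied rows
        return min(S, key=lambda i: tuple(Q[i] / Q[i, e]))  # tuples compare lexicographically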
Now we return to Bland's rule.

5.8.1 Theorem. The simplex method with Bland's pivot rule (the entering variable is the one with the smallest index among all eligible variables, and similarly for the leaving variable) is always finite; i.e., cycling is impossible.

This is a basic result in the theory of linear programming (the duality theorem is an easy consequence, for example). Unfortunately, the proof is somewhat demanding. Its plot is simple, though: assuming that there is a cycle, we get a contradiction in the form of an auxiliary linear program that has an optimal solution and is unbounded at the same time.

Proof. We assume that there is a cycle, and we let the set $F$ consist of the indices of all variables that enter (and therefore also leave) the basis at least once during the cycle. We call these the fickle variables.

First we verify a general claim about cycling of the simplex method, valid for any pivot rule.

Claim. All bases encountered in the cycle yield the same basic feasible solution, and all the fickle variables are 0 in it.

Proof of the claim. Since the objective function never decreases, it has to stay constant along the cycle. Let $B$ be a feasible basis encountered along the cycle, let $N = \{1, 2, \dots, n\} \setminus B$ as usual, and let $B' = (B \setminus \{u\}) \cup \{v\}$ be the next basis. The only one among the nonbasic variables that may possibly change its value in the pivot step from $B$ to $B'$ is the entering variable $x_v$; all others remain nonbasic and thus 0. By the rule for selecting the entering variable, the coefficient of $x_v$ in the $z$-row of the tableau $T(B)$, i.e., in the vector $r$, is strictly positive. Since the objective function is given by $z = z_0 + r^Tx_N$, we see that if $x_v$ became strictly positive, the objective function would increase. Hence the basic feasible solutions corresponding to $B$ and $B'$, respectively, agree in all components in $N$. Since these components determine the remaining ones uniquely (Proposition 4.2.2), the basic feasible solution does not change at all. Finally, since every fickle variable is nonbasic at least once during the cycle, it has to be 0 all the time. The claim is proved.

The first trick in the proof of Theorem 5.8.1 is to consider the largest index $v$ in the set $F$. Let $B$ be a basis in the cycle just before $x_v$ enters, and let $B'$ be another basis just before $x_v$ leaves (and $x_u$ enters, say). Let $p, Q, r, z_0$ be the parameters of the simplex tableau $T(B)$, and let $p', Q', r', z_0'$ be the parameters of $T(B')$. (We remark that neither $B$ nor $B'$ has to be determined uniquely.)

Next, we use Bland's rule to infer some properties of the tableaus $T(B)$ and $T(B')$. First we focus on the situation at $B$. As in Section 5.6, we write $B$ and $N = \{1, 2, \dots, n\} \setminus B$ as ordered sets: $B = \{k_1, k_2, \dots, k_m\}$, $k_1 < k_2 < \cdots < k_m$, and $N = \{\ell_1, \ell_2, \dots, \ell_{n-m}\}$, $\ell_1 < \ell_2 < \cdots < \ell_{n-m}$. Since we have chosen $v$ as the largest index in $F$, and Bland's rule requires $v$ to be the smallest index of a candidate for entering the basis, no other fickle variable is a candidate at this point. Thus all fickle variables except for $x_v$ have nonpositive coefficients in the $z$-row of $T(B)$. Expressed formally, if $\beta$ is the index such that $v = \ell_\beta$, we have

$r_\beta > 0$, and $r_j \le 0$ for all $j$ such that $\ell_j \in F \setminus \{v\}$.     (5.4)

Second, we consider the tableau $T(B')$. We write $B' = \{k_1', k_2', \dots, k_m'\}$ and $N' = \{1, 2, \dots, n\} \setminus B' = \{\ell_1', \ell_2', \dots, \ell_{n-m}'\}$, we let $\alpha'$ be the index of the leaving variable $x_v$ in $B'$, i.e., the one with $k_{\alpha'}' = v$, and we let $\beta'$ be the index of the entering variable $x_u$ in $N'$, i.e., the one with $\ell_{\beta'}' = u$. By the same logic as above, $x_v$ is the only candidate for leaving the basis among all the basic fickle variables in $T(B')$. Recalling the criterion (5.3) for leaving the basis, we get that $i = \alpha'$ is the only $i$ with $k_i' \in F$ and $q_{i\beta'}' < 0$ that minimizes the ratio $-p_i'/q_{i\beta'}'$. Since $p'$ specifies the values of the basic variables, and all fickle variables remain 0 during the cycle, we have $p_i' = 0$ for all $i$ with $k_i' \in F$. Consequently,

$q_{\alpha'\beta'}' < 0$, and $q_{i\beta'}' \ge 0$ for all $i$ such that $k_i' \in F \setminus \{v\}$.     (5.5)

The idea is now to construct an auxiliary linear program for which (5.4) proves that it has an optimal solution, while (5.5) shows that it is unbounded. This is a clear contradiction, which rules out the initial assumption, the existence of a cycle. The auxiliary linear program is the following:

maximize $c^Tx$
subject to $Ax = b$
$x_{F \setminus \{v\}} \ge 0$
$x_v \le 0$
$x_{N \setminus F} = 0$.

We want to stress that here the variables $x_{B \setminus F}$ may assume any sign.

Optimality of the auxiliary linear program. We let $x^*$ be the basic feasible solution of our original linear program associated with the basis $B$. Since $x_N^* = 0$ and $x_F^* = 0$ (by the claim), $x^*$ is feasible for the auxiliary linear program. Moreover, for every $x$ satisfying $Ax = b$, the value of the objective function can be expressed as $c^Tx = z_0 + r^Tx_N$. For all feasible solutions $x$ of the auxiliary linear program we have

$x_{\ell_j} \ge 0$ if $\ell_j \in F \setminus \{v\}$, and $x_{\ell_j} \le 0$ if $\ell_j = \ell_\beta = v$,

and so (5.4) implies $r_jx_{\ell_j} \le 0$ for all $j$ such that $\ell_j \in F$. Together with $x_{N \setminus F} = 0$ we get $r^Tx_N \le 0$, and hence $z \le z_0$ for all feasible solutions of the auxiliary linear program. It follows that $x^*$ is an optimal solution of the auxiliary linear program.

Unboundedness of the auxiliary linear program. By the claim at the beginning of the proof, $x^*$ is also the basic feasible solution of our original linear program associated with the basis $B'$. For all solutions $x$ of $Ax = b$ we have

$x_{B'} = p' + Q'x_{N'}$.     (5.6)

Let us now change $x_{N'}$ by letting $x_u$ grow from its current value 0 to some value $t \ge 0$. Using (5.6), this determines a new solution $x(t)$ of $Ax = b$; we will show that for all $t \ge 0$ this solution is feasible for the auxiliary linear program, but that the objective function value $c^Tx(t)$ tends to infinity as $t \to \infty$. Here are the details. We set

$x_{\ell_j'}(t) = 0$ if $\ell_j' \in N' \setminus \{u\}$, and $x_{\ell_j'}(t) = t$ if $\ell_j' = \ell_{\beta'}' = u$.

With $x_v^* = 0$ and $t \ge 0$, (5.6) and (5.5) together show that

$x_{k_i'}(t) = x_{k_i'}^* + tq_{i\beta'}' \ge 0$ if $k_i' \in F \setminus \{v\}$, and $x_{k_i'}(t) \le 0$ if $k_i' = k_{\alpha'}' = v$.

In particular, $x(t)$ is again feasible for the auxiliary linear program. Since the variable $x_u = x_{\ell_{\beta'}'}$ was a candidate for entering the basis $B'$, we know that $r_{\beta'}' > 0$, and hence

$c^Tx(t) = z_0' + r'^Tx_{N'}(t) = z_0' + tr_{\beta'}' \to \infty$ for $t \to \infty$.

This means that the auxiliary linear program is unbounded, and this is the desired contradiction.
5.9 Efficiency of the Simplex Method

In practice the simplex method performs very satisfactorily even for large linear programs. Computational experiments indicate that for linear programs in equational form with $m$ equations it typically reaches an optimal solution in something between $2m$ and $3m$ pivot steps.

It was thus a great surprise when Klee and Minty constructed a linear program with $n$ nonnegative variables and $n$ inequalities for which the simplex method with Dantzig's original pivot rule (largest coefficient) needs exponentially many pivot steps, namely $2^n - 1$. The set of feasible solutions is an ingeniously deformed $n$-dimensional cube, called the Klee–Minty cube, constructed in such a way that the simplex method passes through all of its vertices.

It is not hard to see that there is a deformed $n$-dimensional cube with an $x_n$-increasing path through all of its vertices. Instead of a formal description, we illustrate such a construction by pictures for dimensions 2 and 3:

[Figure: deformed cubes for $n = 2$ and $n = 3$, each inscribed in an ordinary cube in order to better convey the shape, with the path through all vertices marked by a thick line.]

With some pivot rules the simplex method may traverse the path marked with a thick line. The particular deformed cube shown in the picture won't fool Dantzig's rule, for which the original example of Klee and Minty was constructed, though; a deformed cube that does fool Dantzig's rule looks more bizarre. The direction of the objective function is drawn vertically. The corresponding linear program with $n = 3$ variables is simple:

maximize $9x_1 + 3x_2 + x_3$
subject to $x_1 \le 1$
$6x_1 + x_2 \le 9$
$18x_1 + 6x_2 + x_3 \le 81$
$x_1, x_2, x_3 \ge 0$.

It is instructive to see how, after the standard conversion to equational form, this linear program forces Dantzig's rule to go through all feasible bases.
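One way to generate such programs for experiments is to generalize the pattern of the 3-variable example above (coefficients $2 \cdot 3^{i-j}$ and right-hand sides $3^{2(i-1)}$). The generalization is our reading of that pattern, so treat it as an assumption rather than as the canonical Klee–Minty construction:

    import numpy as np

    def klee_minty(n):
        """max sum_i 3^(n-i) x_i  s.t.  A x <= b, x >= 0, returned as (c, A, b)."""
        c = np.array([3.0 ** (n - i) for i in range(1, n + 1)])
        A = np.zeros((n, n)); b = np.zeros(n)
        for i in range(n):
            A[i, i] = 1.0
            for j in range(i):
                A[i, j] = 2.0 * 3.0 ** (i - j)
            b[i] = 3.0 ** (2 * i)
        return c, A, b

    print(klee_minty(3)[1])   # rows (1), (6, 1), (18, 6, 1); b = (1, 9, 81)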
Later on, very slow examples of a similar type were discovered for many other pivot rules, among them all of the rules mentioned above. Many people have tried to design a pivot rule and prove that the number of pivot steps is always bounded by some polynomial function of $m$ and $n$, but nobody has succeeded so far. The best known bound has been proved for the following simple randomized pivot rule: choose a random ordering of the variables at the beginning of the computation (in other words, randomly permute the indices of the variables in the input linear program), then use Bland's rule for choosing the entering variable and the lexicographic method for choosing the leaving variable. For every linear program with at most $n$ variables and at most $n$ constraints, the expected number of pivot steps is bounded by $e^{C\sqrt{n\ln n}}$, where $C$ is a (not too large) constant. Here the expectation means the arithmetic average over all possible orderings of the variables. This bound is considerably better than $2^n$, say, but much worse than a polynomial bound. The algorithm was found independently and almost at the same time by Kalai and by Matoušek, Sharir, and Welzl. For a recent treatment in a somewhat broader context see

B. Gärtner and E. Welzl: Explicit and implicit enforcing: randomized optimization, in Lectures of the Graduate Program Computational Discrete Mathematics, Lecture Notes in Computer Science 2122, Springer, Berlin etc., 2001, pages 26-49.

A very good bound is not known even for the cleverest possible pivot rule, let us call it the clairvoyant's rule, that would always select the shortest possible sequence of pivot steps leading to an optimal solution. The Hirsch conjecture, one of the famous open problems in mathematics, claims that the clairvoyant's rule always reaches an optimum in at most $O(n)$ pivot steps. But the best result proved so far gives only the bound of $n^{1+\ln n}$, due to

G. Kalai and D. Kleitman: Quasi-polynomial bounds for the diameter of graphs of polyhedra, Bull. Amer. Math. Soc. 26 (1992), 315-316.

This is better than $e^{C\sqrt{n\ln n}}$ but still worse than any polynomial function of $n$, and it doesn't provide a real pivot rule, since nobody knows how to simulate the clairvoyant's decisions by an efficient algorithm.

Here is an approach that looks promising and has been tried more recently, although without a clear success so far. One tries to modify the given linear program in such a way that polynomiality of a suitable pivot rule for the modified linear program would be easier to prove, and, of course, so that an optimal solution of the original linear program can easily be derived from an optimal solution of the modified linear program.

In spite of the Klee–Minty cube and similar artificial examples, the simplex method is being used successfully, and remarkable theoretical results indicate that these willful examples are rare indeed. For instance, it is known that if a linear program in equational form is generated in a suitable, precisely defined way at random, then the number of pivot steps is of order at most $m^2$ with high probability. More recent results, in the general framework of the so-called smoothed complexity, claim that if we take an arbitrary linear program and then change its coefficients by small random amounts, then the simplex method with a certain pivot rule reaches the optimum of the resulting linear program in polynomially many steps with high probability (a concrete bound on the polynomial depends on a precise specification of the "small random amounts" of change). The first theorem of this kind is due to Spielman and Teng, and for recent progress see

R. Vershynin: Beyond Hirsch conjecture: Walks on random polytopes and the smoothed complexity of the simplex method, preprint, 2006.

An exact formulation of these results requires a number of rather technical notions that we do not want to introduce here, and so we omit it.

5.10 Summary

Let us review the simplex method once again.

Algorithm: Simplex Method
1. Convert the input linear program to equational form: maximize $c^Tx$ subject to $Ax = b$ and $x \ge 0$, with $n$ variables and $m$ equations, where $A$ has rank $m$ (see Section 4.1).
2. If no feasible basis is available, arrange for $b \ge 0$ and solve the following auxiliary linear program by the simplex method: maximize $-(x_{n+1} + x_{n+2} + \cdots + x_{n+m})$ subject to $\bar{A}\bar{x} = b$ and $\bar{x} \ge 0$, where $x_{n+1}, \dots, x_{n+m}$ are new variables, $\bar{x} = (x_1, \dots, x_{n+m})$, and $\bar{A} = (A \mid I_m)$. If the optimal value of the objective function comes out negative, the original linear program is infeasible; stop. Otherwise, the first $n$ components of the optimal solution form a basic feasible solution of the original linear program.
3. For a feasible basis $B \subseteq \{1, 2, \dots, n\}$, compute the simplex tableau $T(B)$ of the form
$x_B = p + Qx_N$
$z = z_0 + r^Tx_N$.
4. If $r \le 0$ in the current simplex tableau, return an optimal solution ($p$ specifies the basic components, while the nonbasic components are 0); stop.
5. Otherwise, select an entering variable $x_v$ whose coefficient in the vector $r$ is positive. If there are several possibilities, use some pivot rule.
6. If the column of the entering variable $x_v$ in the simplex tableau is nonnegative, the linear program is unbounded; stop.
7. Otherwise, select a leaving variable $x_u$: consider all rows of the simplex tableau in which the coefficient of $x_v$ is negative; in each such row, divide the component of the vector $p$ by that coefficient and change the sign. The row of the leaving variable is one for which this ratio is minimal. If there are several possibilities, decide by a pivot rule (or arbitrarily, if the pivot rule doesn't specify how to break ties in this case).
8. Replace the current feasible basis $B$ by the new feasible basis $B' = (B \setminus \{u\}) \cup \{v\}$. Update the simplex tableau so that it corresponds to the new basis. Go to Step 4.

This is all we wanted to say about the simplex method here. May your pivot steps lead you straight to the optimum and never cycle!

6 Duality of Linear Programming

6.1 The Duality Theorem

Here we formulate arguably the most important theoretical result about linear programs. Let us consider the linear program

maximize $2x_1 + 3x_2$
subject to $4x_1 + 8x_2 \le 12$     (6.1)
$2x_1 + x_2 \le 3$
$3x_1 + 2x_2 \le 4$
$x_1, x_2 \ge 0$.

Without computing the optimum, we can immediately infer from the first inequality and from the nonnegativity constraints that the maximum of the objective function is not larger than 12, because for nonnegative $x_1$ and $x_2$ we have $2x_1 + 3x_2 \le 4x_1 + 8x_2 \le 12$. We obtain a better upper bound if we first divide the first inequality by two: $2x_1 + 3x_2 \le 2x_1 + 4x_2 \le 6$. An even better bound results if we add the first two inequalities together and divide by three, which leads to the inequality

$2x_1 + 3x_2 \le \tfrac{1}{3}\big((4x_1 + 8x_2) + (2x_1 + x_2)\big) \le \tfrac{1}{3}(12 + 3) = 5$,

and hence the objective function cannot be larger than 5.

How good an upper bound can we get in this way? And what does "in this way" mean? Let us begin with the latter question. From the constraints we are trying to derive an inequality of the form $d_1x_1 + d_2x_2 \le h$, where $d_1 \ge 2$, $d_2 \ge 3$, and $h$ is as small as possible. Then we can claim that for all $x_1, x_2 \ge 0$ we have

$2x_1 + 3x_2 \le d_1x_1 + d_2x_2 \le h$,

and therefore $h$ is an upper bound on the maximum of the objective function.

How can we derive such inequalities? We combine the three inequalities of the linear program with some nonnegative coefficients $y_1, y_2, y_3$ (nonnegativity is needed so that the direction of the inequalities is not reversed). We obtain

$(4y_1 + 2y_2 + 3y_3)x_1 + (8y_1 + y_2 + 2y_3)x_2 \le 12y_1 + 3y_2 + 4y_3$,

and thus $d_1 = 4y_1 + 2y_2 + 3y_3$, $d_2 = 8y_1 + y_2 + 2y_3$, and $h = 12y_1 + 3y_2 + 4y_3$. How do we choose the best coefficients $y_1, y_2, y_3$? We must ensure that $d_1 \ge 2$ and $d_2 \ge 3$, and we want $h$ to be as small as possible under these constraints. This is again a linear program:

minimize $12y_1 + 3y_2 + 4y_3$
subject to $4y_1 + 2y_2 + 3y_3 \ge 2$
$8y_1 + y_2 + 2y_3 \ge 3$
$y_1, y_2, y_3 \ge 0$.

It is called the linear program dual to the linear program (6.1) we started with. The dual linear program "guards" the original linear program from above, in the sense that every feasible solution $(y_1, y_2, y_3)$ of the dual linear program provides an upper bound on the maximum of the objective function in (6.1). How well does it guard? Perfectly! The optimal solution of the dual linear program is $y^* = (\tfrac{5}{16}, 0, \tfrac{1}{4})$, with objective function equal to 4.75, and this is also the optimal value of the linear program (6.1), which is attained for $x^* = (\tfrac{1}{2}, \tfrac{5}{4})$. The duality theorem asserts that the dual linear program always guards perfectly.
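The perfect guarding is easy to verify numerically for this example (a sketch; numpy assumed):

    import numpy as np

    A = np.array([[4.0, 8], [2, 1], [3, 2]])
    b = np.array([12.0, 3, 4])
    c = np.array([2.0, 3])
    x_star = np.array([0.5, 1.25])            # (1/2, 5/4)
    y_star = np.array([5 / 16, 0.0, 0.25])    # (5/16, 0, 1/4)
    assert (A @ x_star <= b + 1e-9).all() and (x_star >= 0).all()   # primal feasible
    assert (A.T @ y_star >= c - 1e-9).all() and (y_star >= 0).all() # dual feasible
    print(c @ x_star, b @ y_star)             # 4.75 4.75: the two optima coincide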
Let us repeat the above considerations in a more general setting, for a linear program of the form

maximize $c^Tx$ subject to $Ax \le b$ and $x \ge 0$,     (P)

where $A$ is a matrix with $m$ rows and $n$ columns. We are trying to combine the $m$ inequalities of the system $Ax \le b$ with some nonnegative coefficients $y_1, y_2, \dots, y_m$ so that the resulting inequality has the $j$th coefficient at least $c_j$, $j = 1, 2, \dots, n$, and the right-hand side is as small as possible. This leads to the dual linear program

minimize $b^Ty$ subject to $A^Ty \ge c$ and $y \ge 0$     (D)

(whoever doesn't believe this may write it out in components). In this context the linear program (P) is referred to as the primal linear program. From the way we have produced the dual linear program (D), we obtain the following result.

6.1.1 Proposition. For each feasible solution $y$ of the dual linear program (D), the value $b^Ty$ provides an upper bound on the maximum of the objective function of the linear program (P). In other words, for each feasible solution $x$ of (P) and each feasible solution $y$ of (D) we have $c^Tx \le b^Ty$. In particular, if (P) is unbounded, then (D) has to be infeasible, and if (D) is unbounded from below, then (P) is infeasible.

This proposition is usually called the weak duality theorem, "weak" because it expresses only the guarding of the primal linear program (P) by the dual linear program (D), but it doesn't say that the guarding is perfect. The latter is expressed only by the duality theorem, sometimes also called the strong duality theorem.

Duality theorem of linear programming. For the linear programs

maximize $c^Tx$ subject to $Ax \le b$ and $x \ge 0$     (P)

and

minimize $b^Ty$ subject to $A^Ty \ge c$ and $y \ge 0$,     (D)

exactly one of the following possibilities occurs:
1. Neither (P) nor (D) has a feasible solution.
2. (P) is unbounded and (D) has no feasible solution.
3. (P) has no feasible solution and (D) is unbounded.
4. Both (P) and (D) have a feasible solution. Then both have an optimal solution, and if $x^*$ is an optimal solution of (P) and $y^*$ is an optimal solution of (D), then $c^Tx^* = b^Ty^*$. That is, the maximum of (P) equals the minimum of (D).

The duality theorem might look complicated at first encounter. For understanding it better, it may be useful to consider a simpler version, called the Farkas lemma and discussed in Section 6.4. This simpler statement has several intuitive interpretations, and it contains the essence of the duality theorem. Proving the duality theorem, which we will undertake in Sections 6.3 and 6.4, does take some work, unlike the proof of the weak duality theorem, which is quite easy. The heart of the duality theorem is the equality $c^Tx^* = b^Ty^*$ in the fourth possibility, i.e., for both (P) and (D) feasible.

Since a linear program can be either feasible and bounded, or feasible and unbounded, or infeasible, there are 3 possibilities for (P) and 3 possibilities for (D), which at first sight gives 9 possible combinations for (P) and (D). The three cases "(P) unbounded and (D) feasible bounded," "(P) unbounded and (D) unbounded," and "(P) feasible bounded and (D) unbounded" are ruled out by the weak duality theorem. In the proof of the duality theorem we will rule out the cases "(P) infeasible and (D) feasible bounded" as well as "(P) feasible bounded and (D) infeasible." This leaves us with the four cases listed in the duality theorem, and all of them can indeed occur.

Once again feasibility versus optimality. In Chapter 1 we remarked that finding a feasible solution of a linear program is in general computationally as difficult as finding an optimal solution, and there we briefly substantiated this claim using binary search. The duality theorem provides a considerably more elegant argument: the linear program (P) has an optimal solution if and only if the following linear program, obtained by combining the constraints of (P), the constraints of (D), and an inequality between the objective functions, has a feasible solution:

maximize $c^Tx$ (the objective function is immaterial here)
subject to $Ax \le b$
$A^Ty \ge c$
$c^Tx \ge b^Ty$
$x \ge 0$, $y \ge 0$,

where the variables are $x_1, \dots, x_n$ and $y_1, \dots, y_m$. Moreover, for each feasible solution $(x, y)$ of this last linear program, $x$ is an optimal solution of the linear program (P). All of this is a simple consequence of the duality theorem.
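Building this combined system from $(A, b, c)$ is pure bookkeeping (a sketch; the function name is ours):

    import numpy as np

    def optimality_system(A, b, c):
        """Return (G, h): a point (x, y) >= 0 with G @ (x, y) <= h exists
        iff (P) has an optimal solution, and then x is one."""
        m, n = A.shape
        top  = np.hstack([A, np.zeros((m, m))])        # Ax <= b
        mid  = np.hstack([np.zeros((n, n)), -A.T])     # -A^T y <= -c
        last = np.concatenate([-c, b])                 # -c^T x + b^T y <= 0
        G = np.vstack([top, mid, last])
        h = np.concatenate([b, -c, [0.0]])
        return G, h

    A = np.array([[4.0, 8], [2, 1], [3, 2]]); b = np.array([12.0, 3, 4])
    c = np.array([2.0, 3])
    G, h = optimality_system(A, b, c)
    v = np.concatenate([[0.5, 1.25], [5 / 16, 0, 0.25]])
    print((G @ v <= h + 1e-9).all())   # True: the optimal pair of (6.1) passes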
6.2 Dualization for Everyone

The duality theorem is valid for every linear program, not only for one of the form (P); we only have to construct the dual linear program properly. To this end, we can convert the given linear program to the form (P), using the tricks from Sections 1.1 and 4.1, and then the dual linear program has the form (D). The result can often be simplified; for example, the difference of two nonnegative variables can be replaced by a single unbounded variable, one that may attain all real values.

Simpler than doing this again and again is to adhere to the recipe below, whose validity can be proved by the just mentioned procedure. Let us assume that the primal linear program has variables $x_1, x_2, \dots, x_n$, among which some may be nonnegative, some nonpositive, and some unbounded. Let the constraints be $C_1, C_2, \dots, C_m$, where $C_i$ has the form

$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \ \{\le, \ge, =\}\ b_i$

(the nonnegativity or nonpositivity constraints for the variables are not counted among the $C_i$). The objective function $c^Tx$ is to be maximized. Then the dual linear program has variables $y_1, y_2, \dots, y_m$, where $y_i$ corresponds to the constraint $C_i$ and satisfies $y_i \ge 0$, $y_i \le 0$, or $y_i \in \mathbb{R}$ if we have $\le$, $\ge$, or $=$ in $C_i$, respectively. The constraints of the dual linear program are $Q_1, Q_2, \dots, Q_n$, where $Q_j$ corresponds to the variable $x_j$ and reads

$a_{1j}y_1 + a_{2j}y_2 + \cdots + a_{mj}y_m \ \{\ge, \le, =\}\ c_j$,

with $\ge$, $\le$, or $=$ depending on whether $x_j$ satisfies $x_j \ge 0$, $x_j \le 0$, or $x_j \in \mathbb{R}$. The objective function is $b^Ty$, and it is to be minimized. Note that in the first part of the recipe (from primal constraints to dual variables) the direction of the inequalities is reversed, while in the second part (from primal variables to dual constraints) the direction is preserved.

Dualization Recipe

                        Primal linear program         Dual linear program
    Variables           $x_1, x_2, \dots, x_n$        $y_1, y_2, \dots, y_m$
    Matrix              $A$                           $A^T$
    Right-hand side     $b$                           $c$
    Objective function  max $c^Tx$                    min $b^Ty$
    Constraints         $i$th constraint has $\le$    $y_i \ge 0$
                        $i$th constraint has $\ge$    $y_i \le 0$
                        $i$th constraint has $=$      $y_i \in \mathbb{R}$
                        $x_j \ge 0$                   $j$th constraint has $\ge$
                        $x_j \le 0$                   $j$th constraint has $\le$
                        $x_j \in \mathbb{R}$          $j$th constraint has $=$

If we want to dualize a minimization linear program, we can first transform it to a maximization linear program by changing the sign of the objective function, and then follow the recipe. In this way one can also find out that the rules work symmetrically "there and back." By this we mean that if we start with some linear program, construct the dual linear program, and then again the dual of the dual, we get back to the original (primal) linear program: two consecutive dualizations cancel out. In particular, the linear programs (P) and (D) in the duality theorem are dual to each other.
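The recipe is mechanical enough to be written as code (a sketch; the string encodings of senses and signs are our own convention):

    def dualize(A, b, c, row_sense, var_sign):
        """row_sense[i] in {'<=', '>=', '='}; var_sign[j] in {'>=0', '<=0', 'free'}.
        Primal: max c^T x s.t. each row of A x (sense) b.
        Returns (A^T, c, b, dual row senses, dual variable signs):
        the dual is  min b^T y  s.t. each row of A^T y (sense) c."""
        AT = [list(col) for col in zip(*A)]                  # transpose of A
        dual_var_sign  = {'<=': '>=0', '>=': '<=0', '=': 'free'}
        dual_row_sense = {'>=0': '>=', '<=0': '<=', 'free': '='}
        y_sign  = [dual_var_sign[s] for s in row_sense]
        q_sense = [dual_row_sense[s] for s in var_sign]
        return AT, c, b, q_sense, y_sign

    # The dual of (6.1), reproducing the program derived at the start of 6.1:
    AT, rhs, obj, senses, signs = dualize(
        [[4, 8], [2, 1], [3, 2]], [12, 3, 4], [2, 3],
        row_sense=['<=', '<=', '<='], var_sign=['>=0', '>=0'])
    print(senses, signs)   # ['>=', '>='] ['>=0', '>=0', '>=0']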
A physical interpretation of duality. Let us consider a linear program: maximize $c^Tx$ subject to $Ax \le b$ (without nonnegativity constraints on $x$). According to the dualization recipe, the dual linear program is: minimize $b^Ty$ subject to $A^Ty = c$ and $y \ge 0$.

Let us assume that the primal linear program is feasible and bounded, and let $n = 3$. We regard $x$ as a point in three-dimensional space, and we interpret $c$ as the gravitation vector; it thus points downward. Each of the inequalities of the system $Ax \le b$ determines a half-space. The intersection of these half-spaces is a nonempty convex polyhedron bounded from below. Each of its two-dimensional faces is given by one of the equations $a_i^Tx = b_i$, where the vectors $a_1, a_2, \dots, a_m$ are the rows of the matrix $A$, but interpreted as column vectors. Let us denote the face given by $a_i^Tx = b_i$ by $S_i$ (not every inequality of the system $Ax \le b$ has to correspond to a face, and so $S_i$ is not necessarily defined for every $i$).

Let us imagine that the boundary of the polyhedron is made of cardboard and that we drop a tiny steel ball somewhere inside the polyhedron. The ball falls and rolls down to the lowest vertex (or possibly it stays on a horizontal edge or face). Let us denote the resulting position of the ball by $x^*$; thus $x^*$ is an optimal solution of the linear program. In this stable position the ball touches several two-dimensional faces, typically 3. Let $D$ be the set of $i$ such that the ball touches the face $S_i$. For $i \in D$ we thus have

$a_i^Tx^* = b_i$.     (6.2)

Gravity exerts a force $F$ on the ball that is proportional to the vector $c$. This force is decomposed into forces of pressure on the faces touched by the ball. The force $F_i$ by which the ball acts on the face $S_i$ is orthogonal to $S_i$ and directed outward from the polyhedron (if we neglect friction); see the schematic two-dimensional picture below.

[Figure: a two-dimensional sketch of the ball resting at the lowest vertex, with the gravitational force $F$ decomposed into forces $F_i$ orthogonal to the touched faces.]

The forces acting on the ball are in equilibrium, and thus $F = \sum_{i \in D} F_i$. The outward normal of the face $S_i$ is $a_i$, hence $F_i$ is proportional to $a_i$, and for some nonnegative numbers $\tilde{y}_i$ we have $\sum_{i \in D} \tilde{y}_ia_i = c$. If we set $\tilde{y}_i = 0$ for $i \notin D$, we can write $\sum_{i=1}^m \tilde{y}_ia_i = c$, or $A^T\tilde{y} = c$ in matrix form. Therefore, $\tilde{y}$ is a feasible solution of the dual linear program.

Let us consider the product $\tilde{y}^T(Ax^* - b)$. For $i \notin D$ the $i$th component of $\tilde{y}$ equals 0, while for $i \in D$ the $i$th component of $Ax^* - b$ is 0, according to (6.2). So the product is 0, and hence $\tilde{y}^Tb = \tilde{y}^TAx^* = c^Tx^*$. We see that $x^*$ is a feasible solution of the primal linear program, $\tilde{y}$ is a feasible solution of the dual linear program, and $c^Tx^* = b^T\tilde{y}$. By the weak duality theorem, $\tilde{y}$ is an optimal solution of the dual linear program, and we have a situation exactly as in the duality theorem. We have just physically verified a special three-dimensional case of the duality theorem.

We remark that the dual linear program also has an economic interpretation; the dual variables are called shadow prices in this context. The interested reader will find this nicely explained in Chvátal's textbook cited in Chapter 9.

6.3 Proof of Duality from the Simplex Method

The duality theorem of linear programming can be quickly derived from the correctness of the simplex method. To be precise, we will prove the following: if the primal linear program (P) is feasible and bounded, then the dual linear program (D) is feasible and bounded as well (by weak duality), with the same optimal value as the primal. Since the dual of the dual is the primal, we may interchange (P) and (D) in this statement; together with our considerations about the possible cases after the statement of the duality theorem, this will prove the theorem. The key observation is that we can extract an optimal solution of the dual linear program from the final simplex tableau. (We should recall, though, that proving the correctness of the simplex method, and in particular the fact that one can always avoid cycling, requires considerable work.)

Let us consider a primal linear program

maximize $c^Tx$ subject to $Ax \le b$ and $x \ge 0$.     (P)

After the conversion to equational form via slack variables $x_{n+1}, \dots, x_{n+m}$ we arrive at the linear program

maximize $\tilde{c}^T\tilde{x}$ subject to $\tilde{A}\tilde{x} = b$ and $\tilde{x} \ge 0$,

where $\tilde{x} = (x_1, \dots, x_{n+m})$, $\tilde{c} = (c_1, \dots, c_n, 0, \dots, 0)$, and $\tilde{A} = (A \mid I_m)$. If this last linear program is feasible and bounded, then according to Theorem 5.8.1, the simplex method with Bland's rule always finds some optimal solution $\tilde{x}^*$ with a feasible basis $B$. The first $n$ components of the vector $\tilde{x}^*$ constitute an optimal solution $x^*$ of the linear program (P). By the optimality criterion we have $r \le 0$ in the final simplex tableau, where $r$ is the vector in the $z$-row of the tableau, as in Section 5.5. The following lemma and the weak duality theorem (Proposition 6.1.1) then easily imply the duality theorem.

6.3.1 Lemma. In the described situation, the vector $y^* = (\tilde{c}_B^T\tilde{A}_B^{-1})^T$ is a feasible solution of the dual linear program (D), and the equality $c^Tx^* = b^Ty^*$ holds.

Proof. By Lemma 5.5.1, $\tilde{x}^*$ is given by $\tilde{x}_B^* = \tilde{A}_B^{-1}b$ and $\tilde{x}_N^* = 0$, and so

$c^Tx^* = \tilde{c}^T\tilde{x}^* = \tilde{c}_B^T\tilde{x}_B^* = \tilde{c}_B^T\tilde{A}_B^{-1}b = y^{*T}b = b^Ty^*$.

The equality $c^Tx^* = b^Ty^*$ thus holds, and it remains to check the feasibility of $y^*$, that is, $A^Ty^* \ge c$ and $y^* \ge 0$. The condition $y^* \ge 0$ can be rewritten as $I_my^* \ge 0$, and hence both of the feasibility conditions together are equivalent to

$\tilde{A}^Ty^* \ge \tilde{c}$.     (6.3)

After substituting $y^* = (\tilde{c}_B^T\tilde{A}_B^{-1})^T$, the left-hand side becomes $\tilde{A}^T(\tilde{c}_B^T\tilde{A}_B^{-1})^T = (\tilde{c}_B^T\tilde{A}_B^{-1}\tilde{A})^T$. Let us denote this $(n+m)$-component vector by $w$. For the basic components of $w$ we have

$w_B = (\tilde{c}_B^T\tilde{A}_B^{-1}\tilde{A}_B)^T = (\tilde{c}_B^TI_m)^T = \tilde{c}_B$,

and thus we even have equality in (6.3) for the basic components. For the nonbasic components we have

$w_N = (\tilde{c}_B^T\tilde{A}_B^{-1}\tilde{A}_N)^T = \tilde{c}_N - r \ge \tilde{c}_N$,

since $r = \tilde{c}_N - (\tilde{c}_B^T\tilde{A}_B^{-1}\tilde{A}_N)^T$ by Lemma 5.5.1 and $r \le 0$ by the optimality criterion. The lemma is proved.
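Lemma 6.3.1 gives a practical recipe: once an optimal basis of the equational program is known, the dual optimum is one linear solve away. A sketch (numpy assumed; 0-based indices):

    import numpy as np

    def dual_from_basis(A, b, c, B):
        """y* = (c_B^T A_B^{-1})^T, i.e., the solution of A_B^T y = c_B."""
        return np.linalg.solve(A[:, B].T, c[B])

    # Program (6.1) with slack variables x3, x4, x5; the optimal basis
    # is {x1, x2, x4} (x4 = 0.75 is the only nonzero slack):
    A = np.array([[4.0, 8, 1, 0, 0], [2, 1, 0, 1, 0], [3, 2, 0, 0, 1]])
    b = np.array([12.0, 3, 4]); c = np.array([2.0, 3, 0, 0, 0])
    print(dual_from_basis(A, b, c, B=[0, 1, 3]))   # -> (5/16, 0, 1/4)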
6.4 Proof of Duality from the Farkas Lemma

Another approach to the duality theorem of linear programming consists in first proving a simplified version, called the Farkas lemma, and then deriving the theorem from it by substituting a skillfully composed matrix. A nice feature is that the Farkas lemma has very intuitive interpretations. Actually, the Farkas lemma comes in several natural variants. We begin by discussing one of them, which has a very clear geometric meaning.

6.4.1 Proposition (Farkas lemma). Let $A$ be a real matrix with $m$ rows and $n$ columns, and let $b \in \mathbb{R}^m$ be a vector. Then exactly one of the following two possibilities occurs:
(F1) There exists a vector $x \in \mathbb{R}^n$ satisfying $Ax = b$ and $x \ge 0$.
(F2) There exists a vector $y \in \mathbb{R}^m$ such that $y^TA \ge 0^T$ and $y^Tb < 0$.

It is easily seen that both possibilities cannot occur at the same time. Indeed, the vector $y$ in (F2) determines a linear combination of the equations witnessing that $Ax = b$ cannot have any nonnegative solution: all coefficients on the left-hand side of the resulting equation $(y^TA)x = y^Tb$ are nonnegative, but the right-hand side is negative. The Farkas lemma is not exactly a difficult theorem, but it is not trivial either. Many proofs are known, and we will present some of them in the subsequent sections; the reader is invited to choose the best one according to personal taste.

A geometric view. In order to view the Farkas lemma geometrically, we need the notion of convex hull (see Section 4.3). Further, for vectors $a_1, a_2, \dots, a_n \in \mathbb{R}^m$, we define the convex cone generated by $a_1, a_2, \dots, a_n$ as the set of all linear combinations of the $a_i$ with nonnegative coefficients, that is, as

$\{t_1a_1 + t_2a_2 + \cdots + t_na_n : t_1, t_2, \dots, t_n \ge 0\}$.

In other words, this convex cone is the convex hull of the rays $p_1, p_2, \dots, p_n$, where $p_i = \{ta_i : t \ge 0\}$ emanates from the origin and passes through the point $a_i$.

6.4.2 Proposition (Farkas lemma, geometrically). Let $a_1, a_2, \dots, a_n, b$ be vectors in $\mathbb{R}^m$. Then exactly one of the following two possibilities occurs:
(F1') The point $b$ lies in the convex cone $C$ generated by $a_1, a_2, \dots, a_n$.
(F2') There exists a hyperplane $h$ passing through the point 0, of the form $h = \{x \in \mathbb{R}^m : y^Tx = 0\}$ for a suitable $y \in \mathbb{R}^m$, such that all the vectors $a_1, a_2, \dots, a_n$ (and thus the whole cone $C$) lie on one side of $h$ and $b$ lies strictly on the other side. That is, $y^Ta_i \ge 0$ for all $i = 1, 2, \dots, n$, and $y^Tb < 0$.

[Figure: an illustration of both possibilities for $m = 2$ and $n = 3$: on the left, $b$ lies inside the cone $C$ generated by $a_1, a_2, a_3$ (F1'); on the right, a hyperplane $h$ through 0 separates $b$ from $C$ (F2').]

To see that Proposition 6.4.1 and Proposition 6.4.2 really tell us the same thing, it suffices to take the columns of the matrix $A$ for $a_1, a_2, \dots, a_n$. The existence of a nonnegative solution of $Ax = b$ can then be re-expressed as

$b \in \{t_1a_1 + t_2a_2 + \cdots + t_na_n : t_1, t_2, \dots, t_n \ge 0\}$,

and this says exactly that $b \in C$. The equivalence of (F2) and (F2') hopefully doesn't need any further explanation.
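Numerically, the two alternatives can be explored with an LP solver (a sketch assuming scipy is available; if the first feasibility problem fails, we look for a certificate normalized to $y^Tb = -1$):

    import numpy as np
    from scipy.optimize import linprog

    def farkas(A, b):
        """Return ('F1', x) with Ax = b, x >= 0, or ('F2', y) with
        y^T A >= 0 and y^T b = -1 (a normalized certificate)."""
        m, n = A.shape
        res = linprog(np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
        if res.success:
            return ('F1', res.x)
        res = linprog(np.zeros(m), A_ub=-A.T, b_ub=np.zeros(n),
                      A_eq=b.reshape(1, -1), b_eq=[-1.0],
                      bounds=[(None, None)] * m)
        return ('F2', res.x)

    A = np.array([[1.0, 0], [0, 1]])      # columns a1, a2: the nonnegative quadrant
    print(farkas(A, np.array([2.0, 1])))  # ('F1', x = (2, 1))
    print(farkas(A, np.array([-1.0, 1]))) # ('F2', y = (1, 0)), since y^T b = -1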
This result is an instance of a separation theorem for convex sets. Separation theorems generally assert that disjoint convex sets can be separated by a hyperplane. There are several versions, depending on whether one requires strict or nonstrict separation, etc., and several proof strategies. Separation theorems in infinite-dimensional Banach spaces are closely related to the Hahn–Banach theorem, one of the cornerstones of functional analysis. In Section 6.5 we prove the Farkas lemma along these lines, viewing it as a geometric separation theorem.

Variants of the Farkas lemma. Proposition 6.4.1 provides an answer to the question, when does a system of linear equations have a nonnegative solution? In part (i) of the following proposition we restate Proposition 6.4.1 in a slightly different but clearly equivalent form, and in parts (ii) and (iii) we add two more variants of the Farkas lemma. Part (ii) answers the question, when does a system of linear inequalities have a nonnegative solution, and part (iii) the question, when does a system of linear inequalities have any solution at all?

6.4.3 Proposition (Farkas lemma in three variants). Let $A$ be a real matrix with $m$ rows and $n$ columns, and let $b \in \mathbb{R}^m$ be a vector.
(i) The system $Ax = b$ has a nonnegative solution if and only if every $y \in \mathbb{R}^m$ with $y^TA \ge 0^T$ also satisfies $y^Tb \ge 0$.
(ii) The system $Ax \le b$ has a nonnegative solution if and only if every nonnegative $y \in \mathbb{R}^m$ with $y^TA \ge 0^T$ also satisfies $y^Tb \ge 0$.
(iii) The system $Ax \le b$ has a solution if and only if every nonnegative $y \in \mathbb{R}^m$ with $y^TA = 0^T$ also satisfies $y^Tb \ge 0$.

The three parts of Proposition 6.4.3 are mutually equivalent, in the sense that any of them can easily be derived from any other. Having three forms at our disposal provides more flexibility, both for applying the Farkas lemma and for proving it. The proof of the equivalence of (i), (ii), and (iii) is easy, using the tricks familiar from transformations of linear programs to equational form. We will take a utilitarian approach: since we will use (ii) in the proof of the duality theorem, we prove only the implications (i)⇒(ii) and (iii)⇒(ii), leaving the remaining implications to the reader.

Proof of (i)⇒(ii). In (ii) we need an equivalent condition for $Ax \le b$ having a nonnegative solution. To this end we form the matrix $\bar{A} = (A \mid I_m)$, and we note that $Ax \le b$ has a nonnegative solution if and only if $\bar{A}\bar{x} = b$ has a nonnegative solution. By (i), this is equivalent to the condition that all $y$ with $y^T\bar{A} \ge 0^T$ satisfy $y^Tb \ge 0$. And finally, $y^T\bar{A} \ge 0^T$ says exactly the same as $y^TA \ge 0^T$ and $y \ge 0$, and hence we have the desired equivalence.

Proof of (iii)⇒(ii). Again we need an equivalent condition for $Ax \le b$ having a nonnegative solution. This time we form the matrix $\hat{A}$ and the vector $\hat{b}$ according to

$\hat{A} = \begin{pmatrix} A \\ -I_n \end{pmatrix}$,   $\hat{b} = \begin{pmatrix} b \\ 0 \end{pmatrix}$.

Then $Ax \le b$ has a nonnegative solution if and only if $\hat{A}x \le \hat{b}$ has any solution. The latter is equivalent, by (iii), to the condition that all $\hat{y} \ge 0$ with $\hat{y}^T\hat{A} = 0^T$ satisfy $\hat{y}^T\hat{b} \ge 0$. Writing $\hat{y} = (y, y')$, with $y$ a vector with $m$ components, we have $\hat{y} \ge 0$ and $\hat{y}^T\hat{A} = 0^T$ exactly if $y \ge 0$, $y' \ge 0$, and $y^TA = y'^T \ge 0^T$, and moreover $\hat{y}^T\hat{b} = y^Tb$. From this and our chain of equivalences we deduce that $Ax \le b$ has a nonnegative solution if and only if all $y \ge 0$ with $y^TA \ge 0^T$ satisfy $y^Tb \ge 0$, and this is the statement of (ii).

Remarks. A reader with a systematic mind may like to see the variants of the Farkas lemma summarized in a table:

                                     The system $Ax \le b$            The system $Ax = b$
    has a solution $x \ge 0$         iff every $y \ge 0$ with         iff every $y$ with
    if and only if                   $y^TA \ge 0^T$ has $y^Tb \ge 0$  $y^TA \ge 0^T$ has $y^Tb \ge 0$
    has a solution $x \in \mathbb{R}^n$  iff every $y \ge 0$ with     iff every $y$ with
    if and only if                   $y^TA = 0^T$ has $y^Tb \ge 0$    $y^TA = 0^T$ has $y^Tb = 0$

We had three variants of the Farkas lemma, but the table has four entries. We haven't mentioned the statement corresponding to the bottom right corner of the table, telling us when a system of linear equations has any solution at all. We haven't mentioned it because it doesn't deserve to be called a Farkas lemma: the proof is a simple exercise in linear algebra, and there doesn't seem to be any way of deriving the other variants of the Farkas lemma from this one along the lines of our previous reductions. However, we will find this statement useful in Section 6.6, where it will serve as a basis of a proof
of a "real" Farkas lemma.

Let us note that, similar to "dualization for everyone," we could also establish a unifying "Farkas lemma for everyone," dealing with a system containing both linear equations and inequalities, with some of the variables nonnegative and some unrestricted. This would contain all of the four variants considered above as special cases, but we will not go in this direction.

A logical view. Now we explain yet another way of understanding the Farkas lemma, this time variant (iii) in Proposition 6.4.3. We begin with something seemingly different, namely, deriving new linear inequalities from old ones. From two given inequalities, say $4x_1 + x_2 \le 4$ and $-x_1 + x_2 \le 1$, we can derive new inequalities by multiplying the first inequality by a positive real number $\alpha$, the second one by a positive real number $\beta$, and adding the resulting inequalities together (we must be careful so that both inequality signs have the same direction; we have already used this many times). For instance, for $\alpha = 3$ and $\beta = 2$ we derive the inequality $10x_1 + 5x_2 \le 14$.

More generally, if we start with a system of several linear inequalities of the form $Ax \le b$, we can derive new inequalities by repeating this operation for various pairs, which may involve both the original inequalities and the new ones derived earlier. So if we start with the system

$4x_1 + x_2 \le 4$
$-x_1 + x_2 \le 1$
$-2x_1 - x_2 \le -3$,

we can first derive $10x_1 + 5x_2 \le 14$ from the first two inequalities as before, and then we can add to this new inequality the third inequality multiplied by 5. In this case both of the coefficients on the left-hand side cancel out, and we get the inequality $0 \le -1$. This last inequality obviously never holds, and so the original triple of inequalities cannot be satisfied by any $x_1, x_2 \in \mathbb{R}$ either (as is easy to check using a picture).
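In the language of certificates, the two-step derivation above is summarized by the single vector $y = (3, 2, 5)$, as a quick check confirms (numpy assumed):

    import numpy as np

    # Rows: 4x1 + x2 <= 4, -x1 + x2 <= 1, -2x1 - x2 <= -3.
    A = np.array([[4.0, 1], [-1, 1], [-2, -1]])
    b = np.array([4.0, 1, -3])
    y = np.array([3.0, 2, 5])          # 3*(first) + 2*(second) + 5*(third)
    print(y @ A, y @ b)                # [0. 0.] -1.0 : the inequality 0 <= -1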
The Farkas lemma turns out to be equivalent to the following statement: Whenever a system Ax ≤ b of finitely many linear inequalities is inconsistent, that is, there is no x ∈ Rⁿ satisfying it, we can derive the obviously inconsistent inequality 0 ≤ −1 from it by the above procedure. A little thought reveals that each inequality derived by the procedure (repeated combinations of pairs) has the form yᵀAx ≤ yᵀb for some nonnegative vector y ∈ Rᵐ, and thus, equivalently, we claim that whenever Ax ≤ b is inconsistent, there exists a vector y ≥ 0 with yᵀA = 0ᵀ and yᵀb = −1. This is clearly equivalent to part (iii) of Proposition 6.4.3.

The reader may wonder why we have bothered to consider repeated pairwise combinations of inequalities, instead of using a single vector y specifying a combination of all of the inequalities right away. The reason is that the pairwise formulation makes the statement more similar to a number of important and famous statements in various branches of mathematics. In logic, for example, theorems are derived (proved) from axioms by the repeated application of certain simple derivation rules. In the first-order predicate calculus there is a completeness theorem: any true statement, that is, a statement valid in every model, can be derived from the axioms by a finite sequence of steps using the appropriate derivation rules, such as modus ponens. In contrast, the celebrated Gödel first incompleteness theorem asserts that in Peano arithmetic, as well as in any theory containing it, there are statements that are true but cannot be derived.

In analogy to this, we can view the inequalities of the original system Ax ≤ b as axioms, and we have a single derivation rule: derive a new inequality from two existing ones by a positive linear combination as above. Then the Farkas lemma tells us that any inconsistent system of axioms can be refuted by a suitable derivation. This is a "weak" completeness theorem; we could also consider a more general completeness theorem, stating that whenever a linear inequality is valid for all x ∈ Rⁿ satisfying Ax ≤ b, then it can be derived from Ax ≤ b, but we will not go into this here. Such a completeness result means that the theory of linear inequalities is, in a sense, easy. Moreover, the simplex method, or also the Fourier-Motzkin elimination considered in Section 6.7, provides a way of constructing such a derivation. This view makes the Farkas lemma a small cousin of various completeness theorems of logic, and of other famous results, such as Hilbert's Nullstellensatz in algebraic geometry. Computer science also frequently investigates the possibility of deriving some object from given initial objects by certain derivation rules, say in the theory of formal languages.

Proof of the duality theorem from the Farkas lemma. Let us assume that the linear program P has an optimal solution x*. As in the proof of the duality theorem from the simplex method, we show that the dual D has an optimal solution as well, and that the optimum values of both programs coincide.

We first define γ = cᵀx* to be the optimum value of P. Then we know that the system of inequalities

Ax ≤ b, cᵀx ≥ γ      (6.4)

has a nonnegative solution, but for every ε > 0 the system

Ax ≤ b, cᵀx ≥ γ + ε      (6.5)

has no nonnegative solution. If we define an (m+1)×n matrix Â by appending the row −cᵀ to A, and a vector b̂ε ∈ R^(m+1) by appending the entry −γ − ε to b, then (6.4) is equivalent to Âx ≤ b̂₀, and (6.5) is equivalent to Âx ≤ b̂ε.

Let us apply the variant of the Farkas lemma in Proposition 6.4.3(ii). For ε > 0 the system Âx ≤ b̂ε has no nonnegative solution, so we conclude that there is a nonnegative vector ŷ = (u, z) ∈ R^(m+1) such that ŷᵀÂ ≥ 0ᵀ but ŷᵀb̂ε < 0. These conditions boil down to

Aᵀu ≥ zc,   bᵀu < z(γ + ε).      (6.6)

Applying the Farkas lemma in the case ε = 0, where the system does have a nonnegative solution, we see that the very same vector ŷ must satisfy ŷᵀb̂₀ ≥ 0, and this is equivalent to bᵀu ≥ zγ. It follows that z > 0, since z = 0 would contradict the strict inequality in (6.6). But then we may set v = (1/z)u ≥ 0, and (6.6) yields

Aᵀv ≥ c,   bᵀv < γ + ε.

In other words, v is a feasible solution of D with the value of the objective function smaller than γ + ε. By the weak duality theorem, every feasible solution of D has objective value at least γ. Hence D is a feasible and bounded linear program, and so we know that it has an optimal solution y* (Theorem 4.2.3). Its value bᵀy* lies between γ and γ + ε for every ε > 0, and thus it equals γ. This concludes the proof of the duality theorem. □

6.5 Farkas Lemma: An Analytic Proof

In this section we prove the geometric version of the Farkas lemma (Proposition 6.4.2) by means of elementary geometry and analysis. We are given vectors a₁, ..., aₙ in Rᵐ, and we let C be the convex cone generated by them, i.e., the set of all their linear combinations with nonnegative coefficients. Proving the Farkas lemma amounts to showing that for any vector b ∉ C there exists a hyperplane separating it from C and passing through 0. In other words, we want to exhibit a vector y ∈ Rᵐ with yᵀb < 0 and yᵀx ≥ 0 for all x ∈ C.

The plan of the proof is straightforward: We let z be the point of C nearest to b in the Euclidean distance, and we check that the vector y = z − b is as required.

(Illustration: the cone C generated by a₁, a₂, a₃, the point b ∉ C, the nearest point z ∈ C, and the vector y = z − b.)

The main technical part of the proof is to show that the nearest point z exists. Indeed, in principle it might happen that no point is nearest; for example, such a situation occurs for the point 0 on the real line and the open interval (1, 2): the interval contains points whose distance to 0 is as close to 1 as desired, but no point at distance exactly 1.
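Before establishing existence formally, the plan of the proof can be illustrated numerically. The following sketch (ours; the generators and the point b are an ad hoc example) computes z as a nonnegative least squares solution, since projecting b onto C = {At : t ≥ 0} is exactly the problem of minimizing ||At − b|| over t ≥ 0, and then checks the separation properties of y = z − b:

```python
# A numerical sketch of the proof plan (ours, not the book's).
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # generators a1, a2, a3 as columns
b = np.array([-1.0, -0.5])        # a point b outside the cone C

t, _ = nnls(A, b)                 # minimizes ||A t - b|| over t >= 0
z = A @ t                         # nearest point of C to b
y = z - b
print(y @ b)                      # y^T b < 0: strictly negative
print(y @ A)                      # y^T a_j >= 0 for every generator
```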
6.5.1 Lemma. Let C be a convex cone in Rᵐ generated by finitely many vectors a₁, ..., aₙ, and let b ∉ C be a point. Then there exists a point z ∈ C nearest to b (it is also unique, but we won't need this).

Proof of Proposition 6.4.2 (assuming Lemma 6.5.1). As announced, we set y = z − b, where z is a point of C nearest to b.

First we check that yᵀz = 0. This is clear for z = 0. For z ≠ 0, if z were not perpendicular to y, we could move z slightly along the ray {tz : t ≥ 0} ⊆ C and get a point closer to b. More formally, let us first assume that yᵀz > 0, and let us set z′ = (1 − α)z for a small α > 0. We calculate

||z′ − b||² = ||y − αz||² = ||y||² − 2αyᵀz + α²||z||².

We have 2αyᵀz > α²||z||² for all sufficiently small α > 0, and thus ||z′ − b||² < ||y||² = ||z − b||². This contradicts z being a nearest point. The case yᵀz < 0 is handled similarly.

To verify yᵀb < 0, we recall that y ≠ 0, and we compute 0 < yᵀy = yᵀ(z − b) = yᵀz − yᵀb = −yᵀb.

Next, let x ∈ C, x ≠ z. The angle ∠bzx has to be at least 90 degrees, for otherwise, points on the segment zx sufficiently close to z would lie closer to b than z; equivalently, (b − z)ᵀ(x − z) ≤ 0 (this is similar to the above argument for yᵀz = 0, and we leave a formal verification to the reader). Thus

0 ≥ (b − z)ᵀ(x − z) = −yᵀ(x − z) = −yᵀx + yᵀz = −yᵀx,

and so yᵀx ≥ 0 for all x ∈ C. The Farkas lemma is proved. □

It remains to prove Lemma 6.5.1. We do it in several steps, and each of them is an interesting little fact in itself.

6.5.2 Lemma. Let X ⊆ Rᵐ be a nonempty closed set, and let b ∈ Rᵐ be a point. Then X has at least one point nearest to b.

Proof. This is simple, but it needs basic facts about compact sets in Rᵐ. Let us fix an arbitrary x₀ ∈ X, let r = ||x₀ − b||, and let K = {x ∈ X : ||x − b|| ≤ r}. Clearly, if K has a point nearest to b, then the same point is a point of X nearest to b. Since K is the intersection of X with a closed ball of radius r, it is closed and bounded, and hence compact. We define the function f: K → R by f(x) = ||x − b||. Then f is a continuous function on a compact set, and any such function attains a minimum; that is, there exists z ∈ K with f(z) ≤ f(x) for all x ∈ K. Such a z is a point of K nearest to b. □

So it remains to prove the following statement.

6.5.3 Lemma. Every finitely generated convex cone is closed.

This lemma is not as obvious as it might seem. As a warning example, let us consider a closed disk D in the plane with 0 on its boundary. Then the cone generated by D, that is, the set {tx : t ≥ 0, x ∈ D}, is an open halfplane plus the point 0, and thus it is not closed. Of course, this doesn't contradict the lemma, but it shows that we must use the finiteness somehow.

Let us define a primitive cone in Rᵐ as a convex cone generated by some k ≤ m linearly independent vectors. Before proving Lemma 6.5.3, we deal with the following special case.

6.5.4 Lemma. Every primitive cone P in Rᵐ is closed.

Proof. Let P₀ ⊆ Rᵏ be the cone generated by the vectors e₁, ..., eₖ of the standard basis of Rᵏ. In other words, P₀ is the nonnegative orthant, and its closedness is hopefully beyond any doubt; for example, it is the intersection of the closed halfspaces {x : xᵢ ≥ 0}, i = 1, 2, ..., k.

Let the given primitive cone P ⊆ Rᵐ be generated by linearly independent vectors a₁, ..., aₖ. We define a linear mapping f: Rᵏ → Rᵐ by f(x) = x₁a₁ + x₂a₂ + ··· + xₖaₖ. This f is injective by the linear independence of the aⱼ, and we have P = f(P₀). So it suffices to prove the following claim: The image P = f(P₀) of the closed set P₀ under an injective linear map f: Rᵏ → Rᵐ is closed.

To see this, we let L = f(Rᵏ) be the image of f. Since f is injective, it is a linear isomorphism of Rᵏ and L. A linear isomorphism f has a linear inverse map g = f⁻¹: L → Rᵏ. Every linear map between Euclidean spaces is continuous (this can be checked using a matrix form of the map), and we have P = g⁻¹(P₀). The preimage of a closed set under a continuous map is closed, essentially by definition (while the image of a closed set under a continuous map need not be closed in general!), so P is closed as a subset of L. Since L is closed in Rᵐ, being a linear subspace, we get that P is closed in Rᵐ, as desired. □

Lemma 6.5.3 is now a consequence of Lemma 6.5.4, of the fact that the union of finitely many closed sets is closed, and of the next lemma.

6.5.5 Lemma. Let C be a convex cone in Rᵐ generated by finitely many vectors a₁, ..., aₙ. Then C can be expressed as a union of finitely many primitive cones.

Proof. For every x ∈ C we are going to verify that it is contained in a primitive cone generated by a suitable set of linearly independent vectors among the aᵢ. We may assume x ≠ 0, since 0 is the primitive cone generated by the empty set of vectors.

Let I ⊆ {1, 2, ..., n} be a set of the minimum possible size such that x lies in the convex cone generated by A_I = {aᵢ : i ∈ I} (this is a standard trick in linear algebra and in convex geometry). That is, there exist nonnegative coefficients αᵢ, i ∈ I, with x = Σ_{i∈I} αᵢaᵢ. The αᵢ are even strictly positive, since if some αᵢ = 0, we could delete i from I.

We now want to show that the set A_I is linearly independent. For contradiction, we suppose that there is a nontrivial linear combination Σ_{i∈I} βᵢaᵢ = 0, where not all the βᵢ are 0. Then there exists a real t such that all the expressions αᵢ + tβᵢ are nonnegative and at least one of them is zero. To see this, we can first consider the case that some βᵢ is strictly negative: we start with t = 0, we let it grow, and we stop at the first moment when some αᵢ + tβᵢ reaches zero. The case in which all the βᵢ are nonnegative (and some strictly positive) is analogous, with t decreasing from the initial value 0. Then the equation x = Σ_{i∈I} (αᵢ + tβᵢ)aᵢ expresses x as a linear combination with nonnegative coefficients of fewer than |I| vectors, contradicting the minimality of I. □

6.6 Farkas Lemma from Minimally Infeasible Systems

Here we derive the Farkas lemma from an observation concerning minimally infeasible systems. A system Ax ≤ b of m inequalities is called minimally infeasible if it has no solution, but every subsystem obtained by dropping one inequality does have a solution.

6.6.1 Lemma. Let Ax ≤ b be a minimally infeasible system of m inequalities, and let A₋ᵢx ≤ b₋ᵢ be the subsystem obtained by dropping the ith inequality, i = 1, 2, ..., m. Then for every i there exists a vector xⁱ such that A₋ᵢxⁱ = b₋ᵢ.

Let us set aᵢ = (aᵢ₁, aᵢ₂, ..., aᵢₙ) and write the ith inequality as aᵢᵀx ≤ bᵢ.

(Illustration: an example in the plane, n = 2, with m = 3 inequalities a₁ᵀx ≤ b₁, a₂ᵀx ≤ b₂, a₃ᵀx ≤ b₃ forming a minimally infeasible system, and the points x¹, x², x³ as in the lemma.)

Proof. We consider the linear program

minimize z subject to Ax ≤ b + zeᵢ,      (LPᵢ)

where eᵢ is the ith unit vector. The idea of LPᵢ is to translate the halfspace {x : aᵢᵀx ≤ bᵢ} by the minimum amount necessary to achieve feasibility.

(Illustration: for the example above and i = 3, the halfplane a₃ᵀx ≤ b₃ is translated to a₃ᵀx ≤ b₃ + z̃₃.)

To show formally that LPᵢ has an optimal solution, we first argue that it has a feasible solution. Indeed, by the assumption, the system A₋ᵢx ≤ b₋ᵢ has at least one solution. Let us fix an arbitrary solution of this system and denote it by x̄. We put z̄ = aᵢᵀx̄ − bᵢ, and we note that the pair (x̄, z̄) is a feasible solution of the linear program LPᵢ. Next, we note that LPᵢ is also bounded: since Ax ≤ b has no solution, every feasible solution of LPᵢ has z > 0. Therefore, the linear program has an optimal solution (x̃ⁱ, z̃ᵢ), with z̃ᵢ > 0, by Theorem 4.2.3.

We claim that the just-defined x̃ⁱ satisfies A₋ᵢx̃ⁱ = b₋ᵢ. We already know that A₋ᵢx̃ⁱ ≤ b₋ᵢ. Let us suppose, for contradiction, that aⱼᵀx̃ⁱ = bⱼ − ε for some j ≠ i and some ε > 0. We will show that then (x̃ⁱ, z̃ᵢ) cannot be optimal for LPᵢ. To this end, let us consider an optimal solution (x̃ʲ, z̃ⱼ) of LPⱼ. The idea is that by moving the point (x̃ⁱ, z̃ᵢ) slightly toward (x̃ʲ, z̃ⱼ), we remain feasible for LPᵢ, but we improve the objective function of LPᵢ. More formally, for a real number t ≥ 0 we define x̃(t) = (1 − t)x̃ⁱ + tx̃ʲ. It follows that

aⱼᵀx̃(t) ≤ bⱼ − (1 − t)ε + tz̃ⱼ,
aᵢᵀx̃(t) ≤ bᵢ + (1 − t)z̃ᵢ,
aₖᵀx̃(t) ≤ bₖ for all k ≠ i, j.

Thus, for t sufficiently small, namely for 0 ≤ t ≤ ε/(ε + z̃ⱼ), the pair (x̃(t), (1 − t)z̃ᵢ) is a feasible solution of LPᵢ, and for t > 0 its objective function is strictly smaller than z̃ᵢ, contradicting the assumed optimality of (x̃ⁱ, z̃ᵢ). Thus A₋ᵢx̃ⁱ = b₋ᵢ, and the lemma is proved. □

We need another lemma, which proves an easy variant of the Farkas lemma concerned with arbitrary solutions of systems of equations. This lemma establishes the implication in the bottom right corner of the table of Farkas lemma variants on page 92.

6.6.2 Lemma. The system Ax = b has a solution if and only if every y ∈ Rᵐ with yᵀA = 0ᵀ also satisfies yᵀb = 0.

Proof. One direction is easy: if Ax = b has some solution x̃, and if yᵀA = 0ᵀ, then 0 = (yᵀA)x̃ = yᵀ(Ax̃) = yᵀb.

If Ax = b has no solution, we need to find a vector y such that yᵀA = 0ᵀ and yᵀb ≠ 0. Let us define r = rank(A) and consider the m×(n+1) matrix (A | b). This matrix has rank r + 1, since the last column is not a linear combination of the first n columns. For the very same reason, the (m+1)×(n+1) matrix

( A   b )
( 0ᵀ  1 )

has rank r + 1 as well. This shows that the row vector (0ᵀ, 1) is a linear combination of the rows of (A | b), and the coefficients of this linear combination define a vector y ∈ Rᵐ with yᵀA = 0ᵀ and yᵀb = 1, as desired. □

Now we proceed to the proof of the Farkas lemma. The variant that results most naturally is the one with an arbitrary solution of Ax ≤ b, that is, Proposition 6.4.3(iii).

Proof of Proposition 6.4.3(iii). As in Lemma 6.6.2, one direction is easy: if Ax ≤ b has some solution x̃, and if y ≥ 0, yᵀA = 0ᵀ, we get 0 = (yᵀA)x̃ = yᵀ(Ax̃) ≤ yᵀb.

The interesting case is that Ax ≤ b has no solution. Our task is then to construct a vector y ≥ 0 satisfying yᵀA = 0ᵀ and yᵀb < 0. We may assume that Ax ≤ b is minimally infeasible, by restricting to a suitable subsystem; a vector y for this subsystem can be extended to work for the original system by inserting zeros at the appropriate places.

Since Ax ≤ b has no solution, the system Ax = b has no solution either. By Lemma 6.6.2 there exists a vector y ∈ Rᵐ such that yᵀA = 0ᵀ and yᵀb ≠ 0; by possibly changing signs we may assume yᵀb < 0. We will show that this vector also satisfies y ≥ 0, and this will finish the proof. To this end, we fix i ∈ {1, 2, ..., m} and consider the vector xⁱ as in Lemma 6.6.1 above. With the terminology of the lemma we have A₋ᵢxⁱ = b₋ᵢ, and using yᵀA = 0ᵀ, we can write

yᵢ(aᵢᵀxⁱ − bᵢ) = yᵀ(Axⁱ − b) = (yᵀA)xⁱ − yᵀb = −yᵀb > 0.

Since the whole system Ax ≤ b is unsolvable, we have aᵢᵀxⁱ > bᵢ, and hence yᵢ > 0. Proposition 6.4.3(iii) is proved. □

This proof of the Farkas lemma is based on the paper

M. Conforti, M. Di Summa, G. Zambelli: Minimally infeasible set-partitioning problems with balanced constraints, Mathematics of Operations Research, to appear.

The proof given there is even more elementary than ours, in the sense that it does not use linear programming. We have chosen the linear programming approach, since we find it somewhat more transparent.

6.7 Farkas Lemma from the Fourier-Motzkin Elimination

When explaining the logical view of the Farkas lemma in Section 6.4, we started with a system of 3 inequalities and combined pairs of inequalities until we managed to eliminate all variables and obtained the obviously unsatisfiable inequality 0 ≤ −1. The Fourier-Motzkin elimination is a systematic procedure for eliminating all variables from an arbitrary system Ax ≤ b of linear inequalities. If the final inequalities with no variables hold, we can reconstruct a solution of the original system by tracing the computations backward, and if one of the final inequalities does not hold, it certifies that the original system has no solution.

The Fourier-Motzkin elimination is similar in spirit to Gaussian elimination for systems of linear equations, and it is just as simple. As in Gaussian elimination, variables are removed one at a time, but there is a price to pay: to get rid of one variable, we typically have to introduce many new inequalities, so the method becomes impractical already for moderately large systems. The Fourier-Motzkin elimination can be considered a simple but inefficient alternative to the simplex method. For the purpose of proving statements about systems of inequalities, however, efficiency is not a concern, so it is the simplicity of the Fourier-Motzkin elimination that makes it a very handy tool.

As an example, let us consider the following system of 5 inequalities in 3 variables:

2x − 5y + 4z ≤ 10
3x − 6y + 3z ≤ 9
5x + 10y − z ≤ 15
−x + 5y − 2z ≤ −7
−3x + 2y + 6z ≤ 12      (6.7)

In the first step we would like to eliminate x. For a moment, let us imagine that y and z are some fixed real numbers, and let us ask under what conditions we can choose a value of x such that, together with the given values y and z, it satisfies (6.7). The first three inequalities impose an upper bound on x, while the remaining two impose a lower bound. To make this clearer, we rewrite the system as follows:

x ≤ 5 + (5/2)y − 2z
x ≤ 3 + 2y − z
x ≤ 3 − 2y + (1/5)z
x ≥ 7 + 5y − 2z
x ≥ −4 + (2/3)y + 2z.

So, given y and z, the admissible values of x are exactly those in the interval from max(7 + 5y − 2z, −4 + (2/3)y + 2z) to min(5 + (5/2)y − 2z, 3 + 2y − z, 3 − 2y + (1/5)z). If this interval happens to be empty, there is no admissible x. So the inequality

max(7 + 5y − 2z, −4 + (2/3)y + 2z) ≤ min(5 + (5/2)y − 2z, 3 + 2y − z, 3 − 2y + (1/5)z)      (6.8)

is equivalent to the existence of an x that, together with the considered y and z, solves (6.7). The key observation in the Fourier-Motzkin elimination is that (6.8) can be rewritten as a system of linear inequalities in the variables y and z. The inequalities simply say that each of the lower bounds is less than or equal to each of the upper bounds:

7 + 5y − 2z ≤ 5 + (5/2)y − 2z
7 + 5y − 2z ≤ 3 + 2y − z
7 + 5y − 2z ≤ 3 − 2y + (1/5)z
−4 + (2/3)y + 2z ≤ 5 + (5/2)y − 2z
−4 + (2/3)y + 2z ≤ 3 + 2y − z
−4 + (2/3)y + 2z ≤ 3 − 2y + (1/5)z.

If we rewrite this system in the usual form, we arrive at

(5/2)y ≤ −2
3y − z ≤ −4
7y − (11/5)z ≤ −4
−(11/6)y + 4z ≤ 9
−(4/3)y + 3z ≤ 7
(8/3)y + (9/5)z ≤ 7.      (6.9)

This system has a solution exactly if the original system (6.7) has one, but it has one variable fewer. The reader is invited to continue with this example, eliminating y and then z. We note that (6.9) gives 4 upper bounds for y and 2 lower bounds, and hence we obtain 8 inequalities after eliminating y.

For larger systems the number of inequalities generated by the Fourier-Motzkin elimination tends to explode. This wasn't so apparent for our small example, but if we have m inequalities and, say, half of them impose upper bounds on the first variable and half impose lower bounds, then we get about m²/4 inequalities after eliminating the first variable, about m⁴/16 after eliminating the second variable (again provided that about half of the inequalities give upper bounds for the second variable and half lower bounds), etc.

Now we formulate the procedure in general.

Claim. Let Ax ≤ b be a system with n ≥ 1 variables and m inequalities. There is a system A′x′ ≤ b′ with n − 1 variables and at most max(m, m²/4) inequalities, with the following properties: (i) Ax ≤ b has a solution if and only if A′x′ ≤ b′ has a solution, and (ii) each inequality of A′x′ ≤ b′ is a positive linear combination of some inequalities from Ax ≤ b.

Proof. We classify the inequalities into three groups, depending on the coefficient of x₁. We call the ith inequality of Ax ≤ b a ceiling if aᵢ₁ > 0, and we call it a floor if aᵢ₁ < 0. Otherwise, if aᵢ₁ = 0, it is a level. Let C, F, L ⊆ {1, ..., m} collect the indices of the ceilings, floors, and levels, respectively. We may assume

aᵢ₁ = 1 if i ∈ C,   aᵢ₁ = −1 if i ∈ F,   aᵢ₁ = 0 if i ∈ L.      (6.10)

This situation can be reached by multiplying each inequality in Ax ≤ b by a suitable positive number, which does not change the set of solutions. Now we can eliminate x₁ between all pairs of ceilings and floors by simply adding up the two inequalities in each pair. If x′ is the (possibly empty) vector (x₂, ..., xₙ) and aᵢ′ is the (possibly empty) vector (aᵢ₂, ..., aᵢₙ), then the following inequalities are implied by Ax ≤ b:

aⱼ′ᵀx′ + aₖ′ᵀx′ ≤ bⱼ + bₖ,   j ∈ C, k ∈ F.      (6.11)

The level inequalities of Ax ≤ b can be rewritten as

aₗ′ᵀx′ ≤ bₗ,   ℓ ∈ L.      (6.12)

So if Ax ≤ b has a solution, then the system of |C|·|F| + |L| inequalities in n − 1 variables given by (6.11) and (6.12) has a solution as well. Conversely, if the latter system has a solution x̃′ = (x̃₂, ..., x̃ₙ), we can determine a suitable value x̃₁ such that the vector (x̃₁, x̃₂, ..., x̃ₙ) solves Ax ≤ b. To find x̃₁, we first observe that (6.11) is equivalent to

aₖ′ᵀx′ − bₖ ≤ bⱼ − aⱼ′ᵀx′,   j ∈ C, k ∈ F.

This in particular implies

max_{k∈F} (aₖ′ᵀx̃′ − bₖ) ≤ min_{j∈C} (bⱼ − aⱼ′ᵀx̃′).

We let x̃₁ be any value between these bounds. It follows that

x̃₁ + aⱼ′ᵀx̃′ ≤ bⱼ for j ∈ C and −x̃₁ + aₖ′ᵀx̃′ ≤ bₖ for k ∈ F,

so by our normalization (6.10), together with the level inequalities, we have a feasible solution of the original system Ax ≤ b. We note that this argument also works for C = ∅ or F = ∅, with the usual conventions max_{t∈∅} f(t) = −∞ and min_{t∈∅} f(t) = +∞. □
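One step of this procedure is easy to implement. Here is a compact Python sketch (ours; it follows the proof above and is simple rather than efficient), which reproduces the six inequalities (6.9) when applied to the system (6.7), up to the order in which the pairs are visited:

```python
import numpy as np

def fm_step(A, b):
    """Eliminate x1 from Ax <= b; returns (A', b') with one variable fewer."""
    A = np.asarray(A, dtype=float); b = np.asarray(b, dtype=float)
    C = np.where(A[:, 0] > 0)[0]      # ceilings
    F = np.where(A[:, 0] < 0)[0]      # floors
    L = np.where(A[:, 0] == 0)[0]     # levels
    rows, rhs = [], []
    for j in C:
        for k in F:
            cj, ck = A[j, 0], -A[k, 0]            # both positive
            rows.append(A[j, 1:] / cj + A[k, 1:] / ck)
            rhs.append(b[j] / cj + b[k] / ck)
    for l in L:
        rows.append(A[l, 1:]); rhs.append(b[l])
    return np.array(rows), np.array(rhs)

A = [[2, -5, 4], [3, -6, 3], [5, 10, -1], [-1, 5, -2], [-3, 2, 6]]
b = [10, 9, 15, -7, 12]
A1, b1 = fm_step(A, b)
print(A1); print(b1)
```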
Now we can prove the Farkas lemma. The variant that results most naturally from the Fourier-Motzkin elimination is, as in Section 6.6, the one with an arbitrary solution of Ax ≤ b, that is, Proposition 6.4.3(iii).

Proof of Proposition 6.4.3(iii). One direction is easy: if Ax ≤ b has some solution x̃, and y ≥ 0 satisfies yᵀA = 0ᵀ, we get 0 = (yᵀA)x̃ = yᵀ(Ax̃) ≤ yᵀb. If Ax ≤ b has no solution, then our task is to construct a vector y satisfying

y ≥ 0,   yᵀA = 0ᵀ,   and   yᵀb < 0.      (6.13)

To find such a witness of infeasibility, we use induction on the number of variables. Let us first consider the base case, in which the system Ax ≤ b has no variables, meaning that it is of the form 0 ≤ b with bᵢ < 0 for some i. We set y = eᵢ, the ith unit vector, and this clearly satisfies the requirements for y being a witness of infeasibility (the condition yᵀA = 0ᵀ is vacuous, since A has no columns).

If Ax ≤ b has at least one variable, we perform a step of the Fourier-Motzkin elimination. This yields an infeasible system A′x′ ≤ b′ consisting of the inequalities (6.11) and (6.12). Because the latter system has one variable fewer, we inductively find a witness of infeasibility y′ for it. We recall that all inequalities of A′x′ ≤ b′ are positive linear combinations of the original inequalities; equivalently, there is a matrix M, with all entries nonnegative, such that (0 | A′) = MA and b′ = Mb. We claim that y = Mᵀy′ is a witness of infeasibility for the original system Ax ≤ b. Indeed, we have yᵀA = y′ᵀMA = y′ᵀ(0 | A′) = 0ᵀ, and yᵀb = y′ᵀMb = y′ᵀb′ < 0, since y′ is a witness of infeasibility for A′x′ ≤ b′. The condition y ≥ 0 follows from y′ ≥ 0 by the nonnegativity of M. □

7 Not Only the Simplex Method

Tens of different algorithms have been suggested for linear programming over the years. Most of them didn't work very well, and only very few have turned out to be serious competitors of the simplex method, the historically first algorithm. But at least two methods raised great excitement at the time of their discovery, and they are surely worth mentioning.

The first of them, the ellipsoid method, cannot compete with the simplex method in practice, but it had immense theoretical significance: it is the first linear programming algorithm for which it was proved that it always runs in polynomial time, which is not known about the simplex method up to the present (and for many pivot rules it is not even true). The second is the interior point method, or rather, we should say interior point methods, since it is an entire group of algorithms. For some of them a polynomial bound on the running time has also been proved, but moreover, these algorithms successfully compete with the simplex method in practice. It seems that for some types of linear programs the simplex method is better, while for others interior point methods are the winners.

Let us remark that several other algorithms closely related to the simplex method are used for linear programming as well.

The dual simplex method can roughly be described as the simplex method applied to the dual linear program, but the details of the implementation, which are crucial for the speed of the algorithm in practice, are somewhat different. The dual simplex method is particularly suitable for linear programs that in equational form have n − m significantly smaller than m.

The primal-dual method goes through a sequence of feasible solutions of the dual linear program. To get from one such solution to the next, it does not perform a pivot step, but it solves an auxiliary problem that may be derived from the primal linear program or by other means. This greater freedom can be useful, for instance, in approximation algorithms for combinatorial optimization problems. A little more about the dual simplex method and the primal-dual method can be found in the glossary.

7.1 The Ellipsoid Method

The ellipsoid method was invented in the 1970s by Shor, Judin, and Nemirovski as an algorithm for certain nonlinear optimization problems. In 1979 Leonid Khachiyan outlined in a short note how linear programs can be solved by this method in provably polynomial time.
The world press made a sensation out of this, since the journalists contorted the result and presented it as an unprecedented breakthrough in practical computational methods, giving the Soviets a technological edge over the West.¹ However, the ellipsoid method has never been interesting for the practice of linear programming. Khachiyan's discovery was indeed extremely significant, but for the theory of computational complexity: it solved an open problem that many people had attacked in vain for many years, and the solution was conceptually utterly different from previous approaches, which were mostly variations of the simplex method.

¹ The following quotation from E. L. Lawler, "The Great Mathematical Sputnik of 1979," Math. Intelligencer 2 (1980) 191-198, a remarkable article about the history of Khachiyan's result, is not only of historical interest: "The Times story appears to have been based on certain unshakable preconceptions of its writer, Malcolm W. Browne. Browne called George Dantzig of Stanford University, a great pioneering authority on linear programming, and tried to force him into various admissions. Dantzig's version of the interview bears repeating. 'What about the traveling salesman problem?' asked Browne. 'If there is a connection, I don't know what it is,' said Dantzig. 'The Russian discovery proposed an approach for solving a class of problems related to the Traveling Salesman Problem,' reported Browne. 'What about cryptography?' asked Browne. 'If there is a connection, I don't know what it is,' said Dantzig. 'The theory of codes could eventually be affected,' reported Browne. 'Is the Russian method practical?' asked Browne. 'No,' said Dantzig. 'Mathematicians describe the discovery as a method by which computers can find solutions to a class of very hard problems that has hitherto been attacked on a hit-or-miss basis,' reported Browne."

Input size and polynomial algorithms. In order to describe what we mean by a polynomial algorithm for linear programming, we have to define the input size of a linear program. Roughly speaking, it is the total number of bits needed for writing down the input of a linear programming algorithm. First we define the bit size of an integer i as ⟨i⟩ = ⌈log₂(|i| + 1)⌉ + 1, which is the number of bits of i written in binary, including one bit for the sign. For a rational number r, i.e., a fraction r = p/q, the bit size is defined as ⟨r⟩ = ⟨p⟩ + ⟨q⟩. For an n-component rational vector v we put ⟨v⟩ = Σᵢ₌₁ⁿ ⟨vᵢ⟩, and similarly ⟨A⟩ = Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ ⟨aᵢⱼ⟩ for a rational m×n matrix A. If we consider a linear program L, say in the form "maximize cᵀx subject to Ax ≤ b," and if we restrict ourselves to the case of A, b, and c rational, which is a reasonable assumption from a computational perspective, then the bit size of L is ⟨L⟩ = ⟨A⟩ + ⟨b⟩ + ⟨c⟩.

We say that an algorithm is a polynomial algorithm for linear programming if a polynomial p(x) exists such that for every linear program L with rational A, b, and c, the algorithm finds a correct solution in at most p(⟨L⟩) steps. The steps are counted in some of the usual models of computation, for example as steps of a Turing machine (usually the chosen computational model is not crucial: whatever is polynomial in one model is also polynomial in other reasonable models). We stress right away that a single arithmetic operation is not counted as a single step here. We count operations with single bits, and hence the addition of two k-bit integers requires at least k steps.
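For concreteness, the following Python fragment (our illustration; the helper names are ours) computes these bit sizes exactly as defined above, using the fact that ⌈log₂(|i| + 1)⌉ equals the bit length of |i|:

```python
from fractions import Fraction

def size_int(i):
    return abs(i).bit_length() + 1            # one extra bit for the sign

def size_rat(r):
    r = Fraction(r)
    return size_int(r.numerator) + size_int(r.denominator)

def size_vec(v):
    return sum(size_rat(x) for x in v)

def size_mat(A):
    return sum(size_vec(row) for row in A)

# <L> = <A> + <b> + <c> for the linear program "maximize c^T x, Ax <= b"
A = [[2, -5], [3, -6]]; b = [10, 9]; c = [1, 1]
print(size_mat(A) + size_vec(b) + size_vec(c))
```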
Let us digress briefly from linear programming and consider Gaussian elimination, a well-known algorithm for solving systems of linear equations. For a system Ax = b, where for simplicity A is an n×n matrix and both A and b are rational, we naturally define the input size as ⟨A⟩ + ⟨b⟩. Is Gaussian elimination a polynomial algorithm? This is a tricky question. Although the algorithm needs only on the order of n³ arithmetic operations, the catch is that very large intermediate values could come up during the computation, even if all entries of A and b are small integers. If, for example, integers with as many as 2ⁿ bits appeared, which can indeed happen in a naive implementation of Gaussian elimination, the computation would need exponentially many steps, although it would involve only n³ arithmetic operations. All of this concerns exact computations, while many implementations use floating-point arithmetic, in which the numbers are continually rounded; but then there is no guarantee that the results are correct. We do not want to scare the reader needlessly: it is known how Gaussian elimination can be implemented in polynomial time. We only want to point out that this is not self-evident, and not too simple either, and to call attention to one kind of trouble that may develop in attempts at proving polynomiality.

The ellipsoid method, as well as some of the interior point methods, are polynomial, while the simplex method, with Bland's rule and with many other pivot rules too, is not polynomial.²

² The nonpolynomiality is proved by means of the Klee-Minty cube; see Section 5.9. One has to check that an n-dimensional Klee-Minty cube can be represented by an input of size polynomial in n.

Strongly polynomial algorithms. For algorithms whose input is described by a sequence of integers or rationals, such as algorithms for linear programming, the number of arithmetic operations (addition, subtraction, multiplication, division, exponentiation) is also considered, together with the number of bit operations. This often gives a more realistic picture of the running time, because contemporary computers usually execute an arithmetic operation as an elementary step, provided that the operands are not too large. A suitable implementation of Gaussian elimination is, on the one hand, a polynomial algorithm in the sense discussed above, and on the other hand, the number of arithmetic operations is bounded by a polynomial, namely by Cn³ for a suitable constant C, where n is the number of equations in the system (and also the number of variables). The number of arithmetic operations thus depends only on n, and it is the same for input numbers with 10 bits as for input numbers with a million bits. We say that Gaussian elimination is a strongly polynomial algorithm for solving systems of linear equations.

A strongly polynomial algorithm for linear programming would be one that, first, would be polynomial in the sense defined above, and second, for every linear program with n variables and m constraints, would find a solution using at most p(m, n) arithmetic operations, where p(x) is a fixed polynomial. But no strongly polynomial algorithm for linear programming is known, and finding one is a major open problem.

The ellipsoid method is not strongly polynomial. For every natural number M one can find a linear program with only 2 variables and 2 constraints for which the ellipsoid method executes at least M arithmetic operations (the coefficients in such linear programs must have bit size tending to infinity as M → ∞). In particular, the number of arithmetic operations of the ellipsoid method cannot be bounded by any polynomial in m and n.

Ellipsoids. A two-dimensional ellipsoid is an ellipse plus its interior. An ellipsoid in general can most naturally be introduced as an affine transformation of a ball. We let Bⁿ = {x ∈ Rⁿ : xᵀx ≤ 1} be the n-dimensional ball of unit radius centered at 0.
Then an n-dimensional ellipsoid is a set of the form E = {Mx + s : x ∈ Bⁿ}, where M is a nonsingular n×n matrix and s ∈ Rⁿ is a vector. The mapping x ↦ Mx + s is a composition of a linear function and a translation; this is called an affine map. By manipulating the definition we can describe the ellipsoid by an inequality:

E = {y ∈ Rⁿ : M⁻¹(y − s) ∈ Bⁿ}
  = {y ∈ Rⁿ : (y − s)ᵀ(M⁻¹)ᵀM⁻¹(y − s) ≤ 1}
  = {y ∈ Rⁿ : (y − s)ᵀQ⁻¹(y − s) ≤ 1},      (7.1)

where we have set Q = MMᵀ. It is well known, and easy to check, that such a Q is a positive definite matrix, that is, a symmetric square matrix satisfying xᵀQx > 0 for all nonzero vectors x. Conversely, from matrix theory it is known that each positive definite matrix Q can be factored as Q = MMᵀ for some nonsingular square matrix M. Therefore, an equivalent definition is that an ellipsoid is a set described by (7.1) for some positive definite Q and some s. Geometrically, s is the center of the ellipsoid E.

If Q is a diagonal matrix and s = 0, then we have an ellipsoid in axial position, of the form

{y ∈ Rⁿ : y₁²/q₁₁ + y₂²/q₂₂ + ··· + yₙ²/qₙₙ ≤ 1}.

The axes of this ellipsoid are parallel to the coordinate axes, and the numbers √q₁₁, √q₂₂, ..., √qₙₙ are the lengths of the semiaxes of the ellipsoid E, which may sound familiar to those accustomed to the equation of an ellipse in the form x²/a² + y²/b² = 1. As is taught in linear algebra in connection with eigenvalues, each positive definite matrix Q can be diagonalized by an orthogonal change of basis; that is, there exists an orthogonal matrix T such that the matrix TQT⁻¹ is diagonal, with the eigenvalues of Q on the diagonal. Geometrically, T represents a rotation of the coordinate system that brings the ellipsoid into axial position.

A lion in the Sahara. A traditional mathematical anecdote gives directions for hunting a lion in the Sahara, under the assumption that there is at most one: We fence in all of the Sahara, we divide it into two halves by another fence, and we detect one half that has no lion in it. Then we divide the other half by a fence, and we continue in this manner until the fenced piece of ground is so small that the lion cannot move, and so it is caught (or, if there is no lion in it, we have proved that there was none in the Sahara either). Although the qualities of this hunting guide can be disputed, for us it is essential that it gives a reasonably good description of the ellipsoid method. But in the real ellipsoid method we insist that the currently fenced piece always be an ellipsoid, even at the price that the lion can sometimes return to places from which it was expelled earlier; it is only guaranteed that the area of its territory shrinks all the time.

The ellipsoid method doesn't directly solve a linear program; rather, it seeks a solution of a system of linear inequalities Ax ≤ b. But as we know, this is sufficient for solving a linear program (see Section 6.1). For a simpler exposition, we will first consider the following softened version of the problem: Together with the matrix A and the vector b, we are given rational numbers R, ε > 0. We assume that the set P = {x ∈ Rⁿ : Ax ≤ b} is contained in the ball B(0, R) centered at 0 with radius R. If P contains a ball of radius ε, then the algorithm has to return a point y ∈ P. However, if P contains no ball of radius ε, then the algorithm may return either some y ∈ P or the answer NO SOLUTION. The ball B(0, R) thus plays the role of the Sahara, and we assume that the lion, if present, is at least ε large. If there is only a smaller lion in the Sahara, it may escape, or we may catch it; we don't care.

Under these assumptions, the ellipsoid method generates a sequence of ellipsoids E₀, E₁, ..., Eₜ, where P ⊆ Eₖ for each k, as follows:
1. Set k := 0 and E₀ := B(0, R).
2. Let the current ellipsoid be Eₖ = {y ∈ Rⁿ : (y − sₖ)ᵀQₖ⁻¹(y − sₖ) ≤ 1}. If sₖ satisfies all inequalities of the system Ax ≤ b, return sₖ as a solution; stop.
3. Otherwise, choose an inequality of the system that is violated by sₖ. Let it be the ith inequality, so we have aᵢᵀsₖ > bᵢ. Define Eₖ₊₁ as the ellipsoid of the smallest possible volume containing the half-ellipsoid

Hₖ = Eₖ ∩ {x ∈ Rⁿ : aᵢᵀx ≤ aᵢᵀsₖ}.

(Illustration: the ellipsoid Eₖ with center sₖ is cut by the hyperplanes aᵢᵀx = aᵢᵀsₖ and aᵢᵀx = bᵢ; the half-ellipsoid Hₖ is enclosed by the new ellipsoid Eₖ₊₁ with center sₖ₊₁.)

4. If the volume of Eₖ₊₁ is smaller than the volume of a ball of radius ε, return NO SOLUTION and stop. Otherwise, increase k by 1 and continue with Step 2.

Let H′ₖ denote the intersection of the ellipsoid Eₖ with the halfspace {x ∈ Rⁿ : aᵢᵀx ≤ bᵢ} defined by the ith inequality of the system. If P ⊆ Eₖ, then also P ⊆ H′ₖ, and all the more P ⊆ Hₖ. Why is the smallest ellipsoid containing Hₖ taken for Eₖ₊₁, instead of the smallest ellipsoid containing H′ₖ? Purely in order to simplify the analysis of the algorithm, since the equation of Eₖ₊₁ comes out less complicated this way.

The ellipsoid Eₖ₊₁, i.e., the ellipsoid of the smallest volume containing Hₖ, is always determined uniquely. For illustration we mention that it is given by

sₖ₊₁ = sₖ − (1/(n+1)) · Qₖaᵢ/√(aᵢᵀQₖaᵢ),
Qₖ₊₁ = (n²/(n² − 1)) (Qₖ − (2/(n+1)) · QₖaᵢaᵢᵀQₖ/(aᵢᵀQₖaᵢ)).

We also leave without proof a fact crucial for the proof of correctness and efficiency of the ellipsoid method: we always have

vol(Eₖ₊₁)/vol(Eₖ) ≤ e^(−1/(2n+2)).

Hence the volume of the ellipsoid Eₖ is at least e^(k/(2n+2)) times smaller than the volume of the initial ball B(0, R). Since the volume of an n-dimensional ball is proportional to the nth power of its radius, for k satisfying R·e^(−k/(n(2n+2))) < ε the volume of Eₖ is smaller than that of a ball of radius ε. Such k provides an upper bound of n(2n+2) ln(R/ε) on the maximum number of iterations, which is bounded by a polynomial in n, ⟨R⟩, and ⟨ε⟩.

So much for the simple and beautiful idea of the ellipsoid method; now we are coming to manageable but unpleasant complications. First of all, we cannot compute the ellipsoid Eₖ₊₁ exactly, at least not in rational arithmetic, since the defining formulas contain square roots. To get around this, Eₖ₊₁ is computed only approximately, with a suitable precision. But one has to be careful so that P is still guaranteed to be contained in Eₖ₊₁, and thus the approximate Eₖ₊₁ has to be expanded slightly. Another trouble arises, for example, when the same inequality is used for cutting the current ellipsoid in many iterations in a row: then the ellipsoids may become too long and needle-like, and they have to be shortened artificially. Yet another problem is that we don't really want to solve the softened problem with R and ε, but an arbitrary system of linear inequalities without any restrictions. Here the bound on the bit size of the entries of A and b comes into play, through the following facts:

E1. If a solution exists, then there is a not too large solution. Let φ = ⟨A⟩ + ⟨b⟩ denote the input size of the system Ax ≤ b. Then this system has a solution if and only if the system Ax ≤ b, −K ≤ x₁ ≤ K, −K ≤ x₂ ≤ K, ..., −K ≤ xₙ ≤ K has a solution, where K = 2^φ. Clearly, all solutions of the latter system are contained in the ball B(0, R), where R = K√n.

E2. If a solution exists, then the solution set of a slightly relaxed system contains a small ball. Let us put η = 2^(−5φ), ε = 2^(−6φ), and let η be the m-component vector with all components equal to η. Then the system Ax ≤ b has a solution if and only if the system Ax ≤ b + η has a solution, and in that case the solution set of the latter system contains a ball of radius ε.

It is not hard to see how these facts can be used for solving an arbitrary system Ax ≤ b by the ellipsoid method.
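The core update is short enough to state in code. The following Python sketch (ours) implements one iteration, i.e., the formulas for sₖ₊₁ and Qₖ₊₁ above, ignoring all the precision issues just discussed:

```python
import numpy as np

def ellipsoid_step(Q, s, a):
    """Smallest ellipsoid containing E(Q, s) cut by a^T x <= a^T s."""
    n = len(s)
    Qa = Q @ a
    aQa = a @ Qa                              # a^T Q a > 0
    s_new = s - Qa / ((n + 1) * np.sqrt(aQa))
    Q_new = (n * n / (n * n - 1.0)) * (Q - (2.0 / (n + 1)) *
                                       np.outer(Qa, Qa) / aQa)
    return Q_new, s_new

R = 4.0
Q, s = R * R * np.eye(2), np.zeros(2)         # E0 = B(0, R)
a = np.array([1.0, 0.0])                      # a violated row a_i of A
Q, s = ellipsoid_step(Q, s, a)
print(s)                                      # the new center s_{k+1}
print(np.sqrt(np.linalg.det(Q)))              # proportional to the volume
```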
Instead of this system, we solve the softened problem by the ellipsoid method, but for a new system

Ax ≤ b + η,   −K − η ≤ x₁ ≤ K + η,   ...,   −K − η ≤ xₙ ≤ K + η,

where K, R, ε, and η are chosen suitably: first we add the constraints −K ≤ xᵢ ≤ K as in E1, and then we apply E2 to the resulting system. It is important that the bit sizes of R and ε, as well as the input size of the new system, are bounded by a polynomial function of φ. Thus the ellipsoid method runs in polynomial time, and it always finds a solution of Ax ≤ b if one exists.

We will not prove facts E1 and E2 here, but we sketch the basic ideas. For E1 we first discuss the case n = 2, in the plane. Let us consider a system of m inequalities aᵢ₁x + aᵢ₂y ≤ bᵢ, i = 1, 2, ..., m. Let ℓᵢ be the line {(x, y) ∈ R² : aᵢ₁x + aᵢ₂y = bᵢ}. It is easy to calculate that the intersection of ℓᵢ and ℓⱼ, if it exists, has the coordinates

( (aᵢ₂bⱼ − aⱼ₂bᵢ)/(aᵢ₂aⱼ₁ − aᵢ₁aⱼ₂), (aⱼ₁bᵢ − aᵢ₁bⱼ)/(aᵢ₂aⱼ₁ − aᵢ₁aⱼ₂) ).

If, for example, all the aᵢⱼ and bᵢ are integers with absolute value at most 1000, then the coordinates of all the intersections are fractions with numerators and denominators bounded by 2·10⁶ in absolute value. Thus, if the solution set of the considered system of inequalities has at least one vertex, such a vertex has to lie in the square [−2·10⁶, 2·10⁶]². If the solution set has no vertex and it is nonempty, it can be shown that it has to contain one of the lines ℓᵢ, and that each ℓᵢ intersects the just-mentioned square. Fact E1 can be verified along these lines for the considered system with two variables. For a general system in dimension n the idea is the same, and Cramer's rule and a bound on the magnitude of the determinant of a matrix are used for estimating the coordinates of the vertices of the solution set.

Fact E2 requires more work, but the idea is similar. Each solution x̃ of the original system Ax ≤ b also satisfies the modified system Ax ≤ b + η, and all x from the ball B(x̃, ε) satisfy it as well, because changing x by at most ε cannot change any coordinate of Ax by more than η. If Ax ≤ b has no solution, then by a suitable variant of the Farkas lemma, namely Proposition 6.4.3(iii), there exists a nonnegative y ∈ Rᵐ such that yᵀA = 0ᵀ and yᵀb < 0, and by normalizing y we may assume yᵀb = −1. By Cramer's rule it is again shown that there also exists such a y with not too large components, and such a y then witnesses the unsolvability of the system Ax ≤ b + η as well.

Here we finish the outline of the ellipsoid method. If some parts were too incomplete and hazy for the reader, we can only recommend a more extensive treatment, for instance in the excellent book

M. Grötschel, L. Lovász, A. Schrijver: Geometric Algorithms and Combinatorial Optimization, 2nd edition, Springer, Heidelberg 1994.

We have taken the Sahara metaphor from there, among others.

Why ellipsoids? They are used in the ellipsoid method since they constitute probably the simplest class of n-dimensional convex sets that is closed under nonsingular affine maps. Popularly speaking, this class is rich enough to approximate all convex polyhedra, including flat ones and needle-like ones. If desired, ellipsoids can be replaced by simplices, for example, but the formulas in the algorithm and its analysis become considerably more unpleasant than those for ellipsoids.

The ellipsoid method need not know all of the linear program. The system of inequalities Ax ≤ b can also be given by means of a separation oracle. This is an algorithm (a black box) that accepts a point s ∈ Rⁿ as input; if s is a solution of the system, it returns the answer YES, while if s is not a solution, it returns one arbitrary inequality of the system that is violated by s. Such an inequality separates s from the solution set; hence the name separation oracle.
The ellipsoid method calls the separation oracle with the centers sₖ of the generated ellipsoids, and it always uses the violated inequality returned by the oracle for determining the next ellipsoid. We talk about this since a separation oracle can be implemented efficiently for some interesting optimization problems even when the full system has exponentially many inequalities, or even infinitely many (so far we haven't considered infinite systems at all).

Probably the most important example of a situation in which an infinite system of linear inequalities can be solved by the ellipsoid method is semidefinite programming. In a semidefinite program we consider not an unknown vector x but rather an unknown square matrix X = (xᵢⱼ)ⁿᵢ,ⱼ₌₁. We optimize a linear function of the variables xᵢⱼ. The constraints are linear inequalities and equations for the xᵢⱼ, plus the requirement that the matrix X be positive semidefinite. The last constraint distinguishes semidefinite programming from linear programming. It can be expressed by a system of infinitely many linear inequalities, namely, aᵀXa ≥ 0 for each a ∈ Rⁿ. A separation oracle for this system can be constructed based on algorithms for computing eigenvalues and eigenvectors. The ellipsoid method can then approximate the optimum with a prescribed precision in polynomial time. In reality, though, things are not quite as simple as it might seem from our mini-description.

Numerous computational problems can be solved in polynomial time via semidefinite programming, some of them exactly and some approximately, and sometimes this yields the only known polynomial algorithm. A nice example of an application of semidefinite programming is the maximum cut problem (MAXCUT), in which the vertex set of a given graph G = (V, E) should be divided into two parts so that the maximum possible number of edges go between the parts. Semidefinite programming is an essential component of an approximation algorithm for MAXCUT, called the Goemans-Williamson algorithm, that always computes a partition with the number of edges going between the parts at least 0.878 times the optimal number. This is the best known approximation guarantee, and most likely also the best possible one for any polynomial algorithm. More about this and related topics can be found, for instance, in the survey

L. Lovász: Semidefinite programs and combinatorial optimization, in Recent Advances in Algorithms and Combinatorics (B. Reed and C. Linhares-Sales, editors), pages 137-194, Springer, New York 2003.

Let us remark that in the applications just outlined, the ellipsoid method can be replaced by certain interior point methods, the so-called volumetric-center methods, which are not mentioned in our brief discussion of interior point methods in Section 7.2 below; this yields algorithms efficient both in theory and in practice. See

K. Krishnan, T. Terlaky: Interior point and semidefinite approaches in combinatorial optimization, in Graph Theory and Combinatorial Optimization (D. Avis, A. Hertz, and O. Marcotte, editors), pages 101-158, Springer, Berlin etc., 2005.
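The separation oracle for the semidefiniteness constraint mentioned above is easy to sketch: compute the smallest eigenvalue of X, and if it is negative, the corresponding eigenvector a yields a violated inequality aᵀXa ≥ 0. A minimal Python illustration (ours, with ad hoc data):

```python
import numpy as np

def psd_separation_oracle(X, tol=1e-9):
    eigenvalues, eigenvectors = np.linalg.eigh(X)   # ascending order
    if eigenvalues[0] >= -tol:
        return None                # "YES": X is (numerically) semidefinite
    return eigenvectors[:, 0]      # a with a^T X a = eigenvalues[0] < 0

X = np.array([[1.0, 2.0],
              [2.0, 1.0]])         # eigenvalues -1 and 3: not semidefinite
a = psd_separation_oracle(X)
print(a @ X @ a)                   # -1.0: a violated inequality a^T X a >= 0
```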
Theory versus practice. The notion of a polynomial algorithm was suggested in the 1960s by Jack Edmonds as a formalized counterpart of the intuitive notion of an efficient algorithm. Today a theoretician's first question about every algorithm is, is it polynomial? How is it possible that the ellipsoid method, which is polynomial, is much slower in practice than the simplex method, which is not polynomial? One of the reasons is that even though the ellipsoid method is polynomial, the degree of the polynomial is quite high. The second, and main, reason is that the simplex method is slow only on artificially constructed linear programs, which it almost never encounters in practice, while the ellipsoid method seldom behaves better than in the worst case. But the good behavior of the simplex method on all inputs, with rare exceptions, seems hard to capture theoretically. Moreover, a guaranteed efficiency for all inputs is much more satisfactory than an only empirically supported belief that an algorithm is usually fast. The notion of polynomial algorithm thus has great shortcomings from a practical point of view. But attempts at constructing an algorithm polynomial in theory usually also lead, over time, to practically efficient algorithms. An impressive example in the area of linear programming are the interior point methods.

7.2 Interior Point Methods

The next time linear programming made it to the press headlines was in 1984: Narendra Karmarkar, a researcher at AT&T Bell Laboratories, suggested an algorithm that is named after him and belongs to the large family of interior point methods. He proved its polynomiality and published results of computational experiments suggesting that in practice it is much faster than the simplex method. Although his statements in the latter direction turned out to be somewhat exaggerated, interior point methods are nowadays commonly used in linear programming, and they often beat the simplex method, especially on very large linear programs. They are also applied with success to semidefinite programming and to other important classes of optimization problems, such as convex quadratic programming.

Interior point methods have been used for nonlinear optimization problems at least since the 1950s. For linear programs they were actually tested by the early 1970s, and, interestingly, none was found competitive with the simplex method. This was because theory and hardware were not advanced enough: interior point methods typically outperform the simplex method only on problems so large that they were beyond the capabilities of the computers of that time, and moreover, an efficient implementation of interior point methods relies on powerful routines for solving large but sparse systems of linear equations, which were not available either. The success story began only with Karmarkar's results.

The basic approach. When solving a linear program, the simplex method crawls along the boundary of the set of feasible solutions. The ellipsoid method encircles the set of feasible solutions and, up until the last step, remains outside of it. Interior point methods walk through the interior of the set of feasible solutions toward an optimum, carefully avoiding the boundary. Only at the very end, when they get almost to an optimum, do they jump to an exact optimum by a rounding step.

(Schematic illustration: a path through the interior of the feasible region toward an optimal vertex, in the direction of the objective vector c.)

Working with an interior point all the time is the key idea that gave the methods their name. Interior points possess various pleasant mathematical properties, a kind of nondegeneracy, and they allow one to avoid the intricacies of the combinatorial structure of the boundary, which have been haunting research on the simplex method for decades. The art of interior point methods is how to keep away from the boundary while still progressing toward the optimum. For the sake of exposition, let us now assume that we are dealing with a linear program for which some initial interior point is available.
There are several rather different basic approaches in interior point methods, and each has many variants. Interior point methods in linear programming are classified as central path methods (or central trajectory methods), potential reduction methods, and affine scaling methods, and for almost every approach one can consider a primal version, a dual version, a primal-dual version, or a self-dual version.

Here we will consider only central path methods, which have been computationally the most successful. We will present a single algorithm from this family, one with the best known theoretical complexity bounds and very good practical performance. It is fair to say, though, that a number of different interior point methods yield the same theoretical bounds, and that many practitioners might say that from their point of view there are even better algorithms. The analysis of the algorithm, as well as some important details of its implementation, are somewhat complicated, and we will not discuss them. Modern linear programming software is based on quite advanced mathematics; for example, as was remarked above, one of the keys to the success of interior point methods is an efficient solver for special systems of linear equations, much more sophisticated than good old Gaussian elimination.

In addition to the existence of innumerable variants of interior point methods in the literature, different expositions of the same algorithm may be based on different intuitions. For example, what is presented as a step of Newton's iteration method in one source may be derived by linear approximation in another source. We say this so that a reader who finds in the literature something seemingly rather different from what we say below is not confused more than necessary.

The central path. First we explain the mathematical notion of the central path. To this end we consider an arbitrary convex polyhedron P in Rⁿ defined by a system Ax ≤ b of m linear inequalities, and a linear objective function f(x) = cᵀx, as usual. We introduce a family of auxiliary objective functions fμ depending on a parameter μ > 0:

fμ(x) = cᵀx + μ Σᵢ₌₁ᵐ ln(bᵢ − aᵢx),

where aᵢ is the ith row of the matrix A, regarded here as a row vector. Thus f₀ = f is the original linear objective function, while the fμ for μ > 0 are nonlinear, due to the logarithms. The function fμ is constructed in such a way that when x approaches the boundary of P, i.e., when the difference of the right-hand side and the left-hand side of some inequality approaches 0, then fμ(x) tends to −∞. The expression following μ in the definition of fμ is called a barrier function, or more specifically, a logarithmic barrier. The word "barrier" is more fitting for a minimization problem, in our case minimizing −fμ, where the graph of the objective function has "barriers" preventing minimization algorithms from hitting the walls.

We thus consider the auxiliary problem of maximizing fμ(x) over P for a given μ > 0. Since fμ is undefined on the boundary of P, we actually maximize over int(P), the interior of P, and so for the problem to make sense we need to assume that int(P) ≠ ∅. If we assume that, moreover, P is bounded, then fμ attains a maximum at a unique point in the interior of P, which we denote by x*(μ). Indeed, the existence of a maximum follows from the well-known fact that every continuous function attains a maximum on a compact set. The appropriate compact set here is {x ∈ int(P) : fμ(x) ≥ fμ(x₀)}, where x₀ ∈ int(P) is arbitrary (a little argument, which we leave to the reader, is needed to verify that this set is closed). As for uniqueness, let us assume that fμ attains a maximum at two distinct feasible points x and y.
Then fμ(x) = fμ(y), and since fμ is easily seen to be concave (meaning that −fμ is convex), it follows that fμ has to be constant on the segment xy. Since the logarithm is strictly concave, this can happen only if Ax = Ay. But then P would contain the whole line {x + s(y − x) : s ∈ R} and would not be bounded, a contradiction. In short, the maximum is unique because fμ is strictly concave.

The condition that P be bounded can be relaxed somewhat, but some condition forcing fμ to be bounded above on P is clearly necessary, as is documented by P = {x ∈ R¹ : x ≥ 0} with either c = (1) or c = (0).

If μ is a very large number, the influence of the term cᵀx in fμ is negligible, and x*(μ) is a point "farthest from the boundary," called the analytic center of P.

(Illustration: contour lines³ of the function fμ for a two-dimensional P with μ = 100; the vector c is depicted by an arrow, and the point x*(μ) by a dot.)

On the other hand, for small μ the point x*(μ) is close to an optimum of cᵀx.

(Illustrations: the contour lines and x*(μ) for μ = 0.5 and μ = 0.1.)

The central path is defined as the set {x*(μ) : μ > 0}. We stress that the central path is associated not with P itself, but rather with a particular system of inequalities defining P and a particular linear objective function cᵀx.

³ A contour line of a function f: R² → R is a set of the form f⁻¹(α), α ∈ R.

The idea of central path methods is to start at x*(μ) for a suitable large μ and then to follow the central path, decreasing μ, until an optimum of cᵀx is reached. Computing x*(μ) exactly would be difficult, and so actual algorithms follow the central path only approximately.

A linear program in equational form and the primal-dual central path. Having introduced the concept of the central path in a geometrically transparent way, we change the setting slightly. It turns out that for the actual algorithm it is better to replace general inequality constraints by equations and nonnegativity constraints (this is similar to the case of the simplex method). So now we consider the usual linear program in equational form

maximize cᵀx subject to Ax = b, x ≥ 0,      (7.2)

where A is an m×n matrix of rank m. Here the barrier function should prevent violation of the nonnegativity constraints, while the equations only restrict everything to a subspace of Rⁿ, and they do not enter the objective function. So we set

fμ(x) = cᵀx + μ Σⱼ₌₁ⁿ ln xⱼ

and consider the auxiliary problem

maximize fμ(x) subject to Ax = b, x > 0,

where the notation x > 0 means that all coordinates of x are strictly positive. We would again like to claim that under suitable conditions the auxiliary problem has a unique maximizer x*(μ) for every μ > 0. Obviously, we need to assume that there is a feasible x > 0. Also, once we make sure that fμ attains at least one maximum, it is easily seen, as above, that the maximum is unique. We now derive necessary conditions for the existence of a maximum, and at the same time we express x*(μ) as a solution of a system of equations. Later on we will check that the necessary conditions are also sufficient.

We derive these conditions by the method of Lagrange multipliers from analysis. We recall that this is a general method for the maximization of f(x) subject to m constraints g₁(x) = 0, g₂(x) = 0, ..., gₘ(x) = 0, where f and g₁, ..., gₘ are functions from Rⁿ to R. It can be seen as a generalization of the basic calculus trick of maximizing a univariate function by seeking a zero of its derivative. It introduces the following system of equations, with unknowns x ∈ Rⁿ and y ∈ Rᵐ (the yᵢ are auxiliary variables called the Lagrange multipliers):

g₁(x) = g₂(x) = ··· = gₘ(x) = 0   and   ∇f(x) = Σᵢ₌₁ᵐ yᵢ ∇gᵢ(x).      (7.3)

Here ∇ denotes the gradient, which by convention is a row vector:
∇f(x) = (∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ). That is, ∇f is a vector function from Rⁿ to Rⁿ whose ith component is the partial derivative of f with respect to xᵢ. Thus the equation ∇f(x) = Σᵢ₌₁ᵐ yᵢ∇gᵢ(x) stipulates the equality of two n-component vectors.

The method of Lagrange multipliers tells us that a maximum of f(x) subject to g₁(x) = g₂(x) = ··· = gₘ(x) = 0 occurs at an x satisfying (7.3); that is, there exists a y such that the considered x and this y together fulfill (7.3) (a special case of this result is derived in Section 8.7). Of course, we need some conditions on f and the gᵢ. It suffices to require that f and the gᵢ be defined on a nonempty open subset of Rⁿ and have continuous first partial derivatives there, and this will be obviously satisfied in our simple application.

We apply the method of Lagrange multipliers to maximizing fμ(x) subject to Ax = b (the nonnegativity constraints are taken care of implicitly, by the barriers). So we set gᵢ(x) = bᵢ − aᵢx. Then, after a little manipulation, the system (7.3) becomes

Ax = b,   c + μ(1/x₁, 1/x₂, ..., 1/xₙ)ᵀ = Aᵀy.

A more convenient form of this system is obtained by introducing an auxiliary nonnegative vector s = μ(1/x₁, 1/x₂, ..., 1/xₙ)ᵀ ∈ Rⁿ. We rewrite the relation of s and x as s₁x₁ = s₂x₂ = ··· = sₙxₙ = μ, i.e., (s₁x₁, s₂x₂, ..., sₙxₙ) = μ1, with 1 denoting the vector of all 1s. Then x*(μ) is expressed as the x-part of a solution of the following system, with unknowns x, s ∈ Rⁿ and y ∈ Rᵐ:

Ax = b
Aᵀy − s = c
s₁x₁ = s₂x₂ = ··· = sₙxₙ = μ
x, s ≥ 0.      (7.4)

All of these equations are linear except for s₁x₁ = s₂x₂ = ··· = sₙxₙ = μ.
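For a tiny concrete instance, the system (7.4) can be solved directly by a generic root finder; an actual interior point method never does this, as we will see, but it makes the point x*(μ) tangible. A Python sketch (ours; the data and the initial guess are ad hoc):

```python
import numpy as np
from scipy.optimize import fsolve

A = np.array([[1.0, 1.0, 1.0]])    # maximize x1 over the simplex
b = np.array([1.0])
c = np.array([1.0, 0.0, 0.0])
n, m, mu = 3, 1, 0.1

def system_74(v):
    x, y, s = v[:n], v[n:n + m], v[n + m:]
    return np.concatenate([A @ x - b,          # Ax = b
                           A.T @ y - s - c,    # A^T y - s = c
                           x * s - mu])        # s_j x_j = mu for all j

x0 = np.full(n, 1.0 / 3.0)                     # interior starting guess
v = fsolve(system_74, np.concatenate([x0, [1.0], mu / x0]))
print(v[:n])                                   # x*(mu), strictly positive
```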
Although we have derived the system (7.4) assuming µ > 0, let us make a small digression and look at what (7.4) tells us for µ = 0. Then, for nonnegative x and s, the equation (s_1 x_1, ..., s_n x_n) = 0 is equivalent to s^T x = 0, and since s = A^T y − c, we have

    0 = s^T x = y^T Ax − c^T x = y^T b − c^T x

(using Ax = b). This may remind one of the equality of objective functions for the primal and dual linear programs. And indeed, if we take (7.2) as a primal linear program, the dual is

    minimize b^T y subject to A^T y ≥ c, y ∈ R^m.    (7.5)

Then (7.4) for µ = 0 tells us exactly that x is a feasible solution of the primal linear program (7.2), y is a feasible solution of the dual linear program (7.5), where the s_j serve as slack variables expressing the difference A^T y − c, and the objective functions are equal. Hence such x and y are optimal.

So far we have shown that if the problem of maximizing f_µ(x) subject to Ax = b and x > 0 has a maximum at x, then there exist s ≥ 0 and y ∈ R^m such that (x, y, s) satisfies (7.4). Next, we formulate conditions for the existence of the maximum (we prove only their sufficiency, but it can be shown that they are also necessary), and we show that under these conditions the maximum is characterized by (7.4).

7.2.1 Lemma. Let us suppose that the linear program (7.2) has a feasible solution x̃ > 0, and that the dual linear program (7.5) has a feasible solution ỹ such that the slack vector s̃ = A^T ỹ − c satisfies s̃ > 0. (Less formally, both the primal and the dual linear programs have an interior feasible point.) Then for every µ > 0 the system (7.4) has a unique solution (x, y, s) = (x(µ), y(µ), s(µ)), and x(µ) is the unique maximizer of f_µ subject to Ax = b and x > 0.

Proof. Let µ > 0 be fixed. We begin with the following claim.

Claim: Under the assumptions of the lemma, the set Q = {x ∈ R^n : Ax = b, x > 0, f_µ(x) ≥ f_µ(x̃)} is bounded.

Proof of the claim: We have

    f_µ(x) = c^T x + µ Σ_{j=1}^n ln x_j
           = c^T x + ỹ^T (b − Ax) + µ Σ_{j=1}^n ln x_j    (since Ax = b)
           = (c^T − ỹ^T A) x + ỹ^T b + µ Σ_{j=1}^n ln x_j
           = ỹ^T b + Σ_{j=1}^n (µ ln x_j − s̃_j x_j)    (since A^T ỹ − s̃ = c).

The first term of the last line is a constant, and the rest is a sum of univariate functions, each of the form h_α(x) = µ ln x − αx with µ, α > 0. Elementary calculus shows that h_α attains a unique maximum at x = µ/α; in particular, it is bounded from above. Moreover, for every constant C the set {x > 0 : h_α(x) ≥ C} is bounded. Setting K = f_µ(x̃) − ỹ^T b, we have

    Q ⊆ {x > 0 : Σ_{j=1}^n h_{s̃_j}(x_j) ≥ K} ⊆ ∏_{j=1}^n {x > 0 : h_{s̃_j}(x) ≥ K − Σ_{i≠j} max_{z>0} h_{s̃_i}(z)}.

The last set is a Cartesian product of bounded intervals, and the claim is proved.

So the set Q is bounded, and it is not hard to check that it is closed. Hence the continuous function f_µ attains a maximum on it, which, as we know, is unique. This shows that f_µ attains a maximum under the assumptions of the lemma, and by means of Lagrange multipliers we have shown that this maximum yields a solution of (7.4).

It remains to verify that this is the only solution of (7.4). What we do is to show that for every solution (x̂, ŷ, ŝ) of (7.4), x̂ maximizes f_µ; we note that ŝ and ŷ are uniquely determined by x̂ through the relations ŝ_j x̂_j = µ and A^T ŷ − ŝ = c from (7.4), using the assumption that A has full rank. So let (x̂, ŷ, ŝ) be a solution of (7.4), and let x satisfy Ax = b, x > 0. Exactly as above we can express

    f_µ(x) = ŷ^T b + Σ_{j=1}^n (µ ln x_j − ŝ_j x_j),

and the right-hand side is maximized by setting x_j = µ/ŝ_j, that is, for x = x̂. The lemma is proved.

The set {(x(µ), y(µ), s(µ)) ∈ R^(2n+m) : µ > 0} is called the primal-dual central path of the linear program (7.2), and this is actually what the algorithm will follow approximately.

The algorithm. The algorithm for solving the linear program (7.2) maintains current vectors x, s ∈ R^n and y ∈ R^m, with x > 0 and s > 0, that satisfy all of the linear equations in (7.4); that is, Ax = b and A^T y − s = c. This makes sense, since it is the quadratic equations s_j x_j = µ that make the problem complicated, and moreover, these are the only equations in (7.4) in which µ enters.
We are still postponing the question of obtaining the initial x, y, s. The current x, y, s will in general fail to satisfy the conditions s_j x_j = µ, since we follow the primal-dual central path only approximately. We need to quantify by how much they fail, and one suitable centrality measure turns out to be

    cdist_µ(x, s) = ‖(ρ(s_1 x_1, µ), ρ(s_2 x_2, µ), ..., ρ(s_n x_n, µ))‖,

where ‖·‖ is the Euclidean norm of a vector and ρ(a, µ) = √(a/µ) − √(µ/a). This may look a little arbitrary, but the important thing is that it works. (Other variants of algorithms following the central path may use different distance notions.)

In a typical iteration of the algorithm we have the current µ and x, y, s such that cdist_µ(x, s) is sufficiently small; concretely, one can take the smallness condition to be cdist_µ(x, s) ≤ 1/2. Then we decrease µ slightly; in the considered algorithm we replace µ by (1 − 1/(2√n))µ. For this new µ we again want to approximate the solution (x(µ), y(µ), s(µ)) of (7.4) sufficiently closely. Since µ changed by only a little, we expect that the x, y, s from the previous iteration will be good initial guesses; in other words, in order to get x(µ), y(µ), s(µ), we need to change x, y, s by only a little. Let us denote the required changes by Δx, Δy, and Δs, respectively. So we look for Δx, Δy, Δs such that (7.4) is satisfied by x + Δx, y + Δy, and s + Δs; that is,

    A(x + Δx) = b
    A^T (y + Δy) − (s + Δs) = c
    (s_1 + Δs_1)(x_1 + Δx_1) = ··· = (s_n + Δs_n)(x_n + Δx_n) = µ
    x + Δx ≥ 0, s + Δs ≥ 0.

Using the fact that x, y, s satisfy the linear equations in (7.4) exactly, that is, Ax = b and A^T y − s = c, the first two lines simplify to AΔx = 0 and A^T Δy − Δs = 0. So far things were exact, but now we make a heuristic step: Since Δx and Δs are supposedly small compared to x and s, we neglect the second-order products Δx_j Δs_j in the equations of the third line. We thus approximate the required changes in x, y, s by a solution of the following system:

    AΔx = 0
    A^T Δy − Δs = 0
    s_j Δx_j + x_j Δs_j = µ − s_j x_j for all j = 1, 2, ..., n.    (7.6)

The unknowns are Δx, Δy, Δs, while x, y, s are regarded as constant. Hence this is a system of linear equations, and it is the system whose fast solution is a computational bottleneck of this interior point method. (For an actual computation the system can still be simplified by algebraic manipulations, but this is not our concern here.) In general we would also need to worry about the positivity of x + Δx and s + Δs, but in the algorithm considered below it luckily turns out that this is satisfied automatically; we will comment on this later. It can be shown that passing from (x, y, s) to (x + Δx, y + Δy, s + Δs), with Δx, Δy, Δs given by (7.6), can be regarded as a step of the Newton iterative method for solving the system (7.4).

We are ready to describe the algorithm. The input consists of a real m×n matrix A of rank m, vectors b ∈ R^m and c ∈ R^n, and a real ε > 0 specifying the desired accuracy.

1. Set µ = 1 and initialize x, y, s so that Ax = b, A^T y − s = c, x > 0, s > 0, and cdist_µ(x, s) ≤ 1/2.
2. (Main loop) While µ ≥ ε, repeat Steps 3 and 4. As soon as µ < ε, return x as an approximately optimal solution and stop.
3. Replace µ by (1 − 1/(2√n))µ.
4. (Newton step) Compute (Δx, Δy, Δs) as the unique solution of the linear system (7.6). Replace x by x + Δx, y by y + Δy, and s by s + Δs. Go to the next iteration of the main loop.
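The following self-contained numpy sketch (our own illustration, not the book's pseudocode) implements this short-step method for a tiny equational-form LP. The toy instance is rigged so that x = s = (1, ..., 1), y = 0 lies exactly on the central path for µ = 1, which makes Step 1 trivial; a general-purpose implementation would instead use an initialization such as the self-dual embedding discussed below.

    import numpy as np

    def cdist(x, s, mu):
        """Centrality measure || sqrt(s_j x_j / mu) - sqrt(mu / (s_j x_j)) ||."""
        a = x * s
        return np.linalg.norm(np.sqrt(a / mu) - np.sqrt(mu / a))

    def newton_step(A, x, y, s, mu):
        """Solve the linear system (7.6) for (dx, dy, ds)."""
        m, n = A.shape
        K = np.zeros((2 * n + m, 2 * n + m))
        rhs = np.zeros(2 * n + m)
        K[:m, :n] = A                              # A dx = 0
        K[m:m + n, n:n + m] = A.T                  # A^T dy - ds = 0
        K[m:m + n, n + m:] = -np.eye(n)
        K[m + n:, :n] = np.diag(s)                 # s_j dx_j + x_j ds_j
        K[m + n:, n + m:] = np.diag(x)             #   = mu - s_j x_j
        rhs[m + n:] = mu - x * s
        d = np.linalg.solve(K, rhs)
        return d[:n], d[n:n + m], d[n + m:]

    # Toy data: with x = s = 1 and y = 0 we get b = A1 and c = -1, so the
    # starting point satisfies Ax = b, A^T y - s = c, and s_j x_j = 1 = mu.
    A = np.array([[1.0, 1, 0], [0, 1, 1]])
    x, y, s = np.ones(3), np.zeros(2), np.ones(3)
    b, c = A @ x, A.T @ y - s
    mu, eps, n = 1.0, 1e-6, 3

    while mu >= eps:
        mu *= 1 - 1 / (2 * np.sqrt(n))             # Step 3
        dx, dy, ds = newton_step(A, x, y, s, mu)   # Step 4
        x, y, s = x + dx, y + dy, s + ds
        assert cdist(x, s, mu) <= 0.5              # we stay near the path

    print(x)   # approaches (0, 2, 0), the optimum of max -(x1+x2+x3) here

Note how little work one iteration does: a single solve of the linear system (7.6), which is exactly the bottleneck mentioned in the text.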
Higher-order methods. The Newton step approximates the nonlinear system by linear functions based on first derivatives. One can also employ higher-order methods, where the approximation is done by multivariate polynomials based on higher-order derivatives.

Truncated Newton steps. In the algorithm described above it luckily can be shown that making the full Newton step, i.e., going from (x, y, s) to (x + Δx, y + Δy, s + Δs), cannot leave the feasible region. For other algorithms this need not be guaranteed, and then one chooses a parameter α ∈ (0, 1] in each iteration and moves only to (x + αΔx, y + αΔy, s + αΔs), where α is determined so as to maintain feasibility, or also depending on other considerations.

Long-step methods. Decreasing µ by the factor (1 − 1/(2√n)) is a careful short-step strategy, designed so that we do not move too far along the central path and a single Newton step again brings x, y, s close enough to the new point of the central path. In practice it seems advantageous to make longer steps, i.e., to decrease µ more significantly; for example, some algorithms go from µ to µ/10 or so. There are even adaptive algorithms, where the new µ is not given by an explicit formula, that asymptotically achieve quadratic convergence; that is, for µ sufficiently small, a single step goes from µ to const·µ². After such a large change of µ it is in general not sufficient to make a single Newton step; rather, one iterates Newton steps with µ fixed until the current x and s get sufficiently close to the central path. The theoretical analysis becomes more difficult for long-step methods, and for some of the practically most successful such algorithms no reasonable theoretical bounds are known.

Initialization. It remains to say how the first step of the algorithm, finding the initial x, y, s, can be realized. There are several approaches; here we discuss one of them, an elegant method called a self-dual embedding. The idea is this: Given an input linear program, we set up another, auxiliary linear program with the following properties:

(P1) The auxiliary linear program is always feasible and bounded, and there is a simple, explicitly specified vector lying on its central path, from which the above path-following algorithm can be started.

(P2) From the optimal solution of the auxiliary linear program found by the algorithm we can read off an optimal solution of the original linear program, or conclude that the original linear program is infeasible or unbounded.

We develop the auxiliary linear program in several steps. Here things come out more nicely if we start with the original linear program in an inequality form:

    maximize c^T x subject to Ax ≤ b, x ≥ 0.    (7.7)

As we have noted at the end of Section 6.1, the duality theorem implies that (7.7) is feasible and bounded if and only if the following system has a solution:

    Ax ≤ b, A^T y ≥ c, b^T y − c^T x ≤ 0, x, y ≥ 0;

the first inequality is the feasibility of x, the second inequality is the feasibility of y for the dual linear program, and the third inequality forces equality of the primal and dual objective functions. Now comes a small trick: We introduce a new scalar variable τ ≥ 0 (we will use Greek letters to denote such standalone scalar variables), and we multiply the right-hand sides by it. In this way we obtain a homogeneous system, called the Goldman-Tucker system:

    Ax − τb ≤ 0
    A^T y − τc ≥ 0
    b^T y − c^T x ≤ 0
    x, y ≥ 0, τ ≥ 0.    (GTS)

This system, being homogeneous, always admits the zero solution, but this is uninteresting. Interesting solutions, which allow us to solve the original linear program, are those with τ > 0 or ρ > 0, where ρ = ρ(x, y) = c^T x − b^T y denotes the slack in the last inequality. This is because of the following lemma.

7.2.2 Lemma. No solution of (GTS) has both τ and ρ nonzero, and exactly one of the following possibilities always occurs:
(i) There is a solution (x, y, τ) with τ > 0 (and hence ρ = 0), in which case (1/τ)x is an optimal solution of the primal linear program (7.7) and (1/τ)y is an optimal solution of the dual.
(ii) There is a solution (x, y, τ) with ρ > 0 (and hence τ = 0), in which case the primal linear program (7.7) is infeasible or unbounded.

Proof (sketch). By weak duality it is immediate that any solution with τ > 0 has ρ = 0 and yields a pair of optimal solutions. Conversely, by the strong duality theorem, any pair (x, y) of optimal solutions provides a solution of (GTS) with τ = 1. Hence ρ > 0 implies τ = 0 and infeasibility or unboundedness of the primal linear program. It remains to check that infeasibility of the primal linear program or of the dual linear program gives a solution of (GTS) with ρ > 0. Let us assume, for example, that the primal linear program is infeasible (the dual case is analogous); that is, Ax ≤ b has no nonnegative solution. Then by the Farkas lemma (Proposition 6.4.1) there is a ỹ ≥ 0 with A^T ỹ ≥ 0 and b^T ỹ < 0, and then (x, y, τ) = (0, ỹ, 0) is a solution of (GTS) with ρ = −b^T ỹ > 0.

As a side remark, we note that applying the Farkas lemma to the Goldman-Tucker system yields an alternative derivation of the duality theorem from the Farkas lemma.

To boost morale, we note that we have achieved something like (P2): By means of solutions of (GTS) of a special kind, we can solve the original linear program, as well as deal with its infeasibility or unboundedness. But how do we compute a solution with ρ > 0 or τ > 0, avoiding the trivial zero solution? Luckily, interior point methods are very suitable for this, since they converge to a "most generic" optimal solution of the given linear program. Roughly speaking, the interior point algorithm described above converges to the analytic center of the set of all optimal solutions, and this particular optimal solution does not satisfy any constraint (inequality or nonnegativity constraint) with equality if this can be avoided at all. In particular, if we made (GTS) into a linear program by adding a suitable objective function, a most generic optimal solution would have τ > 0 or ρ > 0.

There is still an obstacle to be overcome: We do not have an explicit feasible interior point for (GTS) from which to start the algorithm, and what is worse, no feasible interior point (one with all variables and slacks strictly positive) exists, because we always have τ = 0 or ρ = 0, as was noted in the lemma. The next step is to enlarge the system so that the vector of all 1s is forced as a feasible interior point. Before proceeding, we simplify the notation.
We write the system (GTS) in matrix form as M₀u ≤ 0, u ≥ 0, where

    M₀ = (   0     A    −b )
         ( −A^T    0     c )        and u = (y, x, τ).
         (  b^T  −c^T    0 )

We note that M₀^T = −M₀; that is, M₀ is a skew-symmetric matrix. If the original matrix A has size m×n, then M₀ is a k×k matrix with k = n + m + 1.

Now we set up the appropriate linear program with a feasible interior point, and then we gradually explain how and why it works. We define a vector r ∈ R^k by r = −1 − M₀1, a (k+1)×(k+1) matrix M by

    M = (  M₀    r )
        ( −r^T   0 )

(we note that M is skew-symmetric too, which will be useful), and a vector q ∈ R^(k+1) by q = (0, 0, ..., 0, k+1). We consider the following linear program with variable vector v = (u, ϑ), that is, u with a new variable ϑ appended at the end:

    maximize −q^T v subject to Mv ≤ q, v ≥ 0.    (SD)

First we explain the abbreviation (SD): It stands for self-dual. Indeed, if we form the dual linear program of (SD), we obtain: minimize q^T w subject to M^T w ≥ −q, w ≥ 0. Using M^T = −M and changing signs (which changes minimization to maximization and flips the direction of the inequality), we arrive exactly at (SD). So (SD) is equivalent to its own dual linear program.

For a feasible solution v of (SD) we define the slacks by z = z(v) = q − Mv. Here is a key notion: A feasible solution v of (SD) is called strictly complementary if for every j = 1, 2, ..., k+1 we have v_j > 0 or z_j > 0. The next lemma shows that a strictly complementary optimal solution is exactly what we need for solving the original linear program (7.7); more precisely, we don't need the full force of strict complementarity, only strict complementarity for one particular j.

7.2.3 Lemma. The linear program (SD) is feasible and bounded, every optimal solution has ϑ = 0, and hence its u-part is a solution of the Goldman-Tucker system (GTS). Moreover, every strictly complementary optimal solution yields a solution of (GTS) with τ > 0 or ρ > 0.

Proof. We have 0 as a feasible solution of (SD) and also of its dual. Thus (SD) is feasible and bounded. For v = w = 0 both the primal and the dual objective functions have value 0, so 0 must be their common optimal value. It follows that every optimal solution has ϑ = 0. The rest of the lemma is easily checked.

Hence for solving the original linear program (7.7) it suffices to find a strictly complementary optimal solution of (SD). It turns out that this is exactly what the algorithm described above computes. We will not prove this in full, but let us see what is going on. In order to apply the algorithm to (SD), we first need to convert (SD) to equational form by adding the slack variables:

    maximize −q^T v subject to Mv + z = q, v, z ≥ 0.

Now we want to write down the system (7.4) specifying the points of the central path for the considered linear program. So we substitute

    A := (M  I_(k+1))    (so A is a (k+1) × 2(k+1) matrix), b := q, c := (−q, 0), x := (v, z).

We also need to introduce the extra variables y and s appearing in (7.4). For reasons that will become apparent later, we write v′ for y and (z′, z″) for s, where z′ and z″ are (k+1)-component vectors. So we have v, z, z′, z″ nonnegative, while v′ is so far arbitrary. The equation Ax = b from (7.4) becomes Mv + z = q. From A^T y − s = c we get two equations: M^T v′ − z′ = −q and v′ − z″ = 0. The first of these can be rewritten as Mv′ + z′ = q. The second just means that v′ = z″, and so z″ can be disregarded if we add the constraint v′ ≥ 0. Finally, s_j x_j = µ in (7.4) yields v_j z′_j = µ and v′_j z_j = µ for all j = 1, 2, ..., k+1. The full system is thus

    Mv + z = q
    Mv′ + z′ = q
    v_j z′_j = µ for all j = 1, 2, ..., k+1
    v′_j z_j = µ for all j = 1, 2, ..., k+1
    v, z, v′, z′ ≥ 0.

It is easily verified, using the skew-symmetry of M₀, that the system Mv + z = q is satisfied by v = z = 1 (M and q were set up exactly that way). Thus (v, z, v′, z′) = (1, 1, 1, 1) is a solution of the just-derived system with µ = 1, and we can use it as an initial point on the central path in Step 1 of the algorithm.
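Here is a small numpy sketch (ours; the variable names are not from the book) that builds M₀, r, q, and M for concrete LP data and verifies the two facts used above: M is skew-symmetric, and the all-ones vector satisfies M·1 + 1 = q, so v = z = 1 is feasible for the equational form with slack vector 1.

    import numpy as np

    # Any LP data in the inequality form (7.7): max c^T x, Ax <= b, x >= 0.
    A = np.array([[1.0, 2], [3, 1]])
    b = np.array([4.0, 5])
    c = np.array([1.0, 1])
    m, n = A.shape
    k = n + m + 1

    # Goldman-Tucker matrix M0, acting on u = (y, x, tau).
    M0 = np.block([
        [np.zeros((m, m)), A,                -b[:, None]],
        [-A.T,             np.zeros((n, n)),  c[:, None]],
        [b[None, :],      -c[None, :],        np.zeros((1, 1))],
    ])

    r = -np.ones(k) - M0 @ np.ones(k)
    M = np.block([[M0, r[:, None]], [-r[None, :], np.zeros((1, 1))]])
    q = np.zeros(k + 1); q[-1] = k + 1

    ones = np.ones(k + 1)
    assert np.allclose(M, -M.T)              # M is skew-symmetric
    assert np.allclose(M @ ones + ones, q)   # v = z = 1 satisfies Mv + z = q

The last assertion uses 1^T M₀ 1 = 0, which holds precisely because M₀ is skew-symmetric; this is where the choice of r and of the last entry k+1 of q comes from.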
To complete the analysis, one needs to show that the algorithm converges to a strictly complementary optimal solution, and as we said above, this part is omitted.

As a final remark on the algorithm, we note that the system above can be simplified. We know that (7.4) in general has a unique solution, provided that the primal and dual linear programs both have a feasible interior point, which is satisfied in our case. At the same time, we observe that if (v, z, v′, z′) is a solution, then interchanging v with v′ and z with z′ also yields a solution, and so uniqueness implies v′ = v and z′ = z. Therefore, it is sufficient to work with the simpler system

    Mv + z = q
    v_j z_j = µ for j = 1, 2, ..., k+1
    v, z ≥ 0

and to set up the corresponding linear system for the changes Δv and Δz accordingly.

This concludes the description and partial analysis of the considered interior point algorithm. Let us remark that the self-dual embedding trick is quite universal: Almost any interior point algorithm can be used for computing an optimum of the self-dual linear program constructed as above, and this yields an optimal solution of the original linear program.

Computational complexity of interior point methods. Several interior point methods for linear programming are known to be weakly polynomial algorithms, including the one given above with the self-dual embedding. The total number of bit operations for the best of these algorithms is bounded by O(n³L), where L is the maximum bit size of the coefficients in the linear program and n is the number of variables. The maximum number of iterations before reaching an optimum is O(√n L). On the other hand, examples are known, again based on the Klee-Minty cube, for which any algorithm following the central path must make Ω(n/log n) iterations; see A. Deza, E. Nematollahi, and T. Terlaky: How good are interior point methods? Klee-Minty cubes tighten iteration-complexity bounds, Technical Report, McMaster University, 2004. In practice the number of iterations seems to be bounded by a constant, or by O(log n), in almost all cases. This is somewhat similar to the situation for the simplex method, where the worst-case behavior is much worse than the behavior for typical inputs.

Our presentation of interior point methods is inspired mostly by T. Terlaky: An easy way to teach interior-point methods, European Journal of Operational Research 130(1):1-19, 2001; full proofs of the results in this section can be found in C. Roos, T. Terlaky, and J.-P. Vial: Interior Point Methods for Linear Optimization, 2nd edition, Springer, Berlin etc., 2005. A compact survey is F. A. Potra and S. J. Wright: Interior-point methods, Journal of Computational and Applied Mathematics 124:281-302, 2000; at the time of writing this text it was accessible at the web sites of the authors. Several more books from the late 1990s are cited in these papers, and an immense amount of material can be found online.

8 More Applications

Here we have collected several applications of linear programming, and in particular of the duality theorem. They are slightly more advanced than those in Chapters 2 and 3, but we have tried to keep everything very concrete and as elementary as possible, and we hope that even a mathematically inexperienced reader will have no problems enjoying these small gems.

8.1 Zero-Sum Games

The Colonel Blotto game. Colonel Blotto and his opponent are preparing for a battle over three mountain passes. Each of them commands five regiments. The one who sends more regiments to a pass occupies it, but when the same numbers of regiments meet, there is a draw. Finally, the one who occupies more passes than the other wins the battle, with a draw occurring if both occupy the same number of passes.
Given that all three passes have very similar characteristics, the strategies independently pursued by both Colonel Blotto and his opponent are the following. First they partition their five regiments into three groups; for example, the partition (0, 1, 4) means that one pass will be attacked by 4 regiments, another pass by 1 regiment, and one pass will not be attacked at all. Then the groups are assigned to the passes randomly; that is, each of the 3! = 6 possible assignments of groups to passes is equally likely.

The partitions of Colonel Blotto and of his opponent determine winning probabilities for both of them (in general, these do not add up to one, because of possible draws). Both Colonel Blotto and his opponent want to bias the difference of these probabilities in their direction as much as possible. How should they choose their partitions?

This is an instance of a finite two-player zero-sum game. In such a game, each of the two players has a finite set of possible strategies (in our case the partitions), and each pair of opposing strategies leads to a payoff known to both players. In our case we define the payoff as Colonel Blotto's winning probability minus the opponent's winning probability. Whatever one of the players wins, the other player loses, and this explains the term zero-sum game; to some extent it has become a part of common vocabulary.

When we number the strategies 1, 2, ..., m for the first player and 1, 2, ..., n for the second player, the payoffs can be recorded in the form of an m×n payoff matrix. In the Colonel Blotto game the payoff matrix looks as follows, with the rows corresponding to the strategies of Colonel Blotto and the columns to the strategies of the opponent:

               (0,0,5)  (0,1,4)  (0,2,3)  (1,1,3)  (1,2,2)
    (0,0,5)       0      -1/3     -1/3      -1       -1
    (0,1,4)      1/3       0        0      -1/3     -2/3
    (0,2,3)      1/3       0        0        0       1/3
    (1,1,3)       1       1/3       0        0      -1/3
    (1,2,2)       1       2/3     -1/3      1/3       0

For example, when Colonel Blotto chooses (0, 1, 4) and his opponent chooses (0, 0, 5), then Colonel Blotto wins (actually without fighting) if and only if his two nonempty groups arrive at the two passes left unattended by his opponent. The probability of this happening is 1/3; with probability 2/3 there will be a draw, so the difference of the winning probabilities is 1/3 − 0 = 1/3.

Not knowing what the opponent is going to do, Colonel Blotto might want to choose a strategy that guarantees the highest payoff in the worst case. The only candidate for such a strategy is (0, 2, 3): No matter what the opponent does, Colonel Blotto will get a payoff of at least 0 with this strategy, while all other strategies lead to a negative payoff in the worst case. Anticipating that a spy of the opponent might find out about his plans, he must reckon that the worst case will actually happen. (The whole game is not a particularly cheerful matter anyway.) In terms of the payoff matrix, Colonel Blotto looks at the minimum in each row, and he chooses a row where this minimum is the largest possible.

Similarly, the opponent wants to choose a strategy that guarantees the lowest payoff for Colonel Blotto in the worst case. It turns out that (0, 2, 3) is also the unique such choice for the opponent, because it guarantees that Colonel Blotto will receive a payoff of at most 0, while all other strategies allow him to achieve a positive payoff if he happens to guess, or spy out, the opponent's strategy. In terms of the payoff matrix, the opponent looks at the maximum in each column, and he chooses a column where this maximum is the smallest possible.
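Since the minus signs in such a table are easy to get wrong, here is a short script (ours, not the book's) that recomputes the payoff matrix by brute force, averaging over the 3! × 3! equally likely pairs of assignments of the two players' groups to the passes.

    from fractions import Fraction
    from itertools import permutations

    parts = [(0, 0, 5), (0, 1, 4), (0, 2, 3), (1, 1, 3), (1, 2, 2)]

    def payoff(p, q):
        """Blotto's winning probability minus the opponent's, for partitions p, q."""
        total, count = Fraction(0), 0
        for a in permutations(p):          # Blotto's assignment of groups to passes
            for b in permutations(q):      # the opponent's assignment
                mine = sum(ai > bi for ai, bi in zip(a, b))    # passes Blotto occupies
                theirs = sum(bi > ai for ai, bi in zip(a, b))  # passes the opponent occupies
                total += (mine > theirs) - (theirs > mine)
                count += 1
        return total / count

    for p in parts:
        print(p, [payoff(p, q) for q in parts])
    # The row (0,2,3) has minimum 0 and the column (0,2,3) has maximum 0:
    # the entry at ((0,2,3), (0,2,3)) is a saddle point.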
We note that if both Colonel Blotto and his opponent play the strategies selected as above, they both see their worst expectations come true, exactly those on which they pessimistically based their choice of strategy. Seeing the worst case happen might shatter hopes for a better outcome of the battle, but on the other hand, it is a relief: After the battle has been fought, neither Colonel Blotto nor his opponent will have to regret his choice. Even if both had known the other's strategy in advance, neither of them would have had an incentive to change his own strategy.

This is an interesting feature of this game: The strategy selected by Colonel Blotto and the strategy selected by his opponent as above are best responses against one another. In terms of the payoff matrix, the entry 0 in the row (0, 2, 3) and column (0, 2, 3) is a saddle point: It is a minimum in its row and a maximum in its column. A pair of strategies that are best responses against one another is called a Nash equilibrium of the game. As we will see next, not every game has a Nash equilibrium in this sense.

The Rock-Paper-Scissors game. Alice and Bob independently choose a hand gesture indicating either a rock, a piece of paper, or a pair of scissors. If both players choose the same gesture, the game is a draw, and otherwise, there is a cyclic pattern: Scissors beats paper (by cutting it), paper beats rock (by wrapping it up), and rock beats scissors (by making them blunt). Assuming that the payoff to Alice is 1 if she wins, 0 if there is a draw, and −1 if she loses, the payoff matrix is

               rock  paper  scissors
    rock         0    -1       1
    paper        1     0      -1
    scissors    -1     1       0

This game has no Nash equilibrium in the sense explained above: No entry of the payoff matrix is a minimum in its row and a maximum in its column at the same time. In more human terms, after every game the player who lost may regret not having played the gesture that would have beaten the gesture of the winner, and both may regret in the case of a draw. It is impossible for both players to fix strategies that are best responses against each other.

But when we generalize the notion of strategy, there is a way for both players to avoid regret: Both should decide randomly, selecting each of the gestures with probability 1/3. Even this strategy may lose, of course, but still there is no reason for regret, since with the same probability 1/3 it could have won, and the fact that it didn't is not a fault of the strategy but just bad luck. Indeed, in this way both Alice and Bob can guarantee that their payoff is 0 in expectation, and it is easy to see that neither of them can do better by unilaterally switching to a different behavior. We say that we have a mixed Nash equilibrium of the game (formally defined below).

A surprising fact is that every zero-sum game has a mixed Nash equilibrium. It turns out that such an equilibrium "solves" the game, in the sense that it tells us (or rather both of the two players) how to play the game optimally. As we will see in examples, the random decisions involved in a mixed Nash equilibrium need not give each of the possible strategies the same probability, as was the case in the very simple Rock-Paper-Scissors game. However, we will prove that suitable probability distributions always exist, and moreover, that they can be computed using linear programming.

Existence and computation of a mixed Nash equilibrium. Let us repeat the setup of zero-sum games in a more formal manner. We have two players, and we stick to calling them Alice and Bob. Alice has a set of m pure strategies at her disposal, while Bob has a set of n pure strategies; we assume that m, n ≥ 1.
Then there is an m×n payoff matrix M of real numbers such that m_ij is Alice's gain (and Bob's loss) when Alice's ith pure strategy is played against Bob's jth pure strategy. For concreteness, we may think of Bob having to pay m_ij to Alice. Of course, the situation is symmetric, in that m_ij might be negative, in which case Alice has to pay −m_ij to Bob.

A mixed strategy of a player is a probability distribution over his or her set of pure strategies. We encode a mixed strategy of Alice by an m-dimensional vector of probabilities

    x = (x_1, ..., x_m), Σ_{i=1}^m x_i = 1, x ≥ 0,

and a mixed strategy of Bob by an n-dimensional vector of probabilities

    y = (y_1, ..., y_n), Σ_{j=1}^n y_j = 1, y ≥ 0.

So a mixed strategy is not a particular case of a pure strategy: In the Rock-Paper-Scissors game, Alice has three possible pure strategies (rock, paper, and scissors), but infinitely many possible mixed strategies. She can choose any three nonnegative real numbers x_1, x_2, x_3 with x_1 + x_2 + x_3 = 1 and play rock with probability x_1, paper with probability x_2, and scissors with probability x_3. Each such triple (x_1, x_2, x_3) specifies a mixed strategy.

Given mixed strategies x and y of Alice and Bob, the expected payoff (expected gain of Alice) when x is played against y is

    Σ_{i,j} m_ij · Prob_{x,y}[Alice plays i, Bob plays j]
      = Σ_{i,j} m_ij · Prob_x[Alice plays i] · Prob_y[Bob plays j]
      = Σ_{i,j} m_ij x_i y_j = x^T M y.

Now we are going to formalize the tenet of Colonel Blotto: "Prepare for the worst." When Alice considers playing some mixed strategy x, she expects Bob to play a best response against x, a strategy y that minimizes her expected payoff x^T M y. Similarly, for a given y, Bob expects Alice to play a strategy x that maximizes x^T M y. For a fixed matrix M, these worst-case payoffs are captured by the following two functions:

    β(x) = min_y x^T M y, α(y) = max_x x^T M y.

So β(x) is the best (smallest) expected payoff that Bob can achieve against Alice's mixed strategy x, and similarly, α(y) is the best (largest) expected payoff that Alice can achieve against Bob's y. It may also be worth noting that y₀ is Bob's best response against some x exactly if x^T M y₀ = β(x); the symmetric statement for Alice is left to the reader. Let us note that β and α are well-defined functions, since we are optimizing over compact sets: for α, say, the set of all x representing probability distributions is an (m−1)-dimensional simplex in R^m and hence indeed compact.

8.1.1 Definition. A pair (x̃, ỹ) of mixed strategies is a mixed Nash equilibrium of the game if x̃ is a best response against ỹ and ỹ is a best response against x̃ (the adjective "mixed" is often omitted). In formulas, this can be expressed as

    β(x̃) = x̃^T M ỹ = α(ỹ).

In the Colonel Blotto game we have even found a pure Nash equilibrium, a pair of pure strategies that are best responses against each other. However, the strategies themselves involved random decisions; we regard these decisions as hardwired into the strategies and the payoff matrix. Alternatively, we can consider each fixed assignment of regiments to passes as a pure strategy. Then we have a considerably larger payoff matrix, and there is no pure Nash equilibrium; rather, we have a mixed Nash equilibrium. In practical terms it amounts to the same thing as the strategies described in the previous interpretation of the game, namely, dividing the regiments into groups of 3, 2, and 0 and sending one group to each pass at random. These are two different views (models) of the same game, and we are free to investigate either one, although one may be more convenient or more realistic than the other.

Let us say that Alice's mixed strategy x̃ is worst-case optimal if β(x̃) = max_x β(x).
That is, Alice expects Bob to play his best response against every mixed strategy of hers, and she chooses a mixed strategy x̃ that maximizes her expected payoff under this pessimistic assumption. Similarly, Bob's mixed strategy ỹ is worst-case optimal if α(ỹ) = min_y α(y). The next simple lemma shows, among other things, that in order to attain a Nash equilibrium, both players must play worst-case optimal strategies.

8.1.2 Lemma.
(i) We have max_x β(x) ≤ min_y α(y). Actually, for every two mixed strategies x and y we have β(x) ≤ x^T M y ≤ α(y).
(ii) If the pair (x̃, ỹ) of mixed strategies forms a mixed Nash equilibrium, then both x̃ and ỹ are worst-case optimal.
(iii) If mixed strategies x̃ and ỹ satisfy β(x̃) = α(ỹ), then they form a mixed Nash equilibrium.

Proof. It is an amusing mental exercise to try to see the claims of the lemma by thinking informally about players and games. But a formal proof is routine, which is a nice demonstration of the power of mathematical formalism.

The first sentence in (i) follows from the second one, which in turn is an immediate consequence of the definitions of α and β. In (ii), for any x we have β(x) ≤ α(ỹ) by (i), and since β(x̃) = α(ỹ), we obtain β(x) ≤ β(x̃). Thus x̃ is worst-case optimal, and a symmetric argument shows the worst-case optimality of ỹ. This proves (ii). As for (iii), if β(x̃) = α(ỹ), then by (i) we have β(x̃) ≤ x̃^T M ỹ ≤ α(ỹ), and hence (x̃, ỹ) is a mixed Nash equilibrium. The lemma is proved.

Here is the main result of this section.

8.1.3 Theorem (Minimax theorem for zero-sum games). For every zero-sum game, worst-case optimal mixed strategies for both players exist, and they can be efficiently computed by linear programming. If x̃ is a worst-case optimal mixed strategy of Alice and ỹ is a worst-case optimal mixed strategy of Bob, then (x̃, ỹ) is a mixed Nash equilibrium, and the number β(x̃) = x̃^T M ỹ = α(ỹ) is the same for all possible worst-case optimal mixed strategies x̃ and ỹ. The value x̃^T M ỹ, the expected payoff in any Nash equilibrium, is called the value of the game.

Together with Lemma 8.1.2(ii), we get that (x̃, ỹ) forms a mixed Nash equilibrium if and only if both x̃ and ỹ are worst-case optimal.

This theorem in a sense tells us everything about playing zero-sum games. In particular, "Prepare for the worst" is indeed the best policy, for nontrivial reasons: If Alice plays a worst-case optimal mixed strategy, her expected payoff is always at least the value of the game, no matter what strategy Bob chooses. Moreover, if Bob is well informed and plays a worst-case optimal mixed strategy, then Alice cannot secure an expected payoff larger than the value of the game, no matter what strategy she chooses. So there are no secrets and no psychology involved; both players might as well declare their mixed strategies in advance, and nothing changes. Of course, if there are many rounds of the game and Alice suspects that Bob hasn't learned his lesson and doesn't play optimally, she can begin to contemplate how to exploit this. Then psychology does come into play; however, by trying strategies that are not worst-case optimal, she is taking a risk, since she also gives Bob a chance to exploit her.

It remains to explain the name "minimax theorem": If we consider the equality β(x̃) = α(ỹ) and use the definitions of β, α, and of worst-case optimality of x̃ and ỹ, we arrive at

    max_x min_y x^T M y = min_y max_x x^T M y,

and this is the explanation we offer. The relation of Theorem 8.1.3 to Lemma 8.1.2(i) is similar to the relation of the duality theorem of linear programming to the weak duality theorem, and indeed, we are going to use the duality theorem in the proof of Theorem 8.1.3 in a substantial way.

Proof of Theorem 8.1.3. We first show how worst-case optimal mixed strategies x̃ for Alice
and ỹ for Bob can be found by linear programming; then we prove that the desired equality β(x̃) = α(ỹ) holds.

We begin by noticing that Bob's best response to a fixed mixed strategy x of Alice can be found by solving a linear program. That is, β(x), with x a concrete vector of m numbers, is the optimal value of the following linear program in the variables y_1, ..., y_n:

    minimize x^T M y subject to Σ_{j=1}^n y_j = 1, y ≥ 0.    (8.1)

So we can evaluate β(x). But for finding a worst-case optimal strategy of Alice we need to maximize β. Unfortunately, β(x) is not a linear function, so we cannot directly formulate the maximization of β(x) as a linear program. Fortunately, we can circumvent this issue by using linear programming duality. Using the dualization recipe from Section 6.2, we write down the dual of (8.1):

    maximize x₀ subject to M^T x − x₀1 ≥ 0

(a nice exercise in dualization). This dual linear program has only one variable, x₀, since x_1, ..., x_m are still regarded as fixed numbers. By the duality theorem, the optimal value of the dual linear program is the same as that of the primal, namely β(x).

In order to maximize β(x) over all mixed strategies x of Alice, we set up a new linear program, which optically looks exactly like the previous one but in which x_1, ..., x_m are regarded as variables (this works only because the constraints happen to be linear in x₀, x_1, ..., x_m):

    maximize x₀ subject to M^T x − x₀1 ≥ 0, Σ_{i=1}^m x_i = 1, x ≥ 0.    (8.2)

If (x̃₀, x̃) denotes an optimal solution of this linear program, we have, by construction,

    x̃₀ = β(x̃) = max_x β(x).    (8.3)

In a symmetric fashion we can derive a linear program for solving Bob's task of computing a best strategy ỹ. We obtain the problem

    minimize y₀ subject to My − y₀1 ≤ 0, Σ_{j=1}^n y_j = 1, y ≥ 0,    (8.4)

in the variables y₀, y_1, ..., y_n. An optimal solution (ỹ₀, ỹ) satisfies

    ỹ₀ = α(ỹ) = min_y α(y).    (8.5)

So both x̃ and ỹ are worst-case optimal strategies, and conversely, worst-case optimal strategies provide optimal solutions of the respective linear programs. The punch line is that the two linear programs (8.2) and (8.4) are dual to each other (again the dualization recipe shows this). It follows that both programs have the same optimal value, x̃₀ = ỹ₀. Hence β(x̃) = α(ỹ), and (x̃, ỹ) is a Nash equilibrium by Lemma 8.1.2(iii).

Rock-Paper-Scissors revisited. To kill the time between their rare public appearances, Santa Claus and the Easter Bunny play the Rock-Paper-Scissors game against each other. The Easter Bunny, however, cannot indicate a pair of scissors with his paw and is therefore limited to two pure strategies. The payoff matrix in this variant is

               rock  paper
    rock         0    -1
    paper        1     0
    scissors    -1     1

We already see that Santa Claus should never play rock: For any possible gesture of the Easter Bunny, paper is a better strategy. Let us apply the machinery we have just developed to find optimal mixed strategies for Santa Claus and the Easter Bunny. Recall that Santa Claus has to solve the linear program (8.2) to find the probability distribution x = (x_1, x_2, x_3) that determines his optimal strategy; at the same time, he will compute the game value x₀, his expected gain. The linear program is

    maximize x₀
    subject to x_2 − x_3 − x₀ ≥ 0
               −x_1 + x_3 − x₀ ≥ 0
               x_1 + x_2 + x_3 = 1
               x_1, x_2, x_3 ≥ 0.

A unique optimal solution is (x₀; x_1, x_2, x_3) = (1/3; 0, 2/3, 1/3). The Easter Bunny's problem (8.4) is

    minimize y₀
    subject to −y_2 − y₀ ≤ 0
               y_1 − y₀ ≤ 0
               −y_1 + y_2 − y₀ ≤ 0
               y_1 + y_2 = 1
               y_1, y_2 ≥ 0.

A unique optimal solution is (y₀; y_1, y_2) = (1/3; 1/3, 2/3).

Let us summarize: If both play optimally, Santa Claus wins 1/3 on average (this is a scientific explanation of why Santa Claus can afford to bring more presents). Both play paper with probability 2/3; with the remaining probability 1/3, Santa Claus plays scissors, while the Easter Bunny plays rock.
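The program (8.2) is exactly the kind of thing an off-the-shelf LP solver handles. Here is a minimal sketch (ours; it assumes scipy is available) that recovers Santa's optimal strategy and the game value for the matrix above.

    import numpy as np
    from scipy.optimize import linprog

    # Rows = Santa's strategies (rock, paper, scissors), columns = the Bunny's.
    M = np.array([[0.0, -1], [1, 0], [-1, 1]])
    m, n = M.shape

    # (8.2): maximize x0 s.t. M^T x - x0*1 >= 0, sum x = 1, x >= 0,
    # with variable vector (x0, x1, ..., xm); linprog minimizes, hence -x0.
    c = np.zeros(m + 1); c[0] = -1.0
    A_ub = np.hstack([np.ones((n, 1)), -M.T])              # x0*1 - M^T x <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.zeros((1, 1)), np.ones((1, m))])  # x1 + ... + xm = 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0, None)] * m              # x0 free, x >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    print(res.x)   # approximately (1/3, 0, 2/3, 1/3)

Swapping the roles (program (8.4)) yields the Bunny's strategy, and the two optimal values coincide, as the duality argument in the proof promises.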
This result is a simple but still nontrivial application of zero-sum game theory. In retrospect, the original Rock-Paper-Scissors game might appear rather boring, but this is relative: There is a World RPS Society (http://www.worldrps.com) that holds an annual rock-paper-scissors world championship and sells a book on how to play the game.

Choosing numbers. Here is another game, which actually seems fun to play and in which the optimal mixed strategies are not at all obvious. Each of the two players independently writes down an integer between 1 and 6. Then the numbers are compared. If they are equal, the game is a draw. If the numbers differ by one, the player with the smaller number gets 2 euros from the one with the larger number. If the two numbers differ by two or more, the player with the larger number gets 1 euro from the one with the smaller number. We want to challenge the reader to compute the optimal mixed strategies for this game (by symmetry, they are the same for both players).

The Colonel Blotto game was apparently first considered by the mathematician Émile Borel in 1921 (he also served as a French minister of the navy, although only for several months). It can be considered for any number of regiments; for 6 regiments there is still a Nash equilibrium defined by a pair of pure strategies, but for 7 regiments (this is the case stated in Borel's original paper) it becomes necessary to mix. Borel's paper does not mention the name Colonel Blotto; this name appears in Hubert Phillips's Week-End Problems Book, a collection of puzzles from 1933.

In his paper Borel considers symmetric games in general; a symmetric game is defined by a payoff matrix M with M^T = −M. Borel erroneously states that if the number n of strategies is sufficiently large, one can construct symmetric games in which each player can secure a positive expected payoff knowing the other player's mixed strategy. He concludes that playing zero-sum games requires psychology on top of mathematics. It was only in 1926 that John von Neumann, not knowing about Borel's work and its pessimistic conclusion, formally established Theorem 8.1.3.

Bimatrix games. An important generalization of finite zero-sum games is bimatrix games, in which both Alice and Bob want to maximize their payoffs, each with respect to a payoff matrix of their own: A for Alice and B for Bob (in the zero-sum case B = −A). A bimatrix game also has at least one mixed Nash equilibrium, a pair (x̃, ỹ) of strategies that are best responses against each other, meaning that

    x̃^T A ỹ = max_x x^T A ỹ and x̃^T B ỹ = max_y x̃^T B y.

We encourage the reader to find a mixed Nash equilibrium in the following variant of the modified Rock-Paper-Scissors game played by Santa Claus and the Easter Bunny: As before, the loser pays 1 euro to the winner, but in the case of a draw, each player donates 0.50 euro to charity.

The problem of finding a Nash equilibrium in a bimatrix game cannot be formulated as a linear program, and no polynomial-time algorithm is known for it. On the other hand, a Nash equilibrium can be computed by a variant of the simplex method called the Lemke-Howson algorithm, although possibly with exponentially many pivot steps.

In general, Nash equilibria in bimatrix games are not as satisfactory as in zero-sum games, and there is no such thing as the game value. We know that in a Nash equilibrium, no player has an incentive to unilaterally switch to a different behavior. Yet it may happen that both can increase their payoffs by switching simultaneously (a situation that obviously cannot occur in a zero-sum game).
This means that the Nash equilibrium was not optimal from the point of view of social welfare, and no player really desires to be in this particular Nash equilibrium. It may even happen that all Nash equilibria are of this suboptimal nature. Here is an example.

At each of the two department stores in town, All the Best Deals and Buyer's Paradise, the respective owner needs to decide whether to launch an advertising campaign for the upcoming Christmas sale. If one store runs a campaign while the competitor doesn't, the expected extra revenue obtained from customers switching their preferred store (50,000 euros, say) and from new customers (10,000 euros) easily outweighs the cost of the campaign (20,000 euros). If, on the other hand, both stores advertise, let us assume that the campaigns more or less neutralize each other, with extra revenue coming only from new customers in an almost saturated market (8,000 euros for each of the stores). Listing the net revenues (a_ij, b_ij) as pairs, in units of 1,000 euros, we obtain the following matrix, with rows corresponding to the strategies of All the Best Deals and columns to those of Buyer's Paradise:

                        advertise      don't advertise
    advertise          (-12, -12)        (40, -50)
    don't advertise    (-50,  40)         (0,  0)

If the store owners were friends, they might agree to run no campaigns, in order to save money (they had better keep this agreement private, in order to avoid a price-fixing charge). But if they do not communicate, or mistrust each other, rational behavior will force both of them to waste money on campaigns. To see this, put yourself in the position of one of the store owners. Assuming that your competitor will not advertise, you definitely want to advertise, in order to profit from the extra revenue. Assuming that the other store will advertise, you must advertise as well, in order not to lose customers. This means that you will advertise in any case. We say that the strategy "advertise" strictly dominates the strategy "don't advertise," so it would be irrational to play the latter. Because the other store owner reaches the same conclusion, there will be two almost useless campaigns in the end. In fact, the pair (advertise, advertise) is the unique Nash equilibrium of the game (mixing does not help), but it is suboptimal with respect to social welfare.

In general bimatrix games, the players might not be able to reach the social optimum through rational reasoning, even if this optimum corresponds to an equilibrium of the game; this is probably the most serious deficiency of bimatrix games as models of real-world situations. An example is the "battle of the sexes": A couple wants to go out at night. He prefers the boxing match, while she prefers the opera, but both prefer going together to going alone. If both are to decide independently where to go, there is no rational way of reaching a social optimum, namely, both going out together (no matter where).

When the advertisement game is played repeatedly (after all, there is a Christmas sale every year), the situation changes. In the long run, wasting money every year is such a bad prospect that the following more cooperative behavior makes sense: In the first year, refrain from advertising, and in later years just do what the competitor did the year before. This strategy is known as TIT FOR TAT. If both stores adopt this policy, they will never waste money on campaigns, and even if one store deviates from it, there is no possibility of exploiting the competitor in the long run. It is easy to see that, after a possible first loss, the one playing TIT FOR TAT can pay for any further loss from a previous loss of the competitor.

The Prisoner's Dilemma. The advertisement game
is a variation of the well-known prisoner's dilemma, in which two convicts, charged with a crime committed together, are independently offered somewhat unethical plea bargains: If both stay silent, a lack of evidence will lead to only a minor punishment. If each testifies against the other, there will be some bearable punishment. But, and this is the unethical part, if exactly one of the two testifies against the other, the betrayer will be rewarded and set free, while the one who remained silent will receive a heavy penalty. As before, rational behavior will force both convicts to testify against each other, even if they had nothing to do with the crime.

A popular introduction to the questions surrounding the prisoner's dilemma is W. Poundstone: Prisoner's Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb, Doubleday, New York, 1992; the 1964 Stanley Kubrick movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb might also widen one's horizons in this context. A general introduction to game theory is J. Bewersdorff: Luck, Logic, and White Lies, A K Peters, Wellesley, 2004. This book also contains the references to the work by Borel and von Neumann that we have mentioned above.

8.2 Matchings and Vertex Covers in Bipartite Graphs

Let us return to the job assignment problem from Section 3.2. There the human resources manager of a company was confronted with the problem of filling seven positions with seven employees, where every employee has a score reflecting his or her qualification for each position he or she is willing to accept. We found that the manager can use linear programming to find an assignment of employees to positions (a perfect matching) that maximizes the sum of scores. We have also promised to show how the manager can fill the positions optimally if there are more employees than positions.

Here we will first disregard the scores and ask under what conditions the task has any solution at all; that is, whether there is any assignment of positions to employees such that every employee gets a position she or he is willing to accept and each position is filled. We know that we can decide this by linear programming (we just invent some arbitrary scores, say all scores 100), but here we ask for a mathematical condition.

For example, let us consider the graph from Section 3.2, but slightly modified (some of the people changed their minds), with employees A, B, ..., G and jobs q, r, s, t, u, v, w (the picture is not reproduced here). There is no assignment filling all positions in this situation. This is not immediately obvious, but it becomes obvious once we look at the set {r, s, t, u, w} of jobs: The set of people willing to take any of these 5 jobs is {C, D, E, G}. This is only 4 people, and so they cannot be assigned to 5 different jobs.

The next theorem, known as Hall's theorem or the marriage theorem, states that if no assignment exists, we can always find such a simple reason: a subset of k jobs such that the total number of employees willing to take any of them is smaller than k.

Before we formally state and prove this theorem in the language of bipartite graphs, we need to recall the notions of maximum matching (Section 3.2) and minimum vertex cover (Section 3.3). A matching in a graph G = (V, E) is a set E′ ⊆ E of edges with the property that each vertex is incident to at most one edge of E′. A matching is maximum if it has the largest number of edges among all matchings in G. A vertex cover of G = (V, E) is a set V′ ⊆ V of vertices with the property that each edge is incident to at least one vertex of V′. A vertex cover is minimum if it has the smallest number of vertices among all vertex covers of G.
Hall's theorem gives a necessary and sufficient condition for the existence of a best possible matching in a bipartite graph, namely, a matching that covers all vertices in one class of the vertex bipartition.

8.2.1 Theorem (Hall's theorem). Let G = (V, E) be a bipartite graph with bipartition V = X ∪ Y. For a set T ⊆ X we define the neighborhood N(T) ⊆ Y of T as the set

    N(T) = {w ∈ Y : {v, w} ∈ E for some v ∈ T}.

If |N(T)| ≥ |T| holds for every T ⊆ X, then G has a matching that covers all vertices of X.

Actually, we derive the following statement, from which Hall's theorem easily follows.

8.2.2 Theorem (König's theorem). Let G = (V, E) be a bipartite graph. Then the size of a maximum matching in G equals the size of a minimum vertex cover of G.

We prove this theorem using the duality of linear programming. There are also combinatorial proofs, and they are actually simpler than the proof offered here. There are at least two reasons in favor of the proof via duality. First, it is a simple example illustrating a powerful general technique. And second, it gives more than just König's theorem: It shows that a maximum matching, as well as a minimum vertex cover, in a bipartite graph can be computed by linear programming. Moreover, the method can be extended to computing a maximum-weight matching in a bipartite graph with weights on the edges.

Let us first see how Hall's theorem follows from König's theorem.

Proof of Theorem 8.2.1. Let G = (X ∪ Y, E) be a bipartite graph with |N(T)| ≥ |T| for all T ⊆ X, and let us write n₁ = |X|. We will show that any minimum vertex cover of G has size n₁; König's theorem then implies that G has a matching of size n₁, and this matching obviously covers X.

For contradiction, suppose that there is a vertex cover C with k vertices from X and fewer than n₁ − k vertices from Y, for some k. The set T = X \ C has size n₁ − k and satisfies |N(T)| ≥ n₁ − k by the assumption. But this implies that there is a vertex w ∈ N(T) that is not in C ∩ Y. Since this vertex has some neighbor v ∈ T, the edge {v, w} is not covered by C, a contradiction.

Totally unimodular matrices. A matrix A is called totally unimodular if every square submatrix of A (obtained from A by deleting some rows and some columns) has determinant 0, 1, or −1. We note that, in particular, the entries of A can be only 0, 1, and −1. Such matrices are interesting because an integer program with a totally unimodular constraint matrix is easy to solve: It suffices to solve its LP relaxation, as we will show in Lemma 8.2.4 below. Let us start with a preparatory step; a brute-force test of total unimodularity is sketched after the proof.

8.2.3 Lemma. Let A be a totally unimodular matrix, and consider the matrix Ā obtained from A by appending a unit vector e_i as a new last column. Then Ā is totally unimodular as well.

Proof. Let us fix an ℓ×ℓ submatrix Q of Ā. If Q is a submatrix of A, then det(Q) ∈ {−1, 0, 1} by the total unimodularity of A. Otherwise, we pick the column ℓ of Q that corresponds to the newly added column of Ā, and we compute det(Q) according to the Laplace expansion on this column. We recall that

    det(Q) = Σ_{i=1}^ℓ (−1)^(i+ℓ) q_{iℓ} det(Q_{iℓ}),

where Q_{iℓ} is the matrix resulting from Q by removing row i and column ℓ. By construction, the ℓth column is either 0, and then det(Q) = 0, or there is exactly one nonzero entry q_{kℓ} = 1. In that case det(Q) = (−1)^(k+ℓ) det(Q_{kℓ}) ∈ {−1, 0, 1}, since Q_{kℓ} is a submatrix of A.
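The definition quantifies over all square submatrices, which immediately suggests the following brute-force checker (our own illustration, not an algorithm from the book; it is exponential in the matrix size and therefore usable only for small matrices).

    import numpy as np
    from itertools import combinations

    def is_totally_unimodular(A, tol=1e-9):
        """Check every square submatrix; practical only for small A."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        for k in range(1, min(m, n) + 1):
            for rows in combinations(range(m), k):
                for cols in combinations(range(n), k):
                    d = np.linalg.det(A[np.ix_(rows, cols)])
                    if min(abs(d), abs(d - 1), abs(d + 1)) > tol:
                        return False
        return True

    # Incidence matrix of a 4-cycle (a bipartite graph): totally unimodular.
    print(is_totally_unimodular([[1, 0, 0, 1],
                                 [1, 1, 0, 0],
                                 [0, 1, 1, 0],
                                 [0, 0, 1, 1]]))   # True
    # Incidence matrix of a triangle (odd cycle): determinant 2, so False.
    print(is_totally_unimodular([[1, 0, 1],
                                 [1, 1, 0],
                                 [0, 1, 1]]))      # False

The two test matrices anticipate Lemma 8.2.5 below: incidence matrices of bipartite graphs pass the test, while an odd cycle already fails it.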
The following lemma is the basis of using linear programming for solving integer programs with totally unimodular matrices.

8.2.4 Lemma. Let us consider a linear program with n nonnegative variables and m inequalities, of the form

    maximize c^T x subject to Ax ≤ b, x ≥ 0,

where b ∈ Z^m. If A is totally unimodular and the linear program has an optimal solution, then it also has an integral optimal solution x̃ ∈ Z^n.

Proof. We first transform the linear program into equational form; the resulting system of equality constraints is Āx̄ = b with Ā = (A | I_m) and x̄ ∈ R^(n+m). Then we solve the problem using the simplex method. Let x̄ be an optimal basic feasible solution, associated with a feasible basis B ⊆ {1, 2, ..., n+m}. Then we know that the nonzero entries of x̄ are given by x̄_B = Ā_B^(−1) b (see Section 5.5). By Cramer's rule, the entries of Ā_B^(−1) can be written as rational numbers with common denominator det(Ā_B). The matrix Ā_B is a square submatrix of Ā, and Ā is totally unimodular by repeated application of Lemma 8.2.3. Since Ā_B is nonsingular, we get det(Ā_B) ∈ {−1, 1}, and the integrality of x̄ follows.

Incidence matrices of bipartite graphs. Here is the link between total unimodularity and König's theorem. Let G = (X ∪ Y, E) be a bipartite graph with n vertices v_1, ..., v_n and m edges e_1, ..., e_m. The incidence matrix of G is the matrix A ∈ R^(n×m) defined by

    a_ij = 1 if v_i ∈ e_j, and a_ij = 0 otherwise.

8.2.5 Lemma. Let G = (X ∪ Y, E) be a bipartite graph. The incidence matrix A of G is totally unimodular.

Proof. We need to prove that every ℓ×ℓ submatrix Q of A has determinant 0, 1, or −1, and we proceed by induction on ℓ. The case ℓ = 1 is immediate, since the entries of an incidence matrix are only 0s and 1s. Now we consider ℓ > 1 and an ℓ×ℓ submatrix Q. Since the columns of Q correspond to edges, each column of Q has at most two nonzero entries, which equal 1. If there is a column with only zero entries, we get det(Q) = 0, and if there is a column with exactly one nonzero entry, we can expand the determinant on this column as in the proof of Lemma 8.2.3 and get that, up to sign, det(Q) equals the determinant of an (ℓ−1)×(ℓ−1) submatrix Q′ of A. By induction, det(Q′) ∈ {−1, 0, 1}, so the same holds for Q. Finally, if every column of Q contains precisely two 1s, we claim that det(Q) = 0. To see this, we observe that the sum of all rows of Q corresponding to vertices of X is the row vector (1, 1, ..., 1), since for each column of Q, exactly one of its two 1s comes from a vertex of X. For the same reason we get (1, 1, ..., 1) by summing the rows corresponding to vertices of Y, and it follows that the rows of Q are linearly dependent.

Now we are ready to prove König's theorem.

Proof of Theorem 8.2.2. We first consider the integer program

    maximize Σ_{j=1}^m x_j subject to Ax ≤ 1, x ≥ 0, x ∈ Z^m,

where A is the incidence matrix of G. In this integer program, the row of A corresponding to vertex v_i induces the constraint

    Σ_{j: e_j ∋ v_i} x_j ≤ 1.

This implies that x_j ∈ {0, 1} for all j, and that the edges e_j with x̃_j = 1 in an optimal solution x̃ form a maximum matching in G. Next we consider the integer program

    minimize Σ_{i=1}^n y_i subject to A^T y ≥ 1, y ≥ 0, y ∈ Z^n,

where A is, as before, the incidence matrix of G. In this integer program, the row of A^T corresponding to edge e_j induces the constraint

    Σ_{i: v_i ∈ e_j} y_i ≥ 1.

This implies that in any optimal solution ỹ we have ỹ_i ∈ {0, 1} for all i, since any larger value could be decreased to 1. But then the vertices v_i with ỹ_i = 1 in an optimal solution ỹ form a minimum vertex cover of G.

To summarize, the optimum value of the first integer program is the size of a maximum matching in G, and the optimum value of the second integer program is the size of a minimum vertex cover of G. In both integer programs we may now drop the integrality constraints without affecting the optimum values: A, and therefore also A^T, is totally unimodular by Lemma 8.2.5, and so Lemma 8.2.4 applies. But the resulting linear programs are dual to each other! The duality theorem thus shows that their optimal values are equal, and this proves Theorem 8.2.2.
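The whole proof can be replayed numerically. The sketch below (ours; it assumes scipy, and that the solver returns a basic, hence vertex, optimal solution, as HiGHS's simplex does) solves the two LP relaxations for a small bipartite graph and exhibits equal optimal values with integral solutions.

    import numpy as np
    from scipy.optimize import linprog

    # Bipartite graph: X = {0,1,2}, Y = {3,4}; edges given as vertex pairs.
    edges = [(0, 3), (0, 4), (1, 3), (2, 4)]
    n, m = 5, len(edges)
    A = np.zeros((n, m))
    for j, (u, v) in enumerate(edges):
        A[u, j] = A[v, j] = 1

    # Matching LP: maximize sum x_j subject to Ax <= 1, x >= 0.
    match = linprog(-np.ones(m), A_ub=A, b_ub=np.ones(n),
                    bounds=[(0, None)] * m, method="highs")
    # Vertex cover LP (its dual): minimize sum y_i subject to A^T y >= 1, y >= 0.
    cover = linprog(np.ones(n), A_ub=-A.T, b_ub=-np.ones(m),
                    bounds=[(0, None)] * n, method="highs")

    print(match.x.round(3), -match.fun)   # 0/1 vector, maximum matching of size 2
    print(cover.x.round(3), cover.fun)    # 0/1 vector, vertex cover {3, 4} of size 2

By total unimodularity, every vertex of either feasible region is integral, so no rounding is needed; the equal optimal values are König's theorem in action on this instance.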
It remains to explain the algorithmic implications of the proof, namely, how a maximum matching and a minimum vertex cover can actually be computed. To get a maximum matching, we simply need to find an integral optimal solution of the first linear program. When we use the simplex method to solve the linear program, we get this for free (see the proof of Lemma 8.2.4, and in particular the claim toward the end of its proof). Otherwise, we can apply Theorem 4.2.3(ii) to construct a basic feasible solution from any given optimal solution, and this basic feasible solution will be integral. A minimum vertex cover is obtained from the second (dual) linear program in the same fashion.

The previous arguments show more: Given edge weights w_1, ..., w_m, any integral optimal solution of the integer program

    maximize Σ_{j=1}^m w_j x_j subject to Ax ≤ 1, x ≥ 0, x ∈ Z^m

corresponds to a maximum-weight matching in G. Since we can, as before, relax the integrality constraints without affecting the optimum value, an integral optimal solution of the relaxation can be found, and it yields a maximum-weight matching in G. This solves the optimal job assignment problem if there are more employees than jobs.

The fact that a linear program with a totally unimodular constraint matrix and integral right-hand side has an integral optimal solution implies something much stronger: Since every vertex of the feasible region is optimal for some objective function (see Section 4.4), we know that all vertices of the feasible region are integral. We say that the feasible region forms an integral polyhedron. Such integrality results, together with linear programming duality, can yield interesting and powerful minimax theorems; König's theorem is one such example.

Another classical minimax theorem that can be proved along these lines is the max-flow-min-cut theorem. To state this theorem, we consider a network modeled by a directed graph G = (V, E) with edge capacities w_e; in Section 2.2 we interpreted them as maximum transfer rates of data links. Given two designated vertices, the source s and the sink t, the maximum flow value is the maximum rate at which data can flow from s to t through the network. The minimum cut value, on the other hand, is the minimum total capacity of any set of data links whose breakdown disconnects t from s. The max-flow-min-cut theorem states that the maximum flow value is equal to the minimum cut value. One of several proofs writes both values as optimal values of linear programs that are dual to each other.

When we consider matchings and vertex covers in general, not necessarily bipartite, graphs, the situation changes. Total unimodularity no longer applies, and the duality between the two concepts disappears. In fact, the problem of finding a minimum vertex cover in a general graph is computationally difficult (NP-hard); see Section 3.3. A maximum-weight matching, on the other hand, can still be computed in polynomial time for general graphs, although this result is by no means trivial. Behind the scenes there is again an integrality result, based on the notion of total dual integrality (see the glossary).

8.3 Machine Scheduling

In the back office of the copy shop Copy & Paste, the operator is confronted with n copy jobs submitted by customers the night before. For processing them she has m photocopying machines with different features at her disposal. For all i, j, the operator quickly estimates how long it would take the ith machine to process the jth job, and she makes a table of the resulting running times like this (a dash marks an assignment that is not possible):

                                         Single B/W   Duplex B/W   Duplex Color
    Master's thesis (90 pages,
      two-sided, 10 B/W copies)              --          45 min       60 min
    All the Best Deals flyer (1 page,
      one-sided, 10,000 B/W copies)       2 h 45 min   4 h 10 min   5 h 30 min
    Buyer's Paradise flyer (1 page,
      one-sided, 10,000 B/W copies)       2 h 45 min   4 h 10 min   5 h 30 min
    Obituary (2 pages, two-sided,
      100 B/W copies)                        --           2 min        3 min
    Party platform (10 pages,
      two-sided, 5,000 color copies)         --            --        3 h 30 min

Since the operator can go home as soon as all jobs have been processed, her goal is to find an assignment of jobs to machines, a schedule, such that the makespan, the time needed to finish all jobs, is minimized.
In our example this is not hard to figure out. For the party platform there is no choice of machine. We can also observe that it is advantageous to use both B/W machines for processing the two flyers, no matter where the thesis and the obituary go. Given this, the makespan is at least 4 h 55 min if the thesis is processed on the Duplex B/W machine, so it is better put on the color machine, achieving the optimum makespan of 4 h 30 min, with the obituary running on the Duplex B/W machine.

In general, finding the optimum makespan is computationally difficult (NP-hard), and the obvious approach of trying all possible schedules is of course not a solution for a larger number n of jobs. What we show in this section is that the operator can quickly compute a schedule whose makespan is at most twice as long as the optimum makespan. All she needs for that are some linear programming skills. (To really appreciate this result, one shouldn't think of a problem with 5 jobs but with thousands of jobs.)

We should emphasize that in this scheduling problem the jobs are considered indivisible, and so each job must be processed on a single machine. This, in a sense, is what makes the problem difficult. As we will soon see, an optimal fractional schedule, where a single job can be divided among several machines in arbitrary ratios, can be found efficiently by linear programming.

Two integer programs for the scheduling problem. Let us identify the m machines with the set M = {1, ..., m} and the n jobs with the set J = {m+1, ..., m+n}. Let d_ij denote the running time of job j ∈ J on machine i ∈ M; we assume d_ij > 0. To simplify notation, we also assume that any machine can process any job: An infeasible assignment of job j to machine i can be modeled by a large running time d_ij = K. If K is larger than the sum of all real running times, an optimal schedule will avoid infeasible assignments, provided that there is a feasible schedule at all.

With these notions, the following integer program in the variables t and x_ij, i ∈ M, j ∈ J, computes an assignment of jobs to machines that minimizes the makespan:

    minimize t
    subject to Σ_{i∈M} x_ij = 1 for all j ∈ J
               Σ_{j∈J} d_ij x_ij ≤ t for all i ∈ M
               x_ij ≥ 0 for all i ∈ M, j ∈ J
               x_ij ∈ Z for all i ∈ M, j ∈ J.

Under the integrality constraints, the conditions Σ_{i∈M} x_ij = 1 and x_ij ≥ 0 imply that x_ij ∈ {0, 1} for all i, j. With the interpretation that x_ij = 1 if job j is assigned to machine i, and x_ij = 0 otherwise, the first n equations stipulate that each job is assigned to exactly one machine. The next m inequalities make sure that no machine needs more than time t to finish all jobs assigned to it. Minimizing t leads to equality for at least one of the machines, so the best t is indeed the makespan of an optimal schedule.

As we have already seen in Section 3.3 for the minimum vertex cover problem, solving the LP relaxation obtained from an integer program by deleting the integrality constraints can be a very useful step toward an approximate solution of the original problem. In our case this approach needs an additional twist: We will relax another integer program, obtained by adding redundant constraints to the program above. After dropping integrality, the added constraints are no longer redundant, and they lead to a better LP relaxation.

Let t_opt be the makespan of an optimal schedule. If d_ij > t_opt, then we know that job j cannot run on machine i in any optimal schedule, so we may add the constraint x_ij = 0 to the integer program without affecting its validity. More generally, if T is any upper bound on t_opt, we can write down the following integer program, which has the same optimal solutions as the original one:

    minimize t
    subject to Σ_{i∈M} x_ij = 1 for all j ∈ J
               Σ_{j∈J} d_ij x_ij ≤ t for all i ∈ M
               x_ij ≥ 0 for all i ∈ M, j ∈ J
               x_ij = 0 for all i ∈ M, j ∈ J with d_ij > T
               x_ij ∈ Z for all i ∈ M, j ∈ J.

But we do not know t_opt, so what value of T do we use for the relaxation? There is no need to specify this right here; for the time being, you can imagine that we set T = max_ij d_ij, a value that definitely works, because it makes our second integer program coincide with the first one.

A good schedule from the LP relaxation. As already indicated, the first step is to solve the LP relaxation, denoted by LPR(T), of our second integer program; a small computational sketch follows below.
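Here is a sketch (ours; it assumes scipy) of LPR(T) for the copy-shop instance, with running times in minutes, the large constant K standing in for impossible assignments, and T set to the largest real running time.

    import numpy as np
    from scipy.optimize import linprog

    # Rows: Single B/W, Duplex B/W, Duplex Color.
    # Columns: thesis, flyer 1, flyer 2, obituary, party platform.
    INF = 10_000                 # larger than the sum of all real running times
    d = np.array([
        [INF, 165, 165, INF, INF],
        [45,  250, 250, 2,   INF],
        [60,  330, 330, 3,   210],
    ], dtype=float)
    m, n = d.shape
    T = d[d < INF].max()         # here T = 330; any upper bound on t_opt works

    # Variables: (t, x_11, ..., x_mn), x flattened row by row.
    N = 1 + m * n
    c = np.zeros(N); c[0] = 1.0                       # minimize t
    A_eq = np.zeros((n, N)); b_eq = np.ones(n)
    for j in range(n):
        A_eq[j, 1 + j::n] = 1.0                       # sum_i x_ij = 1
    A_ub = np.zeros((m, N)); b_ub = np.zeros(m)
    for i in range(m):
        A_ub[i, 0] = -1.0                             # sum_j d_ij x_ij <= t
        A_ub[i, 1 + i * n:1 + (i + 1) * n] = d[i]
    bounds = [(None, None)] + [
        (0.0, 0.0) if d[i, j] > T else (0.0, None)    # x_ij = 0 where d_ij > T
        for i in range(m) for j in range(n)
    ]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    print(res.fun)   # about 216 min: the fractional makespan, a lower
                     # bound on the integral optimum of 270 min (4 h 30 min)

The fractional optimum splits the flyers among the machines, which no real schedule may do; the rest of the section is about rounding such a solution.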
Now we are ready to prove König's theorem.

Proof of Theorem 8.2.2. We first consider the integer program

    maximize ∑_{j=1}^m x_j
    subject to Ax ≤ 1,
               x ≥ 0, x ∈ Z^m,

where A is the incidence matrix of G. In this integer program, the row of A corresponding to the vertex v_i induces the constraint

    ∑_{j: v_i ∈ e_j} x_j ≤ 1.

This implies that x_j ∈ {0,1} for all j, and that the edges e_j with x_j = 1 in an optimal solution x* form a maximum matching in G.

Next we consider the integer program

    minimize t
    subject to ∑_{i∈M} x_ij = 1    for all j ∈ J,
               ∑_{j∈J} d_ij x_ij ≤ t    for all i ∈ M,
               x_ij ∈ {0,1}    for all i ∈ M, j ∈ J,
               x_ij = 0    for all i ∈ M, j ∈ J with d_ij > T.

Given an optimal solution x* of the LP relaxation, we consider the bipartite graph G on the vertex set M ∪ J with edge set

    E = { {i,j} ∈ M × J : x*_ij > 0 }.

For an arbitrary optimal solution, this graph could easily be a boring complete bipartite graph, but under Assumption 8.3.1 it becomes more interesting.

8.3.2 Lemma. In any subgraph of G obtained by deleting edges, or vertices together with their incident edges, the number of edges is at most the number of vertices.

The graph G for m = 4 machines and n = 6 jobs might look like this, for example: [figure: a bipartite graph G with the machine vertices M on the left and the job vertices J on the right]

Proof. Let A be the constraint matrix of the LP relaxation. It has one row for each machine, one for each job, and one for each running time d_ij exceeding T. The columns of A corresponding to the nonzero variables x*_ij, or equivalently, to the edges of G, are linearly independent by our assumption. Now we consider any subgraph of G, with vertex set M′ ∪ J′ ⊆ M ∪ J and edge set E′ ⊆ E. Let A′ be the submatrix obtained from A by restricting to the rows corresponding to M′ ∪ J′ and to the columns corresponding to E′.

Claim: The columns of A′ are linearly independent.

To see this, we first observe that the columns of A corresponding to E′ are linearly independent, simply because E′ ⊆ E. Any variable x_ij, {i,j} ∈ E′, occurs in the inequality for machine i ∈ M′ and in the equation for job j ∈ J′, but in no other constraint, since x*_ij > 0 implies d_ij ≤ T. This means that the columns of A corresponding to E′ have zero entries in all rows except for those corresponding to machines or jobs in M′ ∪ J′. Hence these columns remain linearly independent even when we restrict them to the rows corresponding to M′ ∪ J′.

By the claim, we have |E′| ≤ |M′| + |J′|, and this is the statement of the lemma. □

This structural result about G allows us to find a good schedule.

8.3.3 Lemma. Let T ≥ 0 be such that the LP relaxation LPR(T) is feasible, and suppose that a feasible solution is given that satisfies Assumption 8.3.1 and has value at most t. Then we can efficiently construct a schedule of makespan at most t + T.

Proof. We need to assign each job to some machine. We begin with the jobs j that have degree one in the graph G, and we assign each such j to its unique neighbor i. By the construction of G and the equation ∑_{i∈M} x*_ij = 1, we have x*_ij = 1 in this case. If machine i has been assigned a set S_i of jobs in this way, it can process these jobs in time

    ∑_{j∈S_i} d_ij = ∑_{j∈S_i} d_ij x*_ij ≤ ∑_{j∈J} d_ij x*_ij ≤ t.

So each machine can handle the jobs assigned by this partial schedule in time at most t. Next we remove all assigned jobs and their incident edges from G. This leaves us with a subgraph G′ = (M ∪ J′, E′). In G′, all vertices of degree one are machines.
In the example depicted above, two jobs have degree one, and their deletion results in the following subgraph G′: [figure: the subgraph G′]

We will show that we can find a matching in G′ that covers all the remaining jobs. If we assign the jobs according to this matching, every machine gets at most one additional job, and this job can be processed in time at most T, by the construction of our second integer program. Therefore, the resulting full schedule has makespan at most t + T.

It remains to construct the matching. To this end we use Hall's theorem from Section 8.2. According to this theorem, a matching exists if for every subset J″ ⊆ J′ of jobs, its neighborhood (the set of all machines connected to at least one job in J″) has size at least |J″|. To check this condition, we let J″ ⊆ J′ be such a subset of jobs and N(J″) its neighborhood. If e is the number of edges in the subgraph of G′ induced by J″ ∪ N(J″), then Lemma 8.3.2 guarantees that e ≤ |J″| + |N(J″)|. On the other hand, since every job has at least two neighbors, we have e ≥ 2|J″|, and this shows that |N(J″)| ≥ |J″|. □

Although this proof is nonconstructive, we can easily find the matching once we know that it exists, by linear programming as in Section 8.2, or by other known polynomial-time methods.

There is a direct way of constructing the matching in the proof of Lemma 8.3.3 that relies neither on Hall's theorem nor on general bipartite matching algorithms. It goes as follows. Lemma 8.3.2 implies that each connected component of G′ is either a tree or a tree with one extra edge connecting two of its vertices. In the latter case the component has exactly one cycle, of even length (because G′ is bipartite). Therefore, we can match all jobs occurring on cycles, and after removing the vertices of all cycles, we are left with a subgraph G″ all of whose connected components are trees, with at most one vertex per tree being a former neighbor of a cycle vertex. It follows that in every tree of G″, at most one vertex can be a job of degree one, since all other degree-one vertices already had degree one in G′ and are therefore machines. The matching of the remaining jobs is easy: We root every tree in G″ at its unique job of degree one, or at any vertex if there is no such job, and we match every job to one of its children in the rooted tree. For this we observe that there cannot be an isolated job in G″: since a job in G″ was on no cycle in G′, the removal of cycles can affect only one of the at least two neighbors that the job had in G′.

For our running example, here is a complete assignment of jobs to machines obtained from the described procedure: [figure: the graph G with the resulting job-to-machine assignment highlighted]

Choosing the parameter T. How good is the bound we get from the previous lemma? We will assume that t is the value of an optimal basic feasible solution of the LP relaxation with parameter T. Then t is a lower bound on the optimum makespan t_opt, so this part of the bound looks promising. But when we recall that T must be an upper bound on t_opt in order for our second integer program to be valid, it seems that we would have to choose T ≥ t_opt to get makespan at most 2t_opt. But this cannot be done, since we have argued above that it is hard to compute t_opt, and if it were easy, there would be no need for an approximate solution anyway.

Luckily, there is a way out. Reading Lemma 8.3.3 carefully, we see that T only needs to be chosen so that the LP relaxation LPR(T) is feasible; there is no need for the second integer program to be feasible. If LPR(T) is feasible, Lemma 8.3.3 allows us to construct a schedule with makespan at most t + T, so the best T is the one that minimizes t + T subject to LPR(T) being feasible. Since t depends on T, we make this explicit now by writing t = t(T). If LPR(T) is infeasible, we set t(T) = ∞.
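To make this concrete, here is a small Python sketch (our own, not from the book) that evaluates t(T) with SciPy's linprog and scans the candidate values of T. The running-time matrix encodes the print-shop example above, converted to minutes, with impossible assignments entered as np.inf so that they are excluded for every T.

    import numpy as np
    from scipy.optimize import linprog

    def t_of_T(d, T):
        """Optimal value t(T) of the LP relaxation LPR(T), or inf if infeasible.
        Variables: x_ij for all i, j with d_ij <= T, plus the makespan t (last)."""
        m, n = d.shape
        allowed = [(i, j) for i in range(m) for j in range(n) if d[i, j] <= T]
        nv = len(allowed) + 1
        c = np.zeros(nv)
        c[-1] = 1.0                                # minimize t
        A_eq = np.zeros((n, nv))                   # sum_i x_ij = 1 for each job j
        b_eq = np.ones(n)
        A_ub = np.zeros((m, nv))                   # sum_j d_ij x_ij - t <= 0
        b_ub = np.zeros(m)
        A_ub[:, -1] = -1.0
        for k, (i, j) in enumerate(allowed):
            A_eq[j, k] = 1.0
            A_ub[i, k] = d[i, j]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
        return res.fun if res.success else np.inf

    # rows: Single B/W, Duplex B/W, Duplex Color; columns: thesis, two
    # flyers, obituary, party platform; times in minutes, inf = impossible
    d = np.array([[np.inf, 165.0, 165.0, np.inf, np.inf],
                  [  45.0, 250.0, 250.0,    2.0, np.inf],
                  [  60.0, 330.0, 330.0,    3.0,  210.0]])

    candidates = sorted(set(d[np.isfinite(d)]))    # T ranges over the d_ij
    T_star = min(candidates, key=lambda T: t_of_T(d, T) + T)
    print(T_star, t_of_T(d, T_star) + T_star)      # makespan bound t(T*) + T*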
How to find the best T? We seek a point T at which the function f(T) = t(T) + T attains a minimum. First we observe that t(T) is a step function, as in the following picture: [figure: the graph of the step function t(T), with jumps at values of the form T = d_ij, here d_35, d_24, d_27, ..., up to max_ij d_ij on the T-axis]

Indeed, let us start with the value T = max_ij d_ij and let us decrease T continuously. The value of t(T) may change (jump up) only immediately after the moments when a new constraint of the form x_ij = 0 appears in LPR(T), and this happens only for T = d_ij. Between these values the function t(T) stays constant. Consequently, the function f(T) = t(T) + T is linearly increasing on each interval between two consecutive d_ij's, and the minimum is attained at some d_ij. So we can compute the minimum, and the desired best value T*, by solving at most mn linear programs of the form LPR(T), with T ranging over all the d_ij. Under our convention that t(T) = ∞ if LPR(T) is infeasible, the minimum will be attained at a value T* with LPR(T*) feasible.

8.3.4 Theorem. Let T* be the value of T that minimizes t(T) + T. With T = T*, the algorithm in the proof of Lemma 8.3.3 computes a schedule of makespan at most 2t_opt.

Proof. We know that for T = t_opt the second integer program is feasible and has optimum value t_opt. Hence LPR(t_opt) is feasible as well, and its optimum value can be only smaller: t(t_opt) ≤ t_opt. We thus have

    t(T*) + T* = min_T (t(T) + T) ≤ t(t_opt) + t_opt ≤ 2t_opt. □

The 2-approximation algorithm for the scheduling problem is adapted from the paper

J. K. Lenstra, D. B. Shmoys, É. Tardos: Approximation algorithms for scheduling unrelated parallel machines, Mathematical Programming 46(1990), pages 259-271.

Then for every n and d, A(n,d) is bounded above by the optimum value of the following linear program in variables x_0, x_1, ..., x_n:

    maximize x_0 + x_1 + ... + x_n
    subject to x_0 = 1,
               x_i = 0,    i = 1, 2, ..., d−1,
               ∑_{i=0}^n K_t(n,i) x_i ≥ 0,    t = 1, 2, ..., n,
               x_0, x_1, ..., x_n ≥ 0.

Example. Using the sphere-packing bound, we have previously found A(17,3) ≤ 7281. To compute the Delsarte bound, we solve the linear program in the theorem; after eliminating x_0, x_1, x_2, which are actually constants, we have 15 nonnegative variables and 17 constraints. The optimum value is 6553.35..., which implies A(17,3) ≤ 6553. The current best upper bound is 6552, an improvement by only 1.

Toward an explanation. The proof of the Delsarte bound will proceed as follows. With every subset C ⊆ {0,1}^n we associate nonnegative real quantities x̃_0, x̃_1, ..., x̃_n such that |C| = x̃_0 + ... + x̃_n. Then we will show that whenever C is a code with distance d, the x̃_i constitute a feasible solution of the linear program in the theorem. It follows that the maximum of the linear program is at least as large as the size of any existing code C with distance d; of course, it may be larger, since a feasible solution does not necessarily correspond to a code.

Given C ⊆ {0,1}^n, the x̃_i are defined by

    x̃_i = (1/|C|) · |{ (w,w′) ∈ C² : d_H(w,w′) = i }|,    i = 0, 1, ..., n.

Thus x̃_i is the number of ordered pairs of code words with Hamming distance i, divided by the total number of code words. Since each of the |C|² ordered pairs contributes to exactly one of the x̃_i, we have x̃_0 + x̃_1 + ... + x̃_n = |C|.

Some of the constraints in the linear program in Theorem 8.4.3 are now easy to understand. We clearly have x̃_0 = 1, since every w ∈ C has distance 0 only to itself. The equations x̃_1 = 0 through x̃_{d−1} = 0 hold by the assumption that C has distance d; that is, there are no pairs of code words with Hamming distance between 1 and d−1. (Interestingly, this is the only place in the proof of the Delsarte bound where the assumption of C being a code with distance d is used.) The remaining set of constraints is considerably harder to derive, and it lacks a really intuitive explanation.
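The linear program in the theorem is easy to set up in software. The following sketch (ours, not from the book) computes the bound for n = 17, d = 3 with SciPy, using the coefficients K_t(n,i) exactly as they are defined in Theorem 8.4.3.

    from math import comb
    import numpy as np
    from scipy.optimize import linprog

    def K(t, n, i):
        # K_t(n,i) = sum_{j=0}^{min(i,t)} (-1)^j * C(i,j) * C(n-i, t-j)
        return sum((-1) ** j * comb(i, j) * comb(n - i, t - j)
                   for j in range(min(i, t) + 1))

    def delsarte_bound(n, d):
        """Optimum of: maximize sum_i x_i subject to x_0 = 1,
        x_1 = ... = x_{d-1} = 0, sum_i K_t(n,i) x_i >= 0 (t = 1..n), x >= 0."""
        c = -np.ones(n + 1)                          # linprog minimizes
        A_ub = np.array([[-K(t, n, i) for i in range(n + 1)]
                         for t in range(1, n + 1)])  # -(sum_i K x_i) <= 0
        b_ub = np.zeros(n)
        bounds = [(1, 1)] + [(0, 0)] * (d - 1) + [(0, None)] * (n - d + 1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        return -res.fun

    print(delsarte_bound(17, 3))    # about 6553.35, hence A(17,3) <= 6553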
Thus, to prove Theorem 8.4.3, we have to establish the following.

8.4.4 Proposition. Let C ⊆ {0,1}^n be an arbitrary set, let x̃_i = x̃_i(C) be defined as above, and let t ∈ {1, 2, ..., n}. Then we have the inequality

    ∑_{i=0}^n K_t(n,i) x̃_i ≥ 0.

order to distinguish 16 possibilities, we send one of the following 7-bit words:

    0000000  0001011  0010101  0011110
    0100110  0101101  0110011  0111000
    1000111  1001100  1010010  1011001
    1100001  1101010  1110100  1111111

It can be checked that every two of these words differ in at least 3 bits. (This fact can be checked by brute force, but it is also a consequence of an elegant general theory, which we do not treat here.) Therefore, if one error occurs in the transmission, the sequence that was sent can be reconstructed uniquely. Hence the capability of correcting any single-bit error is retained, but the number of transmitted bits is reduced to slightly more than half, compared to the previous approach.

A similar problem can be investigated for other settings of the parameters as well. In general, we want to communicate one of N possibilities, we do it by transmitting an n-bit word, and we want any at most r errors to be correctable. This problem has enormous theoretical and practical significance. Our example with a DVD player was simple-minded, but error-correcting codes play a role in any technology involving transmission or storage of information, from computer disks and cell phones to deep-space probes.

We now introduce some common terminology related to error-correcting codes.

Terminology. The Hamming distance of two words w, w′ ∈ {0,1}^n is the number of bits in which w differs from w′:

    d_H(w,w′) = |{ j ∈ {1,...,n} : w_j ≠ w′_j }|.

The Hamming distance can be interpreted as the number of errors necessary to transform w into w′. The weight of w ∈ {0,1}^n is the number of 1s in w:

    |w| = |{ j ∈ {1,...,n} : w_j = 1 }|.

Finally, for w, w′ ∈ {0,1}^n we define their sum modulo 2 as the word

    w ⊕ w′ = ((w_1 + w′_1) mod 2, ..., (w_n + w′_n) mod 2) ∈ {0,1}^n.

These three notions are interrelated by the formula

    d_H(w,w′) = |w ⊕ w′|. (8.6)

In the last of the solutions to the DVD-player problem discussed above, the crucial object was the set C of the sixteen 7-bit words listed earlier, in which every two distinct words have Hamming distance at least 3. In coding theory, any subset C ⊆ {0,1}^n is called a code. This may sound strange, since under a "code" one usually imagines some kind of method or procedure for coding, but in the theory of error-correcting codes one has to get used to this terminology. For correcting errors, the crucial parameter is the distance of the code.

8.4.1 Definition. A code C ⊆ {0,1}^n has distance d if d_H(w,w′) ≥ d for any two distinct words w, w′ ∈ C. For n, d ≥ 0, let A(n,d) denote the maximum cardinality of a code C ⊆ {0,1}^n with distance d.

We claim that a code C can correct any at most r errors if and only if it has distance at least 2r+1. Indeed, on the one hand, if C contained two distinct words w, w′ that differ in at most 2r bits, we could consider a word w″ resulting from w by flipping exactly half of the bits (rounded down) that distinguish w from w′. When the word w″ is received, there is no way of telling which of the words w and w′ was intended. On the other hand, if any two distinct code words differ in at least 2r+1 bits, then for any word w″ ∈ {0,1}^n there is at most one code word from which w″ can be obtained through r or fewer errors, and this must be the word that was sent when w″ is received.

Given the number n of bits we can afford to transmit and the number r of errors we need to be able to correct, we want a code C ⊆ {0,1}^n with
distance at least 2r 1 and with C as large as possible since the number of words in the code corresponds to the amount of information that we can transmit Thus determining or estimating An d the maximum possible size of a code C 0 1n with distance d is one of the main problems of coding theory The problem of finding the largest codes for given n and r can in principle be solved by complete enumeration We can go through all possible subsets of 0 1n and output the largest one that gives a code with distance d However this method becomes practically infeasible already for very small n It turns out that the problem for arbitrary n and d is computationally difficult NP hard Starting from very moderate values of n and d the maximum code sizes are not exactly known except for few lucky cases Tightening the known upper and lower bounds on maximum sizes of errorcorrecting codes is the topic of ongoing research in coding theory In this section we present a technique for proving upper bounds based on linear programming When this technique was introduced by Philippe Delsarte in 1973 it provided upper bounds of unprecedented quality Special cases For all n we have An 1 2n because any code has dis tance 1 The case d 2 is slightly more interesting By choosing C as the set of all words of even weight we see that An 2 2n1 But we actually have An 2 2n1 since it is easy to show by induction that every code with more than 2n1 words contains two words of Hamming distance 1 Given the simplicity of the cases d 1 2 it may come as a surprise that already for d 3 little is known This is the setup for errorcorrecting codes with one error allowed Exact values of An 3 have been determined only up to n 16 for n 17 for example the known bounds are 5312 A17 3 6552 The spherepacking bound For any n and d a simple upper bound on An d can be obtained by a volume argument Let us motivate this with a reallife analogy The local grocery is exhibiting a large glass box filled with peas and the person to make the most accurate estimate of the number of peas in the box wins a prize Without any counting you can conclude that the number of peas is bounded above by the volume of the box divided by the volume of a single pea assuming that all the peas have the same volume The same kind of argument can be used for the number An d where we may assume in our application that d 2r 1 is odd Let us fix any code C of distance d Now we think of the set 0 1n as the glass box and of the C Hamming balls Bw r w in 0 1n dHw w r w in C as the peas Since the code has distance 2r 1 all these Hamming balls are disjoint and correspond in our analogy to peas Consequently their number cannot be larger than the total number of words the volume of the box divided by the number of words in a single Hamming ball the volume of a pea The number of words at Hamming distance exactly i from w is n choose i This implies Bwr sum from i0 to r n choose i and the following upper bound on An 2r 1 is obtained 842 Lemma Spherepacking bound For all n and r An 2r 1 floor of 2n sum from i0 to r n choose i For example the spherepacking bound gives A7 3 16 and so the Hamming code in our initial example is optimal and A17 3 floor of 131072 18 7281 In the following theorem which is the main result of this section an upper bound on An d is expressed as an optimum of a certain linear program 843 Theorem The Delsarte bound For integers n i t with 0 i t n let us put Ktn i sum from j0 to minit 1j i choose j n i choose t j The paper also proves that it is NPhard to approximate the 
optimum makespan with a factor less than 32 Aiming at simplicity we have presented a somewhat inefficient version of the algorithm A considerably faster version with the same approximation guarantee can be obtained if we do not minimize the function T mapsto tT T but instead we only look for the smallest T with tT T Such a value of T still guarantees tT T 2topt but it can be found by binary search over the sorted sequence of the dij In this way it suffices to solve LPRT for Olog mn values of T instead of mn values as in our presentation See the paper quoted above for details 84 Upper Bounds for Codes Errorcorrecting codes Let us consider a DVD player that has a remote control unit with 16 keys Whenever one of the keys is pressed the unit needs to communicate this event to the player A natural option would be to send a 4bit sequence Since there are 24 16 different 4bit sequences referred to as words in this context a 4bit sequence is an economical way of communicating one of 16 possibilities However let us suppose that the transmission of bits from the remote control to the player is not quite reliable and that each of the transmitted bits can be received incorrectly with some small probability say 0005 Then we expect that about 2 of the commands are received incorrectly which can be regarded as a rather serious flaw of the device One possibility of improvement is to triple each of the four transmitted bits That is instead of a 4bit word abcd the unit sends the 12bit word aaabbbcccbdd Now a transmission error in a single bit can be recognized and corrected For example if the sequence 111001000111 is received and if we assume that at most one bit was received erroneously it is clear that 111000000111 must have been sent Thus the original 4bit sequence was 1001 Of course it might be that actually two or more bits are wrong and then the original sequence is not reconstructed correctly but this has much lower probability Namely if we assume that the errors in different bits are independent and occur with probability 0005 which may or may not be realistic depending on the technical specifications then the probability of two or more errors in a 12bit sequence is approximately 016 This is a significant improvement in reliability However the price to pay is transmitting three times as many bits which presumably exhausts the battery of the remote control much faster A significantly better solution to this problem was discovered by Richard Hamming in the 1950s obviously not in the context of DVD players In n i0 K t ni x i 0 In the next lemma I 12 n is a set of indices and we write d I H ww for the number of indices i I with w i w i thus the components outside I are ignored 845 Lemma Let I 12 n be a set of indices and let C 01 n Then the number of pairs ww C 2 with d I H ww even is at least as large as the number of pairs ww C 2 with d I H ww odd In probabilistic terms if we choose ww C independently at random then the probability that they differ in an even number of positions from I is at least as large as the probability that they differ in an odd number of positions from I Proof Let us write w I i I w i 1 and let us set E w C w I is even O w C w I is odd From the equation d I H ww w w I we see that if d I H ww is even then w I and w I have the same parity and so w and w are both in E or both in O On the other hand for d I H ww odd one of w w lies in E and the other one in O So the assertion of the lemma is equivalent to E 2 O 2 2 E O which follows by expanding E O 2 0 846 Corollary For every C 01 
n and every v 01 n we have wwC 2 1 ww T v 0 Proof This is just another way of writing the statement of Lemma 845 Indeed if we set I i v i 1 then w w T v d I H ww and hence the sum in the corollary is exactly the number of pairs ww with d I H ww even minus the number of pairs with d I H ww odd The corollary also has a quick algebraic proof which some readers may prefer It suffices to note that w w T v has the same parity as w w T v addition modulo 2 was replaced by ordinary addition of integers and so wwC 2 1 wwT v wwC 2 1 wwT v wwC 2 1 w T v 1 wT v wC 1 w T v 2 0 Proof of Proposition 844 To prove the tth inequality in the proposition ie n i0 K t ni x i 0 we sum the inequality in Corollary 846 over all v 01 n of weight t Interchanging the summation order we obtain 0 wwC 2 v01 n vt 1 ww T v To understand this last expression let us fix u w w and write S u v01 n vt 1 u T v In this sum the v with u T v j are counted with sign 1 j How many v of weight t and with u T v j are there Let i u d H ww be the number of 1s in u In order to form such a v we need to put j ones in positions where u has 1s and t j ones in positions where u has 0s Hence the number of these v is i j n i t j and S u min it j0 1 j i j n i t j which we recognize as K t ni So we have 0 wwC 2 K t nd H ww and it remains to note that the number of times K t ni appears in this sum is C x i This finishes the proof of Proposition 844 and thus also of Theorem 843 A small strengthening of the Delsarte bound We have seen that Theorem 843 yields A 173 6553 We show how the inequalities in the theorem can be slightly strengthened using a parity argument which leads to the best known upper bound A 173 6552 Similar tricks can improve the Delsarte bound in some other cases as well but the improvements are usually minor For contradiction let us suppose that n 17 and there is a code C 01 n of distance 3 with C 6553 The size of C is odd and we note that for every code of odd size and every t the last inequality in the proof of Corollary 846 can be strengthened to wC 1 w T v 2 1 since an odd number of values from 11 cannot sum to zero If we propagate this improvement through the proof of Proposition 844 we arrive at the following inequality for the x i n i0 K t ni x i n tC Since in our particular case we suppose C 6553 we can replace the constraints n i0 K t ni x i 0 t 12 n in the linear program in Theorem 843 by n i0 K t ni x i n t6553 and the x i defined by our C remain a feasible solution However the optimum of this modified linear program is only 6552 35 which contradicts the assumption C 6553 This proves A 173 6552 The paper M R Best A E Brouwer F J MacWilliams A M Odlyzko and N J A Sloane Bounds for binary codes of length less than 25 IEEE Trans Inform Theory 24 1978 pages 8193 describes this particular strengthening of the Delsarte bound and some similar approaches A continually updated table of the best known bounds for A nd for small n and d is maintained by Andries Brouwer at httpwwwwintuenlaebcodesbinary1html The Delsarte bound explained The result in Theorem 843 goes back to the thesis of Philippe Delsarte P Delsarte An algebraic approach to the association schemes of coding theory Philips Res Repts Suppl 10 1973 Here we sketch Delsartes original proof At a comparable level of detail his proof is more involved than the ad hoc proof above from the paper by Best et al On the other hand Delsartes proof is more systematic and even more importantly it can be extended to prove a stronger result which we mention below For i 0 n let M i be the 
2^n × 2^n matrix defined by

    M_i[v,w] = 1 if d_H(v,w) = i, and M_i[v,w] = 0 otherwise,

where the rows and columns are indexed by the words from {0,1}^n. The set of matrices of the form ∑_{i=0}^n y_i M_i, y_0,...,y_n ∈ R, is known to be closed under addition and scalar multiplication (this is clear), and also under matrix multiplication (this has to be shown). A set

the multiplication by A above is not necessarily an equivalent transformation, and so it may give rise to spurious solutions. However, we seek a solution x with the extra property |supp(x)| ≤ r, where we introduce the notation

    supp(x) = { i ∈ {1, 2, ..., n} : x_i ≠ 0 }.

As we will see, under suitable conditions relating n, m, r, and A, such a sparse solution of Ax = b turns out to be unique, and thus it has to be the desired error vector; moreover, it can be computed efficiently by linear programming. Let us summarize the resulting problem once again.

Sparse solution of an underdetermined system of linear equations. Given an m×n matrix A with m < n, a vector b ∈ R^m, and an integer r, find an x ∈ R^n such that

    Ax = b and |supp(x)| ≤ r, (8.10)

if one exists.

The coding problem above is only one among several important practical problems leading to the computation of sparse solutions of underdetermined systems; we will mention other applications at the end of this section. From now on we call any x satisfying (8.10) a sparse solution of Ax = b. (Warning: this shouldn't be confused with solutions of sparse systems of equations, which is an even more popular topic in numerical mathematics and scientific computing.)

A linear algebra view. There is a simple necessary and sufficient condition guaranteeing that there is at most one sparse solution of Ax = b.

8.5.1 Observation. With n, m, r fixed, the following two conditions on an m×n matrix A are equivalent:
(i) The system Ax = b has at most one sparse solution x for every b.
(ii) Every 2r or fewer columns of A are linearly independent.

Proof. To prove the more interesting implication (ii) ⇒ (i), let us assume that x and x′ are two different sparse solutions of Ax = b. Then y = x − x′ ≠ 0 has at most 2r nonzero components and satisfies Ay = Ax − Ax′ = 0, and hence it defines a linear dependence of at most 2r columns of A.

To prove (i) ⇒ (ii), we essentially reverse the above argument. Supposing that there exists a nonzero y ∈ R^n with Ay = 0 and |supp(y)| ≤ 2r, we write y = x − x′, where both x and x′ have at most r nonzero components. For example, x may agree with y in the first ⌈|supp(y)|/2⌉ nonzero components and have 0s elsewhere, and x′ = x − y has the remaining at most r nonzero components of y, with opposite sign. We set b = Ax, so that x is a sparse

By a trick we have learned in Section 2.4, this problem can be reformulated as a linear program:

    minimize u_1 + u_2 + ... + u_n
    subject to Ax = b,
               −u ≤ x ≤ u,
               x, u ∈ R^n.                 (BP′)

To check the equivalence of (BP) and (BP′), we just note that in an optimal solution of (BP′) we have u_i = |x_i| for every i.

The basis pursuit approach to finding a sparse solution of Ax = b thus consists in computing an optimal solution x* of (BP′) by linear programming, and hoping that with some luck, this x* might also be the sparse solution, or at least close to it.
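As an illustration, here is a small self-contained sketch (ours, not from the book) that solves (BP′) with SciPy's linprog on a synthetic random instance; with these benign parameters, basis pursuit typically recovers the planted sparse solution exactly.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    m, n, r = 30, 40, 3                # underdetermined: m < n, sparsity r

    A = rng.standard_normal((m, n))    # random Gaussian matrix, cf. Theorem 8.5.2
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=r, replace=False)] = rng.standard_normal(r)
    b = A @ x_true

    # (BP'): minimize sum(u) subject to A x = b, -u <= x <= u.
    # The LP variables are stacked as z = (x, u).
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_eq = np.hstack([A, np.zeros((m, n))])
    I = np.eye(n)
    A_ub = np.vstack([np.hstack([ I, -I]),     #  x - u <= 0
                      np.hstack([-I, -I])])    # -x - u <= 0
    b_ub = np.zeros(2 * n)
    bounds = [(None, None)] * n + [(0, None)] * n   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
    assert res.success
    x_bp = res.x[:n]
    print("recovered exactly:", np.allclose(x_bp, x_true, atol=1e-6))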
At first sight it is not clear why basis pursuit should have any chance of finding a sparse solution. After all, the desired sparse solution might have a few huge components, while x*, a minimizer of the ℓ_1-norm, might have all components nonzero but tiny. Surprisingly, experiments have revealed that basis pursuit actually performs excellently, and it usually finds the sparse solution exactly, even under conditions that don't look very favorable. Later these findings were confirmed by rather precise theoretical results. Here we state the following particular case of such results.

8.5.2 Theorem (Guaranteed success of basis pursuit). Let m = 0.75n, and let A be a random m×n matrix, where each entry is drawn from the standard normal distribution N(0,1) and the entries are mutually independent.⁵ Then with probability at least 1 − e^{−cm}, where c > 0 is a positive constant, the matrix A has the following property: If b ∈ R^m is such that the system Ax = b has a solution x̃ with at most r = 0.08n nonzero components, then x̃ is the unique optimal solution of (BP).

For brevity, we call a matrix A with the property as in the theorem BP-exact (more precisely, we should say "BP-exact for r," where r specifies the maximum number of nonzero components). For a BP-exact matrix A we can

⁵ We recall that the distribution N(0,1) has density given by the Gaussian bell curve (1/√(2π)) e^{−x²/2}. How can we generate a random number with this distribution? This is implemented in many software packages, and methods for doing it can be found, for instance, in D. Knuth: The Art of Computer Programming, Vol. 2, Seminumerical Algorithms, Addison-Wesley, Reading, Massachusetts, 1973.

Geometric meaning of BP-exactness. Known proofs of Theorem 8.5.2 or similar results use a good deal of knowledge of geometric probability and high-dimensional geometry, which we want neither to assume nor to introduce in this book. We thus have to omit a proof. Instead, we present an appealing geometric characterization of BP-exact matrices, which is a starting point of the existing proofs. For its statement we need to recall the crosspolytope, a convex polytope already mentioned in Section 4.3. We denote the n-dimensional crosspolytope by B_1^n, which should suggest that it is the unit ball of the ℓ_1-norm:

    B_1^n = { x ∈ R^n : ‖x‖_1 ≤ 1 }.

8.5.4 Lemma (Reformulation of BP-exactness). Let A be an m×n matrix, m < n, let r ≤ m, and let L = {x ∈ R^n : Ax = 0} be the kernel (null space) of A. Then A is BP-exact for r if and only if the following holds: For every z ∈ R^n with ‖z‖_1 = 1 (i.e., z is a boundary point of the crosspolytope) and |supp(z)| ≤ r, the affine subspace L + z intersects the crosspolytope only at z; that is,

    (L + z) ∩ B_1^n = {z}.

Let us discuss an example with n = 3, m = 2, and r = 1, about the only sensible setting for which one can draw a picture. If the considered 2×3 matrix A has full rank, which we may assume, then the kernel L is a one-dimensional linear subspace of R³, that is, a line passing through the origin. The points z coming into consideration have at most r = 1 nonzero coordinate, and they lie on the boundary of the regular octahedron B_1^3; hence they are precisely the 6 vertices of B_1^3. [figure: the octahedron B_1^3 in the coordinate system x_1, x_2, x_3, with the line L through the origin drawn thick] The condition in the lemma says that each of the 6 translates of L to the vertices should only "touch" the crosspolytope. Another way of visualizing this is to look at the projection of B_1^3 to the plane orthogonal to L. Each of the translates of L is projected to a point, and the crosspolytope is projected to a convex polygon. The condition then means that all the 6 vertices should appear on the boundary of the projection, as in the left picture below, while the right picture corresponds to a "bad" L: the condition is violated at the two vertices, marked by dots, that lie inside the projection. [figure: two projections of the octahedron, a good one with all 6 vertices on the boundary and a bad one with two vertices inside]

In general, of course, L is not a line but a k-dimensional linear subspace of R^n, and the considered points z are not only vertices of B_1^n, but they can lie in all (r−1)-dimensional faces of B_1^n. Indeed, we note that the points z on the surface of B_1^n with at most r nonzero components are exactly the points of the union of all (r−1)-dimensional faces, omitting the easy
proof of this fact but look at least at the case n 3 r 2 Proof of Lemma 854 First we assume that A is BPexact we consider a point z with z1 1 and suppz r and we set b Az Then the system Ax b has a sparse solution namely z and hence z has to be the unique point among all solutions of Ax b that minimize the ℓ1norm Noting that the set of all solutions of Ax b is exactly the affine subspace L z we get that z is the only point in L z with ℓ1norm at most 1 That is L z Bn 1 z as claimed Conversely we assume that L satisfies the condition in the lemma and we consider b Rm Let us suppose that the system Ax b has a solution x with at most r nonzero components If x 0 then b 0 and clearly 0 is also the only optimum of BP For x 0 we set z x x1 Then z1 1 and suppz r and so by the assumption z is the only point in L z of ℓ1norm at most 1 By rescaling we get that x is the only point in L x of ℓ1norm at most x1 and since L x is the set of all solutions of Ax b we get that A is BPexact Intuition for BPexactness We dont have means for proving Theorem 852 but now using the lemma just proved we can at least try to convey some intuition as to why a claim like Theorem 852 is plausible and what kind of calculations are needed to prove it The kernel L of a random matrix A defines a random kdimensional subspace6 of Rn where k nm For proving Theorem 852 we need 6 The question What is a random kdimensional subspace is a subtle one For us the simplest way out is to define a random kdimensional subspace as the 85 Sparse Solutions of Linear Systems 175 Here we conclude this very rough outline of the argument with a few words on why one should expect pn k r to be very small for k and r much smaller than n Roughly speaking this is because for n large the ndimensional crosspolytope is a very lean and spiky body and the cones CF are very narrow for lowdimensional faces F Hence a random subspace L of not too large dimension is very likely to avoid CF As a very simplified example of this phenomenon we may consider k r 1 Then F is a vertex and CF is easily described As a manageable exercise the reader may try to estimate the fraction of the unit sphere centered at 0 that is covered by CF this quantity is exactly half of the probability pn 1 1 that a random line through 0 intersects CF nontrivially References Basis pursuit was introduced in S Chen D L Donoho and M A Saunders Atomic decom position by basis pursuit SIAM J Scientific Computing 20 11999 3361 A classical approach to finding a good solution of an underdeter mined system Ax b would be to minimize x2 rather than x1 a least squares or generalized inverse method which typically yields a solution with many nonzero components and is much less suc cessful in applications such as the decoding problem Basis pursuit by minimizing the ℓ1norm instead yields a basic solution with only a few nonzero components Interestingly several groups of researchers independently arrived at the concept of BPexactness and obtained the following general version of Theorem 852 For every constant α 0 1 there exists β βα 0 such that a random αnn matrix A is BPexact for r βn with probability exponentially close to 1 Combined results of two of these groups can be found in E J Candes M Rudelson T Tao and R Vershynin Error correction via linear programming Proc 46th IEEE Sym posium on Foundations of Computer Science FOCS 2005 pages 295308 A third independent proof was given by Donoho Later and by yet an other method he obtained the strongest known quantitative bounds including those in Theorem 852 the 
previously mentioned proofs yield a much smaller constant for α 075 than 008 Among his several papers on the subject we cite D Donoho Highdimensional centrally symmetric polytopes with neighborliness proportional to dimension Discrete and Computational Geometry 352006 617652 176 8 More Applications where he establishes a connection to the classical theory of convex polytopes using a result in the spirit of Lemma 854 and the remarks following it but more elaborate Through this connection he obtained an interesting upper bound on βα For example in the setting of Theorem 852 there exists no 075nn BPexact matrix at all with r 025n assuming n large Additional upper bounds essentially showing the existence results for BPmatrices in the above papers to be asymptotically optimal were proved by N Linial and I Novik How neighborly can a centrally sym metric polytope be Discr Comput Geom in press We remark that our notation is a compromise among the notations of the papers quoted above and doesnt follow any of them exactly and that the term BPexact is ours More applications of sparse solutions of underdetermined systems The problem of computing a sparse solution of a system of linear equations arises in signal processing The signal considered may be a recording of a sound a measurement of seismic waves a picture taken by a digital camera or any of a number of other things A clas sical method of analyzing signals is Fourier analysis which from a linearalgebraic point of view means expressing a given periodic func tion in a basis consisting of the functions 1 cos x sin x cos 2x sin 2x cos 3x sin 3x the closely related cosine transform is used in the JPEG encoding of digital pictures These functions are linearly inde pendent and so the expression Fourier series is unique In the more recent wavelet analysis7 one typically has a larger collection of basic functions the wavelets which can be of various kinds depending on the particular application They are no longer linearly independent and hence there are many different ways of representing a given signal as a linear combination of wavelets So one looks for a representa tion satisfying some additional criteria and sparsity small number of nonzero coefficients is a very natural criterion It leads to an economic compressed representation and sometimes it may also help in ana lyzing or filtering the signal For example let us imagine that there is a smooth signal that has a nice representation by sine and cosine func tions and then an impulsive noise made of spike functions is added to it We let the basic functions be sines and cosines and suitable spike functions and by computing a sparse representation in such a basis we can often isolate the noise component very well something that the classical Fourier analysis cannot do Thus we naturally arrive at computing sparse solutions of underdetermined linear systems Another source is computer tomography CT where one has an unknown vector x each xi is the density of some small area of tissue 7 Indeed the newer picture encoding standard JPEG 2000 employs wavelets 86 Transversals of dIntervals 177 say and the CT scanner measures various linear combinations of the xi corresponding to rays through the tissue in various directions Sometimes there are reasons to expect that only a small number of the pixels will have values xi different from the background level and when we want to reconstruct x from the scan we again ask for a sparse solution of a linear system More realistically although less intuitively we dont 
expect a small number of nonzero pixels but rather a small number of significantly nonzero coefficients in a suitable wavelet representation 86 Transversals of dIntervals This section describes an application of the duality theorem in discrete geom etry and combinatorics We begin with a particular geometric result Then we discuss concepts appearing in the proof in a more general context Hellys and Gallais theorems First let I I1 I2 In be a family of closed intervals on the real line such that every two of the intervals intersect It is easily seen that there exists a point common to all of the intervals in I Indeed the rightmost among the left endpoints of the Ii is such a point In more detail writing Ii ai bi and setting a maxa1 an we necessarily have ai a bi for all i since ai a is immediate from the definition of a and if we had a bi for some i then Ii ai bi would be disjoint from the interval beginning with a The statement just proved is a special onedimensional case of a beau tiful and important theorem of Helly If C1 C2 Cn are convex sets in Rd such that any at most d 1 of them have a point in common then there is a point common to all of the Ci We will not prove this result it is mentioned as a background against which the forthcoming results can be better appre ciated It is easily seen that in general we cannot replace d 1 by any smaller number in Hellys theorem For example in the plane the assumption of Hellys theorem requires every three of the sets to have a common point and pairwise intersections are not enough To see this we consider n lines in general position They are convex sets every two of them intersect but no three have a common point Let us now consider planar convex sets of a special kind namely cir cular disks One can easily draw three disks such that every two inter sect but no point is common to all three However there is a theorem Gallais for pairwise intersecting disks in the spirit of Hellys theorem If D D1 D2 Dn is a family of disks in the plane in which every two disks intersect then there exist 4 points such that each Di contains at least 178 8 More Applications one of them With the best possible constant 4 this is a quite difficult the orem but it is not too hard to prove a similar theorem with 4 replaced by some large constant The reader is invited to solve this as a puzzle A set of points as in the theorem that intersects every member of D is called a transversal of D sometimes one also speaks of piercing or stab bing D by a small number of points Thus pairwise intersecting disks in the plane always have a 4point transversal and Hellys theorem asserts that d 1wise intersecting convex sets in Rd have a onepoint transversal What conditions on a family of sets guarantee that it has a small transversal This fairly general question subsumes many interesting particu lar problems and it has been the subject of much research Here we consider a onedimensional situation At the beginning of the section we dealt with a family of intervals and now we admit intervals with some bounded number of holes Transversals for pairwise intersecting dintervals For an integer d 1 a dinterval is defined as the union of d closed intervals on the real line The following picture shows three pairwise intersecting 2intervals drawn solid dashed and dashdotted respectively with no point common to all three Thus we cannot expect a onepoint transversal for pairwise intersecting d intervals But the following theorem shows the existence of a transversal whose size depends only on d 861 Theorem 
Let J be a finite family of dintervals such that J1 J2 for every J1 J2 J Then J has a transversal of size 2d2 that is there exist 2d2 points such that each dinterval of J contains at least one of them At first sight it is not obvious that there is any bound at all for the size of the transversal that depends only on d This was first proved in 1970 with a bound exponential in d in A Gyarfas J Lehel A Hellytype problem in trees in Combi natorial Theory and its Applications P Erdos A Renyi and VT Sos editors NorthHolland Amsterdam 1970 pages 571584 The best bound known at present is d2 and it has been established using algebraic topology in T Kaiser Transversals of dintervals Discrete Comput Geom 181997 195203 We are going to prove a bound that is worse by a factor of 2 but the proof presented here is much simpler It comes from 86 Transversals of dIntervals 183 and then one shows that τJ d τ J This proof scheme turned out to be very powerful and it works in many other cases as well Fractional concepts Besides the fractional matching and transver sal numbers many other fractional quantities appear in combina torics and combinatorial optimization The general recipe is to take some useful integervalued parameter Q of a graph say reformulate its definition as an integer program and introduce a fractional Q as the optimum value of a suitable LP relaxation of the integer program In many cases such a fractional Q is useful for studying or approxi mating the original Q The book E R Scheinerman and D H Ullman Fractional Graph The ory John Wiley Sons New York 1997 nicely presents such developments Let us conclude this section by quoting an example from that book We consider five committees numbered 12 5 such that 1 and 2 have a common member and so have 2 and 3 3 and 4 4 and 5 and 5 and 1 while any other pair of committees is disjoint A onehour meeting should be scheduled for each committee and meetings of committees with a common member must not overlap What is the length of the shortest time interval in which all the five meetings can be scheduled It seems that a 3hour schedule like the one below should be opti mal 1200 1300 1400 1500 1 2 3 4 5 However if one of the committees is willing to break its meeting into two halfhour parts then a shorter schedule is possible 1200 1300 1400 1430 1 2 3 4 5 3 The first schedule corresponds to a proper coloring of the conflict graph 184 8 More Applications 1 2 3 4 5 by three colors while the second schedule corresponds to a fractional coloring of the same graph with value 25 87 Smallest Balls and Convex Programming The smallest ball problem We are given points p1 pn Rd and we want to find a ball of the smallest radius that contains all the points8 This looks similar to some of the geometric optimization problems that we have addressed in Chapter 2 such as the problem of placing a largest possible disk inside a convex polygon For the smallest ball problem however there is no simple trick that lets us write the problem as a linear program We will see that it can be formulated as a convex quadratic program which is in many respects the next best thing to a linear program There are efficient solvers for convex quadratic programs based on interior point methods or on simplextype methods and so this formulation can be used for computing a smallest enclosing ball in practice We will also derive from this formulation that the smallest enclosing ball always exists and it is determined uniquely This is intuitively very plausible think of a shrinking rubber ball In the 
course of the proof we will establish

⁸ In the plane this is sometimes referred to as the "smallest bomb problem," but we prefer not to elaborate this association into a real-life story.

Since ỹ is a feasible solution of the dual linear program, we have ỹ^T a_j ≤ c_j for all j, and (8.13) implies

    ỹ^T A x̃ = c^T x̃ and b^T ỹ − c^T x̃ = 0.

So we have ∂f(x̃)/∂x_j − ỹ^T a_j = c_j − ỹ^T a_j ≥ 0, with equality whenever x̃_j > 0. Therefore, we have found the desired multipliers ỹ. □

The fact that there is "dualization for everyone" implies "Karush-Kuhn-Tucker conditions for everyone." We encourage the reader to work out the details, and we mention only one special case here: A feasible solution x̃ of the convex programming problem

    minimize f(x) subject to Ax = b

is optimal if and only if there exists a vector ỹ such that

    ∇f(x̃)^T + ỹ^T A = 0^T.

In this special case, the components of ỹ are called Lagrange multipliers, and they can be obtained from x̃ through Gaussian elimination (also see Section 7.2, where the method of Lagrange multipliers is briefly described in a more general setting). If, in addition, f is a quadratic function, then its gradient is linear, so the minimization problem itself (finding an optimal x̃ with a matching ỹ) can be solved through Gaussian elimination. For example, the problem of fitting a line by the method of least squares, mentioned in Section 2.4, is of this easy type, because its bivariate quadratic objective function (2.1) is convex.

Smallest enclosing ball as a convex program. In order to show that the smallest enclosing ball of a point set can be extracted from the solution of a suitable convex quadratic program, we use the Karush-Kuhn-Tucker conditions and the following geometric fact, which is interesting by itself.

8.7.3 Lemma. Let S = {s_1,...,s_k} ⊆ R^d be a set of points on the boundary of a ball B with center s ∈ R^d. Then the following two statements are equivalent:
(i) B is the unique smallest enclosing ball of S.
(ii) For every u ∈ R^d there is an index j ∈ {1, 2, ..., k} such that u^T(s_j − s) ≤ 0.

It is a simple exercise to show that (ii) can be re-expressed as follows: There is no hyperplane that strictly separates S from s. From the Farkas lemma (Section 6.4) one can in turn derive the following equivalent formulation: The point s is in the convex hull of the points of S. We thus have a simple geometric condition that characterizes smallest enclosing balls in terms of their boundary points.

From the geometric intuition in the plane, this is quite plausible: If s is in the convex hull of S, then s cannot be moved without making the distance to at least one point larger. But if s is separated from S by a hyperplane, then moving s toward this hyperplane results in an enclosing ball of smaller radius; the direction u of the movement satisfies u^T(s_j − s) > 0 for all j. [figure: two planar configurations of points s_1, s_2, s_3 on a circle with center s; in one, s lies in their convex hull, in the other a direction u points away from all of the s_j]

Proof. We start by analyzing the distance between a point s_j ∈ S and a potential ball center s′ ≠ s. Let r be the radius of the ball B. Given s′ ≠ s, we can uniquely write it in the form s′ = s + tu, where u is a vector of length 1 and t > 0. For j = 1, 2, ..., k we get

    (s_j − s′)^T (s_j − s′) = (s_j − s − tu)^T (s_j − s − tu)
                            = (s_j − s)^T (s_j − s) + t² u^T u − 2t u^T(s_j − s)
                            = r² + t² − 2t u^T(s_j − s).

Given α ∈ R and sufficiently small t > 0, we have t² − 2tα > 0 if and only if α ≤ 0. This shows that for sufficiently small t > 0,

    (s_j − s′)^T (s_j − s′) > r²  ⟺  u^T(s_j − s) ≤ 0, (8.14)

where the implication "⇐" holds for all t > 0. This equivalence implies the two directions of the lemma. For (i) ⇒ (ii) we argue as follows: Since s is the unique point with distance at most r from all points in S, we know that for every u with ‖u‖ = 1 and for all t > 0, the point s′ = s + tu has distance more than r to one of the points in S. By the implication "⇒" of (8.14), there is
some j with uT sj s 0 To show iii let us consider any point s of the form s tu s Since there is an index j with uT sj s 0 implication of 814 shows 9 Software and Further Reading LPsolvers The most famous and expensive software package for solv ing linear programs and integer programs is called CPLEX Freely available codes with similar functionality although not quite as strong as CPLEX are lp solve GLPK and CLP The website wwwneosmcsanlgovneos contains a guide to many other optimization software systems and it also provides an overview of web solvers to which one can send an input of an optimization problem and with a bit of luck be returned an optimum The computational geometry algorithms library CGAL wwwcgalorg contains software for solving linear and convex quadratic programs using exact rational arithmetic We refer to the website of this book httpwww infethzchpersonalgaertnerlpbook for further information Books The web bookstore Amazon offers more books with linear pro gramming in the title than with astrology and so it is clear that we can mention only a very narrow selection from the literature A reasonably recent accessible and quite comprehensive but not exactly cheap textbook of linear programming is D Bertsimas and J Tsitsiklis Introduction to Linear Optimization Athena Scientific Belmont Massachusetts 1997 Both linear and integer programming are treated on an excellent theoretical level in A Schrijver Theory of Linear and Integer Programming Wiley Interscience New York 1986 The book V Chvatal Linear Programming W H Freeman New York 1983 was considered one of the best textbooks in its time and it is still used widely And those liking classical sources may appreciate G B Dantzig Linear Programming and Extensions Princeton Uni versity Press Princeton 1963 Appendix Linear Algebra Here we summarize facts and concepts from linear algebra used in this book This part is not meant to be a textbook introduction to the subject It is mainly intended for the reader who has some knowledge of the area but may have forgotten the exact definitions or may know them in a slightly different form The number of introductory textbooks is vast in order to cite at least one we mention G Strang Introduction to Linear Algebra 3rd edition Wellesley Cambridge Press Wellesley Massachusetts 2003 Vectors We work exclusively with vectors in Rn and so for us a vector is an ordered ntuple v v1 v2 vn Rn of real numbers We denote vectors by boldface letters and vi denotes the ith component of v For u v Rn we define the sum componentwise u v u1 v1 u2 v2 un vn The multiplication of v Rn by a real number t is also given componentwise by tv tv1 tv2 tvn We use the notation 0 for the zero vector with all components 0 and 1 de notes a vector with all components equal to 1 A linear subspace or a vector subspace of Rn is a set V Rn that contains 0 and is closed under addition and multiplication by a real number that is if u v V and t R we have u v V and tv V For example the linear subspaces of R3 are 0 lines passing through 0 planes passing through 0 and R3 itself An affine subspace is any set of the form u V u v v V Rn where V is a linear subspace of Rn and u Rn Geometrically it is a linear subspace translated by some fixed vector The affine subspaces of R3 are all the onepoint subsets all the lines all the planes and R3 A linear combination of vectors v1 v2 vm Rn is any vector of the form t1v1 t2v2 tmvm where t1 tm are real numbers Vectors v1 v2 vm Rn are linearly independent if the only linear combina tion of v1 v2 vm 
equal to 0 has t1 t2 tm 0 Equivalently 196 Appendix Linear Algebra linear independence means that none of the vi can be expressed as a linear combination of the remaining ones The linear span of a set X Rn is the smallest with respect to in clusion linear subspace of Rn that contains X Explicitly it is the set of all linear combinations of finitely many vectors from X The linear span of any set even the empty one always contains 0 which is formally considered as a linear combination of the empty set of vectors A basis of a linear subspace V Rn is a linearly independent set of vectors from V whose linear span is V The standard basis of Rn consists of the vectors e1 e2 en where ei is the vector with 1 at the ith position and 0s elsewhere All bases of a given linear subspace V have the same number of vectors and this number dimV is the dimension of V In particular each basis of Rn has n vectors Matrices A matrix is a rectangular table of numbers real numbers in our case An mn matrix has m rows and n columns If a matrix is called A then its entry in the ith row and jth column is usually denoted by aij So for example a 3 4 matrix A has the general form a11 a12 a13 a14 a21 a22 a23 a24 a31 a32 a33 a34 A matrix is denoted by writing large parentheses to enclose the table of elements A submatrix of a matrix A is any matrix that can be obtained from A by deleting some rows and some columns including A itself where we delete nothing A matrix is multiplied by a number t by multiplying each entry by t Two mn matrices A and B are added by adding the corresponding entries That is if we set C A B we have cij aij bij for i 1 2 m and j 1 2 n Matrix multiplication is more complicated A product AB where A and B are matrices is defined only if the number of columns of A is the same as the number of rows of B If A is an mn matrix and B is an np matrix then the product C AB is an m p matrix given by cij ai1b1j ai2b2j ainbnj Pictorially n m n p row i column j A B C cij m p Appendix Linear Algebra 197 Matrix multiplication is associative meaning that ABC ABC when ever at least one of the two sides is defined and this is quite often used in proofs We also recall the wellknown fact that in general matrix multiplica tion is not commutative ie typically AB BA We also multiply matrices and vectors In such context a vector x Rn is usually considered as an n1 matrix thus in the matrix form a vector x x1 x2 xn should be written as a column x1 x2 xn So if A is an mn matrix and x Rn is a vector then the product Ax is a vector in Rm This is used in the matrix notation Ax b for a system of linear equations Here A is a given mn matrix b Rm is a given vector and x x1 x2 xn is a vector of n unknowns So Ax b is a shorthand for the system of m equations a11x1 a12x2 a1nxn b1 a21x1 a22x2 a2nxn b2 am1x1 am2x2 amnxn bm If A is an mn matrix then AT denotes the nm matrix having the element aji in the ith row and jth column The matrix AT is called the transpose of the matrix A For transposing the matrix product we have the formula ABT BT AT A square matrix is an nn matrix ie one with the same number of rows and columns A diagonal matrix is a square matrix D with dij 0 for all i j that is it may have nonzero elements only on the diagonal The nn identity matrix In has 1s on the diagonal and 0s elsewhere For any mn matrix A we have ImA AIn A Rank inverse and linear systems Each row of an mn matrix A can also be regarded as a vector in Rn The linear span of all rows of A is a subspace of Rn called the row space of A and similarly we define 
the column space of A as the linear subspace of Rm spanned by the columns of A An important result of linear algebra tells us that for every matrix A the row space and the column space have the same dimension and this dimension is called the rank of A In particular an mn matrix A has rank m if and only if the rows of A are linearly independent which can happen only if m n An nn matrix is called nonsingular if it has rank n otherwise it is singular 198 Appendix Linear Algebra Let A be a square matrix A matrix B is called an inverse of A if AB In An inverse to A exists if and only if A is nonsingular In this case it is determined uniquely it is denoted by A1 and it is inverse from both sides that is AA1 A1A In For the inverse of a product we have AB1 B1A1 Let us again consider a system Ax b of m linear equations with n unknowns For b 0 the set of all solutions is easily seen to be a linear subspace of Rn Its dimension equals n minus the rank of A In particular for m n ie A is a square matrix the system Ax 0 has x 0 as the only solution if and only if A is nonsingular For b 0 the system Ax b may or may not have a solution If it has at least one solution then the solution set is an affine subspace of Rn again of dimension n minus the rank of A If m n and A is nonsingular then Ax b always has exactly one solution Row operations and Gaussian elimination By an elementary row operation we mean one of the following two operations on a given matrix A a Multiplying all entries in some row of A by a nonzero real number t b Replacing the ith row of A by the sum of the ith row and jth row for some i j Gaussian elimination is a systematic procedure that given an input ma trix A performs a sequence of elementary row operations on it so that it is converted to a row echelon form This means that the resulting matrix looks as in the following picture 1 r m 1 n 0 the dots denote nonzero elements and the white region contains only 0s In words there exists an integer r such that the rows 1 through r are nonzero the remaining rows are all zero and if ji minj aij 0 then j1 j2 jr The rank of a matrix in this form is clearly r and since elementary row operations preserve rank this procedure can be used to compute the rank It is also commonly used for finding all solutions to the system Ax b In this case the matrix A b the matrix A with b appended as the last column is converted to row echelon form and from this the solution set can be computed easily A variant of Gaussian elimination can also be used to compute the inverse matrix essentially by solving the n linear systems Ax ei i 1 2 n 200 Appendix Linear Algebra The orthogonal complement of a set X Rn is the set X y Rn xT y 0 for all x X It is always a linear subspace of Rn If V is a linear subspace of Rn then dimV dimV n If V is a linear subspace of Rn the orthogonal projection on V is a mapping Rn V that assigns to each x Rn a vector y V such that xy is orthogonal to all vectors in V It can be shown that for every x such a y is unique and it is also the point of V that minimizes the Euclidean distance x y among all points of V Glossary or What Was Neglected The theory of linear programming is huge and many interesting things were not addressed in this book In the glossary we briefly explain some of the common terms found elsewhere This should help the reader in reading more advanced sources say research papers or in a conversation at a linear pro gramming banquet Our coverage is by no means complete and to some extent it is also guided by personal taste Bounds 
Glossary, or What Was Neglected

The theory of linear programming is huge, and many interesting things were not addressed in this book. In this glossary we briefly explain some of the common terms found elsewhere. This should help the reader in reading more advanced sources, say research papers, or in a conversation at a linear programming banquet. Our coverage is by no means complete, and to some extent it is also guided by personal taste.

Bounds for the variables. Linear programs occurring in practice often prescribe upper and lower bounds for some of the variables. For simplicity, let us assume that we are dealing with a linear program of the form

maximize c^T x subject to Ax ≤ b and 0 ≤ x ≤ u,

where u_j ∈ R for j = 1, 2, ..., n. We have seen in Section 4.1 how to convert this linear program into equational form, and after this conversion we can solve it using the simplex method. The problem here is that the conversion generates many new variables and constraints, making the problem substantially larger and the computations within a single pivot step more expensive.

A better way is to handle the constraints x ≤ u implicitly in the simplex method, and this is easy. During the selection of the leaving variable we also have to consider the possibility that the entering variable reaches its upper bound before any basic variable reaches one of its bounds. In this situation the entering variable is simply set to its upper bound, and the basis B does not change at all. In general, every nonbasic variable attains one of its bounds, and entering it means letting its value change in the direction of the other bound. As before, if x*_{ℓ_j} = 0 in the current basic feasible solution, then x_{ℓ_j} is a candidate for the entering variable if and only if the corresponding coefficient r_j is positive. On the other hand, if x*_{ℓ_j} = u_j, then x_{ℓ_j} is a candidate if and only if r_j < 0. This scheme easily generalizes to bounds of the form ℓ ≤ x ≤ u; a sketch of the resulting modified ratio test is shown below.
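The following minimal sketch (the notation is ours, not the book's) shows the bounded-variable ratio test for an entering variable currently at its lower bound 0: the step length is limited by the entering variable's own upper bound and by each basic variable hitting either of its bounds.

```python
import numpy as np

def bounded_ratio_test(p, q, u_basic, u_enter, eps=1e-12):
    """Maximum step t for an entering variable increasing from its
    lower bound 0, with the basic variables changing as x_B = p - t*q.

    A basic variable blocks at its lower bound 0 if q_i > 0, and at
    its upper bound u_basic[i] if q_i < 0. Returns (t, i), where i is
    the blocking basic row, or i = None if the entering variable hits
    its own upper bound u_enter first (then the basis stays the same).
    """
    t, blocking = u_enter, None
    for i, (pi, qi, ui) in enumerate(zip(p, q, u_basic)):
        if qi > eps:                 # x_B[i] decreases towards 0
            ratio = pi / qi
        elif qi < -eps:              # x_B[i] increases towards u_basic[i]
            ratio = (pi - ui) / qi
        else:
            continue
        if ratio < t:
            t, blocking = ratio, i
    return t, blocking

# Basic values p, pivot column q, upper bounds 10 (basics) and 4 (entering):
print(bounded_ratio_test(np.array([3.0, 8.0]),
                         np.array([1.0, -0.5]),
                         np.array([10.0, 10.0]), 4.0))   # (3.0, 0)
```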
Branch and bound is a general method for solving optimization problems by a clever enumeration of possible solutions. Here we describe how it works for integer programs (Chapter 3). We suppose that the integer program is

maximize c^T x subject to Ax ≤ b, x ∈ Z^n, (ILP)

and, for simplicity, that the polyhedron P = {x ∈ R^n : Ax ≤ b} is bounded. As a first step we solve the LP relaxation obtained by dropping the integrality constraint x ∈ Z^n. If the LP relaxation is infeasible, then (ILP) is infeasible as well and we stop. Otherwise, the LP relaxation has an optimal solution x*. If x* ∈ Z^n, we have already solved the program (ILP), and if x* ∉ Z^n, we choose some nonintegral component x*_j and split the integer program into two integer programs: the first one, (ILP'), is obtained from (ILP) by adding the constraint x_j ≤ ⌊x*_j⌋, while the second one, (ILP''), arises from (ILP) by adding the constraint x_j ≥ ⌊x*_j⌋ + 1. This is the branching step.

Since every feasible solution of (ILP) is feasible for exactly one of the programs (ILP') and (ILP''), we "only" have to solve the latter two integer programs to find an optimal solution of (ILP), if it exists. The "only" refers to the fact that both of these programs have strictly smaller feasible regions than the original one. To solve (ILP'), say, we proceed in the same fashion: solve the LP relaxation and split into two subproblems if necessary. In this way we explore an implicitly given binary tree whose nodes correspond to subproblems of the original integer program. The exploration stops at a node when the LP relaxation becomes infeasible or has an integral optimal solution. From the assumption that P is bounded it is not hard to show that this eventually happens along every exploration path. Therefore the process terminates, and it computes an optimal solution of the root program (ILP) if there is one.

So far this approach is missing the advertised cleverness, but here it comes. Whenever an integral solution of a subproblem has been discovered, its value of the objective function is a lower bound for the optimal value of (ILP). During the exploration we maintain the highest such lower bound z. If at some subsequent node the optimal solution of the LP relaxation has objective function value at most z, we can conclude that the subtree below that node need not be explored: no integral solution obtained from it can beat our current best solution with value z. This is the bounding step.

In the worst case the bounding step may not prune any subtree, but in many practical applications it results in enormous savings and allows for solving large integer programs. The effectiveness of branch and bound also depends on the choice of the nonintegral components x*_j at the nodes and on the order in which the nodes are explored. More generally, the axis-parallel split according to the constraints x_j ≤ ⌊x*_j⌋ and x_j ≥ ⌊x*_j⌋ + 1 may be replaced by a split along an arbitrary direction (branching on hyperplanes). A compact sketch of the basic scheme follows.
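Here is a compact sketch of the scheme, assuming (as in the entry above) that the polyhedron is bounded; it uses SciPy's linprog, which minimizes, so we negate c, and it branches via variable bounds. It is meant as an illustration, not an efficient solver.

```python
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, bounds, tol=1e-6):
    """Maximize c^T x subject to Ax <= b and x integral within the
    given variable bounds. Returns (best value, best x)."""
    best_val, best_x = -math.inf, None
    stack = [list(bounds)]                 # a node = a list of (lo, hi) bounds
    while stack:
        node = stack.pop()
        res = linprog(-np.asarray(c, dtype=float), A_ub=A, b_ub=b, bounds=node)
        if res.status != 0:
            continue                       # LP relaxation infeasible: prune
        if -res.fun <= best_val + tol:
            continue                       # bounding step: cannot beat z
        frac = [j for j, xj in enumerate(res.x)
                if abs(xj - round(xj)) > tol]
        if not frac:                       # integral solution: new incumbent
            best_val, best_x = -res.fun, np.round(res.x)
            continue
        j = frac[0]                        # branching step on x_j
        lo, hi = node[j]
        left, right = list(node), list(node)
        left[j] = (lo, math.floor(res.x[j]))       # x_j <= floor(x*_j)
        right[j] = (math.floor(res.x[j]) + 1, hi)  # x_j >= floor(x*_j) + 1
        stack += [left, right]
    return best_val, best_x

# maximize x1 + x2 s.t. 2x1 + x2 <= 4, x1 + 3x2 <= 6, 0 <= x <= 10:
print(branch_and_bound([1, 1], [[2, 1], [1, 3]], [4, 6], [(0, 10), (0, 10)]))
```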
Branch and cut. This method combines the branch and bound technique with cutting planes. In many cases branch and bound does not work well, since the plain LP relaxation does not provide a useful upper bound for the optimal solution of an integer program; we have seen a simple example of this in Section 3.4. Before branching, one therefore tries to get a better upper bound by adding cutting planes to the LP relaxation. Sometimes these are tailor-made for the problem at hand and cut off many more fractional solutions than general-purpose cutting planes like Gomory cuts. The branching starts only after all cutting planes from a predetermined set of candidate inequalities have been added. In the subproblems one proceeds similarly.

Chvátal rank. We consider a polyhedron P ⊆ R^n. If a^T x ≤ b, with a ∈ Z^n and b ∈ Q, is some inequality satisfied by all x ∈ P, then the inequality a^T x ≤ ⌊b⌋ is satisfied by all integral points in P. This latter inequality is a Chvátal–Gomory cut of P. Let us define P' as the set of points that satisfy all Chvátal–Gomory cuts of P. It follows that P' ⊇ P_I, where P_I is the convex hull of all integral points in P (the integer hull of P).

Let us now assume that P is a rational polyhedron, i.e., one that can be described by an inequality system Ax ≤ b with all components of A and b rational. In this case one can show that P' is again a polyhedron. Moreover, it is easy to see that P' ⊆ P. We thus have

P = P^(0) ⊇ P^(1) ⊇ P^(2) ⊇ ··· ⊇ P_I,

where P^(k+1) = (P^(k))' for k ≥ 0. It is known that there is a finite number t such that P^(t) = P_I; the smallest such t is the Chvátal rank of the rational polyhedron P. It is a measure of the nonintegrality of P. Such a number t even exists if P is nonrational but bounded.

Column generation. We have pointed out in Section 7.1 that even linear programs with a very large (possibly infinite) number of constraints can be solved efficiently by the ellipsoid method or by interior point methods if a separation oracle is available: given a candidate solution s, such an oracle either certifies that s is feasible or returns a violated constraint.

Let us consider the dual scenario: a linear program with a very large number of variables. Even if this linear program is too large to be explicitly stored, we may be able to solve it using the simplex method. The crucial observation is that in every pivot step, the simplex method needs just one entering variable, and the tableau column associated with it, in order to proceed (see Section 5.6). If we have an improvement oracle that returns such a variable, or certifies that the current basic feasible solution is optimal, we can still use the simplex method to solve the linear program. This method is called delayed column generation. It can be used in applications as in Section 2.7 (paper cutting), for example, where the number of variables can quickly become very large. In fact, since an improvement oracle can be interpreted as a separation oracle for the dual linear program, we can also use the ellipsoid method to solve the linear program, using only a polynomial number of calls to the improvement oracle.

Complementary slackness. The following corollary of the duality theorem is known as the theorem of complementary slackness. Let x* = (x*_1, ..., x*_n) be a feasible solution of the linear program

maximize c^T x subject to Ax ≤ b and x ≥ 0, (P)

and let y* = (y*_1, ..., y*_m) be a feasible solution of the dual linear program

minimize b^T y subject to A^T y ≥ c and y ≥ 0. (D)

Then the following two statements are equivalent:
(i) x* is optimal for (P) and y* is optimal for (D);
(ii) for all i = 1, 2, ..., m, x* satisfies the ith constraint of (P) with equality or y*_i = 0, and similarly, for all j = 1, 2, ..., n, y* satisfies the jth constraint of (D) with equality or x*_j = 0.

In words, statement (ii) means the following: if we pair up each primal (or dual) nonnegativity constraint with its corresponding dual (or primal) inequality, then at least one of the constraints in each pair is satisfied with equality (has no slack) at x* or y*, respectively.

Complementary slackness is often encountered as a "combinatorial" proof of optimality, as opposed to the "numerical" proof obtained by comparing the values of the objective functions. We have come across complementary slackness at various places without calling it so: in the physical interpretation of duality (Section 6.2), in connection with the primal-dual central path (Section 7.2), and in the Karush–Kuhn–Tucker conditions (Section 8.7). A mechanical check of statement (ii) is sketched below.
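As an illustration (our sketch, not the book's), the conditions in statement (ii) can be checked with NumPy: given feasible x* and y*, both are optimal exactly when every constraint pair has at least one member tight.

```python
import numpy as np

def complementary_slackness(A, b, c, x, y, tol=1e-9):
    """Check condition (ii) for max c^T x s.t. Ax <= b, x >= 0 and its dual.
    Returns True iff x and y are both optimal (assuming both are feasible)."""
    A, b, c, x, y = map(np.asarray, (A, b, c, x, y))
    primal_slack = b - A @ x        # slack of the i-th primal constraint
    dual_slack = A.T @ y - c        # slack of the j-th dual constraint
    return (np.all((primal_slack < tol) | (y < tol)) and
            np.all((dual_slack < tol) | (x < tol)))

# maximize x1 + x2 s.t. x1 <= 1, x2 <= 1; optimum x* = (1, 1), y* = (1, 1).
A, b, c = np.eye(2), np.ones(2), np.ones(2)
print(complementary_slackness(A, b, c, [1, 1], [1, 1]))  # True
print(complementary_slackness(A, b, c, [1, 0], [1, 1]))  # False: slack and y2 > 0
```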
Criss-cross method. This is yet another method for solving linear programs in equational form. Like the simplex method and the dual simplex method, it goes through a sequence of simplex tableaus

x_B = p − Q x_N,
z = z_0 + r^T x_N.

The criss-cross method only requires the set B to be a basis, meaning that the submatrix A_B is regular (nonsingular). This property allows us to form the tableau, but it guarantees neither p ≥ 0, as in the simplex method, nor r ≤ 0, as in the dual simplex method.

The criss-cross method has two types of pivot steps. A primal pivot step starts by choosing β ∈ {1, 2, ..., n−m} and α ∈ {1, 2, ..., m} such that r_β > 0 and q_{αβ} > 0. This is as in the simplex method, except that α does not have to satisfy a minimum-ratio condition as in equation (5.3) of Section 5.6; the absence of this condition is explained by the fact that the criss-cross method does not have to maintain feasibility (p ≥ 0). In a dual pivot step, α ∈ {1, 2, ..., m} and β ∈ {1, 2, ..., n−m} are chosen such that p_α < 0 and q_{αβ} < 0. This is as in the dual simplex method, again without any minimum-ratio condition, since the criss-cross method does not maintain dual feasibility (r ≤ 0) either. In both situations, B' = (B \ {k_α}) ∪ {ℓ_β} is the next basis, where q_{αβ} ≠ 0 guarantees that B' is indeed a basis (see the proof of Lemma 5.6.1).

If the linear program has an optimal solution, it can be shown that a basis either induces an optimal basic feasible solution (p ≥ 0 and r ≤ 0) or allows for a primal or a dual pivot step. This means that the criss-cross method cannot get stuck. It can cycle, though, even if the linear program is not degenerate. However, there is a pivot rule for the criss-cross method that does not cycle. This rule is reminiscent of Bland's rule for the simplex method and goes as follows: we let α* be the smallest value of α in any possible primal pivot step, and β* the smallest value of β in any possible dual pivot step. If α* ≤ β*, then we perform a primal pivot step with α = α* and β as small as possible, and otherwise we perform a dual pivot step with β = β* and α as small as possible.

Despite its simplicity, and despite the fact that the computations can start from any basis (there is no need for an auxiliary problem), the criss-cross method is not used in practice, since it is slow. The feature that makes it theoretically appealing is that the computations depend only on the signs of the involved numbers, but not on their magnitudes. This allows the method to be generalized to situations beyond linear programming where no concept of magnitude exists, such as linear programming over oriented matroids.

Cutting plane. Given a system S of linear inequalities, another linear inequality is called a cutting plane for S if it is satisfied by all integral solutions of S but violated by some nonintegral solution of S. A cutting plane for S exists if and only if the polyhedron corresponding to S is nonintegral. The cutting plane cuts off a part of this polyhedron that is free of integer points.

Dantzig–Wolfe decomposition. Often the constraint matrix A of a linear program has a special structure that can be exploited in order to solve the problem faster. The Dantzig–Wolfe decomposition is a particular technique for doing this within the framework of the simplex method, given a linear program maximize c^T x subject to Ax ≤ b and x ≥ 0.

Dual simplex method. The dual simplex method is not really a new tool for solving linear programs. However, it is useful in practice, since it works with the original linear program (as opposed to the dual linear program), and most simplex-based linear programming solvers offer an option of using it. For certain classes of problems it can be substantially faster than the primal simplex method. The dual simplex method is also useful after adding cutting planes, since the search for the new solution can start from the old one, which remains dual feasible.

Gomory cut. We consider a linear program maximize c^T x subject to Ax = b and x ≥ 0 in equational form, with c ∈ Z^n, along with an optimal basic feasible solution x*. A Gomory cut is a specific cutting plane for this linear program, derived from a feasible basis B associated with x*. Given B, we can rewrite the equations Ax = b and z = c^T x in tableau form,

x_B = p − Q x_N,
z = z_0 + r^T x_N

(see Section 5.5). It is easy to show that if for some i ∈ {1, 2, ..., m} the value p_i = x*_{k_i} of the ith basic variable is nonintegral, then the inequality

x_{k_i} ≤ ⌊p_i⌋ − ⌊q_i⌋^T x_N

is a cutting plane, where q_i is the ith row of Q and the floor function is applied componentwise. This cutting plane is called a Gomory cut. A special Gomory cut is obtained if z_0 ∉ Z, by applying the same reasoning to the objective row z = c^T x = z_0 + r^T x_N (since c ∈ Z^n, the value z is integral for every integral x).

We may now add the Gomory cuts as new inequalities to the linear program and recompute the optimal solution; let us call this a round. The remarkable property of Gomory cuts is that we get an integral optimal solution after a finite number of rounds, assuming rational data. Since we never cut off integral solutions of the original linear program, this final solution is an optimal solution of the integer program maximize c^T x subject to Ax = b, x ≥ 0, x ∈ Z^n. The method of Gomory cuts is a simple but inefficient algorithm for solving integer programs. A numerical sketch of deriving a single cut follows.
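A small sketch (illustrative names, assuming the tableau convention x_B = p − Q x_N used above) of extracting one Gomory cut from a tableau row:

```python
import math
import numpy as np

def gomory_cut(p_i, q_i):
    """From the tableau row x_i = p_i - q_i^T x_N with p_i nonintegral,
    return (rhs, fq) of the Gomory cut x_i <= rhs - fq^T x_N,
    where rhs = floor(p_i) and fq = floor(q_i) componentwise."""
    if abs(p_i - round(p_i)) < 1e-9:
        raise ValueError("row value is integral; no cut from this row")
    return math.floor(p_i), np.floor(np.asarray(q_i, dtype=float))

# Row x_3 = 2.5 - (0.75 x_1 - 1.25 x_2) yields the cut x_3 <= 2 + 2 x_2,
# which the fractional basic solution (x_N = 0, x_3 = 2.5) violates.
rhs, fq = gomory_cut(2.5, [0.75, -1.25])
print(rhs, fq)   # 2 [ 0. -2.]
```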
Phases I and II. Traditionally, the computation of the simplex method is subdivided into phase I and phase II. In phase I, the auxiliary linear program for finding an initial feasible basis is solved, and phase II solves the original linear program starting from this initial feasible basis (Section 5.6).

Phase transition. Even if phase I of the simplex method reveals that the original problem is feasible, it may happen that some of the auxiliary variables are still basic (with value 0) in the final feasible basis of the auxiliary problem. We have indicated in Section 5.6 that it is nevertheless easy to get a feasible basis for the original problem. In the simplex method, this part can elegantly be implemented by pivot steps of a special kind, in which the possible leftovers of the auxiliary variables x_{n+1} through x_{n+m} are forced to leave the basis. These pivot steps are said to form the phase transition.

Under our assumption that the matrix A has rank m, the phase transition is guaranteed to succeed. But even the case in which A has rank smaller than m can be handled during these special pivot steps, and this is an important aspect in practice, since it allows the simplex method to work with any input. If A does not have full rank, the consequence is that some of the auxiliary variables x_{n+1} through x_{n+m} cannot be forced to leave the basis. But any such variable can be shown to correspond to a redundant constraint, one that may be deleted from the linear program without changing the set of feasible solutions. Moreover, after deleting all the redundant constraints discovered in this way, the resulting subsystem A'x = b' has a matrix A' of full rank. A basis of the linear program with respect to this new system is obtained from the current basis by removing the auxiliary variables that could not be forced to leave.

Pivot column. The column of the simplex tableau corresponding to the entering variable is called the pivot column. Depending on the context, this may or may not include the coefficient of the vector r corresponding to the entering variable.

Pivot element. This is the element of the simplex tableau in the pivot row and the pivot column.

Pivot row. The row of the simplex tableau corresponding to the leaving variable is called the pivot row. Depending on the context, this may or may not include the coefficient of the vector p that holds the value of the leaving variable.

Pricing. The process of selecting the entering variable during a pivot step of the simplex method is sometimes referred to as pricing. We say that a nonbasic variable x_{ℓ_j} is priced when its coefficient r_j in the last row of the simplex tableau is computed (Section 5.6). Two classical pricing rules are sketched below.
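A sketch (our notation) of two textbook pricing rules applied to the reduced-cost vector r: the largest-coefficient rule and Bland's smallest-index rule. Both return the index of an entering variable with r_j > 0, or None when the current basis is optimal.

```python
import numpy as np

def price_dantzig(r, tol=1e-9):
    """Largest-coefficient rule: an entering variable with the largest r_j > 0."""
    j = int(np.argmax(r))
    return j if r[j] > tol else None   # None: r <= 0, the current basis is optimal

def price_bland(r, tol=1e-9):
    """Bland's rule: the smallest index j with r_j > 0 (prevents cycling)."""
    for j, rj in enumerate(r):
        if rj > tol:
            return j
    return None

r = np.array([-1.0, 0.5, 2.0])
print(price_dantzig(r), price_bland(r))  # 2 1
```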
Primal-dual method. This is a method for solving a linear program by iteratively improving a feasible solution of the dual linear program. Let us start with a linear program in equational form,

maximize c^T x subject to Ax = b and x ≥ 0, (P)

where we assume that b ≥ 0 and that (P) is bounded. The dual linear program is

minimize b^T y subject to A^T y ≥ c. (D)

Given a feasible solution y of (D), let J be the set of indices of the dual constraints that y satisfies with equality. The restricted primal program (RP) asks for a feasible solution x of (P) whose support is restricted to J; if such an x exists, then x and y are optimal by complementary slackness, and otherwise an optimal solution y* of the dual (RD) of (RP) provides a direction in which y can be improved.

That this iterative improvement terminates is not a priori clear: even if the dual objective function strictly decreases in every iteration, the improvement could become smaller and smaller; in fact, there are examples in which this happens and in which the sets J reappear infinitely often. But this can be overcome by choosing, for example, the lexicographically largest optimal solution y* of (RD).

There are two aspects that make the primal-dual method attractive. First of all, the restricted primal (RP) and its dual (RD) do not involve the original objective function vector c. This means that the primal-dual method reduces an optimization problem to a sequence of decision problems. These decision problems are in many cases much easier to solve and do not require linear programming techniques. For instance, if (P) is the linear program that computes a maximum-weight matching in a bipartite graph (Section 3.2), then (RP) computes a matching of maximum cardinality in the subgraph with edges indexed by J. Moreover, once (RP) has been solved, its solution in the next iteration is easy to obtain through an augmenting path. These insights form the basis of the Hungarian method for maximum-weight matchings, which can be interpreted as a sophisticated implementation of the primal-dual method.

A second aspect appears in connection with approximation algorithms for NP-hard problems. Starting from the LP relaxation (P) of a suitable integer program, we may search for a vector x that satisfies a condition weaker than (G2), but that in return allows us to construct a related integral feasible solution x̃ of (P). If we cannot find x, we know as before that the dual solution y can be improved. If we find x, then y may not yet be optimal, but we may still be able to argue that x̃ is a reasonably good solution of the integer program. A number of approximation algorithms based on the primal-dual method are described in:

M. X. Goemans and D. P. Williamson, "The primal-dual method for approximation algorithms and its application in network design problems," in Approximation Algorithms for NP-hard Problems (D. S. Hochbaum, editor), PWS Publishing Company, Boston, 1997, pages 144–191.

Ratio test. The process of selecting the leaving variable during a pivot step of the simplex method is called the ratio test. The leaving variable x_{k_α} is one that attains the minimum ratio p_α / q_{αβ} over the rows α with q_{αβ} > 0 (Section 5.6).

Reduced costs. The vector r in a simplex tableau; r_j is the reduced cost of the variable x_{ℓ_j} (Section 5.5).

Sensitivity analysis. The components of the matrix A and the vectors b and c that define a linear program are often results of measurements or estimates. Then an important question is how stable the optimal solution x* is. Ideally, if the components vary by small amounts, then x*, or at least c^T x*, varies by a small amount as well; in this case small errors in collecting the data can safely be ignored. But it may well be that small changes in some of the components have a more drastic impact on x*. Sensitivity analysis tries to assess how the optimal solution behaves under such changes of the input data.

We now have an exponentially large, but explicit, inequality description of the matching polytope of a general graph. Having such a description of an integral polyhedron is an indication that it might be possible to optimize a linear function over the polyhedron in polynomial time. In the case of matchings in general graphs this can indeed be done: a maximum-weight matching in a general graph can be found in polynomial time, but the known algorithms are much more involved than those for the bipartite case discussed in Section 3.2. A small computational illustration of the bipartite case follows.
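As a closing illustration (a sketch using SciPy, not from the book): for bipartite graphs the constraint matrix of the matching LP is totally unimodular, so the plain LP relaxation already has an integral optimal vertex, and a generic LP solver returns a maximum-weight matching.

```python
import numpy as np
from scipy.optimize import linprog

# Maximum-weight bipartite matching on K_{2,2} via its LP relaxation.
w = np.array([[4.0, 1.0],
              [2.0, 3.0]])                 # w[i, j] = weight of edge (u_i, v_j)
edges = [(i, j) for i in range(2) for j in range(2)]

# Degree constraints: every vertex is covered by at most one edge.
A = np.zeros((4, len(edges)))
for e, (i, j) in enumerate(edges):
    A[i, e] = 1                            # row of u_i
    A[2 + j, e] = 1                        # row of v_j
b = np.ones(4)
c = np.array([w[i, j] for (i, j) in edges])

res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, 1)] * len(edges))
print(res.x)     # [1. 0. 0. 1.]: the matching {(u_0, v_0), (u_1, v_1)}
print(-res.fun)  # 7.0 = 4 + 3
```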