queation 1,4,5,6
philxu1988
Probability and Dynamic Programming
Review for
Dynamic Pricing and Revenue Management
Probability Review • Poisson: http://en.wikipedia.org/wiki/Poisson_random_variable
– Read Sections: 1, 2, 3,4
• Compound Poisson: http://en.wikipedia.org/wiki/Compound_Poisson_distribution
• Poisson Process: http://en.wikipedia.org/wiki/Poisson_process • Compound Poisson Process:
http://en.wikipedia.org/wiki/Compound_Poisson_process
• Normal: http://en.wikipedia.org/wiki/Normal_random_variable – Read Sections 1,2.1,2.2,3,1,3.2,3.3
• Brownian Motion: http://en.wikipedia.org/wiki/Wiener_process – Read sections 1, 2.
DP AS AN OPTIMIZATION METHODOLOGY
• Basic optimization problem
min u∈U
g(u)
where u is the optimization/decision variable, g(u) is the cost function, and U is the constraint set
• Categories of problems: − Discrete (U is finite) or continuous − Linear (g is linear and U is polyhedral) or
nonlinear − Stochastic or deterministic: In stochastic prob-
lems the cost involves a stochastic parameter w, which is averaged, i.e., it has the form
g(u) = Ew { G(u, w)
} where w is a random parameter.
• DP can deal with complex stochastic problems where information about w becomes available in stages, and the decisions are also made in stages and make use of this information.
{ } ∑ { } ∑ ( )
INVENTORY CONTROL EXAMPLE
Inventory S y s t e m
Stock Ordered at Period k
Stock at Period k Stock at Period k + 1
Demand at Period k
xk
wk
xk + 1 = xk + uk - wk
u k Co s t o f P e rio d k
c uk + r (xk + uk - wk)
• Discrete-time system
xk+1 = fk(xk, uk, wk) = xk + uk − wk • Cost function that is additive over time
N−1 E gN (xN ) + gk(xk, uk, wk)
k=0
N−1 = E cuk + r(xk + uk − wk)
k=0
• Optimization over policies: Rules/functions uk = µk(xk) that map states to controls
BASIC STRUCTURE OF STOCHASTIC DP
• Discrete-time system
xk+1 = fk(xk, uk, wk), k = 0, 1, . . . , N − 1
− k: Discrete time − xk: State; summarizes past information that
is relevant for future optimization − uk: Control; decision to be selected at time
k from a given set − wk: Random parameter (also called distur-
bance or noise depending on the context) − N : Horizon or number of times control is
applied
• Cost function that is additive over time
E
{ gN (xN ) +
N−1∑ k=0
gk(xk, uk, wk)
}
BASIC PROBLEM
• System xk+1 = fk(xk, uk, wk), k = 0, . . . , N −1 • Control constraints uk ∈ U (xk) • Probability distribution Pk(· | xk, uk) of wk • Policies π = {µ0, . . . , µN−1}, where µk maps states xk into controls uk = µk(xk) and is such that µk(xk) ∈ Uk(xk) for all xk • Expected cost of π starting at x0 is
Jπ(x0) = E
{ gN (xN ) +
N−1∑ k=0
gk(xk, µk(xk), wk)
}
• Optimal cost function
J∗(x0) = min π
Jπ(x0)
• Optimal policy π∗ is one that satisfies
Jπ∗ (x0) = J∗(x0)
PRINCIPLE OF OPTIMALITY
• Let π∗ = {µ∗0, µ∗1, . . . , µ∗N−1} be an optimal pol- icy
• Consider the “tail subproblem” whereby we are at xi at time i and wish to minimize the “cost-to- go” from time i to time N
E
{ gN (xN ) +
N−1∑ k=i
gk ( xk, µk(xk), wk
)}
and the “tail policy” {µ∗i , µ∗i+1, . . . , µ∗N−1}
0 Ni
xi Tail Subproblem
• Principle of optimality: The tail policy is opti- mal for the tail subproblem
• DP first solves ALL tail subroblems of final stage
• At the generic step, it solves ALL tail subprob- lems of a given time length, using the solution of the tail subproblems of shorter time length
DP ALGORITHM
• Start with
JN (xN ) = gN (xN ),
and go backwards using
Jk(xk) = min uk∈Uk(xk)
E wk
{ gk(xk, uk, wk)
+ Jk+1 ( fk(xk, uk, wk)
)} , k = 0, 1, . . . , N − 1.
• Then J0(x0), generated at the last step, is equal to the optimal cost J∗(x0). Also, the policy
π∗ = {µ∗0, . . . , µ∗N−1} where µ∗k(xk) minimizes in the right side above for each xk and k, is optimal.
• Justification: Proof by induction that Jk(xk) is equal to J∗k (xk), defined as the optimal cost of the tail subproblem that starts at time k at state xk.
• Note that ALL the tail subproblems are solved in addition to the original problem, and the inten- sive computational requirements.
DETERMINISTIC FINITE-STATE PROBLEM
Terminal Arcs
. . .
. . .
. . .
Initial State s
t Artificial Terminal N o d e
with Cost Equal to Terminal Cost
S t a g e 0 S t a g e 1 S t a g e 2 . . . S t a g e N - 1 S t a g e N
• States <==> Nodes • Controls <==> Arcs • Control sequences (open-loop) <==> paths from initial state to terminal states
• ak : Cost of transition from state i ∈ Sk to state ij j ∈ Sk+1 at time k (view it as “length” of the arc) • aN : Terminal cost of state i ∈ SNit • Cost of control sequence <==> Cost of the cor responding path (view it as “length” of the path)
BACKWARD AND FORWARD DP ALGORITHMS
• DP algorithm: JN (i) = aNit , i ∈ SN ,
Jk(i) = min j∈Sk+1
[ akij +Jk+1(j)
] , i ∈ Sk, k = 0, . . . , N−1.
The optimal cost is J0(s) and is equal to the length of the shortest path from s to t.
• Observation: An optimal path s → t is also an optimal path t → s in a “reverse” shortest path problem where the direction of each arc is reversed and its length is left unchanged.
• Forward DP algorithm (= backward DP algo- rithm for the reverse problem):
J̃N (j) = a0sj , j ∈ S1, J̃k(j) = min
i∈SN−k
[ aN−kij + J̃k+1(i)
] , j ∈ SN−k+1
The optimal cost is J̃0(t) = mini∈SN [ aNit + J̃1(i)
] .
• View J̃k(j) as optimal cost-to-arrive to state j from initial state s.
A NOTE ON FORWARD DP ALGORITHMS
• There is no forward DP algorithm for stochastic problems.
• Mathematically, for stochastic problems, we cannot restrict ourselves to open-loop sequences, so the shortest path viewpoint fails.
• Conceptually, in the presence of uncertainty, the concept of “optimal-cost-to-arrive” at a state xk does not make sense. The reason is that it may be impossible to guarantee (with prob. 1) that any given state can be reached.
• By contrast, even in stochastic problems, the concept of “optimal cost-to-go” from any state xk makes clear sense.
GENERIC SHORTEST PATH PROBLEMS
• {1, 2, . . . , N, t}: nodes of a graph (t: the desti- nation)
• aij : cost of moving from node i to node j • Find a shortest (minimum cost) path from each node i to node t
• Assumption: All cycles have nonnegative length. Then an optimal path need not take more than N moves
• We formulate the problem as one where we re- quire exactly N moves but allow degenerate moves from a node i to itself with cost aii = 0.
Jk(i) = optimal cost of getting from i to t in N−k moves.
J0(i): Cost of the optimal path from i to t.
• DP algorithm: Jk(i) = min
j=1,...,N
[ aij +Jk+1(j)
] , k = 0, 1, . . . , N−2,
with JN−1(i) = ait, i = 1, 2, . . . , N.
EXAMPLE
2 7 5
2 5 5
6 1
3
0 . 5 3
1
2
4
0 1 2 3 4
1
2
3
4
5
State i
S t a g e k
3 3 3 3
4 4 4 5
4 . 5 4 . 5 5 . 5 7
2 2 2 2
Destination 5
(a) (b)
JN−1(i) = ait, i = 1, 2, . . . , N,
Jk(i) = min j=1,...,N
[ aij +Jk+1(j)
] , k = 0, 1, . . . , N−2.
- Probability and Dynamic ProgrammingReview
- Probability Review