Add tutorial Example: inventory management #795
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
@@           Coverage Diff           @@
##           master     #795   +/- ##
=======================================
  Coverage   93.54%   93.54%
=======================================
  Files          26       26
  Lines        3519     3519
=======================================
  Hits         3292     3292
  Misses        227      227
☔ View full report in Codecov by Sentry.
docs/src/tutorial/inventory.jl (Outdated)
    α^(t - 1) * (
        p * UD / 2 + (c - p) * y.in - c * x.in + (h + p) / (2 * UD) * y.in^2
    )
)
@bernardokp I don't really understand how this formulation matches the description.
- Where does the y.in^2 come from?
- Is the buy decision Decision-Hazard?
Can we do something more explicit like this?
model = SDDP.LinearPolicyGraph(
    stages = T + 1,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_unsatisfied_demand >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_buy >= 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out - x_inventory.in == u_buy - u_sell)
    @constraint(
        sp,
        x_unsatisfied_demand.out - x_unsatisfied_demand.in == w_demand - u_sell,
    )
    if t == 1
        @stageobjective(sp, 0)
    elseif t == T + 1
        @stageobjective(
            sp,
            α^(t - 1) * c * (x_unsatisfied_demand.in - x_inventory.in),
        )
    else
        @stageobjective(
            sp,
            α^(t - 1) * (
                c * u_buy +
                h * x_inventory.out +
                p * x_unsatisfied_demand.out
            ),
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end
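(The code blocks in this thread assume the tutorial's data are already in scope. Purely as hypothetical placeholders, not the tutorial's or the textbook's numbers, something like the following would make the snippet above runnable:)

using SDDP, JuMP, Ipopt
α = 0.995                          # discount factor (from the discussion below)
T = 10                             # number of stages (placeholder)
x0 = 0.0                           # initial inventory (placeholder)
c, h, p = 35.0, 1.0, 15.0          # purchase, holding, and shortage costs (placeholders)
UD = 800.0                         # demand is uniform on [0, UD] (placeholder upper bound)
Ω = range(0.0, UD; length = 20)    # sampled demand support for the SAA (placeholder)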
Hello Oscar,
thanks for adding this example.
1 - Looking at the cost expression, we have linear integrands; integrating them against the uniform demand distribution is what produces the y.in^2 term.
2 - Yes, it is decision-hazard.
What you wrote is correct: it captures what the loss function is. However, it adds another layer of approximation at the stage objective. The way I wrote it makes this cost explicit and deterministic, without the need to resort to sampling.
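(For completeness, here is the integration, under the tutorial's assumption that demand D is uniform on [0, UD] and that the order-up-to level y stays in [0, UD]; the notation matches the snippet at the top of this thread.)

c\,(y - x) + \mathbb{E}\bigl[h\,(y - D)^{+} + p\,(D - y)^{+}\bigr]
    = c\,(y - x) + \frac{h\,y^{2}}{2\,UD} + \frac{p\,(UD - y)^{2}}{2\,UD}
    = \frac{p\,UD}{2} + (c - p)\,y - c\,x + \frac{h + p}{2\,UD}\,y^{2},

which is exactly the p * UD / 2 + (c - p) * y.in - c * x.in + (h + p) / (2 * UD) * y.in^2 expression in the diff above.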
One thing that bothers me is the plots you added to the example. I obtained completely different ones. For the finite case, the policy is to order up to 741, and then some end-of-horizon effects take place. For the infinite horizon case, the policy is a horizontal line at the number 741.
Best,
Bernardo
But then the stage objective for stage t includes the expected cost of meeting demand in stage t+1? That isn't how one would typically formulate a problem. It's cheating because you're assuming a uniform distribution and ignoring the samples. If we simulated the policy out-of-sample (with a non-uniform distribution), it wouldn't work?
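(Concretely, for the sampled formulation proposed above, an out-of-sample test might look like the following sketch. It assumes SDDP.jl's OutOfSampleMonteCarlo sampling scheme in its use_insample_transition = true form, where the function returns only the noise terms for each node; the re-weighting toward high demand is made up purely for illustration.)

out_of_sample = SDDP.OutOfSampleMonteCarlo(
    model;
    use_insample_transition = true,
) do t
    # Only stages 2, ..., T call SDDP.parameterize in the model above.
    if 1 < t <= T
        return [SDDP.Noise(ω, ω / sum(Ω)) for ω in Ω]  # non-uniform weights that sum to 1
    end
    return SDDP.Noise{Float64}[]  # stages 1 and T + 1 carry no noise
end
oos_simulations = SDDP.simulate(model, 100; sampling_scheme = out_of_sample)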
I see your point. My claim is that it is not cheating because you know the distribution and you can integrate it in the present, so there is no need for samples for that. However, you cannot incorporate it explicitly into the future cost function because the functional form is too complicated. In that case, you must use samples.
In summary, you remove one layer of approximation by having an exact cost at each stage simply because you can integrate a function.
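(For contrast, this is what the exact form amounts to in the original file: the stage objective is the closed-form expectation itself, a quadratic, so no demand sampling is needed inside the stage problem, but a QP-capable solver such as Ipopt is. Here y and x are the states from the original snippet, the order-up-to level and the incoming inventory.)

@stageobjective(
    sp,
    α^(t - 1) * (
        p * UD / 2 + (c - p) * y.in - c * x.in + (h + p) / (2 * UD) * y.in^2
    ),
)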
Where did you get the discount factor of α = 0.995 from? That's an interest rate of 0.5%.
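(For reference, the arithmetic behind the 0.5% figure, treating α as a one-period discount factor α = 1/(1 + r):)

r = \frac{1}{\alpha} - 1 = \frac{1}{0.995} - 1 \approx 0.503\% \ \text{per period}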
Standard textbook parameter choices. Please take a look at Hillier and Lieberman, chapter 19.
@bernardokp I think you should write the model like this. It's far more explicit and readable. I still don't really get what the

Since we've constructed an SAA, we shouldn't expect to get the analytical solution of 741. That's only true in the limit as the number of samples goes to infinity.

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, α)
model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Gurobi.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    ## u_buy is a Decision-Hazard control variable. We decide u.out for use in
    ## the next stage
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    else
        @stageobjective(
            sp,
            c * u_buy.out + h * x_inventory.out + p * x_demand.out,
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end
SDDP.train(model; iteration_limit = 100, log_every_iteration = true)
simulations = SDDP.simulate(
    model,
    200,
    [:x_inventory, :u_buy];
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 50,
        terminate_on_dummy_leaf = false,
    ),
);
plt = SDDP.SpaghettiPlot(simulations)
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].in
end
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].out
end
SDDP.add_spaghetti(plt) do data
    return data[:u_buy].out
end
SDDP.plot(plt)
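(The SDDP.add_edge(graph, 2 => 1, α) line is what encodes the discounting here: the graph cycles back to node 1 with probability α instead of applying α^(t - 1) in the stage objectives. A rough post-training check, sketched under the assumption that the model and simulations objects above exist and that recorded state values expose .in/.out as in SDDP.jl's simulate output:)

println("Bound from training: ", SDDP.calculate_bound(model))
# Post-decision inventory position = stock on hand plus the order placed now;
# per the discussion above it should hover around the analytical 741.
positions = [
    stage[:x_inventory].out + stage[:u_buy].out
    for sim in simulations for stage in sim
]
println("Mean inventory position: ", sum(positions) / length(positions))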
Fair enough, 0.995 as a discount factor can be a bit extreme. Still, SDDP.jl handled it pretty well, and the final policy coincided with the theoretical optimal solution for that case. FYI, if α = 0.95, the theoretical optimal policy would be y = 662.
@bernardokp how about now? There is ambiguity in the infinite horizon policy: does the agent get to make an initial "buy" decision before observing the first realization of demand? Or do they need to meet that demand from their initial inventory?
Before, always before.
Cool, that's the same model then. The
Yes, it is the same model. I don't fully agree with what you said. There are two options: 1 - Express the present cost in exact form, using quadratic programming. That is what the L(y) function does, without any need to introduce new variables. I disagree with that part: in the exact formulation, there is no need to introduce additional variables. Those are the two possibilities.
Preview: https://sddp.dev/previews/PR795/tutorial/inventory/