
Add tutorial Example: inventory management #795

Merged
merged 15 commits into from
Oct 25, 2024

Conversation

@odow (Owner) commented Oct 21, 2024

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.54%. Comparing base (4091155) to head (29a8a8b).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #795   +/-   ##
=======================================
  Coverage   93.54%   93.54%           
=======================================
  Files          26       26           
  Lines        3519     3519           
=======================================
  Hits         3292     3292           
  Misses        227      227           


α^(t - 1) * (
p * UD / 2 + (c - p) * y.in - c * x.in + (h + p) / (2 * UD) * y.in^2
)
)
@odow (Owner, Author)
@bernardokp I don't really understand how this formulation matches the description.

  • Where does the y.in^2 come from?
  • Is the buy decision Decision-Hazard?

Can we do something more explicit like this?

model = SDDP.LinearPolicyGraph(
    stages = T + 1,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_unsatisfied_demand >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_buy >= 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out - x_inventory.in == u_buy - u_sell)
    @constraint(
        sp,
        x_unsatisfied_demand.out - x_unsatisfied_demand.in == w_demand - u_sell,
    )
    if t == 1
        @stageobjective(sp, 0)
    elseif t == T + 1
        @stageobjective(
            sp,
            α^(t - 1) * c * (x_unsatisfied_demand.in - x_inventory.in),
        )
    else
        @stageobjective(
            sp,
            α^(t - 1) * (
                c * u_buy +
                h * x_inventory.out +
                p * x_unsatisfied_demand.out
            )
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end

@bernardokp commented Oct 21, 2024

Hello Oscar,

Thanks for adding this example.

1 - Looking at the cost expression, we have linear integrands in $y$. If the density function is constant, integrating them gives rise to quadratic terms in $y$.
2 - Yes, it is decision-hazard.

What you wrote is correct, as it correctly captures what the loss function is. However, it adds another layer of approximation at the stage objective. The way I wrote it makes this cost explicit and deterministic, without the need to resort to sampling.
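To spell out that integration step (a sketch in my notation, assuming demand $D \sim \mathrm{Uniform}(0, UD)$ and order-up-to level $y$ from starting inventory $x$):

```latex
\mathbb{E}\bigl[h (y - D)^+\bigr] = \int_0^y \frac{h (y - d)}{UD}\,\mathrm{d}d = \frac{h y^2}{2\,UD},
\qquad
\mathbb{E}\bigl[p (D - y)^+\bigr] = \int_y^{UD} \frac{p (d - y)}{UD}\,\mathrm{d}d = \frac{p (UD - y)^2}{2\,UD}.
```

Adding the purchase cost $c (y - x)$ and expanding gives $p \cdot UD / 2 + (c - p) y - c x + \frac{h + p}{2\,UD} y^2$, which is exactly the expression in the quoted code.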

One thing that bothers me is the plots you added to the example. I obtained completely different ones. For the finite case, the policy is to order up to 741, and then some end-of-horizon effects take place. For the infinite horizon case, the policy is a horizontal line at the number 741.

Best,
Bernardo

@odow (Owner, Author)

But then the stage objective for stage t includes the expected cost of meeting demand in stage t+1? That isn't how one would typically formulate a problem. It's cheating because you're assuming a uniform distribution and ignoring the samples. If we simulated the policy out-of-sample (with a non-uniform distribution) it wouldn't work?

@bernardokp

I see your point. My claim is that it is not cheating because you know the distribution and you can integrate it in the present, so there is no need for samples for that. However, you cannot incorporate it explicitly into the future cost function because the functional form is too complicated. In that case, you must use samples.

In summary, you remove one layer of approximation by having an exact cost at each stage simply because you can integrate a function.

@odow (Owner, Author)

Where did you get the discount factor of α = 0.995 from? That's an interest rate of 0.5%.

@bernardokp

Standard, textbook parameter choices. Please take a look at Hillier and Lieberman, chapter 19.

@odow (Owner, Author) commented Oct 23, 2024

@bernardokp I think you should write the model like this. It's far more explicit and readable. I still don't really get what the x and y in your model are doing.

Since we've constructed an SAA, we shouldn't expect to get the analytical solution of 741. That's only true in the limit as the number of samples goes to infinity.

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, α)
model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Gurobi.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    ## u_buy is a Decision-Hazard control variable. We decide u.out for use in
    ## the next stage
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    else
        @stageobjective(
            sp,
            c * u_buy.out + h * x_inventory.out + p * x_demand.out,
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end

SDDP.train(model; iteration_limit = 100, log_every_iteration = true)
simulations = SDDP.simulate(
    model,
    200,
    [:x_inventory, :u_buy];
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 50,
        terminate_on_dummy_leaf = false,
    ),
);

plt = SDDP.SpaghettiPlot(simulations)
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].in
end
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].out
end
SDDP.add_spaghetti(plt) do data
    return data[:u_buy].out
end
SDDP.plot(plt)

@odow (Owner, Author) commented Oct 24, 2024

Okay, now I have

[image: simulation plot]

We don't get the analytic solution because of the SAA.

@bernardokp

Fair enough, 0.995 as a discount factor can be a bit extreme. Still, SDDP.jl handled it pretty well, and the final policy coincided with the theoretical optimal solution for that case. FYI, if α = 0.95, the theoretical optimal policy would be y = 662.

@odow (Owner, Author) commented Oct 24, 2024

@bernardokp how about now?

There is ambiguity in the infinite horizon policy: does the agent get to make an initial "buy" decision before observing the first realization of demand? Or do they need to meet that demand from their initial inventory?

@odow merged commit b4c0a25 into master on Oct 25, 2024
8 checks passed
@odow deleted the od/inventory branch October 25, 2024 05:58
@bernardokp

Before, always before.

@odow (Owner, Author) commented Oct 25, 2024

Cool, that's the same model then.

The L(y) formulation isn't any nicer, because you need to introduce new variables to represent the positive and negative components of y.

@bernardokp

Yes, it is the same model. I don't fully agree with what you said, though. There are two options:

1 - Express the present cost in exact form, using quadratic programming. That is what the L(y) function does; in the exact formulation there is no need to introduce additional variables.
2 - Use samples to approximate the present cost function, add additional variables to represent excess and shortage, and then solve the problem using linear programming.
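As a sketch of option 2 (my notation, not code from the tutorial): for each sampled demand $d$, introduce $s^+ \ge 0$ for excess and $s^- \ge 0$ for shortage, and solve the linear program

```latex
\min_{u,\, s^+,\, s^-} \; c\,u + h\,s^+ + p\,s^-
\quad \text{s.t.} \quad
s^+ \ge y - d, \qquad s^- \ge d - y, \qquad s^+, s^- \ge 0.
```

Since $h, p > 0$, at optimality $s^+ = (y - d)^+$ and $s^- = (d - y)^+$, so averaging over samples approximates the exact quadratic cost.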
