
Add tutorial Example: inventory management #795

Merged
merged 15 commits into from
Oct 25, 2024

Conversation

@odow (Owner) commented Oct 21, 2024

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.54%. Comparing base (4091155) to head (29a8a8b).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #795   +/-   ##
=======================================
  Coverage   93.54%   93.54%           
=======================================
  Files          26       26           
  Lines        3519     3519           
=======================================
  Hits         3292     3292           
  Misses        227      227           


α^(t - 1) * (
p * UD / 2 + (c - p) * y.in - c * x.in + (h + p) / (2 * UD) * y.in^2
)
)
@odow (Owner, Author)
@bernardokp I don't really understand how this formulation matches the description.

  • Where does the y.in^2 come from?
  • Is the buy decision Decision-Hazard?

Can we do something more explicit like this?

model = SDDP.LinearPolicyGraph(
    stages = T + 1,
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Ipopt.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_unsatisfied_demand >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_buy >= 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out - x_inventory.in == u_buy - u_sell)
    @constraint(
        sp,
        x_unsatisfied_demand.out - x_unsatisfied_demand.in == w_demand - u_sell,
    )
    if t == 1
        @stageobjective(sp, 0)
    elseif t == T + 1
        @stageobjective(
            sp,
            α^(t - 1) * c * (x_unsatisfied_demand.in - x_inventory.in),
        )
    else
        @stageobjective(
            sp,
            α^(t - 1) * (
                c * u_buy +
                h * x_inventory.out +
                p * x_unsatisfied_demand.out
            )
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end

@bernardokp commented Oct 21, 2024

Hello Oscar,

Thanks for adding this example.

1 - Looking at the cost expression, we have linear integrands in $y$. If the density function is constant, integrating them gives rise to quadratic terms in $y$.
2 - Yes, it is decision-hazard.

What you wrote is correct, as it correctly captures what the loss function is. However, it adds another layer of approximation at the stage objective. The way I wrote it makes this cost explicit and deterministic, without the need to resort to sampling.
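To spell out that integration step (a sketch in my notation, assuming demand $D \sim \mathrm{Uniform}(0, UD)$ and order-up-to level $y$ from starting inventory $x$):

```latex
\mathbb{E}\bigl[h (y - D)^+\bigr] = \int_0^y \frac{h (y - d)}{UD}\,\mathrm{d}d = \frac{h y^2}{2\,UD},
\qquad
\mathbb{E}\bigl[p (D - y)^+\bigr] = \int_y^{UD} \frac{p (d - y)}{UD}\,\mathrm{d}d = \frac{p (UD - y)^2}{2\,UD}.
```

Adding the purchase cost $c (y - x)$ and expanding gives $p \cdot UD / 2 + (c - p) y - c x + \frac{h + p}{2\,UD} y^2$, which is exactly the expression in the quoted code.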

One thing that bothers me is the plots you added to the example. I obtained completely different ones. For the finite case, the policy is to order up to 741, and then some end-of-horizon effects take place. For the infinite horizon case, the policy is a horizontal line at the number 741.

Best,
Bernardo

@odow (Owner, Author)

But then the stage objective for stage t includes the expected cost of meeting demand in stage t+1? That isn't how one would typically formulate a problem. It's cheating because you're assuming a uniform distribution and ignoring the samples. If we simulated the policy out-of-sample (with a non-uniform distribution) it wouldn't work?

@bernardokp

I see your point. My claim is that it is not cheating because you know the distribution and you can integrate it in the present, so there is no need for samples for that. However, you cannot incorporate it explicitly into the future cost function because the functional form is too complicated. In that case, you must use samples.

In summary, you remove one layer of approximation by having an exact cost at each stage simply because you can integrate a function.

@odow (Owner, Author)

Where did you get the discount factor of α = 0.995 from? That's an interest rate of 0.5%.

@bernardokp

Standard, textbook parameter choices. Please take a look at Hillier and Lieberman, chapter 19.

@odow (Owner, Author) commented Oct 23, 2024

@bernardokp I think you should write the model like this. It's far more explicit and readable. I still don't really get what the x and y in your model are doing.

Since we've constructed an SAA, we shouldn't expect to get the analytical solution of 741. That's only true in the limit as the number of samples goes to infinity.

graph = SDDP.LinearGraph(2)
SDDP.add_edge(graph, 2 => 1, α)
model = SDDP.PolicyGraph(
    graph;
    sense = :Min,
    lower_bound = 0.0,
    optimizer = Gurobi.Optimizer,
) do sp, t
    @variable(sp, x_inventory >= 0, SDDP.State, initial_value = x0)
    @variable(sp, x_demand >= 0, SDDP.State, initial_value = 0)
    ## u_buy is a Decision-Hazard control variable. We decide u.out for use in
    ## the next stage
    @variable(sp, u_buy >= 0, SDDP.State, initial_value = 0)
    @variable(sp, u_sell >= 0)
    @variable(sp, w_demand == 0)
    @constraint(sp, x_inventory.out == x_inventory.in + u_buy.in - u_sell)
    @constraint(sp, x_demand.out == x_demand.in + w_demand - u_sell)
    if t == 1
        fix(u_sell, 0; force = true)
        @stageobjective(sp, c * u_buy.out)
    else
        @stageobjective(
            sp,
            c * u_buy.out + h * x_inventory.out + p * x_demand.out,
        )
        SDDP.parameterize(ω -> JuMP.fix(w_demand, ω), sp, Ω)
    end
    return
end

SDDP.train(model; iteration_limit = 100, log_every_iteration = true)
simulations = SDDP.simulate(
    model,
    200,
    [:x_inventory, :u_buy];
    sampling_scheme = SDDP.InSampleMonteCarlo(;
        max_depth = 50,
        terminate_on_dummy_leaf = false,
    ),
);

plt = SDDP.SpaghettiPlot(simulations)
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].in
end
SDDP.add_spaghetti(plt) do data
    return data[:x_inventory].out
end
SDDP.add_spaghetti(plt) do data
    return data[:u_buy].out
end
SDDP.plot(plt)

@odow (Owner, Author) commented Oct 24, 2024

Okay, now I have

[image: simulation plot]

We don't get the analytic solution because of the SAA.

@bernardokp

Fair enough, 0.995 as a discount factor can be a bit extreme. Still, SDDP.jl handled it pretty well, and the final policy coincided with the theoretical optimal solution for that case. FYI, if α = 0.95, the theoretical optimal policy would be y = 662.

@odow (Owner, Author) commented Oct 24, 2024

@bernardokp how about now?

There is ambiguity in the infinite horizon policy: does the agent get to make an initial "buy" decision before observing the first realization of demand? Or do they need to meet that demand from their initial inventory?

@odow merged commit b4c0a25 into master on Oct 25, 2024
8 checks passed
@odow deleted the od/inventory branch October 25, 2024 05:58
@bernardokp

Before, always before.

@odow (Owner, Author) commented Oct 25, 2024

Cool, that's the same model then.

The L(y) formulation isn't any nicer, because you need to introduce new variables to represent the positive and negative components of y.

@bernardokp

Yes, it is the same model. I don't fully agree with what you said, though. There are two options:

1 - Express the present cost in exact form, using quadratic programming. That is what the L(y) function does; in the exact formulation there is no need to introduce additional variables.
2 - Use samples to approximate the present cost function, add additional variables to represent excess and shortage, and then solve the problem using linear programming.
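As a sketch of option 2 (my notation, not code from the tutorial): for each sampled demand $d$, introduce $s^+ \ge 0$ for excess and $s^- \ge 0$ for shortage, and solve the linear program

```latex
\min_{u,\, s^+,\, s^-} \; c\,u + h\,s^+ + p\,s^-
\quad \text{s.t.} \quad
s^+ \ge y - d, \qquad s^- \ge d - y, \qquad s^+, s^- \ge 0.
```

Since $h, p > 0$, at optimality $s^+ = (y - d)^+$ and $s^- = (d - y)^+$, so averaging over samples approximates the exact quadratic cost.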
