
Offline/Online LP for CIM scenario 📚 #359

Open · riccardopoiani opened this issue Jun 14, 2021 · 28 comments

Labels: 📚 doc — Improvements or additions to documentation.

@riccardopoiani
Description

Add Offline and Online Linear Programming agents for the CIM scenario.

@riccardopoiani added the 📚 doc (Improvements or additions to documentation) label on Jun 14, 2021.
@Jinyu-W (Collaborator) commented Jun 23, 2021:

Hi @riccardopoiani, thank you for your attention to our work. Currently, we have no plan to port all of the methods used in the paper to the MARO platform. If you have questions about the implementation, we can have a discussion. Thanks again!

@Ahmedest61 commented:

Hi @Jinyu-W, what about the methods proposed in these papers? Does the current MARO contain implementations of them? In the "Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations" paper, it is written that the source code is attached in a 'code' folder, but I couldn't find it. Can you help me, please? Thank you very much :)

@Jinyu-W (Collaborator) commented Jun 25, 2021:

> Hi @Jinyu-W, what about the methods proposed in these papers? Does the current MARO contain implementations of them? In the "Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations" paper, it is written that the source code is attached in a 'code' folder, but I couldn't find it. Can you help me, please? Thank you very much :)

@wesley-stone could you provide more details about the methods in this paper?

@Jinyu-W closed this as completed on Jun 25, 2021, then reopened it the same day.
@Ahmedest61 commented:

Hi @Jinyu-W, thank you for the response. I would be grateful to @wesley-stone for sharing code and related info. Many thanks again! :)

@riccardopoiani (Author) commented Jun 25, 2021:

Hi @Jinyu-W, thanks for your response and availability.
I am trying to replicate the results for Offline and Online LP in different configurations.
Currently, I am able to achieve 0% and 1% shortage in toy domains with Offline and Online LP, respectively.
However, when I run the program on the global trade configurations, I obtain 4% with the Offline one and 33% with the Online one (which are significantly higher than the values in the papers).

So, first of all, I have seen that there is an open pull request to adjust the topology for CIM, and I was wondering if this is the cause. However, no issue describes the change made there. So I would start from there, and then maybe we can build a discussion on how to implement the method correctly.

Also, have you got any method that is able to reach high performance with the current version of the simulator? (Any sort of method.)

@Jinyu-W (Collaborator) commented Jun 25, 2021:

> Hi @Jinyu-W, thanks for your response and availability.
> I am trying to replicate the results for Offline and Online LP in different configurations.
> Currently, I am able to achieve 0% and 1% shortage in toy domains with Offline and Online LP, respectively.
> However, when I run the program on the global trade configurations, I obtain 4% with the Offline one and 33% with the Online one (which are significantly higher than the values in the papers).
>
> So, first of all, I have seen that there is an open pull request to adjust the topology for CIM, and I was wondering if this is the cause. However, no issue describes the change made there. So I would start from there, and then maybe we can build a discussion on how to implement the method correctly.
>
> Also, have you got any method that is able to reach high performance with the current version of the simulator? (Any sort of method.)

Hi @riccardopoiani, I could share more information about the LP methods we used before.

  • The Online LP is formulated based on the demand (orders) predicted with a simple MA (moving-average) model.
  • Besides the reward for order fulfillment, we also add a reward for safety inventory to the objective function. For example, for each tick, if we have empty containers remaining after fulfilling all the orders of this tick, an additional reward is given: `safety_inventory_reward_factor * min(port_capacity * safety_inventory_upper_ratio, #remaining_empty_containers)` (see the sketch below).
  • The main difference between the offline LP and the online LP is that the actual demand (orders) is given instead of the prediction.
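
For concreteness, here is a minimal sketch of how that per-tick reward term could be computed; all names and default values here are illustrative assumptions, not MARO API:

```python
# Illustrative only -- names and default values are assumptions, not MARO API.
def tick_reward(fulfilled, remaining_empty, port_capacity,
                fulfillment_reward_factor=1.0,
                safety_inventory_reward_factor=0.1,
                safety_inventory_upper_ratio=0.2):
    # Base reward for the orders fulfilled at this tick.
    reward = fulfillment_reward_factor * fulfilled
    # Safety-inventory bonus: reward keeping spare empty containers,
    # capped at a fraction of the port capacity.
    reward += safety_inventory_reward_factor * min(
        port_capacity * safety_inventory_upper_ratio, remaining_empty
    )
    return reward
```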

We may open-source a version of the Online LP in the future, but it will not happen immediately; it will take some time.

@Jinyu-W (Collaborator) commented Jun 25, 2021:

As for more information about the paper, the author @wesley-stone will have more to say. :)

@riccardopoiani (Author) commented Jun 25, 2021:

@Jinyu-W, thanks for your feedback. I will certainly try to integrate the safety reward factor.
As for the Online LP, are the methods in paper 1 and paper 2 different? Indeed, I have some conflicting information about the Operations Research baselines.
In paper 1 they state that they use "the exact future order information to replace the forecasted future demand in the LP model so as to eliminate the effects of external factors leading to uncertain forecasts" (Section 5.3 - Online LP).
Indeed, when I talked to the first author, he said that the main difference between Online and Offline was that the plan (in the offline one) was not fed to the simulator, and so "illegal" actions were not corrected to make them feasible.

Moreover, it seems to me that the crucial point for good performance (when feeding the plan to the simulator) is that you need estimates that are as accurate as possible (otherwise the plan will soon diverge from reality).
The crucial point in this step is that you want to estimate the future laden containers as accurately as possible (from the laden you can recover estimates of the supplies and the vessel capacity for empty containers). How do you do that? Is there anything specific I should pay attention to?

@riccardopoiani (Author) commented Jun 25, 2021:

Also, @Ahmedest61, I have seen something similar to what you are looking for in the temp_gnn branch, under the folder examples/cim/ac_gnn, but I have not tried it. (I do not know if that is the code they used to obtain the results claimed in the paper.)

@Ahmedest61 commented:

Ah, thank you so much @riccardopoiani. I'll have a look. Just wondering, is there any difference in the 'gnn-ac' (or any other methodology) used in the 'temp_gnn' and 'maro-0.2_cim_topology_adjustment' branches? If not, I assume 'maro-0.2_cim_topology_adjustment' is the right, more stable one? Thank you!

@riccardopoiani (Author) commented Jun 25, 2021:

No idea on this, @Ahmedest61. I am currently working on master. Please let me know if the code I pointed you to is what you were looking for and if you are able to achieve good performance (at least ~90% container satisfaction on global configs).

Edit: I had a quick look at it, and the pre-training phase seems to be missing, so I don't know.

@Ahmedest61 commented:

Yup, I think the "temp_gnn" branch is similar to what paper 2 proposed, but again, you're right about the PreLAC part!

@Jinyu-W (Collaborator) commented Jun 28, 2021:

> @Jinyu-W, thanks for your feedback. I will certainly try to integrate the safety reward factor.
> As for the Online LP, are the methods in paper 1 and paper 2 different? Indeed, I have some conflicting information about the Operations Research baselines.
> In paper 1 they state that they use "the exact future order information to replace the forecasted future demand in the LP model so as to eliminate the effects of external factors leading to uncertain forecasts" (Section 5.3 - Online LP).
> Indeed, when I talked to the first author, he said that the main difference between Online and Offline was that the plan (in the offline one) was not fed to the simulator, and so "illegal" actions were not corrected to make them feasible.
>
> Moreover, it seems to me that the crucial point for good performance (when feeding the plan to the simulator) is that you need estimates that are as accurate as possible (otherwise the plan will soon diverge from reality).
> The crucial point in this step is that you want to estimate the future laden containers as accurately as possible (from the laden you can recover estimates of the supplies and the vessel capacity for empty containers). How do you do that? Is there anything specific I should pay attention to?

Hi, as far as I know, the LP methods used in paper 1 and paper 2 are not the same. The one used in paper 2 is closer to the formulation I have done before, so I can say more about it. As you mentioned above, the accuracy of the prediction can affect the final performance a lot. To avoid the prediction error being magnified over time, I added two "time windows" to the LP formulation (see the sketch after this list):

  1. One for the problem formulation/solution. Assume we use a time window of 200 ticks: no matter how long the simulation is, each time we only look 200 ticks ahead (for the prediction as well), and we get a solution for only those 200 future ticks.
  2. One for applying the solution. The apply window size is <= the formulation window size. Assume we use an apply window of 20 ticks. Then the formulate-and-apply trace would be: formulate and solve at tick 0, apply the solution...; formulate and solve at tick (0 + 20), apply the solution...; formulate and solve at tick (0 + 20 + 20), .... Of course, as you may know, the simulation is currently triggered by events. If you trigger the problem formulation by an event, the actual formulation ticks may differ from the simple example above, e.g.: **tick 7** - event arrives, **formulate**, apply...; tick 14 - event arrives, apply...; tick 21 - event arrives, apply...; **tick 28** - event arrives, **formulate**, apply...
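
A minimal sketch of how the two windows could interact under event-triggered formulation; `predict_demand` and `solve_lp` are hypothetical placeholders, not MARO functions:

```python
# Illustrative sketch; `predict_demand` and `solve_lp` are hypothetical.
FORMULATION_WINDOW = 200  # how many ticks ahead each LP looks
APPLY_WINDOW = 20         # how long a solution is used before re-solving

last_formulation_tick = None
plan = None

def on_decision_event(tick, env_state):
    """Called by the simulator whenever a decision event arrives."""
    global last_formulation_tick, plan
    # Re-formulate once the current plan is "stale"; because formulation is
    # event-triggered, the actual formulation ticks drift (7, 28, ... above).
    if last_formulation_tick is None or tick - last_formulation_tick >= APPLY_WINDOW:
        forecast = predict_demand(env_state, horizon=FORMULATION_WINDOW)
        plan = solve_lp(env_state, forecast, horizon=FORMULATION_WINDOW)
        last_formulation_tick = tick
    return plan[tick]  # apply only the part of the plan for this tick
```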

I hope these hints help you.

@riccardopoiani (Author) commented Jun 28, 2021:

> Hi, as far as I know, the LP methods used in paper 1 and paper 2 are not the same. The one used in paper 2 is closer to the formulation I have done before, so I can say more about it. As you mentioned above, the accuracy of the prediction can affect the final performance a lot. To avoid the prediction error being magnified over time, I added two "time windows" to the LP formulation:
>
> 1. One for the problem formulation/solution. Assume we use a time window of 200 ticks: no matter how long the simulation is, each time we only look 200 ticks ahead (for the prediction as well), and we get a solution for only those 200 future ticks.
>
> 2. One for applying the solution. The apply window size is <= the formulation window size. Assume we use an apply window of 20 ticks. Then the formulate-and-apply trace would be: formulate and solve at tick 0, apply the solution...; formulate and solve at tick (0 + 20), apply the solution...; formulate and solve at tick (0 + 20 + 20), .... Of course, as you may know, the simulation is currently triggered by events. If you trigger the problem formulation by an event, the actual formulation ticks may differ from the simple example above, e.g.: **tick 7** - event arrives, **formulate**, apply...; tick 14 - event arrives, apply...; tick 21 - event arrives, apply...; **tick 28** - event arrives, **formulate**, apply...
>
> I hope these hints help you.

Thanks for the hints, but unfortunately I am already doing both things. I have tried many combinations of these hyperparameters, but they do not seem to solve my issue. It seems to be something more related to the estimation of the laden containers.
Currently, to estimate laden, I first compute the exact future demands in advance. Then, using CimDataCollection, I pre-simulate the ordered vessel arrivals. When there is a vessel arrival, I do the following (see the sketch after the list):

  1. Take the estimated laden that `vessel` is carrying for `port_idx` and discharge it (this leads to a supply for `port_idx` at `arrive_tick`)
  2. Iterate over the next future arrivals of `vessel` and load laden according to the cumulated unsatisfied demand that `port_idx` has for each of the vessel's destinations
  3. Compute the remaining capacity to estimate the capacity for empty containers
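
In code, my loop looks roughly like this (the data structures and the `next_destinations` helper are my own illustrative names, not part of MARO):

```python
# Rough sketch of my estimation loop; the data structures and
# `next_destinations` are my own, not part of MARO.
def estimate_laden(arrivals, laden_on_vessel, unsatisfied_demand,
                   vessel_capacity, supply, empty_capacity):
    # `arrivals`: time-ordered (arrive_tick, vessel, port_idx) tuples,
    # pre-simulated from CimDataCollection.
    for arrive_tick, vessel, port_idx in arrivals:
        # 1. Discharge the laden destined for this port -> future supply.
        discharged = laden_on_vessel[vessel].pop(port_idx, 0)
        supply[port_idx][arrive_tick] += discharged
        # 2. Load laden toward the vessel's next destinations, following the
        #    cumulated unsatisfied demand of this port for each destination.
        for dest in next_destinations(arrivals, vessel, arrive_tick):
            free = vessel_capacity[vessel] - sum(laden_on_vessel[vessel].values())
            load = min(unsatisfied_demand[port_idx].get(dest, 0), free)
            laden_on_vessel[vessel][dest] = laden_on_vessel[vessel].get(dest, 0) + load
            unsatisfied_demand[port_idx][dest] = unsatisfied_demand[port_idx].get(dest, 0) - load
        # 3. Remaining space = estimated capacity for empty containers.
        empty_capacity[vessel][arrive_tick] = (
            vessel_capacity[vessel] - sum(laden_on_vessel[vessel].values())
        )
```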

Am I missing something relevant here? Is this the way in which you estimate laden (and thus the supply and the vessel capacity for empty containers)?

Moreover, what are the most relevant changes between paper 1 and paper 2 that you are aware of?

@riccardopoiani (Author) commented:

> Hi @riccardopoiani, I could share more information about the LP methods we used before.
>
> * The Online LP is formulated based on the demand (orders) predicted with a simple MA (moving-average) model.
>
> * Besides the reward for order fulfillment, we also add a reward for safety inventory to the objective function. For example, for each tick, if we have empty containers remaining after fulfilling all the orders of this tick, an additional reward is given: `safety_inventory_reward_factor * min(port_capacity * safety_inventory_upper_ratio, #remaining_empty_containers)`.
>
> * The main difference between the offline LP and the online LP is that the actual demand (orders) is given instead of the prediction.
>
> We may open-source a version of the Online LP in the future, but it will not happen immediately; it will take some time.

I also noticed here that you mention port_capacity. However, I was not able to find any usage of this attribute in the simulator. Is this a bug, or did I simply miss it? It does not seem like the port capacity is checked when actions are taken: searching the files for .capacity does not report any usage of that attribute.

@Jinyu-W (Collaborator) commented Jun 29, 2021:

> > Hi @riccardopoiani, I could share more information about the LP methods we used before.
> >
> > * The Online LP is formulated based on the demand (orders) predicted with a simple MA (moving-average) model.
> >
> > * Besides the reward for order fulfillment, we also add a reward for safety inventory to the objective function. For example, for each tick, if we have empty containers remaining after fulfilling all the orders of this tick, an additional reward is given: `safety_inventory_reward_factor * min(port_capacity * safety_inventory_upper_ratio, #remaining_empty_containers)`.
> >
> > * The main difference between the offline LP and the online LP is that the actual demand (orders) is given instead of the prediction.
> >
> > We may open-source a version of the Online LP in the future, but it will not happen immediately; it will take some time.
>
> I also noticed here that you mention port_capacity. However, I was not able to find any usage of this attribute in the simulator. Is this a bug, or did I simply miss it? It does not seem like the port capacity is checked when actions are taken: searching the files for .capacity does not report any usage of that attribute.

In our experience, the port capacity is not the limiting factor for the valid action scope, and we also set a big enough capacity for the ports in those topologies. So the absence of the port-capacity check shouldn't make any difference. But I agree that it would be better to add it back.

@wesley-stone (Contributor) commented:

> Yup, I think the "temp_gnn" branch is similar to what paper 2 proposed, but again, you're right about the PreLAC part!

Yes, you are right :)

@riccardopoiani (Author) commented:

> > Hi, as far as I know, the LP methods used in paper 1 and paper 2 are not the same. The one used in paper 2 is closer to the formulation I have done before, so I can say more about it. As you mentioned above, the accuracy of the prediction can affect the final performance a lot. To avoid the prediction error being magnified over time, I added two "time windows" to the LP formulation:
> >
> > 1. One for the problem formulation/solution. Assume we use a time window of 200 ticks: no matter how long the simulation is, each time we only look 200 ticks ahead (for the prediction as well), and we get a solution for only those 200 future ticks.
> >
> > 2. One for applying the solution. The apply window size is <= the formulation window size. Assume we use an apply window of 20 ticks. Then the formulate-and-apply trace would be: formulate and solve at tick 0, apply the solution...; formulate and solve at tick (0 + 20), apply the solution...; formulate and solve at tick (0 + 20 + 20), .... Of course, as you may know, the simulation is currently triggered by events. If you trigger the problem formulation by an event, the actual formulation ticks may differ from the simple example above, e.g.: **tick 7** - event arrives, **formulate**, apply...; tick 14 - event arrives, apply...; tick 21 - event arrives, apply...; **tick 28** - event arrives, **formulate**, apply...
> >
> > I hope these hints help you.
>
> Thanks for the hints, but unfortunately I am already doing both things. I have tried many combinations of these hyperparameters, but they do not seem to solve my issue. It seems to be something more related to the estimation of the laden containers.
> Currently, to estimate laden, I first compute the exact future demands in advance. Then, using CimDataCollection, I pre-simulate the ordered vessel arrivals. When there is a vessel arrival, I do the following:
>
> 1. Take the estimated laden that `vessel` is carrying for `port_idx` and discharge it (this leads to a supply for `port_idx` at `arrive_tick`)
>
> 2. Iterate over the next future arrivals of `vessel` and load laden according to the cumulated unsatisfied demand that `port_idx` has for each of the vessel's destinations
>
> 3. Compute the remaining capacity to estimate the capacity for empty containers
>
> Am I missing something relevant here? Is this the way in which you estimate laden (and thus the supply and the vessel capacity for empty containers)?
>
> Moreover, what are the most relevant changes between paper 1 and paper 2 that you are aware of?

Any answer for this?

@riccardopoiani (Author) commented:

Hi @wesley-stone, could any of you provide the environment configuration that you used in paper 2?
With master/global_trade_cfg I get 68% fulfillment, while with these configs I am able to reach 98-100% fulfillment (same LP method). Thus, I was wondering what cfg you used in the paper so that I can actually compare results.

@riccardopoiani (Author) commented:

Hi @wesley-stone, @Jinyu-W, any update on the previous post?

It seems quite relevant to me to provide good configurations within the environment, also because in these configs no-repositioning performs very well, while with the master branch configurations it seems to me that the optimal solution is very far from 100%.

@Jinyu-W (Collaborator) commented Jul 29, 2021:

> Thanks for the hints, but unfortunately I am already doing both things. I have tried many combinations of these hyperparameters, but they do not seem to solve my issue. It seems to be something more related to the estimation of the laden containers.
> Currently, to estimate laden, I first compute the exact future demands in advance. Then, using CimDataCollection, I pre-simulate the ordered vessel arrivals. When there is a vessel arrival, I do the following:
>
> 1. Take the estimated laden that `vessel` is carrying for `port_idx` and discharge it (this leads to a supply for `port_idx` at `arrive_tick`)
>
> 2. Iterate over the next future arrivals of `vessel` and load laden according to the cumulated unsatisfied demand that `port_idx` has for each of the vessel's destinations
>
> 3. Compute the remaining capacity to estimate the capacity for empty containers
>
> Am I missing something relevant here? Is this the way in which you estimate laden (and thus the supply and the vessel capacity for empty containers)?
> Moreover, what are the most relevant changes between paper 1 and paper 2 that you are aware of?
>
> Any answer for this?

@riccardopoiani Sorry for the late reply. Originally I wanted to reproduce the online LP for CIM and share it with you, but I was busy with other work :( As for the points in your problem formulation:

  • For the vessel arrivals, I think it is OK to read them from the Data Collection.
  • For the demand of each port, we use the predicted value in the Online solution. And for the containers consumed by the historical demand, we add them into the future supply of the specific ports accordingly.
  • For the future supply of each port, we also use the predicted value.
  • Also, as you already formulated, both the demand and the supply are used to calculate the future inventory (and remaining space) for both ports and vessels (see the sketch below).
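
As a toy illustration of the last two points (purely illustrative, not the code we used):

```python
# Purely illustrative, not the code we used.
def ma_forecast(history, window=7):
    """Predict the next-tick demand as the mean of the last `window` ticks."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def roll_forward(inventory, predicted_supply, predicted_demand):
    """Tick-by-tick future inventory of empty containers at one port."""
    trajectory = []
    for supply, demand in zip(predicted_supply, predicted_demand):
        inventory = max(0, inventory + supply - demand)  # shortage floors at 0
        trajectory.append(inventory)
    return trajectory
```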

You can refer to this semi-finished example (for reference only) to get more details. But currently, limited by the predicted-value input, it still cannot reach the optimal solution.

As for the changes between paper 1 and paper 2, sorry, but I have no idea about them. As for the details of paper 2, sorry, but I'm not the author and I don't know the details either.

@riccardopoiani (Author) commented:

Thank you very much @Jinyu-W.

I just ran a few tests on the environment, and they somewhat confirmed what I was thinking.

The summary is the following: I believe that the current global trade configurations are "too hard" to achieve reasonable performance. Currently, none of the methods that I have implemented and tested can get beyond 70% order fulfillment (even in the simplest configuration, that is, global_trade.22p_l0.0).
Instead, what happens is that just a slight decrease in container_usage_proportion changes things (in particular, I decreased its value by 0.008 or 0.007, depending on the configuration). So, for instance, in global_trade.22p_l0.0:

```yaml
container_usage_proportion:
  period: 112
  sample_nodes:
  - - 0
    - 0.02
  - - 111
    - 0.02
  sample_noise: 0
```

is substituted with

```yaml
container_usage_proportion:
  period: 112
  sample_nodes:
  - - 0
    - 0.012
  - - 111
    - 0.012
  sample_noise: 0
```

This actually helped in reaching very good performance.
Another thing that I tested, increasing the vessel speed (3x), also helped in reaching the same high performance.
Instead, increasing the capacities of vessels and ports to a very large number did not help these methods (they stayed stuck below 70%).
Of course, I cannot rule out bugs in my implementation, but I'm 99% sure that the problem is the likely infeasibility of the environment.

The fact that this happens even with your implementation is an even stronger signal.
Indeed, with the global configuration that I am proposing, your method achieves around 90% in global_trade.22p_l0.0 (which has constant order distributions), while with the current configuration it only gets around 63%.

I would like you to consider the possibility that the current configurations are actually "too hard" to be solved with good performance. In this sense I see two options: recover in some way the configurations used in the paper, or manually decrease the hardness as I did.

@Jinyu-W (Collaborator) commented Jul 30, 2021:

Firstly, I want to express my heartfelt thanks for your careful and considerable verification, @riccardopoiani. There are two things I want to clarify:

  1. This semi-finished example and the one used in paper 2 share a similar problem formulation, but it is not certain whether the one used in paper 2 is exactly the same as the example.
  2. My personal view: I don't think it is reasonable to adjust the topology according to the performance. It should be the reverse: determine the topology based on the data distribution or some reliable analysis, then tune the algorithm (whether ML-based/OR-based/heuristic/...) to get better performance. So, if there are some unreasonable settings in the topologies, we will be willing to update them.

Thanks again!

@riccardopoiani (Author) commented Jul 30, 2021:

Thank you very much again @Jinyu-W.
You are right that the configurations should not be tuned to reach good performance. It makes perfect sense.

However, I would expect the repository to at least contain information on the best results that have been achieved by some method of yours on that environment.
I think that reporting the performance of no-repositioning and random actions (as in MARO's docs) is probably not enough.
For instance, when dealing with gym environments, you have information like "CartPole-v0 defines 'solving' as getting average reward of 195.0 over 100 consecutive trials" (as written here).
This lets whoever is experimenting with the repo for research purposes understand how sub-optimal their algorithm is w.r.t. different configurations. Not having such a feature significantly complicates the job of understanding whether a good new method has been designed.

If you still consider this not to be enough, I will understand.

Thanks again for your replies :)

EDIT: Let me rephrase it a bit: the main point of having feasible environments (or the ones tested in the papers) is that you already have something to compare against. E.g., with feasible environments, if your method is good you should get performance close to 100%; with the environments from the papers, you should get performance close to the values in the papers.
With the current envs, instead, one would expect to obtain performance close to the papers (since it is a global trade configuration), while this is probably not possible. This should at least be specified in the documentation (so that other people can avoid the same mistake I made).
Moreover, as a default configuration I would expect something that is in some way "learnable". It does not make much sense to me to have an environment in which basically all methods perform approximately the same.

@WessZumino commented:

> Thank you very much again @Jinyu-W.
> You are right that the configurations should not be tuned to reach good performance. It makes perfect sense.
>
> However, I would expect the repository to at least contain information on the best results that have been achieved by some method of yours on that environment.
> I think that reporting the performance of no-repositioning and random actions (as in MARO's docs) is probably not enough.
> For instance, when dealing with gym environments, you have information like "CartPole-v0 defines 'solving' as getting average reward of 195.0 over 100 consecutive trials" (as written here).
> This lets whoever is experimenting with the repo for research purposes understand how sub-optimal their algorithm is w.r.t. different configurations. Not having such a feature significantly complicates the job of understanding whether a good new method has been designed.
>
> If you still consider this not to be enough, I will understand.
>
> Thanks again for your replies :)
>
> EDIT: Let me rephrase it a bit: the main point of having feasible environments (or the ones tested in the papers) is that you already have something to compare against. E.g., with feasible environments, if your method is good you should get performance close to 100%; with the environments from the papers, you should get performance close to the values in the papers.
> With the current envs, instead, one would expect to obtain performance close to the papers (since it is a global trade configuration), while this is probably not possible. This should at least be specified in the documentation (so that other people can avoid the same mistake I made).
> Moreover, as a default configuration I would expect something that is in some way "learnable". It does not make much sense to me to have an environment in which basically all methods perform approximately the same.

I am just getting into this conversation now, and I hope to provide some useful information. @Jinyu-W, please correct me if I am wrong. Concerning the comparison with the results in the papers, I think one of the problems comes from using different cost functions, not only among the papers, but also between the papers and the current code. This is true for both the LP model and the RL-based model (if you are using the default reward function in main, it is way simpler than the one in the papers).

@riccardopoiani, maybe I missed this in the previous messages above: have you tried to use the cost function from the paper for the LP solver? I have not, and it is on my TODO list; just curious. The same applies to the RL models. I think that changing environment parameters is not the way to go, although I totally agree that it would be good to have a benchmark to use as a reference.

Last question/observation: are you sure that the envs are not feasible? Feasibility of a problem (as a whole) is often independent of the solver we use. What I mean is: say we have a simple supply/demand problem and we need to satisfy all demand as a constraint; if supply << demand, we will never be able to find a feasible solution. OK, I know this is embarrassingly simplistic.
As constraints become tighter, the feasible space shrinks and the solver will have trouble finding a feasible solution. So here a better solver, or implementing some constraint relaxation, may help.

Bottom line, I was wondering if there is a way to bound the feasibility of the problem. The provided sample problems are classified by increasing difficulty, based on the order distribution, the noise, etc. However hard it is to find a good solution, does the problem at least admit a feasible solution?
This could help us understand whether a certain result could, at least in principle, be improved by using another method (a toy probe of this idea is sketched below).
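
For instance, one could probe it with an "oracle" LP that sees the full future and minimizes total shortage: if the optimum is 0, a zero-shortage plan exists at all. Here is a toy two-port sketch with PuLP; the instance data and the instantaneous A->B transfer are made up for illustration:

```python
# Toy feasibility probe: an oracle LP with full future information that
# minimizes total shortage. The two-port instance below is made up.
import pulp

ticks = range(3)
supply = {"A": [10, 10, 10], "B": [0, 0, 0]}   # empties appearing per tick
demand = {"A": [2, 2, 2], "B": [6, 6, 6]}      # orders consuming empties
transfer_cap = 5                                # empties movable A -> B per tick

prob = pulp.LpProblem("oracle_min_shortage", pulp.LpMinimize)
move = pulp.LpVariable.dicts("move", ticks, lowBound=0)      # A -> B per tick
short = {(p, t): pulp.LpVariable(f"short_{p}_{t}", lowBound=0)
         for p in "AB" for t in ticks}

inv = {"A": 0, "B": 0}  # running inventories as symbolic expressions
for t in ticks:
    prob += move[t] <= transfer_cap
    for p in "AB":
        prob += short[(p, t)] <= demand[p][t]  # cannot "short" more than demanded
    inv["A"] = inv["A"] + supply["A"][t] - demand["A"][t] + short[("A", t)] - move[t]
    inv["B"] = inv["B"] + supply["B"][t] - demand["B"][t] + short[("B", t)] + move[t]
    prob += inv["A"] >= 0  # cannot use empties that are not there
    prob += inv["B"] >= 0

prob += pulp.lpSum(short.values())  # objective: total shortage
prob.solve()
# Here the optimum is 3 (B is short 1 per tick), i.e. this toy instance does
# NOT admit a zero-shortage plan; an optimum of 0 would mean it does.
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```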

@riccardopoiani (Author) commented Aug 12, 2021:

> I am just getting into this conversation now, and I hope to provide some useful information. @Jinyu-W, please correct me if I am wrong. Concerning the comparison with the results in the papers, I think one of the problems comes from using different cost functions, not only among the papers, but also between the papers and the current code. This is true for both the LP model and the RL-based model (if you are using the default reward function in main, it is way simpler than the one in the papers).
>
> @riccardopoiani, maybe I missed this in the previous messages above: have you tried to use the cost function from the paper for the LP solver? I have not, and it is on my TODO list; just curious. The same applies to the RL models. I think that changing environment parameters is not the way to go, although I totally agree that it would be good to have a benchmark to use as a reference.

Hello @WessZumino, both papers report performance in terms of container shortage: even if they use different reward functions and so on, the final metric is always the same. And that is the kind of performance measure I am using as well, even though my current objective/reward functions are somewhat more complex functions of the environment (i.e., I am not just using -shortage in RL to achieve good results). The point is that the final goal stays the same: I am not taking into account goals that conflict with this objective, such as "move as few containers as possible in order to minimize the transfer cost while minimizing the shortage". In other words, even if different reward/objective functions are used, the final results are still comparable, since the final goal is minimizing the container shortage.
As for your question, I have tried many things and many reward/objective functions so far in my personal implementation of the LP baseline. I have not done so for the LP method provided in MARO.

> Last question/observation: are you sure that the envs are not feasible? Feasibility of a problem (as a whole) is often independent of the solver we use. What I mean is: say we have a simple supply/demand problem and we need to satisfy all demand as a constraint; if supply << demand, we will never be able to find a feasible solution. OK, I know this is embarrassingly simplistic.
> As constraints become tighter, the feasible space shrinks and the solver will have trouble finding a feasible solution. So here a better solver, or implementing some constraint relaxation, may help.
>
> Bottom line, I was wondering if there is a way to bound the feasibility of the problem. The provided sample problems are classified by increasing difficulty, based on the order distribution, the noise, etc. However hard it is to find a good solution, does the problem at least admit a feasible solution?
> This could help us understand whether a certain result could, at least in principle, be improved by using another method.

First of all, I apologize for the confusion. By infeasible envs, I mean that there is no way of obtaining 0% shortage.
This is different from the method being unable to return any solution: all the implementations I have tested are able to compute a feasible plan; the point is that the performance of the plan is not what one would expect after reading the papers.

Of course, I am not sure of my claims, since I am not sure that my implementation is bug-free.
However, the LP method mentioned in paper 1 is based on the assumption that all demands can be satisfied (i.e., minimizing the shortage, you assume that the optimal objective value is 0, or close to 0).
If this is the case (the environment is "feasible"), the method performs as an oracle (indeed, in that algorithm all the unknown information, such as demands, arrivals and so on, is known in advance at planning time; this info comes from the first author of the paper). I am able to obtain performance > 96% in all configurations that satisfy this constraint. As soon as this requirement is not satisfied, performance starts to decrease; the more the constraint is violated, the larger the drop in performance.
This is basically the main reason why I started to think that it is not possible to reach good performance in the current global envs (the method achieves around 60-65%, depending on the settings).

EDIT: This is "confirmed" in principle also by the fact that increasing the vessel capacity to infinity does not affect the solution you can find at all (even knowing all the info in advance).

@WessZumino commented Aug 12, 2021:

> Hello @WessZumino, both papers report performance in terms of container shortage: even if they use different reward functions and so on, the final metric is always the same. And that is the kind of performance measure I am using as well, even though my current objective/reward functions are somewhat more complex functions of the environment (i.e., I am not just using -shortage in RL to achieve good results). The point is that the final goal stays the same: I am not taking into account goals that conflict with this objective, such as "move as few containers as possible in order to minimize the transfer cost while minimizing the shortage". In other words, even if different reward/objective functions are used, the final results are still comparable, since the final goal is minimizing the container shortage.

Hi @riccardopoiani, thanks for the clarification, it is very helpful. Concerning your point, I totally agree that the final metric is the same and therefore it is meaningful to compare against it.

The part that confuses me is the following, and maybe it is just terminology: if my goal is to compare my result to the results reported in the papers, doesn't changing the objective/reward mean I am changing the underlying model? And here by model I mean what is represented by the objective function. If I think about a physical model, the model is entirely defined by a Hamiltonian H (the equivalent of a cost function to minimize); say I am interested in evaluating a certain metric related to H. If I add new terms to H and evaluate the same metric M, can I say I am comparing the same model? Should I even expect the same result for M coming from different Hs? In physics the answer is no, but maybe it is fine within this context; I am not sure I understand.

Now let us say everything is the same as in the paper: same objective, same training strategy, etc. Can I reproduce the results? Maybe you have already done this and I did not understand, so I apologize in advance if I said something trivial.

> First of all, I apologize for the confusion. By infeasible envs, I mean that there is no way of obtaining 0% shortage.
> This is different from the method being unable to return any solution: all the implementations I have tested are able to compute a feasible plan; the point is that the performance of the plan is not what one would expect after reading the papers.
>
> Of course, I am not sure of my claims, since I am not sure that my implementation is bug-free.
> However, the LP method mentioned in paper 1 is based on the assumption that all demands can be satisfied (i.e., minimizing the shortage, you assume that the optimal objective value is 0, or close to 0).
> If this is the case (the environment is "feasible"), the method performs as an oracle (indeed, in that algorithm all the unknown information, such as demands, arrivals and so on, is known in advance at planning time; this info comes from the first author of the paper). I am able to obtain performance > 96% in all configurations that satisfy this constraint. As soon as this requirement is not satisfied, performance starts to decrease; the more the constraint is violated, the larger the drop in performance.
> This is basically the main reason why I started to think that it is not possible to reach good performance in the current global envs (the method achieves around 60-65%, depending on the settings).
>
> EDIT: This is "confirmed" in principle also by the fact that increasing the vessel capacity to infinity does not affect the solution you can find at all (even knowing all the info in advance).

Thanks, this is really useful information!

@riccardopoiani (Author) commented:

> Hi @riccardopoiani, thanks for the clarification, it is very helpful. Concerning your point, I totally agree that the final metric is the same and therefore it is meaningful to compare against it.
>
> The part that confuses me is the following, and maybe it is just terminology: if my goal is to compare my result to the results reported in the papers, doesn't changing the objective/reward mean I am changing the underlying model? And here by model I mean what is represented by the objective function. If I think about a physical model, the model is entirely defined by a Hamiltonian H (the equivalent of a cost function to minimize); say I am interested in evaluating a certain metric related to H. If I add new terms to H and evaluate the same metric M, can I say I am comparing the same model? Should I even expect the same result for M coming from different Hs? In physics the answer is no, but maybe it is fine within this context; I am not sure I understand.

Yes @WessZumino, sorry for the misunderstanding; I thought you were referring to more complex objectives, as other papers on ECRs do. You are right that even in this case you would obtain different plans, and therefore different evaluations of M.
Just as a final clarification: the terms that you add to H are included only to handle differences that you might encounter when you execute your plan in the simulator. For instance, this is the case for the safety inventory bonus: you want to avoid planning in a way that satisfies the demands too tightly; by adding that bonus, you incentivize the LP method to keep a stock of containers. In this context, my experience so far is that the bonuses you can add to your objective/reward function can help, but the improvement is not dramatic. For instance, depending on the configuration, you can gain about 2% in fulfillment. In other words, you can expect slightly better results.

> Now let us say everything is the same as in the paper: same objective, same training strategy, etc. Can I reproduce the results? Maybe you have already done this and I did not understand, so I apologize in advance if I said something trivial.

I have been trying. The problem is that all the global env cfgs are quite different from the ones in both papers.
This is clear also from the performance of the no-repositioning policy, which obtains 59-63% on the current configurations vs. the 81.95% reported in the paper.
With the tweaked configurations I mentioned above, the performance of the no-repositioning policy is around 82-83%, and the Online LP of paper 1 (which is different from paper 2 and from the one in the maro branch) obtains 96%.
Of course, I have no idea whether the configuration is the same, but at least the performance seems more "comparable".
