
Fix to get Test and Train Fitness for the final solution of Multi-objective Optimization. Fixes #166 #167

Open

wants to merge 4 commits into master
Conversation

zahidirfan

Fix #166

@jmmcd
Collaborator

jmmcd commented Oct 2, 2024

Thanks for the bug report and PR!

This looks good to me, but I have never really used the MOO part of PonyGE. @mikefenton @dvpfagan if still watching this repo, please comment but if not, please let me know and I'll try to test the fix properly.

@mikefenton
Collaborator

[image attachment]

@mikefenton
Collaborator

I had a look through the codebase, and I think the proposed solution only treats one specific symptom rather than curing the disease itself. There are a number of problems that should be addressed as part of this fix:

  1. The code as it stands expects the user to implement a self.training_test attribute in any new fitness functions that are expected to run on training/test data. This is a bit unintuitive and can be an easy pitfall for new users. It is only explicitly set in two locations across the whole codebase:
    1. fitness/supervised_learning/supervised_learning.py
    2. fitness/supervised_learning/regression_random_polynomial.py
  2. The training_test attribute is checked in four locations:
    1. stats.py, line 118
      This one is for single-dimensional stats, and there are no problems here if the user has correctly added the self.training_test attribute to their custom fitness function.
    2. stats.py, line 243
      This is the problematic one. Technically speaking, the correct solution here is to change it to if any([hasattr(_ff, 'training_test') for _ff in params['FITNESS_FUNCTION']]), but I think the bigger problem is expecting people to have designed their fitness functions correctly. The proposed change in this MR is a good catch-all solution, but it should be implemented in more locations than just this one case.
    3. stats.py, line 367
      Since this line is only triggered for SOO stats, it works fine at the moment.
    4. utilities/stats/file_io.py, line 71
      This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.
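To make the failure mode concrete, here is a minimal sketch (all class names here are stand-ins for illustration, not PonyGE2's actual implementation) of why a hasattr check on the top-level fitness function never fires when the objectives are held inside a multi-objective wrapper:

```python
class SupervisedLearning:
    """Stand-in for a fitness function that sets the attribute per-objective."""
    def __init__(self):
        self.training_test = True  # as in supervised_learning.py

class MooFF:
    """Stand-in for the multi-objective wrapper holding the objectives."""
    def __init__(self, fitness_functions):
        self.fitness_functions = fitness_functions

moo = MooFF([SupervisedLearning(), SupervisedLearning()])

# The single-objective style check never fires on the wrapper itself:
assert not hasattr(moo, 'training_test')

# Checking the wrapped objectives instead does fire:
assert any(hasattr(ff, 'training_test') for ff in moo.fitness_functions)
```

This is why the attribute check silently evaluates to False in the MOO case even when every underlying objective has set the attribute correctly.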

I think the best solution to this problem should be:

  1. Get rid of the self.training_test attributes in the two fitness function examples listed above (deleting the single line in each file should be enough).
  2. Change all three lines in stats/stats.py that reference the current training_test attribute to just check params['DATASET_TEST'].
  3. Change line 71 of file_io to just check params['DATASET_TEST'].
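As a rough sketch of the simplification proposed in points 2 and 3 (using a plain dict as a stand-in for PonyGE2's params object, and a made-up dataset path), the check reduces to testing whether a test dataset is configured at all:

```python
# Hypothetical params dict; the real one lives in PonyGE2's parameters module.
params = {'DATASET_TEST': 'Vladislavleva4/Test.txt'}  # made-up path

def wants_test_stats(params):
    # Truthy when a test set is configured, falsy when absent, None, or empty.
    return bool(params.get('DATASET_TEST'))

assert wants_test_stats(params)
assert not wants_test_stats({'DATASET_TEST': None})
assert not wants_test_stats({})
```

This works identically for SOO and MOO runs, since it does not depend on how the fitness functions are wrapped.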

Side note after 7 years working in the industry: dear god, we need tests.

@zahidirfan
Author

@mikefenton : Thanks for the comprehensive review.

The code as it stands expects the user to implement a self.training_test attribute in any new fitness functions that are expected to run on training/test data. This is a bit unintuitive and can be an easy pitfall for new users. It is only explicitly set in two locations across the whole codebase:
fitness/supervised_learning/supervised_learning.py
fitness/supervised_learning/regression_random_polynomial.py

I defined the training_test attribute in the fitness functions as required, but it won't work for the multi-objective case because the moo_ff object that stores the fitness functions does not expose each function's attributes directly (which is what the code currently checks). We could iterate over the wrapped fitness functions and check their individual attributes, but even if we define the attribute, the way it works currently amounts to: if the DATASET_TEST param is set then it's True, otherwise False.

stats.py, line 118
This one is for single dimensional stats, and there's no problems here if the user has correctly added the self.training_test attribute to their custom fitness function.

Agreed

stats.py, line 243

This is the problem one. Technically speaking the correct solution here is to change it to if any([hasattr(_ff, 'training_test') for _ff in params['FITNESS_FUNCTION']]), but I think the bigger problem is expecting people to have designed their fitness functions correctly. The proposed change in this MR is a good catchall solution, but it should be implemented in more locations than just this one case here.

Whether a problem needs a test dataset does not depend on how many fitness functions are used. Presumably all of the available fitness functions could report both a training fitness and a test fitness. Please correct me if I am wrong here.

[stats.py, line 367](https://github.com/PonyGE/PonyGE2/blob/master/src/stats/stats.py#L367)
Since this line only is triggered for SOO stats, it works fine at the moment.

Agreed.

utilities/stats/file_io.py, line 71
This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.

This function is only called from stats.py, line 76, which handles the SOO case; it is not called in the MOO case.

@mikefenton
Collaborator

I defined the training_test attribute in the fitness functions as required, but it won't work for multiobjective case because of the moo_ff object being used to store the fitness functions and it does not have direct access to the fitness function's attribute (as currently being checked), yes we can use the fitness functions and then check individual attributes. But even if we define it, the way it works currently is to just check if we have the DATASET_TEST param set then its True otherwise False.

I meant we should just remove the self.training_test attribute completely from the codebase since it's only explicitly set in two locations at the moment in the original codebase. This functionality is over-complicated, and the simpler params['DATASET_TEST'] check is more robust.

utilities/stats/file_io.py, line 71
This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.

This function is only called from stats.py, line 76, which handles the SOO case; it is not called in the MOO case.

It's also called in line 109 of file_io in the save_first_front_to_file function which is called by get_moo_stats.

…i-objective optimization PonyGE#166. This was fixed but not pushed in original commit.
@zahidirfan
Author

I meant we should just remove the self.training_test attribute completely from the codebase since it's only explicitly set in two locations at the moment in the original codebase. This functionality is over-complicated, and the simpler params['DATASET_TEST'] check is more robust.

Agreed

It's also called in line 109 of file_io in the save_first_front_to_file function which is called by get_moo_stats.

I had fixed this originally and was getting the Test and Training fitness in the first front files; I have now pushed the commit.

@jmmcd
Collaborator

jmmcd commented Oct 12, 2024

I'm confused. According to #130 we should change from x[<varidx>] to x[:, <varidx>], but I think your PR changes it in the opposite direction. Maybe the MOO fitness function needs a fix instead? Please paste the error you see when using the :, if the fix is not obvious to you.
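For reference, a short NumPy sketch of how the two indexing styles being discussed differ (the array shape here is illustrative, not taken from PonyGE2):

```python
import numpy as np

# A small 2-D array, e.g. variables along one axis and samples along the other.
x = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

row = x[0]      # first row  -> array([0, 1, 2])
col = x[:, 0]   # first column -> array([0, 3])

assert row.tolist() == [0, 1, 2]
assert col.tolist() == [0, 3]
```

So x[varidx] and x[:, varidx] select along different axes; which one is correct depends on whether the dataset is stored variables-first or samples-first.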

@zahidirfan
Author

@jmmcd : I updated the issue with the error as well. You are correct that issue #130 asks to do the reverse. I will go through the codebase again to determine what is happening.


Successfully merging this pull request may close these issues.

training_test condition never true for Multi-objective optimization