
Fix to get Test and Train Fitness for the final solution of Multi-objective Optimization. Fixes #166 #167

Open

wants to merge 4 commits into master
Conversation

zahidirfan

Fix #166

@jmmcd
Collaborator

jmmcd commented Oct 2, 2024

Thanks for the bug report and PR!

This looks good to me, but I have never really used the MOO part of PonyGE. @mikefenton @dvpfagan if still watching this repo, please comment but if not, please let me know and I'll try to test the fix properly.

@mikefenton
Collaborator

[image attachment]

@mikefenton
Collaborator

I had a look through the codebase, and I think the proposed solution only treats one specific symptom rather than curing the disease itself. There are a number of problems that should be addressed as part of this fix:

  1. The code as it stands expects the user to implement a self.training_test attribute in any new fitness functions that are expected to run on training/test data. This is a bit unintuitive and can be an easy pitfall for new users. It is only explicitly set in two locations across the whole codebase:
    1. fitness/supervised_learning/supervised_learning.py
    2. fitness/supervised_learning/regression_random_polynomial.py
  2. The training_test attribute is checked in four locations:
    1. stats.py, line 118
      This one is for single-dimensional stats, and there are no problems here if the user has correctly added the self.training_test attribute to their custom fitness function.
    2. stats.py, line 243
      This is the problematic one. Technically speaking, the correct solution here is to change it to if any([hasattr(_ff, 'training_test') for _ff in params['FITNESS_FUNCTION']]), but I think the bigger problem is expecting people to have designed their fitness functions correctly. The proposed change in this MR is a good catch-all solution, but it should be implemented in more locations than just this one case.
    3. stats.py, line 367
      Since this line is only triggered for SOO stats, it works fine at the moment.
    4. utilities/stats/file_io.py, line 71
      This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.
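To make the failure mode concrete, here is a minimal sketch (all class names here are stand-ins for illustration, not PonyGE2's actual implementation) of why a hasattr check on the top-level fitness function never fires when the objectives are held inside a multi-objective wrapper:

```python
class SupervisedLearning:
    """Stand-in for a fitness function that sets the attribute per-objective."""
    def __init__(self):
        self.training_test = True  # as in supervised_learning.py

class MooFF:
    """Stand-in for the multi-objective wrapper holding the objectives."""
    def __init__(self, fitness_functions):
        self.fitness_functions = fitness_functions

moo = MooFF([SupervisedLearning(), SupervisedLearning()])

# The single-objective style check never fires on the wrapper itself:
assert not hasattr(moo, 'training_test')

# Checking the wrapped objectives instead does fire:
assert any(hasattr(ff, 'training_test') for ff in moo.fitness_functions)
```

This is why the attribute check silently evaluates to False in the MOO case even when every underlying objective has set the attribute correctly.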

I think the best solution to this problem should be:

  1. Get rid of the self.training_test attributes in the two fitness function examples listed above (deleting the single line in each file should be enough).
  2. Change all three lines in stats/stats.py that reference the current training_test attribute to just check params['DATASET_TEST'].
  3. Change line 71 of file_io to just check params['DATASET_TEST'].
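As a rough sketch of the simplification proposed in points 2 and 3 (using a plain dict as a stand-in for PonyGE2's params object, and a made-up dataset path), the check reduces to testing whether a test dataset is configured at all:

```python
# Hypothetical params dict; the real one lives in PonyGE2's parameters module.
params = {'DATASET_TEST': 'Vladislavleva4/Test.txt'}  # made-up path

def wants_test_stats(params):
    # Truthy when a test set is configured, falsy when absent, None, or empty.
    return bool(params.get('DATASET_TEST'))

assert wants_test_stats(params)
assert not wants_test_stats({'DATASET_TEST': None})
assert not wants_test_stats({})
```

This works identically for SOO and MOO runs, since it does not depend on how the fitness functions are wrapped.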

Side note after 7 years working in the industry: dear god, we need tests.

@zahidirfan
Author

@mikefenton : Thanks for the comprehensive review.

The code as it stands expects the user to implement a self.training_test attribute in any new fitness functions that are expected to run on training/test data. This is a bit unintuitive and can be an easy pitfall for new users. It is only explicitly set in two locations across the whole codebase:
fitness/supervised_learning/supervised_learning.py
fitness/supervised_learning/regression_random_polynomial.py

I defined the training_test attribute in the fitness functions as required, but it won't work for the multi-objective case because the moo_ff object that stores the fitness functions does not expose each function's attributes directly (which is what the code currently checks). We could iterate over the wrapped fitness functions and check their individual attributes, but even if we define the attribute, the way it works currently amounts to: if the DATASET_TEST param is set then it's True, otherwise False.

stats.py, line 118
This one is for single dimensional stats, and there's no problems here if the user has correctly added the self.training_test attribute to their custom fitness function.

Agreed

stats.py, line 243

This is the problem one. Technically speaking the correct solution here is to change it to if any([hasattr(_ff, 'training_test') for _ff in params['FITNESS_FUNCTION']]), but I think the bigger problem is expecting people to have designed their fitness functions correctly. The proposed change in this MR is a good catchall solution, but it should be implemented in more locations than just this one case here.

Whether a problem needs a test dataset does not depend on how many fitness functions are used. Presumably all of the available fitness functions could report both a training fitness and a test fitness. Please correct me if I am wrong here.

[stats.py, line 367](https://github.com/PonyGE/PonyGE2/blob/master/src/stats/stats.py#L367)
Since this line only is triggered for SOO stats, it works fine at the moment.

Agreed.

utilities/stats/file_io.py, line 71
This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.

This function is only called from stats.py, line 76, which handles the SOO case; it is not called in the MOO case.

@mikefenton
Collaborator

I defined the training_test attribute in the fitness functions as required, but it won't work for multiobjective case because of the moo_ff object being used to store the fitness functions and it does not have direct access to the fitness function's attribute (as currently being checked), yes we can use the fitness functions and then check individual attributes. But even if we define it, the way it works currently is to just check if we have the DATASET_TEST param set then its True otherwise False.

I meant we should just remove the self.training_test attribute completely from the codebase since it's only explicitly set in two locations at the moment in the original codebase. This functionality is over-complicated, and the simpler params['DATASET_TEST'] check is more robust.

utilities/stats/file_io.py, line 71
This is going to silently fail for MOO use cases for the same reason this MR was raised. However, it should work just fine for SOO cases.

This function is only called from stats.py, line 76, which handles the SOO case; it is not called in the MOO case.

It's also called in line 109 of file_io in the save_first_front_to_file function which is called by get_moo_stats.

…i-objective optimization PonyGE#166. This was fixed but not pushed in original commit.
@zahidirfan
Author

I meant we should just remove the self.training_test attribute completely from the codebase since it's only explicitly set in two locations at the moment in the original codebase. This functionality is over-complicated, and the simpler params['DATASET_TEST'] check is more robust.

Agreed

It's also called in line 109 of file_io in the save_first_front_to_file function which is called by get_moo_stats.

I had fixed this originally and was getting the Test and Training fitness in the first front files; I have now pushed the commit.

@jmmcd
Collaborator

jmmcd commented Oct 12, 2024

I'm confused. According to #130 we should change from x[<varidx>] to x[:, <varidx>], but I think your PR changes it in the opposite direction. Maybe the MOO fitness function needs a fix instead? Please paste the error you see when using the :, if the fix is not obvious to you.
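For reference, a short NumPy sketch of how the two indexing styles being discussed differ (the array shape here is illustrative, not taken from PonyGE2):

```python
import numpy as np

# A small 2-D array, e.g. variables along one axis and samples along the other.
x = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

row = x[0]      # first row  -> array([0, 1, 2])
col = x[:, 0]   # first column -> array([0, 3])

assert row.tolist() == [0, 1, 2]
assert col.tolist() == [0, 3]
```

So x[varidx] and x[:, varidx] select along different axes; which one is correct depends on whether the dataset is stored variables-first or samples-first.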

@zahidirfan
Author

@jmmcd : I updated the issue with the error as well. You are correct that issue #130 asks to do the reverse. I will go through the codebase again to determine what is happening.


Successfully merging this pull request may close these issues.

training_test condition never true for Multi-objective optimization