Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Update] Optimizer visualization improved #345

Merged
merged 6 commits into from
Feb 14, 2025

Conversation

AidinHamedi
Copy link
Contributor

@AidinHamedi AidinHamedi commented Feb 11, 2025

Problem (Why?)

The original code's output plots were highly unstable and failed to fully demonstrate the optimizers' potential. The visualizations lacked clarity, and the optimization process often produced erratic results, making it difficult to evaluate the true performance of each optimizer. Additionally, the loss system did not account for boundary violations.

Solution (What/How?)

  1. Code Refactoring:

    • Added detailed comments and docstrings to improve code readability and maintainability.
    • Organized configuration constants (e.g., OPTIMIZERS_IGNORE, OPTIMIZATION_STEPS) at the top of the script for easier tuning and experimentation.
    • Improved the structure of the code by separating concerns into distinct sections (e.g., configuration, test functions, optimization logic, visualization).
  2. Enhanced Loss System:

    • Introduced a boundary penalty to penalize solutions that go outside the valid range, ensuring the optimizers stay within the defined search space.
    • Combined the final position loss (distance to the global minimum) with the average loss during optimization to encourage smoother and more stable convergence.
  3. Improved Visualization:

    • Added markers for the global minimum and final position to make the plots more informative.
    • Included optimizer hyperparameters in the plot title for better context.

Other Changes (Bug Fixes, Small Refactors)

  • Rastrigin Function Enhancement:
    • Added an option to make the Rastrigin function more challenging by setting DIFFICULT_RASTRIGIN to True. This changes the initial_state to a harder starting location and introduces noise to the function output.
  • Hyper Param Tunning:
    • Added support for other hyper params not just lr like momentum etc...

Notes

While the majority of the optimizers now produce clean and stable plots, there are still a few that exhibit instability or unclear behavior. These include:

  • BSAM
  • ASGD
  • MSVAG
  • DAdaptAdam
  • Muon
  • AliG: execute_steps function modifying best_params

@AidinHamedi AidinHamedi requested a review from kozistr as a code owner February 11, 2025 19:27
@kozistr kozistr added documentation Improvements or additions to documentation enhancement New feature or request labels Feb 13, 2025
@kozistr
Copy link
Owner

kozistr commented Feb 13, 2025

@AidinHamedi, thanks for your awesome work! appreciate it :)

review page is so lacky cuz of lots of changes, so leave some reviews here

  1. Your code and naming are already readable and intuitive enough, and all functions have a docstring. So, in my opinion, adding extra comments is not necessary. could you please remove all comments?
  2. question about calculating the penalty (line 348), which is 75 * total_violation. could you give me more details about the constant 75? did you get this value empirically or just set a decent value?
  3. max_queue_len to 6. I was just wondering, did you see any speed up by setting max_queue_len to 6?

others look great to me!

@AidinHamedi
Copy link
Contributor Author

I really appreciate your feedback. Let me address your comments:

  1. Removing comments: I will remove the unnecessary comments as you recommended. The code and naming are already clear and intuitive, so I agree that additional comments may be excessive.

  2. Penalty calculation: The constant 75 was selected somewhat randomly as a reasonable multiplier for penalizing boundary violations. It wasn’t based on empirical data but was meant to provide a strong enough penalty to deter violations while keeping it proportional to the severity of the issue. If you have ideas for a more data-driven method or a better value, I’m open to making adjustments!

  3. max_queue_len set to 6: You’re absolutely correct—after retesting, I found that setting max_queue_len to 6 doesn’t actually enhance the tuning speed. This was due to inadequate testing on my part, and I’ll either revert this change or modify it based on more thorough testing.

@kozistr
Copy link
Owner

kozistr commented Feb 13, 2025

I really appreciate your feedback. Let me address your comments:

  1. Removing comments: I will remove the unnecessary comments as you recommended. The code and naming are already clear and intuitive, so I agree that additional comments may be excessive.
  2. Penalty calculation: The constant 75 was selected somewhat randomly as a reasonable multiplier for penalizing boundary violations. It wasn’t based on empirical data but was meant to provide a strong enough penalty to deter violations while keeping it proportional to the severity of the issue. If you have ideas for a more data-driven method or a better value, I’m open to making adjustments!
  3. max_queue_len set to 6: You’re absolutely correct—after retesting, I found that setting max_queue_len to 6 doesn’t actually enhance the tuning speed. This was due to inadequate testing on my part, and I’ll either revert this change or modify it based on more thorough testing.
  • 1, 3: checked!
  • 2: if 75 is a reasonable multiplier, then I think it's okay to go. I just want to know if is there any insight or method you use :)

Lastly, could you please run make run & make check?

@AidinHamedi
Copy link
Contributor Author

  • For make run & make check: I reformatted the code to mitigate errors on check. However, the Makefile does not have a run target.
  • Regarding the number 75: I wanted a number that was big enough to highly discourage going outside of the boundaries. A higher value like 100 should have worked, but somewhat randomly, I chose the number 75 and thought that would be enough. No complicated reasoning was involved. Changing it to something like 50 or 100 doesn’t make a massive difference.

Hope that clarifies everything!

Copy link

codecov bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (b82f7c4) to head (b35b22d).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #345   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          111       111           
  Lines         8891      8891           
=========================================
  Hits          8891      8891           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kozistr
Copy link
Owner

kozistr commented Feb 14, 2025

  • For make run & make check: I reformatted the code to mitigate errors on check. However, the Makefile does not have a run target.
  • Regarding the number 75: I wanted a number that was big enough to highly discourage going outside of the boundaries. A higher value like 100 should have worked, but somewhat randomly, I chose the number 75 and thought that would be enough. No complicated reasoning was involved. Changing it to something like 50 or 100 doesn’t make a massive difference.

Hope that clarifies everything!

oh, sorry for my typo. it was make format, not run lol

thanks for the clarification and thank you again for your contribution!

@kozistr kozistr merged commit 5f4e62f into kozistr:main Feb 14, 2025
4 checks passed
@AidinHamedi AidinHamedi deleted the Vis-Update branch February 14, 2025 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants