Setting PL time in MSR #9

Open
Petrusion opened this issue Aug 16, 2024 · 17 comments

Comments

@Petrusion

Would it be difficult to add an option to set PL time in MSR? It would greatly help my use case.

Setting PL time to some large value (hours) in MSR is the only way my CPU can get above 45W in the long-term. It is a notebook so I can't just increase the TDP even though it can withstand 90W without thermal throttling. I already tried setting the MMIO time but it gets ignored in favour of time set in MSR.

Unfortunately, I can only get good sustained power in Windows, because there ThrottleStop sets the MSR time, leaving the MMIO time at its default value.

@horshack-dpreview
Owner

Sure, I think so. Please give me a few days. It's been a while since I've worked on this code and I need to find an Intel notebook to test the change with.

@Petrusion
Author

Awesome, thank you so much! If you need me to try something out on my machine (11980HK) don't hesitate to contact me.

@horshack-dpreview
Owner

I found where the PL time can be set using the same /sys tree where my script currently sets the PL watts. Before I consider how to integrate this into the script, could you first try setting the values manually and confirm you get the expected increase in perf under Linux? I'm not able to perturb the performance on the Intel laptop I'm using, so I want to make sure the value is being honored, even though it should since I see the value I echo to /sys being encoded into the MSR.

Here's a sample command to set both PL time windows to 4 hours:

echo "14400000000" | tee /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_0_time_window_us /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_1_time_window_us

And the command to view the currently set time:

cat /sys/class/powercap/intel-rapl/intel-rapl\:0/constraint_[0-1]_time_window_us
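
As a rough sketch of the same manual step, the hours-to-microseconds conversion and the write to both constraint files could be wrapped in a small helper. The function names `hours_to_us` and `set_pl_windows` are made up for illustration and are not part of setPL; the default zone path is the one used in the commands above, and writing to it normally requires root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of setPL.

# Convert whole hours to microseconds, the unit used by
# constraint_*_time_window_us.
hours_to_us() {
    echo $(( $1 * 3600 * 1000000 ))
}

# Write the same time window (given in hours) to constraint 0 and
# constraint 1 of a RAPL zone. The zone directory is an optional second
# argument and defaults to the package zone used in this thread.
set_pl_windows() {
    hours=$1
    zone_dir=${2:-/sys/class/powercap/intel-rapl/intel-rapl:0}
    us=$(hours_to_us "$hours")
    for c in 0 1; do
        echo "$us" > "$zone_dir/constraint_${c}_time_window_us" || return 1
    done
}
```

For example, `set_pl_windows 4` run as root would write 14400000000 to both files, matching the sample command above.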

@Petrusion
Author

Petrusion commented Aug 18, 2024

This doesn't work unfortunately. It goes back to 45W after a minute. I might very well be wrong about this, but the way I understand it I'm using MMIO by writing into these files, not MSR.

All I know is that in Windows, ThrottleStop claims to set the time in MSR, leaving MMIO time at its default.

> even though it should since I see the value I echo to /sys being encoded into the MSR

How do I check if the time is encoded into the MSR on my machine after setting it via MMIO?

@horshack-dpreview
Owner

On my system, changing those time windows does get reflected in the MSR. You can verify this by running setPL before making the change and again after. You'll see the new value reflected in the MSR_PKG_POWER_LIMIT value displayed.

By default setPL sets the MMIO register to all zeros and locks it, which should disable it, leaving only the MSR for the processor to use, which can then be changed any number of times during the session via setPL. There's a remote chance your system may require the MMIO's time window to be set as well, but I can't really see how that could be since setPL disables that register.

@Petrusion
Author

I see. In that case it is working correctly, but somehow the CPU ignores the new values even though they are in the MSR before and after my testing. Weird that it happens in Linux but not in Windows. Any idea what it could be?

@horshack-dpreview
Owner

How are you determining the CPU isn't reaching 45W under Linux? Based on what's reported by an app like turbostat? Or indirectly based on observed performance?

@Petrusion
Author

It is reaching 45W. The problem is it won't use more than 45W for more than a minute. The processor's TDP is 45W, but it is set too aggressively low by the manufacturer because the cooling can handle even 80-90W, so I wanna solve that by being permanently in PL1/2 under load.

I'm using MangoHud to see the power draw, which reads it from /sys/class/powercap/intel-rapl\:0/energy_uj.
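
For a cross-check that does not depend on MangoHud, average package power can be derived from two readings of `energy_uj` taken a known interval apart. The sketch below is an illustration only: the function names are hypothetical, the default zone path is the package zone used earlier in this thread, and on many systems `energy_uj` is readable only by root.

```shell
#!/bin/sh
# Hypothetical helpers for illustration.

# Average power in milliwatts between two energy_uj samples.
# Arguments: first_uj second_uj interval_seconds [max_energy_range_uj]
avg_power_mw() {
    e0=$1; e1=$2; secs=$3; max=${4:-0}
    delta=$(( e1 - e0 ))
    # The counter wraps around at max_energy_range_uj, so a negative
    # difference is corrected (approximately) by adding the range.
    if [ "$delta" -lt 0 ] && [ "$max" -gt 0 ]; then
        delta=$(( delta + max ))
    fi
    # microjoules / (seconds * 1000) = milliwatts
    echo $(( delta / (secs * 1000) ))
}

# Take two samples from a RAPL zone and print the average power in mW.
# Arguments: [zone_dir] [interval_seconds]
sample_power_mw() {
    zone=${1:-/sys/class/powercap/intel-rapl/intel-rapl:0}
    secs=${2:-1}
    max=$(cat "$zone/max_energy_range_uj")
    e0=$(cat "$zone/energy_uj")
    sleep "$secs"
    e1=$(cat "$zone/energy_uj")
    avg_power_mw "$e0" "$e1" "$secs" "$max"
}
```

Running `sample_power_mw` in a loop during a sustained load should show the moment the package drops back to roughly 45000 mW.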

@horshack-dpreview
Owner

Can you try changing "F_DISABLE_MMIO_PL1_PL2" to $FALSE and see if that affects it. This will set the MMIO PL values to the same as the MSR rather than disabling the MMIO PL values.

@Petrusion
Author

> Can you try changing "F_DISABLE_MMIO_PL1_PL2" to $FALSE and see if that affects it.

It doesn't help.

@horshack-dpreview
Owner

I should have mentioned that you need to reboot to try the change to F_DISABLE_MMIO_PL1_PL2. The lock my script puts on the MMIO register prevents any changes to it for the duration of that power-on session, so if you ran setPL in that same session before changing F_DISABLE_MMIO_PL1_PL2, the change wouldn't apply.

@Petrusion
Author

Yes, I figured as much. I rebooted before attempting it.

@horshack-dpreview
Owner

Can you verify that PL1/PL2 is still set to your desired value at the moment MangoHud reports power consumption drops back to 45W? You can use this to see the PL1/PL2 values:

turbostat sleep 0 2>&1 | grep MSR_PKG_POWER_LIMIT -A 2

@Petrusion
Author

Yes, turbostat sleep 0 2>&1 | grep MSR_PKG_POWER_LIMIT -A 2 reports 14336.000000 sec and the correct Watts even after the CPU gets limited to 45W. This is the case with F_DISABLE_MMIO_PL1_PL2 set to either $FALSE or $TRUE.
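
The 14336 sec reported here, rather than the 14400 sec that was written, is consistent with how MSR_PKG_POWER_LIMIT encodes the PL1 time window: bits 21:17 hold an exponent Y and bits 23:22 a fraction Z, and the window is 2^Y * (1 + Z/4) time units, where the time unit is 2^-TU seconds with TU taken from bits 19:16 of MSR_RAPL_POWER_UNIT. The sketch below decodes that field. The function name is hypothetical, and the TU value of 10 used in the example is an assumption (a common value, about 976 µs) that should be read from the actual machine.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# Decode the PL1 time window from a raw MSR_PKG_POWER_LIMIT value.
# Arguments: raw_msr_value (decimal or 0x-prefixed hex)
#            tu (time-unit exponent from MSR_RAPL_POWER_UNIT bits 19:16)
# Prints the window in microseconds.
decode_pl1_window_us() {
    raw=$(( $1 ))
    tu=$2
    y=$(( (raw >> 17) & 0x1f ))   # bits 21:17, exponent Y
    z=$(( (raw >> 22) & 0x3 ))    # bits 23:22, fraction Z
    # window = 2^Y * (1 + Z/4) * 2^-TU seconds, computed in integer
    # arithmetic as 2^Y * (4 + Z) / 4, scaled to microseconds.
    echo $(( ((1 << y) * (4 + z) * 1000000 / 4) >> tu ))
}
```

With Y=23 and Z=3 (field value 0x77, i.e. raw bits 0xEE0000) and TU=10, `decode_pl1_window_us 0xEE0000 10` prints 14336000000, which is 14336 seconds. The next larger representable window with that time unit is 16384 seconds, so 14400 seconds cannot be stored exactly.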

@horshack-dpreview
Owner

Hmm, at this point I'm thinking the processor is dropping down due to the throttling events that are designed to override PL1/PL2. Have you tried monitoring those events to see which if any are occurring? There are tools under Windows to monitor this but I'm not sure what's available under Linux.
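
One place to look under Linux is the per-CPU thermal throttle counters that the kernel exposes in sysfs on x86. The sketch below simply prints them; the function name is hypothetical, and whether these counters capture the event in question is an assumption, since they count hardware thermal throttling and would not reflect a limit imposed by power limits or by userspace software.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# Print every readable thermal throttle counter, one per line, as
# "<path> <count>". The base directory is an optional argument.
print_throttle_counts() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/thermal_throttle/*_throttle_count; do
        [ -r "$f" ] || continue
        echo "$f $(cat "$f")"
    done
}
```

Comparing the output of `print_throttle_counts` before and after a load run shows whether any counter increased while the power dropped.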

@Petrusion
Author

I have a hypothesis, maybe setPL works after all... I just noticed that the temperature of both CPU and GPU never seems to go above 80°C. There must be something in Linux trying to keep both of them under 80°C, which is extremely annoying since I never set up anything like that. This CPU is designed to thermal throttle at 100°C and the GPU at 86°C, so of course it won't run above 45W when there is some evil piece of code or whatever blocking it at 80.

I'm trying to do something about it but I'm out of my depth here. Do you have any ideas what it might be?

@horshack-dpreview
Owner

I'm not familiar with any potential logic that would enforce a software-based throttle below the processor's normal threshold but a quick google search reveals there are such system components. I would check the system/kernel logs first and see if there is any message related to throttling.
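
Besides the logs, the kernel's thermal zones can be inspected for a trip point near 80°C. The following is a minimal sketch, assuming the standard /sys/class/thermal layout; the function name is hypothetical, and a trip point found this way is only a hint, since a userspace daemon could enforce its own limits without any matching trip point.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# List every thermal zone trip point as
# "<zone type> trip<N> <trip type> <temperature>C".
# The base directory is an optional argument.
list_trip_points() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        ztype=$(cat "$zone/type")
        for t in "$zone"/trip_point_*_temp; do
            [ -r "$t" ] || continue
            # Extract the trip index N from trip_point_N_temp.
            n=${t##*/trip_point_}
            n=${n%_temp}
            kind=$(cat "$zone/trip_point_${n}_type" 2>/dev/null)
            # Temperatures are stored in millidegrees Celsius.
            echo "$ztype trip$n ${kind:-unknown} $(( $(cat "$t") / 1000 ))C"
        done
    done
}
```

For the log check, something like `journalctl -k | grep -i throttl` (or the same grep over `dmesg` output) is one way to search kernel messages for throttling.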
