RPCS3 Crashes on Intel 13th/14th Gen CPUs When Running Above 5 GHz #16575

Open
MantraAU opened this issue Jan 18, 2025 · 17 comments

Comments

@MantraAU

Quick summary

RPCS3 crashes randomly when running on Intel 13th or 14th Gen CPUs with clock speeds above 5 GHz.

The crash is caused by "Vmin Shift Instability".

The crash occurs during PPU compilation, and limiting the CPU speed to 5 GHz via Intel Extreme Tuning Utility resolves the issue.

A potential solution for RPCS3 would be to detect affected CPUs and offer an automatic throttling option to 5 GHz to prevent crashes.

Details

It seems that RPCS3 can randomly crash when the CPU is running faster than 5 GHz.

While playing a game, it suddenly crashed and closed without showing an error message. Afterward, every attempt to reboot the game resulted in a crash during PPU Compilation, preventing me from even booting into the XMB.

When it crashes, the screen freezes on the following (the figures may vary):

Compiling PPU Modules..  
Progress: file 67 of 412. module 2 of 50

Shortly after, RPCS3 closes entirely without any error message.

I tried multiple fixes, including reinstalling RPCS3, restarting my machine, tweaking various settings, and analyzing the error log with ChatGPT, but nothing worked.

How I Fixed It:

The issue is called "Vmin Shift Instability", and it is present only in 13th and 14th Gen Intel CPUs (yay, Intel).

While troubleshooting a similar issue with Call of Duty, I found that CPUs running faster than 5 GHz can cause crashes. In that case, I found a video (This Video) that suggested capping the CPU frequency.

Applying the same workaround to RPCS3, I used Intel Extreme Tuning Utility to limit my CPU's performance cores to 5 GHz. This immediately resolved the issue, and the emulator worked without further crashes.

I’ve tested this fix multiple times, and it consistently prevents crashes in RPCS3.

How can this be fixed in RPCS3?

There are two ways to fix this: a BIOS update, or throttling the CPU speed to 5 GHz.

Users can definitely fix this themselves, but some may not be experienced enough to update their BIOS, and others may not have an unlocked CPU.

A good solution would be for RPCS3 to:

  • Detect if the CPU is affected.
  • Offer an option to throttle the CPU speed to 5 GHz automatically.
  • Advise the user to update their BIOS.
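
A minimal sketch of what the "detect if the CPU is affected" step could look like. It only infers the generation from the CPU brand string; it cannot tell whether a particular chip is actually unstable. The function names and the brand-string pattern are assumptions for illustration and are not taken from RPCS3.

```python
import re
from typing import Optional

# Assumption: brand strings look like "13th Gen Intel(R) Core(TM) i9-13900KF"
# or "Intel(R) Core(TM) i7-14700K". This pattern is a heuristic, not an
# official list of affected parts; it also matches some mobile models.
_MODEL_RE = re.compile(r"\bi[3579]-(\d{2})\d{3}[A-Z]*\b")


def intel_core_generation(brand: str) -> Optional[int]:
    """Return the generation inferred from a five-digit Core i-series model number."""
    if "Intel" not in brand:
        return None
    match = _MODEL_RE.search(brand)
    return int(match.group(1)) if match else None


def may_be_affected(brand: str) -> bool:
    """True for 13th/14th Gen Core parts; says nothing about actual degradation."""
    return intel_core_generation(brand) in (13, 14)


print(may_be_affected("Intel(R) Core(TM) i7-14700K"))           # True
print(may_be_affected("12th Gen Intel(R) Core(TM) i9-12900K"))  # False
```

A check like this could only drive a warning or advisory; it would flag every 13th/14th Gen part regardless of whether it has the problem.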

If the developers don’t have access to a 13th or 14th Gen CPU, I’m happy to assist with further testing via TeamViewer or AnyDesk.

Attach a log file

RPCS3.log

Attach capture files for visual issues

No response

System configuration

Intel Core i7 14700K (Stock Clock - Except for underclocking it to fix the issue)

NVIDIA GeForce RTX 3080 Ti
Driver Version: 32.0.15.6636

Other details

No response

@kd-11
Contributor

kd-11 commented Jan 18, 2025

Settings -> Emulator -> LLVM compiler threads
Choose a value less than your CPU core count.

@yeager

yeager commented Jan 18, 2025

Went down to 31 but did not help..
RPCS3.log

@MantraAU
Author

MantraAU commented Jan 18, 2025

Settings -> Emulator -> LLVM compiler threads Choose a value less than your CPU core count.

This setting does not resolve Vmin Shift Instability. This issue occurs when the CPU requests voltage, but the voltage applied is insufficient or incorrect due to instability at high frequencies.

The problem does not stem from having too many cores or threads active; it arises specifically with CPUs operating at high frequencies.

Edit:

https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239

@MantraAU
Author

Went down to 31 but did not help.. RPCS3.log

Can you show me a screenshot of your Intel Tuning Utility?
6 GHz down to 3.1 GHz is very extreme.

You should only need to change your Performance Core Ratio from 60x to 50x.
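
For reference, the ratio-to-frequency arithmetic behind that suggestion can be sketched as follows. It assumes the usual 100 MHz base clock (BCLK); the function name is illustrative only.

```python
BCLK_MHZ = 100.0  # assumed default base clock; boards can run slightly off this


def core_frequency_ghz(ratio: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core frequency in GHz for a given multiplier (ratio) and base clock."""
    return ratio * bclk_mhz / 1000.0


print(core_frequency_ghz(60))  # 6.0
print(core_frequency_ghz(50))  # 5.0
```

So a 50x Performance Core Ratio corresponds to the 5 GHz cap discussed in this issue.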

@yeager

yeager commented Jan 18, 2025

I didn't reduce the clock at first.. but now I did and that worked great! Thanks!

Image

Image

@AniLeo
Member

AniLeo commented Jan 18, 2025

We at least need a way to detect this so we can discard false-positive crash reports; otherwise we'll have to assume that any crash report from these two generations may not be our bug.

@MantraAU
Author

We at least need a way to detect this so we can discard false-positive crash reports; otherwise we'll have to assume that any crash report from these two generations may not be our bug.

Unfortunately, there isn’t a direct way to detect this issue, as it’s a low-level hardware bug. However, RPCS3 could potentially mitigate false positives by implementing checks for affected CPUs and their operating frequency during startup. Based on this, RPCS3 could:

  • Advise the user about the potential issue and recommend throttling their CPU to 5 GHz (if applicable to their hardware).
  • Suggest updating their BIOS (intimidating to newcomers, and not all manufacturers have released fixes for this).
  • Implement a feature to throttle its own operations to 5 GHz, if that is technically feasible.
  • Add a warning at the top of the log to alert users and developers that Vmin Shift Instability might be the cause of crashes or other issues.

Since this is a hardware instability, the scope of its impact could extend beyond just crashes. It might cause smaller issues that are harder to identify. Without a warning in the error log, users and developers might not realize it’s a contributing factor.

At the very least, I believe a warning in the log or a notification would be a good step forward.

@kd-11
Contributor

kd-11 commented Jan 18, 2025

Went down to 31 but did not help.. RPCS3.log

I was thinking more like start with a handful and figure out your safe limit. Going down 1 or 2 cores isn't going to get you much headroom.
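
The "start with a handful and work up" approach can be sketched as a simple stepping search. Here `compile_succeeds` is a hypothetical stand-in for manually running a PPU compilation at a given LLVM compiler threads setting and noting whether it finished; nothing like it exists in RPCS3, and because hardware instability is random, a real search may need several runs per setting.

```python
from typing import Callable


def find_safe_thread_limit(
    max_threads: int,
    compile_succeeds: Callable[[int], bool],
    start: int = 4,
) -> int:
    """Step up from `start` and return the last thread count that passed.

    Returns 0 if even `start` threads fail. Assumes that once a count
    fails, higher counts would fail too.
    """
    safe = 0
    for threads in range(start, max_threads + 1):
        if not compile_succeeds(threads):
            break
        safe = threads
    return safe


# Illustrative predicate only: pretend compilation passes up to 12 threads.
print(find_safe_thread_limit(32, lambda n: n <= 12))  # 12
```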

@Yahfz
Contributor

Yahfz commented Jan 22, 2025

@MantraAU If everybody with an unstable system asked us to fix their issue inside RPCS3 we'd be doomed. I'm not gonna say this isn't RPCS3 problem, but this isn't RPCS3 problem.

@CockneyRhymingJedi

CockneyRhymingJedi commented Jan 24, 2025

Went down to 31 but did not help.. RPCS3.log

I was thinking more like start with a handful and figure out your safe limit. Going down 1 or 2 cores isn't going to get you much headroom.

Just to give feedback, this fixed it for me; I increased it to 23 cores before the crashes reappeared on my computer (i9-14900KF processor) using version RPCS3 0.0.34-17383-f1f85335 Alpha | master. Coincidentally, this is the number of physical cores -1, or the correct number (24) if you count from 0 (as Windows references the cores).

Once the cache was compiled, I could revert to full 32 cores to play the game (Disgaea D2) if I wished.

This makes me wonder if the issue is the LLVM compiler not handling the fact not all cores are hyper-threaded on the 13/14th gen Intel processors (e.g. I have 8 performance cores with 2 threads each, and 16 efficient cores with 1 thread each). However, this is pure conjecture on my behalf as I don't know enough about this project, the LLVM-project, or indeed C/C++ to back this up.
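
The core and thread counts mentioned here work out as follows. This is only a worked example of the arithmetic, using the 8 + 16 layout quoted above; the class name is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridTopology:
    """Core layout of a hybrid CPU, using the counts quoted in this comment."""

    p_cores: int
    e_cores: int
    threads_per_p_core: int = 2  # performance cores are hyper-threaded
    threads_per_e_core: int = 1  # efficient cores are not

    @property
    def physical_cores(self) -> int:
        return self.p_cores + self.e_cores

    @property
    def logical_processors(self) -> int:
        return (self.p_cores * self.threads_per_p_core
                + self.e_cores * self.threads_per_e_core)


i9_14900kf = HybridTopology(p_cores=8, e_cores=16)
print(i9_14900kf.physical_cores)      # 24
print(i9_14900kf.logical_processors)  # 32
```

The 23-thread setting reported above is therefore one fewer than the 24 physical cores, and the "full 32" is the logical processor count.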

NB: this computer already had the microcode fix installed before using RPCS3.

@Yahfz
Contributor

Yahfz commented Jan 24, 2025

This makes me wonder if the issue is the LLVM compiler not handling the fact not all cores are hyper-threaded on the 13/14th gen Intel processors (e.g. I have 8 performance cores with 2 threads each, and 16 efficient cores with 1 thread each). However, this is pure conjecture on my behalf as I don't know enough about this project, the LLVM-project, or indeed C/C++ to back this up.

That isn't the case here. There are several users, including myself, using a 13th/14th gen CPU with 8+16 cores, and we don't have any problems. Your CPU is simply unstable; you can most likely reproduce this in a stress test such as Prime95.

@MantraAU
Author

@MantraAU If everybody with an unstable system asked us to fix their issue inside RPCS3 we'd be doomed. I'm not gonna say this isn't RPCS3 problem, but this isn't RPCS3 problem.

I’m not asking the devs to fix this directly (though adding a CPU throttle option would be super convenient).

What I’m suggesting is something simple, like a warning or notification to let users know about the issue so they can sort it out themselves via CPU Throttle or BIOS Update.

Even just adding something to the logs would help a lot, making it easier for both users and devs to recognize when Vmin Shift Instability might be causing problems. That way, people don’t waste time chasing other fixes when this could be the culprit.

@kd-11
Contributor

kd-11 commented Jan 25, 2025

We cannot detect defective hardware when the defect kills the app. This would require collaboration with intel and motherboard vendors to tell us that a cpu may have already degraded.
The only way to tell users is to show an alarming warning to every user of a 13th or 14th gen Intel CPU, whether they are affected or not, which doesn't seem like a good idea. However, some PC games already do that, so maybe it's fine?

@CockneyRhymingJedi

This makes me wonder if the issue is the LLVM compiler not handling the fact not all cores are hyper-threaded on the 13/14th gen Intel processors (e.g. I have 8 performance cores with 2 threads each, and 16 efficient cores with 1 thread each). However, this is pure conjecture on my behalf as I don't know enough about this project, the LLVM-project, or indeed C/C++ to back this up.

That isn't the case here. There are several users, including myself, using a 13th/14th gen CPU with 8+16 cores, and we don't have any problems. Your CPU is simply unstable; you can most likely reproduce this in a stress test such as Prime95.

I wasn't aware that others with similar processors were not getting the issue. My apologies.

To put my mind at rest, I ran Prime95 overnight for 8.5 hrs last night and got no errors or warnings.

I also ran the Intel Processor Diagnostic Tool, and all tests passed.

Lastly, I ran Call of Duty Warzone as mentioned in the video @MantraAU linked, and didn't get any issues (other than not enjoying the game :-D)

I agree that there is no expectation for a fix from the RPCS3 team, as this is not an issue with their software. I am just putting my results here in case it helps others or can help the case for an advisory notice on the website or in the software to help users experiencing this sort of crash and give them the options for fixing/working around the issue.

@MantraAU
Author

This makes me wonder if the issue is the LLVM compiler not handling the fact not all cores are hyper-threaded on the 13/14th gen Intel processors (e.g. I have 8 performance cores with 2 threads each, and 16 efficient cores with 1 thread each). However, this is pure conjecture on my behalf as I don't know enough about this project, the LLVM-project, or indeed C/C++ to back this up.

That isn't the case here. There are several users, including myself, using a 13th/14th gen CPU with 8+16 cores, and we don't have any problems. Your CPU is simply unstable; you can most likely reproduce this in a stress test such as Prime95.

I wasn't aware that others with similar processors were not getting the issue. My apologies.

To put my mind at rest, I ran Prime95 overnight for 8.5 hrs last night and got no errors or warnings.

I also ran the Intel Processor Diagnostic Tool, and all tests passed.

Lastly, I ran Call of Duty Warzone as mentioned in the video @MantraAU linked, and didn't get any issues (other than not enjoying the game :-D)

I agree that there is no expectation for a fix from the RPCS3 team, as this is not an issue with their software. I am just putting my results here in case it helps others or can help the case for an advisory notice on the website or in the software to help users experiencing this sort of crash and give them the options for fixing/working around the issue.

I think there’s a bit of misunderstanding about how this issue works.

  • Not all 13th / 14th gen CPUs are affected
  • Not all heavy tasks will cause the issue
  • This can actually be fixed with a BIOS update that includes microcode patch 0x12B

NB: this computer already had the microcode fix installed before using RPCS3.

If you have the microcode update, then your issue is possibly something unrelated to Vmin Shift Instability.

To put my mind at rest, I ran Prime95 overnight for 8.5 hrs last night and got no errors or warnings.

I also ran the Intel Processor Diagnostic Tool, and all tests passed.

Lastly, I ran Call of Duty Warzone as mentioned in the video @MantraAU linked, and didn't get any issues (other than not enjoying the game :-D)

Vmin Shift Instability doesn't happen with every process. For example, I've only run into it when compiling shaders in RPCS3 and Call of Duty. Meanwhile, stress-testing my CPU or compiling shaders in Unreal Engine runs completely fine.

But since you already have the microcode update installed, the error you are encountering is possibly something different altogether.

We cannot detect defective hardware when the defect kills the app. This would require collaboration with intel and motherboard vendors to tell us that a cpu may have already degraded.

RPCS3 does not need to identify faulty hardware as it's impossible to do this at a software level.
If it checked for a 13th or 14th gen CPU, it could add a log entry that states something like:

"WARNING: CPU may be affected by Vmin shift instability. If experiencing crashes, ensure BIOS is up to date with microcode patch 0x12B."

This will hopefully allow users to self-identify and fix the issue without hassling the RPCS3 devs for support. It may also help the devs when fixing any future issues that occur.
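
A rough sketch of how such a log entry could be gated, assuming a Linux host where `/proc/cpuinfo` exposes a `microcode` field. The function names are hypothetical and not part of RPCS3. It also assumes that microcode revisions compare numerically and that 0x12B applies to the CPU in question, which may not hold across different CPU steppings.

```python
import re
from typing import Optional

REQUIRED_MICROCODE = 0x12B  # revision named in this thread

WARNING = (
    "WARNING: CPU may be affected by Vmin shift instability. "
    "If experiencing crashes, ensure BIOS is up to date with microcode patch 0x12B."
)


def parse_microcode_revision(cpuinfo_text: str) -> Optional[int]:
    """Extract the first 'microcode : 0x...' value from /proc/cpuinfo-style text."""
    match = re.search(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", cpuinfo_text, re.MULTILINE
    )
    return int(match.group(1), 16) if match else None


def vmin_shift_warning(is_13th_or_14th_gen: bool,
                       microcode: Optional[int]) -> Optional[str]:
    """Return the warning text, or None when no warning seems warranted."""
    if not is_13th_or_14th_gen:
        return None
    # Assumption: a revision at or above 0x12B means the patch is present.
    if microcode is not None and microcode >= REQUIRED_MICROCODE:
        return None
    return WARNING


sample = "model name\t: Intel(R) Core(TM) i7-14700K\nmicrocode\t: 0x129\n"
print(vmin_shift_warning(True, parse_microcode_revision(sample)))
```

When the revision cannot be read, the sketch still returns the warning, since the patch level is unknown.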

I already helped close another issue that was related to this. I'm willing to bet there are many others that are still open, were never resolved, or will be opened in the future.

@kd-11
Contributor

kd-11 commented Jan 26, 2025

One thing to note: if Vmin shift has already caused internal damage, then a BIOS update can never truly fix it. It will still crash.
The BIOS update just prevents further damage from occurring.

The suggestion to warn users in the log is not very useful, since most users disable logging for performance reasons. This is one of those things that should be fixable with documentation, but with RPCS3 not having a centralized support system, that is a bit harder to implement without annoying users.

@CockneyRhymingJedi

Last update from me (promise :-))

I had a blue screen the other day and noticed high temps on my CPU while under 60% load, so I checked my BIOS and, lo and behold, I was overclocking using my Gigabyte motherboard options (I thought I had reverted to Intel settings back in Sept after the microcode update). So, I reverted to Intel Performance settings and re-ran RPCS3 shader compilation using all cores, and it worked first try.

Sorry for muddying the waters on this issue; it looks like the CPU profile I used had some instability issues of its own. Hopefully, my posts will provide helpful troubleshooting steps for others with similar problems, but not the same as the OP.

6 participants