Ifpack2: BlockTriDi fix for large blocks #13792
In ExtractAndFactorize kernels that use scratch, fall back to level 1 when there isn't sufficient level 0. Signed-off-by: Brian Kelley <[email protected]>
Use max block size of 32 and a small grid. Test standard BTD, BTD with Schur line splitting, and Block Jacobi. Signed-off-by: Brian Kelley <[email protected]>
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test Names: PR_gcc-openmpi-openmp, PR_gcc, PR_gcc-openmpi_debug, PR_clang, PR_cuda, PR_intel, PR_cuda-uvm
Pull Request Author: brian-kelley |
lgtm, thanks! I'll give it a try today. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED. Pull Request Auto Testing has PASSED
Test Names: PR_gcc-openmpi-openmp, PR_gcc, PR_gcc-openmpi_debug, PR_clang, PR_cuda, PR_intel, PR_cuda-uvm |
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
confirmed issue is fixed, thanks!
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ jewatkins ]! |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
In BlockTriDi's ExtractAndFactorize kernels that use scratch, fall back to level 1 when there isn't sufficient level 0.
@trilinos/ifpack2
Motivation
Ifpack2's block TriDi and block Jacobi should work for block sizes up to and including 32. For relatively large block sizes such as 27, many GPUs don't have enough shared memory per block for the numeric setup (extract+factorize). When there isn't enough shared memory (level 0 scratch), this PR falls back to level 1 scratch.
The level 0 and level 1 cases are instantiated separately to make sure there is no performance hit to the original level 0 case.
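For readers unfamiliar with the pattern, here is a minimal sketch of the dispatch described above. It is not the Ifpack2 code and does not call Kokkos; the names `scratch_bytes_for_blocks`, `run_extract_and_factorize`, and `dispatch_extract_and_factorize`, the blocks-per-team parameter, and the way the scratch requirement is estimated are all assumptions made for illustration. The point is only that the scratch level is a compile-time template parameter, so the level 0 instantiation is untouched by the fallback.

```cpp
#include <cstddef>

// Hypothetical estimate: each team holds `blocks_per_team` dense
// block_size x block_size blocks of doubles in scratch.
inline std::size_t scratch_bytes_for_blocks(int block_size, int blocks_per_team) {
  return static_cast<std::size_t>(block_size) * block_size *
         blocks_per_team * sizeof(double);
}

// Hypothetical stand-in for the extract+factorize kernel launch.
// ScratchLevel is a template parameter, so each level gets its own
// instantiation and the level 0 path carries no runtime branching.
template <int ScratchLevel>
int run_extract_and_factorize(std::size_t scratch_bytes) {
  static_assert(ScratchLevel == 0 || ScratchLevel == 1,
                "only scratch levels 0 and 1 are modeled here");
  // A real kernel would request `scratch_bytes` at ScratchLevel here.
  (void)scratch_bytes;
  return ScratchLevel;  // report which path was taken
}

// Pick level 0 when the request fits, otherwise fall back to level 1.
// `level0_max_bytes` is assumed to be queried from the device at runtime.
inline int dispatch_extract_and_factorize(std::size_t needed_bytes,
                                          std::size_t level0_max_bytes) {
  if (needed_bytes <= level0_max_bytes) {
    return run_extract_and_factorize<0>(needed_bytes);
  }
  return run_extract_and_factorize<1>(needed_bytes);
}
```

With these assumed numbers, a block size of 32 and 8 blocks per team needs 65536 bytes, which exceeds a hypothetical 48 KiB level 0 limit and therefore takes the level 1 path, while a block size of 4 stays on level 0.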
Related Issues
No linked issue; the problem was reported by email.
Stakeholder Feedback
Issue reported directly by @jewatkins for SPARC.
Testing
Added testing with the maximum block size of 32. Verified that the level 1 scratch code path is taken on NVIDIA and AMD GPUs in this test.