Minimal backport of CUDA-related changes #7346

fwyzard
Contributor

@fwyzard fwyzard commented Sep 29, 2021

Update the CUDA external packaging:

  • include the CUDA runtime static library in the external package;
  • create symlinks for the library stubs to satisfy link-time checks.

Add the cuda-compatible-runtime test as a new external.
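For readers unfamiliar with the stub-symlink step, here is a minimal sketch of the idea, not the packaging code from this PR. It assumes the stubs are plain `*.so` files in a single directory and that a versioned name such as `libcuda.so.1` pointing at the stub is what the link-time check needs; the helper name `link_stub_sonames`, the `.1` suffix and the `lib64/stubs` layout in the demo are illustrative assumptions.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def link_stub_sonames(stubs_dir: Path, suffix: str = ".1") -> list[Path]:
    """Create '<name>.so<suffix>' symlinks next to each '<name>.so' stub.

    Hypothetical helper: the suffix and directory layout are assumptions,
    not taken from the PR.
    """
    created = []
    for stub in sorted(stubs_dir.glob("*.so")):
        link = stub.with_name(stub.name + suffix)
        if link.is_symlink() or link.exists():
            continue  # leave anything that is already there untouched
        # A relative target keeps the link valid if the package is relocated.
        link.symlink_to(stub.name)
        created.append(link)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory with an empty stand-in for a stub library.
    with tempfile.TemporaryDirectory() as tmp:
        stubs = Path(tmp) / "lib64" / "stubs"
        stubs.mkdir(parents=True)
        (stubs / "libcuda.so").touch()
        for link in link_stub_sonames(stubs):
            print(link.name, "->", os.readlink(link))
```

The function is idempotent: running it a second time creates nothing, which is convenient if a packaging step is re-executed.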

@fwyzard
Contributor Author

fwyzard commented Sep 29, 2021

please test

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_0_X/master.

@smuzaffar, @iarspider, @mrodozov can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@fwyzard fwyzard changed the title from "Minimal backport of CUDA-related changes, to support the new SCRAM hook" to "Minimal backport of CUDA-related changes" on Sep 29, 2021
@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

@smuzaffar @perrotta @qliphy
do you prefer to

  1. run more extensive tests for this PR by itself, and add the changes to the SCRAM hooks separately
    or
  2. add the changes to the SCRAM hooks together with this, and test them more extensively together
    ?

@smuzaffar
Contributor

@fwyzard, I would say let's add the SCRAM change (b5d1145) and test it extensively here

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19257/summary.html
COMMIT: 83e19fd
CMSSW: CMSSW_12_0_X_2021-09-29-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19257/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2998542
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

Pull request #7346 was updated.

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

ok 👍
thanks for the update

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

enable gpu

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test for slc7_aarch64_gcc9

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test for slc7_ppc64le_gcc9

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test for slc7_amd64_gcc10

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test for cc8_amd64_gcc9

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

I think we can skip the test for cc8_aarch64_gcc9 and cc8_ppc64le_gcc9.
Or should we run those as well?

@smuzaffar
Contributor

> I think we can skip the test for cc8_aarch64_gcc9 and cc8_ppc64le_gcc9. Or should we run those as well?

The ppc64le resources are idle, so I suggest starting the tests for cc8_ppc64le_gcc9. We can skip cc8_aarch64_gcc9, as we only have one ARM machine available and there is a long list of pending jobs to run there.

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

ok

@fwyzard
Contributor Author

fwyzard commented Sep 30, 2021

please test for cc8_ppc64le_gcc9

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19273/summary.html
COMMIT: 0171f2a
CMSSW: CMSSW_12_0_X_2021-09-29-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19273/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19735
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19729
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2998542
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19276/summary.html
COMMIT: 0171f2a
CMSSW: CMSSW_12_0_X_2021-09-28-2300/slc7_ppc64le_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19276/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test test_PrepareInputDb had ERRORS
---> test test_MpsWorkFlow had ERRORS
---> test TestHeterogeneousCoreSonicTritonProducerGPU had ERRORS
---> test testAlignmentOfflineValidation had ERRORS
and more ...

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19282/summary.html
COMMIT: 0171f2a
CMSSW: CMSSW_12_0_X_2021-09-27-2300/cc8_ppc64le_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19282/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19282/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19282/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test test_PrepareInputDb had ERRORS
---> test test_MpsWorkFlow had ERRORS
---> test test_PixelBaryCentreTool had ERRORS
---> test testFWCoreUtilities had ERRORS
and more ...

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19275/summary.html
COMMIT: 0171f2a
CMSSW: CMSSW_12_0_X_2021-09-29-2300/cc8_amd64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19275/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 57020 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 294215
  • DQMHistoTests: Total nulls: 257
  • DQMHistoTests: Total successes: 2704070
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -3.872 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): -0.192 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 136.874 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 250202.181 ): -0.410 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): 0.133 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 4.53 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 7.3 ): -3.380 KiB SiStrip/MechanicalView
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: found differences in 12 / 38 workflows

@cmsbuild
Contributor

cmsbuild commented Oct 1, 2021

+1

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 51797 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2998564
  • DQMHistoTests: Total failures: 571325
  • DQMHistoTests: Total nulls: 16
  • DQMHistoTests: Total successes: 2427201
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -1.974 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): -0.905 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): -0.469 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.596 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: found differences in 12 / 38 workflows
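As a quick sanity check on the DQMHistoTests counts in the comparison summaries above, the sketch below tests whether failures, nulls, successes and skipped add up to the reported total, and computes the failure fraction. That the four categories partition the total is an assumption about how the bot tallies results, not something the reports state; the dictionary keys are labels chosen here for illustration.

```python
# Counts copied from the DQMHistoTests lines of the comparison summaries above.
# The keys are illustrative labels (test number and architecture where given).
REPORTS = {
    "19257 slc7_amd64_gcc900": dict(
        total=2998564, failures=0, nulls=0, successes=2998542, skipped=22),
    "19273 slc7_amd64_gcc900 (GPU)": dict(
        total=19735, failures=6, nulls=0, successes=19729, skipped=0),
    "19275 cc8_amd64_gcc9": dict(
        total=2998564, failures=294215, nulls=257, successes=2704070, skipped=22),
    "Oct 1 report (architecture not stated)": dict(
        total=2998564, failures=571325, nulls=16, successes=2427201, skipped=22),
}


def is_consistent(report: dict) -> bool:
    """True if the four categories sum to the reported total (assumed tally)."""
    parts = ("failures", "nulls", "successes", "skipped")
    return sum(report[p] for p in parts) == report["total"]


def failure_fraction(report: dict) -> float:
    """Fraction of compared histograms that were reported as failures."""
    return report["failures"] / report["total"]


if __name__ == "__main__":
    for name, report in REPORTS.items():
        print(f"{name}: consistent={is_consistent(report)}, "
              f"failures={failure_fraction(report):.2%}")
```

Under that assumption all four summaries are internally consistent, so the large failure counts in the last two reports reflect the comparisons themselves rather than a tallying problem.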

@smuzaffar
Contributor

+externals
Looks good to go in 12.0.X. The unit tests that failed on the non-x86-64 architectures are already failing in the IBs.

@cmsbuild
Contributor

cmsbuild commented Oct 1, 2021

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

cmsbuild commented Oct 1, 2021

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-7d9a5b/19277/summary.html
COMMIT: 0171f2a
CMSSW: CMSSW_12_0_X_2021-09-28-2300/slc7_aarch64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7346/19277/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test TestFWCoreServicesDriver had ERRORS
---> test testUploadConditions had ERRORS

@perrotta
Contributor

perrotta commented Oct 1, 2021

+1

@cmsbuild cmsbuild merged commit e72f37c into cms-sw:IB/CMSSW_12_0_X/master Oct 1, 2021
@fwyzard fwyzard deleted the IB/CMSSW_12_0_X/master_update_CUDA_hooks branch April 1, 2022 11:59
4 participants