Update fill function for CUDA matrix #626

Merged
merged 5 commits into from
Aug 21, 2024

Conversation

sjsprecious
Collaborator

This PR updates the Fill function for the CUDA matrix so that it no longer triggers a data transfer between host and device.

@mattldawson @K20shores Both CudaDenseMatrix and CudaSparseMatrix use the same Fill function. Is it possible to define the Fill function in only one place instead?

fix #624

@sjsprecious sjsprecious self-assigned this Aug 21, 2024
@sjsprecious sjsprecious added the enhancement New feature or request label Aug 21, 2024
@sjsprecious sjsprecious added this to the CUDA Rosenbrock Solver milestone Aug 21, 2024
@codecov-commenter

codecov-commenter commented Aug 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.51%. Comparing base (649cb0e) to head (1a2a1f5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #626   +/-   ##
=======================================
  Coverage   93.51%   93.51%           
=======================================
  Files          49       49           
  Lines        3502     3502           
=======================================
  Hits         3275     3275           
  Misses        227      227           


@K20shores
Collaborator

@sjsprecious GCC 11 was removed from GitHub Actions. Would you mind removing gcc11 from the Mac workflow file and adding gcc14 as part of this PR? That will fix the failing test.

Collaborator

@K20shores K20shores left a comment


If you want to use the same function, you'd need to move the contents out of both matrices into a separate function and then call that function. However, you'd still have two functions called Fill, one on each matrix, that just pass through to the shared function. I think what we have is good enough as is; if it were much larger, we'd want a different solution, though.
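For reference, the pass-through arrangement described in this comment could look roughly like the sketch below. It is a host-only illustration, not the MICM code: `FillBuffer`, `DenseMatrixSketch`, and `SparseMatrixSketch` are hypothetical names, and a `std::vector` stands in for the device storage that the real CudaDenseMatrix and CudaSparseMatrix manage.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical shared helper: the single place where the fill logic lives.
// In the real CUDA matrices this would operate on device memory instead.
template<typename T>
void FillBuffer(std::vector<T>& data, T value)
{
  std::fill(data.begin(), data.end(), value);
}

// Hypothetical stand-in for a dense matrix type.
class DenseMatrixSketch
{
 public:
  DenseMatrixSketch(std::size_t rows, std::size_t cols)
      : data_(rows * cols, 0.0)
  {
  }

  // Thin pass-through: each class still needs its own Fill member.
  void Fill(double value)
  {
    FillBuffer(data_, value);
  }

  const std::vector<double>& AsVector() const
  {
    return data_;
  }

 private:
  std::vector<double> data_;
};

// Hypothetical stand-in for a sparse matrix type that stores only non-zeros.
class SparseMatrixSketch
{
 public:
  explicit SparseMatrixSketch(std::size_t non_zeros)
      : data_(non_zeros, 0.0)
  {
  }

  // Same pass-through as the dense version.
  void Fill(double value)
  {
    FillBuffer(data_, value);
  }

  const std::vector<double>& AsVector() const
  {
    return data_;
  }

 private:
  std::vector<double> data_;
};
```

The sketch shows the trade-off raised above: the fill logic is shared, but both classes keep a one-line Fill wrapper, so the gain is small when the function body is already short.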

@sjsprecious
Collaborator Author

Thanks @K20shores for your comments. I just updated the CI test on Mac to use gcc14 instead of gcc11.

Collaborator

@mattldawson mattldawson left a comment


looks great!

I'm not sure off-hand of a way to merge the two Fill functions in a way that wouldn't make the code more difficult to read/maintain, but maybe you could create a discussion for it if you think there is a good way to do this?

@sjsprecious sjsprecious merged commit 2062687 into main Aug 21, 2024
29 checks passed
@sjsprecious sjsprecious deleted the fix_fill_func_for_cuda_matrix branch August 21, 2024 20:32
K20shores added a commit that referenced this pull request Sep 26, 2024
* Add CUDA Rosenbrock tests (#579)

* add sync functions to state variable
add cuda rosenbrock tests

* fix all the compilation errors
analytical tests do not work for CUDA rosenbrock

* fix call to the base class function;
bug fix for CuLudecompose and add
singularity check

* fix the compilation error for CUDA decomposition class

* remove unnecessary calls to the base class functions

* fix all the compilation errors

* add crtp to allow calls to function from either base or derived class

* fix more compilation errors about abstract rosenbrock solver
now the cuda test passes for Troe case

* add lambda functions as arguments for CPU/JIT/CUDA tests

* initialize Yerror on the GPU every time and pass all the analytical
tests

* turn off the cuda memory check for the integration tests

* revert back to the original process class

* clean up unused header

* update JIT test interface

* extend state class to cudastate class

* remove unnecessary cuda device sync

* add cuda state class and address compilation errors

* fix broken CI tests

* more bug fix for CI tests

* fix the compiler warning for cuda code

* more fix for broken CI tests

* resolve the cuda compiler warnings

* address Matt's PR comments

---------

Co-authored-by: Jian Sun <[email protected]>

* Auto-format code changes (#586)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Use Fill to reset the L and U matrices in Rosenbrock solve (#588)

use Fill in Rosenbrock solve

* In-place linear solve (#585)

* removing condensing x and b in nonvectorizable matrix code for linear solve

* adding alias back

* adding back comment

* spacing

* adding back comment

* moving comment

* vectorize version no longer segfaults but something is wrong

* vectorized passes

* removing b from jit linear solver

* removing b from cuda linear solver

* using function pointer alias

* adding a comment

* fix conflict resolve typo

---------

Co-authored-by: Jian Sun <[email protected]>
Co-authored-by: Jian Sun <[email protected]>

* Auto-format code changes (#589)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* 498 mimic camchem substep convergence failure integration acceptance (#582)

* trying to continue on with current solution

* mimicking camchem

* testing backward euler against hires, e5

* updating citations

* oregonator is too stiff for backward euler

* addressing PR comments

* collecting solver stats

* Update include/micm/solver/backward_euler.inl

Co-authored-by: Matt Dawson <[email protected]>

* removing backward euler for oregonator test

* removing cerr in favor of a solver state

---------

Co-authored-by: Matt Dawson <[email protected]>

* Auto-format code changes (#590)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* 304 reorganize include folder (#591)

* reorganizing files

* correcting cuda imports

* Auto-format code changes (#592)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* 577 test all parameter types of the dense matrix cpu rosenbrock on the analytical policy tests (#593)

Converts HIRES, Oregonator, E5 to chemical equations so that they can be tested on the GPU

All analytical tests are tested with CPU and GPU rosenbrock. Backward euler as well (except oregonator). Renaming to match naming schemes for test files

* Auto-format code changes (#597)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Fix GPU memory leak for the CUDA unit tests (#600)

* fix most GPU memory leak

* allocate a device pointer in the device struct

* remove unused cuda mem copy

* use swap in the move constructor and assignment of CUDA class
initialize the null pointer in the struct definition
pass the cuda memory check for all the unit tests

* remove unnecessary nullptr

* fix the broken CI tests

* more bug fixes

---------

Co-authored-by: Jian Sun <[email protected]>

* Auto-format code changes (#601)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Backward Euler with vectorizable matrix types (#596)

* starting to test all solver parameter types

* saving progress

* saving progress

* testing all stages analytically

* updating all interfaces

* correcting cuda build I hope

* testing jit against hires, e5, oregonator

* adding cuda solver builder test

* removing hires, e5, oregonator from cuda tests; they need their own kernels

* testing e5 from a configuration

* testing e5 jit integration

* testing e5 properly

* removing reset of L and U matrices (#594)

* oregonator from a configuration

* renaming things

* using different tolerances?

* moving state onto and off of host

* saving gpu changes

* updating cuda tests

* adding some better tolerances for cuda tests

* adding different tolerances for e5

* adding citation to e5

* thing

* formed hires equations

* using passing tolerances for cpu tests

* jit tolerances

* backward euler tests

* configuration for hires

* add AddToDiagonal function on sparse matrix

* use ForEach in Backward Euler

* add convergence check function to backward euler

* fix merge problems

* add vector matrix to analytical solver tests

* update JIT analytical tests

* set up general use analytical test function

* add general function for stiff analytical tests

* fix jit analytical tests

* update remaining analytical tests

* address review comments

* update cuda analytical tests

* update tolerances for cuda analytical tests

---------

Co-authored-by: Kyle Shores <[email protected]>
Co-authored-by: Kyle Shores <[email protected]>

* Auto-format code changes (#605)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* 572 check for singularity when the solver parameters flag is turned on (#603)

Add tests to check for singularity in the U matrix after the LU decomposition. If the check for singularity flag is turned on, decrease the timestep and try again. Fixes a bug where a zero in the bottom right of the U matrix would not have been detected

* Auto-format code changes (#606)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Provide a way to access the processes_ data member (#607)

return the processes_ member

Co-authored-by: Jian Sun <[email protected]>

* Auto-format code changes (#608)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* adding headers

* Auto-format code changes (#609)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Add missing CUDA tests and fix broken path (#611)

add missing cuda tests and fix broken path

* throwing error on mismatched size (#610)

* throwing error on mismatched size

* using a copy of the parameters so that a builder can be repeatedly used

* adding const

* correcting number of tolerances for robertson

* Auto-format code changes (#612)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Correct usage of third body species (#614)

using the species map to grab the exact same species for reactants and products

* Auto-format code changes (#615)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* correcting solver builder constructor (#616)

* correcting solver builder constructor

* fix a bug

---------

Co-authored-by: Jiwon Gim <[email protected]>

* Relax the criteria to pass the GPU test with nvhpc/24.7 on Derecho (#618)

relax the criteria to pass the GPU test with nvhpc/24.7 on Derecho

* Auto-format code changes (#623)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Update fill function for CUDA matrix (#626)

* update the fill function for cuda matrix to avoid data transfer

* fix compilation errors

* add a comment about template function

* update fill function for cuda sparse matrix

* remove gcc11 CI test and add gcc14 CI test

* Auto-format code changes (#627)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Remove data transfer in cuda matrix constructor and template some CUDA functions (#630)

* remove data transfer in the cuda dense matrix constructor

* template many cuda functions for cuda dense and sparse matrix

* Auto-format code changes (#633)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Remove redundant variable and optimize the copy assignment for the CUDA matrix (#636)

* test to remove forcing variables

* fix broken unit tests

* fix the bug of calculating forcing term when substepping happens

* update the copy assignment operator for CUDA matrix

* fix the broken unit tests again

* Remove local copy of state in solver functions (#639)

remove local copy of state in solver functions

* Auto-format code changes (#640)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Add CUDA stream for asynchronous kernel launch (#641)

* add the functions to create & get cuda stream

* simplify the CUDA dense matrix destructor

* add cuda stream to cuda matrix functions

* add cuda stream to process_set.cu

* add cuda stream to CudaLuDecomposition

* add cuda stream to CudaLinearSolver

* set cuda stream in the cublas handle
add cuda stream to rosenbrock.cu

* switch to singleton class for cuda stream manager

* update the method to get the cuda stream

* revise the Gtest main function to cleanup the CUDA resources explicitly

* fix broken cuda analytical test

* fix GPU memory leak in the unit test

* clean up unused files

* fix Kyle's review comment

* make cudamemset asynchronous

* Auto-format code changes (#645)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Remove the local copy of Jacobian matrix when doing LU decomposition (#646)

remove the local copy of Jacobian matrix in the LinFactor function

* Auto-format code changes (#647)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Add const to solver functions (#642)

add const to solver functions

* Replace json to yaml 619 (#649)

* replace

* json to yaml

* yaml to JSON

* test

* added .string to yaml file

* added string to loadFile

* changes based on the PR. modified the code to use YAML file

* Auto-format code changes (#650)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Add const qualifiers (#651)

add const qualifiers

* Move Yerror construction outside of the inner solve loop for rosenbrock (#652)

* added error outside of the loop

* moved the code to all the way to outer while loop

* Auto-format code changes (#653)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Move temporary variables to the State class (#655)

* add temporary variables in the solver class

* declare temporary variable in the State class; initialize temporary variable in the solver

* fix broken unit test build

* rename base class for temporary variables

* make destructor of base class virtual so that the GPU memory is freed correctly

* remove unnecessary data member from the solver class

* add the copy assignment and constructor for the state class

* add JIT rosenbrock parameter type

* maybe this fixes the broken JIT tests

* try is_convertible instead

* Auto-format code changes (#656)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Use CUDA Rosenbrock parameters (#659)

* use cuda rosenbrock parameters instead

* use 0 for fill function

* Added license and copyright (#661)

added copyright

* Auto-format code changes (#660)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* Misc updates (#665)

* add back the getnumberofreactions function

* update cuda thread count to 512

* Set LU matrices to zero when jacobian is a zero element (#666)

* pushing

* pushing fix

* removing unnecessary logic check

* adding cuda stuff

* lowering tolerance

* lowering tolerance

* modified jit ludecomp

* raising tolerance

* testing jit and cuda properly

* raising tolerance

* raising again

* again

* raising again

* lowering tolerance

* adding prints to matrices

* copy LU to host

* printing A

* sparsity

* bernoulli again

* manual engine

* double

* thing

* printing values

* larger matrix

* 2 cells

* now

* dense

* 20

* 4000

* things

* uncomment

* uncomment

* print

* 9

* 8

* 6

* 5

* 2e-6

* lu decomp

* 10

* 8

* 0

* comment

* checking

* uncomment

* 7

* 1

* 10

* print

* print

* again

* 100

* data check

* remove check results

* 13

* 16

* eq

* equal

* uncomment

* 1 block

* 5

* print

* 1

* testing LU decomp specifically

* trying to correct cuda test

* lowering

* lowering tolerance

* lowering again

* thing

* variable

* all tests pass on derecho

* setting values to zero for lu decomp

* defaulting LU to 0 instead of 1e-30

* copying block values to other blocks

* removing small value initialization

* correcting version copyright

* using absolute error

* making index once

* camel case

* Auto-format code changes (#667)

Auto-format code using Clang-Format

Co-authored-by: GitHub Actions <[email protected]>

* bumping version

---------

Co-authored-by: Jian Sun <[email protected]>
Co-authored-by: Jian Sun <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Actions <[email protected]>
Co-authored-by: Matt Dawson <[email protected]>
Co-authored-by: Jiwon Gim <[email protected]>
Co-authored-by: Montek Thind <[email protected]>
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

The Fill function for the CUDA matrix is inefficient
4 participants