Query register capacity for segred and segscan codegen. #2057

Merged
merged 1 commit into from
Dec 7, 2023

Conversation

Member

@athas athas commented Dec 6, 2023

This is very tedious code, and required adding the notion of "kernel constant expressions", as we have some expressions that must be constant at kernel compilation time (which is at program runtime). We actually had this notion in the ImpCode representation, but now ImpGen provides some manual control as well.

@athas athas requested a review from sortraev December 6, 2023 22:07
Member Author

athas commented Dec 6, 2023

As I told you, it's not very interesting.

@athas athas force-pushed the query-registers branch 2 times, most recently from 2066150 to 5c6af29 Compare December 7, 2023 11:41
@athas athas added the run-benchmarks Makes GA run the benchmark suite. label Dec 7, 2023
@athas athas self-assigned this Dec 7, 2023
@athas athas merged commit 8a1502c into master Dec 7, 2023
24 checks passed
@athas athas deleted the query-registers branch December 7, 2023 15:47
Member Author

athas commented Dec 7, 2023

Looks like the accurate querying is actually detrimental to performance compared to the old hardcoded values.

Member Author

athas commented Dec 7, 2023

The old numbers correspond to pretending we have twice as much local memory available as is actually the case. Should we just multiply CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK by two? What is the logic here? @coancea, do you remember?
