In file included from ./src/main.cpp:8:
In file included from ./src/all_pairs.h:4:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/execution:34:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/pstl.hpp:31:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:39:
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:117:27: error: no matching function for call to object of type 'const (lambda at ./src/kernels.h:214:9)'
117 | f(*it);
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:114:25: note: while substituting into a lambda expression here
114 | [=](sycl::id<1> id) {
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:66:26: note: in instantiation of function template specialization 'hipsycl::algorithms::for_each_n<std::ranges::iota_view<unsigned int, unsigned int>::_Iterator, unsigned int, (lambda at ./src/kernels.h:214:9)>' requested here
66 | hipsycl::algorithms::for_each_n(queue, first, n, f);
| ^
./src/kernels.h:211:10: note: in instantiation of function template specialization 'std::for_each_n<std::ranges::iota_view<unsigned int, unsigned int>::_Iterator, unsigned int, (lambda at ./src/kernels.h:214:9)>' requested here
211 | std::for_each_n(
| ^
./src/barnes_hut.h:60:9: note: in instantiation of function template specialization 'barnes_hut_step<float, unsigned int>' requested here
60 | barnes_hut_step<T, Index_t>(system, arguments, tree, step == 0);
| ^
./src/main.cpp:52:51: note: in instantiation of function template specialization 'run_barnes_hut<float>' requested here
52 | return run_simulation<T>(arguments, system, run_barnes_hut<T>);
| ^
./src/main.cpp:65:9: note: in instantiation of function template specialization 'run_precision<float>' requested here
65 | run_precision<float>(arguments);
| ^
./src/kernels.h:214:9: note: candidate function template not viable: 'this' argument has type 'const (lambda at ./src/kernels.h:214:9)', but method is not marked const
214 | [tree] (auto tree_index) mutable { tree.clear(tree_index); });
| ^
In file included from <built-in>:3:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/detail/sycl_glue.hpp:40:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/util/allocation_cache.hpp:33:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/vector:62:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_algobase.h:66:
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:224:7: error: call to deleted function '__advance'
224 | std::__advance(__i, __d, std::__iterator_category(__i));
| ^~~~~~~~~~~~~~
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:65:10: note: in instantiation of function template specialization 'std::advance<std::ranges::iota_view<unsigned long>::_Iterator, unsigned int>' requested here
65 | std::advance(last, std::max(n, Size{0}));
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:63:20: note: while substituting into a lambda expression here
63 | auto offloader = [&](auto& queue) {
| ^
./src/all_pairs.h:52:14: note: in instantiation of function template specialization 'std::for_each_n<std::ranges::iota_view<unsigned long>::_Iterator, unsigned int, (lambda at ./src/all_pairs.h:55:13)>' requested here
52 | std::for_each_n(
| ^
./src/main.cpp:56:51: note: in instantiation of function template specialization 'run_all_pairs_collapsed_step<float>' requested here
56 | return run_simulation<T>(arguments, system, run_all_pairs_collapsed_step<T>);
| ^
./src/main.cpp:65:9: note: in instantiation of function template specialization 'run_precision<float>' requested here
65 | run_precision<float>(arguments);
| ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:202:5: note: candidate function [with _OutputIterator = std::ranges::iota_view<unsigned long>::_Iterator, _Distance = __int128] has been explicitly deleted
202 | __advance(_OutputIterator&, _Distance, output_iterator_tag) = delete;
| ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:157:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'input_iterator_tag' for 3rd argument
157 | __advance(_InputIterator& __i, _Distance __n, input_iterator_tag)
| ^ ~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:168:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'bidirectional_iterator_tag' for 3rd argument
168 | __advance(_BidirectionalIterator& __i, _Distance __n,
| ^
169 | bidirectional_iterator_tag)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:184:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'random_access_iterator_tag' for 3rd argument
184 | __advance(_RandomAccessIterator& __i, _Distance __n,
| ^
185 | random_access_iterator_tag)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:224:7: error: call to deleted function '__advance'
224 | std::__advance(__i, __d, std::__iterator_category(__i));
| ^~~~~~~~~~~~~~
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:116:32: note: in instantiation of function template specialization 'std::advance<std::ranges::iota_view<unsigned long>::_Iterator, unsigned long>' requested here
116 | std::advance(it, id[0]);
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:114:25: note: while substituting into a lambda expression here
114 | [=](sycl::id<1> id) {
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:66:26: note: in instantiation of function template specialization 'hipsycl::algorithms::for_each_n<std::ranges::iota_view<unsigned long>::_Iterator, unsigned int, (lambda at ./src/all_pairs.h:55:13)>' requested here
66 | hipsycl::algorithms::for_each_n(queue, first, n, f);
| ^
./src/all_pairs.h:52:14: note: in instantiation of function template specialization 'std::for_each_n<std::ranges::iota_view<unsigned long>::_Iterator, unsigned int, (lambda at ./src/all_pairs.h:55:13)>' requested here
52 | std::for_each_n(
| ^
./src/main.cpp:56:51: note: in instantiation of function template specialization 'run_all_pairs_collapsed_step<float>' requested here
56 | return run_simulation<T>(arguments, system, run_all_pairs_collapsed_step<T>);
| ^
./src/main.cpp:65:9: note: in instantiation of function template specialization 'run_precision<float>' requested here
65 | run_precision<float>(arguments);
| ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:202:5: note: candidate function [with _OutputIterator = std::ranges::iota_view<unsigned long>::_Iterator, _Distance = __int128] has been explicitly deleted
202 | __advance(_OutputIterator&, _Distance, output_iterator_tag) = delete;
| ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:157:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'input_iterator_tag' for 3rd argument
157 | __advance(_InputIterator& __i, _Distance __n, input_iterator_tag)
| ^ ~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:168:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'bidirectional_iterator_tag' for 3rd argument
168 | __advance(_BidirectionalIterator& __i, _Distance __n,
| ^
169 | bidirectional_iterator_tag)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_iterator_base_funcs.h:184:5: note: candidate function template not viable: no known conversion from 'typename iterator_traits<_Iterator>::iterator_category' (aka 'std::output_iterator_tag') to 'random_access_iterator_tag' for 3rd argument
184 | __advance(_RandomAccessIterator& __i, _Distance __n,
| ^
185 | random_access_iterator_tag)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ./src/main.cpp:8:
In file included from ./src/all_pairs.h:4:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/execution:34:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/pstl.hpp:31:
In file included from /opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:39:
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:117:27: error: no matching function for call to object of type 'const (lambda at ./src/kernels.h:214:9)'
117 | f(*it);
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/algorithms/algorithm.hpp:114:25: note: while substituting into a lambda expression here
114 | [=](sycl::id<1> id) {
| ^
/opt/adaptivecpp/bin/../include/AdaptiveCpp/hipSYCL/std/stdpar/pstl-impl/algorithm.hpp:66:26: note: in instantiation of function template specialization 'hipsycl::algorithms::for_each_n<std::ranges::iota_view<unsigned int, unsigned int>::_Iterator, unsigned int, (lambda at ./src/kernels.h:214:9)>' requested here
66 | hipsycl::algorithms::for_each_n(queue, first, n, f);
| ^
./src/kernels.h:211:10: note: in instantiation of function template specialization 'std::for_each_n<std::ranges::iota_view<unsigned int, unsigned int>::_Iterator, unsigned int, (lambda at ./src/kernels.h:214:9)>' requested here
211 | std::for_each_n(
| ^
./src/barnes_hut.h:60:9: note: in instantiation of function template specialization 'barnes_hut_step<double, unsigned int>' requested here
60 | barnes_hut_step<T, Index_t>(system, arguments, tree, step == 0);
| ^
./src/main.cpp:52:51: note: in instantiation of function template specialization 'run_barnes_hut<double>' requested here
52 | return run_simulation<T>(arguments, system, run_barnes_hut<T>);
| ^
./src/main.cpp:67:9: note: in instantiation of function template specialization 'run_precision<double>' requested here
67 | run_precision<double>(arguments);
| ^
./src/kernels.h:214:9: note: candidate function template not viable: 'this' argument has type 'const (lambda at ./src/kernels.h:214:9)', but method is not marked const
214 | [tree] (auto tree_index) mutable { tree.clear(tree_index); });
| ^
1 warning and 4 errors generated.
Note that there is some duplication of errors due to float and double template instantiation.
Old commit
The code works with a slight modification on the older commit below. Without the modification, it compiles but does not run.
Barnes-Hut commit: 6ee6112
Changing the last atomic in `clear_tree` in `src/kernels.h` from `memory_order_release` to `memory_order_relaxed` fixes this issue.
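The following sketch illustrates what such a change looks like. It assumes, hypothetically, that `clear_tree` resets per-node child slots through atomic stores. `ChildSlots`, `kEmpty` and `clear_node` are illustrative names, not the project's code. A relaxed store is still atomic, but it no longer orders earlier writes before it. That is typically acceptable when no other thread reads the slots until the parallel algorithm has returned.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel meaning "no child"; not taken from src/kernels.h.
constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;

// Hypothetical per-node child slots, stored as atomics so that concurrent
// tree-building threads could claim them with compare-exchange.
using ChildSlots = std::vector<std::atomic<std::uint32_t>>;

// Sketch of one "clear" element operation for node `node`.
inline void clear_node(ChildSlots& slots, std::size_t node,
                       std::size_t children_per_node) {
    assert(children_per_node > 0);
    const std::size_t first = node * children_per_node;
    assert(first + children_per_node <= slots.size());
    for (std::size_t c = 0; c + 1 < children_per_node; ++c)
        slots[first + c].store(kEmpty, std::memory_order_relaxed);
    // Last store: previously memory_order_release, now memory_order_relaxed.
    // Still atomic, but imposes no ordering on surrounding memory accesses.
    slots[first + children_per_node - 1].store(kEmpty,
                                               std::memory_order_relaxed);
}
```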
When running it, we can see that the force calculation is parallelised, but the tree build and mass calculation are not.
$ ./nbody_acpp -s 5 -n 1000000 --print-info
Starting simulation
Tree init complete
Timings:
- Build Tree 559.06 ms
- Calc mass 637.86 ms
- Calc force 914.19 ms
- Calc acceleration 3.73 ms
Tree size: 2883061
Total mass: 1.00000
Timings:
- Build Tree 495.41 ms
- Calc mass 538.83 ms
- Calc force 911.42 ms
- Calc acceleration 0.39 ms
Tree size: 2882085
Total mass: 1.00000
Timings:
- Build Tree 483.94 ms
- Calc mass 535.91 ms
- Calc force 912.28 ms
- Calc acceleration 0.39 ms
Tree size: 2885153
Total mass: 1.00000
Timings:
- Build Tree 542.24 ms
- Calc mass 536.71 ms
- Calc force 910.77 ms
- Calc acceleration 0.39 ms
Tree size: 2884069
Total mass: 1.00000
Timings:
- Build Tree 487.63 ms
- Calc mass 544.22 ms
- Calc force 911.68 ms
- Calc acceleration 0.46 ms
Tree size: 2885065
Total mass: 1.00000
Done simulation
Total time: 9955.12 ms
> The code can work with a slight modification on the first commit.
The problem is most likely that AdaptiveCpp does not currently support mutable lambdas, because SYCL does not allow them. I'm not sure this is something I can fix easily.
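The error log shows the mechanism. AdaptiveCpp's `for_each_n` invokes the function object as `const` ("'this' argument has type 'const (lambda ...)'"), and the call operator of a `mutable` lambda is not `const`. The following sketches one possible workaround. It assumes that `tree` in `kernels.h` is a cheap, pointer-holding handle, as objects captured by value into device kernels usually are. Under that assumption, its `clear` member can be marked `const` and `mutable` dropped. `Tree`, `TreeView`, `launch` and `clear_all` are hypothetical names, and `launch` only mimics the const invocation on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical owning tree: clear() is non-const, so a lambda that captures
// it by value and calls clear() has to be `mutable`.
struct Tree {
    std::vector<int> nodes;
    void clear(std::size_t i) { nodes[i] = -1; }
};

// Hypothetical non-owning handle: clear() only writes through the pointer,
// so it can be marked const without changing what it does.
struct TreeView {
    int* nodes;
    void clear(std::size_t i) const {
        assert(nodes != nullptr);
        nodes[i] = -1;
    }
};

// Host-only stand-in for how the offload backend calls the function object:
// through a const reference, so operator() must be const-callable.
template <typename F>
void launch(const F& f, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

inline void clear_all(Tree& tree) {
    TreeView view{tree.nodes.data()};
    // Captured by value, no `mutable` needed: TreeView::clear is const.
    launch([view](std::size_t i) { view.clear(i); }, tree.nodes.size());
    // By contrast, this would not compile, because the call operator of a
    // mutable lambda is non-const (the same error as in kernels.h):
    //   launch([tree](std::size_t i) mutable { tree.clear(i); },
    //          tree.nodes.size());
}
```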
> The code compiles but does not run.
Is this with AdaptiveCpp/AdaptiveCpp#1481 merged in? The output indicates that `std::atomic` calls were not correctly remapped to AdaptiveCpp builtins, as they should be.
Even with that PR, atomics with orderings other than relaxed might not be handled correctly, because LLVM does not handle those. I need to plug in the atomic mappings as PTX inline assembly that @gonzalobg has provided.
EDIT: Support for the needed atomic orderings (but only for the specific operations that we need) is in AdaptiveCpp/AdaptiveCpp#1529
> we can see that the force is parallelised but build tree and calc mass are not.
This is likely because you don't have AdaptiveCpp/AdaptiveCpp#1518, which enables `std::execution::par` offloading if the hardware provides the necessary forward progress guarantees (in practice, only recent NVIDIA GPUs). Without that PR, only `std::execution::par_unseq` is offloaded.
Current Situation of compiling Barnes-Hut with AdaptiveCpp
Latest commit
Compilation:
AdaptiveCpp commit: b15cdcfe355be6a5f79d70a0703e67fe0afaa363 (Wed Jun 19 17:53:10 2024 +0200)
Barnes-Hut commit: 58fe163 (Thu Jun 27 11:13:05 2024 +0100)
Error: see the compiler output at the top of this page.
Other things
During this process I also came across this bug: AdaptiveCpp/AdaptiveCpp#1087