tracing
Add the following flags to make.inc:
LIBS += -lnvToolsExt
NVCCFLAGS = -lineinfo
Call the nvtxRangePush/nvtxRangePop functions in slate/include/slate/internal/Trace.hh:
#include <nvToolsExt.h>
class Block {
public:
Block(const char* name)
: event_(name)
{ nvtxRangePush(name); }
~Block() { Trace::insert(event_); nvtxRangePop(); }
private:
Event event_;
};
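The Block class above ties an NVTX range to a C++ scope: the range opens in the constructor and closes in the destructor. As a minimal standalone sketch of that pattern (not SLATE's actual code), the guard below does the same thing. The names `ScopedRange`, `SKETCH_USE_NVTX`, `range_stack`, `range_log`, and `nested_example` are assumptions made for illustration. The fallback branch is a hypothetical stand-in that records push/pop events and mimics the 0-based nesting depth NVTX returns, so the example builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

#if defined(SKETCH_USE_NVTX)
    #include <nvToolsExt.h>   // real NVTX; link with -lnvToolsExt
#else
    // Hypothetical stand-in so the sketch builds without CUDA:
    // it records push/pop events instead of calling NVTX.
    inline std::vector<std::string>& range_stack() {
        static std::vector<std::string> stack;
        return stack;
    }
    inline std::vector<std::string>& range_log() {
        static std::vector<std::string> log;
        return log;
    }
    inline int nvtxRangePush(const char* name) {
        range_stack().push_back(name);
        range_log().push_back(std::string("push ") + name);
        return static_cast<int>(range_stack().size()) - 1;  // 0-based depth
    }
    inline int nvtxRangePop() {
        assert(!range_stack().empty());  // every pop needs a matching push
        range_log().push_back("pop " + range_stack().back());
        range_stack().pop_back();
        return static_cast<int>(range_stack().size());
    }
#endif

// Pushes a named range on construction and pops it on destruction,
// so the range always matches the enclosing C++ scope.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) { nvtxRangePush(name); }
    ~ScopedRange() { nvtxRangePop(); }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Nested scopes produce nested ranges: the inner range closes first.
inline void nested_example() {
    ScopedRange outer("gemm");
    {
        ScopedRange inner("gemm::tile");
    }   // inner range popped here
}       // outer range popped here
```

Because the pop happens in the destructor, a range cannot be left open by an early return, which is why the viewer shows ranges that line up with the traced scopes.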
Start and stop the profiler in the driver routine:
...
#include <cuda_profiler_api.h>
int main(){
...
cudaProfilerStart();
{
slate::trace::Block trace_block("gemm");
slate::gemm(alpha, A, B, beta, C);
}
cudaProfilerStop();
}
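The driver above starts and stops the profiler by hand, so an exception or early return between the two calls would skip cudaProfilerStop. One possible alternative, sketched here under assumptions, wraps the pair in a guard object. `ProfilerScope`, `SKETCH_USE_CUDA`, `run_region`, and the counting stand-ins are hypothetical names for illustration, not part of SLATE or CUDA. The stand-ins only let the sketch compile without the CUDA runtime.

```cpp
#include <stdexcept>

#if defined(SKETCH_USE_CUDA)
    #include <cuda_profiler_api.h>   // real cudaProfilerStart/cudaProfilerStop
#else
    // Hypothetical stand-ins so the sketch builds without CUDA:
    // they count calls instead of talking to a profiler.
    using cudaError_t = int;
    constexpr cudaError_t cudaSuccess = 0;
    inline int& profiler_starts() { static int n = 0; return n; }
    inline int& profiler_stops()  { static int n = 0; return n; }
    inline cudaError_t cudaProfilerStart() { ++profiler_starts(); return cudaSuccess; }
    inline cudaError_t cudaProfilerStop()  { ++profiler_stops();  return cudaSuccess; }
#endif

// Starts profiling on construction and stops it on destruction,
// including when the scope is left through an exception.
class ProfilerScope {
public:
    ProfilerScope()  { cudaProfilerStart(); }
    ~ProfilerScope() { cudaProfilerStop(); }
    ProfilerScope(const ProfilerScope&) = delete;
    ProfilerScope& operator=(const ProfilerScope&) = delete;
};

// Stand-in for the profiled region, e.g. the slate::gemm call in the driver.
inline void run_region(bool fail) {
    ProfilerScope profile;
    if (fail)
        throw std::runtime_error("region failed");
    // ... work to be profiled ...
}
```

With nvprof, the `--profile-from-start off` option delays data collection until cudaProfilerStart is called, so only the guarded region is recorded.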
The NVIDIA Visual Profiler (NVVP) can be used to view traces.
Profile the code using the command-line tool nvprof, found in the CUDA development kit:
nvprof -f -o ../dgeqrf-dim1000-nb1000-ib1000.nvvp --profile-from-start off ./test/tester --origin d --target d --type d --lookahead 1 --dim 1000 --ib 1000 --ref n --check y --nb 1000 --repeat 1 geqrf
nvprof will generate an .nvvp file at the path given to -o. Open this file using NVVP.
NVIDIA Nsight Systems can be used to view traces instead of the older NVVP.
You can download Nsight Systems from: https://developer.nvidia.com/gameworksdownload#?tx=$gameworks,developer_tools
Profile the code using the command-line tool nsys, found in the CUDA development kit:
nsys profile --stats=true --gpu-metrics-device=all ./tester --origin h --target d --type d --dim 1000 --ref n --check y --nb 200 --repeat 1 geqrf
The available nsys options may depend on the CUDA version.
nsys will generate a .qdrep file. Open this file using Nsight Systems. Newer versions of CUDA and nsys produce .nsys-rep files instead of .qdrep files; these .nsys-rep files require a newer version of Nsight Systems.