### Named regions using the NVIDIA NVTX library ###

Add the following flags to `make.inc`:

```makefile
LIBS      += -lnvToolsExt
NVCCFLAGS += -lineinfo
```

Add `nvtxRangePush` and `nvtxRangePop` calls to the `Block` class in `slate/include/slate/internal/Trace.hh`. With these calls, every traced block also opens and closes an NVTX range:

```C++
#include <nvToolsExt.h>

class Block {
public:
    Block(const char* name)
        : event_(name)
    {
        nvtxRangePush(name);   // open an NVTX range named after the block
    }

    ~Block()
    {
        Trace::insert(event_);
        nvtxRangePop();        // close the range opened in the constructor
    }

private:
    Event event_;
};
```

Start and stop the profiler in the driver routine:

```C++
...
#include <cuda_profiler_api.h>

int main()
{
    ...
    cudaProfilerStart();
    {
        // The NVTX range "gemm" spans the lifetime of trace_block.
        slate::trace::Block trace_block("gemm");
        slate::gemm(alpha, A, B, beta, C);
    }
    cudaProfilerStop();
}
```

### NVIDIA Visual Profiler (NVVP) ###

NVVP can be used to view the traces. Profile the code with the command-line tool `nvprof`, which is included in the CUDA Toolkit. The driver calls `cudaProfilerStart`, so pass `--profile-from-start off`. This limits profiling to the region between `cudaProfilerStart` and `cudaProfilerStop`:

```shell
nvprof -f -o ../dgeqrf-dim1000-nb1000-ib1000.nvvp --profile-from-start off \
    ./test/tester --origin d --target d --type d --lookahead 1 --dim 1000 \
    --ib 1000 --ref n --check y --nb 1000 --repeat 1 geqrf
```

nvprof generates an `.nvvp` file. Open this file in NVVP.

### NVIDIA Nsight Systems ###

NVIDIA Nsight Systems can be used to view the traces instead of the older NVVP. Nsight Systems can be downloaded from: https://developer.nvidia.com/gameworksdownload#?tx=$gameworks,developer_tools

Profile the code with the command-line tool `nsys`, which is included in the CUDA Toolkit:

```shell
nsys profile --stats=true --gpu-metrics-device=all \
    ./tester --origin h --target d --type d --dim 1000 --ref n --check y \
    --nb 200 --repeat 1 geqrf
```

The available `nsys` options may depend on the CUDA version.

nsys generates a `.qdrep` file. Open this file in Nsight Systems. Newer versions of CUDA and nsys produce `.nsys-rep` files instead of `.qdrep` files. Opening these files requires a correspondingly recent version of Nsight Systems.