I ran some benchmark tests to compare the performance of TorchSharp and PyTorch; both use libtorch 2.2.1 + CUDA 12.1. I noticed that TorchSharp is slower than PyTorch for most operators. Below are the benchmark results.
TorchSharp

PyTorch

Observation
I can achieve comparable results between TorchSharp and PyTorch if I replace each operator with its in-place version. Performance also becomes much better if I explicitly dispose the current session during each test.

For example, in the add benchmark, TorchSharp runs nearly as fast as PyTorch if I use `tensor.add_` instead of `tensor.add`.

Considering that the major difference between an operator and its in-place counterpart is that the in-place version does not create a new Tensor object, the main overhead likely lies in the Tensor constructor.
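The "construction overhead" hypothesis can be illustrated without libtorch at all. Below is a rough stdlib-only Python analogy (names and sizes are arbitrary, for illustration): one operation must construct a fresh result object on every call, the other writes into an existing buffer, mirroring the `add` vs `add_` distinction.

```python
import time

# Compare an allocating operation with an in-place one on plain byte buffers.
N, REPEAT = 100_000, 200
a = bytearray(N)          # mutable destination buffer
b = bytes(N)              # source data

def bench(fn):
    start = time.perf_counter()
    for _ in range(REPEAT):
        fn()
    return time.perf_counter() - start

# Out-of-place: bytes(a) allocates a fresh immutable copy on each iteration.
t_alloc = bench(lambda: bytes(a))

# In-place: slice assignment writes into the existing bytearray; no new object.
def copy_in_place():
    a[:] = b

t_inplace = bench(copy_in_place)

print(f"allocating: {t_alloc:.4f}s, in-place: {t_inplace:.4f}s")
```

Both variants move the same number of bytes; any gap between the two timings is object-construction overhead, which is the same kind of cost the in-place TorchSharp operators avoid.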
Source code
```csharp
using TorchSharp;

// Initialize CUDA device
var device = torch.CUDA;
var repeatTime = 10000;

// Test randn
var startTime = DateTime.Now;
for (int i = 0; i < repeatTime; i++)
{
    var _ = torch.randn(new long[] { 1000, 1000 }, device: device);
}
Console.WriteLine("Time taken for randn: " + (DateTime.Now - startTime).TotalSeconds);

// Test matmul
startTime = DateTime.Now;
var a = torch.randn(new long[] { 1000, 1000 }, device: device);
var b = torch.randn(new long[] { 1000, 1000 }, device: device);
for (int i = 0; i < repeatTime; i++)
{
    var c = torch.matmul(a, b);
}
Console.WriteLine("Time taken for matmul: " + (DateTime.Now - startTime).TotalSeconds);

// Test concat
startTime = DateTime.Now;
a = torch.randn(new long[] { 1000, 1000 }, device: device);
b = torch.randn(new long[] { 1000, 1000 }, device: device);
for (int i = 0; i < repeatTime; i++)
{
    var c = torch.cat(new[] { a, b }, 0);
}
Console.WriteLine("Time taken for concat: " + (DateTime.Now - startTime).TotalSeconds);

// Test slice
startTime = DateTime.Now;
a = torch.randn(new long[] { 1000, 1000 }, device: device);
for (int i = 0; i < repeatTime; i++)
{
    var c = a[.., 0..500];
}
Console.WriteLine("Time taken for slice: " + (DateTime.Now - startTime).TotalSeconds);

// Test add
startTime = DateTime.Now;
a = torch.randn(new long[] { 1000, 1000 }, device: device);
b = torch.randn(new long[] { 1000, 1000 }, device: device);
for (int i = 0; i < repeatTime; i++)
{
    var c = a + b;
}
Console.WriteLine("Time taken for add: " + (DateTime.Now - startTime).TotalSeconds);
```
```python
# benchmarks for pytorch on cuda
import torch
import time

repeat = 10000

# test randn
start_time = time.time()
for _ in range(repeat):
    a = torch.randn(1000, 1000).cuda()
print("Time taken for randn: ", time.time() - start_time)

# test matmul
start_time = time.time()
a = torch.randn(1000, 1000).cuda()
b = torch.randn(1000, 1000).cuda()
for _ in range(repeat):
    c = torch.matmul(a, b)
print("Time taken for matmul: ", time.time() - start_time)

# test concat
start_time = time.time()
a = torch.randn(1000, 1000).cuda()
b = torch.randn(1000, 1000).cuda()
for _ in range(repeat):
    c = torch.cat((a, b), 0)
print("Time taken for concat: ", time.time() - start_time)

# test slice
start_time = time.time()
a = torch.randn(1000, 1000).cuda()
for _ in range(repeat):
    c = a[:, 0:500]
print("Time taken for slice: ", time.time() - start_time)

# test add
start_time = time.time()
a = torch.randn(1000, 1000).cuda()
b = torch.randn(1000, 1000).cuda()
for _ in range(repeat):
    c = a + b
print("Time taken for add: ", time.time() - start_time)
```
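One methodological caveat that applies to both scripts above: CUDA kernels launch asynchronously, so a timing loop that never synchronizes mostly measures per-call launch and wrapper overhead rather than kernel execution time (which is also exactly where a managed-wrapper cost like TorchSharp's would show up most). A hedged adjustment to the add benchmark, assuming `torch` is installed and falling back to CPU when CUDA is unavailable:

```python
import time
import torch

# Sketch only: the add benchmark with device synchronization before timing stops.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)

start = time.time()
for _ in range(1000):
    c = a + b
if device == "cuda":
    torch.cuda.synchronize()  # wait for all queued kernels to finish
print("Time taken for add (synchronized):", time.time() - start)
```

With synchronization, the gap between the two runtimes reflects per-call overhead plus actual compute, which makes the TorchSharp-vs-PyTorch comparison easier to interpret.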