
A repository archiving that time I unwisely tried to write a one-layer FCNet on the GPU with just Vulkan.


allie-m/vulkan-fcnet


One-Layer Fully Connected Net with Vulkan

This was a mistake. Getting this working before the symposium ate up so much of my time. At least my Machine Learning teacher liked it lol.

This binary crate implements a one-layer FC net from scratch in Rust, on both the CPU and the GPU.

cargo run --release [cpu/gpu]

Both implementations are very slow in debug mode; run them in release mode for the most accurate benchmark.
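For orientation, here is a minimal CPU sketch of the kind of work a one-layer FC net does per iteration: score, loss, gradient, and weight update. It is not the code from this crate. The softmax cross-entropy loss, the plain SGD update, the row-major weight layout, and the names `FcNet` and `train_step` are all assumptions made for illustration.

```rust
/// One-layer fully connected classifier: scores = x * W + b.
/// Hypothetical layout: `w` is row-major with shape [inputs][classes].
struct FcNet {
    inputs: usize,
    classes: usize,
    w: Vec<f32>,
    b: Vec<f32>,
}

impl FcNet {
    fn new(inputs: usize, classes: usize) -> Self {
        FcNet { inputs, classes, w: vec![0.0; inputs * classes], b: vec![0.0; classes] }
    }

    /// Runs one SGD step on a batch and returns the mean loss before the update.
    /// `xs` holds batch * inputs values; `labels` holds one class index per sample.
    fn train_step(&mut self, xs: &[f32], labels: &[usize], lr: f32) -> f32 {
        let batch = labels.len();
        let mut dw = vec![0.0f32; self.w.len()];
        let mut db = vec![0.0f32; self.classes];
        let mut loss = 0.0f32;

        for (x, &label) in xs.chunks_exact(self.inputs).zip(labels) {
            // Score: one dot product per class, starting from the bias.
            let mut scores = self.b.clone();
            for (i, &xi) in x.iter().enumerate() {
                for c in 0..self.classes {
                    scores[c] += xi * self.w[i * self.classes + c];
                }
            }
            // Loss (assumed): softmax cross-entropy, shifted by the max score for stability.
            let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            loss -= (exps[label] / sum).ln();
            // Gradient with respect to the scores is softmax(scores) - one_hot(label).
            for c in 0..self.classes {
                let g = exps[c] / sum - if c == label { 1.0 } else { 0.0 };
                db[c] += g;
                for (i, &xi) in x.iter().enumerate() {
                    dw[i * self.classes + c] += xi * g;
                }
            }
        }

        // Update: average the gradients over the batch and take one SGD step.
        let scale = lr / batch as f32;
        for (w, g) in self.w.iter_mut().zip(&dw) {
            *w -= scale * g;
        }
        for (b, g) in self.b.iter_mut().zip(&db) {
            *b -= scale * g;
        }
        loss / batch as f32
    }
}
```

Calling `train_step` repeatedly on the same small batch should drive the returned loss down, which is a quick sanity check for a sketch like this.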

On my home machine, I need to use vulkan-radeon instead of amdvlk because amdvlk does not support the required feature (shader_shared_float32_atomic_add). Your Vulkan implementation also needs to support it if you want to run this terrible program.
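As a rough CPU analogy for what that feature provides: it lets shader invocations atomically add 32-bit floats in workgroup-shared memory. Presumably that is how partial results get accumulated here, though that is an assumption about the shader rather than something stated above. Where no native float atomic add exists, the usual fallback is a compare-and-swap loop over the float's bit pattern, sketched below with hypothetical names (`atomic_add_f32`, `accumulate`).

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Adds `value` to the f32 stored bit-for-bit inside `cell`, retrying on contention.
fn atomic_add_f32(cell: &AtomicU32, value: f32) {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        let next = (f32::from_bits(current) + value).to_bits();
        match cell.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return,
            // Another thread won the race: retry with the value it stored.
            Err(observed) => current = observed,
        }
    }
}

/// Hypothetical demo: `threads` workers each add 1.0 `adds_per_thread` times.
fn accumulate(threads: usize, adds_per_thread: usize) -> f32 {
    let cell = Arc::new(AtomicU32::new(0.0f32.to_bits()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cell = Arc::clone(&cell);
            thread::spawn(move || {
                for _ in 0..adds_per_thread {
                    atomic_add_f32(&cell, 1.0);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    f32::from_bits(cell.load(Ordering::Acquire))
}
```

A native atomic add does the same thing in one hardware operation, without the retry loop.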

The commands below run each implementation:

cargo run --release cpu

AMD_VULKAN_ICD=RADV cargo run --release gpu

(AMD_VULKAN_ICD=RADV is just for me, unless you too want to use radv instead of amdvlk on your Linux machine when running this.)

TODO: local PyTorch CPU/GPU benchmarks

Results

(Results are rounded and approximate, collected on an R5 3600/RX 5700XT Linux machine.)

Batch size: 256

CPU

  • Initialization time: 100us
  • Rng/batching time: 520us
  • Score+loss+gradient+backprop: 250us
  • Per-iteration time: 800us
  • Total time (5000 iterations): 4.08s

GPU

  • Initialization time: 45ms
  • Rng/batching time: 520us
  • Batch load time: 150us
  • Submit time: 10us
  • Score+loss+gradient+backprop: 700us
  • Finalize time: 9ms
  • Per-iteration time: 830us
  • Total time (5000 iterations): 4.21s

Batch size: 512

CPU

  • Initialization time: 176us
  • Rng/batching time: 630us
  • Score+loss+gradient+backprop: 490us
  • Per-iteration time: 1.1ms
  • Total time (5000 iterations): 5.72s

GPU

  • Initialization time: 44ms
  • Rng/batching time: 630us
  • Batch load time: 10us
  • Submit time: 160us
  • Score+loss+gradient+backprop: 930us
  • Finalize time: 9ms
  • Per-iteration time: 1.1ms
  • Total time (5000 iterations): 5.46s
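A per-phase breakdown like the lists above can be collected by wrapping each phase in `std::time::Instant` and accumulating the elapsed time. The sketch below shows one way to do that. The `PhaseTimes` struct, its field names, and the placeholder workload are hypothetical and are not taken from this crate.

```rust
use std::time::{Duration, Instant};

/// Accumulated time per phase; the field names are illustrative only.
#[derive(Default)]
struct PhaseTimes {
    batching: Duration,
    step: Duration,
    iterations: u32,
}

impl PhaseTimes {
    /// Runs `f`, adds its elapsed time to `slot`, and passes its result through.
    fn time<T>(slot: &mut Duration, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *slot += start.elapsed();
        out
    }

    /// Mean time per iteration across all measured phases.
    fn per_iteration(&self) -> Duration {
        (self.batching + self.step) / self.iterations.max(1)
    }
}

/// Times a placeholder workload standing in for batching and the training step.
fn benchmark(iterations: u32) -> PhaseTimes {
    let mut times = PhaseTimes::default();
    let mut sink = 0u64;
    for i in 0..iterations {
        let batch = PhaseTimes::time(&mut times.batching, || {
            (0..256u64).map(|k| k + i as u64).collect::<Vec<u64>>()
        });
        sink += PhaseTimes::time(&mut times.step, || batch.iter().sum::<u64>());
        times.iterations += 1;
    }
    // Keep the optimizer from discarding the placeholder work in release mode.
    std::hint::black_box(sink);
    times
}
```

Averaging over many iterations, as the totals above do with 5000, smooths out timer noise on phases that only take microseconds.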
