ROCm and Vulkan seems like doesn't work #2810

VladislavNekto · 2024-08-08T11:06:07Z

Describe the bug
Tabby ignores the --device setting and always runs on the CPU.

Information about your version

ldd target/debug/tabby
        linux-vdso.so.1 (0x00007fff91979000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f1a17ed9000)
        libssl.so.3 => /lib64/libssl.so.3 (0x00007f1a1195d000)
        libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007f1a11400000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f1a17eb5000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1a1131f000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1a1113d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1a17f24000)

./target/debug/tabby --version
tabby 0.14.0

I am on tag 0.14.0, with the commit adding Elixir support cherry-picked on top.

Information about your GPU

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 3600X 6-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 3600X 6-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32772672(0x1f41240) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32772672(0x1f41240) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32772672(0x1f41240) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1102                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 7600                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 29824(0x7480)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2250                               
  BDFID:                   3072                               
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 102                                
  SDMA engine uCode::      17                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1102         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

Additional context
There are no errors at all. I also tried running it with Vulkan, and the same thing happens.


VladislavNekto · 2024-08-08T11:43:31Z

The Docker container with ROCm works, but that is version 0.11.1.

wsxiaoys · 2024-08-08T18:36:18Z

Hi, could you try the binary distribution (Vulkan) available at https://github.com/TabbyML/tabby/releases/tag/v0.14.0 and let me know if it works for you? If not, could you please provide the following information:

- The command you used to start Tabby.
- The relevant log output with the RUST_LOG=debug environment variable enabled.

Thank you!

VladislavNekto · 2024-08-08T23:23:31Z

Without RUST_LOG=debug, it keeps printing these:

./tabby serve --device vulkan --model TabbyML/DeepseekCoder-6.7B --chat-model TabbyML/Qwen2-1.5B-Instruct
⠴     0.401 s   Starting...
2024-08-08T23:22:08.743215Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
2024-08-08T23:22:08.743249Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: warning: not compiled with GPU offload support, --gpu-layers option will be ignored
2024-08-08T23:22:08.743256Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: warning: see main README.md for information on enabling GPU BLAS support
⠇     1.442 s   Starting...
2024-08-08T23:22:09.848492Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
2024-08-08T23:22:09.848515Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: warning: not compiled with GPU offload support, --gpu-layers option will be ignored
2024-08-08T23:22:09.848520Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: warning: see main README.md for information on enabling GPU BLAS support
⠹     2.563 s   Starting...
2024-08-08T23:22:10.953654Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
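The key symptom in these lines is the "not compiled with GPU offload support" warning: the bundled llama-server was built without any GPU backend, so the --device choice has nothing to act on. As a hypothetical sketch (not Tabby's actual supervisor code; the names NO_GPU_WARNING and lacks_gpu_offload are made up for illustration), a check like this could scan llama-server's output for that warning and report it explicitly instead of retrying silently:

```rust
/// Warning text that llama-server prints (see the log above) when it was
/// built without a GPU backend.
const NO_GPU_WARNING: &str = "not compiled with GPU offload support";

/// Hypothetical helper: returns true if any output line shows that the
/// llama-server binary cannot offload layers to the GPU.
fn lacks_gpu_offload<'a, I>(lines: I) -> bool
where
    I: IntoIterator<Item = &'a str>,
{
    lines.into_iter().any(|line| line.contains(NO_GPU_WARNING))
}

fn main() {
    // Sample output copied from the log in this issue.
    let stderr = "<embedding>: warning: not compiled with GPU offload support, --gpu-layers option will be ignored\n\
                  <embedding>: warning: see main README.md for information on enabling GPU BLAS support";

    if lacks_gpu_offload(stderr.lines()) {
        eprintln!("llama-server was built without GPU support; --device will have no effect");
    }
}
```

Matching on a substring is deliberately loose, since the exact prefix (here "<embedding>: warning:") depends on how the supervisor labels each server.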

With RUST_LOG=debug, it keeps printing these:

2024-08-08T23:22:51.396949Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.396958Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888
2024-08-08T23:22:51.397028Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.397036Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888
2024-08-08T23:22:51.397103Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.397110Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888
2024-08-08T23:22:51.397176Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.397182Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888
2024-08-08T23:22:51.397239Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.397244Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888
2024-08-08T23:22:51.397300Z DEBUG reqwest::connect: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/reqwest-0.12.4/src/connect.rs:497: starting new connection: http://127.0.0.1:30888/    
2024-08-08T23:22:51.397305Z DEBUG hyper_util::client::legacy::connect::http: /github/home/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-util-0.1.5/src/client/legacy/connect/http.rs:631: connecting to 127.0.0.1:30888

richard-jfc · 2024-08-12T08:15:47Z

I fixed this for ROCm in #2835, and it's probably the same issue for Vulkan.

It's caused by a change to the CMake flag names used by llama-cpp-server. Try changing:

config.define("LLAMA_VULKAN", "ON");

to

config.define("GGML_VULKAN", "ON");

in crates/llama-cpp-server/build.rs.
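To illustrate the rename, here is a minimal standalone Rust sketch. It assumes that the CUDA, Vulkan, HIPBLAS (ROCm), and Metal backend toggles all moved from the LLAMA_ prefix to GGML_ in the llama.cpp revision being built; that list and the function name modernize_flag are hypothetical, not part of Tabby, so check them against your llama.cpp checkout. If the old name is no longer recognized, the backend is never enabled and llama-server builds CPU-only, which matches the "not compiled with GPU offload support" warning reported earlier.

```rust
// Backend toggles assumed (not verified for every llama.cpp revision) to have
// moved from the LLAMA_ prefix to the GGML_ prefix.
const RENAMED_BACKENDS: [&str; 4] = ["CUDA", "VULKAN", "HIPBLAS", "METAL"];

/// Hypothetical helper: rewrites a legacy backend flag such as "LLAMA_VULKAN"
/// to "GGML_VULKAN". Any other name is returned unchanged.
fn modernize_flag(name: &str) -> String {
    if let Some(rest) = name.strip_prefix("LLAMA_") {
        if RENAMED_BACKENDS.contains(&rest) {
            return format!("GGML_{rest}");
        }
    }
    name.to_string()
}

fn main() {
    for flag in ["LLAMA_VULKAN", "LLAMA_HIPBLAS", "GGML_VULKAN", "LLAMA_BUILD_TESTS"] {
        println!("{flag} -> {}", modernize_flag(flag));
    }
}
```

In build.rs itself, writing the new name directly, as suggested above, is simpler than translating at build time; a helper like this is only useful if the same build script has to support several llama.cpp revisions.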

VladislavNekto added the bug-unconfirmed label Aug 8, 2024

VladislavNekto mentioned this issue on Aug 8, 2024: Can't seem to get Tabby to run on GPU #2811 (closed)

VladislavNekto changed the title from "--device is ignored at local build" to "ROCm and Vulkan seems like doesn't work" on Aug 8, 2024

michalwarda linked a pull request on Sep 12, 2024 that will close this issue: fix(llama-cpp-server): fix vulkan build by setting GGML_VULKAN #3133 (open)
