Commit

Update default PCIe metrics names
Signed-off-by: Koshi Eguchi <[email protected]>
koshieguchi committed Jul 10, 2024
1 parent 54fd1ca commit 96f2e3a
Showing 4 changed files with 9 additions and 10 deletions.
4 changes: 2 additions & 2 deletions deployment/templates/metrics-configmap.yaml
@@ -22,8 +22,8 @@ data:
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION, counter, Total energy consumption since boot (in mJ).
# PCIE
-# DCGM_FI_DEV_PCIE_TX_THROUGHPUT, counter, Total number of bytes transmitted through PCIe TX (in KB) via NVML.
-# DCGM_FI_DEV_PCIE_RX_THROUGHPUT, counter, Total number of bytes received through PCIe RX (in KB) via NVML.
+# DCGM_FI_PROF_PCIE_TX_BYTES, counter, Total number of bytes transmitted through PCIe TX via NVML.
+# DCGM_FI_PROF_PCIE_RX_BYTES, counter, Total number of bytes received through PCIe RX via NVML.
DCGM_FI_DEV_PCIE_REPLAY_COUNTER, counter, Total number of PCIe retries.
# Utilization (the sample period varies depending on the product)
4 changes: 2 additions & 2 deletions etc/1.x-compatibility-metrics.csv
@@ -15,8 +15,8 @@ dcgm_power_usage, gauge, Power draw (in W).
dcgm_total_energy_consumption, counter, Total energy consumption since boot (in mJ).

# PCIe
-dcgm_pcie_tx_throughput, counter, Total number of bytes transmitted through PCIe TX (in KB) via NVML.
-dcgm_pcie_rx_throughput, counter, Total number of bytes received through PCIe RX (in KB) via NVML.
+dcgm_fi_prof_pcie_tx_bytes, counter, Total number of bytes transmitted through PCIe TX via NVML.
+dcgm_fi_prof_pcie_rx_bytes, counter, Total number of bytes received through PCIe RX via NVML.
dcgm_pcie_replay_counter, counter, Total number of PCIe retries.

# Utilization (the sample period varies depending on the product)
5 changes: 2 additions & 3 deletions etc/dcp-metrics-included.csv
@@ -15,8 +15,8 @@ DCGM_FI_DEV_POWER_USAGE, gauge, Power draw (in W).
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION, counter, Total energy consumption since boot (in mJ).

# PCIE
-# DCGM_FI_DEV_PCIE_TX_THROUGHPUT, counter, Total number of bytes transmitted through PCIe TX (in KB) via NVML.
-# DCGM_FI_DEV_PCIE_RX_THROUGHPUT, counter, Total number of bytes received through PCIe RX (in KB) via NVML.
+# DCGM_FI_PROF_PCIE_TX_BYTES, counter, Total number of bytes transmitted through PCIe TX via NVML.
+# DCGM_FI_PROF_PCIE_RX_BYTES, counter, Total number of bytes received through PCIe RX via NVML.
DCGM_FI_DEV_PCIE_REPLAY_COUNTER, counter, Total number of PCIe retries.

# Utilization (the sample period varies depending on the product)
@@ -87,4 +87,3 @@ DCGM_FI_PROF_DRAM_ACTIVE, gauge, Ratio of cycles the device memory interf
# DCGM_FI_PROF_PIPE_FP16_ACTIVE, gauge, Ratio of cycles the fp16 pipes are active.
DCGM_FI_PROF_PCIE_TX_BYTES, gauge, The rate of data transmitted over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
DCGM_FI_PROF_PCIE_RX_BYTES, gauge, The rate of data received over the PCIe bus - including both protocol headers and data payloads - in bytes per second.

6 changes: 3 additions & 3 deletions etc/default-counters.csv
@@ -16,8 +16,8 @@ DCGM_FI_DEV_POWER_USAGE, gauge, Power draw (in W).
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION, counter, Total energy consumption since boot (in mJ).

# PCIE
-DCGM_FI_DEV_PCIE_TX_THROUGHPUT, counter, Total number of bytes transmitted through PCIe TX (in KB) via NVML.
-DCGM_FI_DEV_PCIE_RX_THROUGHPUT, counter, Total number of bytes received through PCIe RX (in KB) via NVML.
+DCGM_FI_PROF_PCIE_TX_BYTES, counter, Total number of bytes transmitted through PCIe TX via NVML.
+DCGM_FI_PROF_PCIE_RX_BYTES, counter, Total number of bytes received through PCIe RX via NVML.
DCGM_FI_DEV_PCIE_REPLAY_COUNTER, counter, Total number of PCIe retries.

# Utilization (the sample period varies depending on the product)
@@ -74,4 +74,4 @@ DCGM_FI_DRIVER_VERSION, label, Driver Version
# DCGM_FI_DEV_ECC_INFOROM_VER, label, ECC inforom version
# DCGM_FI_DEV_POWER_INFOROM_VER, label, Power management object inforom version
# DCGM_FI_DEV_INFOROM_IMAGE_VER, label, Inforom image version
-# DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device
+# DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device
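
For readers unfamiliar with the counters files touched above, the following is a minimal sketch of how their three-column layout (field name, Prometheus metric type, help text) could be read. It assumes that lines starting with `#` are disabled entries or section comments, as they appear in this diff. The helper name `parse_counters` and the inline `SAMPLE` text are hypothetical and for illustration only; this is not the exporter's own parser.

```python
# Illustrative sample in the same layout as etc/default-counters.csv after this commit.
SAMPLE = """\
# PCIE
DCGM_FI_PROF_PCIE_TX_BYTES, counter, Total number of bytes transmitted through PCIe TX via NVML.
DCGM_FI_PROF_PCIE_RX_BYTES, counter, Total number of bytes received through PCIe RX via NVML.
DCGM_FI_DEV_PCIE_REPLAY_COUNTER, counter, Total number of PCIe retries.
# DCGM_FI_DEV_VBIOS_VERSION, label, VBIOS version of the device
"""


def parse_counters(text):
    """Return (name, metric_type, help_text) tuples for enabled entries.

    Hypothetical helper: blank lines and lines starting with '#' are skipped.
    """
    metrics = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first two commas only, so help text may contain commas.
        name, metric_type, help_text = (part.strip() for part in line.split(",", 2))
        metrics.append((name, metric_type, help_text))
    return metrics


metrics = parse_counters(SAMPLE)
names = [name for name, _, _ in metrics]
```

With this sample, only the three uncommented PCIe entries are returned; the commented-out `DCGM_FI_DEV_VBIOS_VERSION` line is ignored.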