Problem with multipath via RoCE

gawron737

Member
Oct 3, 2022
Hi,
In my setup we decided to change the connection to our remote storage from Fibre Channel to RoCE. It is still the same remote storage source (the same storage array, made by Huawei); only the underlying transport technology changed. Since that change, during higher network traffic on the Proxmox nodes multipath reports failing paths and then starts reinstating them. We tried switching to native NVMe multipath (nvme_core.multipath), but that did not help at all. Previously, when we used FC, there were no such problems. I attach the multipath configuration below.

Code:
defaults {
  polling_interval    1
  user_friendly_names    yes
  #enable_foreign     nvme
  enable_foreign        "^$"
  #verbosity 4
}

blacklist_exceptions {
    property "(ID_WWN|SCSI_IDENT_.*|ID_SERIAL|DEVTYPE)"
    devnode "nvme*"
}


devices {
  device {
    vendor             "NVME"
    product         "Huawei-XSG1"
    uid_attribute         "ID_WWN"
    no_path_retry         12
    rr_min_io         100
    path_grouping_policy     multibus
    #path_grouping_policy    group_by_prio
    path_checker         "directio"
    #prio             "const"
    detect_prio         "no"
    prio            "const"
    failback         immediate
    retain_attached_hw_handler "no"
  }
}

multipaths {
     
  multipath {
    wwid ******
    alias RoCE_***
  }

  multipath {
    wwid ******
    alias RoCE_***
  }
}
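
For reference, this is how I verify which stack is actually claiming the NVMe paths on a node (just an illustrative set of commands):

Code:
# Is native NVMe multipath enabled? (Y = nvme_core handles the paths itself, N = dm-multipath does)
cat /sys/module/nvme_core/parameters/multipath

# dm-multipath view of the maps and path states
multipath -ll

# NVMe subsystem / controller topology as seen by the host
nvme list-subsys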

During higher network traffic, or when I try to import a qcow file to the datastore, I see:

Code:
[Fri Feb 14 08:22:56 2025] device-mapper: multipath: 252:16: Failing path 259:16.
[Fri Feb 14 08:23:09 2025] device-mapper: multipath: 252:16: Reinstating path 259:16.
[Fri Feb 14 08:23:26 2025] device-mapper: multipath: 252:16: Failing path 259:18.
[Fri Feb 14 08:23:26 2025] device-mapper: multipath: 252:16: Failing path 259:26.
[Fri Feb 14 08:23:40 2025] device-mapper: multipath: 252:16: Reinstating path 259:18.
[Fri Feb 14 08:23:40 2025] device-mapper: multipath: 252:16: Reinstating path 259:26.
[Fri Feb 14 08:23:56 2025] device-mapper: multipath: 252:16: Failing path 259:22.
[Fri Feb 14 08:24:08 2025] device-mapper: multipath: 252:16: Reinstating path 259:22.
[Fri Feb 14 08:24:09 2025] device-mapper: multipath: 252:16: Failing path 259:20.
[Fri Feb 14 08:24:21 2025] device-mapper: multipath: 252:16: Reinstating path 259:20.
[Fri Feb 14 08:24:39 2025] device-mapper: multipath: 252:16: Failing path 259:30.

Has anybody had a similar problem with multipath, and is anyone able to help me resolve it?

Best regards
Tom
 
Could anyone help me with this problem? My multipath keeps reinitializing paths, so the overall performance of my VMs is poor.
 
Hello,
I’m experiencing a similar issue with my RoCE connection. When the system is under heavy load (e.g., during data imports or database rebuilds), I occasionally notice small lags on the VM handling the load.

The logs show messages about path failures, followed by reinstatement after a few seconds (but only on the Proxmox node where the VM is located; on the other nodes using RoCE everything works fine). Once the load decreases, the logs look normal again.

Would you have any insights on this?

Regards,
=p
 
Is there anyone who could help us solve this problem, or give us some tips on how to troubleshoot it?

Best regards
Tom
 
I would recommend getting a subscription and opening a support case with Proxmox. This looks like a very specific problem which might be out of scope for this forum.
 
Hi @gawron737,

Deploying RoCE/RDMA in high-availability environments is notoriously challenging and often leads to debugging nightmares.
I hope you're leveraging a fully integrated, single-vendor solution to mitigate these complexities, because the pain of debugging typically far outweighs the benefits of mixing vendors.

Are you using Mellanox end-to-end (both NICs and switches)?
Are you also utilizing Mellanox OFED?
Are the ports located on the same NIC, or are you utilizing separate NICs?
What specific messages are you encountering in dmesg?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi
Thanks so much for responding.
Unfortunately, my setup uses multi-vendor hardware: Huawei/xFusion 2288H V6 servers for Proxmox (with Mellanox network cards) and a Huawei array on the storage side. Each Proxmox server has the Mellanox OFED package installed and uses two single-port ConnectX-6 cards. Each server therefore has two separate paths to the array, over separate NICs and interfaces in the same VLAN.
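
For reference, this is roughly how the two RoCE ports can be checked on each node (the interface name below is only an example):

Code:
# Map the Mellanox RDMA devices to their network interfaces (script shipped with Mellanox OFED)
ibdev2netdev

# Link state of the RDMA devices (iproute2)
rdma link show

# Driver and firmware details for one of the ConnectX-6 ports
ethtool -i ens1f0np0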

Below are some of the messages I see in the logs:

Code:
[Thu Mar 13 06:00:22 2025] nvme nvme9: I/O tag 15 (900f) opcode 0x1 (I/O Cmd) QID 1 timeout
[Thu Mar 13 06:00:22 2025] nvme nvme9: starting error recovery
[Thu Mar 13 06:00:22 2025] nvme nvme9: I/O tag 16 (1010) opcode 0x1 (I/O Cmd) QID 1 timeout
[Thu Mar 13 06:00:22 2025] nvme_log_error: 22 callbacks suppressed
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701633840, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] blk_print_req_error: 22 callbacks suppressed
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701633840 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme nvme9: I/O tag 20 (4014) opcode 0x1 (I/O Cmd) QID 1 timeout
[Thu Mar 13 06:00:22 2025] device-mapper: multipath: 252:13: Failing path 259:16.
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701637936, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701637936 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme nvme9: I/O tag 21 (6015) opcode 0x1 (I/O Cmd) QID 1 timeout
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701642032, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701642032 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701808432, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701808432 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701877552, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701877552 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701667632, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701667632 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701668656, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701668656 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701629744, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701629744 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701672752, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701672752 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme9n1: I/O Cmd(0x1) @ LBA 8701673776, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Thu Mar 13 06:00:22 2025] I/O error, dev nvme9n1, sector 8701673776 op 0x1:(WRITE) flags 0xca00 phys_seg 1 prio class 0
[Thu Mar 13 06:00:22 2025] nvme nvme9: Reconnecting in 10 seconds...
[Thu Mar 13 06:00:32 2025] nvme nvme9: queue_size 128 > ctrl sqsize 64, clamping down
[Thu Mar 13 06:00:32 2025] nvme nvme9: creating 8 I/O queues.
[Thu Mar 13 06:00:33 2025] nvme nvme9: mapped 8/0/0 default/read/poll queues.
[Thu Mar 13 06:00:33 2025] nvme nvme9: Successfully reconnected (1 attempts)
[Thu Mar 13 06:00:34 2025] device-mapper: multipath: 252:13: Reinstating path 259:16.

Another log:

Code:
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 2 (5002) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme nvme12: starting error recovery
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 3 (9003) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 7 (9007) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 8 (e008) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme_log_error: 130 callbacks suppressed
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 10 (900a) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme nvme12: I/O tag 11 (600b) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999006208, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] blk_print_req_error: 130 callbacks suppressed
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999006208 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999022080, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999022080 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999026688, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999026688 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999030784, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999030784 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18055691328, 288 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 18055691328 op 0x1:(WRITE) flags 0xca00 phys_seg 4 prio class 0
[Wed Feb 26 14:33:40 2025] device-mapper: multipath: 252:16: Failing path 259:22.
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17998990848, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17998990848 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999151616, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999151616 op 0x1:(WRITE) flags 0xca00 phys_seg 9 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999143424, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999143424 op 0x1:(WRITE) flags 0xca00 phys_seg 9 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999155712, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999155712 op 0x1:(WRITE) flags 0xca00 phys_seg 10 prio class 0
[Wed Feb 26 14:33:40 2025] nvme12n1: I/O Cmd(0x1) @ LBA 17999159808, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:33:40 2025] I/O error, dev nvme12n1, sector 17999159808 op 0x1:(WRITE) flags 0xca00 phys_seg 9 prio class 0
[Wed Feb 26 14:33:40 2025] nvme nvme12: Reconnecting in 10 seconds...
[Wed Feb 26 14:33:50 2025] nvme nvme12: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:33:50 2025] nvme nvme12: creating 8 I/O queues.
[Wed Feb 26 14:33:51 2025] nvme nvme12: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:33:51 2025] nvme nvme12: Successfully reconnected (1 attempts)
[Wed Feb 26 14:33:51 2025] device-mapper: multipath: 252:16: Reinstating path 259:22.
[Wed Feb 26 14:34:25 2025] nvme nvme12: I/O tag 34 (7022) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme nvme12: starting error recovery
[Wed Feb 26 14:34:25 2025] nvme nvme12: I/O tag 35 (4023) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme_log_error: 18 callbacks suppressed
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000464384, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] blk_print_req_error: 18 callbacks suppressed
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000464384 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:34:25 2025] device-mapper: multipath: 252:16: Failing path 259:22.
[Wed Feb 26 14:34:25 2025] nvme nvme16: I/O tag 30 (201e) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme nvme16: starting error recovery
[Wed Feb 26 14:34:25 2025] nvme nvme16: I/O tag 32 (1020) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme16n1: I/O Cmd(0x1) @ LBA 18000450560, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme16n1, sector 18000450560 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:34:25 2025] device-mapper: multipath: 252:16: Failing path 259:30.
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000890656, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000890656 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000891168, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000891168 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000891680, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000891680 op 0x1:(WRITE) flags 0xca00 phys_seg 4 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000892192, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000892192 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000892704, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000892704 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000893216, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000893216 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000893728, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000893728 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme12n1: I/O Cmd(0x1) @ LBA 18000894240, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:25 2025] I/O error, dev nvme12n1, sector 18000894240 op 0x1:(WRITE) flags 0xca00 phys_seg 3 prio class 0
[Wed Feb 26 14:34:25 2025] nvme nvme11: I/O tag 7 (7007) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme nvme11: starting error recovery
[Wed Feb 26 14:34:25 2025] nvme nvme11: I/O tag 8 (3008) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme nvme11: I/O tag 9 (d009) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] nvme nvme11: I/O tag 10 (d00a) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:25 2025] device-mapper: multipath: 252:16: Failing path 259:20.
[Wed Feb 26 14:34:25 2025] nvme nvme16: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:25 2025] nvme nvme12: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:26 2025] nvme nvme11: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:31 2025] nvme nvme10: I/O tag 56 (f038) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:31 2025] nvme nvme10: starting error recovery
[Wed Feb 26 14:34:31 2025] nvme nvme13: I/O tag 6 (e006) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:31 2025] nvme nvme13: starting error recovery
[Wed Feb 26 14:34:31 2025] nvme nvme13: I/O tag 7 (4007) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:31 2025] nvme nvme9: I/O tag 50 (e032) opcode 0x1 (I/O Cmd) QID 1 timeout
[Wed Feb 26 14:34:31 2025] nvme nvme9: starting error recovery
[Wed Feb 26 14:34:31 2025] nvme_log_error: 89 callbacks suppressed
[Wed Feb 26 14:34:31 2025] nvme13n1: I/O Cmd(0x1) @ LBA 18001114624, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:31 2025] blk_print_req_error: 89 callbacks suppressed
[Wed Feb 26 14:34:31 2025] I/O error, dev nvme13n1, sector 18001114624 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:34:31 2025] device-mapper: multipath: 252:16: Failing path 259:24.
[Wed Feb 26 14:34:31 2025] nvme13n1: I/O Cmd(0x1) @ LBA 18000596992, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:31 2025] I/O error, dev nvme13n1, sector 18000596992 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:34:31 2025] nvme13n1: I/O Cmd(0x1) @ LBA 18000879872, 512 blocks, I/O Error (sct 0x3 / sc 0x71)
[Wed Feb 26 14:34:31 2025] I/O error, dev nvme13n1, sector 18000879872 op 0x1:(WRITE) flags 0xca00 phys_seg 5 prio class 0
[Wed Feb 26 14:34:31 2025] device-mapper: multipath: 252:16: Failing path 259:18.
[Wed Feb 26 14:34:31 2025] device-mapper: multipath: 252:16: Failing path 259:16.
[Wed Feb 26 14:34:32 2025] nvme nvme13: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:32 2025] nvme nvme9: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:32 2025] nvme nvme10: Reconnecting in 10 seconds...
[Wed Feb 26 14:34:35 2025] nvme nvme16: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:35 2025] nvme nvme16: creating 8 I/O queues.
[Wed Feb 26 14:34:36 2025] nvme nvme12: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:36 2025] nvme nvme12: creating 8 I/O queues.
[Wed Feb 26 14:34:36 2025] nvme nvme11: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:36 2025] nvme nvme11: creating 8 I/O queues.
[Wed Feb 26 14:34:36 2025] nvme nvme16: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:36 2025] nvme nvme16: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:36 2025] device-mapper: multipath: 252:16: Reinstating path 259:30.
[Wed Feb 26 14:34:37 2025] nvme nvme12: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:37 2025] nvme nvme12: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:37 2025] nvme nvme11: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:37 2025] nvme nvme11: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:38 2025] device-mapper: multipath: 252:16: Reinstating path 259:20.
[Wed Feb 26 14:34:39 2025] device-mapper: multipath: 252:16: Reinstating path 259:22.
[Wed Feb 26 14:34:42 2025] nvme nvme9: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:42 2025] nvme nvme9: creating 8 I/O queues.
[Wed Feb 26 14:34:42 2025] nvme nvme13: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:42 2025] nvme nvme13: creating 8 I/O queues.
[Wed Feb 26 14:34:42 2025] nvme nvme10: queue_size 128 > ctrl sqsize 64, clamping down
[Wed Feb 26 14:34:42 2025] nvme nvme10: creating 8 I/O queues.
[Wed Feb 26 14:34:44 2025] nvme nvme9: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:44 2025] nvme nvme9: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:44 2025] nvme nvme13: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:44 2025] nvme nvme13: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:44 2025] nvme nvme10: mapped 8/0/0 default/read/poll queues.
[Wed Feb 26 14:34:44 2025] nvme nvme10: Successfully reconnected (1 attempts)
[Wed Feb 26 14:34:44 2025] device-mapper: multipath: 252:16: Reinstating path 259:16.
[Wed Feb 26 14:34:44 2025] device-mapper: multipath: 252:16: Reinstating path 259:18.
[Wed Feb 26 14:34:44 2025] device-mapper: multipath: 252:16: Reinstating path 259:24.

Any ideas on how to solve this problem?

Best regards
Tom
 
Hi @gawron737, here's what we know from the output:

First, opcode 0x1 is an NVMe WRITE command. We can see a lot of writes failing on the first queue pair (i.e., QID 1). The NVMe layer describes the error as path-related (sct 0x3), with the type PATH_ABORTED_BY_HOST (sc 0x71). These failures are likely due to the host giving up on a failed queue pair.

That said, the NVMe queue pair initialization is quite strange. The array is declaring a submission queue size of 64 (i.e., sqsize 64)... That's tiny and somewhat defeats the purpose of the NVMe architecture. In response, the kernel is clamping the default queue size down from 128 to 64.
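
If you wanted to align the host's I/O queue depth with the array's advertised sqsize explicitly, nvme-cli's connect option would look roughly like the sketch below (purely illustrative; the kernel already clamps automatically, and the address and NQN are placeholders):

Code:
# Reconnect one NVMe/RoCE path with an I/O queue size matching the controller's sqsize
nvme disconnect --nqn=<subsystem-nqn>
nvme connect --transport=rdma --traddr=<array-portal-ip> --trsvcid=4420 \
             --nqn=<subsystem-nqn> --queue-size=64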

Given that the issue occurs in the write path, my guess is that you have one of two problems:
  1. You may have flow control set up incorrectly in your switching infrastructure. You should have either Priority Flow Control (i.e., PFC) or Global Pause configured. Dropped packets in your network path will undoubtedly result in queue pair failure.
  2. Your array has a buggy NVMe or NVMe over fabrics implementation.
For situations like this, lean on your storage support contract. Stress to them that this is a Debian-based system with an Ubuntu-derived kernel. I would not confuse them with PVE nomenclature, as you are dealing with the underlying OS layers rather than with PVE itself.
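
On the host side, the flow-control point above can be sanity-checked with something like the following (the interface name is an example; mlnx_qos ships with Mellanox OFED):

Code:
# Global pause settings on the RoCE interface
ethtool -a ens1f0np0

# Per-priority flow control (PFC) configuration
mlnx_qos -i ens1f0np0

# Pause/discard counters worth watching while the load is running
ethtool -S ens1f0np0 | grep -Ei 'pause|discard|out_of_buffer'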



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox