Proxmox cluster: one node hangs frequently

premjith_r

New Member
Mar 27, 2024
Hi,

I am using a three-node cluster with Virtual Environment 7.1-7.

Recently an issue started on the second node, which hangs frequently. When checking the console, the errors shown were:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

blk_update_request: I/O error, dev sdh, sector 4305579472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0

How can we identify what is causing this hanging issue? Every time it is only node2 that hangs.

Thank you
 

Attachments

  • ProxMox-Node2-001.jpeg (170.7 KB)
We contacted the HP vendor; they say it may be related to an OS problem and that we should check with the OS vendor. Please note, only one node out of the three cluster nodes has this issue.
 
We contacted the HP vendor; they say it may be related to an OS problem and that we should check with the OS vendor. Please note, only one node out of the three cluster nodes has this issue.
I am guessing you told them that you use Proxmox?
An I/O error is an Input/Output error reading or writing data on a disk. Despite how much HP would want you to go away, "disk" is the keyword here. The OS/kernel tried to read or write to the disk (in your case, write) and couldn't. Sure, you can have a bad Fibre card or a bad cable such that only one sector is unwritable, but it's unlikely.

You can try to read the entire disk with dd and see if that works. Unfortunately, a write test would mean the destruction of your data. You can try to figure out how to read/write that specific sector.
You can also go back through the logs and correlate all failures - is the block always the same? In the same general area? (A rough sketch of both follows below.)
Or, you can go back to HP, tell them you have an Ubuntu-based client (due to Kernel), and have them spend more than 30 seconds on it.
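
For example, something along these lines - the device name and sector are taken from your console screenshot, so treat them as placeholders and adjust them to your system:
Code:
# count how often each failing sector shows up in the kernel log
journalctl -k | grep blk_update_request
journalctl -k | grep -o 'sector [0-9]*' | sort | uniq -c | sort -nr

# non-destructive read of ~1 MiB starting at the reported sector (512-byte sectors)
dd if=/dev/sdh of=/dev/null bs=512 skip=4305579472 count=2048 iflag=direct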

Finally, you can replace them with a vendor that would support you with your application and take you seriously.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
The SAN is connected using multipath.

Once we restart Proxmox, the issue is resolved for some period of time, sometimes 1 or 2 days. Then the same issue appears again and the node gets stuck. It is always the same node that has the issue.
 
Hi!

Can you show the "/etc/multipath.conf" file?
Is there any difference in the "/etc/multipath.conf" between the Proxmox nodes?
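
For example, one quick way to compare the file across the nodes (assuming root SSH between them; the hostnames are placeholders):
Code:
diff /etc/multipath.conf <(ssh node1 cat /etc/multipath.conf)
diff /etc/multipath.conf <(ssh node3 cat /etc/multipath.conf)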

Another thought: if you are using 2x FC switches with zones, maybe you forgot to add all Proxmox nodes to the ACL.
 
Once we restart Proxmox, the issue is resolved for some period of time, sometimes 1 or 2 days. Then the same issue appears again and the node gets stuck. It is always the same node that has the issue.
is the error always on the same block? In the same general area?
have you tried "dd if=/dev/sdh of=/dev/null" ?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Please note our multipath configuration as given below:

Code:
cat /etc/multipath.conf

defaults {
        polling_interval 2
        path_selector "round-robin 0"
        path_grouping_policy failover
        uid_attribute ID_SERIAL
        rr_min_io 100
        failback immediate
        prio iet
        prio_args preferredip=192.168.2.2
        no_path_retry queue
        user_friendly_names yes
}

blacklist_exceptions {
        wwid "3600c0ff000530ac4c81d646201000000"
        wwid "3600c0ff0005302c3f51d646201000000"
}

multipaths {
        multipath {
                wwid "3600c0ff000530ac4c81d646201000000"
                alias mpatha
        }
        multipath {
                wwid "3600c0ff0005302c3f51d646201000000"
                alias mpathb
        }
}

Is there any difference in the "/etc/multipath.conf" between the Proxmox nodes?
>> There is no difference.

Another thought: if you are using 2x FC switches with zones, maybe you forgot to add all Proxmox nodes to the ACL.
>> We are using an iSCSI SAN and have not configured any zones.
 
is the error always on the same block? In the same general area?



>> Seems to be different sectors each time. Please see the details below:

Mar 26 10:03:04 pve21atrs002 kernel: blk_update_request: I/O error, dev sdf, sector 43055279080 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:03:04 pve21atrs002 kernel: blk_update_request: I/O error, dev sdf, sector 43055277048 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:09:13 pve21atrs002 kernel: blk_update_request: I/O error, dev sdg, sector 43055279080 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:09:13 pve21atrs002 kernel: blk_update_request: I/O error, dev sdg, sector 43055277048 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:15:52 pve21atrs002 kernel: blk_update_request: I/O error, dev sdf, sector 43055277048 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:15:52 pve21atrs002 kernel: blk_update_request: I/O error, dev sdf, sector 43055279080 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:22:31 pve21atrs002 kernel: blk_update_request: I/O error, dev sdg, sector 43055279080 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 26 10:22:31 pve21atrs002 kernel: blk_update_request: I/O error, dev sdg, sector 43055277048 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0



Mar 22 18:30:15 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055277616 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 22 18:36:23 pve21atrs002 kernel: blk_update_request: I/O error, dev sdi, sector 43055277616 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0
Mar 22 18:42:34 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055277616 op 0x1: (WRITE) flags 0xca00 phys_seg 254 prio class 0



Mar 19 14:20:28 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0
Mar 19 14:20:28 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279416 op 0x1: (WRITE) flags 0xca00 phys_seg 7 prio class 0
Mar 19 14:26:37 pve21atrs002 kernel: blk_update_request: I/O error, dev sdi, sector 43055279472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0
Mar 19 14:26:37 pve21atrs002 kernel: blk_update_request: I/O error, dev sdi, sector 43055279416 op 0x1: (WRITE) flags 0xca00 phys_seg 7 prio class 0
Mar 19 14:32:46 pve21atrs002 kernel: blk_update_request: I/O error, dev sdd, sector 43055279416 op 0x1: (WRITE) flags 0xca00 phys_seg 7 prio class 0
Mar 19 14:33:47 pve21atrs002 kernel: blk_update_request: I/O error, dev sdd, sector 43055279472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0
Mar 19 14:38:54 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279416 op 0x1: (WRITE) flags 0xca00 phys_seg 7 prio class 0
Mar 19 14:40:57 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0
Mar 19 14:47:08 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279416 op 0x1: (WRITE) flags 0xca00 phys_seg 7 prio class 0
Mar 19 14:47:08 pve21atrs002 kernel: blk_update_request: I/O error, dev sdh, sector 43055279472 op 0x1: (WRITE) flags 0xca00 phys_seg 2 prio class 0




have you tried "dd if=/dev/sdh of=/dev/null" ?



>> Have not tried it yet. Should I do this during late hours when the users have gone?
 
I am guessing you told them that you use Proxmox?
I always say I have Ubuntu; they have never recognised the difference, because the Proxmox kernel is Ubuntu-based.

@premjith_r
I have two clusters here with an HP MSA, and I also had I/O errors in the past. A cable was disconnected and the connections were cleaned; since then the fault has disappeared. I have to admit I had the faults, but not in this quantity or intensity.
Normally SANs are very talkative in the management interface and log errors and events very thoroughly. I would therefore look directly at the logs on the SAN. Hopefully the device has a web interface.

Command to check the SAN from Proxmox:
Code:
multipath -ll
mpatha (3600c0ff0003c3761f5f37a5b01000000) dm-6 HPE,MSA 2050 SAN
size=2.7T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 0:0:0:0 sdc 8:32 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 3:0:0:0 sdd 8:48 active ready running
Status OK :)

And my multipath config, built with the Red Hat configurator:

Code:
# Modified by https://access.redhat.com/labs/multipathhelper/#/ v1.3.5

defaults {
    find_multipaths yes
    user_friendly_names yes
}

# Blacklist devices that must not be picked up by multipath
# read out the wwid: /lib/udev/scsi_id -g -u /dev/sda
# or simply use "ls -l /dev/disk/by-id/" and take all names starting with "scsi-"
blacklist {
       wwid 3600508b1001c493a13eba729f0adf278
}

devices {
        device {
                vendor "HP"
                product "MSA2[02]12fc|MSA2012i"
                path_grouping_policy "multibus"
                path_checker "tur"
                features "0"
                hardware_handler "0"
                prio "const"
                failback "immediate"
                rr_weight "uniform"
                no_path_retry "18"
                rr_min_io "100"
        }
}

If you are upgrading to Proxmox 8, it is also important to install the "multipath-tools-boot" package before rebooting into the new system. This is due to a Debian LVM bug.
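
A minimal sketch of that step, run on each node before the reboot into the new major version (package name as mentioned above):
Code:
apt update
apt install multipath-tools-boot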

dmesg | grep -i 'error\|fail\|I/O\|sector\|disk'
You can also do it like this:
Code:
dmesg -l err

########

Supported log levels (priorities):
   emerg - system is unusable
   alert - action must be taken immediately
    crit - critical conditions
     err - error conditions
    warn - warning conditions
  notice - normal but significant condition
    info - informational
   debug - debug-level messages
 
>> Seems to be different sectors each time. Please see the details below:
The sectors are different, but they are localized to the same area. That could mean that it is just where the VM image/disk is, or where the SAN has issues accessing the disk. You can probably deduce, based on the sector location, which specific VM disk image is affected.
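
A rough sketch of that mapping, assuming the LUN is used as an LVM physical volume (the usual Proxmox iSCSI setup); the mpatha device and the VG name are placeholders:
Code:
# PV extent index of the failing sector:
#   pe_index = (sector * 512 - pe_start) / vg_extent_size
pvs -o pv_name,vg_name,pe_start,vg_extent_size /dev/mapper/mpatha

# then look for the LV whose extent range on that PV contains pe_index
lvs -o lv_name,seg_size,seg_pe_ranges <vg_name>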
have you tried "dd if=/dev/sdh of=/dev/null" ?

>> Have not tried it yet.. Should I do this during late hours when users have gone ?
I would never advise someone to do things during production hours without being familiar with their system load. So, yes, of course, do it during quiet hours.
Keep in mind that it is a read whereas your errors are writes, but the result may still be educational.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
>> I always say I have Ubuntu ...



Thanks, noted this point and will inform the HP vendor accordingly.



>> SANs are very talkative in the management interface and log errors and events very thoroughly ...



Checked the SAN web interface; it shows the health of all disks as OK.

Will check if there are any other SAN tools available from the SAN vendor to get more granular details on disk health, including sectors.

Not sure if there are any tools available on the Proxmox side that can indicate such potential disk issues?



>> Command to check the SAN from Proxmox:



Ran the same here and the status looks OK:



Code:
multipath -ll
mpatha (3600c0ff000530ac4c81d646201000000) dm-16 HPE,MSA 1060 iSCSI
size=28T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 4:0:0:1 sdi 8:128 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 3:0:0:1 sdh 8:112 active ready running
|-+- policy='round-robin 0' prio=10 status=enabled
| `- 2:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 5:0:0:1 sdf 8:80 active ready running
mpathb (3600c0ff0005302c3f51d646201000000) dm-6 HPE,MSA 1060 iSCSI
size=14T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 2:0:0:2 sde 8:64 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 5:0:0:2 sdg 8:96 active ready running
|-+- policy='round-robin 0' prio=10 status=enabled
| `- 3:0:0:2 sdj 8:144 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:2 sdk 8:160 acti



>> upgrading to Proxmox 8,...

Noted



>> The sectors are different, but they are localized to the same area.

Noted



>> Keep in mind that it is a read whereas your errors are writes, but the result may still be educational.



Sure, will run dd during quiet hours and update



Another point to add: we have been using this 3-node cluster in production for more than 2 years now without a hiccup, with RAM around 70% used and CPU around 50% used. When we started to see the hang on node2 a few weeks back, we suspected the high memory load (around 80%) and reduced the memory load on node2 to around 50%. However, the hang happened even when the CPU and memory load was low (around 50% memory usage and 25% CPU usage).



Also, is re-using a VM ID a recommended configuration? We recently re-used a VM ID on node1 about a week before node2 started hanging.
 
Is there any difference in the "/etc/multipath.conf" between the Proxmox nodes?
>> There is no difference.

Another thought: if you are using 2x FC switches with zones, maybe you forgot to add all Proxmox nodes to the ACL.
>> We are using an iSCSI SAN and have not configured any zones.

OK.

Suggestion:
Blacklist everything by default and add exceptions for what is needed (you have already done the second part).

/etc/multipath.conf
Code:
blacklist {
         wwid ".*"
}

blacklist_exceptions {
        wwid "3600c0ff000530ac4c81d646201000000"
        wwid "3600c0ff0005302c3f51d646201000000"
}


Can you check the contents of the following files in the Proxmox nodes?
Code:
/etc/multipath/bindings
/etc/multipath/wwids

The error could be a wwid mismatch in these files; in that case (a rough sketch follows below):
- Stop multipathd
- Remove the offending line or comment it out with "#"
- Start multipathd
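
For example, roughly (assuming the standard multipath-tools systemd service):
Code:
systemctl stop multipathd
# edit /etc/multipath/wwids and /etc/multipath/bindings and remove or comment out the stale entry
systemctl start multipathd
multipath -ll    # verify that all paths come back clean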
 
Suggestion:
Blacklist everything by default and add exceptions for what is needed (you have already done the second part).

Thanks, will add it.


Can you check the contents of the following files in the Proxmox nodes?
/etc/multipath/bindings
/etc/multipath/wwids
The error could be a wwid mismatch in these files.

>> Verified and found no mismatch in these files on all 3 nodes.
 

We checked both the SAN web interface and the CLI commands and found that all the drives are healthy and functional; no media errors (bad sectors) are reported.
 
