Proxmox 6.1-11 hangs and requires manual restart

tema_tema (Member, joined Sep 2, 2019)
Hi everyone.

For the last year I have been experiencing a problem that requires me to manually restart the Proxmox machine to get the whole virtual environment working again.
I updated Proxmox to 6.1-11 last week and expected this behaviour to be fixed. Unfortunately, it was not.

What it looks like:
1. I start the physical server, Proxmox boots properly, all the autostart containers and virtual machines come up and work great.
2. After several days I can no longer log in to the web UI: it doesn't respond, and I can't even ssh to the Proxmox node; ssh reports:
Code:
ssh <mysudouser>@192.168.0.25
ssh_exchange_identification: read: Connection reset by peer
3. I can still connect via ssh to all LXC containers and QEMU VMs, but once I `reboot` an LXC container through ssh it does not start back up.

Today I experienced the same issue again. From the syslog I found that logging simply stopped at May 13 19:07:00 and stayed empty until I went to the server and manually hard-rebooted it.

Has anyone experienced the same problem? Can someone please help me gather more information to find the root cause of this problem, because it has become super annoying.
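In the meantime, here is what I plan to try in order to capture more information before the next hang (just a sketch; I believe smartmontools and systemd-journald ship with PVE by default, but please correct me if any of these commands are wrong for this setup):

```shell
# Make the systemd journal persistent so the last minutes before a
# hang survive the hard reboot (journald uses /var/log/journal if it exists):
mkdir -p /var/log/journal
systemctl restart systemd-journald

# After the next hang and reboot, read the previous boot's log:
journalctl -b -1 --no-pager | tail -n 200

# Check SMART health and the sector-related attributes of each SATA disk:
for d in /dev/sd[a-z]; do
    echo "== $d =="
    smartctl -H -A "$d" | grep -Ei 'result|Reallocated|Pending|Uncorrect'
done
```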

System configuration:


Code:
uname -a
Linux grg 5.3.18-3-pve #1 SMP PVE 5.3.18-3 (Tue, 17 Mar 2020 16:33:19 +0100) x86_64 GNU/Linux


pvs output:
Code:
  PV         VG  Fmt  Attr PSize    PFree
  /dev/sda3  pve lvm2 a--  <111.54g <13.75g


fdisk -l output:
Code:
fdisk -l
Disk /dev/sda: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5BCB9516-5CEE-4E2E-BCFE-2D0CA7F299AD

Device      Start       End   Sectors   Size Type
/dev/sda1    2048      4095      2048     1M BIOS boot
/dev/sda2    4096    528383    524288   256M EFI System
/dev/sda3  528384 234441614 233913231 111.6G Linux LVM


Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: ST4000NM0035-1V4
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F298D678-6CF2-9D4A-8D63-539DDCF7F2B8

Device          Start        End    Sectors  Size Type
/dev/sdb1        2048 7814019071 7814017024  3.7T Solaris /usr & Apple ZFS
/dev/sdb9  7814019072 7814035455      16384    8M Solaris reserved 1


Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: ST4000NM0035-1V4
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 9E65069C-A3FE-D242-9039-DD234DD1F185

Device          Start        End    Sectors  Size Type
/dev/sdc1        2048 7814019071 7814017024  3.7T Solaris /usr & Apple ZFS
/dev/sdc9  7814019072 7814035455      16384    8M Solaris reserved 1




Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 27.8 GiB, 29796335616 bytes, 58195968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/zd0: 64 GiB, 68719476736 bytes, 134217728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x7ef8b715

Device     Boot  Start       End   Sectors  Size Id Type
/dev/zd0p1 *      2048    718847    716800  350M  7 HPFS/NTFS/exFAT
/dev/zd0p2      718848 134215679 133496832 63.7G  7 HPFS/NTFS/exFAT


Disk /dev/zd16: 64 GiB, 68719476736 bytes, 134217728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x68606cca

Device      Boot     Start       End   Sectors  Size Id Type
/dev/zd16p1 *         2048 128684031 128681984 61.4G 83 Linux
/dev/zd16p2      128686078 134215679   5529602  2.7G  5 Extended
/dev/zd16p5      128686080 134215679   5529600  2.7G 82 Linux swap / Solaris

Partition 2 does not start on physical sector boundary.


Disk /dev/zd32: 64 GiB, 68719476736 bytes, 134217728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0x578a96ce

Device      Boot  Start       End   Sectors  Size Id Type
/dev/zd32p1 *      2048    718847    716800  350M  7 HPFS/NTFS/exFAT
/dev/zd32p2      718848 134215679 133496832 63.7G  7 HPFS/NTFS/exFAT


GPT PMBR size mismatch (1992294399 != 2936012799) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
Disk /dev/zd48: 1.4 TiB, 1503238553600 bytes, 2936012800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: gpt
Disk identifier: E4DBA4C9-DDF1-4D8C-8DCE-F8C07C267374

Device      Start        End    Sectors  Size Type
/dev/zd48p1  2048 1992294366 1992292319  950G Linux filesystem
 

Hi,

Your log shows that a disk is no longer accessible.
Code:
May 14 17:55:46 grg smartd[1801]: Device: /dev/sdb [SAT], 5 Currently unreadable (pending) sectors
May 14 17:55:46 grg smartd[1801]: Device: /dev/sdb [SAT], 5 Offline uncorrectable sectors
If the rootfs can no longer be written to, you can't connect via ssh or to the WebGUI.
I guess your VMs are on a separate disk pool?
If so, that pool is still working, which is why the VMs/CTs keep running.
 
Hi,

Yes, you are right:
- Proxmox is installed on the KINGSTON SA400S3, which is indeed /dev/sda
- All the VMs are physically stored on a ZFS RAID 1 mirror (/dev/sdb, /dev/sdc) mounted at /storage/vmstorage

As for the errors you mentioned, I can't see any errors in zpool status:
Code:
  pool: storage
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 04:59:20 with 0 errors on Sun Mar  8 05:23:21 2020
config:

    NAME                                  STATE     READ WRITE CKSUM
    storage                               ONLINE       0     0     0
      mirror-0                            ONLINE       0     0     0
        ata-ST4000NM0035-1V4107_ZC133ZH0  ONLINE       0     0     0
        ata-ST4000NM0035-1V4107_ZC1341A4  ONLINE       0     0     0

errors: No known data errors
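As far as I understand, a clean `zpool status` doesn't contradict the smartd warnings: ZFS only counts errors it has actually hit while reading or writing, whereas pending/uncorrectable sectors are tracked by the drive's own firmware and can sit in blocks ZFS hasn't touched since they went bad. The drive's counters can be read directly (a sketch; attribute IDs 5/197/198 are the conventional ATA ones, device name taken from my setup):

```shell
# Show the SMART attributes relevant to failing sectors on the suspect drive:
# ID 5   = Reallocated_Sector_Ct
# ID 197 = Current_Pending_Sector
# ID 198 = Offline_Uncorrectable
smartctl -A /dev/sdb | grep -E '^ *(5|197|198) '
```

A non-zero raw value in the last column for Current_Pending_Sector would match the smartd log lines.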
 
If the root disk is sda, but sdb started failing 10 minutes before logging stopped, I would guess it could be a disk controller hardware problem.
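One way to help tell a failing disk apart from a controller problem is to make both the drive and ZFS actually read the suspect sectors (a sketch; the long self-test runs inside the drive and takes several hours on a 4 TB disk, and the scrub reads every allocated block in the pool):

```shell
# Start a drive-internal long self-test on the suspect disk:
smartctl -t long /dev/sdb

# ...after the estimated completion time, check the self-test log:
smartctl -l selftest /dev/sdb

# Also scrub the pool so ZFS reads every allocated block; latent
# unreadable sectors then surface as READ/CKSUM errors in the status:
zpool scrub storage
zpool status storage
```

If the self-test fails but the scrub stays clean, the bad sectors are likely in unallocated space; errors appearing on both disks at once would point more toward the controller or cabling.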