Hello everyone,
We are experiencing strange behavior on one PVE host in our cluster. The cluster has only three hosts. They are not brand-name servers but custom-built desktops; the hardware is not identical, but it is very similar. All hosts run the same PVE version with the latest updates from the no-subscription repository; the version output is listed below.
The storage on these hosts is as follows: two SSDs for the hypervisor system and some containers and VMs, one SSD partitioned and used for cache and log, and two HDDs for all the other VMs and containers and for local backups. On each host the two SSDs are organized as a ZFS mirror, and the two HDDs are also a ZFS mirror. [See picture below.]
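In case the picture does not render inline, here is a rough text sketch of the layout on each host (pool and device names are illustrative, not the real ones):

Code:
# illustrative ZFS layout per host -- names are examples only
rpool                      # SSD mirror: hypervisor OS plus a few CTs/VMs
  mirror-0
    ssd-1
    ssd-2
tank                       # HDD mirror: remaining VMs/CTs and local backups
  mirror-0
    hdd-1
    hdd-2
  logs
    ssd-3-part1            # log partition on the third SSD
  cache
    ssd-3-part2            # cache partition on the third SSD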
Two hosts have 32 GB of RAM (the maximum for their motherboards), and the third has 64 GB (also the maximum for its motherboard).
What is strange is that one host, which I will call pm3 (the screenshots posted here are from that host), shows I/O delay higher than 50%, even with no virtual machines or containers running on it, as you can see in the picture below:
When we power-cycle or reboot this host, it starts out normal and works fine for some time (maybe 12 hours), but after that the I/O delay begins to grow, slowly at first; after about 24 hours it reaches the level you can see in the picture, with nothing running except the system itself.
If we reboot it again, this behavior repeats.
The host with that high I/O delay becomes slow and unresponsive to any action, and sometimes the other cluster nodes lose sight of it (red cross icon in the web GUI).
If we start some VMs or containers right after the host has been rebooted, they work fine, but as time goes on they become slow and difficult to work with.
Installing new VMs is also slow, as are host updates.
We checked the disks with SMART and they report as healthy.
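For reference, this is roughly the check we ran on each disk (device names differ per host):

Code:
# full SMART report for each member of the two mirrors
smartctl -a /dev/sda
smartctl -a /dev/sdb
# optionally start a long self-test as well
smartctl -t long /dev/sda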
Any suggestions on how to find the cause and how to resolve it would be much appreciated.
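If it helps with the diagnosis, I can capture more data while the I/O delay is climbing; something like this is what I have in mind (just a sketch, iostat and iotop may need the sysstat and iotop packages installed first):

Code:
# watch per-device and per-pool activity while the delay grows
iostat -x 5
zpool iostat -v 5
# see which processes are actually waiting on I/O
iotop -o
# ZFS ARC statistics, in case memory pressure is involved
arc_summary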
Here are the package versions on this host:
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.131-2-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-9
pve-kernel-5.4: 6.4-19
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.131-1-pve: 5.15.131-2
pve-kernel-5.15.126-1-pve: 5.15.126-1
pve-kernel-5.4.195-1-pve: 5.4.195-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.4-1
proxmox-backup-file-restore: 2.4.4-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.14-pve1