Can you please give advice on what to do in this situation? The message indicates that the kernel hung for some time, but it does not show the root cause (I guess faulty/slow hardware?).
can you post the complete output of 'dmesg' ?
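you can dump it to a file and attach it here, for example with something like:
# dmesg -T > /tmp/dmesg.txt
(the -T flag just prints human-readable timestamps)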
Aug 5 11:35:20 pmx01 kernel: [ 2056.950722] INFO: task kworker/u130:3:6748 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.954918] INFO: task qemu-img:13875 blocked for more than 120 seconds.
basically it tells you that the process hung in io tasks for more than 2 minutes.
I've turned off all the VMs on the node, so nothing would use the disk IO, and tried to run several tests on both nodes in the cluster.
this is most often an indicator that the storage is too slow for the operations done on it (e.g. too many vms that do io, too many disk operations in parallel, some combination of those, etc.)
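if you want to verify that, you could watch the disk utilization while the load is running, for example with iostat from the sysstat package (not installed by default, so this is just a suggestion):
# apt install sysstat
# iostat -x 5
consistently high %util and long await times on the disks around the time of such a hang would point to the storage being the bottleneck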
Aug 5 11:34:10 pmx01 systemd-udevd[1167]: dm-43: Worker [1221] processing SEQNUM=22532 is taking a long time
Aug 5 11:35:20 pmx01 kernel: [ 2056.950722] INFO: task kworker/u130:3:6748 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.952156] Tainted: P O 5.4.73-1-pve #1
Aug 5 11:35:20 pmx01 kernel: [ 2056.953393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 5 11:35:20 pmx01 kernel: [ 2056.954690] kworker/u130:3 D 0 6748 2 0x80004000
Aug 5 11:35:20 pmx01 kernel: [ 2056.954711] Workqueue: writeback wb_workfn
Aug 5 11:35:20 pmx01 kernel: [ 2056.954718] Call Trace:
Aug 5 11:35:20 pmx01 kernel: [ 2056.954734] __schedule+0x2e6/0x6f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954738] schedule+0x33/0xa0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954741] io_schedule+0x16/0x40
Aug 5 11:35:20 pmx01 kernel: [ 2056.954750] __lock_page+0x122/0x220
Aug 5 11:35:20 pmx01 kernel: [ 2056.954756] ? file_fdatawait_range+0x30/0x30
Aug 5 11:35:20 pmx01 kernel: [ 2056.954760] write_cache_pages+0x22b/0x4a0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954764] ? __wb_calc_thresh+0x130/0x130
Aug 5 11:35:20 pmx01 kernel: [ 2056.954769] generic_writepages+0x56/0x90
Aug 5 11:35:20 pmx01 kernel: [ 2056.954777] blkdev_writepages+0xe/0x10
Aug 5 11:35:20 pmx01 kernel: [ 2056.954786] do_writepages+0x41/0xd0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954795] ? ttwu_do_wakeup+0x1e/0x150
Aug 5 11:35:20 pmx01 kernel: [ 2056.954798] ? ttwu_do_activate+0x5a/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954802] __writeback_single_inode+0x40/0x350
Aug 5 11:35:20 pmx01 kernel: [ 2056.954806] ? try_to_wake_up+0x67/0x650
Aug 5 11:35:20 pmx01 kernel: [ 2056.954810] writeback_sb_inodes+0x209/0x4a0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954818] __writeback_inodes_wb+0x66/0xd0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954825] wb_writeback+0x25b/0x2f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954834] wb_workfn+0x33e/0x490
Aug 5 11:35:20 pmx01 kernel: [ 2056.954842] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954848] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954855] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954861] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954866] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954871] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954880] ? __switch_to+0x85/0x480
Aug 5 11:35:20 pmx01 kernel: [ 2056.954883] ? __schedule+0x2ee/0x6f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954890] process_one_work+0x20f/0x3d0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954893] worker_thread+0x34/0x400
Aug 5 11:35:20 pmx01 kernel: [ 2056.954898] kthread+0x120/0x140
Aug 5 11:35:20 pmx01 kernel: [ 2056.954901] ? process_one_work+0x3d0/0x3d0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954904] ? kthread_park+0x90/0x90
Aug 5 11:35:20 pmx01 kernel: [ 2056.954907] ret_from_fork+0x35/0x40
Aug 5 11:35:20 pmx01 kernel: [ 2056.954918] INFO: task qemu-img:13875 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.956150] Tainted: P O 5.4.73-1-pve #1
what's your 'pveversion -v' and what kind of hardware do you run it on? (cpu/memory/storage/etc.)
# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
ok, first i'd upgrade your nodes to at least 6.4, but even better would be a supported version of proxmox (currently 7.2)
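the 6.3 -> 6.4 step is a normal minor upgrade (assuming your repositories are configured as usual):
# apt update
# apt dist-upgrade
for the major 6.4 -> 7.x upgrade there is a checklist script (pve6to7) and a step-by-step upgrade guide in the wiki that you should follow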
also, which models of the hdds? is your vm storage on the hdds?
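you can see the models e.g. with:
# lsblk -o NAME,MODEL,SIZE,ROTA
or with smartctl -a /dev/sdX from smartmontools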
Each node has 2 SSDs and 1 HDD. Each drive is configured as RAID0 and has an LVM-Thin pool on it (so a total of 3 LVM-Thin pools on each node). Generally all VMs are on the SSDs; the HDD is used for backups or as a secondary drive for a VM to keep some data.
I've already read a lot of articles on how to upgrade Proxmox, but I'm afraid that something could go wrong and after the upgrade the cluster will be broken or the VMs won't start. Is there a way to secure the cluster while upgrading?
ok the raid card may also play a role here.. any chance to remove that from the equation (so using a hba/onboard sata instead?)
the upgrades should normally go well if you follow the upgrade guide, but in any case it would be good to have proper working backups of your vms/configuration which you can restore if something goes wrong
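a manual backup of a vm can be made with vzdump, for example (the vmid and storage name here are just placeholders):
# vzdump 100 --storage <your-backup-storage> --mode snapshot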
Having a backup of the VMs is obvious, but what kind of configuration would you recommend saving from the node itself?
depends on what you configured, most probably the network config
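a minimal sketch of copying the most important config from the node (these are the standard proxmox locations):
# tar czf /root/pve-host-config.tar.gz /etc/network/interfaces /etc/hosts /etc/pve
/etc/pve is the cluster filesystem with the guest and storage configs, so having a copy of it also helps if you need to restore those later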