Proxmox 6.2 Kernel Error Bad RIP Value

kweevuss

Jun 18, 2020
I have a Dell R710 running Proxmox 6.2. The host has a mix of local storage and other storage presented via iSCSI/NFS. The local storage consists of 2x 500 GB drives in RAID 1 for the Proxmox OS, and 2x 500 GB SSDs in RAID 1.

Most of the VMs run on the iSCSI share, while I have a few running on the local SSD storage.

I have been running this in my homelab for a while now, but all of a sudden I have started to have an issue I am not sure how to track down. The host can be online for about a week, and then several of my VMs suddenly go offline.
When this happens I cannot even access their consoles in Proxmox; they usually time out, and I believe the error says "error waiting on systemd". What they have in common is that they are all on the SSD storage, but not all VMs on this storage are affected. For example, I also have pfSense running on these SSDs,
and it continues to work, and console access to it still works.

It is also interesting that when this happens the I/O delay shoots up to about 8% and stays there.
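
When it is in this state, the stuck kvm processes can be spotted in uninterruptible sleep (D state). A quick check looks something like this (standard Linux commands, nothing Proxmox-specific; just a sketch of how to narrow it down):

Code:
# list processes stuck in uninterruptible sleep (D state) and what they are waiting on
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

# pull the full hung-task reports out of the kernel log
journalctl -k | grep -A 30 "blocked for more than"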

I find this error in the syslog file, but I'm not sure what it means or what next steps I should take with it.

Jun 18 03:10:26 compute1 kernel: [623467.883697] INFO: task kvm:2361 blocked for more than 241 seconds.
Jun 18 03:10:26 compute1 kernel: [623467.883732] Tainted: P IOE 5.4.34-1-pve #1
Jun 18 03:10:26 compute1 kernel: [623467.883752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 03:10:26 compute1 kernel: [623467.883777] kvm D 0 2361 1 0x00000000
Jun 18 03:10:26 compute1 kernel: [623467.883779] Call Trace:
Jun 18 03:10:26 compute1 kernel: [623467.883787] __schedule+0x2e6/0x700
Jun 18 03:10:26 compute1 kernel: [623467.883789] schedule+0x33/0xa0
Jun 18 03:10:26 compute1 kernel: [623467.883790] schedule_preempt_disabled+0xe/0x10
Jun 18 03:10:26 compute1 kernel: [623467.883792] __mutex_lock.isra.10+0x2c9/0x4c0
Jun 18 03:10:26 compute1 kernel: [623467.883823] ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 18 03:10:26 compute1 kernel: [623467.883825] __mutex_lock_slowpath+0x13/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883826] mutex_lock+0x2c/0x30
Jun 18 03:10:26 compute1 kernel: [623467.883828] sr_block_ioctl+0x43/0xd0
Jun 18 03:10:26 compute1 kernel: [623467.883832] blkdev_ioctl+0x4c1/0x9e0
Jun 18 03:10:26 compute1 kernel: [623467.883835] block_ioctl+0x3d/0x50
Jun 18 03:10:26 compute1 kernel: [623467.883837] do_vfs_ioctl+0xa9/0x640
Jun 18 03:10:26 compute1 kernel: [623467.883838] ksys_ioctl+0x67/0x90
Jun 18 03:10:26 compute1 kernel: [623467.883840] __x64_sys_ioctl+0x1a/0x20
Jun 18 03:10:26 compute1 kernel: [623467.883843] do_syscall_64+0x57/0x190
Jun 18 03:10:26 compute1 kernel: [623467.883846] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 18 03:10:26 compute1 kernel: [623467.883848] RIP: 0033:0x7f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883852] Code: Bad RIP value.
Jun 18 03:10:26 compute1 kernel: [623467.883853] RSP: 002b:00007f2d75ffa098 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 18 03:10:26 compute1 kernel: [623467.883855] RAX: ffffffffffffffda RBX: 00007f2e33af4850 RCX: 00007f2e40f97427
Jun 18 03:10:26 compute1 kernel: [623467.883856] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000012
Jun 18 03:10:26 compute1 kernel: [623467.883856] RBP: 0000000000000001 R08: 0000559be29be890 R09: 0000000000000000
Jun 18 03:10:26 compute1 kernel: [623467.883857] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2d74a42268
Jun 18 03:10:26 compute1 kernel: [623467.883858] R13: 0000000000000000 R14: 0000559be2ef0d20 R15: 0000559be27fc740


Steps I have done so far:
I ran a memtest for one entire pass; no issues were found.
I ran SMART tests on all the local drives; no issues were found (commands below).
I removed the SSDs from the host, checked their status in Windows, ran additional tests (no issues found), and applied the available firmware updates.
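
For reference, the SMART checks can be run like this (device names below are placeholders for the local drives):

Code:
# start a long self-test, then read the results once it has finished
smartctl -t long /dev/sda
smartctl -a /dev/sda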

Any help with next steps, or details on what this message might mean, would be appreciated!

Edit: If it helps, pveversion output is below:
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Hi,

please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.
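
Roughly like this (note: the intel-microcode package comes from Debian's non-free repository component, so that has to be enabled in your APT sources):

Code:
apt update
apt dist-upgrade             # pulls in the current pve-kernel-5.4 packages
apt install intel-microcode  # needs the Debian "non-free" component enabled
reboot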
 
Hi,

please upgrade to the current kernel 5.4.44-1-pve and install the "intel-microcode" package.

Unfortunately, after another week of uptime I have seen the same error again. I did make another change: I moved 2 of the 3 affected VMs to different storage than they were originally on, to troubleshoot whether it was something with the local SSD storage. That did not make a difference, and today the same 3 VMs on this host went down.

I installed the intel-microcode package through apt; was this the method that you were referring to?

Code:
apt list | grep intel

intel-microcode/oldstable,now 3.20200609.2~deb9u1 amd64 [installed]
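
One way to verify that the new microcode is actually being loaded at boot (just a sanity check, in case it helps):

Code:
# the kernel logs the applied microcode revision early during boot
dmesg | grep -i microcode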


Code:
Jun 30 05:10:25 compute1 kernel: [642437.024414] kvm             D    0  1929      1 0x00000000
Jun 30 05:10:25 compute1 kernel: [642437.024417] Call Trace:
Jun 30 05:10:25 compute1 kernel: [642437.024427]  __schedule+0x2e6/0x6f0
Jun 30 05:10:25 compute1 kernel: [642437.024429]  schedule+0x33/0xa0
Jun 30 05:10:25 compute1 kernel: [642437.024431]  schedule_preempt_disabled+0xe/0x10
Jun 30 05:10:25 compute1 kernel: [642437.024433]  __mutex_lock.isra.10+0x2c9/0x4c0
Jun 30 05:10:25 compute1 kernel: [642437.024464]  ? kvm_arch_vcpu_put+0xe2/0x170 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024482]  ? kvm_skip_emulated_instruction+0x3b/0x60 [kvm]
Jun 30 05:10:25 compute1 kernel: [642437.024484]  __mutex_lock_slowpath+0x13/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024485]  mutex_lock+0x2c/0x30
Jun 30 05:10:25 compute1 kernel: [642437.024488]  sr_block_ioctl+0x43/0xd0
Jun 30 05:10:25 compute1 kernel: [642437.024493]  blkdev_ioctl+0x4c1/0x9e0
Jun 30 05:10:25 compute1 kernel: [642437.024497]  ? __wake_up_locked_key+0x1b/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024501]  block_ioctl+0x3d/0x50
Jun 30 05:10:25 compute1 kernel: [642437.024503]  do_vfs_ioctl+0xa9/0x640
Jun 30 05:10:25 compute1 kernel: [642437.024505]  ksys_ioctl+0x67/0x90
Jun 30 05:10:25 compute1 kernel: [642437.024506]  __x64_sys_ioctl+0x1a/0x20
Jun 30 05:10:25 compute1 kernel: [642437.024509]  do_syscall_64+0x57/0x190
Jun 30 05:10:25 compute1 kernel: [642437.024512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 30 05:10:25 compute1 kernel: [642437.024514] RIP: 0033:0x7f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024519] Code: Bad RIP value.
Jun 30 05:10:25 compute1 kernel: [642437.024520] RSP: 002b:00007f8563d790d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jun 30 05:10:25 compute1 kernel: [642437.024522] RAX: ffffffffffffffda RBX: 00007f85646ea9a0 RCX: 00007f8571b90427
Jun 30 05:10:25 compute1 kernel: [642437.024523] RDX: 000000007fffffff RSI: 0000000000005326 RDI: 0000000000000014
Jun 30 05:10:25 compute1 kernel: [642437.024524] RBP: 0000000000000000 R08: 0000560d289ff710 R09: 0000000000000000
Jun 30 05:10:25 compute1 kernel: [642437.024525] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f83ddb09800
Jun 30 05:10:25 compute1 kernel: [642437.024526] R13: 0000000000000006 R14: 0000560d28f31d20 R15: 0000560d2883d740
Jun 30 05:10:25 compute1 kernel: [642437.024607] kvm             D    0  2043      1 0x00000000

Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-7
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
 
I installed the intel-microcode package through apt; was this the method that you were referring to?
This is correct.

When you say you moved the VM to another storage, what is this storage?
Is it on the same HBA/RAID/onboard controller?
 
The storage it was on was behind a RAID controller local to this Proxmox system. I moved it to an iSCSI share running on a FreeNAS install.
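
For reference, the move itself was just a normal disk move between storages; from the CLI it would look roughly like this (VM ID, disk name, and target storage below are placeholders):

Code:
# move virtual disk scsi0 of VM 100 to the storage named "freenas-iscsi"
qm move_disk 100 scsi0 freenas-iscsi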

As a troubleshooting step, I thought I would first reinstall Proxmox 6.2 and then try 6.1 again. This morning it hit the same panic once again on 6.2, so I was going to try 6.1. It is also strange that this time it happened in under 24 hours, when it usually takes a week. I thought this might be a good way to rule out any hardware issues on my side.
 
You don't have to install PVE 6.1.
It should be enough to install an older kernel and boot into it.
 
So on PVE 6.1 and the older kernel, it has now been up for over a week (9 days).

pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

I am not very familiar with running different kernels, but I assume I would just install these through apt-get? Is there a list of kernel versions so I know which to install?

I could try upgrading to 6.2 PVE again, and using an older kernel as well.
 
I am not very familiar with running different kernels, but I assume I would just install these through apt-get?
Yes, you can install it like this.

Code:
apt install pve-kernel-5.3.18-2-pve
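
To see which kernel packages exist, and to boot the older one by default on a GRUB-based install, something along these lines should work (a sketch; check the exact menu entry titles on your system):

Code:
# list the pve-kernel packages available from the repository
apt search pve-kernel-5.3

# to boot an older installed kernel by default, set GRUB_DEFAULT in /etc/default/grub
# to the matching menu entry (the exact titles are in /boot/grub/grub.cfg), then run:
update-grub

Otherwise you can simply pick the older kernel from the "Advanced options" submenu in the GRUB menu at boot.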
 
Hi,
any news on this?
I have a similar effect: the VMs on LVM are still working, but all LVM commands (lvs, vgs, pvs) hang, and because of this the node and all VMs are marked with a ? in the GUI.
Code:
Sep 28 21:14:27 pve02 kernel: [ 2783.664724] INFO: task pvs:411112 blocked for more than 362 seconds.
Sep 28 21:14:27 pve02 kernel: [ 2783.692327]       Tainted: P           O      5.4.60-1-pve #1
Sep 28 21:14:27 pve02 kernel: [ 2783.716796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 21:14:27 pve02 kernel: [ 2783.725214] pvs             D    0 411112 401506 0x00000004
Sep 28 21:14:27 pve02 kernel: [ 2783.731267] Call Trace:
Sep 28 21:14:27 pve02 kernel: [ 2783.734179]  __schedule+0x2e6/0x6f0
Sep 28 21:14:27 pve02 kernel: [ 2783.738137]  schedule+0x33/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.741746]  schedule_preempt_disabled+0xe/0x10
Sep 28 21:14:27 pve02 kernel: [ 2783.746757]  __mutex_lock.isra.10+0x2c9/0x4c0
Sep 28 21:14:27 pve02 kernel: [ 2783.751588]  __mutex_lock_slowpath+0x13/0x20
Sep 28 21:14:27 pve02 kernel: [ 2783.756325]  mutex_lock+0x2c/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.760072]  disk_block_events+0x31/0x80
Sep 28 21:14:27 pve02 kernel: [ 2783.764430]  __blkdev_get+0x72/0x560
Sep 28 21:14:27 pve02 kernel: [ 2783.768433]  blkdev_get+0xef/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.772264]  ? blkdev_get_by_dev+0x50/0x50
Sep 28 21:14:27 pve02 kernel: [ 2783.776787]  blkdev_open+0x87/0xa0
Sep 28 21:14:27 pve02 kernel: [ 2783.780614]  do_dentry_open+0x143/0x3a0
Sep 28 21:14:27 pve02 kernel: [ 2783.784942]  vfs_open+0x2d/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.788523]  path_openat+0x2e9/0x16f0
Sep 28 21:14:27 pve02 kernel: [ 2783.792615]  ? filename_lookup.part.60+0xe0/0x170
Sep 28 21:14:27 pve02 kernel: [ 2783.797748]  do_filp_open+0x93/0x100
Sep 28 21:14:27 pve02 kernel: [ 2783.801755]  ? __alloc_fd+0x46/0x150
Sep 28 21:14:27 pve02 kernel: [ 2783.805760]  do_sys_open+0x177/0x280
Sep 28 21:14:27 pve02 kernel: [ 2783.809845]  __x64_sys_openat+0x20/0x30
Sep 28 21:14:27 pve02 kernel: [ 2783.814140]  do_syscall_64+0x57/0x190
Sep 28 21:14:27 pve02 kernel: [ 2783.818731]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 28 21:14:27 pve02 kernel: [ 2783.824695] RIP: 0033:0x7f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.829163] Code: Bad RIP value.
Sep 28 21:14:27 pve02 kernel: [ 2783.833275] RSP: 002b:00007fffa0944800 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Sep 28 21:14:27 pve02 kernel: [ 2783.841829] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f44154d31ae
Sep 28 21:14:27 pve02 kernel: [ 2783.849860] RDX: 0000000000044000 RSI: 00005590d9097d70 RDI: 00000000ffffff9c
Sep 28 21:14:27 pve02 kernel: [ 2783.857990] RBP: 00007fffa0944960 R08: 00005590d7c5ca17 R09: 00007fffa0944a30
Sep 28 21:14:27 pve02 kernel: [ 2783.866100] R10: 0000000000000000 R11: 0000000000000246 R12: 00005590d7c53c68
Sep 28 21:14:27 pve02 kernel: [ 2783.874252] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
After a reboot it works for a short time (approx. 1 hour). At first it looked like I had trouble with the RAID volumes, because I had tried to expand a RAID volume on the hardware RAID controller, but the RAID expansion is running now (this reduces I/O) and the issue has started again.
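For the next hang I can try to capture where it is actually stuck; the kernel-side view should be visible with something like this (standard kernel facilities, nothing Proxmox-specific; sysrq has to be enabled, and the PID is the one from the hung-task message above):

Code:
# kernel stack of the hung pvs process
cat /proc/411112/stack

# dump the stacks of all blocked (D state) tasks into the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100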
Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.11-pve1
ceph-fuse: 14.2.11-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
The same happened one reboot earlier, when kernel pve-kernel-5.4.44-2-pve: 5.4.44-2 was active (the issue started after the online RAID extension, which wasn't successful; the whole server stopped doing I/O and needed a reset).
The server is a Dell R7415 with an EPYC 7451.

Udo
 
For me, not much of a change. I ended up staying on the older Proxmox version and kernel, as I posted above:


pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

Personally, I am planning on upgrading this server within the next 6 months, so I was hoping new hardware would change this outcome.
 
For me, not much of a change. I ended up staying on the older Proxmox version and kernel, as I posted above:


pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2

Personally, I am planning on upgrading this server within the next 6 months, so I was hoping new hardware would change this outcome.
Hi,
I'm not sure if new hardware is the key; this happens for me on current hardware (less than 1 year old).
But I've been running the kernel on other (older) hardware without trouble…

Udo
 
Udo, do you use an IPERC in this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have no such problems here with EPYC CPUs.
Also, your storage config would be interesting to see.
 
Udo, do you use an IPERC in this server? If yes, can you tell me which one it is?
Maybe it is related to the disk I/O devices.
We have no such problems here with EPYC CPUs.
Also, your storage config would be interesting to see.
Hi Wolfgang,
yes, the LVM storage is behind a PERC:
Code:
PERC H740P Mini (Integriert)    Integrated RAID Controller 1    Firmware: 51.13.0-3485    Cache: 8192 MB
The LVM is on a RAID 6 with 6 HDDs (now in an expansion process with 2 additional disks).
On this node, there are only 2 LVM storages defined:
Code:
lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: hdd-lvm
        thinpool data
        vgname hdd
        content rootdir,images
We have other PVE nodes with EPYC where LVM is working, but with a different kernel (and LVM on RAID 1 SSDs):
Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
Two days ago a new BIOS update was published; it will be installed at the weekend, but I'm unsure which kernel I should use.

Udo
 
I will try to reproduce it here with an LSI 3806 RAID controller and will report if I find something.
 
I will try to reproduce it here with an LSI 3806 RAID controller and will report if I find something.
Hi,
sounds good!
Don't forget that we have some I/O load (sdb is the LVM HDD RAID):
Code:
ATOP - pve02                                    2020/10/01  10:43:39                                    --------------                                    10s elapsed
PRC | sys    7.17s  | user   7.91s  | #proc    960  | #trun      4  | #tslpi  1048  | #tslpu     3  | #zombie    0  | clones  2116  |               | #exit   2101  |
CPU | sys      70%  | user     79%  | irq       1%  | idle   4475%  | wait    178%  | guest    53%  | ipc     0.89  | cycl  107MHz  | curf 2.89GHz  | curscal   ?%  |
CPL | avg1    5.85  | avg5    5.23  | avg15   5.10  |               |               | csw   642459  | intr  374679  |               |               | numcpu    48  |
MEM | tot   251.6G  | free  108.8G  | cache 882.2M  | buff  157.2M  | slab    3.9G  | shmem 189.6M  | shrss   0.0M  | vmbal   0.0M  | hptot   0.0M  | hpuse   0.0M  |
SWP | tot    32.0G  | free   32.0G  |               |               |               |               |               |               | vmcom 177.1G  | vmlim 157.8G  |
PSI | cs     0/0/0  | ms     0/0/0  | mf     0/0/0  |               | is  38/40/42  | if  38/40/42  |               |               |               |               |
LVM | d-data_tdata  | busy     26%  | read     130  | write   1105  | KiB/r     31  | KiB/w    215  | MBr/s    0.4  | MBw/s   23.2  | avq   101.70  | avio 2.09 ms  |
LVM | d-data-tpool  | busy     26%  | read     130  | write   1105  | KiB/r     31  | KiB/w    215  | MBr/s    0.4  | MBw/s   23.2  | avq   101.70  | avio 2.09 ms  |
LVM |        dm-37  | busy     12%  | read       5  | write   3327  | KiB/r      7  | KiB/w     10  | MBr/s    0.0  | MBw/s    3.3  | avq     0.09  | avio 0.37 ms  |
LVM | 205--disk--0  | busy      9%  | read     130  | write    160  | KiB/r     31  | KiB/w    799  | MBr/s    0.4  | MBw/s   12.5  | avq    56.41  | avio 3.13 ms  |
DSK |          sdb  | busy     28%  | read     135  | write   1040  | KiB/r     30  | KiB/w    203  | MBr/s    0.4  | MBw/s   20.7  | avq    85.64  | avio 2.35 ms  |
DSK |      nvme2n1  | busy     14%  | read      43  | write   4563  | KiB/r      9  | KiB/w     26  | MBr/s    0.0  | MBw/s   11.8  | avq     0.00  | avio 0.31 ms  |
DSK |      nvme1n1  | busy     14%  | read      23  | write   4593  | KiB/r     11  | KiB/w     26  | MBr/s    0.0  | MBw/s   11.8  | avq     0.00  | avio 0.31 ms  |
NET | transport     | tcpi    6790  | tcpo    7718  | udpi    2960  | udpo    2762  | tcpao      2  | tcppo     25  | tcprs      0  | tcpie      0  | udpie      0  |
NET | network       | ipi     9830  | ipo     8595  | ipfrw      0  | deliv   9790  |               |               |               | icmpi     40  | icmpo      0  |
Udo
 
Do you have any special settings on the RAID? Block size, cache mode, ...?
 
Do you have any special settings on the RAID? Block size, cache mode, ...?
Hi Wolfgang,
not really. The special thing was a 100 GB RAID volume for the Proxmox system and the remaining space for one big LVM storage. But due to the extension I had to migrate and delete the system RAID (though the issue started just after a reboot, while the system RAID was still on the RAID group).

The RAID settings:
Code:
megacli -LDInfo -L1 -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :hdd-raid6
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 7.177 TB
Sector Size         : 512
Parity Size         : 3.588 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Ongoing Progresses:
  Reconstruction           : Completed 70%, Taken 52 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No
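
For completeness, the battery/cache state that the "No Write Cache if Bad BBU" policy depends on can be queried as well (adapter index 0 assumed, same as in the command above):

Code:
megacli -AdpBbuCmd -GetBbuStatus -a0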
Udo
 
