On our Proxmox/Ceph clusters we're seeing multiple Debian Jessie VMs remount their /tmp partition read-only after (Buffer) I/O errors are logged in syslog.
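To find affected VMs without checking each one by hand, something along these lines could be run on every guest. It is a minimal sketch that assumes the standard /proc/mounts layout (device, mount point, type, options, dump, pass). The function name is made up for illustration, and the code avoids newer syntax so it should also run on Jessie's Python 3.4.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag.

    mounts_text is expected in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # compare whole options, so 'errors=remount-ro' is not mistaken for 'ro'
        if "ro" in options:
            result.append(mount_point)
    return result
```

For example, `read_only_mounts(open("/proc/mounts").read())` should include `/tmp` on a VM in the state described here. Matching whole options rather than substrings matters because Debian commonly mounts the root filesystem with `errors=remount-ro`, which contains `ro` as a substring.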
From what I can tell, this started after updating to pve-kernel-4.15.18-16-pve: we had not seen this issue before, and we have never had it on our older VMware cluster running the same Debian Jessie installations.
On the two clusters where we've seen the issue, we reverted to pve-kernel-4.15.18-15-pve and the problem has not reappeared.
I can trigger the I/O errors fairly easily by continuously writing many small files with dd in a loop and removing them again. The read-only remount seems harder to trigger, but I managed it while cloning the machine in Proxmox.
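The exact dd invocation isn't given, so the following is only a rough Python stand-in for that reproduction loop; the file names, sizes and iteration count are hypothetical. Each file is fsynced before deletion so the write actually reaches the virtual disk instead of sitting in the page cache, roughly like repeating `dd if=/dev/zero of=FILE bs=4k count=1 conv=fsync` followed by `rm FILE`.

```python
import os


def churn_small_files(directory, iterations=1000, size=4096):
    """Create, fsync and delete small zero-filled files in `directory`.

    Returns the number of write/delete cycles completed. Any OSError
    (e.g. EIO, or EROFS once the filesystem is remounted read-only)
    propagates to the caller unchanged.
    """
    payload = b"\0" * size  # zero-filled, like dd if=/dev/zero
    completed = 0
    for i in range(iterations):
        # hypothetical naming scheme, one file per iteration
        path = os.path.join(directory, "churn-%d.tmp" % i)
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data down to the block device
        os.remove(path)
        completed += 1
    return completed
```

Pointed at /tmp on a test VM (e.g. `churn_small_files("/tmp", iterations=100000)`), a failing write should surface as an OSError from `fsync` at roughly the time the matching errors appear in syslog.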
As far as I can tell, the only difference between the two kernels is the TCP SACK bugfixes, which seem an unlikely cause (we were already using the iptables workaround before the update, and there are no matched packets in that firewall chain).
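The workaround rule itself isn't shown, so this sketch assumes the widely published SACK mitigation `iptables -A INPUT -p tcp -m tcpmss --mss 1:500 -j DROP`, which `iptables -L` typically renders as `tcpmss match 1:500`. It sums the packet counters of matching rules so the "no matched packets" check can be scripted across nodes; the function name and default match string are assumptions to adjust to the actual rule.

```python
def tcpmss_drop_packets(iptables_output, match="tcpmss match 1:500"):
    """Sum the packet counters of rules whose text contains `match`.

    Expects the output of `iptables -L INPUT -v -n -x`: each rule line
    starts with the exact packet count (-x disables K/M/G abbreviations),
    while the 'Chain ...' and column header lines do not.
    """
    total = 0
    for line in iptables_output.splitlines():
        fields = line.split()
        # only rule lines begin with a numeric packet counter
        if fields and fields[0].isdigit() and match in line:
            total += int(fields[0])
    return total
```

A result of 0 on every node would support the reading that the SACK-related code paths aren't being exercised by dropped traffic.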
Is anyone else seeing such issues, or does anyone have suggestions as to what might be causing this?
This gets logged in /var/log/syslog.
Code:
Jun 25 02:08:21 server kernel: [942597.045574] sd 0:0:0:0: [sda]
Jun 25 02:08:21 server kernel: [942597.045591] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 25 02:08:21 server kernel: [942597.045593] sd 0:0:0:0: [sda]
Jun 25 02:08:21 server kernel: [942597.045595] Sense Key : Aborted Command [current]
Jun 25 02:08:21 server kernel: [942597.045598] sd 0:0:0:0: [sda]
Jun 25 02:08:21 server kernel: [942597.045600] Add. Sense: I/O process terminated
Jun 25 02:08:21 server kernel: [942597.045603] sd 0:0:0:0: [sda] CDB:
Jun 25 02:08:21 server kernel: [942597.045609] Write(10): 2a 00 01 ab e8 06 00 00 02 00
Jun 25 02:08:21 server kernel: [942597.045621] end_request: I/O error, dev sda, sector 28043270
Jun 25 02:08:21 server kernel: [942597.045626] Buffer I/O error on device sda8, logical block 131075
Jun 25 02:08:21 server kernel: [942597.045627] lost page write due to I/O error on sda8
Jun 25 02:09:12 server kernel: [942647.909617] sd 0:0:0:0: [sda]
Jun 25 02:09:12 server kernel: [942647.909625] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 25 02:09:12 server kernel: [942647.909627] sd 0:0:0:0: [sda]
Jun 25 02:09:12 server kernel: [942647.909628] Sense Key : Aborted Command [current]
Jun 25 02:09:12 server kernel: [942647.909630] sd 0:0:0:0: [sda]
Jun 25 02:09:12 server kernel: [942647.909632] Add. Sense: I/O process terminated
Jun 25 02:09:12 server kernel: [942647.909633] sd 0:0:0:0: [sda] CDB:
Jun 25 02:09:12 server kernel: [942647.909638] Write(10): 2a 00 01 ab e8 06 00 00 02 00
Jun 25 02:09:12 server kernel: [942647.909643] end_request: I/O error, dev sda, sector 28043270
Jun 25 02:09:12 server kernel: [942647.909646] Buffer I/O error on device sda8, logical block 131075
Jun 25 02:09:12 server kernel: [942647.909648] lost page write due to I/O error on sda8
Jun 25 07:02:47 server kernel: [960262.885704] sd 0:0:0:0: [sda]
Jun 25 07:02:47 server kernel: [960262.885731] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 25 07:02:47 server kernel: [960262.885733] sd 0:0:0:0: [sda]
Jun 25 07:02:47 server kernel: [960262.885734] Sense Key : Aborted Command [current]
Jun 25 07:02:47 server kernel: [960262.885736] sd 0:0:0:0: [sda]
Jun 25 07:02:47 server kernel: [960262.885737] Add. Sense: I/O process terminated
Jun 25 07:02:47 server kernel: [960262.885753] sd 0:0:0:0: [sda] CDB:
Jun 25 07:02:47 server kernel: [960262.885758] Write(10): 2a 00 01 ab e8 06 00 00 02 00
Jun 25 07:02:47 server kernel: [960262.885768] end_request: I/O error, dev sda, sector 28043270
Jun 25 07:02:47 server kernel: [960262.885773] Buffer I/O error on device sda8, logical block 131075
Jun 25 07:02:47 server kernel: [960262.885774] lost page write due to I/O error on sda8
Jun 25 21:22:36 server kernel: [1011851.372149] Aborting journal on device sda8-8.
Jun 25 21:22:36 server kernel: [1011851.373252] EXT4-fs (sda8): ext4_writepages: jbd2_start: 13312 pages, ino 33; err -30
Jun 25 21:22:36 server kernel: [1011851.377512] EXT4-fs error (device sda8): ext4_journal_check_start:56: Detected aborted journal
Jun 25 21:22:36 server kernel: [1011851.377994] EXT4-fs (sda8): Remounting filesystem read-only
Jun 25 21:22:36 server kernel: [1011851.378445] EXT4-fs (sda8): ext4_writepages: jbd2_start: 13312 pages, ino 31; err -30
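All three I/O error groups above carry the same CDB and the same sector. Decoding the SCSI WRITE(10) CDB shows what the guest was writing: byte 0 is the opcode (0x2a), bytes 2-5 hold the big-endian logical block address, and bytes 7-8 the big-endian transfer length in blocks. This is a small sketch with hypothetical helper names that cross-checks the CDB against the `end_request` line, assuming the virtual disk uses 512-byte logical blocks, which the match itself suggests.

```python
import re


def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Returns (logical_block_address, transfer_length_in_blocks).
    """
    cdb = bytes(int(b, 16) for b in cdb_hex.split())
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: LBA
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: block count
    return lba, length


def end_request_sector(log_line):
    """Extract the sector from an 'end_request: I/O error' line, or None."""
    m = re.search(r"end_request: I/O error, dev \S+, sector (\d+)", log_line)
    return int(m.group(1)) if m else None
```

`decode_write10("2a 00 01 ab e8 06 00 00 02 00")` returns `(28043270, 2)`: a 2-block write at LBA 28043270, which is exactly the sector reported by `end_request`, so every failure is the same small write being retried or reissued at one location on sda8.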
pveversion -v output (note that I downgraded the kernel packages)
Code:
proxmox-ve: 5.4-1 (running kernel: 4.15.18-15-pve)
pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
pve-kernel-4.15: 5.4-3
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-11-pve: 4.15.18-34
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-10
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-52
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-43
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-37
pve-container: 2.0-39
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-52
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
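Since the -16 kernel is still installed alongside -15, it may be worth confirming on each node that the older kernel is the one actually booted. This sketch parses `pveversion -v` text; it assumes the running kernel appears as `(running kernel: X)` on the `proxmox-ve:` line and that individual kernel packages are named `pve-kernel-<version>-<abi>-pve`, as in the output above. The function name is hypothetical.

```python
import re

# individual kernel packages, e.g. pve-kernel-4.15.18-15-pve
# (excludes the pve-kernel-4.15 meta package)
KERNEL_PKG = re.compile(r"pve-kernel-\d+(?:\.\d+)*-\d+-pve")


def parse_pveversion(text):
    """Return (running_kernel, installed_kernels) from `pveversion -v` text."""
    running = None
    installed = []
    for line in text.splitlines():
        name, _, value = line.partition(":")
        name = name.strip()
        if name == "proxmox-ve":
            m = re.search(r"running kernel: ([^)\s]+)", value)
            if m:
                running = m.group(1)
        elif KERNEL_PKG.fullmatch(name):
            installed.append(name[len("pve-kernel-"):])
    return running, installed
```

For the output above this yields `4.15.18-15-pve` as the running kernel, with `4.15.18-16-pve`, `4.15.18-15-pve` and `4.15.18-11-pve` installed, so a node that silently booted back into -16 would stand out.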