Hard drives blank after migration

zBrain

Apr 27, 2013
I added a new server to my cluster today. It went from 2 nodes to 3 with a few hiccups, but overall it seemed OK at first.

One of my nodes was having random reboots, so I wanted to be able to pull it and run RAM tests.

I moved a couple of hard drives that were on local storage to an NFS volume. I didn't check "remove source". Once the move completed, I tested the VMs and they were fine, so I removed the now-unused local disks. I then live-migrated the VMs to the new node and all seemed fine.

A while later, one of the VMs suddenly disappeared. When I tried to reboot it, it said the disk was not bootable. I booted into a rescue CD and the disk was blank: no partition table, nothing. I happened to know the exact parameters of the partitions, so I recreated them, but I was still unable to mount them.

Then the web UI started showing nodes as offline. I could connect to each node individually, but each one said the others were offline.

A minute ago, while I was restoring the first dead drive from backup, a second VM died. Same thing: the virtual drive is not bootable.

Both drives had been moved from local storage to NFS on the old node, and then the VMs were migrated to the new node. The versions differ between nodes (I had also planned to upgrade the nodes but never got that far).

Old nodes:
pve-manager/3.3-1/a06c9f73 (running kernel: 2.6.32-32-pve)


New node:
pve-manager/3.4-11/6502936f (running kernel: 2.6.32-41-pve)

I am now terrified. Please help.
 
Sorry for the delayed reply, been busy restoring backups. I restored them on fresh VMs so I can try to figure out what happened.

I moved the VMs back to node 2. They no longer say "not bootable" but hang at "booting from hard disk". I can access the disk itself when I boot from rescue; there just seems to be nothing on it. I can see the new partition table I created, but trying to mount the partition gives me a bad superblock error (it was ext4). fsck says the superblock is invalid (bad magic number), and it fails to use the backup superblocks as well. I think it's just... gone.
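Because fsck reports a bad magic number, one quick check is to read the magic field directly and see whether anything of the old superblock survives. Below is a minimal Python sketch, assuming you run it against a raw copy of the disk or partition (never the live device) and the standard ext2/3/4 on-disk layout: the primary superblock starts 1024 bytes into the partition and holds the 16-bit little-endian magic 0xEF53 at offset 0x38. The helper name `has_ext_magic` and its `partition_offset` parameter are illustrative, not part of any existing tool.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1 KiB into the partition
MAGIC_FIELD = 0x38        # s_magic offset inside the superblock
EXT_MAGIC = 0xEF53        # shared by ext2, ext3 and ext4


def has_ext_magic(path: str, partition_offset: int = 0) -> bool:
    """Return True if the ext2/3/4 magic is present at the primary superblock.

    `path` is a raw image of a disk or partition; `partition_offset` is the
    byte offset of the partition inside it (0 for a partition image).
    """
    with open(path, "rb") as f:
        f.seek(partition_offset + SUPERBLOCK_OFFSET + MAGIC_FIELD)
        raw = f.read(2)
    if len(raw) < 2:  # image too short to contain a superblock
        return False
    (magic,) = struct.unpack("<H", raw)  # little-endian unsigned 16-bit
    return magic == EXT_MAGIC
```

For a whole-disk image whose first partition starts at sector 2048, you would call `has_ext_magic("disk.img", 2048 * 512)` (the file name is hypothetical). A False result there matches what fsck reported, although it says nothing about how the superblock was lost.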

Could it be that my network hiccuped and NFS corrupted something? Here's my exports file. Maybe my settings are unsafe?

/data 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
 
Hi,
you wrote "the new partition table"... can you create a test VM on node 2 (e.g. with a live distro like grml) with a filesystem on a disk and migrate this VM to node 3?

About your exports... I use NFS for backup only and have "rw,sync,no_subtree_check".
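To make the comparison concrete, here is a small Python sketch that splits a single-client exports line into path, client and options and diffs the option sets. It assumes the simple `path client(options)` form used in this thread; the real exports(5) syntax allows several clients per line, which this hypothetical `parse_exports_line` helper does not handle.

```python
from __future__ import annotations


def parse_exports_line(line: str) -> tuple[str, str, set[str]]:
    """Split a one-client /etc/exports line into (path, client, options).

    Only handles the simple "path client(opt1,opt2,...)" form.
    """
    path, client_spec = line.split(None, 1)
    client, _, opts = client_spec.strip().partition("(")
    options = set(opts.rstrip(")").split(",")) if opts else set()
    return path, client, options


mine = parse_exports_line(
    "/data 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)"
)
udo_options = {"rw", "sync", "no_subtree_check"}  # options from the reply above

print("only in mine:", sorted(mine[2] - udo_options))
print("only in Udo's:", sorted(udo_options - mine[2]))
```

The only difference is `no_root_squash`. Per exports(5), that option controls whether root on the client is mapped to the anonymous user, so it affects permissions rather than write durability. Both lines use `sync`, which makes the server reply to requests only after the changes have been committed to stable storage.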

What cache settings do you have for the disk? Does anything change if you use cache=writethrough?

Udo
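For reference when trying the cache options: the QEMU manual describes each `cache=` mode as a combination of three lower-level flags (`cache.writeback`, `cache.direct`, `cache.no-flush`). The Python sketch below encodes that table and prints a one-line summary per mode. Treat it as an illustration to check against the documentation for the QEMU version your nodes actually run; `describe` is a hypothetical helper.

```python
# QEMU "-drive cache=" modes expressed as
# (cache.writeback, cache.direct, cache.no-flush), per the QEMU manual.
CACHE_MODES = {
    "writeback": (True, False, False),
    "none": (True, True, False),
    "writethrough": (False, False, False),
    "directsync": (False, True, False),
    "unsafe": (True, False, True),
}


def describe(mode: str) -> str:
    """Summarize what a cache mode means for durability."""
    writeback, direct, no_flush = CACHE_MODES[mode]
    parts = [
        "host page cache bypassed" if direct else "host page cache used",
        "guest must flush (volatile write cache)" if writeback
        else "writes complete only once on stable storage",
    ]
    if no_flush:
        parts.append("guest flush requests ignored")
    return f"{mode}: " + "; ".join(parts)


for name in CACHE_MODES:
    print(describe(name))
```

Of these modes, only `unsafe` ignores the guest's flush requests.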
 

I created a test VM whose hard disk had identical settings to the ones that failed. I then repeated the steps from my original post and had no issues. I left it running for a while and it was fine.

I tried every available caching option on both the original failed VM and the test VM to no avail.

I suspect we may never know exactly what happened. It scared me spitless at first, but things seem ok now. I've moved a bunch of stuff around with no issue.
 
Update:

It's happened again. This time it was a brand-new VM that has never been migrated between nodes since it was created, and its disk is on an iSCSI LVM store on a different physical machine. So it's not NFS, it's not specific to one node, and it has nothing to do with migration.

Alarming: [161196.891623] sda: detected capacity change from 214748364800 to 0

How the *expletive* can that happen?
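For scale, the numbers in that capacity line are bytes. A quick Python conversion shows that the guest kernel saw the entire 200 GiB disk shrink to nothing, right after the `READ CAPACITY failed` messages in the log below, rather than losing a few bad sectors. The 512-byte sector size is an assumption about the virtual disk, and `describe_capacity_change` is just an illustrative helper.

```python
GIB = 1024 ** 3
SECTOR = 512  # assumed logical sector size of the virtual disk


def describe_capacity_change(old_bytes: int, new_bytes: int) -> str:
    """Render a kernel 'detected capacity change' pair in friendlier units."""
    return (
        f"{old_bytes} B = {old_bytes / GIB:g} GiB "
        f"({old_bytes // SECTOR} sectors) -> {new_bytes} B"
    )


print(describe_capacity_change(214748364800, 0))
# prints: 214748364800 B = 200 GiB (419430400 sectors) -> 0 B
```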

This time I logged in before it got rebooted. Here's dmesg from inside the VM:

# dmesg
[161196.880457] Result: hostbyte=0x04 driverbyte=0x00
[161196.880458] sd 0:0:0:0: [sda] CDB:
[161196.880449] cdb[0]=0x2a: 2a 00 00 d8 54 00 00 01 68 00
[161196.880465] end_request: I/O error, dev sda, sector 14177280
[161196.880461] cdb[0]=0x2a: 2a 00 00 d8 56 98 00 01 68 00
[161196.880470] Buffer I/O error on device sda1, logical block 1771904
[161196.880471] end_request: I/O error, dev sda, sector 14177944
[161196.880475] Buffer I/O error on device sda1, logical block 1771987
[161196.880479] Buffer I/O error on device sda1, logical block 1771905
[161196.880480] Buffer I/O error on device sda1, logical block 1771988
[161196.880482] Buffer I/O error on device sda1, logical block 1771989
[161196.880484] Buffer I/O error on device sda1, logical block 1771906
[161196.880486] Buffer I/O error on device sda1, logical block 1771907
[161196.880487] Buffer I/O error on device sda1, logical block 1771990
[161196.880489] Buffer I/O error on device sda1, logical block 1771991
[161196.880490] Buffer I/O error on device sda1, logical block 1771908
[161196.880492] Buffer I/O error on device sda1, logical block 1771909
[161196.880493] Buffer I/O error on device sda1, logical block 1771992
[161196.880495] Buffer I/O error on device sda1, logical block 1771993
[161196.880496] Buffer I/O error on device sda1, logical block 1771910
[161196.880498] Buffer I/O error on device sda1, logical block 1771911
[161196.880499] Buffer I/O error on device sda1, logical block 1771994
[161196.880501] Buffer I/O error on device sda1, logical block 1771995
[161196.880502] Buffer I/O error on device sda1, logical block 1771912
[161196.880504] Buffer I/O error on device sda1, logical block 1771913
[161196.880505] Buffer I/O error on device sda1, logical block 1771996
[161196.880507] Buffer I/O error on device sda1, logical block 1771997
[161196.880509] Buffer I/O error on device sda1, logical block 1771914
[161196.880510] Buffer I/O error on device sda1, logical block 1771915
[161196.880511] Buffer I/O error on device sda1, logical block 1771998
[161196.880513] Buffer I/O error on device sda1, logical block 1771999
[161196.880514] Buffer I/O error on device sda1, logical block 1771916
[161196.880516] Buffer I/O error on device sda1, logical block 1771917
[161196.880517] Buffer I/O error on device sda1, logical block 1772000
[161196.880519] Buffer I/O error on device sda1, logical block 1772001
[161196.880521] Buffer I/O error on device sda1, logical block 1771918
[161196.880522] Buffer I/O error on device sda1, logical block 1771919
[161196.880523] Buffer I/O error on device sda1, logical block 1772002
[161196.880525] Buffer I/O error on device sda1, logical block 1772003
[161196.880526] Buffer I/O error on device sda1, logical block 1771920
[161196.880528] Buffer I/O error on device sda1, logical block 1771921
[161196.880529] Buffer I/O error on device sda1, logical block 1772004
[161196.880531] Buffer I/O error on device sda1, logical block 1772005
[161196.880532] Buffer I/O error on device sda1, logical block 1771922
[161196.880534] Buffer I/O error on device sda1, logical block 1771923
[161196.880535] Buffer I/O error on device sda1, logical block 1772006
[161196.880537] Buffer I/O error on device sda1, logical block 1772007
[161196.880538] Buffer I/O error on device sda1, logical block 1771924
[161196.880539] Buffer I/O error on device sda1, logical block 1771925
[161196.880541] Buffer I/O error on device sda1, logical block 1772008
[161196.880542] Buffer I/O error on device sda1, logical block 1772009
[161196.880544] Buffer I/O error on device sda1, logical block 1771926
[161196.880545] Buffer I/O error on device sda1, logical block 1771927
[161196.880546] Buffer I/O error on device sda1, logical block 1772010
[161196.880548] Buffer I/O error on device sda1, logical block 1772011
[161196.880550] Buffer I/O error on device sda1, logical block 1771928
[161196.880551] Buffer I/O error on device sda1, logical block 1772012
[161196.880553] Buffer I/O error on device sda1, logical block 1772013
[161196.880555] Buffer I/O error on device sda1, logical block 1771929
[161196.880556] Buffer I/O error on device sda1, logical block 1771930
[161196.880557] Buffer I/O error on device sda1, logical block 1772014
[161196.880559] Buffer I/O error on device sda1, logical block 1772015
[161196.880561] Buffer I/O error on device sda1, logical block 1771931
[161196.880562] Buffer I/O error on device sda1, logical block 1771932
[161196.880563] Buffer I/O error on device sda1, logical block 1772016
[161196.880565] Buffer I/O error on device sda1, logical block 1772017
[161196.880566] Buffer I/O error on device sda1, logical block 1771933
[161196.880568] Buffer I/O error on device sda1, logical block 1771934
[161196.880569] Buffer I/O error on device sda1, logical block 1772018
[161196.880571] Buffer I/O error on device sda1, logical block 1772019
[161196.880572] Buffer I/O error on device sda1, logical block 1771935
[161196.880574] Buffer I/O error on device sda1, logical block 1771936
[161196.880575] Buffer I/O error on device sda1, logical block 1772020
[161196.880577] Buffer I/O error on device sda1, logical block 1772021
[161196.880578] Buffer I/O error on device sda1, logical block 1771937
[161196.880580] Buffer I/O error on device sda1, logical block 1771938
[161196.880581] Buffer I/O error on device sda1, logical block 1772022
[161196.880582] Buffer I/O error on device sda1, logical block 1772023
[161196.880583] Buffer I/O error on device sda1, logical block 1771939
[161196.880585] Buffer I/O error on device sda1, logical block 1771940
[161196.880586] Buffer I/O error on device sda1, logical block 1772024
[161196.880588] Buffer I/O error on device sda1, logical block 1772025
[161196.880589] Buffer I/O error on device sda1, logical block 1771941
[161196.880591] Buffer I/O error on device sda1, logical block 1771942
[161196.880592] Buffer I/O error on device sda1, logical block 1772026
[161196.880594] Buffer I/O error on device sda1, logical block 1772027
[161196.880595] Buffer I/O error on device sda1, logical block 1771943
[161196.880597] Buffer I/O error on device sda1, logical block 1771944
[161196.880597] Buffer I/O error on device sda1, logical block 1772028
[161196.880599] Buffer I/O error on device sda1, logical block 1772029
[161196.880601] Buffer I/O error on device sda1, logical block 1771945
[161196.880602] Buffer I/O error on device sda1, logical block 1772030
[161196.880603] Buffer I/O error on device sda1, logical block 1772031
[161196.880605] Buffer I/O error on device sda1, logical block 1771946
[161196.880606] Buffer I/O error on device sda1, logical block 1771947
[161196.880610] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 470301 (offset 0 size 184320 starting block 1772288)
[161196.880613] Buffer I/O error on device sda1, logical block 1771948
[161196.880616] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 426863 (offset 0 size 184320 starting block 1772205)
[161196.880637] sd 0:0:0:0: [sda] Unhandled error code
[161196.880638] sd 0:0:0:0: [sda]
[161196.880639] Result: hostbyte=0x04 driverbyte=0x00
[161196.880641] sd 0:0:0:0: [sda] CDB:
[161196.880641] cdb[0]=0x2a: 2a 00 00 44 08 00 00 00 08 00
[161196.880647] end_request: I/O error, dev sda, sector 4458496
[161196.880649] Buffer I/O error on device sda1, logical block 557056
[161196.880650] lost page write due to I/O error on sda1
[161196.880680] JBD2: Error -5 detected when updating journal superblock for sda1-8.
[161196.880687] sd 0:0:0:0: [sda] Unhandled error code
[161196.880689] sd 0:0:0:0: [sda]
[161196.880690] Result: hostbyte=0x04 driverbyte=0x00
[161196.880692] sd 0:0:0:0: [sda] CDB:
[161196.880692] cdb[0]=0x2a: 2a 00 02 a0 5a b8 00 00 10 00
[161196.880699] end_request: I/O error, dev sda, sector 44063416
[161196.880701] Buffer I/O error on device sda1, logical block 5507671
[161196.880704] Buffer I/O error on device sda1, logical block 5507672
[161196.880709] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 397008 (offset 2453504 size 8192 starting block 5507929)
[161196.880740] sd 0:0:0:0: [sda] Unhandled error code
[161196.880742] sd 0:0:0:0: [sda]
[161196.880743] Result: hostbyte=0x04 driverbyte=0x00
[161196.880744] sd 0:0:0:0: [sda] CDB:
[161196.880745] cdb[0]=0x2a: 2a 00 02 a0 0a 88 00 00 18 00
[161196.880750] end_request: I/O error, dev sda, sector 44042888
[161196.880752] Buffer I/O error on device sda1, logical block 5505105
[161196.880756] Buffer I/O error on device sda1, logical block 5505106
[161196.880758] Buffer I/O error on device sda1, logical block 5505107
[161196.880760] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.880762] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 426866 (offset 2428928 size 12288 starting block 5505364)
[161196.880769] sd 0:0:0:0: [sda] Unhandled error code
[161196.880770] sd 0:0:0:0: [sda]
[161196.880771] Result: hostbyte=0x04 driverbyte=0x00
[161196.880773] sd 0:0:0:0: [sda] CDB:
[161196.880773] cdb[0]=0x2a: 2a 00 02 d0 a1 c8 00 00 10 00
[161196.880779] end_request: I/O error, dev sda, sector 47227336
[161196.880781] Buffer I/O error on device sda1, logical block 5903161
[161196.880784] Buffer I/O error on device sda1, logical block 5903162
[161196.880811] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 2229574 (offset 3379200 size 8192 starting block 5903419)
[161196.880815] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.880820] sd 0:0:0:0: [sda] Unhandled error code
[161196.880823] sd 0:0:0:0: [sda]
[161196.880824] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.880827] Result: hostbyte=0x04 driverbyte=0x00
[161196.880829] sd 0:0:0:0: [sda] CDB:
[161196.880830] cdb[0]=0x2a: 2a 00 02 d4 17 30 00 00 10 00
[161196.880838] end_request: I/O error, dev sda, sector 47454000
[161196.880842] Buffer I/O error on device sda1, logical block 5931494
[161196.880845] Buffer I/O error on device sda1, logical block 5931495
[161196.880848] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 468554 (offset 1990656 size 8192 starting block 5931752)
[161196.880943] sd 0:0:0:0: [sda] Unhandled error code
[161196.880946] sd 0:0:0:0: [sda]
[161196.880947] Result: hostbyte=0x04 driverbyte=0x00
[161196.880949] sd 0:0:0:0: [sda] CDB:
[161196.880950] cdb[0]=0x2a: 2a 00 00 00 08 00 00 00 08 00
[161196.880957] end_request: I/O error, dev sda, sector 2048
[161196.880959] Buffer I/O error on device sda1, logical block 0
[161196.880960] lost page write due to I/O error on sda1
[161196.881008] sd 0:0:0:0: [sda] Unhandled error code
[161196.881010] sd 0:0:0:0: [sda]
[161196.881012] Result: hostbyte=0x04 driverbyte=0x00
[161196.881013] sd 0:0:0:0: [sda] CDB:
[161196.881014] cdb[0]=0x2a: 2a 00 02 58 2d 10 00 00 18 00
[161196.881029] Buffer I/O error on device sda1, logical block 4916386
[161196.881033] Buffer I/O error on device sda1, logical block 4916387
[161196.881036] Buffer I/O error on device sda1, logical block 4916388
[161196.881037] sd 0:0:0:0: [sda] READ CAPACITY(16) failed
[161196.881039] sd 0:0:0:0: [sda]
[161196.881041] Result: hostbyte=0x04 driverbyte=0x00
[161196.881042] sd 0:0:0:0: [sda] Sense not available.
[161196.881046] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 919617 (offset 1712128 size 12288 starting block 4916645)
[161196.881057] sd 0:0:0:0: [sda] Unhandled error code
[161196.881059] sd 0:0:0:0: [sda]
[161196.881060] Result: hostbyte=0x04 driverbyte=0x00
[161196.881063] sd 0:0:0:0: [sda] CDB:
[161196.881064] cdb[0]=0x2a: 2a 00 02 58 1e 60 00 00 08 00
[161196.881072] Buffer I/O error on device sda1, logical block 4915916
[161196.881076] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 2769139 (offset 114688 size 4096 starting block 4916173)
[161196.881105] sd 0:0:0:0: [sda] Unhandled error code
[161196.881107] sd 0:0:0:0: [sda]
[161196.881108] Result: hostbyte=0x04 driverbyte=0x00
[161196.881109] sd 0:0:0:0: [sda] CDB:
[161196.881110] cdb[0]=0x2a: 2a 00 02 94 11 60 00 00 08 00
[161196.881116] Buffer I/O error on device sda1, logical block 5407020
[161196.881121] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 408034 (offset 585728 size 4096 starting block 5407277)
[161196.881149] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.881152] ------------[ cut here ]------------
[161196.881187] WARNING: at fs/buffer.c:1118 mark_buffer_dirty+0x2a/0x8b()
[161196.881189] sd 0:0:0:0: [sda] Unhandled error code
[161196.881191] sd 0:0:0:0: [sda]
[161196.881193] Hardware name: Standard PC (i440FX + PIIX, 1996)
[161196.881197] Result: hostbyte=0x04 driverbyte=0x00
[161196.881198] sd 0:0:0:0: [sda] CDB:
[161196.881201] Modules linked in: ipt_REJECT xt_tcpudp
[161196.881201] cdb[0]=0x2a: 2a 00
[161196.881205] nf_conntrack_ipv4
[161196.881205] 00
[161196.881206] 00 08 00 00 00
[161196.881210] nf_defrag_ipv4
[161196.881212] xt_recent ipv6
[161196.881214] 08 00
[161196.881217] Buffer I/O error on device sda1, logical block 0
[161196.881219] lost page write due to I/O error on sda1
[161196.881215] xt_conntrack nf_conntrack iptable_filter ip_tables x_tables processor intel_agp intel_gtt i2c_piix4 thermal_sys button joydev pcspkr floppy i2c_core microcode container xts gf128mul aes_x86_64 cbc sha256_generic libiscsi scsi_transport_iscsi tg3 libphy e1000 fuse nfs lockd sunrpc jfs multipath linear raid10 raid456 async_pq
[161196.881257] sd 0:0:0:0: [sda] Unhandled error code
[161196.881258] sd 0:0:0:0: [sda]
[161196.881259] Result: hostbyte=0x04 driverbyte=0x00
[161196.881261] sd 0:0:0:0: [sda] CDB:
[161196.881262] cdb[0]=0x2a: 2a 00 05 5c 64 78 00 00 08 00
[161196.881271] Buffer I/O error on device sda1, logical block 11242383
[161196.881283] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 2769144 (offset 880640 size 4096 starting block 11242640)
[161196.881288] EXT4-fs (sda1): I/O error while writing superblock
[161196.881299] sd 0:0:0:0: [sda] Unhandled error code
[161196.881301] sd 0:0:0:0: [sda]
[161196.881302] Result: hostbyte=0x04 driverbyte=0x00
[161196.881303] sd 0:0:0:0: [sda] CDB:
[161196.881304] cdb[0]=0x2a: 2a 00 00 cb f4 f0 00 00 08 00
[161196.881310] Buffer I/O error on device sda1, logical block 1670558
[161196.881316] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 426870 (offset 536576 size 4096 starting block 1670815)
[161196.881329] sd 0:0:0:0: [sda] Unhandled error code
[161196.881330] sd 0:0:0:0: [sda]
[161196.881331] Result: hostbyte=0x04 driverbyte=0x00
[161196.881333] sd 0:0:0:0: [sda] CDB:
[161196.881333] cdb[0]=0x2a: 2a 00 01 c4 5a 70 00 00 08 00
[161196.881340] Buffer I/O error on device sda1, logical block 3705422
[161196.881343] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 920065 (offset 0 size 4096 starting block 3705679)
[161196.881344] sd 0:0:0:0: [sda] Unhandled error code
[161196.881346] sd 0:0:0:0: [sda]
[161196.881349] Result: hostbyte=0x04 driverbyte=0x00
[161196.881351] sd 0:0:0:0: [sda] CDB:
[161196.881352] async_xor xor
[161196.881352] cdb[0]=0x2a: 2a
[161196.881354] async_memcpy
[161196.881355] async_raid6_recov
[161196.881357] raid6_pq
[161196.881358] 00 00
[161196.881360] async_tx
[161196.881360] 00
[161196.881361] 08 00 00 00
[161196.881364] raid1
[161196.881365] 08 00
[161196.881367] raid0

[161196.881370] Buffer I/O error on device sda1, logical block 0
[161196.881373] lost page write due to I/O error on sda1
[161196.881371] dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd usbhid ohci_hcd uhci_hcd usb_storage ehci_hcd
[161196.881401] sd 0:0:0:0: [sda] Unhandled error code
[161196.881403] EXT4-fs (sda1): previous I/O error to superblock detected
[161196.881405] sd 0:0:0:0: [sda]
[161196.881406] Result: hostbyte=0x04 driverbyte=0x00
[161196.881407] sd 0:0:0:0: [sda] CDB:
[161196.881408] cdb[0]=0x2a: 2a 00 04 cc 45 58 00 00 08 00
[161196.881415] Buffer I/O error on device sda1, logical block 10061739
[161196.881420] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 2229577 (offset 438272 size 4096 starting block 10061996)
[161196.881432] sd 0:0:0:0: [sda] Unhandled error code
[161196.881433] sd 0:0:0:0: [sda]
[161196.881434] Result: hostbyte=0x04 driverbyte=0x00
[161196.881436] sd 0:0:0:0: [sda] CDB:
[161196.881436] cdb[0]=0x2a: 2a 00 04 cc 26 a0 00 00 08 00
[161196.881443] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.881445] Buffer I/O error on device sda1, logical block 10060756
[161196.881449] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 918775 (offset 282624 size 4096 starting block 10061013)
[161196.881454] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4540: Journal has aborted
[161196.881487] sd 0:0:0:0: [sda] Unhandled error code
[161196.881490] sd 0:0:0:0: [sda]
[161196.881492] Result: hostbyte=0x04 driverbyte=0x00
[161196.881493] sd 0:0:0:0: [sda] CDB:
[161196.881479] usbcore usb_common aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960
[161196.881496] cdb[0]=0x2a
[161196.881497] :
[161196.881498] 2a
[161196.881498] 00
[161196.881499] 00
[161196.881500] 00
[161196.881501] 08
[161196.881501] 00
[161196.881502] 00
[161196.881503] 00
[161196.881504] 08
[161196.881504] 00

[161196.881506] cciss
[161196.881507] Buffer I/O error on device sda1, logical block 0
[161196.881509] lost page write due to I/O error on sda1
[161196.881510] 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma
[161196.881539] sd 0:0:0:0: [sda] Unhandled error code
[161196.881541] sd 0:0:0:0: [sda]
[161196.881542] Result: hostbyte=0x04 driverbyte=0x00
[161196.881543] sd 0:0:0:0: [sda] CDB:
[161196.881544] cdb[0]=0x2a: 2a 00 01 e0 08 f0 00 00 08 00
[161196.881550] Buffer I/O error on device sda1, logical block 3932190
[161196.881561] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 922057 (offset 119054336 size 4096 starting block 3932447)
[161196.881562] sd 0:0:0:0: [sda] READ CAPACITY failed
[161196.881564] sd 0:0:0:0: [sda]
[161196.881565] Result: hostbyte=0x04 driverbyte=0x00
[161196.881566] sd 0:0:0:0: [sda] Sense not available.
[161196.881573] sd 0:0:0:0: [sda] Unhandled error code
[161196.881574] sd 0:0:0:0: [sda]
[161196.881575] Result: hostbyte=0x04 driverbyte=0x00
[161196.881577] sd 0:0:0:0: [sda] CDB:
[161196.881577] cdb[0]=0x2a: 2a 00 04 cc 9f 98 00 00 08 00
[161196.881583] sd 0:0:0:0: [sda] Unhandled error code
[161196.881584] Buffer I/O error on device sda1, logical block 10064627
[161196.881586] sd 0:0:0:0: [sda]
[161196.881588] EXT4-fs warning (device sda1): ext4_end_bio:319: I/O error writing to inode 2508266 (offset 569344 size 4096 starting block 10064884)
[161196.881590] Result: hostbyte=0x04 driverbyte=0x00
[161196.881591] sd 0:0:0:0: [sda] CDB:
[161196.881592] sata_inic162x sata_mv ata_piix
[161196.881595] cdb[0]=0x2a
[161196.881596] :
[161196.881597] 2a
[161196.881598] 00
[161196.881598] 00
[161196.881599] 00
[161196.881600] 08
[161196.881601] 00
[161196.881601] 00
[161196.881602] 00
[161196.881603] 08
[161196.881603] 00

[161196.881606] Buffer I/O error on device sda1, logical block 0
[161196.881606] ahci
[161196.881610] lost page write due to I/O error on sda1
[161196.881625] EXT4-fs (sda1): previous I/O error to superblock detected
[161196.881641] sd 0:0:0:0: [sda] Unhandled error code
[161196.881643] sd 0:0:0:0: [sda]
[161196.881647] Result: hostbyte=0x04 driverbyte=0x00
[161196.881648] sd 0:0:0:0: [sda] CDB:
[161196.881607] libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys
[161196.881649] cdb[0]=0x2a
[161196.881650] :
[161196.881651] 2a
[161196.881652] 00
[161196.881652] 00
[161196.881653] 00
[161196.881654] 08
[161196.881655] 00
[161196.881655] 00
[161196.881656] 00
[161196.881657] 08
[161196.881658] 00

[161196.881660] Buffer I/O error on device sda1, logical block 0
[161196.881660] pata_pdc2027x
[161196.881663] pata_mpiix libata
[161196.881664] lost page write due to I/O error on sda1
[161196.881676] Pid: 17279, comm: postgres Not tainted 3.7.10-gentoo #1
[161196.881677] EXT4-fs (sda1): I/O error while writing superblock
[161196.881679] Call Trace:
[161196.881699] [<ffffffff8102c03a>] warn_slowpath_common+0x7e/0x96
[161196.881702] [<ffffffff8102c067>] warn_slowpath_null+0x15/0x17
[161196.881704] [<ffffffff810dd2d1>] mark_buffer_dirty+0x2a/0x8b
[161196.881712] [<ffffffff81167576>] ext4_commit_super+0x18c/0x1e0
[161196.881713] sd 0:0:0:0: [sda] Unhandled error code
[161196.881715] sd 0:0:0:0: [sda]
[161196.881716] Result: hostbyte=0x04 driverbyte=0x00
[161196.881718] sd 0:0:0:0: [sda] CDB:
[161196.881723] [<ffffffff811676fa>] save_error_info+0x1c/0x21
[161196.881720] cdb[0]=0x2a: 2a

[161196.881726] [<ffffffff8116820a>] __ext4_std_error+0x6e/0x81

[161196.881730] [<ffffffff811748bc>] ? __ext4_journal_get_write_access+0x34/0x5f

[161196.881738] [<ffffffff8115d496>] ext4_reserve_inode_write+0x75/0x83

[161196.881742] [<ffffffff8115d4cd>] ext4_mark_inode_dirty+0x29/0x181

[161196.881746] [<ffffffff8115e0b0>] ext4_da_write_end+0x1fc/0x2de

[161196.881752] [<ffffffff81189cf8>] ? jbd2_journal_stop+0x20b/0x21d

[161196.881765] [<ffffffff8108e411>] generic_file_buffered_write+0x186/0x258

[161196.881768] [<ffffffff8115f20f>] ? ext4_dirty_inode+0x42/0x47

[161196.881770] 00
[161196.881777] [<ffffffff8108f09e>] __generic_file_aio_write+0x2a7/0x2d7
[161196.881771] 00 00 08 00

[161196.881781] [<ffffffff8108f13f>] generic_file_aio_write+0x71/0xda

[161196.881784] [<ffffffff81157093>] ext4_file_write+0x3b6/0x403

[161196.881796] [<ffffffff810c9dd2>] ? core_sys_select+0x1de/0x289

[161196.881802] [<ffffffff810bae90>] do_sync_write+0x8b/0xcb

[161196.881807] [<ffffffff810bb39d>] vfs_write+0xa9/0x119

[161196.881811] [<ffffffff810bb5e1>] sys_write+0x4b/0x73

[161196.881840] [<ffffffff814655d2>] system_call_fastpath+0x16/0x1b
[161196.881842] 00
[161196.881844] ---[ end trace f5f4cb5c73688881 ]---
[161196.881845] 00 08 00
[161196.881848] Buffer I/O error on device sda1, logical block 0
[161196.881849] lost page write due to I/O error on sda1
[161196.881863] EXT4-fs error (device sda1): ext4_journal_start_sb:349: Detected aborted journal
[161196.881866] EXT4-fs (sda1): Remounting filesystem read-only
[161196.881867] EXT4-fs (sda1): previous I/O error to superblock detected
[161196.881885] sd 0:0:0:0: [sda] Unhandled error code
[161196.881887] sd 0:0:0:0: [sda]
[161196.881888] Result: hostbyte=0x04 driverbyte=0x00
[161196.881889] sd 0:0:0:0: [sda] CDB:
[161196.881890] cdb[0]=0x2a: 2a 00 00 00 08 00 00 00 08 00
[161196.881896] Buffer I/O error on device sda1, logical block 0
[161196.881897] lost page write due to I/O error on sda1
[161196.881934] EXT4-fs (sda1): I/O error while writing superblock
[161196.882618] EXT4-fs error (device sda1): ext4_journal_start_sb:349: Detected aborted journal
[161196.883258] EXT4-fs error (device sda1): ext4_journal_start_sb:349: Detected aborted journal
[161196.883452] EXT4-fs error (device sda1): ext4_journal_start_sb:349: Detected aborted journal
[161196.885179] EXT4-fs error (device sda1): ext4_journal_start_sb:349: Detected aborted journal
[161196.891494] sd 0:0:0:0: [sda] Unhandled error code
[161196.891497] sd 0:0:0:0: [sda]
[161196.891498] Result: hostbyte=0x04 driverbyte=0x00
[161196.891500] sd 0:0:0:0: [sda] CDB:
[161196.891501] cdb[0]=0x2a: 2a 00 02 d0 a1 d0 00 00 08 00
[161196.891510] Buffer I/O error on device sda1, logical block 5903162
[161196.891511] lost page write due to I/O error on sda1
[161196.891532] sd 0:0:0:0: [sda] Unhandled error code
[161196.891534] sd 0:0:0:0: [sda]
[161196.891535] Result: hostbyte=0x04 driverbyte=0x00
[161196.891536] sd 0:0:0:0: [sda] CDB:
[161196.891537] cdb[0]=0x2a: 2a 00 02 94 11 60 00 00 08 00
[161196.891547] sd 0:0:0:0: [sda] Unhandled error code
[161196.891549] sd 0:0:0:0: [sda]
[161196.891550] Result: hostbyte=0x04 driverbyte=0x00
[161196.891552] sd 0:0:0:0: [sda] CDB:
[161196.891552] cdb[0]=0x2a: 2a 00 02 a0 0a 98 00 00 08 00
[161196.891563] sd 0:0:0:0: [sda] Unhandled error code
[161196.891564] sd 0:0:0:0: [sda]
[161196.891565] Result: hostbyte=0x04 driverbyte=0x00
[161196.891567] sd 0:0:0:0: [sda] CDB:
[161196.891567] cdb[0]=0x2a: 2a 00 02 a0 5a c0 00 00 08 00
[161196.891578] sd 0:0:0:0: [sda] Unhandled error code
[161196.891579] sd 0:0:0:0: [sda]
[161196.891580] Result: hostbyte=0x04 driverbyte=0x00
[161196.891582] sd 0:0:0:0: [sda] CDB:
[161196.891582] cdb[0]=0x2a: 2a 00 05 5c 64 78 00 00 08 00
[161196.891592] sd 0:0:0:0: [sda] Unhandled error code
[161196.891593] sd 0:0:0:0: [sda]
[161196.891594] Result: hostbyte=0x04 driverbyte=0x00
[161196.891595] sd 0:0:0:0: [sda] CDB:
[161196.891596] cdb[0]=0x2a: 2a 00 00 cb f4 f0 00 00 08 00
[161196.891615] sd 0:0:0:0: [sda] Got wrong page
[161196.891617] sd 0:0:0:0: [sda] Assuming drive cache: write through
[161196.891623] sda: detected capacity change from 214748364800 to 0
[161196.891686] JBD2: Detected IO errors while flushing file data on sda1-8
[175786.351379] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #6837791: comm apache2: reading directory lblock 0
[175786.351666] EXT4-fs (sda1): previous I/O error to superblock detected
[175786.351698] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #6837791: comm apache2: reading directory lblock 0
[178246.280928] EXT4-fs (sda1): previous I/O error to superblock detected
[178246.280950] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #4989785: comm courier-imapd: reading directory lblock 0
[178246.280992] EXT4-fs (sda1): previous I/O error to superblock detected
[178246.281003] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #4989785: comm courier-imapd: reading directory lblock 0
[178387.758502] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.758524] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2894853: comm courier-imapd: reading directory lblock 0
[178387.758636] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.758648] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2894853: comm courier-imapd: reading directory lblock 0
[178387.758906] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.758928] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2769150: comm courier-imapd: reading directory lblock 0
[178387.759016] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.759035] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2769150: comm courier-imapd: reading directory lblock 0
[178387.759198] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.759218] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #4989785: comm courier-imapd: reading directory lblock 0
[178387.759298] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.759316] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #4989785: comm courier-imapd: reading directory lblock 0
[178387.759480] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.759499] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2632692: comm courier-imapd: reading directory lblock 0
[178387.759595] EXT4-fs (sda1): previous I/O error to superblock detected
[178387.759614] EXT4-fs error (device sda1): ext4_find_entry:1242: inode #2632692: comm courier-imapd: reading directory lblock 0
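For anyone reading the log above: every failing command is `cdb[0]=0x2a`, a SCSI WRITE(10), and each one ends with `hostbyte=0x04`, which the Linux SCSI headers define as `DID_BAD_TARGET`. That combination suggests the guest lost its path to the virtual disk as a whole, rather than the disk reporting bad media. The Python sketch below decodes a WRITE(10) CDB (bytes 2-5 are the big-endian LBA, bytes 7-8 the transfer length) and maps a disk sector to an ext4 block on `sda1`. The partition start of 2048 sectors and the 8 sectors per 4 KiB block are inferred from the log itself (sector 2048 is reported as logical block 0), and all helper names are hypothetical.

```python
from dataclasses import dataclass

# A few Linux SCSI host byte codes (the "hostbyte=" value in the log).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
}

WRITE_10 = 0x2A


@dataclass
class Write10:
    lba: int      # first logical block (sector) on the disk
    length: int   # number of logical blocks transferred


def parse_write10(cdb_hex: str) -> Write10:
    """Decode a WRITE(10) CDB printed as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != WRITE_10:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: LBA
    length = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return Write10(lba, length)


def sector_to_fs_block(sector: int, part_start: int = 2048,
                       sectors_per_block: int = 8) -> int:
    """Map a disk sector to a filesystem block on the partition.

    Defaults are inferred from this log (sda1 at sector 2048, 4 KiB blocks).
    """
    return (sector - part_start) // sectors_per_block


cmd = parse_write10("2a 00 00 d8 54 00 00 01 68 00")
print(cmd, sector_to_fs_block(cmd.lba), HOST_BYTES[0x04])
```

Decoding the first CDB gives a 360-sector write starting at sector 14177280, which maps to block 1771904, the same numbers the kernel printed. The repeated `2a 00 00 00 08 00 00 00 08 00` writes target sector 2048, i.e. block 0, which holds the ext4 primary superblock; those failures line up with the `I/O error while writing superblock` messages.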
 
No one seems to know what the cause of this is. I'm posting my experience in case it helps a Googler down the road.

It has now happened on an NFS share running on top of a hardware RAID, so MD is not to blame either. I now suspect a kernel or fsutil bug inside the client VMs, as they were all installed from the same image.

It is very odd, however, that these VMs ran for a very long time without issue and the problems started as soon as I added the 3.4 machine to the 3.3 cluster. If any devs have insight, it'd be appreciated, as my level of trust in my cluster is very low and it's not a nice feeling.

I'm going to update everything inside the clients over the next while and see if it helps.
 
