I've been troubleshooting a serious meltdown of my NAS that began right after I switched from ESXi to Proxmox. Under ESXi my NAS was a VM with the LSI 2308 (in IT mode) passed through, and I never had a problem.
I set up Proxmox with a KVM VM for the NAS, with IOMMU enabled and the controller passed through using "hostpci0: 2:00.0". Everything was working. After about a week on Proxmox, I tried to delete a file on the NAS and it crashed. On reboot, every single hard drive on the controller failed fsck and complained about missing superblocks. Because I didn't know how to deal with this situation properly, I lost 2 of my 6 drives, which I salvaged with photorec after buying another hard drive. The other 4 seemed OK, but as a precaution I re-partitioned and reformatted all drives with ext4 again.
I set the NAS up again and hit a similar event while writing to one of the hard drives. The drives all dropped out of their mounts, and on reboot all of them had filesystem corruption again. In PartedMagic I can access most of the essential files (I have other backups), and I'm copying them out to my desktop in the meantime for easy access.
I updated the BIOS on my motherboard, a Supermicro X10SL7-F, and updated the firmware on the LSI 2308 to the latest version. I ran fsck on all the hard drives and they passed except one, which had a bunch of multiply-claimed blocks: from what I can tell, the files I had been copying to it somehow occupied the same blocks as other files (even though I had wiped the filesystem beforehand). fsck did complete on that drive.
I did some research and changed my VM config from "hostpci0: 2:00.0" to "hostpci0: 2:00.0,pcie=1,driver=vfio" with "machine: q35", as described in the wiki for PCIe passthrough. My card shows up as PCIe, so I thought maybe that was it.
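For reference, these are the two lines in the VM config after that change. This is only a fragment showing the settings quoted above; the rest of the file is omitted and nothing else was changed as far as I know:

```
machine: q35
hostpci0: 2:00.0,pcie=1,driver=vfio
```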
However, after booting I tried to copy some files off the drives and it all fell apart again: the drives disappeared from Samba, and on reboot every drive complains of missing superblocks and requires a manual fsck. I'm scared to run these manual fscks because that is what helped me lose 1 drive earlier.
I'm at my wit's end here, as this goes beyond my knowledge and googling skills, so I'm here to ask for advice. In my searching, this is the only reference I could find: https://peterkieser.com/2013/08/07/...ontroller-to-a-guest-causing-data-corruption/
The timing of this issue (right after switching from ESXi) makes me strongly believe that it is KVM-related. All my HDDs pass SMART tests, and each time the filesystem corruption happens it hits every drive on the controller and none of the other drives.
Is there any way to salvage Proxmox with this, or should I pack it in? Has anyone else had similar issues? Thanks in advance!