I am getting very high IO delay under any kind of load, and it basically stalls all of the VMs. We are a web host with five CloudLinux/cPanel VMs (each with about 100 sites), a Windows 2016/Plesk VM, and a CentOS 7/Sentora VM running. When this happens all sites stop responding, I get kernel:NMI watchdog: BUG: soft lockup messages, daemons die or panic, and we have had a MySQL database become corrupt and need repair. I have been struggling with this for weeks and following the other threads on these forums from the many other users having similar problems.
Very early this morning I updated, hoping the upgrade to PVE 5.1-36 with the 4.13.4-1 kernel would help, but it did not. On boot of the first VM the IO delay instantly shot above 25 and within 30 seconds was hovering between 30 and 40 for the entire time the guest was booting. I have to wait a minimum of 15 minutes between starting each VM to let things settle back down, or it kills the entire system with IO delay hitting between 70 and 90. This is the same as it was prior to the update. I have also had to turn off all backups and system updates, because a single VM starting either of those pushes the IO delay into the 30+ range and basically stalls all VMs. Prior to the update I was on PVE 5.0-32 with kernel 4.10.17-4, and I have been having these issues for at least the last two or three updates and one kernel. Like I mentioned, it has been a few weeks.
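For reference, this is roughly how I have been watching the host while the problem builds up; nothing exotic, just the standard tools (the 5-second interval is arbitrary, and the pool name is from my setup):
Code:
# Per-vdev read/write ops and bandwidth on the guest pool, refreshed every 5 seconds
zpool iostat -v guests-zpool 5
# Per-device utilisation and await times (iostat comes from the sysstat package)
iostat -xm 5
# Processes stuck in uninterruptible IO wait (state D) while everything stalls
ps -eo state,pid,cmd | awk '$1 == "D"'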
Here is my current setup:
Dell PowerEdge R730
CPU: Dual Xeon E5-2630 v4 (10 core, 20 Thread)
RAM: 128GB (4x32 RDIMM, 2400MT/s, 2RX4, 8G, DDR4, R)
Storage: PERC H330 RAID Controller
Slots 0 & 1 in RAID 1 mirror - OS Drives - 128 GB SSD
Slot 2 is a 128GB SSD set up as the cache drive for the ZFS pool (no longer in use; I removed the ZIL and L2ARC while troubleshooting)
Slots 3-6 are 2TB platter drives in RAIDZ configuration (HBA mode on controller)
The OS is installed on the mirrored SSDs. The guest images are on the RAIDZ1 pool as RAW, which I would like to change -- more on that later. Below are some interesting (and hopefully relevant) lines from my last boot dmesg. They should answer questions about the hardware setup and what the system is seeing, and possibly provide some clues.
Code:
[ 0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb000020 (or later)
[ 0.000000] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 0.124043] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.124043] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[ 0.225138] smp: Brought up 2 nodes, 40 CPUs
[ 0.225138] smpboot: Total of 40 processors activated (176003.68 BogoMIPS)
[ 2.377991] scsi 0:0:3:0: Direct-Access ATA ST2000NX0403 NA05 PQ: 0 ANSI: 6
[ 2.441801] scsi 0:0:4:0: Direct-Access ATA ST2000NX0403 NA05 PQ: 0 ANSI: 6
[ 2.477820] scsi 0:0:5:0: Direct-Access ATA ST2000NX0403 NA05 PQ: 0 ANSI: 6
[ 2.513780] scsi 0:0:6:0: Direct-Access ATA ST2000NX0403 NA05 PQ: 0 ANSI: 6
[ 2.548685] scsi 0:2:0:0: Direct-Access DELL PERC H330 Mini 4.27 PQ: 0 ANSI: 5
[ 2.589046] sd 0:0:2:0: Attached scsi generic sg0 type 0
[ 2.589167] sd 0:0:3:0: Attached scsi generic sg1 type 0
[ 2.589286] sd 0:0:4:0: Attached scsi generic sg2 type 0
[ 2.589422] sd 0:0:5:0: Attached scsi generic sg3 type 0
[ 2.589543] sd 0:0:6:0: Attached scsi generic sg4 type 0
[ 2.589659] sd 0:2:0:0: Attached scsi generic sg5 type 0
[ 2.590117] sd 0:2:0:0: [sdf] 233308160 512-byte logical blocks: (119 GB/111 GiB)
[ 2.590240] sd 0:2:0:0: [sdf] Write Protect is off
[ 2.590241] sd 0:2:0:0: [sdf] Mode Sense: 1f 00 10 08
[ 2.590369] sd 0:2:0:0: [sdf] Write cache: disabled, read cache: disabled, supports DPO and FUA
[ 2.592969] sd 0:0:2:0: [sda] 468862128 512-byte logical blocks: (240 GB/224 GiB)
[ 2.593170] sdf: sdf1 sdf2 sdf3
[ 2.593708] sd 0:2:0:0: [sdf] Attached SCSI disk
[ 2.594348] sd 0:0:5:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 2.594392] sd 0:0:3:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 2.594705] sd 0:0:2:0: [sda] Write Protect is off
[ 2.594706] sd 0:0:2:0: [sda] Mode Sense: 9b 00 10 08
[ 2.594816] sd 0:0:4:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 2.594824] sd 0:0:6:0: [sde] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[ 2.595314] sd 0:0:2:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 2.600817] sda: sda1 sda2
[ 2.607987] sd 0:0:2:0: [sda] Attached SCSI disk
[ 2.646535] ata4: SATA link down (SStatus 0 SControl 300)
[ 2.720004] sd 0:0:4:0: [sdc] Write Protect is off
[ 2.720008] sd 0:0:4:0: [sdc] Mode Sense: 9b 00 10 08
[ 2.721243] sd 0:0:4:0: [sdc] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 2.721611] sd 0:0:5:0: [sdd] Write Protect is off
[ 2.721614] sd 0:0:5:0: [sdd] Mode Sense: 9b 00 10 08
[ 2.722888] sd 0:0:5:0: [sdd] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 2.726344] sd 0:0:3:0: [sdb] Write Protect is off
[ 2.726347] sd 0:0:3:0: [sdb] Mode Sense: 9b 00 10 08
[ 2.727628] sd 0:0:3:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 2.729133] sd 0:0:6:0: [sde] Write Protect is off
[ 2.729136] sd 0:0:6:0: [sde] Mode Sense: 9b 00 10 08
[ 2.730367] sd 0:0:6:0: [sde] Write cache: disabled, read cache: enabled, supports DPO and FUA
[ 2.894610] sdd: sdd1 sdd9
[ 2.895920] sdc: sdc1 sdc9
[ 2.897016] sdb: sdb1 sdb9
[ 2.905710] sde: sde1 sde9
[ 2.958925] ata6: SATA link down (SStatus 0 SControl 300)
[ 3.021492] sd 0:0:5:0: [sdd] Attached SCSI disk
[ 3.050236] sd 0:0:4:0: [sdc] Attached SCSI disk
[ 3.056027] sd 0:0:3:0: [sdb] Attached SCSI disk
[ 3.064492] sd 0:0:6:0: [sde] Attached SCSI disk
I have a couple of different goals here, and some of the steps required to meet them may be related or convenient to do at the same time.
Goal 1, and the biggest priority -- fix the performance issue that is killing this production server. I am starting to wonder if I made a mistake not going with VMware.
Goal 2 - the storage problem. After reading how great ZFS is in the documentation and in the forum threads, I decided to go that route. I don't fully regret it, but there are some consequences I was not aware of and would like to fix. The only image option is RAW, which means thick provisioning and wasted space, and PVE thinks I am completely out of drive space even though "zpool list" shows 4.95TB free. I am thinking I might have missed a step and PVE is using the raw RAIDZ pool as a block device. Was there a way to put a file system on it before telling PVE to use the storage? For example, could I have created the RAIDZ and kept all of its advantages for the volume-management side of the picture, but then formatted it EXT4 or something so I could use an image format that allows thin provisioning? How do I get there from here? Can I move my images to an external HD, redo the storage pool, move the images back and reattach them? I assume so, but I have been unable to figure out how to do it.
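From what I have pieced together so far (please correct me if this is wrong), the thick provisioning comes from the refreservation PVE places on each zvol, and the zfspool storage type has a "sparse" option that controls this for newly created disks. Below is a rough sketch of what I am planning to check and try; the storage ID, disk name, and target storage are just examples from my own setup:
Code:
# Show which zvols carry a full reservation (thick) versus what they actually use
zfs get -r -t volume refreservation,volsize,used guests-zpool

# New disks are created thin if the zfspool storage entry in /etc/pve/storage.cfg
# has the sparse flag set, e.g. (IDs are examples):
#   zfspool: guests-zpool
#       pool guests-zpool
#       content images,rootdir
#       sparse 1

# An existing zvol can be switched to thin by dropping its reservation
# (one disk at a time, and only after a verified backup):
zfs set refreservation=none guests-zpool/vm-100-disk-1

# Disks can also be moved between storages from the host, e.g.
#   qm move_disk 100 virtio0 <target-storage>
# which is presumably how I would do the "move to an external drive, rebuild
# the pool, move everything back" shuffle.
If that is right, it would also explain why the GUI says I am out of space while "zpool list" still shows 4.95T free: "zfs list" only shows 156G AVAIL once all of the per-disk reservations are counted, and the zpool figure is raw capacity including parity.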
This server (provided I add the storage space and RAM to keep up) should easily be able to handle 5-10 times this number of VMs without breaking a sweat, and it was purchased with the promise of being able to do that. Something is seriously wrong here. Below is some more information that might be of interest.
Entire dmesg from last boot: pastebin(dot)com/FScr3LUe -- sorry for the obfuscated link, but the forum will not let me post external links.
Code:
root@the-verse:~# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
guests-zpool  7.25T  2.30T  4.95T         -    29%    31%  1.00x  ONLINE  -
Code:
root@the-verse:~# zpool status -v guests-zpool
  pool: guests-zpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 32h25m with 0 errors on Mon Oct 9 08:49:09 2017
config:

        NAME                           STATE     READ WRITE CKSUM
        guests-zpool                   ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     0
            ata-ST2000NX0403_W46069PW  ONLINE       0     0     0
            ata-ST2000NX0403_W4606CZP  ONLINE       0     0     0
            ata-ST2000NX0403_W46069Z8  ONLINE       0     0     0
            ata-ST2000NX0403_W4606DHW  ONLINE       0     0     0

errors: No known data errors
Code:
root@the-verse:~# zfs list
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
guests-zpool                                4.95T   156G   140K  none
guests-zpool/vm-100-disk-1                   792G   695G   253G  -
guests-zpool/vm-101-disk-1                   792G   458G   490G  -
guests-zpool/vm-102-disk-1                   792G   432G   516G  -
guests-zpool/vm-103-disk-1                   823G   772G   197G  -
guests-zpool/vm-104-disk-1                   792G   889G  58.6G  -
guests-zpool/vm-105-disk-1                   542G   611G  79.4G  -
guests-zpool/vm-105-state-Still_Evaluation  8.76G   163G  2.02G  -
guests-zpool/vm-106-disk-1                   318G   401G  59.2G  -
guests-zpool/vm-106-state-Sentora           8.76G   160G  4.14G  -
guests-zpool/vm-107-disk-1                   198G   331G  22.8G  -
Code:
root@the-verse:~# zfs get all guests-zpool
NAME PROPERTY VALUE SOURCE
guests-zpool type filesystem -
guests-zpool creation Fri Jul 28 17:19 2017 -
guests-zpool used 4.95T -
guests-zpool available 156G -
guests-zpool referenced 140K -
guests-zpool compressratio 1.00x -
guests-zpool mounted no -
guests-zpool quota none default
guests-zpool reservation none default
guests-zpool recordsize 128K default
guests-zpool mountpoint none local
guests-zpool sharenfs off default
guests-zpool checksum on default
guests-zpool compression off default
guests-zpool atime off local
guests-zpool devices on default
guests-zpool exec on default
guests-zpool setuid on default
guests-zpool readonly off default
guests-zpool zoned off default
guests-zpool snapdir hidden default
guests-zpool aclinherit restricted default
guests-zpool createtxg 1 -
guests-zpool canmount on default
guests-zpool xattr on default
guests-zpool copies 1 default
guests-zpool version 5 -
guests-zpool utf8only off -
guests-zpool normalization none -
guests-zpool casesensitivity sensitive -
guests-zpool vscan off default
guests-zpool nbmand off default
guests-zpool sharesmb off default
guests-zpool refquota none default
guests-zpool refreservation none default
guests-zpool guid 5417188825227061568 -
guests-zpool primarycache all default
guests-zpool secondarycache all default
guests-zpool usedbysnapshots 0B -
guests-zpool usedbydataset 140K -
guests-zpool usedbychildren 4.95T -
guests-zpool usedbyrefreservation 0B -
guests-zpool logbias latency default
guests-zpool dedup off local
guests-zpool mlslabel none default
guests-zpool sync standard default
guests-zpool dnodesize legacy default
guests-zpool refcompressratio 1.00x -
guests-zpool written 140K -
guests-zpool logicalused 1.14T -
guests-zpool logicalreferenced 40K -
guests-zpool volmode default default
guests-zpool filesystem_limit none default
guests-zpool snapshot_limit none default
guests-zpool filesystem_count none default
guests-zpool snapshot_count none default
guests-zpool snapdev hidden default
guests-zpool acltype off default
guests-zpool context none default
guests-zpool fscontext none default
guests-zpool defcontext none default
guests-zpool rootcontext none default
guests-zpool relatime off default
guests-zpool redundant_metadata all default
guests-zpool overlay off default
I can't post images either (again, apparently users shouldn't be allowed to provide useful information), so here are the screenshots as obfuscated links:
Code:
photos(dot)app(dot)goo(dot)gl/zJuUfJc9d9eDNFMr1
photos(dot)app(dot)goo(dot)gl/fQEV9KW9DLPvGmhu1
photos(dot)app(dot)goo(dot)gl/X9cyIkJk0eLmqIUn1