Hi guys
We had enterprise SATA disks with HW RAID 10, and we recently moved to ZFS with Crucial MX200 1TB disks.
Now we're getting some weird outages where the server goes down in the early hours of the morning for around 2 hours, and the VMs don't start by themselves afterwards - they all show as stopped, even though I can see a "Start all VMs" entry in the Proxmox logs. We do NFS backups of the VMs in the early hours of the morning, so it's somehow related to that.
So I'm wondering if it's due to a memory shortage, since ZFS eats memory while the NFS backups run.
Note that we already set vm.swappiness to 10 when we set up the servers, so I doubt it's that.
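For reference, setting that would look roughly like this - just a sketch, assuming the standard sysctl mechanism and the stock /etc/sysctl.conf location:

# check the current value
sysctl vm.swappiness

# change it on the running system
sysctl -w vm.swappiness=10

# make it persistent across reboots
echo "vm.swappiness = 10" >> /etc/sysctl.conf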
I dropped the ZFS ARC max from 16GB to 8GB to give the server more memory, and rebooted.
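For what it's worth, the limit as a ZFS module option would look roughly like this - just a sketch, assuming the usual /etc/modprobe.d/zfs.conf route (8GB = 8589934592 bytes):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592

# rebuild the initramfs if the ZFS module is loaded from it, then reboot
update-initramfs -u

# the current ARC size can be checked at runtime with:
grep ^size /proc/spl/kstat/zfs/arcstats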
If this doesn't work, should I rather just move back to good old HW RAID?
Or does anyone have any other ideas?
Also, this is from the logs - it looks like the server crashed and rebooted:
Apr 25 01:16:34 vz-jhb-1 kernel: [619101.686171] nfs: server 10.0.0.200 OK
Apr 25 01:16:34 vz-jhb-1 kernel: [619101.704347] nfs: server 10.0.0.200 OK
Apr 25 01:21:23 vz-jhb-1 qm[30978]: <root@pam> update VM 103: -lock backup
Apr 25 01:21:23 vz-jhb-1 kernel: [619390.822142] device tap103i0 entered promiscuous mode
Apr 25 01:21:23 vz-jhb-1 kernel: [619390.827363] vmbr0: port 12(tap103i0) entered forwarding state
Apr 25 01:21:23 vz-jhb-1 kernel: [619390.827373] vmbr0: port 12(tap103i0) entered forwarding state
Apr 25 01:28:18 vz-jhb-1 kernel: [619806.232573] zd160: p1 p2
Apr 25 01:28:18 vz-jhb-1 kernel: [619806.350661] vmbr0: port 12(tap103i0) entered disabled state
Apr 25 01:28:23 vz-jhb-1 qm[31603]: <root@pam> update VM 104: -lock backup
Apr 25 01:28:23 vz-jhb-1 kernel: [619811.047527] device tap104i0 entered promiscuous mode
Apr 25 01:28:23 vz-jhb-1 kernel: [619811.052351] vmbr0: port 12(tap104i0) entered forwarding state
Apr 25 01:28:23 vz-jhb-1 kernel: [619811.052360] vmbr0: port 12(tap104i0) entered forwarding state
Apr 25 01:32:20 vz-jhb-1 kernel: [620047.772501] zd128: p1 p2
Apr 25 01:32:20 vz-jhb-1 kernel: [620047.905385] vmbr0: port 12(tap104i0) entered disabled state
Apr 25 01:32:23 vz-jhb-1 qm[31975]: <root@pam> update VM 105: -lock backup
Apr 25 01:33:38 vz-jhb-1 qm[32088]: <root@pam> update VM 106: -lock backup
Apr 25 01:35:20 vz-jhb-1 qm[32225]: <root@pam> update VM 107: -lock backup
Apr 25 01:38:22 vz-jhb-1 qm[32476]: <root@pam> update VM 108: -lock backup
Apr 25 01:44:46 vz-jhb-1 qm[599]: <root@pam> update VM 109: -lock backup
Apr 25 01:47:10 vz-jhb-1 qm[799]: <root@pam> update VM 110: -lock backup
Apr 25 03:12:49 vz-jhb-1 qm[8031]: <root@pam> update VM 111: -lock backup
Apr 25 04:42:05 vz-jhb-1 pveupdate[15086]: <root@pam> starting task UPID:vz-jhb-1:00003B00:03C2BF21:571D83FD:aptupdate::root@pam:
Apr 25 04:42:12 vz-jhb-1 pveupdate[15086]: <root@pam> end task UPID:vz-jhb-1:00003B00:03C2BF21:571D83FD:aptupdate::root@pam: OK
Apr 25 04:49:59 vz-jhb-1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="3274" x-info="http://www.rsyslog.com"] start
Apr 25 04:49:59 vz-jhb-1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Apr 25 04:49:59 vz-jhb-1 kernel: [ 0.000000] Initializing cgroup subsys cpu
Apr 25 04:49:59 vz-jhb-1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Apr 25 04:49:59 vz-jhb-1 kernel: [ 0.000000] Linux version 4.2.8-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Fri Feb 26 16:37:36 CET 2016 ()
Also note we're using FreeNAS as the NFS server, which currently seems to work great for the other servers.