Hi,
I am interested in how Proxmox handles a storage outage. In our test case the storage is a networked HA NFSv4 master/slave pair (pacemaker setup).
So, the facts:
1] master server reboots at 15:28:28
...pacemaker does its thing...
2] the NFS server starts on the new master at 15:28:36 (pacemaker log below)
3] rpc.mountd sees the incoming client connections at 15:28:41 (rpc.mountd log below)
4] but pvestatd recovers only after a big delay (pvestatd log below)
Pacemaker log on the new master (stor-01):
Apr 21 15:28:36 stor-01 crmd[1641]: notice: Result of start operation for nfs-kernel-server on stor-01: 0 (ok)
Apr 21 15:28:36 stor-01 crmd[1641]: notice: Transition 1 (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-678.bz2): Complete
Apr 21 15:28:36 stor-01 crmd[1641]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
rpc.mountd log on the new master:
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: inbuf 'nfsd IP_A'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: client 0x556238425940 '*,IP_A/24'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: nfsd_fh: inbuf '*,IP_A/24 1 \x00000000'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: nfsd_fh: found 0x556238423bc0 path /srv/nfs
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: from_local: updating local if addr list
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: from_local: checked 22 local if addrs; incoming address not found
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_2_IP ALLOWED
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_2_IP.
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: inbuf 'nfsd PROXMOX_2_IP'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: client 0x556238418fe0 '*,PROXMOX_2_IP/24'
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: from_local: updating local if addr list
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: from_local: checked 22 local if addrs; incoming address not found
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_1_IP ALLOWED
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_1_IP.
Apr 21 15:28:43 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_2_IP ALLOWED (cached)
Apr 21 15:28:43 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_2_IP.
Apr 21 15:28:44 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_1_IP ALLOWED (cached)
pvestatd log on one of the PVE nodes (pve-02):
Apr 21 15:28:33 pve-02 pvestatd[1972]: storage 'nfs_proxmox_vmstore3' is not online
Apr 21 15:28:35 pve-02 pvestatd[1972]: storage 'nfs_proxmox_vmstore2' is not online
Apr 21 15:28:43 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:45 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:53 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:55 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:03 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:06 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:13 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:15 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:23 pve-02 pvestatd[1972]: got timeout
Apr 21 15:30:06 pve-02 pvestatd[1972]: status update time (45.463 seconds)
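For reference, the recovery can be watched from the PVE side with something like this simple loop (pvesm status is the stock storage CLI; the storage IDs and the 2 s interval are just ours):

# poll the storage status every 2 seconds during the failover test
while true; do
    date '+%H:%M:%S'
    pvesm status | grep -E 'nfs_proxmox_vmstore[23]'
    sleep 2
done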
Storage definition and the mount options in use:
nfs: nfs_proxmox_vmstore3
export /export3/proxmox_vmstore3
path /mnt/pve/nfs_proxmox_vmstore3
server NFS_IP
content backup,iso,images
maxfiles 0
nodes pve-02,pve-01
options vers=4.0,minorversion=2,rw,relatime,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,local_lock=none
So from the master reboot at 15:28:28 to the pvestatd status update at 15:30:06 is roughly 98 seconds, and measured from the nfsd start at 15:28:36 it is exactly 90 seconds, which matches the default NFSv4 grace/recovery period. We can lower the grace/lease time on the server (a sketch of the knobs is below), but what values make sense for Proxmox {4,5}?
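For context, the knobs I mean are the NFSv4 lease and grace times on the server. A sketch of how they could be lowered; the /proc values have to be written before nfsd starts (e.g. from the pacemaker resource agent), and 15 s is only an illustration, not a recommendation:

# on the NFS servers, before nfs-kernel-server / nfsd is started
echo 15 > /proc/fs/nfsd/nfsv4leasetime    # default: 90 seconds
echo 15 > /proc/fs/nfsd/nfsv4gracetime    # default: 90 seconds
# newer nfs-utils can also pass these via rpc.nfsd --lease-time / --grace-time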
Questions (they really apply to any type of network storage):
1] Does Proxmox {4,5} pause VMs if the network storage is unreachable for some time (10+ s etc.)? I want to avoid VMs ending up with read-only disks inside even with a short failover time, especially for very write-heavy VMs (see the guest-side note after the questions).
2] Is there any difference if the VM is managed by HA?
3] If we tune down the grace/lease time, are there any risks with respect to 1] and 2]?
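For context on 1]: the failure mode I want to avoid is the guest's block layer timing out during the failover and ext4 then remounting its filesystems read-only. One guest-side mitigation I am aware of is raising the SCSI disk timeout inside the guest so it outlasts the grace period; a sketch, assuming SCSI/virtio-scsi disks (sdX) in the guest, since virtio-blk (vdX) has no such knob as far as I know:

# inside the Linux guest, per disk; sda and 180 s are just example values (default is usually 30 s)
echo 180 > /sys/block/sda/device/timeout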
Thanks.