How does Proxmox handle a storage outage?

czechsys

Hi,

I would like to know how Proxmox handles a storage outage. Our test case is a network HA NFSv4 master/slave setup (managed by pacemaker).

The facts:
1] master server rebooted at 15:28:28
...pacemaker does its thing...
2] NFS server started on the new master at 15:28:36
Apr 21 15:28:36 stor-01 crmd[1641]: notice: Result of start operation for nfs-kernel-server on stor-01: 0 (ok)
Apr 21 15:28:36 stor-01 crmd[1641]: notice: Transition 1 (Complete=14, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-678.bz2): Complete
Apr 21 15:28:36 stor-01 crmd[1641]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
3] rpc.mountd detects incoming connections at 15:28:41
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: inbuf 'nfsd IP_A'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: client 0x556238425940 '*,IP_A/24'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: nfsd_fh: inbuf '*,IP_A/24 1 \x00000000'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: nfsd_fh: found 0x556238423bc0 path /srv/nfs
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: from_local: updating local if addr list
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: from_local: checked 22 local if addrs; incoming address not found
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_2_IP ALLOWED
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_2_IP.
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: inbuf 'nfsd PROXMOX_2_IP'
Apr 21 15:28:41 stor-01 rpc.mountd[3675]: auth_unix_ip: client 0x556238418fe0 '*,PROXMOX_2_IP/24'
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: from_local: updating local if addr list
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: from_local: checked 22 local if addrs; incoming address not found
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_1_IP ALLOWED
Apr 21 15:28:42 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_1_IP.
Apr 21 15:28:43 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_2_IP ALLOWED (cached)
Apr 21 15:28:43 stor-01 rpc.mountd[3675]: Received EXPORT request from PROXMOX_2_IP.
Apr 21 15:28:44 stor-01 rpc.mountd[3675]: check_default: access by PROXMOX_1_IP ALLOWED (cached)
4] but pvestatd on the Proxmox node shows a big delay
Apr 21 15:28:33 pve-02 pvestatd[1972]: storage 'nfs_proxmox_vmstore3' is not online
Apr 21 15:28:35 pve-02 pvestatd[1972]: storage 'nfs_proxmox_vmstore2' is not online
Apr 21 15:28:43 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:45 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:53 pve-02 pvestatd[1972]: got timeout
Apr 21 15:28:55 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:03 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:06 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:13 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:15 pve-02 pvestatd[1972]: got timeout
Apr 21 15:29:23 pve-02 pvestatd[1972]: got timeout
Apr 21 15:30:06 pve-02 pvestatd[1972]: status update time (45.463 seconds)

nfs: nfs_proxmox_vmstore3
export /export3/proxmox_vmstore3
path /mnt/pve/nfs_proxmox_vmstore3
server NFS_IP
content backup,iso,images
maxfiles 0
nodes pve-02,pve-01
options vers=4.0,minorversion=2,rw,relatime,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,local_lock=none
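
For reference, the options line above can be picked apart programmatically. This is only a minimal Python sketch; `parse_mount_options` is a hypothetical helper written for this post, not something Proxmox ships. Per nfs(5), `timeo` is given in tenths of a second, and `hard` means the client keeps retrying a request indefinitely instead of returning an error to the application.

```python
# Hypothetical helper (not part of Proxmox): split an NFS mount option
# string into a dict so individual settings are easy to inspect.
def parse_mount_options(options: str) -> dict:
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "rw" or "hard" carry no value; store them as True.
        parsed[key] = value if sep else True
    return parsed


# The options line from the storage definition above.
OPTIONS = (
    "vers=4.0,minorversion=2,rw,relatime,rsize=1048576,wsize=1048576,"
    "namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,"
    "local_lock=none"
)

opts = parse_mount_options(OPTIONS)

# timeo is expressed in tenths of a second (see nfs(5)).
timeo_seconds = int(opts["timeo"]) / 10

print("hard mount:", opts["hard"])
print("timeo (s):", timeo_seconds)
print("retrans:", opts["retrans"])
```

With `hard` set, a VM's I/O should block rather than fail while the server is away, so the interesting number is how long the server side takes to accept requests again.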

So from 15:28:28 to 15:30:06 there is a delay of about 98 seconds, 90 of them counted from the NFS server start at 15:28:36. That matches the default 90-second NFSv4 grace period. We can change the grace/lease time, but what works best for Proxmox 4 and 5? My questions (they apply to any type of network storage):
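
To double-check the arithmetic on the timestamps from the logs, here is a small Python sketch. The helper name `seconds_between` is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds between two HH:MM:SS timestamps.

    Assumes both timestamps belong to the same day.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Master reboot until pvestatd reports a completed status update.
reboot_to_recovery = seconds_between("15:28:28", "15:30:06")

# NFS server start on the new master until the same pvestatd line.
nfs_start_to_recovery = seconds_between("15:28:36", "15:30:06")

print(reboot_to_recovery)     # 98.0
print(nfs_start_to_recovery)  # 90.0
```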

1] Does Proxmox 4/5 pause VMs when network storage is unavailable for some time (10+ seconds, for example)? I want to avoid VMs ending up with read-only disks after a short failover, especially very write-heavy VMs.
2] Is there any difference if the VM is in HA mode?
3] If we lower the grace/lease time, are there any risks with respect to 1] and 2]?
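
For anyone who wants to check the current values before changing anything: on a Linux NFS server the kernel exposes the grace and lease times under `/proc/fs/nfsd` when the nfsd filesystem is mounted. The following Python sketch only reads them. The function name `read_nfsd_timers` is hypothetical, and it returns `None` for a value when the file is missing (for example on a machine that is not running the NFS server).

```python
from pathlib import Path


def read_nfsd_timers(base: str = "/proc/fs/nfsd") -> dict:
    """Read the NFSv4 grace and lease times (in seconds) from the nfsd filesystem.

    Returns None for a timer whose file does not exist, e.g. when nfsd
    is not loaded on this host.
    """
    timers = {}
    for name in ("nfsv4gracetime", "nfsv4leasetime"):
        path = Path(base) / name
        timers[name] = int(path.read_text().strip()) if path.exists() else None
    return timers


if __name__ == "__main__":
    # Run this on the NFS server (stor-01 in the logs above), not on a PVE node.
    print(read_nfsd_timers())
```

This is read-only on purpose; as far as I know the values can only be changed while nfsd is stopped, so any tuning would have to go into the pacemaker resource start sequence.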

Thanks.
 
That's very weird. When I played with NFS on pacemaker+corosync last year, failover never took more than 15 seconds or so (although that was with VMware, not Proxmox, but still...). I don't know what pvestatd is doing here.
 
You used NFSv4, right? Does VMware use locks? That's the only difference I can think of right now. I can't test this until Monday.
 