I've set up a new standalone Proxmox host for testing purposes and I'm getting some strange timeouts with my NFS shares.
My current setup:
My PVE test server (10.10.250.10) sits at a site that is connected via a site-to-site VPN; the connection is stable with low latency (5-7 ms). My NFS storage (Synology NAS, 10.10.0.220) is located at my other site. I've mounted 4 NFS shares (3 from x.0.220 and 1 from x.0.221): 2x datastores for the VMs (1x HDD, 1x SSD), 1 general share, and 1 backup share to which I back up my production VMs. For testing I started restoring some VMs to my test host (10.10.250.10), from the backup NFS (10.10.0.221) to the datastore (10.10.0.220), and at that point everything started going crazy.
As soon as I start the restore, my NFS paths go offline and various messages appear in the journal. Here is an excerpt from the start of the restore:
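For reference, the NFS storages are defined roughly like this in /etc/pve/storage.cfg (the N2_* storage names appear in the logs below; the export paths, content types, and the backup storage name are assumptions for illustration):

```
nfs: N2_VmStoreHDD_NFS
        server 10.10.0.220
        export /volume1/VmStoreHDD
        path /mnt/pve/N2_VmStoreHDD_NFS
        content images,rootdir

nfs: N2_VmStoreSSD_NFS
        server 10.10.0.220
        export /volume2/VmStoreSSD
        path /mnt/pve/N2_VmStoreSSD_NFS
        content images,rootdir

nfs: N2_Share_NFS
        server 10.10.0.220
        export /volume1/Share
        path /mnt/pve/N2_Share_NFS
        content iso,vztmpl

nfs: N1_Backup_NFS
        server 10.10.0.221
        export /volume1/Backup
        path /mnt/pve/N1_Backup_NFS
        content backup
```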
Code:
Dec 05 14:11:22 pm2 pvedaemon[28810]: <root@pam> successful auth for user 'root@pam'
Dec 05 14:11:58 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:12:58 pm2 pvedaemon[28808]: <root@pam> starting task UPID:pm2:00007AAE:00075CC1:656F21D9:vzrestore:200:root@pam:
Dec 05 14:13:00 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:13:09 pm2 kernel: loop0: detected capacity change from 0 to 41943040
Dec 05 14:13:10 pm2 kernel: EXT4-fs (loop0): mounted filesystem 5fbcecf2-2014-4006-a401-99abe2a1f7d6 r/w with ordered data mode. Quota mode: none.
Dec 05 14:13:13 pm2 kernel: operation not supported error, dev loop0, sector 8488 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:15 pm2 kernel: operation not supported error, dev loop0, sector 12576 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:19 pm2 kernel: operation not supported error, dev loop0, sector 16672 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:21 pm2 kernel: operation not supported error, dev loop0, sector 20768 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:23 pm2 kernel: operation not supported error, dev loop0, sector 24864 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:25 pm2 kernel: operation not supported error, dev loop0, sector 28960 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:33 pm2 kernel: operation not supported error, dev loop0, sector 33056 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:35 pm2 kernel: operation not supported error, dev loop0, sector 37152 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:37 pm2 kernel: operation not supported error, dev loop0, sector 41248 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:39 pm2 kernel: operation not supported error, dev loop0, sector 45344 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:41 pm2 kernel: operation not supported error, dev loop0, sector 49440 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:43 pm2 kernel: operation not supported error, dev loop0, sector 53536 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:51 pm2 kernel: operation not supported error, dev loop0, sector 57632 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:58 pm2 kernel: operation not supported error, dev loop0, sector 61728 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:13:59 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:14:00 pm2 kernel: operation not supported error, dev loop0, sector 65824 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:14:02 pm2 kernel: operation not supported error, dev loop0, sector 69920 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 2
Dec 05 14:14:15 pm2 pvestatd[1343]: got timeout
Dec 05 14:14:17 pm2 pvestatd[1343]: got timeout
Dec 05 14:14:19 pm2 pvestatd[1343]: got timeout
Dec 05 14:14:19 pm2 pvestatd[1343]: status update time (6.377 seconds)
Dec 05 14:14:58 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:16:00 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:16:59 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:17:01 pm2 CRON[32038]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Dec 05 14:17:01 pm2 CRON[32039]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Dec 05 14:17:01 pm2 CRON[32038]: pam_unix(cron:session): session closed for user root
Dec 05 14:17:05 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:17:05 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:17:09 pm2 kernel: INFO: task kworker/u40:1:31234 blocked for more than 120 seconds.
Dec 05 14:17:09 pm2 kernel: Tainted: P O 6.5.11-4-pve #1
Dec 05 14:03:44 pm2 spiceproxy[1401]: server closing
Dec 05 14:03:44 pm2 spiceproxy[1401]: server shutdown (restart)
Dec 05 14:03:44 pm2 systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Server.
Dec 05 14:03:44 pm2 systemd[1]: Reloading pvescheduler.service - Proxmox VE scheduler...
Dec 05 14:03:44 pm2 spiceproxy[1401]: restarting server
Dec 05 14:03:44 pm2 spiceproxy[1401]: starting 1 worker(s)
Dec 05 14:03:44 pm2 spiceproxy[1401]: worker 28817 started
Followed by:
Code:
Dec 05 14:17:13 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:17:13 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:17:28 pm2 pvedaemon[28810]: got timeout
Dec 05 14:17:28 pm2 pvedaemon[28810]: unable to activate storage 'N2_VmStoreSSD_NFS' - directory '/mnt/pve/N2_VmStoreSSD_NFS' does not exist or is unreachable
Dec 05 14:17:29 pm2 nfsidmap[32104]: nss_getpwnam: name 'root@localdomain' does not map into domain 'seclua2.local'
Dec 05 14:17:29 pm2 nfsidmap[32105]: nss_name_to_gid: name 'root@localdomain' does not map into domain 'seclua2.local'
Dec 05 14:17:30 pm2 kernel: nfs: server 10.10.0.220 not responding, timed out
Dec 05 14:17:31 pm2 pvedaemon[28810]: got timeout
Dec 05 14:17:31 pm2 pvedaemon[28810]: unable to activate storage 'N2_Share_NFS' - directory '/mnt/pve/N2_Share_NFS' does not exist or is unreachable
Dec 05 14:17:33 pm2 pvedaemon[28810]: got timeout
Dec 05 14:17:33 pm2 pvedaemon[28810]: unable to activate storage 'N2_VmStoreHDD_NFS' - directory '/mnt/pve/N2_VmStoreHDD_NFS' does not exist or is unreachable
Dec 05 14:17:49 pm2 pvedaemon[28808]: got timeout
Dec 05 14:17:49 pm2 pvedaemon[28808]: unable to activate storage 'N2_VmStoreSSD_NFS' - directory '/mnt/pve/N2_VmStoreSSD_NFS' does not exist or is unreachable
Dec 05 14:17:51 pm2 pvedaemon[28808]: got timeout
Dec 05 14:17:51 pm2 pvedaemon[28808]: unable to activate storage 'N2_Share_NFS' - directory '/mnt/pve/N2_Share_NFS' does not exist or is unreachable
Dec 05 14:17:53 pm2 pvedaemon[28808]: got timeout
Dec 05 14:17:53 pm2 pvedaemon[28808]: unable to activate storage 'N2_VmStoreHDD_NFS' - directory '/mnt/pve/N2_VmStoreHDD_NFS' does not exist or is unreachable
Dec 05 14:17:59 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:17:59 pm2 pvedaemon[28810]: got timeout
Dec 05 14:18:43 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:18:43 pm2 kernel: nfs: server 10.10.0.220 not responding, still trying
Dec 05 14:19:00 pm2 kernel: usb 4-1: reset high-speed USB device number 2 using xhci_hcd
Dec 05 14:19:10 pm2 kernel: INFO: task kworker/u40:1:31234 blocked for more than 241 seconds.
Dec 05 14:19:10 pm2 kernel: Tainted: P O 6.5.11-4-pve #1
Dec 05 14:19:10 pm2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
At first I thought the connection was dropping under a high data load, but I tested with iperf from Site A to Site B and had no problems while saturating the 100 Mbit/s up/download of the remote site. It's also strange that all NFS connections to the destination NAS (10.10.0.220) drop, while the NFS path to my backup NAS (10.10.0.221) has no problems at all (NFS is configured identically on both NAS units).
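For reference, the bandwidth test was along these lines (IPs from the setup above; testing both directions through the VPN):

```shell
# On the NAS side (Site B), start a listener:
iperf3 -s

# On the Proxmox host (Site A), test upload and download through the tunnel:
iperf3 -c 10.10.0.220       # host -> NAS
iperf3 -c 10.10.0.220 -R    # NAS -> host (reverse direction)

# While the restore hangs, the effective NFS mount options
# (hard/soft, timeo, retrans, NFS version) can be checked with:
nfsstat -m
```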
Can anyone give me a hint, or has anyone stumbled upon similar problems?
The full journal log file is attached. And yes, my HDD sda has some faulty sectors, but sda is not in use in any way.