Valid SMB/CIFS timing out and other issues

jaytee129

Member
Jun 16, 2022
132
10
18
I have NAS drive connected to proxmox via SMB/CIFS and can see the NAS drive, copy to it and vice versa from Windows VM, but OS copies and promox backups fail every attempt.

The NAS is connected via VPN running at 15/10 Mbps (up/down). So, yes, there may be occasional latency issues, etc.

Attached are screenshot showing steady read and a steady write from/to NAS drive using a Windows VM, followed by examples here of errors I get when

1) I do OS "cp" from NAS to OS directory:

Code:
Jul 29 09:45:02 thibworldpx3 pvestatd[1516]: got timeout
Jul 29 09:45:02 thibworldpx3 pvestatd[1516]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable
Jul 29 09:45:12 thibworldpx3 pvestatd[1516]: got timeout
Jul 29 09:45:12 thibworldpx3 pvestatd[1516]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable
Jul 29 09:45:22 thibworldpx3 pvestatd[1516]: got timeout
Jul 29 09:45:22 thibworldpx3 pvestatd[1516]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable
Jul 29 09:45:32 thibworldpx3 pvestatd[1516]: got timeout
Jul 29 09:45:32 thibworldpx3 pvestatd[1516]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable


2) while doing backups (goes on like this for hours)

Code:
Jul 28 21:13:24 thibworldpx3 pvestatd[1515]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable
Jul 28 21:14:20 thibworldpx3 pvestatd[1515]: got timeout
Jul 28 21:14:21 thibworldpx3 pvestatd[1515]: status update time (50.580 seconds)
Jul 28 21:14:25 thibworldpx3 pvestatd[1515]: got timeout
Jul 28 21:14:25 thibworldpx3 pvestatd[1515]: unable to activate storage 'NASDrive' - directory '/mnt/pve/NASDrive' does not exist or is unreachable
Jul 28 21:14:26 thibworldpx3 kernel: [ 1059.951163] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:26 thibworldpx3 kernel: [ 1060.369565] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:27 thibworldpx3 kernel: [ 1060.728531] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:27 thibworldpx3 kernel: [ 1061.130805] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:27 thibworldpx3 kernel: [ 1061.502907] CIFS: VFS: \\192.168.0.90 Error -104 sending data on socket to server
Jul 28 21:14:28 thibworldpx3 kernel: [ 1061.876677] CIFS: VFS: \\192.168.0.90 Error -104 sending data on socket to server
Jul 28 21:14:28 thibworldpx3 kernel: [ 1062.283689] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:29 thibworldpx3 kernel: [ 1062.674979] CIFS: VFS: \\192.168.0.90 Error -104 sending data on socket to server
Jul 28 21:14:29 thibworldpx3 kernel: [ 1063.072636] CIFS: VFS: \\192.168.0.90 Error -32 sending data on socket to server
Jul 28 21:14:29 thibworldpx3 kernel: [ 1063.454835] CIFS: VFS: \\192.168.0.90 Error -104 sending data on socket to server
Jul 28 21:14:31 thibworldpx3 kernel: [ 1065.134686] __smb_send_rqst: 3 callbacks suppressed

Then I got these (repeated multiple times):
Code:
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597124] INFO: task gzip:7787 blocked for more than 120 seconds.
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597139]       Tainted: P           O      5.15.39-1-pve #1
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597151] task:gzip            state:D stack:    0 pid: 7787 ppid:  7635 flags:0x00004000
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597159] Call Trace:
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597162]  <TASK>
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597167]  __schedule+0x33d/0x1750
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597178]  ? sysvec_apic_timer_interrupt+0x4e/0x90
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597185]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597194]  schedule+0x4e/0xb0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597199]  io_schedule+0x46/0x70
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597203]  wait_on_page_bit_common+0x114/0x3e0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597213]  ? filemap_invalidate_unlock_two+0x40/0x40
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597219]  wait_on_page_bit+0x3f/0x50
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597225]  wait_on_page_writeback+0x26/0x80
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597232]  cifs_writepages+0x5ed/0xc40 [cifs]
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597320]  do_writepages+0xd5/0x210
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597325]  ? __mod_memcg_lruvec_state+0x63/0xe0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597334]  ? __mod_lruvec_state+0x37/0x40
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597341]  filemap_fdatawrite_wbc+0x81/0xd0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597348]  filemap_write_and_wait_range+0x72/0xd0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597355]  cifs_flush+0x5c/0xe0 [cifs]
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597433]  filp_close+0x37/0x70
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597438]  close_fd+0x38/0x50
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597444]  __x64_sys_close+0x12/0x40
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597448]  do_syscall_64+0x5c/0xc0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597453]  ? exit_to_user_mode_prepare+0x37/0x1b0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597459]  ? syscall_exit_to_user_mode+0x27/0x50
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597465]  ? __x64_sys_close+0x12/0x40
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597469]  ? do_syscall_64+0x69/0xc0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597473]  ? exit_to_user_mode_prepare+0x8c/0x1b0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597479]  ? irqentry_exit_to_user_mode+0x9/0x20
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597484]  ? irqentry_exit+0x19/0x30
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597489]  ? exc_page_fault+0x89/0x160
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597494]  ? asm_exc_page_fault+0x8/0x30
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597501]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597508] RIP: 0033:0x7fc0cdeff6c3
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597512] RSP: 002b:00007ffce78abab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597519] RAX: ffffffffffffffda RBX: 00007ffce78abca8 RCX: 00007fc0cdeff6c3
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597522] RDX: 00000000000004aa RSI: 000055f3b1d32fc0 RDI: 0000000000000001
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597526] RBP: 0000000000000002 R08: 0000000000000007 R09: 000055f3b1d42180
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597529] R10: fffffffffffff287 R11: 0000000000000246 R12: 000055f3b1cece75
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597532] R13: 0000000000000000 R14: 000055f3b1d40100 R15: 0000000000000006
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597538]  </TASK>
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597542] INFO: task kworker/3:2:19170 blocked for more than 120 seconds.
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597550]       Tainted: P           O      5.15.39-1-pve #1
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597555] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597561] task:kworker/3:2     state:D stack:    0 pid:19170 ppid:     2 flags:0x00004000
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597569] Workqueue: cifsiod cifs_writev_complete [cifs]
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597640] Call Trace:
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597641]  <TASK>
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597644]  __schedule+0x33d/0x1750
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597652]  schedule+0x4e/0xb0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597656]  io_schedule+0x46/0x70
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597660]  wait_on_page_bit_common+0x114/0x3e0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597668]  ? filemap_invalidate_unlock_two+0x40/0x40
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597674]  __lock_page+0x4c/0x50
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597680]  cifs_writev_complete+0x37c/0x810 [cifs]
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597751]  process_one_work+0x22b/0x3d0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597757]  worker_thread+0x53/0x410
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597761]  ? process_one_work+0x3d0/0x3d0
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597766]  kthread+0x12a/0x150
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597772]  ? set_kthread_struct+0x50/0x50
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597779]  ret_from_fork+0x22/0x30
Jul 28 22:37:28 thibworldpx3 kernel: [ 6042.597788]  </TASK>

The failure is not seen by the backup process. It just keeps going. I can't even "Stop" it. I also have to unlock the VM to do anything with it, and I've had to reboot as various things remain stuck.

Is there a wait to make proxmox OS and backups more resilient to remote CIFS share by extending timeouts and/or other parameter changes? And is there a way to make proxmox backup give up/stop/release lock itself at some point when things aren't working? (it was still running 10 hours after I started it)
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!