CIFS input/output error on large backups (>10GB)

hiddenman

Mar 10, 2021

Hi there,

We have a CIFS share that is mounted by Proxmox itself (configured as storage), not manually with /usr/bin/mount:

Code:
cifs: hetzner-backup
        path /mnt/pve/hetzner-backup
        server XXX.your-storagebox.de
        share backup
        content backup
        prune-backups keep-all=1
        username XXX

It used to work fine (although it regularly lost the connection and we had to unmount the share manually every few days). Now backups simply fail with this error:

Code:
INFO: zstd: /*stdout*\: Input/output error
INFO: cleanup temporary 'vzdump' snapshot
ERROR: Backup of VM 10001 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp1483_10001' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd --rsyncable '--threads=1' >/mnt/pve/hetzner-backup/dump/vzdump-lxc-10001-2021_03_10-13_00_40.tar.dat' failed: exit code 1
INFO: Failed at 2021-03-10 13:11:58
INFO: Backup job finished with errors
TASK ERROR: job errors

dmesg shows this error from the kernel:

Code:
[4932106.761169] CIFS VFS: \\XXX.your-storagebox.de crypt_message: Failed to init sg
[4932127.161455] kworker/u64:1: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[4932127.161459] CPU: 3 PID: 18221 Comm: kworker/u64:1 Tainted: P           O      5.4.78-2-pve #1
[4932127.161460] Hardware name: Hetzner /B450D4U-V1L, BIOS L1.02W 07/09/2020
[4932127.161465] Workqueue: writeback wb_workfn (flush-cifs-5)
[4932127.161466] Call Trace:
[4932127.161472]  dump_stack+0x6d/0x9a
[4932127.161474]  warn_alloc.cold.119+0x7b/0xdd
[4932127.161476]  __alloc_pages_slowpath+0xde9/0xe30
[4932127.161478]  ? ttwu_do_activate+0x5a/0x70
[4932127.161484]  ? aes_set_key_common+0x54/0x90 [aesni_intel]
[4932127.161487]  ? aes_set_key_common+0x54/0x90 [aesni_intel]
[4932127.161489]  ? aes_set_key+0x1a/0x20 [aesni_intel]
[4932127.161490]  __alloc_pages_nodemask+0x2df/0x330
[4932127.161492]  alloc_pages_current+0x81/0xe0
[4932127.161493]  kmalloc_order+0x1e/0x80
[4932127.161494]  kmalloc_order_trace+0x24/0xb0
[4932127.161495]  ? crypto_ccm_setkey+0x8d/0xb0 [ccm]
[4932127.161497]  __kmalloc+0x227/0x280
[4932127.161513]  ? crypt_message+0x2a7/0x810 [cifs]
[4932127.161524]  crypt_message+0x327/0x810 [cifs]
[4932127.161526]  ? _crng_backtrack_protect+0x56/0x70
[4932127.161527]  ? crng_backtrack_protect+0x44/0x50
[4932127.161539]  smb3_init_transform_rq+0x279/0x300 [cifs]
[4932127.161551]  smb_send_rqst+0xfb/0x1c0 [cifs]
[4932127.161562]  cifs_call_async+0x116/0x270 [cifs]
[4932127.161574]  ? SMB2_sess_establish_session.isra.14+0x150/0x150 [cifs]
[4932127.161585]  smb2_async_writev+0x254/0x5d0 [cifs]
[4932127.161587]  ? __mod_lruvec_state+0x49/0x110
[4932127.161596]  ? cifs_echo_callback+0x70/0x70 [cifs]
[4932127.161608]  cifs_writepages+0x82e/0xb70 [cifs]
[4932127.161618]  ? cifs_writepages+0x82e/0xb70 [cifs]
[4932127.161620]  do_writepages+0x41/0xd0
[4932127.161621]  ? check_preempt_curr+0x68/0x90
[4932127.161622]  ? ttwu_do_wakeup+0x1e/0x150
[4932127.161623]  __writeback_single_inode+0x40/0x350
[4932127.161624]  writeback_sb_inodes+0x209/0x4a0
[4932127.161625]  __writeback_inodes_wb+0x66/0xd0
[4932127.161626]  wb_writeback+0x25b/0x2f0
[4932127.161627]  wb_workfn+0x308/0x490
[4932127.161629]  ? __switch_to_asm+0x40/0x70
[4932127.161630]  ? __switch_to_asm+0x34/0x70
[4932127.161631]  ? __switch_to_asm+0x40/0x70
[4932127.161631]  ? __switch_to_asm+0x34/0x70
[4932127.161632]  ? __switch_to_asm+0x40/0x70
[4932127.161633]  ? __switch_to_asm+0x34/0x70
[4932127.161635]  ? __switch_to+0x85/0x480
[4932127.161636]  ? __schedule+0x2ee/0x6f0
[4932127.161637]  process_one_work+0x20f/0x3d0
[4932127.161638]  worker_thread+0x34/0x400
[4932127.161639]  kthread+0x120/0x140
[4932127.161640]  ? process_one_work+0x3d0/0x3d0
[4932127.161641]  ? kthread_park+0x90/0x90
[4932127.161641]  ret_from_fork+0x22/0x40
[4932127.161653] Mem-Info:
[4932127.161656] active_anon:3498853 inactive_anon:324903 isolated_anon:0
active_file:50208 inactive_file:2540688 isolated_file:0
unevictable:1330 dirty:713365 writeback:2034 unstable:0
slab_reclaimable:1038371 slab_unreclaimable:859459
mapped:55070 shmem:1047470 pagetables:16975 bounce:0
free:1585426 free_pcp:107 free_cma:0
[4932127.161660] Node 0 active_anon:13995412kB inactive_anon:1299612kB active_file:200832kB inactive_file:10162752kB unevictable:5320kB isolated(anon):0kB isolated(file):0kB mapped:220280kB dirty:2853460kB writeback:8136kB shmem:4189880kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[4932127.161661] Node 0 DMA free:15904kB min:16kB low:28kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[4932127.161663] lowmem_reserve[]: 0 3662 64186 64186 64186
[4932127.161664] Node 0 DMA32 free:245928kB min:3852kB low:7600kB high:11348kB active_anon:282832kB inactive_anon:0kB active_file:8kB inactive_file:3019176kB unevictable:0kB writepending:0kB present:3857396kB managed:3857396kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[4932127.161665] lowmem_reserve[]: 0 0 60523 60523 60523
[4932127.161666] Node 0 Normal free:6079872kB min:63708kB low:125684kB high:187660kB active_anon:13712580kB inactive_anon:1299612kB active_file:200824kB inactive_file:7144268kB unevictable:5320kB writepending:2861596kB present:63163904kB managed:61984836kB mlocked:5320kB kernel_stack:10240kB pagetables:67900kB bounce:0kB free_pcp:428kB local_pcp:0kB free_cma:0kB
[4932127.161668] lowmem_reserve[]: 0 0 0 0 0
[4932127.161669] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[4932127.161672] Node 0 DMA32: 1508*4kB (UME) 2353*8kB (UME) 1415*16kB (UME) 2011*32kB (UME) 333*64kB (UME) 29*128kB (UME) 10*256kB (UME) 10*512kB (UE) 15*1024kB (UME) 30*2048kB (UME) 6*4096kB (UM) = 245928kB
[4932127.161675] Node 0 Normal: 2015*4kB (UME) 36074*8kB (UE) 107502*16kB (UE) 127013*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6081100kB
[4932127.161679] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[4932127.161680] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[4932127.161680] 3609512 total pagecache pages
[4932127.161681] 0 pages in swap cache
[4932127.161681] Swap cache stats: add 0, delete 0, find 0/0
[4932127.161681] Free swap  = 0kB
[4932127.161682] Total swap = 0kB
[4932127.161682] 16759322 pages RAM
[4932127.161682] 0 pages HighMem/MovableOnly
[4932127.161682] 294788 pages reserved
[4932127.161683] 0 pages cma reserved
[4932127.161683] 0 pages hwpoisoned
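
For readers puzzling over the allocation failure above: `order:4` means the kernel asked for 2^4 = 16 physically contiguous pages, i.e. 64 kB with 4 kB pages. The `Node 0 Normal:` buddy line shows roughly 5.8 GiB free in that zone, yet zero free blocks of 64 kB or larger, which suggests memory fragmentation rather than a plain shortage of memory. The Python sketch below uses hypothetical helper names (`parse_buddy_line`, `order_to_kb`, `can_satisfy`; not part of Proxmox or any kernel tool) to parse such a line and check whether any free block could hold an order-N request. It assumes 4 kB base pages and deliberately ignores the watermarks and `lowmem_reserve` that the real allocator also checks, so its answer is only a rough indicator.

```python
import re

PAGE_KB = 4  # assumption: x86-64 base page size of 4 kB


def parse_buddy_line(line):
    """Parse a dmesg buddy line such as
    'Node 0 Normal: 2015*4kB (UME) ... 0*4096kB = 6081100kB'
    into a dict mapping block size (kB) -> number of free blocks."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def order_to_kb(order):
    """An order-N allocation needs 2**N physically contiguous pages."""
    return (2 ** order) * PAGE_KB


def can_satisfy(blocks, order):
    """True if at least one free block is big enough for the request.
    Ignores watermarks and lowmem_reserve, which the kernel also checks."""
    need = order_to_kb(order)
    return any(count > 0 and size >= need for size, count in blocks.items())


if __name__ == "__main__":
    normal = ("Node 0 Normal: 2015*4kB (UME) 36074*8kB (UE) 107502*16kB (UE) "
              "127013*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB "
              "0*2048kB 0*4096kB = 6081100kB")
    blocks = parse_buddy_line(normal)
    total_kb = sum(size * count for size, count in blocks.items())
    print(f"free in Normal zone: {total_kb} kB")
    print(f"order:4 needs {order_to_kb(4)} kB contiguous")
    print(f"satisfiable from Normal: {can_satisfy(blocks, 4)}")
```

Running it prints the zone total (6081100 kB), the 64 kB requirement, and `False` for the order-4 check, while an order-3 (32 kB) request would still succeed. On a live host, per-order free-block counts are also exposed in `/proc/buddyinfo`, but as plain counts per order rather than `count*size` pairs, so the parser would need adjusting for that format.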

I'm not sure what causes this or what to do about it. Is it a kernel bug? Google doesn't turn up anything relevant, only older reports of similar errors.

Our version is:

pve-manager/6.3-3/eee5f901 (running kernel: 5.4.78-2-pve)

Any ideas?

Thank you.
 
We are experiencing exactly the same problem.
I hope it's not a provider-related issue, because we use the same provider...

Did you find a solution?
 
We are now experiencing the very same issue, and we are also hosting on a Hetzner server. Yesterday one backup failed; today two did. Has anyone made progress on finding a solution? A comment in this Reddit post suggests migrating to NFS.
 
Hetzner doesn't offer NFS for its Storage Boxes.
We ended up setting up a Proxmox Backup Server.
And I have to admit: best decision ever.
 