Backup 'zstd: /*stdout*\: Input/output error'

Asano

On one of my servers I frequently run into failed backups to a CIFS location. Since this has been happening for several months now, I thought I'd start a new thread; there is already this old thread from another user with what is probably a similar problem.

What I can say for sure is that this is not a problem caused by some limit on the backup destination. I tried various Storage Boxes with plenty of free TB and all showed the same issue.

What I also can say for sure is that I have only ever seen this problem on one specific server. All other servers (even in the same cluster, with the same storage as backup location and the same backup job) have never had it even once.

This is why I could imagine that the NIC, the CPU or some other piece of hardware is responsible...

The server in question has a Xeon E-2176G CPU, an Intel I219-LM (rev 10) NIC and is currently running Linux 5.4.128-1-pve with pve-manager/6.4-13.

The failed backups always get to 100%, but with a `zstd` error somewhere near the end (often it even appears after the `INFO: 100%` line):

Code:
INFO:  99% (119.8 GiB of 120.0 GiB) in 9m 30s, read: 1.1 GiB/s, write: 0 B/s
zstd: /*stdout*\: Input/output error
INFO: 100% (120.0 GiB of 120.0 GiB) in 9m 31s, read: 250.7 MiB/s, write: 136.0 KiB/s
INFO: backup is sparse: 37.29 GiB (31%) total zero data
INFO: transferred 120.00 GiB in 571 seconds (215.2 MiB/s)
ERROR: Backup of VM 126 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1

Is there anything I could try to troubleshoot this further or maybe work around it?

Thanks for any input!
 
So I did a lot of tests with `memtester`, always between 10 and 20 GB and in many separate runs, since that way no downtime was required and I figured that sooner or later I would catch an error if there was a problem (64 GB of ECC RAM are installed in total). However, no error ever came up.
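The individual runs looked roughly like this (sizes and pass counts varied between runs, so take these just as examples):

Code:
# test a 16 GB region with 3 passes, then a smaller region with more passes
memtester 16G 3
memtester 10G 5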

Anyway, this problem seems a little too diffuse to me, and since the server is already nearly 2 years old I'll simply replace it in the next few days. That's usually a very fast procedure and probably a lot faster than troubleshooting this further ;-)

Still, if anybody has another idea how to troubleshoot this (preferably without downtime), I could try it for a few more days.
 
So now, just a few days (maybe a coincidence) after I replaced the server with the frequent `zstd: /*stdout*\: Input/output error`, I saw the error for the first time on another, older server in the cluster. The new server (which has quite heavy load and also inherited most guests from the former 'problematic' server) works fine so far and has completed every backup without issue, and by now that has happened often enough that the old server would already have hit the error multiple times.
But another server with quite low load, which had worked fine for over two years, now showed the `zstd: /*stdout*\: Input/output error` for the first time.

I do see quite a lot of dmesg entries at a time that should match the error, and they look CIFS and/or low-memory related. But I'm not sure whether they show the actual issue or are just a side effect. (The server is a Hetzner AX51-NVMe, which also has 64 GB of ECC memory, and there were always at least 30 GB free during the backup. ZFS is limited to 8 GB on this server, just in case that matters...)
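(In case it matters how exactly that limit is applied: it is the usual zfs_arc_max module parameter, roughly like this:)

Code:
# /etc/modprobe.d/zfs.conf -- caps the ZFS ARC at 8 GiB
options zfs zfs_arc_max=8589934592
# the same value can also be set at runtime:
# echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max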

@dietmar, can you maybe look over the dmesg below and see if you can make something out of it, and whether I could work around the problem with other CIFS and/or memory settings? (I put a sketch of what I would consider trying right after the dmesg.)

dmesg:
Code:
[Mon Sep  6 01:39:58 2021] kworker/u64:2: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[Mon Sep  6 01:39:58 2021] CPU: 2 PID: 3976956 Comm: kworker/u64:2 Tainted: P           O      5.11.22-3-pve #1
[Mon Sep  6 01:39:58 2021] Hardware name: Hetzner /B450D4U-V1L, BIOS L1.02W 07/09/2020
[Mon Sep  6 01:39:58 2021] Workqueue: writeback wb_workfn (flush-cifs-1)
[Mon Sep  6 01:39:58 2021] Call Trace:
[Mon Sep  6 01:39:58 2021]  dump_stack+0x70/0x8b
[Mon Sep  6 01:39:58 2021]  warn_alloc.cold+0x7b/0xdf
[Mon Sep  6 01:39:58 2021]  __alloc_pages_slowpath.constprop.0+0xd35/0xd60
[Mon Sep  6 01:39:58 2021]  ? aesni_set_key+0x194/0x1e0 [aesni_intel]
[Mon Sep  6 01:39:58 2021]  ? kernel_fpu_begin_mask+0x24/0xb0
[Mon Sep  6 01:39:58 2021]  __alloc_pages_nodemask+0x2e0/0x310
[Mon Sep  6 01:39:58 2021]  alloc_pages_current+0x87/0x110
[Mon Sep  6 01:39:58 2021]  kmalloc_order+0x2d/0xa0
[Mon Sep  6 01:39:58 2021]  kmalloc_order_trace+0x1d/0x80
[Mon Sep  6 01:39:58 2021]  __kmalloc+0x212/0x250
[Mon Sep  6 01:39:58 2021]  ? crypt_message+0x31e/0x870 [cifs]
[Mon Sep  6 01:39:58 2021]  crypt_message+0x3a4/0x870 [cifs]
[Mon Sep  6 01:39:58 2021]  ? _crng_backtrack_protect+0x56/0xa0
[Mon Sep  6 01:39:58 2021]  smb3_init_transform_rq+0x26c/0x2f0 [cifs]
[Mon Sep  6 01:39:58 2021]  smb_send_rqst+0xe9/0x1a0 [cifs]
[Mon Sep  6 01:39:58 2021]  cifs_call_async+0x125/0x280 [cifs]
[Mon Sep  6 01:39:58 2021]  ? alloc_path_with_tree_prefix+0x170/0x170 [cifs]
[Mon Sep  6 01:39:58 2021]  smb2_async_writev+0x2ae/0x5c0 [cifs]
[Mon Sep  6 01:39:58 2021]  ? validate_t2+0x70/0x70 [cifs]
[Mon Sep  6 01:39:58 2021]  cifs_writepages+0x8c6/0xd60 [cifs]
[Mon Sep  6 01:39:58 2021]  do_writepages+0x38/0xc0
[Mon Sep  6 01:39:58 2021]  ? __schedule+0x2d2/0x880
[Mon Sep  6 01:39:58 2021]  __writeback_single_inode+0x44/0x2b0
[Mon Sep  6 01:39:58 2021]  writeback_sb_inodes+0x223/0x4d0
[Mon Sep  6 01:39:58 2021]  __writeback_inodes_wb+0x56/0xf0
[Mon Sep  6 01:39:58 2021]  wb_writeback+0x1dd/0x290
[Mon Sep  6 01:39:58 2021]  wb_workfn+0x309/0x500
[Mon Sep  6 01:39:58 2021]  process_one_work+0x220/0x3c0
[Mon Sep  6 01:39:58 2021]  worker_thread+0x53/0x420
[Mon Sep  6 01:39:58 2021]  kthread+0x12f/0x150
[Mon Sep  6 01:39:58 2021]  ? process_one_work+0x3c0/0x3c0
[Mon Sep  6 01:39:58 2021]  ? __kthread_bind_mask+0x70/0x70
[Mon Sep  6 01:39:58 2021]  ret_from_fork+0x22/0x30
[Mon Sep  6 01:39:58 2021] Mem-Info:
[Mon Sep  6 01:39:58 2021] active_anon:780362 inactive_anon:4511174 isolated_anon:0
                            active_file:127296 inactive_file:8400600 isolated_file:0
                            unevictable:41696 dirty:1297865 writeback:3319
                            slab_reclaimable:28448 slab_unreclaimable:380880
                            mapped:58524 shmem:18127 pagetables:13055 bounce:0
                            free:156988 free_pcp:164 free_cma:0
[Mon Sep  6 01:39:58 2021] Node 0 active_anon:3121448kB inactive_anon:18044696kB active_file:509184kB inactive_file:33602400kB unevictable:166784kB isolated(anon):0kB isolated(file):0kB mapped:234096kB dirty:5191460kB writeback:13276kB shmem:72508kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 19924992kB writeback_tmp:0kB kernel_stack:9376kB pagetables:52220kB all_unreclaimable? no
[Mon Sep  6 01:39:58 2021] Node 0 DMA free:11808kB min:16kB low:28kB high:40kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Mon Sep  6 01:39:58 2021] lowmem_reserve[]: 0 3647 64163 64163 64163
[Mon Sep  6 01:39:58 2021] Node 0 DMA32 free:244528kB min:3840kB low:7572kB high:11304kB reserved_highatomic:0KB active_anon:16kB inactive_anon:74096kB active_file:168kB inactive_file:3508840kB unevictable:0kB writepending:3508820kB present:3857396kB managed:3857396kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Mon Sep  6 01:39:58 2021] lowmem_reserve[]: 0 0 60515 60515 60515
[Mon Sep  6 01:39:58 2021] Node 0 Normal free:371616kB min:63724kB low:125692kB high:187660kB reserved_highatomic:2048KB active_anon:3121432kB inactive_anon:17970700kB active_file:509016kB inactive_file:30093868kB unevictable:166784kB writepending:1696776kB present:63163904kB managed:61975656kB mlocked:166700kB bounce:0kB free_pcp:668kB local_pcp:0kB free_cma:0kB
[Mon Sep  6 01:39:58 2021] lowmem_reserve[]: 0 0 0 0 0
[Mon Sep  6 01:39:58 2021] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11808kB
[Mon Sep  6 01:39:58 2021] Node 0 DMA32: 1145*4kB (UM) 358*8kB (M) 210*16kB (UME) 64*32kB (UME) 17*64kB (UM) 5*128kB (UM) 2*256kB (UE) 3*512kB (ME) 1*1024kB (E) 1*2048kB (M) 55*4096kB (M) = 244980kB
[Mon Sep  6 01:39:58 2021] Node 0 Normal: 2762*4kB (UEH) 31297*8kB (UMEH) 6918*16kB (UMEH) 25*32kB (UH) 10*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 373552kB
[Mon Sep  6 01:39:58 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Mon Sep  6 01:39:58 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Mon Sep  6 01:39:58 2021] 8550461 total pagecache pages
[Mon Sep  6 01:39:58 2021] 0 pages in swap cache
[Mon Sep  6 01:39:58 2021] Swap cache stats: add 0, delete 0, find 0/0
[Mon Sep  6 01:39:58 2021] Free swap  = 0kB
[Mon Sep  6 01:39:58 2021] Total swap = 0kB
[Mon Sep  6 01:39:58 2021] 16759322 pages RAM
[Mon Sep  6 01:39:58 2021] 0 pages HighMem/MovableOnly
[Mon Sep  6 01:39:58 2021] 297083 pages reserved
[Mon Sep  6 01:39:58 2021] 0 pages hwpoisoned
[Mon Sep  6 01:39:58 2021] CIFS: VFS: \\uXXX.your-storagebox.de crypt_message: Failed to init sg
[Mon Sep  6 01:39:58 2021] CIFS: VFS: \\uXXX.your-storagebox.de crypt_message: Failed to init sg
[Mon Sep  6 01:39:58 2021] CIFS: VFS: \\uXXX.your-storagebox.de crypt_message: Failed to init sg
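
What I could imagine experimenting with, given the order:4 allocation failure in the CIFS writeback path above, would be something along these lines (untested on my side, the exact values are only guesses):

Code:
# keep less dirty page cache around so writeback happens in smaller chunks
sysctl -w vm.dirty_background_bytes=67108864   # 64 MiB
sysctl -w vm.dirty_bytes=268435456             # 256 MiB
# and/or reduce the CIFS write size so crypt_message needs smaller contiguous
# buffers, e.g. by adding wsize=65536 to the mount options of the storage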

Thanks in advance!
 
I have the same problem, did you manage to solve it somehow?
 
Not really, unfortunately. We migrated to a Proxmox Backup Server instead, and there we don't see any of those problems with the exact same cluster (even though the backup server's storage also consists of fairly slow HDDs).
 
I had the same problem.
The solution is to set the following in /etc/vzdump.conf:
Code:
bwlimit: 400000

The value has to be found empirically. This value does not degrade performance, but it does prevent the error from occurring.
I don't know why, but it works. :rolleyes::)
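For anyone who wants to test this on a single guest before changing the global default, vzdump also accepts the limit on the command line; a rough example (VM 126 is the one from the log above, the storage ID is just a placeholder):

Code:
# bwlimit is given in KiB/s, so 400000 is roughly 390 MiB/s
vzdump 126 --bwlimit 400000 --compress zstd --storage cifs-backup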
 
