proxmox-backup-client gets HTTP/2.0 connection failed after 70 min, PBS server crashed?

Hi,

While using PBS 1.0.8 (server and client) to back up a directory with old VM images:


Code:
proxmox-backup-client backup dir1.pxar:/mnt/old --verbose
...
append chunks list len (64)
append chunks list len (64)
"dir1/vm1"
"dir1/vm1/sdb.img"
append chunks list len (64)
...
append chunks list len (64)
HTTP/2.0 connection failed
catalog upload error - channel closed
Error: broken pipe

real    70m12.967s
user    93m16.038s
sys     6m57.742s
255
Thu 04 Mar 2021 12:13:38 PM CET

sdb.img is a 150G file. /mnt/old holds 718G of data, mainly smaller img files plus some text files.

When I looked at my PBS server, it had rebooted! The system logs have a big chunk of ^@^@ before the boot messages.

I tried the same backup again in case it was a transient problem, and got the same HTTP/2.0 error and reboot.

This PBS already holds 800G of data from 129 successful backups of various VMs and files over the past weeks.


Any idea what to look at, or which logs to activate (on the client and on PBS), since I seem to be able to reproduce the issue (albeit slowly)?
 
Hi,
please post the task log for the corresponding backup job. You can find it under "Administration"->"Tasks".
Also what are the last entries in the journal before the hang (the ^@^@ you see)?

I must say this sounds more like a kernel hang. What filesystem are you using under the PBS datastore?
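
For the journal, the end of the previous boot's log is the interesting part; a minimal sketch of the relevant commands (plain journalctl usage, assuming a persistent journal), plus following the PBS proxy while reproducing:

Code:
# list the recorded boots and their IDs
journalctl --list-boots

# last messages of the previous boot (the one that ended in the crash/reboot)
journalctl -b -1 -n 200 --no-pager

# on the PBS server, follow the backup proxy while the client runs
journalctl -u proxmox-backup-proxy -f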
 
I've got some more information: my setup is a physical server "pcstorage1" (Atom C2550, 16G ECC RAM, 6x 4TB HDD) running PVE 6.3 on RAIDZ2 with UEFI boot, with a single VM "backup1" running PBS 1.0.8, which has 12G RAM and a 10TB ext4 system disk (virtio-scsi, UEFI boot).

I got the crash issue again on a large VM disk backup (500G) and:
- on the physical host, dmesg showed an OOM kill of the kvm process
- on the VM, per-process memory usage was normal, but the file cache reached 10G of the 12G

I'm upgrading Proxmox VE + PBS, which were running kernel 5.4.78, and trying again.

Code:
root@pcstorage1:~# cat /proc/version
Linux version 5.4.78-2-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)
root@pcstorage1:~# dpkg -l|grep pve-qemu-kvm
ii  pve-qemu-kvm                         5.1.0-8                      amd64        Full virtualization on x86 hardware


root@backup1:~# cat /proc/version
Linux version 5.4.78-2-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)

* on pcstorage1, dmesg shows:

[74096.260815] kthreadd invoked oom-killer: gfp_mask=0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_ZERO), order=0, oom_score_adj=0
[74096.263731] CPU: 0 PID: 2 Comm: kthreadd Tainted: P           O      5.4.78-2-pve #1
[74096.265242] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C2550D4I, BIOS P1.40 01/14/2014
[74096.266801] Call Trace:
[74096.268358]  dump_stack+0x6d/0x9a
[74096.269908]  dump_header+0x4f/0x1e1
[74096.271451]  oom_kill_process.cold.33+0xb/0x10
[74096.273006]  out_of_memory+0x1ad/0x490
[74096.274563]  __alloc_pages_slowpath+0xd40/0xe30
[74096.276125]  ? __switch_to_asm+0x40/0x70
[74096.277682]  ? __switch_to+0x85/0x480
[74096.279226]  __alloc_pages_nodemask+0x2df/0x330
[74096.280777]  alloc_pages_current+0x81/0xe0
[74096.282327]  __vmalloc_node_range+0x15a/0x270
[74096.283884]  copy_process+0x813/0x1b60
[74096.285435]  ? _do_fork+0x85/0x350
[74096.286976]  ? __switch_to_asm+0x40/0x70
[74096.288516]  ? __switch_to_asm+0x34/0x70
[74096.290028]  ? __switch_to_asm+0x40/0x70
[74096.291531]  ? __switch_to_asm+0x40/0x70
[74096.293014]  _do_fork+0x85/0x350
[74096.294487]  ? __switch_to+0x85/0x480
[74096.295956]  ? __switch_to_asm+0x40/0x70
[74096.297417]  ? __switch_to_asm+0x34/0x70
[74096.298853]  kernel_thread+0x55/0x70
[74096.300291]  ? kthread_park+0x90/0x90
[74096.301732]  kthreadd+0x2a7/0x2f0
[74096.303161]  ? kthread_create_on_cpu+0xb0/0xb0
[74096.304593]  ret_from_fork+0x35/0x40
[74096.306076] Mem-Info:
[74096.307485] active_anon:3325914 inactive_anon:14293 isolated_anon:0
                active_file:5 inactive_file:23 isolated_file:0
                unevictable:1330 dirty:3 writeback:10 unstable:0
                slab_reclaimable:7057 slab_unreclaimable:247070
                mapped:14916 shmem:14747 pagetables:8346 bounce:0
                free:34142 free_pcp:0 free_cma:0
[74096.315857] Node 0 active_anon:13303656kB inactive_anon:57172kB active_file:20kB inactive_file:92kB unevictable:5320kB isolated(anon):0kB isolated(file):0kB mapped:59664kB dirty:12kB writeback:40kB shmem:58988kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6055936kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[74096.319918] Node 0 DMA free:15888kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15888kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[74096.324312] lowmem_reserve[]: 0 1913 15936 15936 15936
[74096.325847] Node 0 DMA32 free:63896kB min:8104kB low:10128kB high:12152kB active_anon:1764200kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2071784kB managed:2004800kB mlocked:0kB kernel_stack:96kB pagetables:124kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[74096.330736] lowmem_reserve[]: 0 0 14022 14022 14022
[74096.332338] Node 0 Normal free:56784kB min:59408kB low:74260kB high:89112kB active_anon:11539456kB inactive_anon:57172kB active_file:20kB inactive_file:92kB unevictable:5320kB writepending:52kB present:14680064kB managed:14366924kB mlocked:5320kB kernel_stack:5648kB pagetables:33260kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[74096.337287] lowmem_reserve[]: 0 0 0 0 0
[74096.338917] Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
[74096.342389] Node 0 DMA32: 513*4kB (UME) 353*8kB (UME) 979*16kB (UME) 336*32kB (UME) 110*64kB (UME) 27*128kB (ME) 7*256kB (UME) 18*512kB (UME) 11*1024kB (UM) 0*2048kB 0*4096kB = 64060kB
[74096.346112] Node 0 Normal: 4719*4kB (UME) 809*8kB (UME) 1990*16kB (UME) 15*32kB (UE) 2*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 57796kB
[74096.349982] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[74096.351979] 15761 total pagecache pages
[74096.353993] 0 pages in swap cache
[74096.355980] Swap cache stats: add 0, delete 0, find 0/0
[74096.357997] Free swap  = 0kB
[74096.359992] Total swap = 0kB
[74096.361996] 4191961 pages RAM
[74096.363976] 0 pages HighMem/MovableOnly
[74096.365992] 95058 pages reserved
[74096.368001] 0 pages cma reserved
[74096.370037] 0 pages hwpoisoned
[74096.372044] Tasks state (memory values in pages):
[74096.374092] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[74096.376212] [  12085]     0 12085     7435     2063   102400        0             0 systemd-journal
[74096.378371] [  12252]     0 12252     5741      618    69632        0         -1000 systemd-udevd
[74096.380514] [  12559]   100 12559    23270      231    81920        0             0 systemd-timesyn
[74096.382685] [  12560]   106 12560     1705      357    53248        0             0 rpcbind
[74096.384858] [  12569]     0 12569     4880      409    73728        0             0 systemd-logind
[74096.387017] [  12570]     0 12570     3172      736    61440        0             0 smartd
[74096.389193] [  12571]     0 12571      535      115    40960        0         -1000 watchdog-mux
[74096.391359] [  12573]     0 12573    41689      513    73728        0             0 zed
[74096.393531] [  12576]   104 12576     2264      442    53248        0          -900 dbus-daemon
[74096.395679] [  12579]     0 12579    68958      300    73728        0             0 pve-lxc-syscall
[74096.397852] [  12585]     0 12585    56455      649    86016        0             0 rsyslogd
[74096.400003] [  12586]     0 12586     1681      321    45056        0             0 ksmtuned
[74096.402163] [  12588]     0 12588    21333      281    53248        0             0 lxcfs
[74096.404291] [  12589]     0 12589     1022      327    49152        0             0 qmeventd
[74096.406394] [  13025]     0 13025     1823      202    57344        0             0 lxc-monitord
[74096.408434] [  13054]     0 13054      568      148    40960        0             0 none
[74096.410441] [  13115]     0 13115     1722       60    53248        0             0 iscsid
[74096.412379] [  13117]     0 13117     1848     1254    57344        0           -17 iscsid
[74096.414283] [  13227]     0 13227     3962      617    69632        0         -1000 sshd
[74096.416121] [  13244]     0 13244     1402      383    45056        0             0 agetty
[74096.417975] [  13338]     0 13338   183190      648   192512        0             0 rrdcached
[74096.419772] [  13708]     0 13708   244421    16066   450560        0             0 pmxcfs
[74096.421543] [  13796]     0 13796    10868      638    81920        0             0 master
[74096.423240] [  13798]   107 13798    10968      603    81920        0             0 qmgr
[74096.424934] [  13804]     0 13804     2125      553    57344        0             0 cron
[74096.426574] [  13820]     0 13820    76236    21162   319488        0             0 pve-firewall
[74096.428159] [  13826]     0 13826    75827    20832   299008        0             0 pvestatd
[74096.429713] [  14016]     0 14016    88399    29623   409600        0             0 pvedaemon
[74096.431224] [  14025]     0 14025    84201    23836   372736        0             0 pve-ha-crm
[74096.432784] [  14109]    33 14109    88780    30474   438272        0             0 pveproxy
[74096.434209] [  14115]    33 14115    17586    12646   176128        0             0 spiceproxy
[74096.435579] [  14117]     0 14117    84101    23465   372736        0             0 pve-ha-lrm
[74096.436919] [  10409]     0 10409    90519    30241   438272        0             0 pvedaemon worke
[74096.438201] [  23935]     0 23935    90531    30513   438272        0             0 pvedaemon worke
[74096.439421] [  11047]     0 11047    90530    30397   438272        0             0 pvedaemon worke
[74096.440621] [  11342]     0 11342    21543      367    65536        0             0 pvefw-logger
[74096.441781] [  11363]    33 11363    17652    12611   172032        0             0 spiceproxy work
[74096.442949] [   5229]     0  5229  3562994  3148331 26226688        0             0 kvm
[74096.444116] [   5259]     0  5259    90530    30352   417792        0             0 task UPID:pcsto
[74096.445314] [   5261]     0  5261    82133    23633   376832        0             0 qm
[74096.446523] [   8001]    33  8001    90908    31012   434176        0             0 pveproxy worker
[74096.447731] [  31673]    33 31673    91994    31599   430080        0             0 pveproxy worker
[74096.448955] [  31698]    33 31698    90918    30891   434176        0             0 pveproxy worker
[74096.450161] [  10508]   107 10508    10956      603    81920        0             0 pickup
[74096.451381] [  16568]    33 16568    90844    30773   430080        0             0 pveproxy worker
[74096.452640] [  31100]     0 31100     1314      171    49152        0             0 sleep
[74096.453883] [  32213]     0 32213     3914     1595    65536        0             0 pvesr
[74096.455096] [  32355]     0 32355     5741      552    65536        0             0 systemd-udevd
[74096.456288] [  32356]     0 32356     5741      549    65536        0             0 systemd-udevd
[74096.457481] [  32357]     0 32357     5741      549    65536        0             0 systemd-udevd
[74096.458634] [  32358]     0 32358     5741      549    65536        0             0 systemd-udevd
[74096.459762] [  32359]     0 32359     5741      549    65536        0             0 systemd-udevd
[74096.460897] [  32360]     0 32360     5741      549    65536        0             0 systemd-udevd
[74096.462008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=5229,uid=0
[74096.464348] Out of memory: Killed process 5229 (kvm) total-vm:14251976kB, anon-rss:12591620kB, file-rss:1700kB, shmem-rss:4kB, UID:0 pgtables:25612kB oom_score_adj:0
[74097.626836] oom_reaper: reaped process 5229 (kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
[74106.218729] vmbr0: port 2(tap100i0) entered disabled state
[74106.237198] vmbr0: port 2(tap100i0) entered disabled state
 
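The meminfo/ps snapshots below come from a simple logging loop running in a screen session on the backup1 VM; a rough reconstruction of that loop (the exact script isn't part of this post):

Code:
# run inside screen on backup1; appends a timestamped snapshot to debug.txt
while true; do
    echo "========= $(date) ========"
    cat /proc/meminfo
    ps fauxwww
    sleep 60          # interval is a guess
done | tee -a debug.txt
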
Code:
* last cat /proc/meminfo; ps fauxwwww on the backup1 (PBS) VM

========= Fri 05 Mar 2021 08:44:34 AM CET ========
MemTotal:       12264260 kB
MemFree:          162436 kB
MemAvailable:   11107408 kB
Buffers:          230920 kB
Cached:         10694904 kB
SwapCached:           20 kB
Active:           614776 kB
Inactive:       11003788 kB
Active(anon):     360704 kB
Inactive(anon):   342732 kB
Active(file):     254072 kB
Inactive(file): 10661056 kB
Unevictable:        5320 kB
Mlocked:            5320 kB
SwapTotal:       7340028 kB
SwapFree:        7337200 kB
Dirty:           1057280 kB
Writeback:            88 kB
AnonPages:        698112 kB
Mapped:            34068 kB
Shmem:              6680 kB
KReclaimable:     355028 kB
Slab:             396300 kB
SReclaimable:     355028 kB
SUnreclaim:        41272 kB
KernelStack:        4064 kB
PageTables:         5228 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    13472156 kB
Committed_AS:     332556 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       52720 kB
VmallocChunk:          0 kB
Percpu:             1984 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      122080 kB
DirectMap2M:    12455936 kB
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    07:35   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [rcu_par_gp]
root         6  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kworker/0:0H-kblockd]
root         9  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [mm_percpu_wq]
root        10  0.0  0.0      0     0 ?        S    07:35   0:01  \_ [ksoftirqd/0]
root        11  1.0  0.0      0     0 ?        I    07:35   0:43  \_ [rcu_sched]
root        12  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [migration/0]
root        13  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [idle_inject/0]
root        14  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [cpuhp/0]
root        15  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [cpuhp/1]
root        16  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [idle_inject/1]
root        17  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [migration/1]
root        18  2.5  0.0      0     0 ?        S    07:35   1:44  \_ [ksoftirqd/1]
root        20  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kworker/1:0H-kblockd]
root        21  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [cpuhp/2]
root        22  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [idle_inject/2]
root        23  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [migration/2]
root        24  0.2  0.0      0     0 ?        S    07:35   0:09  \_ [ksoftirqd/2]
root        26  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kworker/2:0H-kblockd]
root        27  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [cpuhp/3]
root        28  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [idle_inject/3]
root        29  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [migration/3]
root        30  0.8  0.0      0     0 ?        S    07:35   0:35  \_ [ksoftirqd/3]
root        32  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kworker/3:0H-kblockd]
root        33  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [kdevtmpfs]
root        34  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [netns]
root        35  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [rcu_tasks_kthre]
root        36  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [kauditd]
root        37  0.0  0.0      0     0 ?        I    07:35   0:03  \_ [kworker/2:1-mm_percpu_wq]
root        38  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [khungtaskd]
root        39  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [oom_reaper]
root        40  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [writeback]
root        41  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [kcompactd0]
root        42  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [ksmd]
root        43  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [khugepaged]
root        57  0.0  0.0      0     0 ?        I    07:35   0:00  \_ [kworker/1:1-events]
root        91  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kintegrityd]
root        92  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kblockd]
root        93  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [blkcg_punt_bio]
root        94  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [tpm_dev_wq]
root        95  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ata_sff]
root        96  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [md]
root        97  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [edac-poller]
root        98  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [devfreq_wq]
root        99  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [watchdogd]
root       103  3.2  0.0      0     0 ?        S    07:35   2:15  \_ [kswapd0]
root       104  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [ecryptfs-kthrea]
root       106  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kthrotld]
root       107  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/24-aerdrv]
root       108  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/24-pciehp]
root       109  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/25-aerdrv]
root       110  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/25-pciehp]
root       111  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/26-aerdrv]
root       112  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/26-pciehp]
root       113  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/27-aerdrv]
root       114  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [irq/27-pciehp]
root       115  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [acpi_thermal_pm]
root       116  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [nvme-wq]
root       117  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [nvme-reset-wq]
root       118  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [nvme-delete-wq]
root       119  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ipv6_addrconf]
root       128  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kstrp]
root       129  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kworker/u9:0]
root       142  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [charger_manager]
root       198  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_0]
root       199  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_0]
root       200  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_1]
root       201  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_1]
root       202  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_2]
root       203  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_2]
root       204  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_3]
root       205  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_3]
root       206  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_4]
root       207  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_4]
root       208  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_5]
root       211  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_5]
root       212  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [scsi_eh_6]
root       214  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [scsi_tmf_6]
root       220  2.4  0.0      0     0 ?        I    07:35   1:41  \_ [kworker/u8:6-events_unbound]
root       231  0.3  0.0      0     0 ?        I<   07:35   0:13  \_ [kworker/3:1H-kblockd]
root       232  0.0  0.0      0     0 ?        I    07:35   0:00  \_ [kworker/2:2-events]
root       254  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kdmflush]
root       256  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [kdmflush]
root       281  0.0  0.0      0     0 ?        I<   07:35   0:03  \_ [kworker/2:1H-kblockd]
root       292  0.0  0.0      0     0 ?        I<   07:35   0:02  \_ [kworker/1:1H-kblockd]
root       293  0.0  0.0      0     0 ?        I<   07:35   0:03  \_ [kworker/0:1H-kblockd]
root       302  0.2  0.0      0     0 ?        S    07:35   0:12  \_ [jbd2/dm-1-8]
root       303  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ext4-rsv-conver]
root       359  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [iscsi_eh]
root       361  0.0  0.0      0     0 ?        I    07:35   0:03  \_ [kworker/0:3-events]
root       364  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [rpciod]
root       365  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [xprtiod]
root       367  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ib-comp-wq]
root       368  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ib-comp-unb-wq]
root       369  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ib_mcast]
root       370  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ib_nl_sa_wq]
root       371  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [rdma_cm]
root       373  0.0  0.0      0     0 ?        S<   07:35   0:00  \_ [spl_system_task]
root       374  0.0  0.0      0     0 ?        S<   07:35   0:00  \_ [spl_delay_taskq]
root       375  0.0  0.0      0     0 ?        S<   07:35   0:00  \_ [spl_dynamic_tas]
root       376  0.0  0.0      0     0 ?        S<   07:35   0:00  \_ [spl_kmem_cache]
root       425  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [cryptd]
root       437  0.0  0.0      0     0 ?        S<   07:35   0:00  \_ [zvol]
root       442  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [arc_prune]
root       444  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [zthr_procedure]
root       445  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [zthr_procedure]
root       455  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [dbu_evict]
root       456  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [dbuf_evict]
root       489  0.0  0.0      0     0 ?        I<   07:35   0:00  \_ [ttm_swap]
root       516  0.0  0.0      0     0 ?        SN   07:35   0:00  \_ [z_vdev_file]
root       517  0.0  0.0      0     0 ?        S    07:35   0:00  \_ [l2arc_feed]
root       521  0.0  0.0      0     0 ?        I    07:35   0:00  \_ [kworker/0:4-events]
root       917  0.0  0.0      0     0 ?        I    07:41   0:02  \_ [kworker/1:0-events]
root      1261  3.0  0.0      0     0 ?        I    07:49   1:41  \_ [kworker/u8:1-events_unbound]
root      2843  0.3  0.0      0     0 ?        I    08:29   0:02  \_ [kworker/3:1-events]
root      3036  3.3  0.0      0     0 ?        I    08:34   0:19  \_ [kworker/u8:2-events_unbound]
root      3049  0.0  0.0      0     0 ?        I    08:35   0:00  \_ [kworker/3:0-events]
root      3242  2.7  0.0      0     0 ?        I    08:40   0:07  \_ [kworker/u8:0-events_power_efficient]
root      3273  0.2  0.0      0     0 ?        I    08:40   0:00  \_ [kworker/3:2-events]
root         1  0.0  0.0  22144  6136 ?        Ss   07:35   0:03 /sbin/init
root       338  0.0  0.0  29592  5612 ?        Ss   07:35   0:00 /lib/systemd/systemd-journald
root       372  0.0  0.0  23092  4304 ?        Ss   07:35   0:00 /lib/systemd/systemd-udevd
systemd+   537  0.0  0.0  93080  3620 ?        Ssl  07:35   0:00 /lib/systemd/systemd-timesyncd
_rpc       538  0.0  0.0   6820  3444 ?        Ss   07:35   0:00 /sbin/rpcbind -f -w
root       542  0.0  0.0  11832  3724 ?        Ss   07:35   0:00 /usr/sbin/smartd -n
root       543  0.0  0.0 225820  4076 ?        Ssl  07:35   0:00 /usr/sbin/rsyslogd -n -iNONE
root       545  0.0  0.0 101220  2476 ?        Ssl  07:35   0:00 /usr/sbin/zed -F
message+   546  0.0  0.0   8980  3528 ?        Ss   07:35   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root       547  0.0  0.0  19388  4436 ?        Ss   07:35   0:00 /lib/systemd/systemd-logind
root       597 12.4  0.1 301544 14028 ?        Ssl  07:35   8:30 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-api
root       616  0.0  0.0   6888   300 ?        Ss   07:35   0:00 /sbin/iscsid
root       618  0.0  0.0   7392  4980 ?        S<Ls 07:35   0:00 /sbin/iscsid
root       630  0.0  0.0  15848  4548 ?        Ss   07:35   0:00 /usr/sbin/sshd -D
root       904  0.0  0.0  16892  5940 ?        Ss   07:41   0:03  \_ sshd: root@pts/0
root       932  0.0  0.0   8236  3740 pts/0    Ss   07:41   0:00      \_ -bash
root      1002  1.7  0.0  11116  3380 pts/0    S+   07:44   1:01          \_ top
root       644  0.0  0.0   8500  2504 ?        Ss   07:35   0:00 /usr/sbin/cron -f
root       650  0.0  0.0   5608  1444 tty1     Ss+  07:35   0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
backup     676  241  5.5 2294280 675992 ?      Ssl  07:35 165:26 /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-backup-proxy
root       808  0.0  0.0  43468  3288 ?        Ss   07:36   0:00 /usr/lib/postfix/sbin/master -w
postfix    809  0.0  0.0  43816  3396 ?        S    07:36   0:00  \_ pickup -l -t unix -u -c
postfix    810  0.0  0.0  43868  3356 ?        S    07:36   0:00  \_ qmgr -l -t unix -u
root       915  0.0  0.0  21280  5248 ?        Ss   07:41   0:00 /lib/systemd/systemd --user
root       916  0.0  0.0  23108  2524 ?        S    07:41   0:00  \_ (sd-pam)
root       938  0.0  0.0   8896  2476 ?        Ss   07:42   0:03 SCREEN
root       939  0.1  0.0   6988  3404 pts/1    Ss   07:42   0:04  \_ /bin/bash
root      3427  0.0  0.0   6988  1888 pts/1    S+   08:44   0:00      \_ /bin/bash
root      3431  0.0  0.0  10780  3252 pts/1    R+   08:44   0:00      |   \_ ps fauxwwwwwww
root      3428  0.0  0.0   5260   756 pts/1    S+   08:44   0:00      \_ tee -a debug.txt
 
With kernel 5.4.101 on both host and guest, no issue so far. The VM's file cache has reached 11G and has been sitting there for a while; I'm relaunching the previously failed backups.
 
Are you using ZFS?

Also, an Atom CPU will cause performance issues in general, as it's most probably not fast enough to run cleanup jobs in a decent timeframe.

Maybe there is a cleanup job running during backups that triggers the OOM.
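
If it is ZFS, the ARC on the host is also worth a look: by default it can grow to roughly half of the host's RAM, which on a 16G host running a 12G VM leaves very little headroom before the OOM killer goes after kvm. A hedged sketch of checking and capping it (the 2 GiB value is only an example):

Code:
# current ARC size and upper limit on the PVE host
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

# cap the ARC, e.g. to 2 GiB (overwrites an existing zfs.conf)
echo "options zfs zfs_arc_max=2147483648" > /etc/modprobe.d/zfs.conf
update-initramfs -u    # needed when root is on ZFS; takes effect after reboot

# or change it at runtime
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max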
 
Yes, as I mentioned, the PVE host has RAIDZ2 with 6x 4TB HDDs.

Do you know in which log I can see the history of cleanup jobs?
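
For reference, a sketch of where that history normally lives on PBS (standard CLI and paths, an assumption rather than something confirmed in this thread): garbage-collection and prune runs are recorded as tasks, so they show up next to the backup tasks.

Code:
# list recent tasks (backups, garbage collection, prune, ...) on the PBS host
proxmox-backup-manager task list

# show the log of one specific task
proxmox-backup-manager task log <UPID>

# raw task logs on disk
ls /var/log/proxmox-backup/tasks/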
 
