Guests are stopping on data transfer (scp)

noviceiii

Dear all

I have an issue when copying large chunks of data from one virtual machine to another. The RAM on the source and target machines fills up, then disk I/O increases until the whole host system freezes. Swapping, however, seems to stay reasonable. The guests' disks are on different SATA drives: the source is on an SSD, the target on an HDD.

I use scp to transfer the files. I can avoid this behaviour by limiting scp's speed to around 3 MB/s (even though around 100 MB/s would be feasible).
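For reference, this is roughly how I throttle it (scp's -l option takes Kbit/s; the path and host below are just placeholders):

Code:
# limit scp to roughly 3 MB/s (24000 Kbit/s ≈ 3 MB/s)
scp -l 24000 /path/to/largefile user@target-vm:/destination/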

Where should I start looking?




 
hi,

The RAM on the source and target machines fills up, then disk I/O increases until the whole host system freezes.

does that only happen between those two machines or can you reproduce it with different guests as well?

Where should I start looking?
you could look at system logs and journals to see if there are any error/warning messages during the time of transfer or before/after the freeze happens.

/var/log/syslog and journalctl can be helpful. you should also activate persistent journaling so you can keep journals from previous boots as well (you just need to create the directory /var/log/journal with mkdir)
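something like this should be enough (standard systemd steps; restart journald or reboot afterwards):

Code:
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald
# then you can list and read previous boots:
journalctl --list-boots
journalctl -b -1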
 
Hello Oguz

Thank you for your guidance.
I have found something.

The syslog and journal note an RRDC error right at the moment the system freezes. (I didn't know what that was, so I had to google it.)
So I googled it: I understand it is related to the statistics collector (?), which means the message probably isn't directly related to the freeze of the whole system. I suspect the root disk (a USB stick) isn't helping (?).

The last two messages in the syslog (pve-ha-crm 'loop take too long') seem more promising to follow up on, I guess. However, this is beyond my knowledge. Any suggestions on how to proceed?

... or I may put a cooler/fan on the mainboard's northbridge...

Kind regards
n3



tail -f /var/log/syslog
Code:
Oct 18 19:06:41 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/DiskStation-Mount: -1
Oct 18 19:06:41 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/SSD4TB: -1
Oct 18 19:06:41 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/HDD12TB: -1
Oct 18 19:06:41 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/HDD16TB: -1
Oct 18 19:06:41 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/SSD1TB: -1
Oct 18 19:07:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 19:07:00 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 19:07:00 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 19:07:06 proxmox pveproxy[1197]: worker exit
Oct 18 19:07:06 proxmox pveproxy[1195]: worker 1197 finished
Oct 18 19:07:06 proxmox pveproxy[1195]: starting 1 worker(s)
Oct 18 19:07:06 proxmox pveproxy[1195]: worker 81797 started
Oct 18 19:07:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 19:07:00 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 19:07:00 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 19:07:06 proxmox pveproxy[1197]: worker exit
Oct 18 19:07:06 proxmox pveproxy[1195]: worker 1197 finished
Oct 18 19:07:06 proxmox pveproxy[1195]: starting 1 worker(s)
Oct 18 19:07:06 proxmox pveproxy[1195]: worker 81797 started
Oct 18 19:07:56 proxmox pveproxy[1198]: worker exit
Oct 18 19:07:56 proxmox pveproxy[1195]: worker 1198 finished
Oct 18 19:07:56 proxmox pveproxy[1195]: starting 1 worker(s)
Oct 18 19:07:56 proxmox pveproxy[1195]: worker 81939 started
Oct 18 19:08:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 19:08:01 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 19:08:01 proxmox systemd[1]: Finished Proxmox VE replication runner.

Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/351: /var/lib/rrdcached/db/pve2-vm/351: illegal attempt to update using time 1634582398 when last update time is 1634582398 (minimum one second step)
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/401: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/100: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/101: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/403: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/304: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/402: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/405: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/131: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/131: /var/lib/rrdcached/db/pve2-vm/131: illegal attempt to update using time 1634582398 when last update time is 1634582398 (minimum one second step)
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/404: -1
Oct 18 20:39:59 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/302: -1
Oct 18 20:39:59 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 20:39:59 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 20:40:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 20:40:00 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 20:40:00 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 20:40:03 proxmox pve-ha-crm[1194]: loop take too long (52 seconds)
Oct 18 20:40:03 proxmox pve-ha-lrm[1204]: loop take too long (56 seconds)


journalctl

Code:
Oct 18 20:46:32 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/HDD12TB: -1
Oct 18 20:46:32 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/SSD4TB: -1
Oct 18 20:46:32 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/DiskStation-Mount: -1
Oct 18 20:46:32 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/SSD1TB: -1
Oct 18 20:46:32 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox/HDD16TB: -1
Oct 18 20:46:37 proxmox pve-ha-lrm[1204]: loop take too long (33 seconds)
Oct 18 20:47:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 20:47:00 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 20:47:00 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 20:47:50 proxmox pve-firewall[1161]: firewall update time (8.331 seconds)
Oct 18 20:47:50 proxmox pvestatd[1167]: status update time (8.279 seconds)
Oct 18 20:48:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 20:48:01 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 20:48:01 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 20:48:18 proxmox smartd[723]: Device: /dev/sda [SAT], CHECK POWER STATUS spins up disk (0x81 -> 0xff)
Oct 18 20:48:18 proxmox smartd[723]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 83
Oct 18 20:48:18 proxmox smartd[723]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 64 to 62
Oct 18 20:48:18 proxmox smartd[723]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 38
Oct 18 20:49:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Oct 18 20:49:01 proxmox systemd[1]: pvesr.service: Succeeded.
Oct 18 20:49:01 proxmox systemd[1]: Finished Proxmox VE replication runner.
Oct 18 20:49:36 proxmox pve-firewall[1161]: firewall update time (23.913 seconds)
Oct 18 20:49:36 proxmox pvestatd[1167]: status update time (23.705 seconds)
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-node/proxmox: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/401: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/100: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/100: /var/lib/rrdcached/db/pve2-vm/100: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/101: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/101: /var/lib/rrdcached/db/pve2-vm/101: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/403: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/403: /var/lib/rrdcached/db/pve2-vm/403: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/303: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/303: /var/lib/rrdcached/db/pve2-vm/303: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/301: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/351: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/404: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/404: /var/lib/rrdcached/db/pve2-vm/404: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/302: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/402: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/304: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/405: -1
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-vm/405: /var/lib/rrdcached/db/pve2-vm/405: illegal attempt to upd>
Oct 18 20:49:36 proxmox pmxcfs[1149]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/131: -1
 

I dug deeper into the hypothesis of an insufficient root disk (which is a USB stick), so I have moved the swap file to the NVMe disk.

The system doesn't freeze any longer: the I/O latency is gone and overall disk performance has become really good!

Just to complete this post: a guide on how to move the swap is here:
https://www.hdmusicvideo.ro/2019/proxmox-move-swap-partition-from-usb-boot-disk-to-another-disk/
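In short it boils down to something like this (paths, sizes and device names are just examples for my layout; the linked guide has the full steps including /etc/fstab):

Code:
# create a swap file on the NVMe-backed filesystem (example path and size)
fallocate -l 8G /mnt/nvme/swapfile
chmod 600 /mnt/nvme/swapfile
mkswap /mnt/nvme/swapfile
swapon /mnt/nvme/swapfile
# disable the old swap that lived on the USB root disk (placeholder device)
swapoff /dev/sdX2
# finally adjust the swap entry in /etc/fstab so the change survives a reboot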

However... yes... however, it now fills up the swap (and since it's no longer on the root partition, I have a chance to see the error messages). We now have a complete memory dump.

The machine 304 mentioned at the end of the log is a headless Ubuntu 20.04. Its RAM obviously fills up when receiving files via scp from another machine. I haven't seen that before.

Code:
Oct 18 21:52:52 proxmox pmxcfs[1149]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/proxmox/HDD12TB: /var/lib/rrdcached/db/pve2-storage/proxmox/HDD12TB: illegal attempt to update using time 1634586772 when last update time is 1634586772 (minimum one second step)
Oct 18 21:52:53 proxmox pvedaemon[1187]: <root@pam> successful auth for user 'root@pam'
Oct 18 21:53:00 proxmox kernel: [38103.739643] kvm invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Oct 18 21:53:00 proxmox kernel: [38103.739649] CPU: 2 PID: 107607 Comm: kvm Tainted: P           O      5.11.22-5-pve #1
Oct 18 21:53:00 proxmox kernel: [38103.739651] Hardware name: Gigabyte Technology Co., Ltd. H170N-WIFI/H170N-WIFI-CF, BIOS F22e 03/09/2018
Oct 18 21:53:00 proxmox kernel: [38103.739652] Call Trace:
Oct 18 21:53:00 proxmox kernel: [38103.739654]  dump_stack+0x70/0x8b
Oct 18 21:53:00 proxmox kernel: [38103.739658]  dump_header+0x4f/0x1f6
Oct 18 21:53:00 proxmox kernel: [38103.739661]  oom_kill_process.cold+0xb/0x10
Oct 18 21:53:00 proxmox kernel: [38103.739663]  out_of_memory+0x1cf/0x520
Oct 18 21:53:00 proxmox kernel: [38103.739667]  __alloc_pages_slowpath.constprop.0+0xc6d/0xd60
Oct 18 21:53:00 proxmox kernel: [38103.739670]  __alloc_pages_nodemask+0x2e0/0x310
Oct 18 21:53:00 proxmox kernel: [38103.739673]  alloc_pages_current+0x87/0x110
Oct 18 21:53:00 proxmox kernel: [38103.739676]  pagecache_get_page+0x18a/0x3b0
Oct 18 21:53:00 proxmox kernel: [38103.739678]  filemap_fault+0x6ce/0xa10
Oct 18 21:53:00 proxmox kernel: [38103.739680]  ? alloc_set_pte+0xf6/0x650
Oct 18 21:53:00 proxmox kernel: [38103.739682]  ext4_filemap_fault+0x32/0x50
Oct 18 21:53:00 proxmox kernel: [38103.739685]  __do_fault+0x3c/0xe0
Oct 18 21:53:00 proxmox kernel: [38103.739688]  handle_mm_fault+0x12db/0x1a70
Oct 18 21:53:00 proxmox kernel: [38103.739690]  do_user_addr_fault+0x1a0/0x450
Oct 18 21:53:00 proxmox kernel: [38103.739692]  ? exit_to_user_mode_prepare+0x75/0x190
Oct 18 21:53:00 proxmox kernel: [38103.739695]  exc_page_fault+0x69/0x150
Oct 18 21:53:00 proxmox kernel: [38103.739698]  ? asm_exc_page_fault+0x8/0x30
Oct 18 21:53:00 proxmox kernel: [38103.739700]  asm_exc_page_fault+0x1e/0x30
Oct 18 21:53:00 proxmox kernel: [38103.739702] RIP: 0033:0x5628fa4a7680
Oct 18 21:53:00 proxmox kernel: [38103.739707] Code: Unable to access opcode bytes at RIP 0x5628fa4a7656.
Oct 18 21:53:00 proxmox kernel: [38103.739708] RSP: 002b:00007ffd0990f5f8 EFLAGS: 00010246
Oct 18 21:53:00 proxmox kernel: [38103.739710] RAX: 0000000000000000 RBX: 00007ffd0990f604 RCX: 0000000000000000
Oct 18 21:53:00 proxmox kernel: [38103.739711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 18 21:53:00 proxmox kernel: [38103.739712] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000028
Oct 18 21:53:00 proxmox kernel: [38103.739713] R10: 0000000001210b04 R11: 00007ffd099af080 R12: 00005628fa6063d9
Oct 18 21:53:00 proxmox kernel: [38103.739714] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 18 21:53:00 proxmox kernel: [38103.739716] Mem-Info:
Oct 18 21:53:00 proxmox kernel: [38103.739717] active_anon:5230116 inactive_anon:2604747 isolated_anon:0
Oct 18 21:53:00 proxmox kernel: [38103.739717]  active_file:2823 inactive_file:1088 isolated_file:89
Oct 18 21:53:00 proxmox kernel: [38103.739717]  unevictable:3021 dirty:2 writeback:0
Oct 18 21:53:00 proxmox kernel: [38103.739717]  slab_reclaimable:37453 slab_unreclaimable:144144
Oct 18 21:53:00 proxmox kernel: [38103.739717]  mapped:10103 shmem:12443 pagetables:22190 bounce:0
Oct 18 21:53:00 proxmox kernel: [38103.739717]  free:48877 free_pcp:0 free_cma:0
Oct 18 21:53:00 proxmox kernel: [38103.739721] Node 0 active_anon:20920464kB inactive_anon:10418988kB active_file:11292kB inactive_file:4352kB unevictable:12084kB isolated(anon):0kB isolated(file):356kB mapped:40412kB dirty:8kB writeback:0kB shmem:49772kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10883072kB writeback_tmp:0kB kernel_stack:5664kB pagetables:88760kB all_unreclaimable? yes
Oct 18 21:53:00 proxmox kernel: [38103.739724] Node 0 DMA free:11776kB min:32kB low:44kB high:56kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15888kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 18 21:53:00 proxmox kernel: [38103.739728] lowmem_reserve[]: 0 2799 31917 31917 31917
Oct 18 21:53:00 proxmox kernel: [38103.739731] Node 0 DMA32 free:122140kB min:5924kB low:8788kB high:11652kB reserved_highatomic:0KB active_anon:2076928kB inactive_anon:699972kB active_file:44kB inactive_file:0kB unevictable:0kB writepending:0kB present:3030420kB managed:2930748kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 18 21:53:00 proxmox kernel: [38103.739735] lowmem_reserve[]: 0 0 29117 29117 29117
Oct 18 21:53:00 proxmox kernel: [38103.739738] Node 0 Normal free:61592kB min:61624kB low:91440kB high:121256kB reserved_highatomic:0KB active_anon:18843244kB inactive_anon:9719228kB active_file:10980kB inactive_file:4256kB unevictable:12084kB writepending:0kB present:30392320kB managed:29816460kB mlocked:11944kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 18 21:53:00 proxmox kernel: [38103.739742] lowmem_reserve[]: 0 0 0 0 0
Oct 18 21:53:00 proxmox kernel: [38103.739744] Node 0 DMA: 4*4kB (U) 2*8kB (U) 2*16kB (U) 0*32kB 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11776kB
Oct 18 21:53:00 proxmox kernel: [38103.739756] Node 0 DMA32: 1869*4kB (UME) 2015*8kB (UME) 720*16kB (UME) 401*32kB (UME) 92*64kB (UME) 56*128kB (UME) 42*256kB (UME) 25*512kB (UME) 38*1024kB (UME) 0*2048kB 0*4096kB = 123468kB
Oct 18 21:53:00 proxmox kernel: [38103.739767] Node 0 Normal: 1987*4kB (UME) 675*8kB (UME) 1233*16kB (UME) 529*32kB (UME) 212*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63572kB
Oct 18 21:53:00 proxmox kernel: [38103.739777] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 18 21:53:00 proxmox kernel: [38103.739779] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 18 21:53:00 proxmox kernel: [38103.739780] 21161 total pagecache pages
Oct 18 21:53:00 proxmox kernel: [38103.739781] 2051 pages in swap cache
Oct 18 21:53:00 proxmox kernel: [38103.739782] Swap cache stats: add 1513965, delete 1511918, find 1459756/1570690
Oct 18 21:53:00 proxmox kernel: [38103.739783] Free swap  = 0kB
Oct 18 21:53:00 proxmox kernel: [38103.739783] Total swap = 79996kB
Oct 18 21:53:00 proxmox kernel: [38103.739784] 8359682 pages RAM
Oct 18 21:53:00 proxmox kernel: [38103.739785] 0 pages HighMem/MovableOnly
Oct 18 21:53:00 proxmox kernel: [38103.739785] 168908 pages reserved
Oct 18 21:53:00 proxmox kernel: [38103.739786] 0 pages hwpoisoned
Oct 18 21:53:00 proxmox kernel: [38103.739786] Tasks state (memory values in pages):

.. snip ..

(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/304.scope,task=kvm,pid=107607,uid=0
Oct 18 21:53:00 proxmox kernel: [38103.739934] Out of memory: Killed process 107607 (kvm) total-vm:16112304kB, anon-rss:11667428kB, file-rss:0kB, shmem-rss:4kB, UID:0 pgtables:24492kB oom_score_adj:0
Oct 18 21:53:00 proxmox kernel: [38104.034560] oom_reaper: reaped process 107607 (kvm), now anon-rss:0kB, file-rss:68kB, shmem-rss:4kB
Oct 18 21:53:00 proxmox kernel: [38104.038998] vmbr2: port 6(tap304i0) entered disabled state
Oct 18 21:53:00 proxmox kernel: [38104.039078] vmbr2: port 6(tap304i0) entered disabled state
Oct 18 21:53:00 proxmox systemd[1]: 304.scope: A process of this unit has been killed by the OOM killer.
Oct 18 21:53:00 proxmox systemd[1]: 304.scope: Succeeded.
Oct 18 21:53:00 proxmox systemd[1]: 304.scope: Consumed 9min 2.782s CPU time.
 
The system doesn't freeze any longer: the I/O latency is gone and overall disk performance has become really good!
that's good

However... yes... however, it now fills up the swap (and since it's no longer on the root partition, I have a chance to see the error messages). We now have a complete memory dump.

The machine 304 mentioned at the end of the log is a headless Ubuntu 20.04. Its RAM obviously fills up when receiving files via scp from another machine. I haven't seen that before.
can you post the VM configuration? qm config VMID

how big are the files you're copying via scp?
 
that's good


can you post the VM configuration? qm config VMID

how big are the files you're copying via scp?

Thanks for the follow-up.
The files are more or less 1 GB.

Worth mentioning: I had been playing around with the network card and disk controller types, but the behaviour didn't change no matter which type I used.

Code:
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 8640
name: DB-srvr
net0: rtl8139=xx:xx:xx:xx:xx:xx,bridge=vmbr2
numa: 0
onboot: 1
ostype: l26
scsi0: SSD4TB:304/vm-304-disk-0.qcow2,cache=unsafe,size=50G
smbios1: uuid=f8aeb64d-0ea9-47ee-a40a-6934199d40b3
sockets: 2
startup: order=7,up=20
virtio1: HDD16TB:304/vm-304-disk-0.qcow2,cache=unsafe,discard=on,iothread=1,size=14000G
vmgenid: 969251e0-5796-450a-8ddf-0055c7f9f453
 
how much memory do you have on the host?

and how does the memory load look on average?

do you have a lot of VMs running at the same time?

also you have the cache=unsafe setting on your disks; that increases in-memory caching, so i would try without that option as well.
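for example, something like this should drop the unsafe caching and fall back to the default cache mode (disk specs taken from your config above, just without cache=unsafe; you could also remove ",cache=unsafe" from the lines in /etc/pve/qemu-server/304.conf while the VM is shut down):

Code:
qm set 304 --scsi0 SSD4TB:304/vm-304-disk-0.qcow2,size=50G
qm set 304 --virtio1 HDD16TB:304/vm-304-disk-0.qcow2,discard=on,iothread=1,size=14000G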
 
I have 32 GB RAM in total. The memory load is 60 to 70% on average.
There are 8 VMs running concurrently, with an average of 5 GB RAM assigned to each, and they are hardly ever over 50% memory usage. VM 301 is an exception with a bit more RAM assigned. I haven't seen the other VMs using more RAM simultaneously.

I'll adjust the cache=unsafe setting, but I have had the same issues with other disk settings as well.

I might investigate Ubuntu 20.04 to see if there is a memory problem in the latest versions, maybe related to disk or network I/O.
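(I'll probably just watch memory and I/O inside the guest while a transfer runs, roughly like this:)

Code:
# inside VM 304, while an incoming scp is running
vmstat 1      # memory, swap and block I/O, updated every second
free -m       # snapshot of used/free/buffer-cache memory
watch -n1 cat /proc/meminfo   # more detail if needed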

Node Total RAM: (see attached screenshot)
>I suspect the root disk (a USB stick) isn't helping (?).

no, especially not when using ZFS.

if you get hangs or hiccups and suspect storage to be the culprit, you can nicely inspect storage latency with

watch -d zpool iostat -wp $zpool-name

and

zpool iostat $zpool-name -lv 1

to get the whole picture.
 
