Restoring a VM causes whole system to slow to a halt

shadow wizard

New Member
Apr 17, 2024
When I restore a VM (32 GB virtual drive, onto a SATA SSD, from a network location), my IO delay goes up to over 80% (after the restore is 100% complete) and stays there for 30 minutes, making the rest of the system totally unusable.
There is plenty of free RAM on the system, and plenty of free CPU power, but it grinds the whole system to a halt. None of the other VMs are usable.
Ideas?
 
Can you look in Node -> System Log and focus on the window between when you start the restore and when it finishes? What does it say? I had this issue when eth1 was crashing due to being overloaded during backups.
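If you would rather watch it from a shell than the GUI, something like this should work (the timestamps are just placeholders, adjust them to your restore window):

# follow the kernel/system log live while the restore is running
journalctl -k -f

# or pull only the window around the restore afterwards
journalctl -k --since "23:20" --until "23:40"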
 
AHA! Here is a part of it. Hopefully this will help you find the issue. (P.S. there is a LOT of this.)
Oct 10 23:27:12 proliantproxmox kernel: file_write_and_wait_range+0x90/0xc0
Oct 10 23:27:12 proliantproxmox kernel: blkdev_fsync+0x36/0x60
Oct 10 23:27:12 proliantproxmox kernel: vfs_fsync_range+0x45/0xa0
Oct 10 23:27:12 proliantproxmox kernel: blkdev_write_iter+0x2a5/0x330
Oct 10 23:27:12 proliantproxmox kernel: io_write+0xe3/0x3a0
Oct 10 23:27:12 proliantproxmox kernel: io_issue_sqe+0x6f/0x6d0
Oct 10 23:27:12 proliantproxmox kernel: io_wq_submit_work+0xd2/0x330
Oct 10 23:27:12 proliantproxmox kernel: io_worker_handle_work+0x142/0x570
Oct 10 23:27:12 proliantproxmox kernel: io_wq_worker+0x12d/0x3e0
Oct 10 23:27:12 proliantproxmox kernel: ? finish_task_switch.isra.0+0x9c/0x340
Oct 10 23:27:12 proliantproxmox kernel: ? __pfx_io_wq_worker+0x10/0x10
Oct 10 23:27:12 proliantproxmox kernel: ret_from_fork+0x47/0x70
Oct 10 23:27:12 proliantproxmox kernel: ? __pfx_io_wq_worker+0x10/0x10
Oct 10 23:27:12 proliantproxmox kernel: ret_from_fork_asm+0x1a/0x30
Oct 10 23:27:12 proliantproxmox kernel: RIP: 0033:0x0
Oct 10 23:27:12 proliantproxmox kernel: RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 000000000000010f
Oct 10 23:27:12 proliantproxmox kernel: RAX: 0000000000000000 RBX: 00007b5a6c98b840 RCX: 00007b5a6fbbc9ee
Oct 10 23:27:12 proliantproxmox kernel: RDX: 00007ffe99386390 RSI: 0000000000000053 RDI: 0000636ddafa6400
Oct 10 23:27:12 proliantproxmox kernel: RBP: 00007ffe993863fc R08: 0000000000000008 R09: 0000000000000000
Oct 10 23:27:12 proliantproxmox kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000004aa0a81e
Oct 10 23:27:12 proliantproxmox kernel: R13: 0000636dd9f13af0 R14: 0000636dcddf8d70 R15: 00007ffe99386400
Oct 10 23:27:12 proliantproxmox kernel: </TASK>
Oct 10 23:27:12 proliantproxmox kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
 
Just like the guy in the last post, I am kinda new to all this. Interestingly enough, I have a server similar to his. What exactly do you suggest I try, and what should I set it to? I am about to head to bed for the evening, but I will try the settings tomorrow.
 
Try limiting the bandwidth for clone/backup/restore at the datacenter level; it should help. To test whether it makes any notable change, go to Datacenter -> Options, set max workers / bulk-action to 1 and the bandwidth limit to 25 MiB/s, then run the restore again and see if it still bricks the system.
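If you prefer the config file over the GUI, the same limits can be set in /etc/pve/datacenter.cfg; if I recall correctly the values there are in KiB/s, so roughly 25 MiB/s would be 25600. Treat this as a sketch and adjust the numbers:

# /etc/pve/datacenter.cfg
bwlimit: restore=25600
max_workers: 1

You can also cap a single restore without changing the datacenter defaults, e.g. qmrestore <backup-file> <vmid> --bwlimit 25600 (placeholders for your backup archive and VM ID), which keeps the experiment from affecting anything else.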
 
Try limiting the bandwidth for clone/backup/restore at the datacenter level; it should help. To test whether it makes any notable change, go to Datacenter -> Options, set max workers / bulk-action to 1 and the bandwidth limit to 25 MiB/s, then run the restore again and see if it still bricks the system.
The problem with that solution is that you are basically saying "Try this and see if it shuts down all the VMs that run your stuff for 30-45 minutes." I am sure you can understand that isn't practical or desirable. Unless I am misunderstanding, and those numbers are ones people have found to work well in this circumstance, trial and error just isn't practical. I am really trying not to be rude here (autism, having trouble finding the right wording), and I am thankful for your response and attempt to assist. I was just hoping for a more concrete solution than "Try this and see if it works, now reduce the numbers, and do it again, and again."
Again, please, I am REALLY trying not to be rude, I just can't find the right words (and maybe I am misreading it and it's not really rude at all), so please forgive me if I come across as a bit rude.
I guess, in a nutshell: is trial and error the ONLY solution?