Hello
I have a cluster with 4 servers: 3 of them each run 1 VM, and the last one runs none. I configured replication of each VM to the empty last server every 30 minutes.
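For reference, the jobs were set up roughly like this from the node shell (the VM IDs and target node name below are just placeholders, not my exact ones):

pvesr create-local-job 100-0 node4 --schedule "*/30"   # replicate VM 100 to node4 every 30 minutes
pvesr create-local-job 101-0 node4 --schedule "*/30"
pvesr create-local-job 102-0 node4 --schedule "*/30"
pvesr status                                           # check last sync / next sync of all jobs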
At 21:00 today I reviewed the state of all of them and saw that one still showed its last task at 05:30... the others had correctly run the task at 21:00, with the next one scheduled for 21:30.
I looked at the server that seemed to have a problem and saw that pvesr had crashed:
Oct 21 06:03:49 b7 kernel: [256037.384220] pvesr D 0 6783 1 0x00000000
Oct 21 06:03:49 b7 kernel: [256037.384224] Call Trace:
Oct 21 06:03:49 b7 kernel: [256037.384234] __schedule+0x233/0x6f0
Oct 21 06:03:49 b7 kernel: [256037.384239] ? kmem_cache_alloc_node+0x11d/0x1b0
Oct 21 06:03:49 b7 kernel: [256037.384242] ? alloc_request_struct+0x19/0x20
Oct 21 06:03:49 b7 kernel: [256037.384245] schedule+0x36/0x80
Oct 21 06:03:49 b7 kernel: [256037.384247] schedule_timeout+0x22a/0x3f0
Oct 21 06:03:49 b7 kernel: [256037.384250] ? cpumask_next_and+0x2d/0x50
Oct 21 06:03:49 b7 kernel: [256037.384253] ? update_sd_lb_stats+0x108/0x540
Oct 21 06:03:49 b7 kernel: [256037.384256] ? ktime_get+0x41/0xb0
Oct 21 06:03:49 b7 kernel: [256037.384258] io_schedule_timeout+0xa4/0x110
Oct 21 06:03:49 b7 kernel: [256037.384262] __lock_page+0x10d/0x150
Oct 21 06:03:49 b7 kernel: [256037.384264] ? unlock_page+0x30/0x30
Oct 21 06:03:49 b7 kernel: [256037.384266] pagecache_get_page+0x19f/0x2a0
Oct 21 06:03:49 b7 kernel: [256037.384269] shmem_unused_huge_shrink+0x214/0x3b0
Oct 21 06:03:49 b7 kernel: [256037.384272] shmem_unused_huge_scan+0x20/0x30
Oct 21 06:03:49 b7 kernel: [256037.384275] super_cache_scan+0x190/0x1a0
Oct 21 06:03:49 b7 kernel: [256037.384278] shrink_slab.part.40+0x1f5/0x420
Oct 21 06:03:49 b7 kernel: [256037.384281] shrink_slab+0x29/0x30
Oct 21 06:03:49 b7 kernel: [256037.384283] shrink_node+0x108/0x320
But the server was still working normally at that time. Since pvesr was stuck in a D state, I decided to kill the task, and then both the host and the VM went down... (...blocked for more than 120 seconds.)
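This is roughly how I confirmed the process was blocked before killing it (commands shown just as examples):

ps -eo pid,stat,wchan:30,cmd | grep pvesr        # STAT "D" = uninterruptible sleep (stuck in I/O)
dmesg | grep -i "blocked for more than"          # the hung-task warnings from the kernel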
Is this a bug?