Hanging pvestatd/vgs

_Lars_

On a PVE 6.0 host I noticed the following hung vgs process today:

Code:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1566  0.1  0.0 294112 35824 ?        Ss   Nov30  20:46 pvestatd
root      8392  0.0  0.0  15848  8728 ?        D    Dec07   0:00  \_ /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count

and from that point on, the kernel log periodically contains this block:

Code:
[625791.751686] INFO: task vgs:8392 blocked for more than 120 seconds.
[625791.751716]       Tainted: P           O      5.0.21-5-pve #1
[625791.751732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[625791.751751] vgs             D    0  8392   1566 0x00000000
[625791.751753] Call Trace:
[625791.751761]  __schedule+0x2d4/0x870
[625791.751762]  schedule+0x2c/0x70
[625791.751763]  schedule_timeout+0x258/0x360
[625791.751766]  ? ttwu_do_activate+0x67/0x90
[625791.751767]  wait_for_completion+0xb7/0x140
[625791.751769]  ? wake_up_q+0x80/0x80
[625791.751771]  __flush_work+0x138/0x1f0
[625791.751772]  ? worker_detach_from_pool+0xb0/0xb0
[625791.751773]  ? get_work_pool+0x40/0x40
[625791.751774]  __cancel_work_timer+0x115/0x190
[625791.751777]  ? exact_lock+0x11/0x20
[625791.751778]  cancel_delayed_work_sync+0x13/0x20
[625791.751779]  disk_block_events+0x78/0x80
[625791.751781]  __blkdev_get+0x73/0x550
[625791.751783]  ? bd_acquire+0xd0/0xd0
[625791.751784]  blkdev_get+0x10c/0x330
[625791.751785]  ? bd_acquire+0xd0/0xd0
[625791.751786]  blkdev_open+0x92/0x100
[625791.751788]  do_dentry_open+0x143/0x3a0
[625791.751789]  vfs_open+0x2d/0x30
[625791.751791]  path_openat+0x2bf/0x1570
[625791.751792]  ? filename_lookup.part.61+0xe0/0x170
[625791.751795]  ? strncpy_from_user+0x57/0x1c0
[625791.751796]  do_filp_open+0x93/0x100
[625791.751797]  ? strncpy_from_user+0x57/0x1c0
[625791.751799]  ? __alloc_fd+0x46/0x150
[625791.751800]  do_sys_open+0x177/0x280
[625791.751802]  __x64_sys_openat+0x20/0x30
[625791.751804]  do_syscall_64+0x5a/0x110
[625791.751805]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[625791.751807] RIP: 0033:0x7f63e03c11ae
[625791.751813] Code: Bad RIP value.
[625791.751813] RSP: 002b:00007ffc818d9b80 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[625791.751814] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f63e03c11ae
[625791.751815] RDX: 0000000000044000 RSI: 00005558cffbda70 RDI: 00000000ffffff9c
[625791.751815] RBP: 00007ffc818d9ce0 R08: 00005558d0000820 R09: 0000000000000000
[625791.751816] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc818dbecd
[625791.751816] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Can someone help me interpret this?
 
Is there an LVM somewhere that is no longer reachable? FibreChannel, iSCSI or similar?
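You could check, for example, where the hung vgs is stuck and which block devices it would be scanning; something along these lines (PID taken from your ps output):

Code:
# kernel stack of the D-state vgs process
cat /proc/8392/stack
# block devices visible to the host - an unreachable LUN often shows up as missing or odd here
lsblk
# note: pvs/vgs may hang again if the scan gets stuck on the same device
pvs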
 
I rebooted the machine on Sunday evening and did the upgrade to 6.1.
Today (Monday) around midday, the same thing again at 11:04:

Code:
Dec  9 11:04:22 vhost01 kernel: [44708.952421] INFO: task vgs:2071 blocked for more than 120 seconds.
Dec  9 11:04:22 vhost01 kernel: [44708.952451]       Tainted: P           O      5.3.10-1-pve #1
Dec  9 11:04:22 vhost01 kernel: [44708.952466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  9 11:04:22 vhost01 kernel: [44708.952485] vgs             D    0  2071   1542 0x00000000
Dec  9 11:04:22 vhost01 kernel: [44708.952488] Call Trace:
Dec  9 11:04:22 vhost01 kernel: [44708.952496]  __schedule+0x2bb/0x660
Dec  9 11:04:22 vhost01 kernel: [44708.952497]  schedule+0x33/0xa0
Dec  9 11:04:22 vhost01 kernel: [44708.952499]  schedule_timeout+0x205/0x300
Dec  9 11:04:22 vhost01 kernel: [44708.952500]  wait_for_completion+0xb7/0x140
Dec  9 11:04:22 vhost01 kernel: [44708.952503]  ? wake_up_q+0x80/0x80
Dec  9 11:04:22 vhost01 kernel: [44708.952506]  __flush_work+0x131/0x1e0
Dec  9 11:04:22 vhost01 kernel: [44708.952507]  ? worker_detach_from_pool+0xb0/0xb0
Dec  9 11:04:22 vhost01 kernel: [44708.952508]  ? work_busy+0x90/0x90
Dec  9 11:04:22 vhost01 kernel: [44708.952509]  __cancel_work_timer+0x115/0x190
Dec  9 11:04:22 vhost01 kernel: [44708.952512]  ? exact_lock+0x11/0x20
Dec  9 11:04:22 vhost01 kernel: [44708.952515]  ? kobj_lookup+0xec/0x160
Dec  9 11:04:22 vhost01 kernel: [44708.952516]  cancel_delayed_work_sync+0x13/0x20
Dec  9 11:04:22 vhost01 kernel: [44708.952517]  disk_block_events+0x78/0x80
Dec  9 11:04:22 vhost01 kernel: [44708.952520]  __blkdev_get+0x73/0x550
Dec  9 11:04:22 vhost01 kernel: [44708.952521]  blkdev_get+0xe0/0x140
Dec  9 11:04:22 vhost01 kernel: [44708.952522]  ? bd_acquire+0xd0/0xd0
Dec  9 11:04:22 vhost01 kernel: [44708.952523]  blkdev_open+0x92/0x100
Dec  9 11:04:22 vhost01 kernel: [44708.952526]  do_dentry_open+0x143/0x3a0
Dec  9 11:04:22 vhost01 kernel: [44708.952527]  vfs_open+0x2d/0x30
Dec  9 11:04:22 vhost01 kernel: [44708.952529]  path_openat+0x2bf/0x1570
Dec  9 11:04:22 vhost01 kernel: [44708.952530]  ? aio_read+0xfe/0x150
Dec  9 11:04:22 vhost01 kernel: [44708.952533]  ? __do_page_fault+0x250/0x4c0
Dec  9 11:04:22 vhost01 kernel: [44708.952536]  ? mem_cgroup_try_charge+0x71/0x190
Dec  9 11:04:22 vhost01 kernel: [44708.952537]  do_filp_open+0x93/0x100
Dec  9 11:04:22 vhost01 kernel: [44708.952539]  ? strncpy_from_user+0x57/0x1b0
Dec  9 11:04:22 vhost01 kernel: [44708.952542]  ? __alloc_fd+0x46/0x150
Dec  9 11:04:22 vhost01 kernel: [44708.952543]  do_sys_open+0x177/0x280
Dec  9 11:04:22 vhost01 kernel: [44708.952544]  ? __x64_sys_io_submit+0xa9/0x190
Dec  9 11:04:22 vhost01 kernel: [44708.952545]  __x64_sys_openat+0x20/0x30
Dec  9 11:04:22 vhost01 kernel: [44708.952547]  do_syscall_64+0x5a/0x130
Dec  9 11:04:22 vhost01 kernel: [44708.952548]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec  9 11:04:22 vhost01 kernel: [44708.952550] RIP: 0033:0x7fd413a7f1ae
Dec  9 11:04:22 vhost01 kernel: [44708.952555] Code: Bad RIP value.
Dec  9 11:04:22 vhost01 kernel: [44708.952556] RSP: 002b:00007ffcc155be90 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Dec  9 11:04:22 vhost01 kernel: [44708.952558] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd413a7f1ae
Dec  9 11:04:22 vhost01 kernel: [44708.952558] RDX: 0000000000044000 RSI: 000055591d1b9a70 RDI: 00000000ffffff9c
Dec  9 11:04:22 vhost01 kernel: [44708.952559] RBP: 00007ffcc155bff0 R08: 000055591d2c4000 R09: 0000000000000000
Dec  9 11:04:22 vhost01 kernel: [44708.952559] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcc155decd
Dec  9 11:04:22 vhost01 kernel: [44708.952559] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The block repeats every two minutes until 11:22; after that kern.log stays quiet right up to now, when I shut down the VMs.
Both VMs kept running without any problems. Now that I have shut them down, two processes are stuck:

Code:
root     26803  0.0  0.0 321868 110240 ?       S    00:19   0:00  \_ /usr/bin/perl /usr/sbin/qm cleanup 101 1 1
root     26807  0.0  0.0  15844  8708 ?        D    00:19   0:00  |   \_ /sbin/lvchange -aln /dev/pool_sas/vm-101-disk-0
root     27284  0.0  0.0 321856 110120 ?       S    00:23   0:00  \_ /usr/bin/perl /usr/sbin/qm cleanup 100 1 1
root     27290  0.0  0.0  15844  8708 ?        D    00:24   0:00      \_ /sbin/lvchange -aln /dev/pool_sas/vm-100-disk-1

As a workaround, can I disable pvestatd? Or at least turn off the LVM statistics?
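What I have in mind is something like the following (I assume the GUI status and graphs would stop updating as long as the daemon is off):

Code:
# stop the status daemon until the cause is found
systemctl stop pvestatd
# optionally keep it from starting again on boot
systemctl disable pvestatd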
 
The server is new and ran for just under two weeks without this problem.
Shortly before the problem started, a Tandberg RDX Quickstore was added. I mount the respective cartridge at backup time, unmount it afterwards, and eject it with eject -s $DEV_CARTRIDGE.
I believe the problem then shows up after every eject, but not immediately afterwards: from Sunday to Monday the cartridge was ejected around 4:00, i.e. seven hours beforehand.
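The sequence around the backup is roughly this (simplified; /mnt/rdx stands in for the real mount point in my script):

Code:
# mount the currently inserted cartridge, back up, then unmount and eject it
mount "$DEV_CARTRIDGE" /mnt/rdx
# ... backup jobs write to /mnt/rdx here ...
umount /mnt/rdx
eject -s "$DEV_CARTRIDGE"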
 
Hmm, can you please try adding the Tandberg device to the global_filter in /etc/lvm/lvm.conf?

AFAIU adding
Code:
"r|/dev/<tandberg>|",
at the beginning, before the other rules, should be enough.

Then restart the server and check whether it happens again.
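In /etc/lvm/lvm.conf that would look roughly like this (here /dev/sdX is a placeholder for whatever device node the RDX gets; the rules that are already in your global_filter stay after the new entry):

Code:
devices {
    # reject the RDX device first, so the periodic vgs/pvs scans never open it
    global_filter = [ "r|/dev/sdX|", <existing rules> ]
}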
 