VM shut down suddenly on a 3-node Proxmox cluster with Ceph storage

WXW

New Member
Nov 16, 2022
A virtual machine on the host suddenly shut down, while the other virtual machines kept running normally.

The Ceph status was always OK (healthy).
The incident occurred on Nov 16, 2022 at 08:07:52.
The log shows:
Nov 16 08:05:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:06:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:06:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:06:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:06:22 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:06:51 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Created slice User Slice of UID 0.
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Starting User Runtime Directory /run/user/0...
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Finished User Runtime Directory /run/user/0.
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Starting User Manager for UID 0...
Nov 16 08:06:52 WL-980V-L1 systemd[2719930]: gpgconf: error running '/usr/lib/gnupg/scdaemon': probably not installed
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Queued start job for default target Main User Target.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Created slice User Application Slice.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Reached target Paths.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Reached target Timers.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Listening on GnuPG network certificate management daemon.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Listening on GnuPG cryptographic agent and passphrase cache.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Reached target Sockets.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Reached target Basic System.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Reached target Main User Target.
Nov 16 08:06:52 WL-980V-L1 systemd[2719925]: Startup finished in 187ms.
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Started User Manager for UID 0.
Nov 16 08:06:52 WL-980V-L1 systemd[1]: Started Session 809 of user root.
Nov 16 08:07:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:07:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:07:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:07:52 WL-980V-L1 pvedaemon[2877]: <root@pam> starting task UPID:WL-980V-L1:002981D1:0F484E79:637429D8:qmreboot:102:root@pam:
Nov 16 08:07:52 WL-980V-L1 pvedaemon[2720209]: requesting reboot of VM 102: UPID:WL-980V-L1:002981D1:0F484E79:637429D8:qmreboot:102:root@pam:
Nov 16 08:08:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:08:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:08:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:08:27 WL-980V-L1 pvedaemon[12714]: <root@pam> starting task UPID:WL-980V-L1:0029825A:0F485C3C:637429FB:hastop:102:root@pam:
Nov 16 08:08:28 WL-980V-L1 pvedaemon[12714]: <root@pam> end task UPID:WL-980V-L1:0029825A:0F485C3C:637429FB:hastop:102:root@pam: OK
Nov 16 08:08:30 WL-980V-L1 pve-ha-lrm[2720348]: stopping service vm:102 (timeout=60)
Nov 16 08:08:30 WL-980V-L1 pve-ha-lrm[2720350]: shutdown VM 102: UPID:WL-980V-L1:0029825E:0F485D3C:637429FE:qmshutdown:102:root@pam:
Nov 16 08:08:30 WL-980V-L1 pve-ha-lrm[2720348]: <root@pam> starting task UPID:WL-980V-L1:0029825E:0F485D3C:637429FE:qmshutdown:102:root@pam:
Nov 16 08:08:35 WL-980V-L1 pve-ha-lrm[2720348]: Task 'UPID:WL-980V-L1:0029825E:0F485D3C:637429FE:qmshutdown:102:root@pam:' still active, waiting
Nov 16 08:08:40 WL-980V-L1 pve-ha-lrm[2720350]: can't lock file '/var/lock/qemu-server/lock-102.conf' - got timeout
Nov 16 08:08:40 WL-980V-L1 pve-ha-lrm[2720348]: <root@pam> end task UPID:WL-980V-L1:0029825E:0F485D3C:637429FE:qmshutdown:102:root@pam: can't lock file '/var/lock/qemu-server/lock-102.conf' - got timeout
Nov 16 08:08:40 WL-980V-L1 pve-ha-lrm[2720348]: unable to stop stop service vm:102 (still running)
Nov 16 08:08:50 WL-980V-L1 pve-ha-lrm[2720432]: service vm:102 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
Nov 16 08:08:52 WL-980V-L1 pvedaemon[2720209]: VM quit/powerdown failed - got timeout
Nov 16 08:08:52 WL-980V-L1 pvedaemon[2877]: <root@pam> end task UPID:WL-980V-L1:002981D1:0F484E79:637429D8:qmreboot:102:root@pam: VM quit/powerdown failed - got timeout
Nov 16 08:09:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:09:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:09:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:09:14 WL-980V-L1 pvedaemon[2720564]: requesting reboot of VM 102: UPID:WL-980V-L1:00298334:0F486EBA:63742A2A:qmreboot:102:root@pam:
Nov 16 08:09:14 WL-980V-L1 pvedaemon[2877]: <root@pam> starting task UPID:WL-980V-L1:00298334:0F486EBA:63742A2A:qmreboot:102:root@pam:
Nov 16 08:10:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:10:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:10:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:10:07 WL-980V-L1 QEMU[16945]: kvm: terminating on signal 15 from pid 2259 (/usr/sbin/qmeventd)
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.060451] fwbr102i0: port 2(tap102i0) entered disabled state
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.097214] fwbr102i0: port 1(fwln102i0) entered disabled state
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.097355] vmbr0: port 5(fwpr102p0) entered disabled state
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.099554] device fwln102i0 left promiscuous mode
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.099561] fwbr102i0: port 1(fwln102i0) entered disabled state
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.128991] device fwpr102p0 left promiscuous mode
Nov 16 08:10:07 WL-980V-L1 kernel: [2564070.128997] vmbr0: port 5(fwpr102p0) entered disabled state
Nov 16 08:10:08 WL-980V-L1 systemd[1]: 102.scope: Succeeded.
Nov 16 08:10:08 WL-980V-L1 systemd[1]: 102.scope: Consumed 3w 2d 2h 37min 2.563s CPU time.
Nov 16 08:10:08 WL-980V-L1 systemd[1]: session-809.scope: Succeeded.
Nov 16 08:10:08 WL-980V-L1 systemd[1]: session-809.scope: Consumed 1.681s CPU time.
Nov 16 08:10:08 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:10:08 WL-980V-L1 qmeventd[2720796]: Starting cleanup for 102
Nov 16 08:10:08 WL-980V-L1 qmeventd[2720796]: trying to acquire lock...
Nov 16 08:10:08 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: OK
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: Finished cleanup for 102
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: Restarting VM 102
Nov 16 08:10:09 WL-980V-L1 pvedaemon[2877]: <root@pam> end task UPID:WL-980V-L1:00298334:0F486EBA:63742A2A:qmreboot:102:root@pam: OK
Nov 16 08:10:09 WL-980V-L1 qm[2720796]: <root@pam> starting task UPID:WL-980V-L1:00298420:0F4883E0:63742A61:hastart:102:root@pam:
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: Requesting HA start for VM 102
Nov 16 08:10:09 WL-980V-L1 systemd[1]: Started Session 811 of user root.
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: service 'vm:102' in error state, must be disabled and fixed first
Nov 16 08:10:09 WL-980V-L1 qm[2720800]: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:10:09 WL-980V-L1 qmeventd[2720796]: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:10:09 WL-980V-L1 qm[2720796]: <root@pam> end task UPID:WL-980V-L1:00298420:0F4883E0:63742A61:hastart:102:root@pam: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:10:09 WL-980V-L1 qm[2720806]: VM 102 qmp command failed - VM 102 not running
Nov 16 08:10:09 WL-980V-L1 systemd[1]: session-811.scope: Succeeded.
Nov 16 08:10:09 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:10:18 WL-980V-L1 pvedaemon[2877]: <root@pam> starting task UPID:WL-980V-L1:00298451:0F4887C3:63742A6A:hastart:102:root@pam:
Nov 16 08:10:19 WL-980V-L1 pvedaemon[2720849]: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:10:19 WL-980V-L1 pvedaemon[2877]: <root@pam> end task UPID:WL-980V-L1:00298451:0F4887C3:63742A6A:hastart:102:root@pam: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:10:19 WL-980V-L1 systemd[1]: Stopping User Manager for UID 0...
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Stopped target Main User Target.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Stopped target Basic System.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Stopped target Paths.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Stopped target Sockets.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Stopped target Timers.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: dirmngr.socket: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Closed GnuPG network certificate management daemon.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: gpg-agent-browser.socket: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: gpg-agent-extra.socket: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: gpg-agent-ssh.socket: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: gpg-agent.socket: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Closed GnuPG cryptographic agent and passphrase cache.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Removed slice User Application Slice.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Reached target Shutdown.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: systemd-exit.service: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Finished Exit the Session.
Nov 16 08:10:19 WL-980V-L1 systemd[2719925]: Reached target Exit the Session.
Nov 16 08:10:19 WL-980V-L1 systemd[1]: user@0.service: Succeeded.
Nov 16 08:10:19 WL-980V-L1 systemd[1]: Stopped User Manager for UID 0.
Nov 16 08:10:20 WL-980V-L1 systemd[1]: Stopping User Runtime Directory /run/user/0...
Nov 16 08:10:20 WL-980V-L1 systemd[1]: run-user-0.mount: Succeeded.
Nov 16 08:10:20 WL-980V-L1 systemd[1]: user-runtime-dir@0.service: Succeeded.
Nov 16 08:10:20 WL-980V-L1 systemd[1]: Stopped User Runtime Directory /run/user/0.
Nov 16 08:10:20 WL-980V-L1 systemd[1]: Removed slice User Slice of UID 0.
Nov 16 08:10:20 WL-980V-L1 systemd[1]: user-0.slice: Consumed 2.522s CPU time.
Nov 16 08:10:53 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Created slice User Slice of UID 0.
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Starting User Runtime Directory /run/user/0...
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Finished User Runtime Directory /run/user/0.
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Starting User Manager for UID 0...
Nov 16 08:10:54 WL-980V-L1 systemd[2721027]: gpgconf: error running '/usr/lib/gnupg/scdaemon': probably not installed
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Queued start job for default target Main User Target.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Created slice User Application Slice.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Reached target Paths.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Reached target Timers.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Listening on GnuPG network certificate management daemon.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Listening on GnuPG cryptographic agent and passphrase cache.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Reached target Sockets.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Reached target Basic System.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Reached target Main User Target.
Nov 16 08:10:54 WL-980V-L1 systemd[2721022]: Startup finished in 160ms.
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Started User Manager for UID 0.
Nov 16 08:10:54 WL-980V-L1 systemd[1]: Started Session 812 of user root.
Nov 16 08:10:55 WL-980V-L1 qm[2721045]: VM 102 qmp command failed - VM 102 not running
Nov 16 08:10:55 WL-980V-L1 systemd[1]: session-812.scope: Succeeded.
Nov 16 08:10:55 WL-980V-L1 pmxcfs[2654]: [status] notice: received log
Nov 16 08:11:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:11:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:11:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: Stopping User Manager for UID 0...
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Stopped target Main User Target.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Stopped target Basic System.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Stopped target Paths.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Stopped target Sockets.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Stopped target Timers.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: dirmngr.socket: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Closed GnuPG network certificate management daemon.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: gpg-agent-browser.socket: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: gpg-agent-extra.socket: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: gpg-agent-ssh.socket: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: gpg-agent.socket: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Closed GnuPG cryptographic agent and passphrase cache.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Removed slice User Application Slice.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Reached target Shutdown.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: systemd-exit.service: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Finished Exit the Session.
Nov 16 08:11:05 WL-980V-L1 systemd[2721022]: Reached target Exit the Session.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: user@0.service: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: Stopped User Manager for UID 0.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: Stopping User Runtime Directory /run/user/0...
Nov 16 08:11:05 WL-980V-L1 systemd[1]: run-user-0.mount: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: user-runtime-dir@0.service: Succeeded.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: Stopped User Runtime Directory /run/user/0.
Nov 16 08:11:05 WL-980V-L1 systemd[1]: Removed slice User Slice of UID 0.
Nov 16 08:11:12 WL-980V-L1 pvedaemon[2876]: <root@pam> starting task UPID:WL-980V-L1:00298571:0F489CCB:63742AA0:hastart:102:root@pam:
Nov 16 08:11:13 WL-980V-L1 pvedaemon[2721137]: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:11:13 WL-980V-L1 pvedaemon[2876]: <root@pam> end task UPID:WL-980V-L1:00298571:0F489CCB:63742AA0:hastart:102:root@pam: command 'ha-manager set vm:102 --state started' failed: exit code 255
Nov 16 08:12:00 WL-980V-L1 systemd[1]: Starting Proxmox VE replication runner...
Nov 16 08:12:01 WL-980V-L1 systemd[1]: pvesr.service: Succeeded.
Nov 16 08:12:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.
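
Since most of the log above is unrelated systemd session and replication noise, here is a minimal sketch of how the entries for the affected VM could be pulled out. The helper name `vm_events` and the `sample` list are hypothetical, for illustration only. The match is a plain "standalone number" check, so it can also catch unrelated numbers that happen to equal the VM ID, and it will miss lines that do not mention the ID at all (for example the QEMU "terminating on signal 15" line).

```python
import re


def vm_events(lines, vmid):
    """Return the log lines that mention vmid as a standalone number.

    Hypothetical helper: the ID must not be directly preceded or followed
    by another digit, so 102 matches "VM 102", "vm:102", ":102:" and
    "fwbr102i0", but not a PID such as 2720102.
    """
    pattern = re.compile(rf"(?<!\d){vmid}(?!\d)")
    return [line for line in lines if pattern.search(line)]


# A few lines copied from the log above, used as sample input.
sample = [
    "Nov 16 08:07:01 WL-980V-L1 systemd[1]: Finished Proxmox VE replication runner.",
    "Nov 16 08:07:52 WL-980V-L1 pvedaemon[2720209]: requesting reboot of VM 102: "
    "UPID:WL-980V-L1:002981D1:0F484E79:637429D8:qmreboot:102:root@pam:",
    "Nov 16 08:08:30 WL-980V-L1 pve-ha-lrm[2720348]: stopping service vm:102 (timeout=60)",
    "Nov 16 08:10:07 WL-980V-L1 QEMU[16945]: kvm: terminating on signal 15 "
    "from pid 2259 (/usr/sbin/qmeventd)",
]

for line in vm_events(sample, 102):
    print(line)
```

To run it against a full log, the lines can be read from a saved copy of the syslog instead of the `sample` list.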
 

Attachments

  • 故障图片.jpg (fault screenshot), 587.4 KB