Problem with pve-ha-lrm blocked

  • Thread starter Deleted member 34654

Good morning,

what conditions have to be met for PVE 5 to emit this syslog error message?

Syslog message:

pve-ha-lrm:4019 blocked for more than 120 sec.

Regards,

Markus
 
Hi,

pve-ha-lrm:4019 blocked for more than 120 sec.

Hmm, normally the LRM daemon would have to hang completely for the kernel to notice and print this message.
If HA is in use, the watchdog should then normally trigger and reset the node.

A bit more context from the log would help here. Could you please post the journal from around the time of the incident, e.g. with:
Code:
journalctl
# or restrict the time range right away:
journalctl --since "2017-10-19 15:00" --until "2017-10-19 18:17"
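
To pick the relevant entries out of a long journal, the hung-task warnings can be filtered from the saved text. The following is only a minimal sketch: it assumes the journal was saved in journalctl's default short text format, and the function name `find_hung_tasks` and the sample lines are illustrative, not part of any Proxmox tooling.

```python
import re

# Matches the kernel hung-task warning as it appears in the journal, e.g.
# "Oct 20 01:18:13 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+INFO: task "
    r"(?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(journal_text):
    """Return (timestamp, command, pid, seconds) for each hung-task warning."""
    hits = []
    for line in journal_text.splitlines():
        m = HUNG_TASK.match(line)
        if m:
            hits.append((m["stamp"], m["comm"], int(m["pid"]), int(m["secs"])))
    return hits


# Illustrative input in the same format as journalctl's default output.
sample = """\
Oct 20 01:18:03 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:18:13 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds.
Oct 20 01:20:14 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds.
"""

for hit in find_hung_tasks(sample):
    print(hit)
```

Repeated warnings for the same PID, as in the sample, suggest that the task stayed blocked rather than recovering between messages.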
 
Hi,

thanks for your help so far. The log is attached; in this example the errors appear exactly 18 minutes after the backup starts.

Code:
-- Logs begin at Thu 2017-10-19 21:59:01 CEST, end at Fri 2017-10-20 11:38:01 CEST. --
Oct 20 00:30:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:30:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:30:01 prx1 CRON[31710]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:30:01 prx1 CRON[31711]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:30:01 prx1 CRON[31710]: pam_unix(cron:session): session closed for user root
Oct 20 00:30:12 prx1 sudo[31931]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:30:12 prx1 sudo[31931]: pam_unix(sudo:session): session closed for user root
Oct 20 00:30:17 prx1 sudo[32030]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:30:17 prx1 sudo[32030]: pam_unix(sudo:session): session closed for user root
Oct 20 00:30:22 prx1 sudo[32126]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:30:22 prx1 sudo[32126]: pam_unix(sudo:session): session closed for user root
Oct 20 00:30:27 prx1 sudo[32237]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:30:27 prx1 sudo[32237]: pam_unix(sudo:session): session closed for user root
Oct 20 00:30:32 prx1 sudo[32333]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:30:32 prx1 sudo[32333]: pam_unix(sudo:session): session closed for user root
Oct 20 00:31:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:31:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:32:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:32:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:32:36 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:33:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:33:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:34:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:34:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:34:01 prx1 CRON[3826]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:34:01 prx1 CRON[3827]: (root) CMD (/opt/router/smartcheck)
Oct 20 00:34:01 prx1 CRON[3826]: pam_unix(cron:session): session closed for user root
Oct 20 00:35:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:35:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:35:01 prx1 CRON[4896]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:35:01 prx1 CRON[4898]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:35:01 prx1 CRON[4896]: pam_unix(cron:session): session closed for user root
Oct 20 00:35:55 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:36:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:36:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:37:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:37:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:38:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:38:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:38:32 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:39:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:39:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:39:06 prx1 liblogging-stdlog[1972]: -- MARK --
Oct 20 00:40:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:40:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:40:01 prx1 CRON[10047]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:40:01 prx1 CRON[10048]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:40:01 prx1 CRON[10047]: pam_unix(cron:session): session closed for user root
Oct 20 00:41:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:41:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:42:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:42:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:42:01 prx1 pveproxy[23231]: worker exit
Oct 20 00:42:01 prx1 pveproxy[2477]: worker 23231 finished
Oct 20 00:42:01 prx1 pveproxy[2477]: starting 1 worker(s)
Oct 20 00:42:01 prx1 pveproxy[2477]: worker 12188 started
Oct 20 00:43:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:43:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:44:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:44:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:44:01 prx1 CRON[14146]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:44:01 prx1 CRON[14147]: (root) CMD (/opt/router/smartcheck)
Oct 20 00:44:02 prx1 CRON[14146]: pam_unix(cron:session): session closed for user root
Oct 20 00:44:25 prx1 pvedaemon[2468]: <root@pam> successful auth for user 'root@pam'
Oct 20 00:45:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:45:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:45:01 prx1 CRON[15170]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:45:01 prx1 CRON[15171]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:45:01 prx1 CRON[15170]: pam_unix(cron:session): session closed for user root
Oct 20 00:45:12 prx1 sudo[15430]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:45:12 prx1 sudo[15430]: pam_unix(sudo:session): session closed for user root
Oct 20 00:45:17 prx1 sudo[15520]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:45:17 prx1 sudo[15520]: pam_unix(sudo:session): session closed for user root
Oct 20 00:45:22 prx1 sudo[15615]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:45:22 prx1 sudo[15615]: pam_unix(sudo:session): session closed for user root
Oct 20 00:45:27 prx1 sudo[15712]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:45:27 prx1 sudo[15712]: pam_unix(sudo:session): session closed for user root
Oct 20 00:45:32 prx1 sudo[15800]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 00:45:33 prx1 sudo[15800]: pam_unix(sudo:session): session closed for user root
Oct 20 00:46:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:46:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:47:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:47:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:47:36 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:48:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:48:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:49:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:49:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:49:06 prx1 liblogging-stdlog[1972]: -- MARK --
Oct 20 00:50:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:50:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:50:01 prx1 CRON[20405]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:50:01 prx1 CRON[20406]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:50:01 prx1 CRON[20405]: pam_unix(cron:session): session closed for user root
Oct 20 00:50:55 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:51:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:51:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:52:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:52:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:53:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:53:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:53:32 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 00:54:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:54:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:55:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:55:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:55:01 prx1 CRON[25498]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 00:55:01 prx1 CRON[25499]: (root) CMD (/opt/router/make-if-list)
Oct 20 00:55:01 prx1 CRON[25498]: pam_unix(cron:session): session closed for user root
Oct 20 00:56:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:56:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:57:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:57:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:57:17 prx1 pveproxy[12188]: worker exit
Oct 20 00:57:17 prx1 pveproxy[2477]: worker 12188 finished
Oct 20 00:57:17 prx1 pveproxy[2477]: starting 1 worker(s)
Oct 20 00:57:17 prx1 pveproxy[2477]: worker 27867 started
Oct 20 00:58:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:58:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:59:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 00:59:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 00:59:06 prx1 liblogging-stdlog[1972]: -- MARK --
Oct 20 00:59:07 prx1 rrdcached[2429]: flushing old values
Oct 20 00:59:07 prx1 rrdcached[2429]: rotating journals
Oct 20 00:59:07 prx1 rrdcached[2429]: started new journal /var/lib/rrdcached/journal/rrd.journal.1508453947.655856
Oct 20 00:59:07 prx1 rrdcached[2429]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1508446747.655735
Oct 20 00:59:07 prx1 pmxcfs[2433]: [dcdb] notice: data verification successful
Oct 20 00:59:25 prx1 pvedaemon[2468]: <root@pam> successful auth for user 'root@pam'
Oct 20 01:00:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:00:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:00:01 prx1 CRON[30504]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:00:01 prx1 CRON[30506]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:00:01 prx1 CRON[30505]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:00:01 prx1 CRON[30507]: (root) CMD (vzdump --mailnotification always --quiet 1 --node prx1 --mailto support@xxx.xxx --storage data-back --all 1 --mode suspend --compress lzo)
Oct 20 01:00:01 prx1 CRON[30508]: (root) CMD (#vzdump --mailto support@xxx.xxx --storage dataraid --all 1 --mode suspend --compress lzo --mailnotification always --quiet 1)
Oct 20 01:00:01 prx1 CRON[30509]: (root) CMD (/opt/router/make-if-list)
Oct 20 01:00:01 prx1 CRON[30504]: pam_unix(cron:session): session closed for user root
Oct 20 01:00:01 prx1 CRON[30506]: pam_unix(cron:session): session closed for user root
Oct 20 01:00:01 prx1 vzdump[30511]: <root@pam> starting task UPID:prx1:00007782:00109698:59E92E71:vzdump::root@pam:
Oct 20 01:00:01 prx1 vzdump[30594]: INFO: starting new backup job: vzdump --mailnotification always --mode suspend --node prx1 --all 1 --quiet 1 --storage data-back --mailto support@xxx.xxx --compress lzo
Oct 20 01:00:01 prx1 vzdump[30594]: INFO: Starting Backup of VM 20004 (lxc)
Oct 20 01:00:12 prx1 sudo[10037]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:00:13 prx1 sudo[10037]: pam_unix(sudo:session): session closed for user root
Oct 20 01:00:17 prx1 sudo[17400]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:00:17 prx1 sudo[17400]: pam_unix(sudo:session): session closed for user root
Oct 20 01:00:22 prx1 sudo[24138]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:00:23 prx1 sudo[24138]: pam_unix(sudo:session): session closed for user root
Oct 20 01:00:27 prx1 sudo[32489]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:00:27 prx1 sudo[32489]: pam_unix(sudo:session): session closed for user root
Oct 20 01:00:32 prx1 sudo[8968]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:00:32 prx1 sudo[8968]: pam_unix(sudo:session): session closed for user root
Oct 20 01:01:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:01:02 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:02:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:02:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:02:36 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 01:03:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:03:02 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:04:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:04:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:04:40 prx1 pveproxy[29959]: worker exit
Oct 20 01:04:40 prx1 pveproxy[2477]: worker 29959 finished
Oct 20 01:04:40 prx1 pveproxy[2477]: starting 1 worker(s)
Oct 20 01:04:40 prx1 pveproxy[2477]: worker 8932 started
Oct 20 01:05:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:05:01 prx1 CRON[9258]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:05:01 prx1 CRON[9259]: (root) CMD (/opt/router/make-if-list)
Oct 20 01:05:02 prx1 CRON[9258]: pam_unix(cron:session): session closed for user root
Oct 20 01:05:02 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:05:55 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 01:06:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:06:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:07:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:07:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:08:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:08:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:08:33 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 01:09:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:09:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:09:06 prx1 liblogging-stdlog[1972]: -- MARK --
Oct 20 01:10:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:10:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:10:01 prx1 CRON[14514]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:10:01 prx1 CRON[14531]: (root) CMD (/opt/router/make-if-list)
Oct 20 01:10:01 prx1 CRON[14514]: pam_unix(cron:session): session closed for user root
Oct 20 01:11:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:11:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:12:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:12:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:13:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:13:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:14:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:14:01 prx1 CRON[18797]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:14:01 prx1 CRON[18798]: (root) CMD (/opt/router/smartcheck)
Oct 20 01:14:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:14:01 prx1 CRON[18797]: pam_unix(cron:session): session closed for user root
Oct 20 01:14:25 prx1 pvedaemon[2467]: <root@pam> successful auth for user 'root@pam'
Oct 20 01:14:49 prx1 pveproxy[6747]: worker exit
Oct 20 01:14:49 prx1 pveproxy[2477]: worker 6747 finished
Oct 20 01:14:49 prx1 pveproxy[2477]: starting 1 worker(s)
Oct 20 01:14:49 prx1 pveproxy[2477]: worker 19622 started
Oct 20 01:15:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:15:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:15:01 prx1 CRON[19841]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:15:01 prx1 CRON[19843]: (root) CMD (/opt/router/make-if-list)
Oct 20 01:15:01 prx1 CRON[19841]: pam_unix(cron:session): session closed for user root
Oct 20 01:15:12 prx1 sudo[20096]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:15:13 prx1 sudo[20096]: pam_unix(sudo:session): session closed for user root
Oct 20 01:15:17 prx1 sudo[20174]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:15:17 prx1 sudo[20174]: pam_unix(sudo:session): session closed for user root
Oct 20 01:15:22 prx1 sudo[20282]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:15:22 prx1 sudo[20282]: pam_unix(sudo:session): session closed for user root
Oct 20 01:15:27 prx1 sudo[20380]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:15:27 prx1 sudo[20380]: pam_unix(sudo:session): session closed for user root
Oct 20 01:15:32 prx1 sudo[20470]: pam_unix(sudo:session): session opened for user root by (uid=0)
Oct 20 01:15:32 prx1 sudo[20470]: pam_unix(sudo:session): session closed for user root
Oct 20 01:16:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:16:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:17:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:17:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:17:01 prx1 CRON[21989]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:17:01 prx1 CRON[21990]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:17:01 prx1 CRON[21992]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct 20 01:17:01 prx1 CRON[21991]: (root) CMD (/opt/router/man-db)
Oct 20 01:17:01 prx1 CRON[21989]: pam_unix(cron:session): session closed for user root
Oct 20 01:17:36 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 01:17:49 prx1 CRON[21990]: pam_unix(cron:session): session closed for user root
Oct 20 01:18:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:18:03 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:18:13 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds.
Oct 20 01:18:13 prx1 kernel:       Tainted: P           O    4.10.17-3-pve #1
Oct 20 01:18:13 prx1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 01:18:13 prx1 kernel: pve-ha-lrm      D    0 20749   2476 0x00000000
Oct 20 01:18:13 prx1 kernel: Call Trace:
Oct 20 01:18:13 prx1 kernel:  __schedule+0x233/0x6f0
Oct 20 01:18:13 prx1 kernel:  schedule+0x36/0x80
Oct 20 01:18:13 prx1 kernel:  schedule_timeout+0x22a/0x3f0
Oct 20 01:18:13 prx1 kernel:  ? ktime_get+0x41/0xb0
Oct 20 01:18:13 prx1 kernel:  io_schedule_timeout+0xa4/0x110
Oct 20 01:18:13 prx1 kernel:  __lock_page+0x10d/0x150
Oct 20 01:18:13 prx1 kernel:  ? unlock_page+0x30/0x30
Oct 20 01:18:13 prx1 kernel:  pagecache_get_page+0x19f/0x2a0
Oct 20 01:18:13 prx1 kernel:  shmem_unused_huge_shrink+0x214/0x3b0
Oct 20 01:18:13 prx1 kernel:  shmem_unused_huge_scan+0x20/0x30
Oct 20 01:18:13 prx1 kernel:  super_cache_scan+0x190/0x1a0
Oct 20 01:18:13 prx1 kernel:  shrink_slab.part.40+0x1f5/0x420
Oct 20 01:18:13 prx1 kernel:  shrink_slab+0x29/0x30
Oct 20 01:18:13 prx1 kernel:  shrink_node+0x108/0x320
Oct 20 01:18:13 prx1 kernel:  do_try_to_free_pages+0xf5/0x330
Oct 20 01:18:13 prx1 kernel:  try_to_free_pages+0xe9/0x190
Oct 20 01:18:13 prx1 kernel:  __alloc_pages_slowpath+0x40f/0xba0
Oct 20 01:18:13 prx1 kernel:  ? radix_tree_lookup_slot+0x22/0x50
Oct 20 01:18:13 prx1 kernel:  __alloc_pages_nodemask+0x209/0x260
Oct 20 01:18:13 prx1 kernel:  alloc_pages_current+0x95/0x140
Oct 20 01:18:13 prx1 kernel:  pte_alloc_one+0x17/0x40
Oct 20 01:18:13 prx1 kernel:  __pte_alloc+0x1e/0x110
Oct 20 01:18:13 prx1 kernel:  alloc_set_pte+0x592/0x600
Oct 20 01:18:13 prx1 kernel:  finish_fault+0x2c/0x50
Oct 20 01:18:13 prx1 kernel:  handle_mm_fault+0xb49/0x1330
Oct 20 01:18:13 prx1 kernel:  ? common_mmap+0x48/0x50
Oct 20 01:18:13 prx1 kernel:  ? apparmor_mmap_file+0x18/0x20
Oct 20 01:18:13 prx1 kernel:  __do_page_fault+0x23e/0x4e0
Oct 20 01:18:13 prx1 kernel:  do_page_fault+0x22/0x30
Oct 20 01:18:13 prx1 kernel:  page_fault+0x28/0x30
Oct 20 01:18:13 prx1 kernel: RIP: 0033:0x7fb7a780ea57
Oct 20 01:18:13 prx1 kernel: RSP: 002b:00007ffce00a0250 EFLAGS: 00010246
Oct 20 01:18:13 prx1 kernel: RAX: 0000000000000000 RBX: 000055a790316fa0 RCX: 0000000000000000
Oct 20 01:18:13 prx1 kernel: RDX: 0000000000000000 RSI: 00007fb79e4c5000 RDI: 00007fb7adc9f000
Oct 20 01:18:13 prx1 kernel: RBP: 0000000000000002 R08: 00007ffce00a03f0 R09: 00000000ffffffff
Oct 20 01:18:13 prx1 kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 000000000000001c
Oct 20 01:18:13 prx1 kernel: R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000010
Oct 20 01:19:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:19:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:19:06 prx1 liblogging-stdlog[1972]: -- MARK --
Oct 20 01:20:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:20:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:20:01 prx1 CRON[27432]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 20 01:20:01 prx1 CRON[27433]: (root) CMD (/opt/router/make-if-list)
Oct 20 01:20:01 prx1 CRON[27432]: pam_unix(cron:session): session closed for user root
Oct 20 01:20:14 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds.
Oct 20 01:20:14 prx1 kernel:       Tainted: P           O    4.10.17-3-pve #1
Oct 20 01:20:14 prx1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 01:20:14 prx1 kernel: pve-ha-lrm      D    0 20749   2476 0x00000000
Oct 20 01:20:14 prx1 kernel: Call Trace:
Oct 20 01:20:14 prx1 kernel:  __schedule+0x233/0x6f0
Oct 20 01:20:14 prx1 kernel:  schedule+0x36/0x80
Oct 20 01:20:14 prx1 kernel:  schedule_timeout+0x22a/0x3f0
Oct 20 01:20:14 prx1 kernel:  ? ktime_get+0x41/0xb0
Oct 20 01:20:14 prx1 kernel:  io_schedule_timeout+0xa4/0x110
Oct 20 01:20:14 prx1 kernel:  __lock_page+0x10d/0x150
Oct 20 01:20:14 prx1 kernel:  ? unlock_page+0x30/0x30
Oct 20 01:20:14 prx1 kernel:  pagecache_get_page+0x19f/0x2a0
Oct 20 01:20:14 prx1 kernel:  shmem_unused_huge_shrink+0x214/0x3b0
Oct 20 01:20:14 prx1 kernel:  shmem_unused_huge_scan+0x20/0x30
Oct 20 01:20:14 prx1 kernel:  super_cache_scan+0x190/0x1a0
Oct 20 01:20:14 prx1 kernel:  shrink_slab.part.40+0x1f5/0x420
Oct 20 01:20:14 prx1 kernel:  shrink_slab+0x29/0x30
Oct 20 01:20:14 prx1 kernel:  shrink_node+0x108/0x320
Oct 20 01:20:14 prx1 kernel:  do_try_to_free_pages+0xf5/0x330
Oct 20 01:20:14 prx1 kernel:  try_to_free_pages+0xe9/0x190
Oct 20 01:20:14 prx1 kernel:  __alloc_pages_slowpath+0x40f/0xba0
Oct 20 01:20:14 prx1 kernel:  ? radix_tree_lookup_slot+0x22/0x50
Oct 20 01:20:14 prx1 kernel:  __alloc_pages_nodemask+0x209/0x260
Oct 20 01:20:14 prx1 kernel:  alloc_pages_current+0x95/0x140
Oct 20 01:20:14 prx1 kernel:  pte_alloc_one+0x17/0x40
Oct 20 01:20:14 prx1 kernel:  __pte_alloc+0x1e/0x110
Oct 20 01:20:14 prx1 kernel:  alloc_set_pte+0x592/0x600
Oct 20 01:20:14 prx1 kernel:  finish_fault+0x2c/0x50
Oct 20 01:20:14 prx1 kernel:  handle_mm_fault+0xb49/0x1330
Oct 20 01:20:14 prx1 kernel:  ? common_mmap+0x48/0x50
Oct 20 01:20:14 prx1 kernel:  ? apparmor_mmap_file+0x18/0x20
Oct 20 01:20:14 prx1 kernel:  __do_page_fault+0x23e/0x4e0
Oct 20 01:20:14 prx1 kernel:  do_page_fault+0x22/0x30
Oct 20 01:20:14 prx1 kernel:  page_fault+0x28/0x30
Oct 20 01:20:14 prx1 kernel: RIP: 0033:0x7fb7a780ea57
Oct 20 01:20:14 prx1 kernel: RSP: 002b:00007ffce00a0250 EFLAGS: 00010246
Oct 20 01:20:14 prx1 kernel: RAX: 0000000000000000 RBX: 000055a790316fa0 RCX: 0000000000000000
Oct 20 01:20:14 prx1 kernel: RDX: 0000000000000000 RSI: 00007fb79e4c5000 RDI: 00007fb7adc9f000
Oct 20 01:20:14 prx1 kernel: RBP: 0000000000000002 R08: 00007ffce00a03f0 R09: 00000000ffffffff
Oct 20 01:20:14 prx1 kernel: R10: 0000000000000011 R11: 0000000000000246 R12: 000000000000001c
Oct 20 01:20:14 prx1 kernel: R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000000010
Oct 20 01:20:55 prx1 pmxcfs[2433]: [status] notice: received log
Oct 20 01:21:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:21:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:22:00 prx1 systemd[1]: Starting Proxmox VE replication runner...
Oct 20 01:22:01 prx1 systemd[1]: Started Proxmox VE replication runner.
Oct 20 01:22:15 prx1 kernel: INFO: task pve-ha-lrm:20749 blocked for more than 120 seconds.

Regards,

Markus
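
The "18 minutes" can be checked against the two timestamps in the log above. This is a small worked calculation, assuming the year 2017 from the journal header (syslog-style timestamps carry no year) and an English locale for the month abbreviation; `parse_stamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def parse_stamp(stamp, year=2017):
    # Syslog-style stamps have no year; 2017 is taken from the journal header.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


backup_start = parse_stamp("Oct 20 01:00:01")   # vzdump "starting new backup job"
first_warning = parse_stamp("Oct 20 01:18:13")  # first "blocked for more than 120 seconds"

# Time from backup start to the first kernel warning.
delay = first_warning - backup_start

# The warning only fires once the task has been blocked for over 120 seconds,
# so the stall itself must have begun at or before this moment.
stall_began_by = first_warning - timedelta(seconds=120)

print(delay)
print(stall_began_by.time())
```

So the first warning comes 18 minutes and 12 seconds after the backup starts, and the task must already have been stuck by about 01:16:13.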
 
The log is attached; the errors appear exactly 18 minutes after the backup starts.

Thanks for the logs.

Hmm, where is the backup written to? To the same disk(s) that hold the root partition? And what are you using as storage, ZFS?
How do RAM and swap look?

It could simply be that the I/O load gets so high that pve-ha-lrm no longer gets its turn, but that would be unusual.
The stack trace during the hang is in a page fault, i.e. the process is fetching a page from memory or from swap at that moment.

Has this happened repeatedly, or was this the first time?
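
The reading of the stack trace can be made a little more concrete. Kernel call traces list the innermost frame first, and in the posted trace the frames above `page_fault` pass through `try_to_free_pages` (memory reclaim) and end in `__lock_page`. The sketch below pulls the function names out of such journal lines and flags these three markers. The regular expression, the helper names, and the choice of markers are assumptions made for illustration, not an established diagnostic.

```python
import re

# A trace frame looks like "kernel:  __lock_page+0x10d/0x150"; frames the
# kernel considers unreliable carry a leading "? ".
FRAME = re.compile(r"kernel:\s+(\?\s+)?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def trace_functions(journal_lines):
    """Return function names from call-trace lines, skipping '?' frames."""
    funcs = []
    for line in journal_lines:
        m = FRAME.search(line)
        if m and not m.group(1):
            funcs.append(m["func"])
    return funcs


def summarize(funcs):
    """Flag a few markers that appear in the posted trace."""
    return {
        "page_fault": "page_fault" in funcs,
        "direct_reclaim": "try_to_free_pages" in funcs,
        "waiting_on_page_lock": "__lock_page" in funcs,
    }


# A shortened excerpt of the trace posted in this thread.
sample = [
    "Oct 20 01:18:13 prx1 kernel: Call Trace:",
    "Oct 20 01:18:13 prx1 kernel:  __schedule+0x233/0x6f0",
    "Oct 20 01:18:13 prx1 kernel:  ? ktime_get+0x41/0xb0",
    "Oct 20 01:18:13 prx1 kernel:  __lock_page+0x10d/0x150",
    "Oct 20 01:18:13 prx1 kernel:  shrink_slab.part.40+0x1f5/0x420",
    "Oct 20 01:18:13 prx1 kernel:  try_to_free_pages+0xe9/0x190",
    "Oct 20 01:18:13 prx1 kernel:  page_fault+0x28/0x30",
]

print(summarize(trace_functions(sample)))
```

For the excerpt all three flags are set: the task took a page fault, had to reclaim memory to satisfy it, and then went to sleep waiting for a page lock.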
 
Hi,

It happens regardless of the target, i.e. either NFS (where the VEs run) or local XFS or ext4.
Swap is not used, and RAM is at 75% at most. It also happens on other systems with the same configuration (hardware/software).

The I/O also does not really drop after the backup, only after a reboot.

Regards, Markus
 
Hi,

The I/O also does not really drop after the backup, only after a reboot.

OK, then it looks like that is the actual problem and the blocked LRM is "only" a symptom...
It could be a problem in the kernel's memory management stack. Which version are you using?
Code:
pveversion -v
 
Hi,

here is the information:

Code:
~# pveversion -v
proxmox-ve: 5.0-24 (running kernel: 4.10.17-4-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-14
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-12
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve18~bpo90


~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 439964 480000 942488 26592732    0    0     0    24 2047 3692  0  0 87 12  0
 0  1 439964 480496 942488 26592740    0    0     0  1656 4581 8807  1  0 85 14  0
 0  1 439964 480060 942488 26592740    0    0     0    76 2459 4371  0  0 87 12  0
 0  1 439964 479968 942488 26592780    0    0     0     4  947 1313  0  0 87 13  0
 1  1 439964 479888 942492 26592776    0    0     0    44 2382 4170  1  1 87 12  0
 0  1 439964 479660 942500 26592776    0    0    20   508 1370 2267  5  1 86  8  0
 0  1 439964 479380 942500 26592848    0    0     0  1672 5902 11948  1  1 85 14  0
 0  1 439964 479168 942504 26592844    0    0     0    32  898 1378  0  0 87 12  0
 0  1 439964 479224 942504 26592852    0    0     0     4 2242 3897  0  0 87 12  0
 0  1 439964 479260 942504 26592852    0    0     0    12  796 1254  0  0 87 13  0
 0  1 439964 479844 942504 26592852    0    0     0     0 2202 3804  0  1 87 12  0
 0  1 439964 479664 942504 26592852    0    0     0  1844 3616 6823  0  0 86 14  0
 0  1 439964 478940 942504 26593436    0    0     0    12 2307 3955  1  1 86 12  0
 0  1 439964 479292 942504 26592852    0    0     0     0  837 1204  0  0 87 12  0
 0  1 439964 480048 942504 26592852    0    0     0   176 3722 6698  0  0 87 12  0
 0  1 439964 479608 942504 26592860    0    0     0     0 2220 3866  0  0 87 12  0
 0  1 439964 479620 942504 26592860    0    0     0  1568 5333 10709  1  2 84 14  0

Regards,

Markus
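
The vmstat output above can be condensed into a few numbers. The sketch below averages the `b` (blocked processes) and `wa` (I/O wait) columns and reports the peak swap-in/swap-out rates. It assumes the 17-column layout shown in the post, and `summarize_vmstat` is a hypothetical helper name.

```python
def summarize_vmstat(text):
    """Summarize `vmstat 1` output that uses the 17-column layout
    r b swpd free buff cache si so bi bo in cs us sy id wa st."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are exactly 17 integer columns; the header rows are not.
        if len(fields) == 17 and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    if not rows:
        raise ValueError("no vmstat data rows found")
    n = len(rows)
    return {
        "samples": n,
        "avg_blocked": sum(r[1] for r in rows) / n,  # column "b"
        "max_swap_in": max(r[6] for r in rows),      # column "si"
        "max_swap_out": max(r[7] for r in rows),     # column "so"
        "avg_iowait": sum(r[15] for r in rows) / n,  # column "wa"
    }


# The header and the first three data rows from the post.
sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 439964 480000 942488 26592732    0    0     0    24 2047 3692  0  0 87 12  0
 0  1 439964 480496 942488 26592740    0    0     0  1656 4581 8807  1  0 85 14  0
 0  1 439964 480060 942488 26592740    0    0     0    76 2459 4371  0  0 87 12  0
"""

print(summarize_vmstat(sample))
```

In the posted samples `b` stays at 1 throughout, `si` and `so` stay at 0 even though `swpd` is non-zero, and `wa` sits between 8 and 14 percent, which is consistent with a single process permanently stuck in uninterruptible sleep rather than with active swapping.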
 
Does nobody have an idea, or are you currently trying to reproduce the problem?

What can we do so that you can help us?
 
Does nobody have an idea, or are you currently trying to reproduce the problem?

I have not been able to reproduce it here so far; the I/O and/or memory subsystem remains under suspicion.

What can we do so that you can help us?

Errors this specific to a system or setup are most likely to be tracked down via remote access.
For that, it is best to contact our Enterprise Support at https://my.proxmox.com/ with the appropriate support subscription.
 
The current setup consists of 4 Proxmox servers and one RAID.
Three of the Proxmox servers each have one 1 Gbit/s and one 10 Gbit/s network card, with the 10 Gbit/s cards connected directly to the RAID.
The 10 Gbit/s cards are of type "Mellanox ConnectX®-3 Pro EN".
The RAID is mounted via NFSv3 in the fstab and added to Proxmox as a directory with sharing enabled.

The setup works without errors. The problem, however, is that on the Proxmox servers with the 10 Gbit/s cards the I/O wait climbs to as much as 80 (and does not come back down), and only a reboot clears it... The cause here seems to be pve-ha-lrm (the only process in state "D").

As a test, we have now switched one Proxmox server from a 10 Gbit/s card to 1 Gbit/s. The I/O wait problem no longer occurs.

Are there known problems with the Mellanox cards, with using separate networks (RAID/Proxmox), or with directory sharing?

PS: The 10 Gbit/s cards all have separate network addresses, and there are no errors, drops, or the like on the interfaces.
PPS: The setup already ran on Proxmox version 4, but there without any errors.
 
Scratch that: the problem persists even after switching off the 10 Gbit/s cards.

After a while pve-ha-lrm gets stuck (ps state "D"), the I/O wait keeps climbing, the cluster functions (start/stop/migration) stop working, and only a reboot or reset helps.

No errors are visible in the log files, and everything looks OK in the PVE status/nodes view.

All very unpleasant! Does nobody have tips for tracking down the error?
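
One way to keep an eye on this before the node needs a reset is to list the processes in state "D" (uninterruptible sleep) periodically. The sketch below reads the Linux `/proc/<pid>/stat` files directly. The function names are illustrative, and it only reports which processes are blocked, not why.

```python
import glob


def parse_stat(stat_line):
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the line is split at the last ')'.
    """
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def blocked_processes():
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Run from cron or a loop, the output could be logged with a timestamp to see when pve-ha-lrm first enters state "D" relative to the backup job.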
 
