Hosts freeze when running ps_mem

zorrobiwan

Jun 10, 2020

Hi,

I have had this issue on several of my hosts when running ps_mem (https://github.com/pixelb/ps_mem) through Ansible. The hosts just freeze, and I had to ask my hosting provider to hard-reboot them.
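
As far as I understand, ps_mem (a Python script, which fits the `Comm: python` in the traces) builds its totals by reading /proc/<pid>/smaps for each process, and the call trace in the syslog (SyS_read, __vfs_read, show_pid_smap, walk_page_vma, smaps_pte_range) is the kernel generating that file. Below is a minimal sketch of that kind of read, assuming plain smaps files. It is not ps_mem's actual code (ps_mem also handles shared memory and may use smaps_rollup), and `sum_smaps_field` / `pss_kb_for_pid` are hypothetical helper names. I would not run `pss_kb_for_pid` on an affected host, since it presumably goes through the same kernel path.

```python
from pathlib import Path


def sum_smaps_field(smaps_text, field="Pss"):
    """Sum one per-mapping kB field (e.g. "Pss", "Rss") across smaps text."""
    prefix = field + ":"  # the colon keeps "Pss:" from matching lines like "Pss_Anon:"
    total_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith(prefix):
            # Field lines look like "Pss:                  12 kB".
            total_kb += int(line.split()[1])
    return total_kb


def pss_kb_for_pid(pid):
    """Return the total Pss (kB) of one process by reading /proc/<pid>/smaps.

    Reading this file is what makes the kernel walk the process's page tables
    (show_pid_smap -> walk_page_vma -> smaps_pte_range in the 4.15 trace).
    """
    return sum_smaps_field(Path(f"/proc/{pid}/smaps").read_text())
```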

I'm not familiar with reading these logs. Maybe someone can help me interpret them.
Could it be related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1799497 ?

Thanks

pveversion (unfortunately taken after a reboot followed by a dist-upgrade)
Code:
pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-29-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-29-pve: 4.15.18-57
pve-kernel-4.15.18-28-pve: 4.15.18-56
pve-kernel-4.15.18-27-pve: 4.15.18-55
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.15.18-12-pve: 4.15.18-36
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

Code:
uname -a
Linux prox8 4.15.18-29-pve #1 SMP PVE 4.15.18-57 (Mon, 18 May 2020 14:34:54 +0200) x86_64 GNU/Linux
 
Yesterday's syslog, from the ps_mem launch until the hard reboot:
Code:
Jul 7 17:50:48 prox8 ansible-setup: Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jul 7 17:50:51 prox8 ansible-command: Invoked with warn=True executable=None _uses_shell=True _raw_params=ps_mem removes=None argv=None creates=None chdir=None stdin=None
Jul 7 17:50:51 prox8 kernel: [22621173.190029] CPU: 11 PID: 5982 Comm: python Tainted: P O 4.15.18-17-pve 0000001
Jul 7 17:50:51 prox8 kernel: [22621173.204171] RIP: 0010:smaps_pte_range+0x3ee/0x5e0
Jul 7 17:50:51 prox8 kernel: [22621173.213391] RAX: 000000055b736007 RBX: 000055b736007000 RCX: 0000000000000001
Jul 7 17:50:51 prox8 kernel: [22621173.222423] RBP: ffffbc29cfd8bc40 R08: ffff945c18123930 R09: ffff945c18123930
Jul 7 17:50:51 prox8 kernel: [22621173.231187] R13: 00003ffffffff000 R14: ffffbc29cfd8bd08 R15: ffff945d9861b038
Jul 7 17:50:51 prox8 kernel: [22621173.239709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 7 17:50:51 prox8 kernel: [22621173.248065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 7 17:50:51 prox8 kernel: [22621173.256079] Call Trace:
Jul 7 17:50:51 prox8 kernel: [22621173.263730] walk_page_vma+0x4a/0x60
Jul 7 17:50:51 prox8 kernel: [22621173.271023] ? pagemap_pmd_range+0x8a0/0x8a0
Jul 7 17:50:51 prox8 kernel: [22621173.278038] ? do_filp_open+0xad/0x110
Jul 7 17:50:51 prox8 kernel: [22621173.284904] ? _cond_resched+0x1a/0x50
Jul 7 17:50:51 prox8 kernel: [22621173.291562] show_pid_smap+0xe/0x10
Jul 7 17:50:51 prox8 kernel: [22621173.297901] __vfs_read+0x1b/0x40
Jul 7 17:50:51 prox8 kernel: [22621173.303827] SyS_read+0x55/0xc0
Jul 7 17:50:51 prox8 kernel: [22621173.309266] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jul 7 17:50:51 prox8 kernel: [22621173.314454] RSP: 002b:00007ffd025a6a68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jul 7 17:50:51 prox8 kernel: [22621173.319531] RDX: 0000000000002000 RSI: 00007ffd025a6b40 RDI: 0000000000000003
Jul 7 17:50:51 prox8 kernel: [22621173.324420] R10: 0000000000000230 R11: 0000000000000246 R12: 0000000000002000
Jul 7 17:50:51 prox8 kernel: [22621173.329041] Code: b3 41 f6 40 52 40 0f 85 e8 01 00 00 48 89 d8 49 2b 00 48 c1 e8 0c 49 03 80 98 00 00 00 49 8b 90 a0 00 00 00 48 89 c6 4c 89 5d d0 <48> 8b ba f0 00 00 00 e8 a6 ce ed ff 48 85 c0 48 89 c7 0f 84 71
Jul 7 17:50:51 prox8 kernel: [22621173.338619] CR2: 00000000000000f0
Jul 7 17:51:00 prox8 systemd[1]: Starting Proxmox VE replication runner...
Jul 7 17:51:00 prox8 systemd[1]: Started Proxmox VE replication runner.
... (cut: message too long)
Jul 7 17:52:22 prox8 kernel: [22621263.667538] CPU: 4 PID: 8284 Comm: python Tainted: P D W O L 4.15.18-17-pve 0000001
Jul 7 17:52:22 prox8 kernel: [22621263.676913] <IRQ>
Jul 7 17:52:22 prox8 kernel: [22621263.683714] ? lapic_can_unplug_cpu+0xb0/0xb0
Jul 7 17:52:22 prox8 kernel: [22621263.690600] rcu_dump_cpu_stacks+0xa3/0xd6
Jul 7 17:52:22 prox8 kernel: [22621263.697613] update_process_times+0x2f/0x60
Jul 7 17:52:22 prox8 kernel: [22621263.704683] __hrtimer_run_queues+0xe7/0x220
Jul 7 17:52:22 prox8 kernel: [22621263.711870] apic_timer_interrupt+0x84/0x90
Jul 7 17:52:22 prox8 kernel: [22621263.719313] RSP: 0018:ffffbc29cc31fbe0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
Jul 7 17:52:22 prox8 kernel: [22621263.727246] RBP: ffffbc29cc31fbe0 R08: 0000000000000101 R09: ffff945c18123930
Jul 7 17:52:22 prox8 kernel: [22621263.735093] _raw_spin_lock+0x20/0x30
Jul 7 17:52:22 prox8 kernel: [22621263.742830] walk_page_vma+0x4a/0x60
Jul 7 17:52:22 prox8 kernel: [22621263.750145] ? gather_hugetlb_stats+0xa0/0xa0
Jul 7 17:52:22 prox8 kernel: [22621263.756974] ? _cond_resched+0x1a/0x50
Jul 7 17:52:22 prox8 kernel: [22621263.763144] seq_read+0x12a/0x430
Jul 7 17:52:22 prox8 kernel: [22621263.769237] SyS_read+0x55/0xc0
Jul 7 17:52:22 prox8 kernel: [22621263.775214] RIP: 0033:0x7f6101578910
Jul 7 17:52:22 prox8 kernel: [22621263.781431] RDX: 0000000000002000 RSI: 00007ffdb1b92bd0 RDI: 0000000000000003
Jul 7 17:52:22 prox8 kernel: [22621263.787979] R13: 0000000000000d68 R14: 00007ffdb1b92bd0 R15: 00007f6101832900
... (cut: message too long)
Jul 7 17:54:00 prox8 systemd[1]: Starting Proxmox VE replication runner...
Jul 7 17:54:00 prox8 systemd[1]: Started Proxmox VE replication runner.
Jul 7 17:54:09 prox8 systemd[1]: Stopped target Graphical Interface.
Jul 7 17:54:09 prox8 systemd[1]: Stopped target RPC Port Mapper.
Jul 7 17:54:09 prox8 systemd[1]: Stopped target Timers.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Jul 7 17:54:09 prox8 systemd[1]: Stopped target Multi-User System.
Jul 7 17:54:09 prox8 systemd[1]: Stopping Fail2Ban Service...
Jul 7 17:54:09 prox8 systemd[1]: Stopping Login Service...
Jul 7 17:54:09 prox8 systemd[1]: Stopping PVE Qemu Event Daemon...
Jul 7 17:54:09 prox8 systemd[1]: Stopping PVE guests...
Jul 7 17:54:09 prox8 systemd[1]: Stopped Proxmox VE replication runner.
Jul 7 17:54:09 prox8 systemd[1]: Unmounting RPC Pipe File System...
Jul 7 17:54:09 prox8 systemd[1]: Stopping irqbalance daemon...
Jul 7 17:54:09 prox8 systemd[1]: Stopped Daily PVE download activities.
Jul 7 17:54:09 prox8 systemd[1]: Stopping Kernel Samepage Merging (KSM) Tuning Daemon...
Jul 7 17:54:09 prox8 systemd[1]: Stopping D-Bus System Message Bus...
Jul 7 17:54:09 prox8 smartd[1935]: smartd received signal 15: Terminated
Jul 7 17:54:09 prox8 systemd[1]: Stopping Self Monitoring and Reporting Technology (SMART) Daemon...
Jul 7 17:54:09 prox8 smartd[1935]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.HGST_HUS726T4TALA6L1-V6H471ZS.ata.state
Jul 7 17:54:09 prox8 systemd[1]: Stopping Real time performance monitoring...
Jul 7 17:54:09 prox8 smartd[1935]: Device: /dev/sdb [SAT], state written to /var/lib/smartmontools/smartd.HGST_HUS726T4TALA6L1-V6H48RGS.ata.state
Jul 7 17:54:09 prox8 systemd[1]: Stopped target Login Prompts.
Jul 7 17:54:09 prox8 smartd[1935]: smartd is exiting (exit status 0)
Jul 7 17:54:09 prox8 systemd[1]: Stopping Serial Getty on ttyS0...
Jul 7 17:54:09 prox8 systemd[1]: Stopping Munin Node...
Jul 7 17:54:09 prox8 systemd[1]: Stopped target ZFS startup target.
Jul 7 17:54:09 prox8 systemd[1]: Stopped target ZFS pool import target.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Daily apt upgrade and clean activities.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Daily apt download activities.
Jul 7 17:54:09 prox8 systemd[1]: Stopping Regular background program processing daemon...
Jul 7 17:54:09 prox8 systemd[1]: Stopping Unattended Upgrades Shutdown...
Jul 7 17:54:09 prox8 systemd[1]: Stopping LSB: Ceph RBD Mapping...
Jul 7 17:54:09 prox8 systemd[1]: Stopping LXC Container Monitoring Daemon...
Jul 7 17:54:09 prox8 systemd[1]: Stopping The Apache HTTP Server...
Jul 7 17:54:09 prox8 systemd[1]: Stopping Getty on tty1...
Jul 7 17:54:09 prox8 systemd[1]: Stopping LSB: disk temperature monitoring daemon...
Jul 7 17:54:09 prox8 systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jul 7 17:54:09 prox8 systemd[1]: Stopping Serial Getty on ttyS1...
Jul 7 17:54:09 prox8 systemd[1]: Stopped ZFS file system shares.
Jul 7 17:54:09 prox8 systemd[1]: Stopped irqbalance daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped D-Bus System Message Bus.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Kernel Samepage Merging (KSM) Tuning Daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Serial Getty on ttyS1.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Serial Getty on ttyS0.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Getty on tty1.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Regular background program processing daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped LXC Container Monitoring Daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped PVE Qemu Event Daemon.
Jul 7 17:54:09 prox8 systemd[1]: Failed to propagate agent release message: Transport endpoint is not connected
...
Jul 7 17:54:09 prox8 systemd[1]: Removed slice system-getty.slice.
Jul 7 17:54:09 prox8 systemd[1]: Stopping Permit User Sessions...
Jul 7 17:54:09 prox8 systemd[1]: Removed slice system-serial\x2dgetty.slice.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Permit User Sessions.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Munin Node.
Jul 7 17:54:09 prox8 systemd[1]: Stopped LSB: disk temperature monitoring daemon.
Jul 7 17:54:09 prox8 systemd[1]: Stopped Unattended Upgrades Shutdown.
Jul 7 17:54:09 prox8 systemd[1]: Stopped LSB: Ceph RBD Mapping.
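
To help read the two kernel reports above, here is a small sketch that pulls the useful fields out of an oops/stall header line: the bracketed printk timestamp (seconds since boot), the CPU, PID and command, the taint letters, and the kernel release. The regular expression is written for the header format seen in this log and is an assumption, not a general parser. `parse_oops_header` and `TAINT_FLAGS` are hypothetical names, and the flag table only covers the letters that appear here.

```python
import re

# Only the taint letters seen in this log, with meanings from the kernel's
# "tainted kernels" documentation (not an exhaustive table).
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "O": "out-of-tree module loaded",
    "D": "kernel died recently (an earlier OOPS or BUG)",
    "W": "kernel issued a warning",
    "L": "soft lockup occurred",
}

# Assumed header format, e.g.
# "[22621173.190029] CPU: 11 PID: 5982 Comm: python Tainted: P O 4.15.18-17-pve"
OOPS_HEADER = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"                 # printk timestamp, seconds since boot
    r"CPU:\s*(?P<cpu>\d+)\s+PID:\s*(?P<pid>\d+)\s+"
    r"Comm:\s*(?P<comm>.+?)\s+"
    r"(?:Not tainted|Tainted:(?P<taint>[A-Z ]*))\s*"
    r"(?P<release>\d\S*)"                         # kernel release, e.g. 4.15.18-17-pve
)


def parse_oops_header(line):
    """Return a dict describing an oops/stall header line, or None if it is not one."""
    m = OOPS_HEADER.search(line)
    if m is None:
        return None
    return {
        "release": m.group("release"),
        "uptime_days": float(m.group("ts")) / 86400,
        "cpu": int(m.group("cpu")),
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "taint": (m.group("taint") or "").split(),  # e.g. ["P", "O"]
    }


if __name__ == "__main__":
    line = ("Jul 7 17:52:22 prox8 kernel: [22621263.667538] CPU: 4 PID: 8284 "
            "Comm: python Tainted: P D W O L 4.15.18-17-pve 0000001")
    info = parse_oops_header(line)
    print(info["release"], f"{info['uptime_days']:.1f} days after boot")
    for flag in info["taint"]:
        print(flag, "-", TAINT_FLAGS.get(flag, "see kernel docs"))
```

Applied to the two headers, both report 4.15.18-17-pve and about 261.8 days since boot, while `uname -a` above shows 4.15.18-29-pve. So, if I read this right, the crash happened on the kernel that had been up for roughly 262 days, not the one pveversion reports now. The `D` and `L` flags in the 17:52:22 header mean that an earlier oops and a soft lockup had already been recorded by then.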