Since we upgraded our cluster from PVE 3.4 to 4.3, all our OpenVZ containers have been converted to KVM virtual machines. In many of these guests we get frequent console alerts about CPU stalls, usually when the cluster node is under high IO load (for example when backing up or restoring VMs to/from NFS). These stalls usually last a couple of minutes, during which the VM does not respond on the network, although the console still works. Once the high IO activity on the host stops, the VMs recover within a few seconds.
It happens on all our nodes running PVE 4.3. We are running ZFS (SSD RAIDZ or HDD RAID10) on our nodes as local storage for VMs, and NFS for backups.
Strangely, it does not seem to affect Ubuntu 16.04 guests running kernel 4.4.0, but it does affect older Ubuntu releases and all Debian versions (6/7/8), running kernels 2.6.32, 3.x and 4.7.
Below is how it looks in the guest's /var/log/messages, from a Debian 8 guest running kernel 4.7 (2 episodes within 2 minutes):
Code:
Nov 28 14:42:39 php-slave-03 kernel: [315902.792165] rcu_sched S ffff88023fc572c0 0 7 2 0x00000000
Nov 28 14:42:39 php-slave-03 kernel: [315902.792171] ffff880236210e40 ffff88023622a180 ffff88023fc50080 0000000000000186
Nov 28 14:42:39 php-slave-03 kernel: [315902.792174] ffff880236218000 ffff880236217e50 0000000104b3aadb ffff880236217dc0
Nov 28 14:42:39 php-slave-03 kernel: [315902.792177] ffff88023fc50080 0000000000000001 ffffffffbe9db651 ffff88023fc50080
Nov 28 14:42:39 php-slave-03 kernel: [315902.792180] Call Trace:
Nov 28 14:42:39 php-slave-03 kernel: [315902.792263] [<ffffffffbe9db651>] ? schedule+0x31/0x80
Nov 28 14:42:39 php-slave-03 kernel: [315902.792271] [<ffffffffbe9de801>] ? schedule_timeout+0x161/0x2c0
Nov 28 14:42:39 php-slave-03 kernel: [315902.792293] [<ffffffffbe4e68d0>] ? trace_raw_output_tick_stop+0x70/0x70
Nov 28 14:42:39 php-slave-03 kernel: [315902.792306] [<ffffffffbe4bdd82>] ? prepare_to_swait+0x52/0x60
Nov 28 14:42:39 php-slave-03 kernel: [315902.792318] [<ffffffffbe4e11bb>] ? rcu_gp_kthread+0x3db/0x840
Nov 28 14:42:39 php-slave-03 kernel: [315902.792321] [<ffffffffbe9db173>] ? __schedule+0x293/0x740
Nov 28 14:42:39 php-slave-03 kernel: [315902.792324] [<ffffffffbe4e0de0>] ? force_qs_rnp+0x130/0x130
Nov 28 14:42:39 php-slave-03 kernel: [315902.792336] [<ffffffffbe49b87f>] ? kthread+0xdf/0x100
Nov 28 14:42:39 php-slave-03 kernel: [315902.792340] [<ffffffffbe9df7ef>] ? ret_from_fork+0x1f/0x40
Nov 28 14:42:39 php-slave-03 kernel: [315902.792343] [<ffffffffbe49b7a0>] ? kthread_park+0x50/0x50
Nov 28 14:42:39 php-slave-03 kernel: [315902.792353] Task dump for CPU 0:
Nov 28 14:42:39 php-slave-03 kernel: [315902.792355] kworker/0:1 R running task 0 1397 2 0x00000008
Nov 28 14:42:39 php-slave-03 kernel: [315902.792403] Workqueue: events_freezable_power_ disk_events_workfn
Nov 28 14:42:39 php-slave-03 kernel: [315902.792406] 0000000000000000 00000000bfdc3b8f ffffffffbe579667 ffff88023fc18040
Nov 28 14:42:39 php-slave-03 kernel: [315902.792409] ffffffffbee54240 0000000000000000 ffff880044c52200 ffffffffbe4e1fcc
Nov 28 14:42:39 php-slave-03 kernel: [315902.792412] ffffffffbe4ee381 001dcd6500000000 00011f4fc3d5852b 0000000000000046
Nov 28 14:42:39 php-slave-03 kernel: [315902.792416] Call Trace:
Nov 28 14:42:39 php-slave-03 kernel: [315902.792421] <IRQ> [<ffffffffbe579667>] ? rcu_dump_cpu_stacks+0x67/0x86
Nov 28 14:42:39 php-slave-03 kernel: [315902.792434] [<ffffffffbe4e1fcc>] ? rcu_check_callbacks+0x70c/0x7b0
Nov 28 14:42:39 php-slave-03 kernel: [315902.792441] [<ffffffffbe4ee381>] ? timekeeping_update+0xf1/0x150
Nov 28 14:42:39 php-slave-03 kernel: [315902.792448] [<ffffffffbe4efa43>] ? update_wall_time+0x2e3/0x7b0
Nov 28 14:42:39 php-slave-03 kernel: [315902.792455] [<ffffffffbe4f7920>] ? tick_sched_do_timer+0x30/0x30
Nov 28 14:42:39 php-slave-03 kernel: [315902.792458] [<ffffffffbe4e8422>] ? update_process_times+0x32/0x60
Nov 28 14:42:39 php-slave-03 kernel: [315902.792460] [<ffffffffbe4f7340>] ? tick_sched_handle.isra.14+0x20/0x50
Nov 28 14:42:39 php-slave-03 kernel: [315902.792462] [<ffffffffbe4f7958>] ? tick_sched_timer+0x38/0x70
Nov 28 14:42:39 php-slave-03 kernel: [315902.792466] [<ffffffffbe4e8fea>] ? __hrtimer_run_queues+0xea/0x280
Nov 28 14:42:39 php-slave-03 kernel: [315902.792468] [<ffffffffbe4e9469>] ? hrtimer_interrupt+0x99/0x190
Nov 28 14:42:39 php-slave-03 kernel: [315902.792602] [<ffffffffc043f680>] ? ata_scsiop_inq_std+0x140/0x140 [libata]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792608] [<ffffffffbe9e1eb9>] ? smp_apic_timer_interrupt+0x39/0x50
Nov 28 14:42:39 php-slave-03 kernel: [315902.792611] [<ffffffffbe9e01e2>] ? apic_timer_interrupt+0x82/0x90
Nov 28 14:42:39 php-slave-03 kernel: [315902.792612] <EOI> [<ffffffffc043f680>] ? ata_scsiop_inq_std+0x140/0x140 [libata]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792626] [<ffffffffbe9df1a1>] ? _raw_spin_unlock_irqrestore+0x11/0x20
Nov 28 14:42:39 php-slave-03 kernel: [315902.792636] [<ffffffffc0443db5>] ? ata_scsi_queuecmd+0x155/0x360 [libata]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792713] [<ffffffffc03afae8>] ? scsi_dispatch_cmd+0xd8/0x220 [scsi_mod]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792725] [<ffffffffc03b2ae3>] ? scsi_request_fn+0x473/0x600 [scsi_mod]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792735] [<ffffffffbe6eb1af>] ? __blk_run_queue+0x2f/0x40
Nov 28 14:42:39 php-slave-03 kernel: [315902.792738] [<ffffffffbe6f3eb8>] ? blk_execute_rq_nowait+0xa8/0x160
Nov 28 14:42:39 php-slave-03 kernel: [315902.792741] [<ffffffffbe6f3fe7>] ? blk_execute_rq+0x77/0x120
Nov 28 14:42:39 php-slave-03 kernel: [315902.792750] [<ffffffffbe6e53a4>] ? bio_phys_segments+0x14/0x20
Nov 28 14:42:39 php-slave-03 kernel: [315902.792753] [<ffffffffbe6f3d7a>] ? blk_rq_map_kern+0xaa/0x120
Nov 28 14:42:39 php-slave-03 kernel: [315902.792755] [<ffffffffbe6edae2>] ? blk_get_request+0x72/0xf0
Nov 28 14:42:39 php-slave-03 kernel: [315902.792765] [<ffffffffc03af61c>] ? scsi_execute+0x12c/0x1d0 [scsi_mod]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792774] [<ffffffffc03b140f>] ? scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
Nov 28 14:42:39 php-slave-03 kernel: [315902.792793] [<ffffffffc039268e>] ? sr_check_events+0xbe/0x2d0 [sr_mod]
Nov 28 14:42:39 php-slave-03 kernel: [315902.793041] [<ffffffffc031d054>] ? cdrom_check_events+0x14/0x30 [cdrom]
Nov 28 14:42:39 php-slave-03 kernel: [315902.793046] [<ffffffffbe6fec52>] ? disk_check_events+0x62/0x150
Nov 28 14:42:39 php-slave-03 kernel: [315902.793049] [<ffffffffbe495afb>] ? process_one_work+0x14b/0x400
Nov 28 14:42:39 php-slave-03 kernel: [315902.793052] [<ffffffffbe4965a5>] ? worker_thread+0x65/0x4a0
Nov 28 14:42:39 php-slave-03 kernel: [315902.793054] [<ffffffffbe496540>] ? rescuer_thread+0x340/0x340
Nov 28 14:42:39 php-slave-03 kernel: [315902.793056] [<ffffffffbe49b87f>] ? kthread+0xdf/0x100
Nov 28 14:42:39 php-slave-03 kernel: [315902.793060] [<ffffffffbe9df7ef>] ? ret_from_fork+0x1f/0x40
Nov 28 14:42:39 php-slave-03 kernel: [315902.793062] [<ffffffffbe49b7a0>] ? kthread_park+0x50/0x50
Nov 28 14:44:10 php-slave-03 kernel: [315993.500650] Task dump for CPU 0:
Nov 28 14:44:10 php-slave-03 kernel: [315993.500652] kworker/0:1 R running task 0 1397 2 0x00000008
Nov 28 14:44:10 php-slave-03 kernel: [315993.500683] Workqueue: events_freezable_power_ disk_events_workfn
Nov 28 14:44:10 php-slave-03 kernel: [315993.500686] ffff88023fc16ac0 0000000000000000 ffffe8ffffc02900 0000000000000000
Nov 28 14:44:10 php-slave-03 kernel: [315993.500688] ffffffffbe495afb 000000003fc16ac0 ffff88023fc16ac0 ffff880191568270
Nov 28 14:44:10 php-slave-03 kernel: [315993.500690] 0000000000000008 ffff88023fc16ae0 ffff880191568240 ffff880044c52200
Nov 28 14:44:10 php-slave-03 kernel: [315993.500693] Call Trace:
Nov 28 14:44:10 php-slave-03 kernel: [315993.500709] [<ffffffffbe495afb>] ? process_one_work+0x14b/0x400
Nov 28 14:44:10 php-slave-03 kernel: [315993.500713] [<ffffffffbe4965a5>] ? worker_thread+0x65/0x4a0
Nov 28 14:44:10 php-slave-03 kernel: [315993.500715] [<ffffffffbe496540>] ? rescuer_thread+0x340/0x340
Nov 28 14:44:10 php-slave-03 kernel: [315993.500718] [<ffffffffbe49b87f>] ? kthread+0xdf/0x100
Nov 28 14:44:10 php-slave-03 kernel: [315993.500729] [<ffffffffbe9df7ef>] ? ret_from_fork+0x1f/0x40
Nov 28 14:44:10 php-slave-03 kernel: [315993.500731] [<ffffffffbe49b7a0>] ? kthread_park+0x50/0x50
Nov 28 14:44:16 php-slave-03 kernel: [315999.128486] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel jitterentropy_rng hmac joydev drbg hid_generic ansi_cprng usbhid aesni_intel cirrus hid aes_x86_64 lrw ttm gf128mul glue_helper drm_kms_helper ablk_helper drm cryptd ppdev evdev i2c_piix4 acpi_cpufreq serio_raw virtio_balloon shpchp tpm_tis parport_pc pcspkr tpm parport button autofs4 ext4 crc16 jbd2 mbcache sg sr_mod cdrom ata_generic virtio_net virtio_blk ata_piix libata uhci_hcd crc32c_intel ehci_hcd scsi_mod usbcore psmouse virtio_pci virtio_ring virtio usb_common floppy
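To line these episodes up against backup/restore jobs on the host, a small script can group the stall dumps in a guest log by kernel uptime. This is only a minimal sketch, assuming that every stall report contains a "Task dump for CPU N:" line like the ones above; the 30-second grouping threshold (`gap_s`) and the function names are hypothetical choices, not anything from the kernel or PVE.

```python
import re
from typing import Iterable, List

# Matches the kernel uptime stamp, e.g. "[315902.792353]", followed by the
# stall-dump marker. Assumption: each stall report contains this line.
DUMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+Task dump for CPU (\d+):")


def stall_episodes(lines: Iterable[str], gap_s: float = 30.0) -> List[List[float]]:
    """Group 'Task dump for CPU' entries into episodes.

    Dumps whose uptime stamps are less than gap_s seconds apart are treated
    as one episode (gap_s is an arbitrary, hypothetical threshold).
    """
    episodes: List[List[float]] = []
    for line in lines:
        m = DUMP_RE.search(line)
        if not m:
            continue
        ts = float(m.group(1))
        if episodes and ts - episodes[-1][-1] < gap_s:
            episodes[-1].append(ts)  # close to the previous dump: same episode
        else:
            episodes.append([ts])  # start a new episode
    return episodes


def report(path: str = "/var/log/messages") -> None:
    """Print one summary line per detected episode in the given log file."""
    with open(path, errors="replace") as fh:
        for i, ep in enumerate(stall_episodes(fh), 1):
            print(f"episode {i}: uptime {ep[0]:.0f}s, {len(ep)} dump(s)")
```

Running `report()` inside an affected guest should list one line per stall. The uptime stamps are guest-relative, so the syslog wall-clock prefix on the same lines is what you would compare with the host's vzdump/restore times.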
This looks like the same issue that has been experienced by some Linode users after moving to KVM:
https://forum.linode.com/viewtopic.php?p=67775&sid=6159312034f76c59f8981ad183a96165
Has anyone seen this?
Any idea how to prevent it?