PCI Passthrough stops working on heavy load

t_fruba

New Member
Jan 30, 2019
Hello Everyone!

I have a server at my customer's site that runs PVE 5.1-41.
The server is an HP ProLiant DL380p G8 with 2x Xeon E5-2670 (16 cores each), 64GB RAM and 8 SAS 15k HDDs.

Inside the server there is a PCI-e ISDN telecommunication card which is passed through to one of the VMs hosted on it.

Normally everything works as desired with no problems at all, but when the server is under heavy stress (e.g. daily backups, copying large files, etc.), the passthrough stops working and the guest VM loses the card. The only way to restore it is to reboot the guest. The card is still visible on the host.

My colleagues wrote a simple script on the guest that checks whether the card is working properly and, if not, restarts the machine, but that's obviously not a real solution, as this card handles the whole office's telco traffic. So my customer calls me a few times a day saying that all the phones went dead.

Can anyone please help me out with this issue?

Thank you in advance and best regards,
Tom
 
hi,

I have server at my customer's site that runs PVE 5.1-41.
the first thing you could try is to update to a current version

also check the host syslog/dmesg and also the guest logs for anything that might be related
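
To make that log check less tedious, here is a minimal Python sketch that filters log lines for passthrough-related messages. The keyword list and the function name `find_passthrough_events` are my own assumptions for illustration, not anything Proxmox ships; adjust them to whatever your host actually logs.

```python
# Assumed keywords: typical markers of VFIO / IOMMU / PCIe port messages.
# Illustrative only, not an exhaustive or authoritative list.
DEFAULT_KEYWORDS = ("vfio", "dmar", "iommu", "pcieport")


def find_passthrough_events(lines, pci_address=None, keywords=DEFAULT_KEYWORDS):
    """Return the log lines that mention any keyword or the given PCI address.

    lines       -- iterable of log lines (e.g. from dmesg or /var/log/syslog)
    pci_address -- optional host PCI address of the passed-through card
    keywords    -- substrings to look for, matched case-insensitively
    """
    needles = [k.lower() for k in keywords]
    if pci_address:
        needles.append(pci_address.lower())

    hits = []
    for line in lines:
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            hits.append(line.rstrip("\n"))
    return hits
```

Feed it the lines of the host's dmesg output or syslog, plus the host-side PCI address of the card as shown by lspci, and compare the timestamps of any hits with the time the guest lost the card.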
 
Thank you for the reply.

I've looked at the log files on the host again but found nothing related to the PCI device.

However, I've looked at the syslog on the guest and found these lines:
Code:
Jan 30 12:20:06 asterisk kernel: [1136215.646687] dahdi: disable_span: span 1
Jan 30 12:20:06 asterisk kernel: [1136215.646708] dahdi: disable_span: span 2
Jan 30 12:20:06 asterisk kernel: [1136215.647336] allo4xxp 0000:00:10.0: RCLK source set to span 2
Jan 30 12:20:06 asterisk kernel: [1136215.647340] allo4xxp 0000:00:10.0: Recovered timing mode, RCLK set to span 2
Jan 30 12:20:06 asterisk kernel: [1136215.663890] allo4xxp 0000:00:10.0: ALLO2XXP: Disabling interrupts since there are no active spans
Jan 30 12:20:06 asterisk 'dahdi_span_config'[1877]: offline: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-2
Jan 30 12:20:06 asterisk 'dahdi_span_config'[1882]: offline: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-1
Jan 30 12:20:06 asterisk kernel: [1136215.750045] dahdi: Telephony Interface Unloaded
Jan 30 12:20:06 asterisk 'dahdi_handle_device'[1892]: remove: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0
Jan 30 12:20:06 asterisk 'dahdi_handle_device'[1894]: D: Running '/usr/share/dahdi/handle_device.d/10-span-types'
Jan 30 12:20:06 asterisk 'dahdi_handle_device'[1894]: D: Running '/usr/share/dahdi/handle_device.d/20-span-assignments'
Jan 30 12:20:12 asterisk systemd[1]: Starting LSB: Asterisk PBX...
Jan 30 12:20:12 asterisk asterisk[1906]: Starting Asterisk PBX: asterisk.
Jan 30 12:20:12 asterisk systemd[1]: Started LSB: Asterisk PBX.
===== HERE I'VE RESTARTED SERVICES =====
Jan 30 12:20:16 asterisk kernel: [1136225.765723] dahdi: Version: 2.10.0.1
Jan 30 12:20:16 asterisk kernel: [1136225.768351] dahdi: Telephony Interface Registered on major 196
Jan 30 12:20:16 asterisk kernel: [1136225.777165] allo4xxp 0000:00:10.0: 2nd gen card with initial latency of 2 and 1 ms per IRQ
Jan 30 12:20:16 asterisk kernel: [1136225.777176] allo4xxp 0000:00:10.0: Firmware Version: c01a0168
Jan 30 12:20:16 asterisk kernel: [1136225.777425] allo4xxp 0000:00:10.0: firmware: direct-loading firmware allo-dahdi-fw-2aCP2e.bin
Jan 30 12:20:16 asterisk kernel: [1136225.782067] allo4xxp 0000:00:10.0: FALC Framer Version: 3.1
Jan 30 12:20:16 asterisk kernel: [1136225.782188] allo4xxp 0000:00:10.0: Found a Allocard: Allocard 2aCP2e (2nd Gen)
Jan 30 12:20:16 asterisk kernel: [1136225.782250] allo4xxp 0000:00:10.0: VPM450: Not Present
Jan 30 12:20:16 asterisk 'dahdi_handle_device'[1993]: add: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0
Jan 30 12:20:16 asterisk 'dahdi_handle_device'[2000]: auto_assign_spans=1. Skip /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0
Jan 30 12:20:16 asterisk 'dahdi_span_config'[2014]: add: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-2
Jan 30 12:20:16 asterisk 'dahdi_span_config'[2016]: add: /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-1
Jan 30 12:20:16 asterisk 'dahdi_span_config'[2019]: auto_assign_spans=1. Skip /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-2
Jan 30 12:20:16 asterisk 'dahdi_span_config'[2020]: auto_assign_spans=1. Skip /devices/pci0000:00/0000:00:10.0/pci:0000:00:10.0/span-1
Jan 30 12:20:17 asterisk kernel: [1136226.370011] dahdi_devices pci:0000:00:10.0: local span 1 is already assigned span 1
Jan 30 12:20:17 asterisk kernel: [1136226.370018] dahdi_devices pci:0000:00:10.0: local span 2 is already assigned span 2
Jan 30 12:20:17 asterisk kernel: [1136226.426087] dahdi_echocan_mg2: Registered echo canceler 'MG2'
Jan 30 12:20:17 asterisk kernel: [1136226.427142] allo4xxp 0000:00:10.0: ALLO2XXP: Span 1 configured for CCS/HDB3/CRC4
Jan 30 12:20:17 asterisk kernel: [1136226.427281] allo4xxp 0000:00:10.0: SPAN 1: Primary Sync Source
Jan 30 12:20:17 asterisk kernel: [1136226.427523] allo4xxp 0000:00:10.0: RCLK source set to span 1
Jan 30 12:20:17 asterisk kernel: [1136226.427531] allo4xxp 0000:00:10.0: Recovered timing mode, RCLK set to span 1
Jan 30 12:20:17 asterisk kernel: [1136226.448383] allo4xxp 0000:00:10.0: ALLO2XXP: Span 2 configured for CCS/HDB3/CRC4
Jan 30 12:20:17 asterisk kernel: [1136226.448440] allo4xxp 0000:00:10.0: RCLK source set to span 1
Jan 30 12:20:17 asterisk kernel: [1136226.448446] allo4xxp 0000:00:10.0: Recovered timing mode, RCLK set to span 1
Jan 30 12:20:17 asterisk kernel: [1136226.448585] allo4xxp 0000:00:10.0: SPAN 2: Secondary Sync Source
Jan 30 12:20:17 asterisk kernel: [1136226.533796] allo4xxp 0000:00:10.0: Need to increase latency.  Estimated latency should be 3
Jan 30 12:20:17 asterisk kernel: [1136226.533875] allo4xxp 0000:00:10.0: Increased latency to 3
Jan 30 12:20:17 asterisk kernel: [1136226.802980] allo4xxp 0000:00:10.0: Need to increase latency.  Estimated latency should be 15
Jan 30 12:20:17 asterisk kernel: [1136226.803962] allo4xxp 0000:00:10.0: Increased latency to 15
Jan 30 12:20:19 asterisk kernel: [1136228.802415] allo4xxp 0000:00:10.0: Need to increase latency.  Estimated latency should be 16
Jan 30 12:20:19 asterisk kernel: [1136228.802926] allo4xxp 0000:00:10.0: Increased latency to 16
Jan 30 12:20:26 asterisk kernel: [1136235.459341] allo4xxp 0000:00:10.0: Need to increase latency.  Estimated latency should be 17
Jan 30 12:20:26 asterisk kernel: [1136235.459458] allo4xxp 0000:00:10.0: Increased latency to 17
Jan 30 12:20:27 asterisk systemd[1]: Started LSB: Asterisk PBX.
Jan 30 12:20:31 asterisk kernel: [1136240.923169] allo4xxp 0000:00:10.0: Need to increase latency.  Estimated latency should be 38
Jan 30 12:20:31 asterisk kernel: [1136240.923977] allo4xxp 0000:00:10.0: Increased latency to 38
Jan 30 12:22:23 asterisk systemd[1]: Stopping LSB: DAHDI kernel modules...
Jan 30 12:22:23 asterisk kernel: [1136353.162364] dahdi: disable_span: span 1
Jan 30 12:22:23 asterisk kernel: [1136353.162387] dahdi: disable_span: span 2
Jan 30 12:22:23 asterisk kernel: [1136353.162578] allo4xxp 0000:00:10.0: RCLK source set to span 2
Jan 30 12:22:23 asterisk kernel: [1136353.162601] allo4xxp 0000:00:10.0: Recovered timing mode, RCLK set to span 2

So it looks like the guest is losing the card as if it had simply been pulled out. Is there anywhere other than /var/log where the host's logs might be stored?
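
One thing that stands out in the log above: after the driver reloads, it raises its latency from 3 to 38 within about 15 seconds, and the spans are disabled again at 12:22:23. A rough Python sketch for pulling just those events out of a guest syslog could look like this. The regular expressions are written against the exact message formats shown above and are an assumption for any other driver version; `summarize` is a made-up name.

```python
import re

# Patterns match the syslog lines quoted above (allo4xxp / dahdi messages).
LATENCY_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8}) .*Increased latency to (\d+)"
)
DISABLE_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8}) .*dahdi: disable_span: span (\d+)"
)


def summarize(lines):
    """Return (timestamp, event, value) tuples in log order.

    event is "latency" (value = new latency) or
    "span_disabled" (value = span number).
    """
    events = []
    for line in lines:
        match = LATENCY_RE.search(line)
        if match:
            events.append((match.group(1), "latency", int(match.group(2))))
            continue
        match = DISABLE_RE.search(line)
        if match:
            events.append((match.group(1), "span_disabled", int(match.group(2))))
    return events
```

Running it over the guest syslog during a backup window would show whether the latency increases line up with the host load before the spans drop. It only reports what the driver logged and does not by itself prove the cause.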
 
maybe a hardware problem? or a power saving setting? i would try to upgrade the host first though
 
maybe a hardware problem? or a power saving setting?

That's what I'd think of in the first place, but it's obviously related to high server load. Whenever the load increases, the card goes down. What's interesting is that there's another telco card in the same server (this time a VoIP card) that's passed through to another VM, and this one works fine no matter what. The only difference is that the VM with the failing card runs Linux and the other one, which works fine, runs Windows.
 
