Server fails to boot

gdi2k

We had one of our Proxmox VE servers crash today, and it wouldn't boot back up.

I've attached screenshots. Hard disk related? We have 2 hard disks in RAID1, so I feel it's unlikely that both would have failed.

The cluster did its job and the VMs were moved to remaining servers automatically, so no major downtime. :)
 

Attachments

  • WhatsApp Image 2017-09-02 at 17.43.42.jpeg
  • WhatsApp Image 2017-09-02 at 17.49.34.jpeg
Hi,

I would boot from a live CD to check whether your mainboard is working properly.

The pictures show that both your network and your disk are failing.
If that is the case, a broken mainboard is one possible cause.
 
This looks to be hard disk related, specifically the first hard disk (system disk).

The server is installed with RAID1 config across sda and sdb (using Proxmox VE installer). I switched the boot drive to sdb and it runs successfully, but it does tend to crash again after a few hours / days.

I am seeing a lot of these errors in dmesg:

Code:
[   10.764968] ata1.00: exception Emask 0x10 SAct 0x78 SErr 0x280100 action 0x6 frozen
[   10.764993] ata1.00: irq_stat 0x08000000, interface fatal error
[   10.765008] ata1: SError: { UnrecovData 10B8B BadCRC }
[   10.765022] ata1.00: failed command: READ FPDMA QUEUED
[   10.765036] ata1.00: cmd 60/98:18:60:9e:d8/00:00:02:00:00/40 tag 3 ncq 77824 in
         res 40/00:28:c0:51:f2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   10.765069] ata1.00: status: { DRDY }
[   10.765079] ata1.00: failed command: READ FPDMA QUEUED
[   10.765092] ata1.00: cmd 60/00:20:a0:9c:ec/01:00:00:00:00/40 tag 4 ncq 131072 in
         res 40/00:28:c0:51:f2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   10.765124] ata1.00: status: { DRDY }
[   10.765134] ata1.00: failed command: READ FPDMA QUEUED
[   10.765147] ata1.00: cmd 60/c0:28:c0:51:f2/00:00:02:00:00/40 tag 5 ncq 98304 in
         res 40/00:28:c0:51:f2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   10.765179] ata1.00: status: { DRDY }
[   10.765473] ata1.00: failed command: READ FPDMA QUEUED
[   10.765763] ata1.00: cmd 60/20:30:58:5e:b8/00:00:02:00:00/40 tag 6 ncq 16384 in
         res 40/00:28:c0:51:f2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   10.766343] ata1.00: status: { DRDY }
[   10.766639] ata1: hard resetting link
[   11.085003] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   11.086629] ata1.00: supports DRM functions and may not be fully accessible
[   11.086885] ata1.00: disabling queued TRIM support
[   11.087432] ata1.00: supports DRM functions and may not be fully accessible
[   11.087611] ata1.00: disabling queued TRIM support
[   11.087926] ata1.00: configured for UDMA/133
[   11.087934] ata1: EH complete
[   11.116938] ata1.00: exception Emask 0x10 SAct 0x60000000 SErr 0x280100 action 0x6 frozen
[   11.117266] ata1.00: irq_stat 0x08000000, interface fatal error
[   11.117599] ata1: SError: { UnrecovData 10B8B BadCRC }
[   11.117931] ata1.00: failed command: READ FPDMA QUEUED
[   11.118271] ata1.00: cmd 60/00:e8:a0:3b:b8/01:00:02:00:00/40 tag 29 ncq 131072 in
         res 40/00:e8:a0:3b:b8/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   11.118983] ata1.00: status: { DRDY }
[   11.119348] ata1.00: failed command: READ FPDMA QUEUED
[   11.119723] ata1.00: cmd 60/00:f0:a0:3c:b8/01:00:02:00:00/40 tag 30 ncq 131072 in
         res 40/00:e8:a0:3b:b8/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   11.120548] ata1.00: status: { DRDY }
[   11.121012] ata1: hard resetting link
[   11.444933] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   11.446586] ata1.00: supports DRM functions and may not be fully accessible
[   11.446871] ata1.00: disabling queued TRIM support
[   11.447438] ata1.00: supports DRM functions and may not be fully accessible
[   11.447617] ata1.00: disabling queued TRIM support
[   11.447947] ata1.00: configured for UDMA/133
[   11.447954] ata1: EH complete
[   11.460435] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[   11.463566] XFS (sdc1): Mounting V4 Filesystem
[   11.476903] ata1: limiting SATA link speed to 3.0 Gbps
[   11.476906] ata1.00: exception Emask 0x10 SAct 0x8000000 SErr 0x280100 action 0x6 frozen
[   11.477374] ata1.00: irq_stat 0x08000000, interface fatal error
[   11.477873] ata1: SError: { UnrecovData 10B8B BadCRC }
[   11.478320] ata1.00: failed command: READ FPDMA QUEUED
[   11.478768] ata1.00: cmd 60/f0:d8:d8:4b:f2/00:00:02:00:00/40 tag 27 ncq 122880 in
         res 40/00:d8:d8:4b:f2/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
[   11.479652] ata1.00: status: { DRDY }
[   11.480098] ata1: hard resetting link
[   11.485963] XFS (sdc1): Starting recovery (logdev: internal)
[   11.493679] XFS (sdc1): Ending recovery (logdev: internal)
[   11.796903] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   11.798510] ata1.00: supports DRM functions and may not be fully accessible
[   11.798772] ata1.00: disabling queued TRIM support
[   11.799405] ata1.00: supports DRM functions and may not be fully accessible
[   11.799614] ata1.00: disabling queued TRIM support
[   11.799980] ata1.00: configured for UDMA/133
[   11.799986] ata1: EH complete
[   12.017924] systemd-sysv-generator[2802]: Ignoring creation of an alias umountiscsi.service for itself
[   12.053838] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   12.056605] systemd-sysv-generator[2839]: Ignoring creation of an alias umountiscsi.service for itself
[   12.069394] ip_set: protocol 6
[   12.088613] systemd-sysv-generator[2854]: Ignoring creation of an alias umountiscsi.service for itself
[   12.132898] XFS (sdb1): Mounting V4 Filesystem
[   12.148120] XFS (sdb1): Starting recovery (logdev: internal)
[   12.164455] XFS (sdb1): Ending recovery (logdev: internal)
[   12.258317] systemd-sysv-generator[2889]: Ignoring creation of an alias umountiscsi.service for itself
[   12.289102] systemd-sysv-generator[2901]: Ignoring creation of an alias umountiscsi.service for itself
[   12.319213] systemd-sysv-generator[2913]: Ignoring creation of an alias umountiscsi.service for itself
[   32.746598] XFS (sdd1): Mounting V4 Filesystem
[   32.763609] XFS (sdd1): Starting recovery (logdev: internal)
[   32.781957] XFS (sdd1): Ending recovery (logdev: internal)
[   32.876777] systemd-sysv-generator[3292]: Ignoring creation of an alias umountiscsi.service for itself
[   32.908547] systemd-sysv-generator[3304]: Ignoring creation of an alias umountiscsi.service for itself
[   32.944236] systemd-sysv-generator[3316]: Ignoring creation of an alias umountiscsi.service for itself

So I would like to replace the sda drive. What is the procedure for this (given the RAID1 setup from the installer)? I would like to avoid reinstalling PVE from scratch if possible.
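
Before swapping anything, it can help to confirm that the errors really cluster on one port. Below is a rough Python sketch, not an official tool, that tallies the `exception` and `SError` lines from dmesg per ATA port. The function name `summarize_ata_errors` and the regular expressions are my own assumptions, matched only against the log format shown above. Note that flags like BadCRC and 10B8B are reported at the link level, so the cable or port may be worth checking as well as the disk itself.

```python
import re
from collections import Counter

# Matches e.g. "[   10.764968] ata1.00: exception Emask 0x10 ..."
EXCEPTION_RE = re.compile(r"\]\s*(ata\d+)(?:\.\d+)?: exception Emask")
# Matches e.g. "[   10.765008] ata1: SError: { UnrecovData 10B8B BadCRC }"
SERROR_RE = re.compile(r"\]\s*(ata\d+): SError: \{ (.*?) \}")


def summarize_ata_errors(dmesg_text):
    """Count exception events and SError flags per ATA port (hypothetical helper)."""
    exceptions = Counter()
    flags = {}
    for line in dmesg_text.splitlines():
        m = EXCEPTION_RE.search(line)
        if m:
            exceptions[m.group(1)] += 1
            continue
        m = SERROR_RE.search(line)
        if m:
            # Split "UnrecovData 10B8B BadCRC" into individual flag names.
            flags.setdefault(m.group(1), Counter()).update(m.group(2).split())
    return exceptions, flags


# A few lines copied from the dmesg output above.
sample = """\
[   10.764968] ata1.00: exception Emask 0x10 SAct 0x78 SErr 0x280100 action 0x6 frozen
[   10.765008] ata1: SError: { UnrecovData 10B8B BadCRC }
[   11.116938] ata1.00: exception Emask 0x10 SAct 0x60000000 SErr 0x280100 action 0x6 frozen
[   11.117599] ata1: SError: { UnrecovData 10B8B BadCRC }
"""

exceptions, flags = summarize_ata_errors(sample)
print(exceptions)
print(flags)
```

To run it against a live system, the `sample` string could be replaced with the saved output of `dmesg`.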
 
I have a similar error: my Proxmox server starts to freeze, then I lose connectivity and have to reboot it manually.

When I start my node again, I get these messages:

# dmesg | grep ata8
[ 8.097820] ata8: SATA max UDMA/133 abar m524288@0x92200000 port 0x92200180 irq 68
[ 8.411662] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 8.412367] ata8.00: supports DRM functions and may not be fully accessible
[ 8.412392] ata8.00: ATA-10: CT500MX500SSD1, M3CR020, max UDMA/133
[ 8.412394] ata8.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 8.412533] ata8.00: READ LOG DMA EXT failed, trying PIO
[ 8.412534] ata8.00: failed to get Identify Device Data, Emask 0x40
[ 8.412535] ata8.00: ATA Identify Device Log not supported
[ 8.412536] ata8.00: Security Log not supported
[ 8.412538] ata8.00: failed to set xfermode (err_mask=0x40)
[ 13.855233] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 13.856024] ata8.00: supports DRM functions and may not be fully accessible
[ 13.856632] ata8.00: supports DRM functions and may not be fully accessible
[ 13.857113] ata8.00: configured for UDMA/133


[ 6.695827] pci 0000:01:00.0: BAR 7: failed to assign [mem size 0x00100000 64bit]
[ 6.695831] pci 0000:01:00.0: BAR 10: failed to assign [mem size 0x00100000 64bit]
[ 6.695834] pci 0000:01:00.1: BAR 7: failed to assign [mem size 0x00100000 64bit]
[ 6.695837] pci 0000:01:00.1: BAR 10: failed to assign [mem size 0x00100000 64bit]
[ 8.412533] ata8.00: READ LOG DMA EXT failed, trying PIO
[ 8.412534] ata8.00: failed to get Identify Device Data, Emask 0x40
[ 8.412538] ata8.00: failed to set xfermode (err_mask=0x40)
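
To make the "SATA link up" lines easier to read, here is a small Python sketch that decodes the SStatus value. It assumes the kernel prints the register in hex and uses the standard SATA layout (bits 0-3 DET, bits 4-7 SPD, bits 8-11 IPM), which agrees with the speeds printed in the logs in this thread. The names `decode_sstatus` and `link_speed` are made up for illustration.

```python
import re

# Negotiated speed (SPD field) to human-readable rate.
SPEEDS = {0: "no link", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

# Matches e.g. "ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"
LINK_RE = re.compile(
    r"(ata\d+): SATA link up .*\(SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)\)"
)


def decode_sstatus(value):
    """Split an SStatus register value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # device detection state
        "spd": (value >> 4) & 0xF,  # negotiated link speed generation
        "ipm": (value >> 8) & 0xF,  # interface power management state
    }


def link_speed(line):
    """Return (port, speed) for a 'SATA link up' line, or None if it does not match."""
    m = LINK_RE.search(line)
    if m is None:
        return None
    fields = decode_sstatus(int(m.group(2), 16))  # value is assumed to be hex
    return m.group(1), SPEEDS.get(fields["spd"], "unknown")


print(link_speed("[ 8.411662] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"))
print(link_speed("[   11.796903] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)"))
```

A port whose decoded speed drops between resets, as ata1 does earlier in this thread, is being throttled by the kernel after repeated link errors.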


# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
ceph: 12.2.8-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1



Can somebody please help me understand the issue?

Thanks
 
