HEALTH_ERR with BlueStore

hidalgo

Well-Known Member
Nov 11, 2016
60
0
46
57
Since a few days I got a issue with a damaged PG.
Code:
# ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 3 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.25 is active+clean+inconsistent, acting [2,3,5]

Unfortunately this didn’t help
Code:
# ceph pg repair 1.25

What to do?

BTW:
Code:
proxmox-ve: 5.0-24 (running kernel: 4.10.17-4-pve)
pve-manager: 5.0-32 (running version: 5.0-32/2560e073)
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-14
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-18
libpve-guest-common-perl: 2.0-12
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve18~bpo90
ceph: 12.2.1-pve1
 
Greetings!
I am having the same problem after the migration.

Code:
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.59-1-pve: 4.4.59-87
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.1-pve3
 
Hello,

I have the same issue...

- 3 pve cluster with same hardware
- CEPH with 3 OSD (2 HHD and 1 SSD)
- System installed on another ssd (zfs)
- I only have 32GB of RAM in these servers and they are using around 70% of it and swapping a little bit even with swappiness at 0 (the issue seems to be related to ram and swap)
- CEPH seams to use lot of memory : the cluster is not so much loaded, (relatively light VMs) but it uses 70% of the RAM while the whole VMs RAM attribution (and they don't use it all !) is less than 30%

Software :
pve-manager/5.1-46/ae8241d4 (running kernel: 4.13.16-1-pve)
ceph version 12.2.4 (4832b6f0acade977670a37c20ff5dbe69e727416) luminous (stable)

The error occurs almost once a day everyday and completely random, different OSD, didn't see a hardware relation...

You should try not to repair but to issue a new deep scrub on the faulty pg (ceph pg deep-scrub X.XX) for me it solves the problem...
You also can try to downgrade the kernel to 4.10 (witch seem to not be affected)

More informations here : http://tracker.ceph.com/issues/22464
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!