Hi,
today one of my VMs (backed by LVM thin, which in turn sits on a RAID) ran out of disk space. The only VM without disk monitoring :-((
It seems this has taken down the other VMs in my thin pool as well; several I/O errors occurred.
My LVM thin pool is not overprovisioned: the total space allocated to the VMs is 871 GB and the pool size is 892 GB.
When I look at the storage in the Proxmox GUI, the used VM space is only 501 GB (56%) of the 892 GB.
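To double-check the fill level outside the GUI, I think it can be read directly with lvs (a sketch using my VG/pool names, ssd/ssd, taken from the dmsetup output below):
Code:
# data_percent / metadata_percent show how full the pool and each thin LV really are
lvs -a -o lv_name,lv_size,data_percent,metadata_percent ssd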
In my syslog I found a "switching pool to out-of-data-space" message, and this was the beginning of the end...
Code:
Nov 25 05:38:37 wpx3 kernel: [85437823.532713] device-mapper: thin: 251:3: switching pool to out-of-data-space (queue IO) mode
Nov 25 05:38:37 wpx3 lvm[1052]: Thin ssd-ssd-tpool is now 100% full.
After that, several I/O errors occurred on the other VMs. They start exactly 60 seconds after the mode switch, which matches dm-thin's default no_space_timeout for queued I/O.
Code:
Nov 25 05:39:37 wpx3 kernel: [85437883.566354] Buffer I/O error on device dm-12, logical block 23020159
Nov 25 05:39:37 wpx3 kernel: [85437883.566614] Buffer I/O error on device dm-6, logical block 22077699
Nov 25 05:39:37 wpx3 kernel: [85437883.567171] EXT4-fs warning (device dm-6): ext4_end_bio:330: I/O error -28 writing to inode 6947209 (offset 343932928 size 8388608 starting block 22078720)
Nov 25 05:39:40 wpx3 kernel: [85437887.228228] JBD2: Detected IO errors while flushing file data on dm-6-8
Nov 25 05:39:41 wpx3 kernel: [85437888.145104] JBD2: Detected IO errors while flushing file data on dm-6-8
...
Nov 25 06:24:03 wpx3 kernel: [85440550.222865] dm-9: rw=0, want=35262332145712, limit=16777216
Nov 25 06:24:03 wpx3 kernel: [85440550.226850] dm-9: rw=0, want=35262332145712, limit=16777216
Nov 25 06:24:03 wpx3 kernel: [85440550.230841] attempt to access beyond end of device
The error reports "ssd-ssd-tpool (251:3)"; that means the thin pool itself is full, right? (Error -28 in the EXT4 warning is ENOSPC, which seems to confirm it.)
Code:
root@wpx3:/var/log# dmsetup ls --tree
ssd-vm--111--disk--1 (251:12)
└─ssd-ssd-tpool (251:3)
├─ssd-ssd_tdata (251:2)
│ └─ (8:17)
└─ssd-ssd_tmeta (251:1)
└─ (8:17)
ssd-vm--126--disk--1 (251:9)
└─ssd-ssd-tpool (251:3)
├─ssd-ssd_tdata (251:2)
│ └─ (8:17)
└─ssd-ssd_tmeta (251:1)
└─ (8:17)
...
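To verify that, I believe the pool's own counters can be read with dmsetup status (a sketch; the used/total pairs are the thin-pool target's documented status fields):
Code:
# thin-pool status line: <transaction id> <used meta>/<total meta> <used data>/<total data> ...
# used data == total data would mean the pool is completely full
dmsetup status ssd-ssd-tpool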
Any ideas? I don't know what to do here...
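My first thought would be to grow the pool so that pending writes can complete (a sketch, assuming the pool is ssd/ssd as in the tree above and the VG still has free extents; the sizes are placeholders):
Code:
# extend the thin pool's data LV (placeholder size)
lvextend -L +20G ssd/ssd
# give the pool's metadata LV some headroom too (placeholder size)
lvextend --poolmetadatasize +1G ssd/ssd
Is that the right approach, or does more need to happen once the pool has already thrown I/O errors?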
Luckily I didn't lose any data, because the VMs are all part of clusters and sync the current state when they start up again.
Thx for your help.