[Help] Finding out what dm-18 is

Ramalama

Hi Boys & Girls,
I had an issue where my Proxmox VMs and containers froze.

The root cause is a volume that suddenly became unavailable; the problem is just that I can't find the volume:

Error:
Code:
Apr 1 04:00:03 proxmox kernel: EXT4-fs (dm-18): write access unavailable, skipping orphan cleanup
Apr 1 04:00:21 proxmox kernel: EXT4-fs (dm-18): write access unavailable, skipping orphan cleanup
Apr 1 04:00:34 proxmox kernel: [729785.190831] EXT4-fs (dm-18): unmounting filesystem b101ba58-704a-476e-80f2-d1e7de7a4308.
Apr 1 04:00:34 proxmox kernel: [729785.383702] dm-18: detected capacity change from 33554432 to 0
Apr 1 04:00:35 proxmox kernel: EXT4-fs (dm-18): write access unavailable, skipping orphan cleanup
Apr 1 04:01:05 proxmox kernel: [729815.652847] dm-18: detected capacity change from 50331648 to 0
Apr 1 04:01:06 proxmox kernel: EXT4-fs (dm-18): write access unavailable, skipping orphan cleanup
Apr 1 04:01:06 proxmox kernel: [729816.773933] EXT4-fs (dm-18): mounted filesystem 25a9a4e6-4588-491c-9e85-4aea535b9218 without journal. Quota mode: none.
Apr 1 04:22:15 proxmox kernel: EXT4-fs (dm-18): write access unavailable, skipping orphan cleanup

My Volumes:
Code:
lsblk --output NAME,KNAME,TYPE,SIZE,MOUNTPOINT
NAME                         KNAME     TYPE   SIZE MOUNTPOINT
sda                          sda       disk 465.8G
├─sda1                       sda1      part   360G
└─sda2                       sda2      part    64G
sdb                          sdb       disk 953.9G
├─sdb1                       sdb1      part 953.9G
└─sdb9                       sdb9      part     8M
sdc                          sdc       disk 953.9G
├─sdc1                       sdc1      part 953.9G
└─sdc9                       sdc9      part     8M
sdd                          sdd       disk   5.5T
├─sdd1                       sdd1      part   5.5T
└─sdd9                       sdd9      part     8M
sde                          sde       disk   5.5T
├─sde1                       sde1      part   5.5T
└─sde9                       sde9      part     8M
sdf                          sdf       disk   5.5T
├─sdf1                       sdf1      part   5.5T
└─sdf9                       sdf9      part     8M
sdg                          sdg       disk  18.2T
└─sdg1                       sdg1      part  18.2T /mnt/pve/USB-20TB
zd0                          zd0       disk   162G
├─zd0p1                      zd0p1     part   500M
└─zd0p2                      zd0p2     part 161.5G
zd16                         zd16      disk    32G
├─zd16p1                     zd16p1    part   100M
├─zd16p2                     zd16p2    part    16M
├─zd16p3                     zd16p3    part  31.4G
└─zd16p4                     zd16p4    part   515M
zd32                         zd32      disk   120G
├─zd32p1                     zd32p1    part   549M
└─zd32p2                     zd32p2    part 119.5G
zd48                         zd48      disk    64G
├─zd48p1                     zd48p1    part    63G
├─zd48p2                     zd48p2    part     1K
└─zd48p5                     zd48p5    part   975M
zd64                         zd64      disk   120G
├─zd64p1                     zd64p1    part   512M
└─zd64p2                     zd64p2    part 119.5G
zd80                         zd80      disk     1M
zd96                         zd96      disk   128G
├─zd96p1                     zd96p1    part   100M
├─zd96p2                     zd96p2    part    16M
├─zd96p3                     zd96p3    part 127.3G
└─zd96p4                     zd96p4    part   625M
nvme0n1                      nvme0n1   disk 465.8G
├─nvme0n1p1                  nvme0n1p1 part  1007K
├─nvme0n1p2                  nvme0n1p2 part   512M /boot/efi
└─nvme0n1p3                  nvme0n1p3 part 465.3G
  ├─pve-swap                 dm-0      lvm      8G [SWAP]
  ├─pve-root                 dm-1      lvm     96G /
  ├─pve-data_tmeta           dm-2      lvm    3.5G
  │ └─pve-data-tpool         dm-4      lvm  338.4G
  │   ├─pve-data             dm-5      lvm  338.4G
  │   ├─pve-vm--102--disk--0 dm-6      lvm     16G
  │   ├─pve-vm--103--disk--0 dm-7      lvm     24G
  │   ├─pve-vm--104--disk--0 dm-8      lvm     16G
  │   ├─pve-vm--105--disk--0 dm-9      lvm     32G
  │   ├─pve-vm--109--disk--0 dm-10     lvm     32G
  │   ├─pve-vm--112--disk--0 dm-11     lvm     64G
  │   ├─pve-vm--101--disk--0 dm-12     lvm     24G
  │   ├─pve-vm--113--disk--0 dm-13     lvm      4M
  │   ├─pve-vm--113--disk--1 dm-14     lvm      4M
  │   ├─pve-vm--121--disk--0 dm-15     lvm     48G
  │   ├─pve-vm--115--disk--0 dm-16     lvm      4M
  │   └─pve-vm--115--disk--1 dm-17     lvm     32G
  └─pve-data_tdata           dm-3      lvm  338.4G
    └─pve-data-tpool         dm-4      lvm  338.4G
      ├─pve-data             dm-5      lvm  338.4G
      ├─pve-vm--102--disk--0 dm-6      lvm     16G
      ├─pve-vm--103--disk--0 dm-7      lvm     24G
      ├─pve-vm--104--disk--0 dm-8      lvm     16G
      ├─pve-vm--105--disk--0 dm-9      lvm     32G
      ├─pve-vm--109--disk--0 dm-10     lvm     32G
      ├─pve-vm--112--disk--0 dm-11     lvm     64G
      ├─pve-vm--101--disk--0 dm-12     lvm     24G
      ├─pve-vm--113--disk--0 dm-13     lvm      4M
      ├─pve-vm--113--disk--1 dm-14     lvm      4M
      ├─pve-vm--121--disk--0 dm-15     lvm     48G
      ├─pve-vm--115--disk--0 dm-16     lvm      4M
      └─pve-vm--115--disk--1 dm-17     lvm     32G

Code:
lvdisplay |awk  '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n;}'
dm-0 swap
dm-1 root
dm-5 data
dm-6 vm-102-disk-0
dm-7 vm-103-disk-0
dm-8 vm-104-disk-0
dm-9 vm-105-disk-0
dm-10 vm-109-disk-0
dm-11 vm-112-disk-0
dm-12 vm-101-disk-0
dm-13 vm-113-disk-0
dm-14 vm-113-disk-1
dm-15 vm-121-disk-0
dm-16 vm-115-disk-0
dm-17 vm-115-disk-1
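
As a side note, the same dm-to-name mapping can also be read directly from device-mapper, but of course only while the device still exists; a minimal check would be something like:
Code:
# List device-mapper names with their major:minor numbers
dmsetup ls
# Or read the name of one specific node from sysfs (only works while dm-18 exists)
cat /sys/block/dm-18/dm/name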

My Containers/VMs:
Code:
qm list && pct list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 SAGE                 running    8096             120.00 7629
       110 Mathias-PC           stopped    8096             162.00 0
       111 Marianna-PC          stopped    8192             120.00 0
       113 Terminal-SRV         stopped    8092             128.00 0
       115 Firewall             running    4096              32.00 3847
       119 Win-SRV              stopped    2048              32.00 0
       122 sip                  running    4096              64.00 15667
VMID       Status     Lock         Name
101        running                 pihole
102        running                 Linux-SRV
103        running                 NC
104        running                 plex
105        running                 MySQL
106        running                 zigbee2mqtt
107        running                 Grafana
108        stopped                 mgmt
109        running                 SmartHome
112        running                 Docker
114        running                 Unifi
118        stopped                 Filme
120        running                 linux-arch
121        running                 Docker-Public
124        stopped                 Linux-New

---------------

So in short, nothing is missing; everything is there.
After the dm-18 crash I had to reboot my server, and everything is running normally again without issues...
But I can't find out what dm-18 is. Does anyone have any idea what it could be, or how I can find it?
Could it be a corosync quorum device? I'm not sure how that works, but that's the only thing that comes to mind.

Thanks for the help, and cheers :)
 
Which ext4 filesystem has UUID b101ba58-704a-476e-80f2-d1e7de7a4308? Try lsblk -o +UUID.
Apparently there were write errors and the filesystem got remounted read-only. The write errors might be caused by a thin virtual disk that could not grow because the storage is full. Remounting read-only is often enabled for root filesystems.
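
For example, something along these lines (the exact column list is just a suggestion):
Code:
# Show the UUID column in addition to the default lsblk output
lsblk -o +UUID
# Or search for that specific UUID directly
blkid | grep -i b101ba58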
 
Code:
/dev/mapper/pve-vm--102--disk--0: UUID="b101ba58-704a-476e-80f2-d1e7de7a4308" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/pve-vm--105--disk--0: UUID="25a9a4e6-4588-491c-9e85-4aea535b9218" BLOCK_SIZE="4096" TYPE="ext4"

But that doesn't quite make sense, since after the reboot those UUIDs map to:
Code:
dm-6 vm-102-disk-0
dm-9 vm-105-disk-0

However, the underlying device of both disks is a 980 Pro:
Code:
/dev/nvme0n1
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        38 Celsius
Available Spare:                    89%
Available Spare Threshold:          10%
Percentage Used:                    18%
Data Units Read:                    35,766,753 [18.3 TB]
Data Units Written:                 32,101,599 [16.4 TB]
Host Read Commands:                 507,120,609
Host Write Commands:                1,049,766,673
Controller Busy Time:               11,625
Power Cycles:                       95
Power On Hours:                     19,245
Unsafe Shutdowns:                   47
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               38 Celsius
Temperature Sensor 2:               39 Celsius

Used space of the LVM partition (/dev/nvme0n1p3): 55.17% (200.44 GB of 363.31 GB)
Used space of dm-6 (vm-102-disk-0): 18.81% (2.93 GiB of 15.58 GiB)
Used space of dm-9 (vm-105-disk-0): 5.81% (1.81 GiB of 31.20 GiB)
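
For reference, the thin pool and thin volume fill levels can also be read with lvs; roughly (assuming the VG is called pve, as in the lsblk output above):
Code:
# data_percent/metadata_percent show how full the thin pool and thin LVs are
lvs -a -o lv_name,lv_size,data_percent,metadata_percent pve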

So in short, while the error messages with those UUIDs seem to be related to dm-18, I honestly think that's misleading.
Probably a side effect of something that crashed. Not sure.

But as you can see, the storages aren't full, neither the LVM volumes themselves nor the partition the LVM volumes live on.

Thanks for the reply btw! xD
 
I'm getting further: I just looked at the time when it happens and went through all my older logs:

Code:
zcat /var/log/messages* | grep dm-18
gzip: /var/log/messages: not in gzip format
gzip: /var/log/messages.1: not in gzip format
Mar 18 04:00:24 proxmox kernel: [831555.025919] EXT4-fs (dm-18): unmounting filesystem.
Mar 18 04:21:39 proxmox kernel: [832829.669932] EXT4-fs (dm-18): mounted filesystem without journal. Quota mode: none.
Mar 11 04:01:05 proxmox kernel: [226808.701484] EXT4-fs (dm-18): unmounting filesystem.
Mar 11 04:01:06 proxmox kernel: [226810.043708] EXT4-fs (dm-18): mounted filesystem without journal. Quota mode: none.
Mar 11 04:01:15 proxmox kernel: [226819.238885] EXT4-fs (dm-18): unmounting filesystem.
Mar  4 04:00:02 proxmox kernel: [919630.963528] EXT4-fs (dm-18): mounted filesystem without journal. Quota mode: none.
Mar  4 04:00:19 proxmox kernel: [919647.635609] EXT4-fs (dm-18): unmounting filesystem.
Mar  4 04:00:35 proxmox kernel: [919663.226281] EXT4-fs (dm-18): unmounting filesystem.
Mar  4 04:00:36 proxmox kernel: [919664.355965] EXT4-fs (dm-18): mounted filesystem without journal. Quota mode: none.
Mar  4 04:01:04 proxmox kernel: [919692.022282] EXT4-fs (dm-18): unmounting filesystem.
Mar  4 04:01:13 proxmox kernel: [919701.210603] EXT4-fs (dm-18): unmounting filesystem.
Mar  4 04:21:00 proxmox kernel: [920888.381291] EXT4-fs (dm-18): unmounting filesystem.
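
Side note: the "not in gzip format" messages come from the uncompressed files; zcat -f (or zgrep) handles plain and gzipped logs alike, e.g.:
Code:
# -f passes uncompressed files through unchanged
zcat -f /var/log/messages* | grep dm-18
# or simply
zgrep dm-18 /var/log/messages*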

It always happens at 4 am, so it's this backup job:
[Screenshot 2023-04-01 17:51: backup job schedule]

-----
So the unmounting and mounting is fine.

So that means dm-18 is a device that gets created while the volumes are being backed up?
It's getting more and more confusing for me...
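
If it helps to confirm the correlation, the 04:00 window could be cross-checked against backup activity in the journal; a rough sketch (timestamps are just examples):
Code:
journalctl --since "2023-04-01 03:55" --until "2023-04-01 04:30" | grep -Ei "vzdump|dm-18|snap"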
 
Sorry to spam, but I found the root cause of my crash.

At 6 am my firewall gets backed up, but it was being stopped and backed up instead of snapshotted.
That always seemed to go fine, but today the firewall (OPNsense) had issues after it started again. I have now changed the backup job to snapshot it instead of stopping it, so hopefully that won't happen again.
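
For reference, a one-off backup of the firewall in snapshot mode would look roughly like this (the storage name is just an example):
Code:
# 115 = Firewall VM; --storage is an example target
vzdump 115 --mode snapshot --storage USB-20TB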

However, since the firewall didn't come up again, it led to a completely unroutable network here.
I have a lot of VLANs that are routed by OPNsense, that's why...
So Proxmox didn't actually crash, just the firewall.

Sorry for the confusion.

However, I'm still not absolutely sure what dm-18 is, since it only shows up (or gets created) during backup jobs.

So the question should rather be:
Is a new dm device created for backup jobs?
And why do we get errors like "write access unavailable, skipping orphan cleanup"?
 
Hi, your backup job is configured to run in snapshot mode. When backing up a container in snapshot mode, vzdump creates temporary storage-level snapshots of the container volumes -- see the docs [1] for more details. In your case, I guess the underlying storage is an LVM thin pool? Then, I suppose the temporary snapshot corresponds to the temporary dm-18 device you're seeing. If you want to look into this more, you could (1) check the backup job log, it should contain lines like Logical volume "snap_vm-102-disk-0_vzdump" created. (2) run udevadm monitor while running the backup job, it should print the add/remove events of dm-18.
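
For example, a rough sketch of the udevadm check, plus an lvs call to see the temporary snapshot while it exists:
Code:
# Watch kernel block-device add/remove events during the backup
udevadm monitor --kernel --subsystem-match=block
# List LVs in the pve VG with their snapshot origins and thin usage
lvs -a -o lv_name,origin,data_percent pve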

The "write access unavailable, skipping orphan cleanup" message is apparently just a byproduct of the snapshot being read-only, and you can ignore it [2].

Hope this helps!

[1]: https://pve.proxmox.com/pve-docs/vzdump.1.html#_backup_modes (under "Backup modes for Containers")
[2]: https://forum.proxmox.com/threads/46785/
 