Container with RBD disks cannot be started after shutdown

ericchu630

New Member
Nov 24, 2019
Hi,
I am running PVE 6.0-9


I have a container with two ceph disks.
rootfs is 8G and mp0 is 10T.
This is a mail server, so I need to migrate old mail into mp0, the data disk.
Everything goes smoothly until I press the shutdown button in the web console.

After that, I cannot pct start and I cannot pct fsck the container.
When I try, I get this error in dmesg:
EXT4-fs warning (device rbd4): read_mmp_block:111: Error -117 while reading MMP block 9255

rbd4 is mp0 and is the problematic file system.
rootfs does not show any error.

I can see the device in /dev/.
When I run fsck /dev/rbd4, fsck reports an invalid superblock magic.
If I let fsck fix the errors, the container still will not start, and the same errors show up again on the next fsck.
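
For reference, these are roughly the commands I am running (container ID 105, with rbd4 being the device that mp0 maps to on my node):

pct start 105                # fails, and the MMP warning above shows up in dmesg
pct fsck 105 --device mp0    # fails the same way
fsck /dev/rbd4               # reports the invalid superblock magic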

The only way I can recover is to replace mp0 with another RBD disk and mount that instead (roughly as shown below).
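
By that I mean allocating a fresh volume for mp0 on the same storage, roughly like this (10240 GiB matching the 10T size; the corrupted volume is then left behind as an unused disk):

pct set 105 -mp0 "ceph-ct:10240,mp=/mnt/maildata,mountoptions=nodev;noatime"   # allocate a new 10T volume and mount it at the old path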

Please let me know what I should do and how I can avoid this from happening again.
Thank you very much for your kind help.
Best Regards,
Eric
 
How is the container configured (pct config <id>)? Once the container is stopped, can you migrate it to a different node and try to start it there?
 
arch: amd64
cores: 8
hostname: ct-mail
memory: 49152
mp0: ceph-ct:vm-105-disk-1,mp=/mnt/maildata,mountoptions=nodev;noatime,size=10T
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=xxxxx,ip=dhcp,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=xxxxx,ip=192.xxxxx/24,ip6=dhcp,type=veth
ostype: centos
parent: installed
rootfs: ceph-ct:vm105-105-0,size=8G
swap: 49152
unprivileged: 1
 
I migrated the container to other nodes, but the same error still shows up in dmesg:

EXT4-fs warning (device rbd4): read_mmp_block:111: Error -117 while reading MMP block 9255

Please help!
 
Thanks, but as mentioned above, I have already tried that and it didn't help.

Can you advise why the superblock is so easily corrupted (when the container is inadvertently shut down or restarted)?
This only happens with Ceph disks, not local disks.
I have tried with default mount options and with non-default mount options (as shown above).
I have tried with Ubuntu Buster and CentOS 7.

Unfortunately I have suffered production data loss due to this.
And I really hope to prevent that from happening again.
 
Can you advise why the superblock is so easily corrupted?
Not really. But maybe something was logged in the journal/syslog (also inside the CT).
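
For example, something along these lines (assuming container ID 105):

journalctl -b -u pve-container@105    # messages from the container's unit on the host
pct enter 105                         # then check the logs under /var/log inside the CT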
 
Is this problem common or is it something rare?
Am I doing something wrong?
Not that I know of, and I don't think you are doing anything out of the ordinary. It may be some hardware fault that caused it, but without anything in the logs it is hard to tell.
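
One more thing that may be worth checking after such an unclean shutdown is the cluster state and whether the image still has a stale watcher or lock left over (assuming the Ceph pool behind the ceph-ct storage is also named ceph-ct):

ceph -s                               # overall cluster health
rbd status ceph-ct/vm-105-disk-1      # clients still watching the image
rbd lock ls ceph-ct/vm-105-disk-1     # locks left behind on the image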
 
