Container with RBD disks cannot be started after shutdown

ericchu630

New Member
Nov 24, 2019
Hi,
I am running PVE 6.0-9


I have a container with two ceph disks.
rootfs is 8G and mp0 is 10T.
This is a mail server, so I need to migrate old mail into mp0, the data disk.
Everything goes smoothly until I press the shutdown button in the web console.

After that, I cannot pct start and I cannot pct fsck the container.
When I try, I get this error in dmesg:
EXT4-fs warning (device rbd4): read_mmp_block:111: Error -117 while reading MMP block 9255

rbd4 is mp0 and is the problematic file system.
rootfs does not show any error.

I can see the device in /dev/.
When I run fsck /dev/rbd4, fsck reports an invalid superblock magic.
If I let fsck fix the errors, the container still will not start, and the same errors show up again on the next fsck.
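
For reference, these are roughly the commands I am running (container ID 105, with rbd4 being the device that mp0 maps to on my node):

pct start 105                # fails, and the MMP warning above shows up in dmesg
pct fsck 105 --device mp0    # fails the same way
fsck /dev/rbd4               # reports the invalid superblock magic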

The only way I can recover is to replace mp0 with another RBD disk and mount that instead (roughly as shown below).
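
By that I mean allocating a fresh volume for mp0 on the same storage, roughly like this (10240 GiB matching the 10T size; the corrupted volume is then left behind as an unused disk):

pct set 105 -mp0 "ceph-ct:10240,mp=/mnt/maildata,mountoptions=nodev;noatime"   # allocate a new 10T volume and mount it at the old path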

Please let me know what I should do and how I can avoid this from happening again.
Thank you very much for your kind help.
Best Regards,
Eric
 
How is the container configured (pct config <id>)? Once the container is stopped, can you migrate it to a different node and try to start it there?
 
arch: amd64
cores: 8
hostname: ct-mail
memory: 49152
mp0: ceph-ct:vm-105-disk-1,mp=/mnt/maildata,mountoptions=nodev;noatime,size=10T
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=xxxxx,ip=dhcp,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,hwaddr=xxxxx,ip=192.xxxxx/24,ip6=dhcp,type=veth
ostype: centos
parent: installed
rootfs: ceph-ct:vm105-105-0,size=8G
swap: 49152
unprivileged: 1
 
I migrated the container to other nodes, but the same error still shows up in dmesg:

EXT4-fs warning (device rbd4): read_mmp_block:111: Error -117 while reading MMP block 9255

Please help!
 
Thanks, but as mentioned above, I have already tried that and it didn't help.

Can you advise why the superblock is so easily corrupted (when the container is inadvertently shut down or restarted)?
This only happens with Ceph disks, not local disks.
I have tried with default mount options and with non-default mount options (as shown above).
I have tried with Ubuntu Buster and CentOS 7.

Unfortunately I have suffered production data loss due to this.
And I really hope to prevent that from happening again.
 
Can you advise why the superblock is so easily corrupted?
Not really. But maybe something was logged in the journal/syslog (also inside the CT).
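
For example, something along these lines (assuming container ID 105):

journalctl -b -u pve-container@105    # messages from the container's unit on the host
pct enter 105                         # then check the logs under /var/log inside the CT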
 
Is this problem common or is it something rare?
Am I doing something wrong?
Not that I know of, and I don't think you are doing anything out of the ordinary. It may be some hardware fault that caused it, but without anything in the logs it is hard to tell.
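
One more thing that may be worth checking after such an unclean shutdown is the cluster state and whether the image still has a stale watcher or lock left over (assuming the Ceph pool behind the ceph-ct storage is also named ceph-ct):

ceph -s                               # overall cluster health
rbd status ceph-ct/vm-105-disk-1      # clients still watching the image
rbd lock ls ceph-ct/vm-105-disk-1     # locks left behind on the image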
 
