After power loss, one of the containers does not start

Bedarbis

New Member
Apr 6, 2022
Hello. I am really new to this stuff, so don't be harsh on me :). We (my job) had a power loss two days ago, and since then one of the VMs fails to start with this error:

Job for pve-container@100.service failed because the control process exited with error code.
See "systemctl status pve-container@100.service" and "journalctl -xe" for details.
TASK ERROR: command 'systemctl start pve-container@100' failed: exit code 1

So I checked the journal and it shows this kind of error (or probably errors): "Buffer I/O error on dev dm-7, logical block 472538 (and 8 other blocks), lost async page write". What should I do? Please explain as simply as possible and with as much info as possible; I am really new to this stuff, and it wasn't me who built this Proxmox setup ;/
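For reference, the two commands named in that error, plus a grep through the kernel log for the I/O errors; the grep pattern is just an illustration:

Code:
systemctl status pve-container@100.service   # why the unit failed to start
journalctl -xe                               # recent journal entries with details
journalctl -b -k | grep -i "I/O error"       # kernel I/O errors since boot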
 
First I would back up that LXC before trying to fix it, so if you screw it up, you can still return to the previous state.
Then I would mount that block device and run fsck so it can try to fix it (see the sketch below).
And I would buy a UPS, put it in front of your server, install a NUT server on your PVE host and connect the UPS to it, so the server can run on battery and gets shut down automatically, and you don't lose data the next time a power outage occurs.
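A minimal sketch of the mount-and-fsck step, assuming the container's rootfs is ext4 on the LVM volume /dev/pve/vm-100-disk-1 (an assumption based on this thread; check lvs for the real name on your host):

Code:
pct stop 100                           # make sure the CT is not running
lvchange -ay pve/vm-100-disk-1         # activate the logical volume if needed
fsck.ext4 -f /dev/pve/vm-100-disk-1    # check the filesystem and offer repairs
mount /dev/pve/vm-100-disk-1 /mnt      # optional: mount it to inspect the data
umount /mnt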
 
Thank you for the reply, but can you explain it in more detail? I found out earlier that I have to fsck it, but I don't know the exact command to do that. And how should I mount the block device in the first place for fsck to fix it?
 
Did you already back it up? Fixing the filesystem might destroy it, so I wouldn't touch it until you are sure that you could restore it from a backup.

You should also give more info, like what storage the LXC is stored on, what template it was created from, and so on.

For example, the outputs of lsblk, pct config <VMIDofYourLXC>, cat /etc/pve/storage.cfg and fdisk -l could be useful (see the example below).

And is it actually an LXC or a VM?
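For instance, on the PVE host (100 being the container ID from the error messages above):

Code:
lsblk
pct config 100
cat /etc/pve/storage.cfg
fdisk -l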
 
Sad news ;/ it fails to mount:
INFO: starting new backup job: vzdump 100 --storage nfsbackup --node pve83 --remove 0 --mode snapshot --compress gzip
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2022-04-06 14:39:42
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: srv-t83
INFO: creating archive '/mnt/pve/nfsbackup/dump/vzdump-lxc-100-2022_04_06-14_39_42.tar.gz'
mount: /dev/mapper/pve-vm--100--disk--1: can't read superblock
umount: /mnt/vzsnap0/: not mounted
command 'umount -l -d /mnt/vzsnap0/' failed: exit code 32
ERROR: Backup of VM 100 failed - command 'mount /dev/dm-7 /mnt/vzsnap0//' failed: exit code 32
INFO: Failed at 2022-04-06 14:39:43
INFO: Backup job finished with errors
TASK ERROR: job errors
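Since vzdump's snapshot backup fails on the unreadable superblock, one possible fallback is a raw block-level copy of the volume before attempting any repair; a sketch, with the device and NFS paths taken from the log above:

Code:
pct stop 100                      # ensure the CT stays stopped
lvchange -ay pve/vm-100-disk-1    # activate the LV if needed
dd if=/dev/mapper/pve-vm--100--disk--1 of=/mnt/pve/nfsbackup/vm-100-disk-1.img bs=4M status=progress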