Increase LXC container startup timeout?

Chentoa

Well-Known Member
Feb 14, 2018
I've got a small Proxmox 5.1 server running 8 LXC containers (Debian).

Randomly, when the server reboots (usually after a power outage), 1 of the 8 containers (always the same one) doesn't start. I can start it manually without any problem.
The only difference between this container and the other 7: it has a second disk mounted.

Looking at the logs, I guess it has something to do with ext4 multi-mount protection (MMP):

(...)
mars 07 19:49:22 saturn pve-guests[5241]: starting CT 1011: UPID:saturn:00001479:000018EF:5C8167B2:vzstart:1011:root@pam:
mars 07 19:49:22 saturn pve-guests[3857]: <root@pam> starting task UPID:saturn:00001479:000018EF:5C8167B2:vzstart:1011:root@pam:
mars 07 19:49:22 saturn pvestatd[3732]: status update time (38.181 seconds)
mars 07 19:49:22 saturn systemd[1]: Starting PVE LXC Container: 1011...
(...)
mars 07 19:49:23 saturn kernel: EXT4-fs warning (device loop1): ext4_multi_mount_protect:324: MMP interval 42 higher than expected, please wait.
(...)
mars 07 19:50:09 saturn kernel: EXT4-fs (loop1): 1 orphan inode deleted
mars 07 19:50:09 saturn kernel: EXT4-fs (loop1): recovery complete
mars 07 19:50:09 saturn kernel: EXT4-fs (loop1): mounted filesystem with ordered data mode. Opts: (null)
mars 07 19:50:09 saturn kernel: EXT4-fs warning (device loop2): ext4_multi_mount_protect:324: MMP interval 42 higher than expected, please wait.
(...)
mars 07 19:50:52 saturn systemd[1]: pve-container@1011.service: Start operation timed out. Terminating.
mars 07 19:50:52 saturn systemd[1]: Failed to start PVE LXC Container: 1011.
mars 07 19:50:52 saturn systemd[1]: pve-container@1011.service: Unit entered failed state.
mars 07 19:50:52 saturn systemd[1]: pve-container@1011.service: Failed with result 'timeout'.
mars 07 19:50:52 saturn pve-guests[5241]: command 'systemctl start pve-container@1011' failed: exit code 1
(...)
mars 07 19:50:54 saturn kernel: EXT4-fs (loop2): 1 orphan inode deleted
mars 07 19:50:54 saturn kernel: EXT4-fs (loop2): recovery complete
mars 07 19:50:54 saturn kernel: EXT4-fs (loop2): mounted filesystem with ordered data mode. Opts: noacl
(...)

I have a hunch that if container 1011 had waited a few seconds longer before timing out, loop2 would have been mounted and the container would have started (the log shows loop2 finishing its mount 2 seconds after the timeout).
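To put numbers on that hunch, here is a small sketch that does the arithmetic on three timestamps copied from the log above. The helper name `to_seconds` is made up for this illustration; it only handles same-day `HH:MM:SS` values.

```shell
# Convert HH:MM:SS to seconds since midnight (plain POSIX shell arithmetic).
# to_seconds is a hypothetical helper written for this example.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split the argument on ':' into $1 $2 $3
    IFS=$old_ifs
    # Strip one leading zero so values such as "09" are not parsed as octal.
    h=${1#0}; m=${2#0}; s=${3#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

start=$(to_seconds 19:49:22)        # systemd: "Starting PVE LXC Container: 1011..."
timed_out=$(to_seconds 19:50:52)    # systemd: "Start operation timed out. Terminating."
loop2_ready=$(to_seconds 19:50:54)  # kernel: loop2 "mounted filesystem"

echo "start job ran for $((timed_out - start)) s before timing out"
echo "loop2 was mounted $((loop2_ready - timed_out)) s after the timeout"
```

With these timestamps the start job ran for 90 seconds, and loop2 finished mounting 2 seconds after the timeout, so the two sequential MMP waits (one per loop device) together slightly exceeded the limit.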

1. Is there a way to increase the startup timeout of an LXC container?

2. Any other ideas on how to solve this?
 
I've got a small Proxmox 5.1 server
Please consider upgrading to 5.3 and installing the latest security updates! Quite a few bugs have been fixed in the meantime.

Also, the warning would indicate that the container's filesystem is already mounted somewhere. Maybe check whether you have the container's image mounted elsewhere? (Sadly, a simple `mount` is not enough, since it doesn't show mounts from other namespaces.)
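One possible way to look across mount namespaces is to scan every process's `/proc/<pid>/mountinfo` for the device in question. This is only a sketch: `find_mounts` and its optional second argument are hypothetical names for this example, it needs to run as root to read other processes' entries, and for an image-backed container you would first have to find the loop device (for example with `losetup -j <image>`).

```shell
# Print the PID of every process whose mount namespace has DEVICE mounted.
# find_mounts is a hypothetical helper; PROC_ROOT defaults to /proc and is
# only a parameter so the function can be tried against a test directory.
find_mounts() {
    device=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        # In mountinfo, the mount source is the second field after the "-" separator.
        if awk -v dev="$device" '
            { for (i = 1; i <= NF; i++) if ($i == "-") { if ($(i + 2) == dev) found = 1; break } }
            END { exit !found }' "$f"
        then
            pid=${f%/mountinfo}
            echo "${pid##*/}"
        fi
    done
}
```

For example, `find_mounts /dev/loop1` lists the PIDs that see `/dev/loop1` mounted. Many processes share one namespace, so expect repeated hits for the same mount.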
 
OK, I'll do the upgrade.

But out of curiosity, is there a way to increase the startup timeout of an LXC container?
 
Sounds good regarding the upgrade!

Regarding the timeout question: you could try setting the `TimeoutStartSec` parameter for the pve-container@$vmid.service unit (see `man systemd.service`, and look for `edit` in `man systemctl`).
 
Same issue on 5.3-12. The secondary drive is very large, so it may periodically take a while to fsck on start.

Increase the timeout as discussed here:

Bash:
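# check the current setting - the property name is in microseconds (USec), but the value is printed human-readable
# replace 100 with the ID of your problem container
systemctl show pve-container@100.service -p TimeoutStartUSec
# (reported as 2min here; the log earlier in this thread shows a timeout after 90 s, systemd's usual default)
# create an upgrade-safe drop-in override (without --full, which would copy and shadow the whole packaged unit)
systemctl edit pve-container@100.service
# in the editor, add the parameter under a [Service] header - note it is in plain seconds
# add a comment so you know what you've done, when, and why
[Service]
TimeoutStartSec=480s
# probably no need to restart (disrupt) right now; it will be active on the next start
# check the setting, same as above
systemctl show pve-container@100.service -p TimeoutStartUSec
# should be 8min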
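If you would rather script this than use the interactive editor, the same drop-in can be written by hand. This is a minimal sketch: `write_timeout_dropin` and its optional third argument are hypothetical names for this example. It assumes the standard systemd drop-in location, `/etc/systemd/system/<unit>.d/`, where any `*.conf` file is picked up.

```shell
# Write a drop-in that raises TimeoutStartSec for one container's unit.
# write_timeout_dropin is a hypothetical helper; the third argument (unit
# directory root) defaults to /etc/systemd/system and exists for dry runs.
write_timeout_dropin() {
    vmid=$1
    timeout=$2
    root=${3:-/etc/systemd/system}
    dir="$root/pve-container@$vmid.service.d"
    mkdir -p "$dir" || return 1
    # The setting must sit under a [Service] header to take effect.
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dir/override.conf"
}

# Example (as root), followed by a reload so systemd reads the new file:
#   write_timeout_dropin 100 480s
#   systemctl daemon-reload
```

Unlike `systemctl edit`, a hand-written file does not trigger a reload by itself, hence the explicit `systemctl daemon-reload`.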