Samba Re-Mount issue causes server crash

DerDanilo

Renowned Member
Jan 21, 2017
This already happened in the past (1.5 years ago), before Proxmox VE supported mounting CIFS shares via the GUI. Back then I wrote a script that would check every minute whether it could read the contents of a file on the Samba share, and trigger a forced unmount and remount of a failed share.
After 3 retries, with 30 seconds of waiting time between them, it would trigger a critical alert that it was unable to remount the CIFS share.

The problem now is that this causes the server to crash, since it writes millions of entries like the ones below into the syslog until the disk is full. A standard PVE setup has no separate log partition, so any host will crash at some point in time (when the disk is full).

A hard reboot is the only thing that brings the host back, as even the console stops responding.

@PVE devs:
Can you please build in a check that verifies the CIFS share is available before the system triggers a constant remount nightmare?
It should also re-check hostname resolution, in case the IP behind the CIFS share has changed. This rarely happens, but it is quite common if you rent CIFS storage from providers.
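A resolution check like that could compare the address the share was mounted with against what the hostname currently resolves to. A minimal sketch of the idea, assuming a standard `getent`-based resolver lookup (the helper names `resolve_ip` and `check_dns_change` are my own, not part of PVE):

```shell
#!/bin/bash
# Hypothetical helper (not part of PVE): resolve a hostname to its first
# address via the system resolver, so it can be compared with the address
# the share was originally mounted with.
resolve_ip() {
    getent hosts "$1" | awk '{print $1; exit}'
}

# Illustrative check: report if the server's current address no longer
# matches the one recorded at mount time. Arguments are examples.
check_dns_change() {
    local host="$1" mounted_ip="$2" current_ip
    current_ip="$(resolve_ip "$host")"
    if [ -n "$current_ip" ] && [ "$current_ip" != "$mounted_ip" ]; then
        echo "address of $host changed: $mounted_ip -> $current_ip"
        return 1
    fi
    return 0
}
```

A watchdog could run this before attempting a remount and refresh the mount target when the address has moved.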

Can you please implement such a feature to improve the stability of the system?

Currently running PVE 5.3 Community on this box.

Code:
Dec 16 06:25:03 pve1 kernel: [1524357.930340] cifs_vfs_err: 2 callbacks suppressed
Dec 16 06:25:03 pve1 kernel: [1524357.930341] CIFS VFS: Free previous auth_key.response = 0000000035620a02
Dec 16 06:25:03 pve1 kernel: [1524357.932165] CIFS VFS: Send error in SessSetup = -13
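For reference, the watchdog I described above can be sketched roughly like this. Everything here is illustrative rather than the original script: the mount point, the canary file, and the `check_share`/`remount_share`/`watchdog` names are assumptions.

```shell
#!/bin/bash
# Sketch of a CIFS watchdog: read a canary file on the share and force a
# remount after repeated failures. Paths and counts are example values.
MOUNTPOINT="${MOUNTPOINT:-/mnt/pve/backup-cifs}"
CANARY="${CANARY:-$MOUNTPOINT/.watchdog}"
RETRIES=3
WAIT=30

# Succeeds only if the canary file can be read within 10 seconds; the
# timeout guards against a hung CIFS mount blocking the check forever.
check_share() {
    timeout 10 cat "$CANARY" >/dev/null 2>&1
}

# Forced (-f) and lazy (-l) unmount, then remount from /etc/fstab.
remount_share() {
    umount -f -l "$MOUNTPOINT" 2>/dev/null
    mount "$MOUNTPOINT"
}

watchdog() {
    local i
    for i in $(seq 1 "$RETRIES"); do
        check_share && return 0
        remount_share
        sleep "$WAIT"
    done
    # All retries failed: raise a critical alert via syslog.
    logger -p user.crit "unable to remount CIFS share $MOUNTPOINT"
    return 1
}
```

Run `watchdog` from a cron job every minute; it exits quietly as soon as the canary file is readable again.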
 
Unstable NFS or CIFS mounts on shared and overloaded/unstable servers will always cause issues; you cannot make an unstable server stable from the client side.

But yes, maybe there is some room for improvement - the fastest way is to connect to a stable server environment (CIFS).
 
Thanks for your reply.

PVE handles the mounts, so there could be a mechanism that checks whether a mount is actually writable/readable and triggers a remount if not. The same could apply to NFS.
It is very nice that PVE handles the mounts via the cluster config/settings, but therefore it should also be able to handle remounts of any broken storage mount.

Simply saying "fastest way is to connect to a stable server environment (CIFS)" doesn't really help, even if it's true.
On the other hand, there is no such thing as a 100% stable and reliable environment, especially not in the storage world.

When one disables a storage via the PVE API, the mount isn't necessarily removed from the local machine, so this would need to be handled as well. Of course I could write a script for that, but then again the workaround only works for me, and we want to improve PVE altogether. :)
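A manual cleanup for that case could look something like the sketch below. The storage name and the `/mnt/pve/<storage>` path convention are assumptions for illustration; `pvesm set <storage> --disable 1` is the PVE CLI way to disable a storage, and `findmnt` checks whether the mount point is still active.

```shell
#!/bin/bash
# Return success only if the given path is currently a mount point.
is_mounted() {
    findmnt --mountpoint "$1" >/dev/null 2>&1
}

# Hypothetical cleanup: disable the storage in PVE, then lazily unmount
# any leftover mount. The storage name passed in is an example.
cleanup_storage() {
    local storage="$1"
    local mnt="/mnt/pve/$storage"
    pvesm set "$storage" --disable 1
    if is_mounted "$mnt"; then
        umount -f -l "$mnt"
    fi
}
```

Usage would be e.g. `cleanup_storage backup-cifs` after the share has become unreachable.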
 
Hi, I'm having the same problem, how did you fix it?

I tried disabling the storage in PVE and unmounting it manually, but that didn't work.

Thanks.
 

I will take care of this within the next few weeks. Once I've got it working properly, I'll let you know.
 
I have no time to contribute to the official repo, but I can post the script on GitHub.
https://github.com/DerDanilo/proxmox-stuff/blob/master/umount-stale-mount.sh

Be careful, as this will unmount even if the system should still be writing to the target.
I usually have an additional 1G partition for `/mnt` to limit the amount of data that could be written to the local disk in case something goes wrong.
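For anyone wanting to replicate that safety net, a small dedicated filesystem for `/mnt` can be set up roughly like this. This is an example provisioning fragment, not from my setup: the volume name `mnt-guard` is made up, and the volume group `pve` is the PVE installer default (verify yours with `vgs`).

```shell
# Example only: create a 1G LVM volume and mount it at /mnt, so a
# process writing into an unmounted share path can fill at most 1G
# instead of the root filesystem.
lvcreate -L 1G -n mnt-guard pve
mkfs.ext4 /dev/pve/mnt-guard
echo '/dev/pve/mnt-guard /mnt ext4 defaults 0 2' >> /etc/fstab
mount /mnt
```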

Hope this helps.
 
