I have three Proxmox machines making backups to a Samba share on a Netgear NAS appliance. The SMB share is mounted on a directory on each Proxmox machine, and that directory is added as a storage device in Proxmox. Two of the nodes run version 3.0; the other runs 2.3. The NAS failed sometime last night, and something unexpected happened when these Proxmox machines attempted to run their backups.
Each of them has had a backup job running since midnight, with a single log entry like the following:
INFO: starting new backup job: vzdump 100 104 105 102 108 --quiet 1 --mode snapshot --mailto adam@plexicomm.net --compress lzo --storage dot29backup
After that there are no further details about the status of the backup. This wouldn't be so strange by itself, but the Proxmox web interface also started behaving oddly. In the search tab I see a list of VMs. The type column shows the darker icon, as if they were stopped, the description column shows only the ID number, and the rest of the columns are blank. The VMs are actually running, and I can open their consoles and so forth. I can ssh to the machines fine, but most parts of the web GUI are crippled.
On one of the 3.0 nodes I was able to stop the backup. The others don't seem to respond to the stop button. On the other 3.0 node I killed the vzdump processes, but it didn't seem to make any difference. A reboot fixed everything. I won't be able to reboot the other two for another 14 hours or so, until the low-traffic period in this time zone.
1) If anybody on the Proxmox team wants access to these systems while they're in this condition, I'd be happy to arrange it, but you'll have to respond today because I will reboot them at 03:00 (GMT -4).
2) If I used NFS or iSCSI instead, would the failure be handled more gracefully? I know I need to deal with the problem on the file server, but it would be helpful to know for the future.
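In the meantime, one thing I'm considering is a small check that runs before the backup and skips it when the mounted share doesn't answer. This is only a rough sketch, not anything Proxmox provides: the function name `mount_responds`, the 5 second timeout, and the idea of running it before vzdump are all my own assumptions. It probes the path from a child process so that a hung mount blocks the child rather than the script itself.

```python
import subprocess
import sys
import time


def mount_responds(path, timeout=5.0):
    """Return True if a stat() of `path` succeeds within `timeout` seconds.

    The stat runs in a child process, so if the share hangs, only the
    child blocks and this function can still return.
    """
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = probe.poll()
        if rc is not None:
            # The child finished: exit code 0 means the stat succeeded.
            return rc == 0
        time.sleep(0.1)
    # Best effort only: a process stuck in uninterruptible I/O may ignore
    # the kill, so we deliberately do not wait for it to exit.
    probe.kill()
    return False
```

The mount directory would be passed in as `path`, and the backup would be skipped (or an alert mailed) when the function returns False. It doesn't fix whatever made the GUI hang; it would only avoid starting a backup against a share that is already dead.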