Search results

  1.

    Migration failed (while replication seemed ok)

    Proxmox tries to migrate the machines back every 15 minutes. The restore of a 300GB machine is running - I am hoping that this is the issue - so I am waiting for it to finish before taking further action. This is a sample of the relocation initiated by Proxmox (failing): task started by HA resource agent...
  2.

    Migration failed (while replication seemed ok)

    # ha-manager status
    quorum OK
    master p3 (active, Wed Dec 20 22:04:13 2017)
    lrm p1 (idle, Wed Dec 20 22:04:18 2017)
    lrm p3 (active, Wed Dec 20 22:04:09 2017)
    lrm p4 (active, Wed Dec 20 22:04:16 2017)
    service ct:100 (p4, started)
    service ct:102 (p3, relocate)
    service ct:104 (p3, relocate)
    service...
  3.

    Migration failed (while replication seemed ok)

    The root cause is better found in the report of the migration that failed just before...
  4.

    Migration failed (while replication seemed ok)

    I tried to migrate some servers for a server hardware intervention. The migration failed, and figuring out why is yet another nightmare. Replication seemed to be working, but the migration of several machines still failed. I now have to examine (with the servers down) which replica to retrieve and how...
  5.

    No access to /etc/pve/nodes/p4/lxc/ during backup - several backups fail.

    Thanks for the suggestion. I forgot to come back to this thread - one of the nodes "silently" rebooted: silently because no VM was running on it and no mail is sent after a reboot. It was at that moment that the backup in progress was "interrupted". I think that backup on a functional node...
  6.

    No access to /etc/pve/nodes/p4/lxc/ during backup - several backups fail.

    I have daily backups for most of the virtual machines. Last night several backups failed for a new reason: Backup of VM XXX failed - unable to open file '/etc/pve/nodes/p4/lxc/XXX.conf.tmp.PPPP' where XXX is the CT number and PPPP the process number. A few backups worked and all following had...
  7.

    Migration hooks

    Bump. Same question here.
  8.

    Storage Replication regularly stops

    After another failure (possibly a hardware error), the backup that was running was interrupted. I was initially left with a lock on the machine (removed with 'pct unlock 109'), and according to the GUI I still had a snapshot. So I requested removal of the snapshot. The snapshot now is no longer...
  9.

    Storage Replication regularly stops

    Now I have a backup issue, possibly related to the same root issue. The partial result from "pct list" is: 104 running snapshot-delete www. The replication seems to work, though. I ran "pct unlock 104" to unlock the VM; I'll see if backups work again.
  10.

    Storage Replication regularly stops

    There is another cause for the replication stopping, and I do not know yet where to start looking. For now the workaround is to destroy the ZFS filesystems on the target server, but that is not a good solution and it is unsafe (human error is to be expected): # zfs destroy -r...
  11.

    Storage Replication regularly stops

    I opened the report: https://bugzilla.proxmox.com/show_bug.cgi?id=1538 .
  12.

    Storage Replication regularly stops

    The problem occurred again this morning, which confirms that it happens systematically when the backup runs. This time 'lxc-unfreeze' did not do the trick; I had to kill the 'lxc-freeze' process itself. I have now time-shifted the replications by one minute, which will probably limit the...
  13.

    Storage Replication regularly stops

    I forgot to check the GUI, which shows that I am not done yet: while stopping 'lxc-freeze' with 'lxc-unfreeze' did stop the backup task, the replication was still in the "syncing" state. It reports the duration as 23.0h, but the unfreeze was done more than 23 hours after the...
  14.

    Storage Replication regularly stops

    Ok, I ran "lsof" on the lockfile:

    # lsof /var/run/vzdump.lock
    COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    task\x20U 9077 root   5wW  REG   0,23        0 1923 /run/vzdump.lock

    Then I grepped the processes:

    # ps -edalf | grep 9077
    5 S root 9077 8930 0 80 0 - 81338 poll_s...
  15.

    Storage Replication regularly stops

    I now got a backup failure. I guess this is because the replication that started yesterday is still running according to the web GUI (probably in the "freeze guest filesystem" state). This is the content of the mail I received: can't aquire lock '/var/run/vzdump.lock' - got timeout VMID...
  16.

    Storage Replication regularly stops

    According to the web GUI, VM102 is still syncing at 19:30 CEST. I haven't found the command to fix this yet. I noticed that I can get a log of the sync; it is in the "freeze" step:

    2017-10-25 05:00:00 102-0: start replication job
    2017-10-25 05:00:00 102-0: guest => CT 102, running => 1...
  17.

    Storage Replication regularly stops

    Ok, my backup report says "backup mode:snapshot". Here is the transcript (start time: 07:00 CEST). The server names have been changed to "example.com" for privacy: INFO: starting new backup job: vzdump --compress gzip --mailto support@example.com --mailnotification failure --node p4 --storage...
  18.

    Storage Replication regularly stops

    Good that it works somewhere ;-). I am on Proxmox 5.0-33. I currently have one LXC container that has been syncing on ZFS since 07:00 CET this morning, and it is now 13:00 CET (the machines are synced every 15 minutes and the other machines had their last sync at 6:45). The previous sync took 6.8 seconds according...
  19.

    Storage Replication regularly stops

    I'm having pretty much the same experience here. It may be related to "reboots" of the target machine, which means I have to remove all ZFS storage copies before I can start replication again (using command-line commands like "zfs destroy vmpool/subvol-102-disk-1"). That's pretty...
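Several of the results above involve locks left behind on containers ('pct unlock 109', 'pct unlock 104') and HA services stuck in the "relocate" state. The Python sketch below is one possible way to spot those states from saved command output before acting on them. It assumes the `pct list` column layout VMID/Status/Lock/Name, with an empty Lock column for unlocked guests, and the `service <sid> (<node>, <state>)` line format seen in the `ha-manager status` excerpt; the function names are hypothetical. It only parses text and unlocks nothing, since a lock may still belong to a task that is running.

```python
import re
from typing import List, Tuple

# Assumption: `pct list` prints a header "VMID Status Lock Name" and leaves
# the Lock column empty for unlocked guests, so an unlocked row splits into
# three fields and a locked row into four (names contain no spaces).
def locked_containers(pct_list_output: str) -> List[Tuple[str, str]]:
    """Return (vmid, lock) pairs for containers that currently hold a lock."""
    locked = []
    for line in pct_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) == 4:  # VMID, Status, Lock, Name
            vmid, _status, lock, _name = fields
            locked.append((vmid, lock))
    return locked

# Assumption: HA services appear as "service ct:102 (p3, relocate)".
SERVICE_RE = re.compile(r"service (\S+) \(([^,]+), (\w+)\)")

def services_not_started(ha_status_output: str) -> List[Tuple[str, str, str]]:
    """Return (sid, node, state) for HA services whose state is not 'started'."""
    return [m.groups() for m in SERVICE_RE.finditer(ha_status_output)
            if m.group(3) != "started"]

if __name__ == "__main__":
    # Sample input taken from the snippets above.
    pct_sample = ("VMID       Status     Lock         Name\n"
                  "104        running    snapshot-delete www\n")
    print(locked_containers(pct_sample))  # [('104', 'snapshot-delete')]

    ha_sample = ("service ct:100 (p4, started)\n"
                 "service ct:102 (p3, relocate)\n"
                 "service ct:104 (p3, relocate)\n")
    print(services_not_started(ha_sample))
    # [('ct:102', 'p3', 'relocate'), ('ct:104', 'p3', 'relocate')]
```

Parsing by whitespace is fragile if the column layout differs between Proxmox versions, so the output of a given node should be checked against these assumptions first.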