So far I had 2 nodes (running 3.2) and an NFS storage, and everything was more or less fine.
Backups were not that fast, but they worked.
Now I wanted to switch to 4.1, install a third node and move to a new Open-E storage (RAID 6 with 6 disks) over 4Gb FC.
I installed 4.1 on the new server (which caused me some problems because of booting from a zpool) and connected the storage.
I then did the same with one of the old nodes and connected it to the new 4.1 cluster.
My idea was to create a backup of each VM (OpenVZ and KVM), copy it via scp to the new host and restore it there.
The VM hard disks should reside on the FC storage.
Then my KVM VMs started to crash with the following entries in dmesg:
[71648.166402] end_request: I/O error, dev vda, sector 0
[71806.591613] end_request: I/O error, dev vda, sector 30350400
[71806.592230] Aborting journal on device dm-0-8.
[71806.595123] journal commit I/O error
[71806.598837] EXT4-fs error (device dm-0): ext4_journal_start_sb:327: Detected aborted journal
[71806.599346] EXT4-fs (dm-0): Remounting filesystem read-only
I switched all KVM VMs to write-through, and so far I haven't seen those errors again.
Afterwards I created an NFS share for my backups on the new Open-E storage.
The nodes and the NFS share are connected via a separate LAN.
Last night I created a backup task, but it couldn't finish the backups...
Here is the output from the first node:
INFO: starting new backup job: vzdump 101 107 202 203 205 210 212 214 301 300 114 100 102 103 104 105 106 115 117 --mode snapshot --quiet 1 --storage SAS_NFS --compress lzo
INFO: skip external VMs: 210, 114, 102, 103, 106
INFO: Starting Backup of VM 100 (lxc)
INFO: status = running
INFO: mode failure - some volumes does not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
temporary directory is on NFS, disabling xattr and acl support, consider configuring a local tmpdir via /etc/vzdump.conf
INFO: starting first sync /proc/2785/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tmp
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/3a/971628634928ededbcf61ff4195ba3a0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/5b/6566f71fbefae69b857c5ca62b43f5b0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/87/29595cebce57fdc06178bc8939861870"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/0/fd/100d76b23440af7798f4b02d1118afd0"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/1/ec/ad1844d760c5dd3d26a58032bd381ec1"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/2/3c/81f0a3de0031e49da228fdb4b2ec03c2"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/5/3c/af7bb1d59bee2a5a68060362d1d733c5"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/b/c0/4e127f7afc4ecb76d033466cc0e94c0b"
INFO: file has vanished: "/proc/2785/root/var/lib/nginx/cache/d/89/09f583b4fb3cf92024209bde4a15b89d"
INFO: Number of files: 37,862 (reg: 25,786, dir: 7,177, link: 4,867, dev: 2, special: 30)
INFO: Number of created files: 37,861 (reg: 25,786, dir: 7,176, link: 4,867, dev: 2, special: 30)
INFO: Number of deleted files: 0
INFO: Number of regular files transferred: 25,771
INFO: Total file size: 769,120,252 bytes
INFO: Total transferred file size: 768,465,221 bytes
INFO: Literal data: 763,698,693 bytes
INFO: Matched data: 0 bytes
INFO: File list size: 786,338
INFO: File list generation time: 0.038 seconds
INFO: File list transfer time: 0.000 seconds
INFO: Total bytes sent: 765,868,778
INFO: Total bytes received: 537,370
INFO: sent 765,868,778 bytes received 537,370 bytes 35,512.18 bytes/sec
INFO: total size is 769,120,252 speedup is 1.00
INFO: rsync warning: some files vanished before they could be transferred (code 24) at main.c(1183) [sender=3.1.1]
INFO: first sync finished (21581 seconds)
INFO: suspend vm
INFO: starting final sync /proc/2785/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tmp
INFO: Number of files: 37,861 (reg: 25,785, dir: 7,177, link: 4,867, dev: 2, special: 30)
INFO: Number of created files: 13 (reg: 13)
INFO: Number of deleted files: 5 (reg: 5)
INFO: Number of regular files transferred: 21
INFO: Total file size: 759,803,217 bytes
INFO: Total transferred file size: 2,921,288 bytes
INFO: Literal data: 1,995,316 bytes
INFO: Matched data: 925,972 bytes
INFO: File list size: 65,526
INFO: File list generation time: 0.001 seconds
INFO: File list transfer time: 0.000 seconds
INFO: Total bytes sent: 2,870,021
INFO: Total bytes received: 16,268
INFO: sent 2,870,021 bytes received 16,268 bytes 617.19 bytes/sec
INFO: total size is 759,803,217 speedup is 263.25
INFO: final sync finished (4676 seconds)
INFO: resume vm
INFO: vm is online again after 4676 seconds
INFO: creating archive '/mnt/pve/SAS_NFS/dump/vzdump-lxc-100-2016_01_09-01_30_05.tar.lzo'
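As a rough sanity check on the rsync numbers above, here is a minimal Python sketch that recomputes the effective transfer rate of both sync phases from the byte counts and durations in the log. The function name `throughput` is just an illustrative helper, not part of vzdump or rsync, and the durations are the ones vzdump reports, so the result only approximates rsync's own figure.

```python
def throughput(sent_bytes, received_bytes, seconds):
    """Return the average transfer rate in bytes per second."""
    return (sent_bytes + received_bytes) / seconds


# Values copied from the vzdump log of container 100.
first_sync = throughput(765_868_778, 537_370, 21_581)  # first sync phase
final_sync = throughput(2_870_021, 16_268, 4_676)      # final sync, container suspended

print(f"first sync: {first_sync / 1000:.1f} kB/s over {21_581 / 3600:.1f} h")
print(f"final sync: {final_sync / 1000:.2f} kB/s over {4_676 / 60:.0f} min")
```

Both rates come out in the range of a few dozen kB/s or less, far below what a dedicated LAN should deliver, which matches the rates rsync itself reports in the log.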
Here is the output from the second node:
INFO: starting new backup job: vzdump 102 --remove 0 --compress lzo --node vmbase3 --mode snapshot --storage SAS_NFS
INFO: Starting Backup of VM 102 (lxc)
INFO: status = running
INFO: mode failure - some volumes does not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
temporary directory is on NFS, disabling xattr and acl support, consider configuring a local tmpdir via /etc/vzdump.conf
INFO: starting first sync /proc/6323/root// to /mnt/pve/SAS_NFS/dump/vzdump-lxc-102-2016_01_09-11_32_21.tmp
So far I never really had problems with my Proxmox infrastructure, but this time somehow everything gets stuck, causes problems, is crap...
I am so frustrated and disappointed...
best wishes
Stefan