Container Backup Issue (Ceph->NFS)

JVore

Hi everyone,

The issue is the following:

I set up daily backups for four containers on two different nodes. All containers live on Ceph storage on SSDs, the nodes are joined in one PVE cluster (three nodes altogether), and the backups go to an NFS directory located on the third node.

Node 1:
CT 203

Node 2:
CT 201
CT 202
CT 204

vzdump.cron:

Code:
0 12 * * 1,2,3,4,5 root vzdump 204 202 201 203 --mailnotification failure --quiet 1 --mode snapshot --storage nfsproxmox --compress lzo

Proxmox VE version: 5.2.8 (running kernel: 4.15.18-4-pve)
Ceph version: 12.2.8


From my understanding, the backups on each node are done serially, so there should not be an issue when two backup tasks are spawned, one on node 1 and one on node 2.

The issue is that the backup of CT 203 hangs and does nothing until it reaches the timeout limit or is stopped manually. This is not the first time this has happened to a container. Previously I would come in the next morning (10 AM), after the backup had started at 4 AM, and find it still trying to run.

I started running those backups manually, or put all containers on one node, to avoid the problem. Unfortunately I don't have any more verbose logging for this, just the start of the backup and my manual termination.

Am I misunderstanding anything about the backup structure or how it is supposed to work?

The issue is quite similar to the one described here (unfortunately it's in German):
https://forum.proxmox.com/threads/in-letzter-zeit-immer-wieder-hängende-backups-von-ceph-nfs.44343/

Stopping the backup then leaves the node in a peculiar state, as described in this thread:
https://forum.proxmox.com/threads/how-do-i-remove-a-ceph-vzdump-snapshot.36573/

I can manually delete the vzdump snapshot from RBD (rbd snap rm ssd_pool/vm-203-disk-0@vzdump),
unlock the container (pct unlock 203),
and remove the symlink /dev/rbd/ssd_pool/vm-203-disk-0@vzdump
(this symlink recreates itself, though).

After that I am able to make a new manual backup of the container. However, the RBD device that was mapped for the failed backup is still there in read-only mode, and I can't get rid of it unless I reboot the node. Also, I can't see any snapshots in the GUI anymore (pct still lists them properly).
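
For reference, the cleanup sequence above as one block (a sketch; pool name, container ID, and device number are from this incident, and the final unmap is my addition, which may fail with "device busy" while the kernel still holds the device, matching the need to reboot):

Code:
rbd snap rm ssd_pool/vm-203-disk-0@vzdump   # remove the leftover vzdump snapshot
pct unlock 203                              # clear the backup lock on the container
rm /dev/rbd/ssd_pool/vm-203-disk-0@vzdump   # drop the stale symlink (udev may recreate it)
rbd unmap /dev/rbd11                        # try to release the read-only device; may fail if busy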

As I am not able to restart the nodes frequently, but still want to use the vzdump feature, I would be grateful for any help. I am aware that I could space out the backups of the different nodes, but since four containers should not be too much (normally they only take about two minutes to finish), I decided to post this.

Thanks in advance,
Johannes
 
Updating this with some more info, as it's happening again. This time the backup job was started on only one node and is currently still trying to back up the second container; 3 hours and 34 minutes have passed at this point.

Case: container "Bipa", 15.10.18, backup started at 12:00

Code:
Task viewer: Backup

INFO: starting new backup job: vzdump 204 202 201 --quiet 1 --mailnotification failure --compress lzo --storage nfsproxmox --mode snapshot
INFO: Starting Backup of VM 201 (lxc)
INFO: status = running
INFO: CT Name: api.rise
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd11
INFO: creating archive '/mnt/pve/nfsproxmox/dump/vzdump-lxc-201-2018_10_15-12_00_01.tar.lzo'
INFO: Total bytes written: 2313134080 (2.2GiB, 44MiB/s)
INFO: archive file size: 1000MB
INFO: remove vzdump snapshot
Removing snap: 100% complete...done.
INFO: Finished Backup of VM 201 (00:00:58)
INFO: Starting Backup of VM 202 (lxc)
INFO: status = running
INFO: CT Name: bipa.rise
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd11
INFO: creating archive '/mnt/pve/nfsproxmox/dump/vzdump-lxc-202-2018_10_15-12_00_59.tar.lzo'


Content of the unfinished backup on the NFS storage:
Code:
1.1G   vzdump-lxc-202-2018_10_15-12_00_59.tar.dat

Content of vzdump-lxc-202-2018_10_15-12_00_59.tmp/etc/vzdump/pct.conf:

Code:
arch: amd64
cores: 2
hostname: bipa.rise
memory: 2048
nameserver: 127.0.0.1
net0: name=eth0,bridge=vmbr0,gw=10.10.11.254,hwaddr=AE:BA:7B:59:8D:62,ip=10.10.10.33/23,type=veth
ostype: centos
rootfs: ssd_pool:vm-202-disk-0,size=16G
searchdomain: rise
swap: 512

The size of a normal backup is also 1.1 GB, so technically vzdump-lxc-202-2018_10_15-12_00_59.tar.dat already has the correct size. I am therefore wondering whether anything is locking/blocking the container backup.
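
To check whether anything is still happening, one can watch the archive file and see which process still holds it open (illustrative commands, not from my original logs):

Code:
# is the archive still growing? compare size/mtime every 10 seconds
watch -n 10 'ls -l /mnt/pve/nfsproxmox/dump/vzdump-lxc-202-2018_10_15-12_00_59.tar.dat'
# which processes have the file open?
fuser -v /mnt/pve/nfsproxmox/dump/vzdump-lxc-202-2018_10_15-12_00_59.tar.dat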



Some additional info:

Code:
prox2:/dev/rbd/ssd_pool
12:01 vm-202-disk-0@vzdump -> ../../rbd11

rbd showmapped
11 ssd_pool vm-202-disk-0 vzdump /dev/rbd11

/etc/exports
/nfsproxmox *(rw,sync,no_root_squash,no_subtree_check)
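
To cross-check from the Ceph side, the standard rbd commands can show the snapshots on the image and its current watchers (not part of my original output, just the generic commands):

Code:
rbd snap ls ssd_pool/vm-202-disk-0   # list snapshots of the container's disk
rbd status ssd_pool/vm-202-disk-0    # show watchers, i.e. clients holding the image open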

Cluster config:

- three nodes in the cluster
- 6 SSD OSDs (two on each node), 3 HDD OSDs (one on each node)
- one 6 TB disk (for backups/templates) exported as the NFS directory


I also discovered that after deleting the symlink to the failed backup's RBD device, you are able to create a new backup of the container that had issues. Once the new backup succeeds, the symlink to the old failed RBD device reappears, presumably because the /dev/rbd symlinks are managed by udev and get recreated as long as the device stays mapped.


Adding the syslog lines from when the backup started (sorry for not posting this last time); it looks very similar to the already linked thread.

Code:
Oct 15 12:00:01 prox2 CRON[3261801]: (root) CMD (vzdump 204 202 201 --mailnotification failure --mode snapshot --quiet 1 --storage nfsproxmox --compress lzo)
Oct 15 12:00:01 prox2 vzdump[3261802]: <root@pam> starting task UPID:prox2:0031C573:0FF1A408:5BC46521:vzdump::root@pam:
Oct 15 12:00:01 prox2 vzdump[3261811]: INFO: starting new backup job: vzdump 204 202 201 --quiet 1 --mailnotification failure --compress lzo --storage nfsproxmox --mode snapshot
Oct 15 12:00:01 prox2 vzdump[3261811]: INFO: Starting Backup of VM 201 (lxc)
Oct 15 12:00:02 prox2 kernel: [2674947.075311] rbd: rbd11: capacity 17179869184 features 0x1
Oct 15 12:00:02 prox2 kernel: [2674947.148390] EXT4-fs (rbd11): write access unavailable, skipping orphan cleanup
Oct 15 12:00:02 prox2 kernel: [2674947.148422] EXT4-fs (rbd11): mounted filesystem without journal. Opts: noload
Oct 15 12:00:59 prox2 vzdump[3261811]: INFO: Finished Backup of VM 201 (00:00:58)
Oct 15 12:00:59 prox2 vzdump[3261811]: INFO: Starting Backup of VM 202 (lxc)
Oct 15 12:01:00 prox2 kernel: [2675005.256028] rbd: rbd11: capacity 17179869184 features 0x1
Oct 15 12:01:00 prox2 systemd[1]: Starting Proxmox VE replication runner...
Oct 15 12:01:00 prox2 kernel: [2675005.324529] EXT4-fs (rbd11): write access unavailable, skipping orphan cleanup
Oct 15 12:01:00 prox2 kernel: [2675005.324562] EXT4-fs (rbd11): mounted filesystem without journal. Opts: noload
Oct 15 12:01:00 prox2 systemd[1]: Started Proxmox VE replication runner.
Oct 15 12:01:57 prox2 kernel: [2675062.208505] EXT4-fs error (device rbd11): ext4_lookup:1575: inode #393456: comm tar: deleted inode referenced: 396704
Oct 15 12:01:57 prox2 kernel: [2675062.208639]
Oct 15 12:01:57 prox2 kernel: [2675062.208639] Assertion failure in rbd_queue_workfn() at line 4035:
Oct 15 12:01:57 prox2 kernel: [2675062.208639]
Oct 15 12:01:57 prox2 kernel: [2675062.208639]     rbd_assert(op_type == OBJ_OP_READ || rbd_dev->spec->snap_id == CEPH_NOSNAP);
Oct 15 12:01:57 prox2 kernel: [2675062.208639]
Oct 15 12:01:57 prox2 kernel: [2675062.208823] ------------[ cut here ]------------
Oct 15 12:01:57 prox2 kernel: [2675062.208824] kernel BUG at drivers/block/rbd.c:4035!
Oct 15 12:01:57 prox2 kernel: [2675062.208843] invalid opcode: 0000 [#1] SMP PTI
Oct 15 12:01:57 prox2 kernel: [2675062.208855] Modules linked in: rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss tcp_diag inet_diag veth rbd libceph nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables xfs libcrc32c iptable_filter softdog nfnetlink_log nfnetlink 8021q garp mrp ipmi_ssif intel_rapl skx_edac x86_pkg_temp_thermal intel_powerclamp ast coretemp ttm kvm_intel drm_kms_helper kvm drm irqbypass crct10dif_pclmul i2c_algo_bit crc32_pclmul fb_sys_fops ghash_clmulni_intel syscopyarea sysfillrect pcbc sysimgblt snd_pcm snd_timer snd soundcore aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf joydev input_leds pcspkr lpc_ich shpchp mei_me mei ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp
Oct 15 12:01:57 prox2 kernel: [2675062.209060]  libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid i40e ahci ptp pps_core i2c_i801 libahci
Oct 15 12:01:57 prox2 kernel: [2675062.209133] CPU: 15 PID: 3172857 Comm: kworker/15:1 Tainted: P           O     4.15.18-4-pve #1
Oct 15 12:01:57 prox2 kernel: [2675062.209812] Hardware name: Supermicro Super Server/X11SPW-TF, BIOS 1.0a 11/17/2017
Oct 15 12:01:57 prox2 kernel: [2675062.210653] Workqueue: rbd rbd_queue_workfn [rbd]
Oct 15 12:01:57 prox2 kernel: [2675062.211438] RIP: 0010:rbd_queue_workfn+0x462/0x4f0 [rbd]
Oct 15 12:01:57 prox2 kernel: [2675062.212107] RSP: 0018:ffffbbe5af253e18 EFLAGS: 00010286
Oct 15 12:01:57 prox2 kernel: [2675062.212702] RAX: 0000000000000086 RBX: ffff9adb1ce87000 RCX: 0000000000000006
Oct 15 12:01:57 prox2 kernel: [2675062.213298] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9ae27f9d6490
Oct 15 12:01:57 prox2 kernel: [2675062.214010] RBP: ffffbbe5af253e60 R08: 0000000000000000 R09: 00000000000010af
Oct 15 12:01:57 prox2 kernel: [2675062.214727] R10: 00000000000000a7 R11: 00000000ffffffff R12: ffff9ae028174f00
Oct 15 12:01:57 prox2 kernel: [2675062.215336] R13: ffff9ade925d1200 R14: 0000000000000000 R15: 0000000000001000
Oct 15 12:01:57 prox2 kernel: [2675062.215941] FS:  0000000000000000(0000) GS:ffff9ae27f9c0000(0000) knlGS:0000000000000000
Oct 15 12:01:57 prox2 kernel: [2675062.216677] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 15 12:01:57 prox2 kernel: [2675062.217417] CR2: 00005605277a10b8 CR3: 0000000cb4c0a004 CR4: 00000000007606e0
Oct 15 12:01:57 prox2 kernel: [2675062.218141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 15 12:01:57 prox2 kernel: [2675062.218812] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 15 12:01:57 prox2 kernel: [2675062.219394] PKRU: 55555554
Oct 15 12:01:57 prox2 kernel: [2675062.219995] Call Trace:
Oct 15 12:01:57 prox2 kernel: [2675062.220607]  ? __schedule+0x3e8/0x870
Oct 15 12:01:57 prox2 kernel: [2675062.221177]  process_one_work+0x1e0/0x400
Oct 15 12:01:57 prox2 kernel: [2675062.221791]  worker_thread+0x4b/0x420
Oct 15 12:01:57 prox2 kernel: [2675062.222412]  kthread+0x105/0x140
Oct 15 12:01:57 prox2 kernel: [2675062.223012]  ? process_one_work+0x400/0x400
Oct 15 12:01:57 prox2 kernel: [2675062.223587]  ? kthread_create_worker_on_cpu+0x70/0x70
Oct 15 12:01:57 prox2 kernel: [2675062.224170]  ? kthread_create_worker_on_cpu+0x70/0x70
Oct 15 12:01:57 prox2 kernel: [2675062.224753]  ret_from_fork+0x35/0x40
Oct 15 12:01:57 prox2 kernel: [2675062.225336] Code: 00 48 83 78 20 fe 0f 84 6a fc ff ff 48 c7 c1 a8 e8 d5 c0 ba c3 0f 00 00 48 c7 c6 b0 fc d5 c0 48 c7 c7 90 dd d5 c0 e8 1e 56 d9 d3 <0f> 0b 48 8b 75 d0 4d 89 d0 44 89 f1 4c 89 fa 48 89 df 4c 89 55
Oct 15 12:01:57 prox2 kernel: [2675062.226902] RIP: rbd_queue_workfn+0x462/0x4f0 [rbd] RSP: ffffbbe5af253e18
Oct 15 12:01:57 prox2 kernel: [2675062.227619] ---[ end trace 0789c6684890dda4 ]---


Reading the trace, the EXT4 error on the read-only snapshot seems to trigger a write to the mapped snapshot device, which trips the rbd_assert (non-read ops are only valid when snap_id == CEPH_NOSNAP) and causes the kernel BUG. This also looks really similar to the issue described in this thread:
https://forum.proxmox.com/threads/backup-hangup-with-ceph-rbd.45820/
 
From my understanding, the backups on each node are done serially, so there should not be an issue when two backup tasks are spawned, one on node 1 and one on node 2.
The backup runs serially per job, not per host. You would need to create a separate job for each host.
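
A sketch of what that could look like in vzdump.cron, splitting the original job into one entry per node's containers (the times and the CT grouping are illustrative):

Code:
# CT on node 1
0 12 * * 1,2,3,4,5 root vzdump 203 --mailnotification failure --quiet 1 --mode snapshot --storage nfsproxmox --compress lzo
# CTs on node 2
30 12 * * 1,2,3,4,5 root vzdump 201 202 204 --mailnotification failure --quiet 1 --mode snapshot --storage nfsproxmox --compress lzo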

This also looks really similar to the issue described in this thread:
https://forum.proxmox.com/threads/backup-hangup-with-ceph-rbd.45820/
Yes, it looks similar, but does it still occur when you split the jobs per node?
 
Yes, the containers from yesterday's backup job (INFO: starting new backup job: vzdump 204 202 201) are all on the same host.

You would need to create for each host a job.

If I add different containers from different hosts via Datacenter -> Backup, they already spawn one backup task per host, so why would I create a separate one for each?

I stopped the backup task yesterday after a couple more hours. It then tried to back up the remaining container, 204, without success. I wasn't able to stop the backup task from the GUI, and as per https://forum.proxmox.com/threads/proxmox-backup-wont-stop.23219/ I tried to kill it via the CLI. The tar backup process still exists, and I could not get rid of it even with 'kill -9'.

The tar command:
Code:
tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs --xattrs-include=user.* --xattrs-include=security.capability --warning=no-file-ignored --warning=no-xattr-write --one-file-system --warning=no-file-ignored --directory=/mnt/pve/nfsproxmox/dump/vzdump-lxc-202-2018_10_15-12_00_59.tmp ./etc/vzdump/pct.conf --directory=/mnt/vzsnap0 --no-anchored --exclude=lost+found --anchored --exclude=./tmp/?* --exclude=./var/tmp/?* --exclude=./var/run/?*.pid ./
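
That kill -9 has no effect suggests the tar process is stuck in uninterruptible sleep (state D), typically waiting on I/O inside the kernel, here most likely on the hung RBD device; signals are only delivered once the process leaves that state. A quick way to confirm (PID is a placeholder):

Code:
# STAT shows 'D' for uninterruptible sleep, WCHAN the kernel function it waits in
ps -o pid,stat,wchan:32,cmd -p <PID>
# the kernel stack shows where exactly it is blocked (run as root)
cat /proc/<PID>/stack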

Some more info:
- The containers in question run exclusively on CentOS 7.5.
- They are replicas in a FreeIPA (Red Hat Identity Management) cluster and have never crashed on their own.
- The hanging backups seem to occur only on containers with higher RAM or CPU usage (in the past it was a wiki server with nearly full RAM (6 GB) and, once, a Foreman server). I have another backup task running every night on two containers that has not failed for weeks.

Looking at https://bugzilla.proxmox.com/show_bug.cgi?id=1911 again, in one comment the reporter quotes the Ceph documentation:

Note: STOP I/O BEFORE snapshotting an image. If the image contains a filesystem, the filesystem must be in a consistent state BEFORE snapshotting.

So I looked into the vzdump Perl module for LXC containers and figured out that the container is only frozen before the snapshot is taken when it has more than one volume.

So I changed it to always freeze the container before taking the snapshot, and the problem didn't happen any more...
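
For illustration, the manual equivalent of that freeze-then-snapshot sequence might look like this (a sketch using the IDs from this thread; lxc-freeze pauses all of the container's processes via the cgroup freezer, so no new I/O is issued while the snapshot is taken):

Code:
lxc-freeze -n 202                              # pause the container's processes
rbd snap create ssd_pool/vm-202-disk-0@vzdump  # take the snapshot while I/O is quiesced
lxc-unfreeze -n 202                            # resume the container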

The containers only have a single volume, so this is unfortunately not the issue.

Do you have a recommendation on how to deal with this for now? Manually backing up the containers has never failed yet, so I guess I will go with that. And thanks for the reply!
 
If I add different containers from different hosts via Datacenter -> Backup, they already spawn one backup task per host, so why would I create a separate one for each?
I must have misunderstood you; that's what I actually meant. On each node the backup task is started at the same time and runs serially through the VMs/CTs located on that node.

The containers only have a single volume, so this is unfortunately not the issue.
The proposed solution from the bug adds a "sync" before the "freeze". Sadly this sync is global, not per CT as one might expect, which in turn could bring down the whole node. We're working on a way to achieve something similar without impacting the whole system. It seems krbd is sensitive to this sort of snapshot/mapping, as EXT4 should otherwise handle the "unexpected poweroff" situation without issues.
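
To illustrate the difference (my example, not from the bug report): plain sync flushes every dirty page on the host, while newer coreutils can flush a single filesystem via the syncfs() syscall. The mountpoint below is purely illustrative.

Code:
# global: flushes all filesystems on the node (what the proposed patch does)
sync
# per-filesystem: flushes only the filesystem containing the given path
# (coreutils >= 8.24, uses syncfs())
sync -f /var/lib/lxc/202/rootfs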
 
