Backup always failing on one specific CT

alexdelprete
Hi,

The backup of all my CTs/VMs works fine, except for one CT, whose backup always fails with the following error:
Rich (BB code):
2022-01-30T00:01:27+01:00: starting new backup on datastore 'pve-cluster': "ct/101/2022-01-29T23:01:14Z"
2022-01-30T00:01:27+01:00: add blob "/backups/pve-cluster/ct/101/2022-01-29T23:01:14Z/pct.conf.blob" (388 bytes, comp: 388)
2022-01-30T00:01:27+01:00: created new dynamic index 1 ("ct/101/2022-01-29T23:01:14Z/root.pxar.didx")
2022-01-30T00:01:27+01:00: created new dynamic index 2 ("ct/101/2022-01-29T23:01:14Z/catalog.pcat1.didx")
2022-01-30T00:03:51+01:00: POST /dynamic_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2022-01-30T00:03:51+01:00: POST /dynamic_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2022-01-30T00:03:51+01:00: POST /dynamic_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2022-01-30T00:03:51+01:00: backup failed: connection error: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:676:
2022-01-30T00:03:51+01:00: removing failed backup
2022-01-30T00:03:51+01:00: POST /dynamic_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2022-01-30T00:03:51+01:00: TASK ERROR: connection error: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:676:

I have no custom certificate installed and the fingerprint of the PBS certificate is correct, so I don't know what else to check, or why this behaviour shows up only on that specific container.
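For reference, this is roughly how I compared the fingerprints on both sides (just a sketch using the standard tools; adjust the storage name to your setup):

Code:
# On the PBS host: show the server certificate details, including the fingerprint
proxmox-backup-manager cert info

# On the PVE node: the fingerprint configured for the PBS storage
grep fingerprint /etc/pve/storage.cfg

# Quick connectivity check from PVE to the datastore
pvesm status --storage pbs-pvecluster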

CT 101, the one whose backup fails, is my main Docker host and runs 12 containers.

Here is some info about the environment:

Rich (BB code):
root@pve1:~# pvesm list pbs-pvecluster
Volid                                             Format  Type             Size VMID
pbs-pvecluster:backup/ct/100/2022-01-29T20:36:04Z pbs-ct  backup    13341750327 100
pbs-pvecluster:backup/ct/100/2022-01-29T23:00:00Z pbs-ct  backup    13332543427 100
pbs-pvecluster:backup/ct/100/2022-01-30T00:54:45Z pbs-ct  backup    13317721167 100
pbs-pvecluster:backup/ct/102/2022-01-29T20:07:41Z pbs-ct  backup     6578685446 102
pbs-pvecluster:backup/ct/102/2022-01-29T23:03:52Z pbs-ct  backup     6575633022 102
pbs-pvecluster:backup/ct/102/2022-01-30T00:58:09Z pbs-ct  backup     6592530815 102
pbs-pvecluster:backup/ct/103/2022-01-29T20:09:48Z pbs-ct  backup     1868570782 103
pbs-pvecluster:backup/ct/103/2022-01-29T23:04:43Z pbs-ct  backup     1468240167 103
pbs-pvecluster:backup/ct/103/2022-01-30T00:59:02Z pbs-ct  backup     1472683759 103
pbs-pvecluster:backup/ct/104/2022-01-29T20:10:42Z pbs-ct  backup      828793041 104
pbs-pvecluster:backup/ct/104/2022-01-29T23:05:13Z pbs-ct  backup      816680500 104
pbs-pvecluster:backup/ct/104/2022-01-30T00:59:30Z pbs-ct  backup      816773452 104
pbs-pvecluster:backup/ct/105/2022-01-29T20:11:12Z pbs-ct  backup     1415286426 105
pbs-pvecluster:backup/ct/105/2022-01-29T23:05:23Z pbs-ct  backup     1406853661 105
pbs-pvecluster:backup/ct/105/2022-01-30T00:59:44Z pbs-ct  backup     1412579613 105
pbs-pvecluster:backup/ct/106/2022-01-29T20:12:02Z pbs-ct  backup     6265531436 106
pbs-pvecluster:backup/ct/106/2022-01-29T23:05:36Z pbs-ct  backup     5259568337 106
pbs-pvecluster:backup/ct/106/2022-01-30T00:59:59Z pbs-ct  backup     5264564450 106
pbs-pvecluster:backup/ct/107/2022-01-29T20:14:42Z pbs-ct  backup     1223005903 107
pbs-pvecluster:backup/ct/107/2022-01-29T23:06:25Z pbs-ct  backup     1210476275 107
pbs-pvecluster:backup/ct/107/2022-01-30T01:00:45Z pbs-ct  backup     1210570432 107
pbs-pvecluster:backup/ct/108/2022-01-29T20:15:28Z pbs-ct  backup     1963335202 108
pbs-pvecluster:backup/ct/108/2022-01-29T23:06:37Z pbs-ct  backup     1823719560 108
pbs-pvecluster:backup/ct/108/2022-01-30T01:00:57Z pbs-ct  backup     1825468558 108
pbs-pvecluster:backup/ct/109/2022-01-29T20:16:46Z pbs-ct  backup     7858881849 109
pbs-pvecluster:backup/ct/109/2022-01-29T23:07:12Z pbs-ct  backup     7859140689 109
pbs-pvecluster:backup/ct/109/2022-01-30T01:01:32Z pbs-ct  backup     7863123175 109
pbs-pvecluster:backup/ct/110/2022-01-29T20:26:38Z pbs-ct  backup     2948170651 110
pbs-pvecluster:backup/ct/110/2022-01-29T23:13:51Z pbs-ct  backup     2950440397 110
pbs-pvecluster:backup/ct/110/2022-01-30T01:08:13Z pbs-ct  backup     2961335396 110
pbs-pvecluster:backup/ct/111/2022-01-29T20:27:39Z pbs-ct  backup     1739807759 111
pbs-pvecluster:backup/ct/111/2022-01-29T23:14:22Z pbs-ct  backup     1723173173 111
pbs-pvecluster:backup/ct/111/2022-01-30T01:08:36Z pbs-ct  backup              1 111
pbs-pvecluster:backup/ct/200/2022-01-29T20:28:39Z pbs-ct  backup      876976955 200
pbs-pvecluster:backup/ct/200/2022-01-29T23:14:42Z pbs-ct  backup      876976948 200
pbs-pvecluster:backup/ct/201/2022-01-29T20:29:08Z pbs-ct  backup     2703908038 201
pbs-pvecluster:backup/ct/201/2022-01-29T23:14:48Z pbs-ct  backup     2703908027 201
root@pve1:~# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content vztmpl,iso,backup

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

zfspool: pve-data
    pool pve-data
    content images,rootdir
    mountpoint /pve-data
    nodes pve1
    sparse 1

nfs: pve-nas
    export /volume1/shared/proxmox
    path /mnt/pve/pve-nas
    server nas.axel.dom
    content snippets,iso,rootdir,images,vztmpl,backup
    options vers=4.1
    prune-backups keep-all=1

zfspool: pve2-data
    pool pve-data
    content rootdir,images
    mountpoint /pve-data
    nodes pve2
    sparse 1

pbs: pbs-pvecluster
    datastore pve-cluster
    server pbs.axel.dom
    content backup
    fingerprint 7b:36:da:7a:52:9f:b7:72:b5:fc:44:b6:c5:c5:fb:15:4a:c1:28:df:2a:15:bf:35:72:30:36:a6:0a:6d:9d:c4
    nodes pve2,pve1
    username pve-cluster@pbs
 
I also want to mention that, in order to make Docker work inside the LXC container, I use fuse-overlayfs as the storage driver, which plays nicely with ZFS, unlike overlayfs, which gave me many issues. Now the only problem is that when the backup starts in snapshot mode it never terminates (I have to reboot the node). Using STOP mode is better: the backup still fails, but at least it doesn't freeze the container forever.

So I thought maybe the PBS backup client doesn't support fuse-overlayfs in some way...
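For context, the fuse-overlayfs setup itself is minimal; roughly this (a sketch of my config, using the standard Docker daemon.json location):

Code:
# Inside the CT (Debian 11): install the driver
apt install fuse-overlayfs

# /etc/docker/daemon.json - tell Docker to use it as the storage driver
{
  "storage-driver": "fuse-overlayfs"
}

# restart Docker to apply
systemctl restart docker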

Here's Docker info:

Code:
root@traefik:~# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  compose: Docker Compose (Docker Inc., v2.2.3)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 17
  Running: 17
  Paused: 0
  Stopped: 0
 Images: 17
 Server Version: 20.10.12
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.13.19-3-pve
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 2GiB
 Name: traefik
 ID: SLZ2:KOAY:DKJX:G7Z6:RFNY:PM5C:FUCV:PQND:L3RU:YA6S:QK43:TBXC
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 
I also use fuse-overlayfs for one of my Docker containers, and the PBS backup works without problems, except for a couple of warnings:
"failed to open /var/lib/docker/fuse-overlayfs/b4e3f337e5fb6b2517e1a85bab9e78235b97a501b6984525dd5cf6778a2a1dfb/merged: Permission denied"
Proxmox: single node, no cluster setup
Container properties: Storage: ZFS; Unprivileged; fuse, keyctl, nesting: on
PBS datastore storage: ZFS
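For comparison, in the container's config file those properties look roughly like this (a sketch; the VMID and rootfs volume name are placeholders):

Code:
# /etc/pve/lxc/<vmid>.conf (excerpt; rootfs volume is a placeholder)
unprivileged: 1
features: fuse=1,keyctl=1,nesting=1
rootfs: local-zfs:subvol-<vmid>-disk-0,size=16G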
 
You have the same setup as mine; the only difference is that my container is privileged. Are you using suspend or snapshot mode? The strange thing is that I also tried STOP mode, but the error is always the same. If a container is stopped, I thought there wouldn't be any snapshot issues; that's the only thing I don't understand. And the error looks like a communication/networking one, not related to storage or the filesystem.

Reading the docs, I found that it's a known issue and that a VM is recommended for Docker. It's a pity, because Docker is working absolutely fine with very low resource usage. I have to find a workaround for PBS, otherwise I'll migrate to a VM, even though I'd prefer keeping it in a CT...
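To compare the modes, I'm just running the job by hand with vzdump and switching the mode (a sketch using my storage name):

Code:
# One-off backup of CT 101 to the PBS storage, forcing stop mode
vzdump 101 --storage pbs-pvecluster --mode stop

# Same, but with snapshot / suspend for comparison
vzdump 101 --storage pbs-pvecluster --mode snapshot
vzdump 101 --storage pbs-pvecluster --mode suspend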

 

I use snapshot mode for backups.
Yes, it does look like a communication/networking problem.
Dumb question: are your PBS/Proxmox installs up to date?
 

Yes, I check for updates every day:

Code:
pve-manager/7.1-10/6ddebafe (running kernel: 5.13.19-3-pve)
proxmox-backup-server 2.1.4-1 running version: 2.1.4
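Those two lines come from the usual version commands on each side, roughly:

Code:
# on the PVE node
pveversion

# on the PBS host
proxmox-backup-manager versions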
 
I had the same issue on other CTs without the fuse-overlayfs driver installed, so I guess it's not related to that. But on that specific CT (101) the error always comes up, and I have to exclude it from the backup schedule.

backup failed: connection error: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:676:
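For now the workaround is just an exclusion in the job; something along these lines (a sketch, the exact job definition depends on how it was created):

Code:
# Back up all guests except CT 101 (vzdump supports an exclude list)
vzdump --all 1 --exclude 101 --storage pbs-pvecluster --mode snapshot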
 
