[SOLVED] One-by-one upgrade from v6.4 to v7

I see there are two main options outlined in the v6-to-v7 upgrade documentation (new cluster, or in-place upgrade using apt), but I'm wondering if it might be okay to upgrade my small cluster like this (rough commands sketched after the list):

- Migrate all containers off a member
- Remove the member from cluster
- Clean install v7 on the machine
- Rejoin cluster
- Migrate containers off another v6 member to the v7 member
- Remove the v6 member from cluster
- Rinse and repeat
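
Roughly what I have in mind, command-wise. The node names and the CT ID are just placeholders, and pvecm delnode/add is how I understand the cluster tooling works, so please correct me if not:

Code:
# On the old member: move each guest away first (restart mode for containers)
pct migrate 101 othernode --restart

# On a remaining member, once the old node is powered off for good:
pvecm delnode oldnode

# On the freshly installed v7 box, join it to the cluster
# (address of any existing member):
pvecm add 192.168.1.20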
 
Sometimes (always?) artefacts from the old setup remain inside the cluster and create problems, even if that host is completely destroyed and newly installed.

If I had to go this route I would make sure to use a different IP address and also a new hostname.

So my personal choice would be the other approach: move all guests off that host, remove it from HA groups and run the in-place upgrade...
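
From memory of the upgrade guide, the in-place route boils down to something like this (the sed lines assume the stock Debian repo files, so double-check yours before running):

Code:
# Run the checklist script shipped with 6.4 and fix anything it flags:
pve6to7 --full

# Repoint APT from buster to bullseye (the security suite was renamed in Debian 11):
sed -i -e 's/buster\/updates/bullseye-security/g' -e 's/buster/bullseye/g' /etc/apt/sources.list
sed -i -e 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-enterprise.list

# Upgrade, then reboot into the new kernel:
apt update
apt dist-upgrade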

Best regards
 
Thanks for the replies. I might have a go at the in-place upgrade then. Fingers crossed that I won't be affected by all the recent kernel trouble.
 
Successfully upgraded one member now. I'm seeing that the 6.4 node is still happily replicating to 7.1; however, the containers on the 7.1 node are giving replication errors. Is that expected until I upgrade the other member as well? These containers were migrated away and back after the upgrade.

Code:
2021-12-09 18:23:00 101-0: start replication job
2021-12-09 18:23:00 101-0: guest => CT 101, running => 1
2021-12-09 18:23:00 101-0: volumes => vpool:subvol-101-disk-0
2021-12-09 18:23:00 101-0: freeze guest filesystem
2021-12-09 18:23:00 101-0: create snapshot '__replicate_101-0_1639066980__' on vpool:subvol-101-disk-0
2021-12-09 18:23:01 101-0: thaw guest filesystem
2021-12-09 18:23:01 101-0: using secure transmission, rate limit: none
2021-12-09 18:23:01 101-0: incremental sync 'vpool:subvol-101-disk-0' (__replicate_101-0_1639065887__ => __replicate_101-0_1639066980__)
2021-12-09 18:23:01 101-0: send from @__replicate_101-0_1639065887__ to vpool/subvol-101-disk-0@__replicate_101-0_1639066980__ estimated size is 1.43M
2021-12-09 18:23:01 101-0: total estimated size is 1.43M
2021-12-09 18:23:01 101-0: Unknown option: snapshot
2021-12-09 18:23:01 101-0: 400 unable to parse option
2021-12-09 18:23:01 101-0: pvesm import <volume> <format> <filename> [OPTIONS]
2021-12-09 18:23:01 101-0: warning: cannot send 'vpool/subvol-101-disk-0@__replicate_101-0_1639066980__': signal received
2021-12-09 18:23:01 101-0: cannot send 'vpool/subvol-101-disk-0': I/O error
2021-12-09 18:23:01 101-0: command 'zfs send -Rpv -I __replicate_101-0_1639065887__ -- vpool/subvol-101-disk-0@__replicate_101-0_1639066980__' failed: exit code 1
2021-12-09 18:23:01 101-0: delete previous replication snapshot '__replicate_101-0_1639066980__' on vpool:subvol-101-disk-0
2021-12-09 18:23:01 101-0: end replication job with error: command 'set -o pipefail && pvesm export vpool:subvol-101-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_101-0_1639066980__ -base __replicate_101-0_1639065887__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxima' root@192.168.1.20 -- pvesm import vpool:subvol-101-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_101-0_1639066980__ -allow-rename 0' failed: exit code 255
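
For reference, this is how I've been poking at the job from the shell (101-0 is the job ID from the log above; pvesr list/status are the read-only bits, as far as I can tell):

Code:
# Replication jobs defined for guests on this node:
pvesr list

# Last sync time and failure state per job:
pvesr status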

Also getting a systemd-journald error:

Code:
[  854.703639] systemd-journald[768]: Failed to set ACL on /var/log/journal/7b03e054c6c9476a89182535d801f5e7/user-1001.journal, ignoring: Operation not supported
 
Hi,
Yes, replication from 7.x back to 6.x is not possible: the pvesm import on the 6.4 target does not know the -snapshot option that 7.x passes along (that is the "Unknown option: snapshot" in your log). Please upgrade the remaining nodes too.
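
Once everything is on 7.x the jobs should recover on their own schedule; if you'd rather not wait, you can kick one off manually (job ID as shown in your log, and if I remember the subcommand right):

Code:
# Run replication job 101-0 now instead of waiting for its schedule:
pvesr schedule-now 101-0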
 
Already done and running nicely. :) Is there anything I can do to get rid of the systemd-journald error?
Does the file system for /var support ACLs? If not, you'd have to switch, but I'd say it's not that bad if some ACLs on some log file are not set.
 
Code:
# zfs get all / | grep acl
rpool/ROOT/pve-1  aclmode               discard                default
rpool/ROOT/pve-1  aclinherit            restricted             default
rpool/ROOT/pve-1  acltype               off                    default

That probably means no?
 
You can enable it with zfs set acltype=posixacl rpool/ROOT/pve-1 if you want.
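
A minimal sketch of the whole change, assuming the dataset name from your output above (restarting journald afterwards is my assumption; a reboot works too):

Code:
# Enable POSIX ACLs on the root dataset; the property applies immediately:
zfs set acltype=posixacl rpool/ROOT/pve-1

# Verify:
zfs get acltype rpool/ROOT/pve-1

# Let journald retry setting its ACLs:
systemctl restart systemd-journald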
 
Other than ACLs being supported on your root filesystem, no ;)
 