[SOLVED] One-by-one upgrade from v6.4 to v7

I see there are two main options outlined in the v6-to-v7 upgrade documentation (new cluster, or in-place upgrade using apt), but I'm wondering if it might be okay to upgrade my small cluster like this (rough commands sketched after the list):

- Migrate all containers off a member
- Remove the member from cluster
- Clean install v7 on the machine
- Rejoin cluster
- Migrate containers off another v6 member to the v7 member
- Remove the v6 member from cluster
- Rinse and repeat
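
Roughly what I have in mind, command-wise. The node names and the CT ID are just placeholders, and pvecm delnode/add is how I understand the cluster tooling works, so please correct me if not:

Code:
# On the old member: move each guest away first (restart mode for containers)
pct migrate 101 othernode --restart

# On a remaining member, once the old node is powered off for good:
pvecm delnode oldnode

# On the freshly installed v7 box, join it to the cluster
# (address of any existing member):
pvecm add 192.168.1.20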
 
Sometimes (always?) artefacts from the old setup remain inside the cluster and create problems, even if that host is completely destroyed and newly installed.

If I had to go this route I would make sure to use a different IP address and also a new hostname.

So my personal choice would be the other approach: move all guests off that host, remove it from HA groups and run the in-place upgrade...
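
From memory of the upgrade guide, the in-place route boils down to something like this (the sed lines assume the stock Debian repo files, so double-check yours before running):

Code:
# Run the checklist script shipped with 6.4 and fix anything it flags:
pve6to7 --full

# Repoint APT from buster to bullseye (the security suite was renamed in Debian 11):
sed -i -e 's/buster\/updates/bullseye-security/g' -e 's/buster/bullseye/g' /etc/apt/sources.list
sed -i -e 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-enterprise.list

# Upgrade, then reboot into the new kernel:
apt update
apt dist-upgrade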

Best regards
 
Thanks for the replies. I might have a go at the in-place upgrade then. Fingers crossed that I won't be affected by all the recent kernel trouble.
 
Successfully upgraded one member now. I'm seeing that the 6.4 node is still happily replicating to 7.1; however, the containers on the 7.1 node are giving replication errors. Is that expected until I upgrade the other member as well? These containers were migrated away and back after the upgrade.

Code:
2021-12-09 18:23:00 101-0: start replication job
2021-12-09 18:23:00 101-0: guest => CT 101, running => 1
2021-12-09 18:23:00 101-0: volumes => vpool:subvol-101-disk-0
2021-12-09 18:23:00 101-0: freeze guest filesystem
2021-12-09 18:23:00 101-0: create snapshot '__replicate_101-0_1639066980__' on vpool:subvol-101-disk-0
2021-12-09 18:23:01 101-0: thaw guest filesystem
2021-12-09 18:23:01 101-0: using secure transmission, rate limit: none
2021-12-09 18:23:01 101-0: incremental sync 'vpool:subvol-101-disk-0' (__replicate_101-0_1639065887__ => __replicate_101-0_1639066980__)
2021-12-09 18:23:01 101-0: send from @__replicate_101-0_1639065887__ to vpool/subvol-101-disk-0@__replicate_101-0_1639066980__ estimated size is 1.43M
2021-12-09 18:23:01 101-0: total estimated size is 1.43M
2021-12-09 18:23:01 101-0: Unknown option: snapshot
2021-12-09 18:23:01 101-0: 400 unable to parse option
2021-12-09 18:23:01 101-0: pvesm import <volume> <format> <filename> [OPTIONS]
2021-12-09 18:23:01 101-0: warning: cannot send 'vpool/subvol-101-disk-0@__replicate_101-0_1639066980__': signal received
2021-12-09 18:23:01 101-0: cannot send 'vpool/subvol-101-disk-0': I/O error
2021-12-09 18:23:01 101-0: command 'zfs send -Rpv -I __replicate_101-0_1639065887__ -- vpool/subvol-101-disk-0@__replicate_101-0_1639066980__' failed: exit code 1
2021-12-09 18:23:01 101-0: delete previous replication snapshot '__replicate_101-0_1639066980__' on vpool:subvol-101-disk-0
2021-12-09 18:23:01 101-0: end replication job with error: command 'set -o pipefail && pvesm export vpool:subvol-101-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_101-0_1639066980__ -base __replicate_101-0_1639065887__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=proxima' root@192.168.1.20 -- pvesm import vpool:subvol-101-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_101-0_1639066980__ -allow-rename 0' failed: exit code 255
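
For reference, this is how I've been poking at the job from the shell (101-0 is the job ID from the log above; pvesr list/status are the read-only bits, as far as I can tell):

Code:
# Replication jobs defined for guests on this node:
pvesr list

# Last sync time and failure state per job:
pvesr status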

Also getting a systemd-journald error:

Code:
[  854.703639] systemd-journald[768]: Failed to set ACL on /var/log/journal/7b03e054c6c9476a89182535d801f5e7/user-1001.journal, ignoring: Operation not supported
 
Hi,
Yes, replication from 7.x back to 6.x is not possible: the pvesm import on the 6.4 target does not know the -snapshot option that 7.x passes along (that is the "Unknown option: snapshot" in your log). Please upgrade the remaining nodes too.
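
Once everything is on 7.x the jobs should recover on their own schedule; if you'd rather not wait, you can kick one off manually (job ID as shown in your log, and if I remember the subcommand right):

Code:
# Run replication job 101-0 now instead of waiting for its schedule:
pvesr schedule-now 101-0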
 
Already done and running nicely. :) Is there anything I can do to get rid of the systemd-journald error?
Does the file system for /var support ACLs? If not, you'd have to switch, but I'd say it's not that bad if some ACLs on some log file are not set.
 
Code:
# zfs get all / | grep acl
rpool/ROOT/pve-1  aclmode               discard                default
rpool/ROOT/pve-1  aclinherit            restricted             default
rpool/ROOT/pve-1  acltype               off                    default

That probably means no?
 
You can enable it with zfs set acltype=posixacl rpool/ROOT/pve-1 if you want.
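
A minimal sketch of the whole change, assuming the dataset name from your output above (restarting journald afterwards is my assumption; a reboot works too):

Code:
# Enable POSIX ACLs on the root dataset; the property applies immediately:
zfs set acltype=posixacl rpool/ROOT/pve-1

# Verify:
zfs get acltype rpool/ROOT/pve-1

# Let journald retry setting its ACLs:
systemctl restart systemd-journald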
 
Other than ACLs being supported on your root filesystem, no ;)
 