Proxmox VE 6.0 released!

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
Very similar to this: https://github.com/zfsonlinux/zfs/issues/7553
How should I proceed? Go back to PVE 5.4? How do I disable the monthly scrub that hangs my server?
If I comment out the line "24 0 8-14 * * root [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ] && /usr/lib/zfs-linux/scrub" in "/etc/cron.d/zfsutils-linux", will that turn off the monthly scrub?
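On the cron question itself: commenting out that line does stop the automatic monthly scrub, since cron ignores lines starting with "#"; a scrub that is already running can be stopped with `zpool scrub -s <pool>`. The entry fires at 00:24 on days 8-14 of every month, and its `date +%w` guard only lets it continue on a Sunday, so the scrub effectively starts on the second Sunday of each month. The minimal sketch below reproduces just that schedule check; `is_scrub_day` is a made-up name and GNU `date` is assumed.

```shell
#!/bin/sh
# Hypothetical helper mirroring the schedule of the quoted cron entry:
# cron runs it on days 8-14 of the month, and the "[ $(date +%w) -eq 0 ]"
# guard only continues on a Sunday, i.e. the second Sunday of the month.
# Assumes GNU date (for -d).
is_scrub_day() {
    day=$(date -d "$1" +%d)   # day of month, zero-padded (01..31)
    dow=$(date -d "$1" +%w)   # day of week, 0 = Sunday
    day=${day#0}              # drop a leading zero, e.g. "08" -> "8"
    [ "$day" -ge 8 ] && [ "$day" -le 14 ] && [ "$dow" -eq 0 ]
}

for d in 2019-07-07 2019-07-14 2019-07-21; do
    if is_scrub_day "$d"; then echo "$d: scrub"; else echo "$d: no scrub"; fi
done
```

For July 2019 only the 14th qualifies; the 7th is a Sunday but too early in the month, and the 21st is the third Sunday.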
Hmm, the issue mentions that this already happened with ZFS 0.7.x...
You could also comment on that issue with your details so that upstream notices it (we cannot reproduce it; we have run various scrubs on various systems here without issues).

Also, are you sure that the drives are OK? Just to rule out the basics.

Looking at the dmesg kernel log (the relevant excerpt from the one you attached is inlined below), it looks like some sort of deadlock?
(all of the tasks are waiting and then hang in the scheduler)
Code:
[Sun Jul 21 02:08:08 2019] INFO: task txg_sync:670 blocked for more than 120 seconds.
[Sun Jul 21 02:08:08 2019]       Tainted: P           O      5.0.15-1-pve #1
[Sun Jul 21 02:08:08 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Jul 21 02:08:08 2019] txg_sync        D    0   670      2 0x80000000
[Sun Jul 21 02:08:08 2019] Call Trace:
[Sun Jul 21 02:08:08 2019]  __schedule+0x2d4/0x870
[Sun Jul 21 02:08:08 2019]  schedule+0x2c/0x70
[Sun Jul 21 02:08:08 2019]  cv_wait_common+0x104/0x130 [spl]
[Sun Jul 21 02:08:08 2019]  ? wait_woken+0x80/0x80
[Sun Jul 21 02:08:08 2019]  __cv_wait+0x15/0x20 [spl]
[Sun Jul 21 02:08:08 2019]  spa_config_enter+0xfb/0x110 [zfs]
[Sun Jul 21 02:08:08 2019]  spa_sync+0x199/0xfc0 [zfs]
[Sun Jul 21 02:08:08 2019]  ? _cond_resched+0x19/0x30
[Sun Jul 21 02:08:08 2019]  ? mutex_lock+0x12/0x30
[Sun Jul 21 02:08:08 2019]  ? spa_txg_history_set.part.7+0xba/0xe0 [zfs]
[Sun Jul 21 02:08:08 2019]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[Sun Jul 21 02:08:08 2019]  txg_sync_thread+0x2d9/0x4c0 [zfs]
[Sun Jul 21 02:08:08 2019]  ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
[Sun Jul 21 02:08:08 2019]  thread_generic_wrapper+0x74/0x90 [spl]
[Sun Jul 21 02:08:08 2019]  kthread+0x120/0x140
[Sun Jul 21 02:08:08 2019]  ? __thread_exit+0x20/0x20 [spl]
[Sun Jul 21 02:08:08 2019]  ? __kthread_parkme+0x70/0x70
[Sun Jul 21 02:08:08 2019]  ret_from_fork+0x35/0x40
[Sun Jul 21 02:08:08 2019] INFO: task zpool:22915 blocked for more than 120 seconds.
[Sun Jul 21 02:08:08 2019]       Tainted: P           O      5.0.15-1-pve #1
[Sun Jul 21 02:08:08 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Jul 21 02:08:08 2019] zpool           D    0 22915  21494 0x00000004
[Sun Jul 21 02:08:08 2019] Call Trace:
[Sun Jul 21 02:08:08 2019]  __schedule+0x2d4/0x870
[Sun Jul 21 02:08:08 2019]  schedule+0x2c/0x70
[Sun Jul 21 02:08:08 2019]  taskq_wait+0x80/0xd0 [spl]
[Sun Jul 21 02:08:08 2019]  ? wait_woken+0x80/0x80
[Sun Jul 21 02:08:08 2019]  taskq_destroy+0x45/0x160 [spl]
[Sun Jul 21 02:08:08 2019]  vdev_open_children+0x117/0x170 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_root_open+0x3b/0x130 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_open+0xa4/0x720 [zfs]
[Sun Jul 21 02:08:08 2019]  ? mutex_lock+0x12/0x30
[Sun Jul 21 02:08:08 2019]  vdev_reopen+0x33/0xc0 [zfs]
[Sun Jul 21 02:08:08 2019]  dsl_scan+0x3a/0x120 [zfs]
[Sun Jul 21 02:08:08 2019]  spa_scan+0x2d/0xc0 [zfs]
[Sun Jul 21 02:08:08 2019]  zfs_ioc_pool_scan+0x5b/0xd0 [zfs]
[Sun Jul 21 02:08:08 2019]  zfsdev_ioctl+0x6db/0x8f0 [zfs]
[Sun Jul 21 02:08:08 2019]  ? lru_cache_add_active_or_unevictable+0x39/0xb0
[Sun Jul 21 02:08:08 2019]  do_vfs_ioctl+0xa9/0x640
[Sun Jul 21 02:08:08 2019]  ? handle_mm_fault+0xe1/0x210
[Sun Jul 21 02:08:08 2019]  ksys_ioctl+0x67/0x90
[Sun Jul 21 02:08:08 2019]  __x64_sys_ioctl+0x1a/0x20
[Sun Jul 21 02:08:08 2019]  do_syscall_64+0x5a/0x110
[Sun Jul 21 02:08:08 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Sun Jul 21 02:08:08 2019] RIP: 0033:0x7fdaa46ce427
[Sun Jul 21 02:08:08 2019] Code: Bad RIP value.
[Sun Jul 21 02:08:08 2019] RSP: 002b:00007ffc72433918 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Sun Jul 21 02:08:08 2019] RAX: ffffffffffffffda RBX: 00007ffc72433950 RCX: 00007fdaa46ce427
[Sun Jul 21 02:08:08 2019] RDX: 00007ffc72433950 RSI: 0000000000005a07 RDI: 0000000000000003
[Sun Jul 21 02:08:08 2019] RBP: 00007ffc72437340 R08: 0000000000000008 R09: 00007fdaa4719d90
[Sun Jul 21 02:08:08 2019] R10: 000056369d283010 R11: 0000000000000246 R12: 0000000000000001
[Sun Jul 21 02:08:08 2019] R13: 000056369d286570 R14: 0000000000000000 R15: 000056369d284430
[Sun Jul 21 02:08:08 2019] INFO: task vdev_open:22916 blocked for more than 120 seconds.
[Sun Jul 21 02:08:08 2019]       Tainted: P           O      5.0.15-1-pve #1
[Sun Jul 21 02:08:08 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Jul 21 02:08:08 2019] vdev_open       D    0 22916      2 0x80000000
[Sun Jul 21 02:08:08 2019] Call Trace:
[Sun Jul 21 02:08:08 2019]  __schedule+0x2d4/0x870
[Sun Jul 21 02:08:08 2019]  schedule+0x2c/0x70
[Sun Jul 21 02:08:08 2019]  taskq_wait+0x80/0xd0 [spl]
[Sun Jul 21 02:08:08 2019]  ? wait_woken+0x80/0x80
[Sun Jul 21 02:08:08 2019]  taskq_destroy+0x45/0x160 [spl]
[Sun Jul 21 02:08:08 2019]  vdev_open_children+0x117/0x170 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_mirror_open+0x34/0x140 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_open+0xa4/0x720 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_open_child+0x22/0x40 [zfs]
[Sun Jul 21 02:08:08 2019]  taskq_thread+0x2ec/0x4d0 [spl]
[Sun Jul 21 02:08:08 2019]  ? __switch_to_asm+0x40/0x70
[Sun Jul 21 02:08:08 2019]  ? wake_up_q+0x80/0x80
[Sun Jul 21 02:08:08 2019]  kthread+0x120/0x140
[Sun Jul 21 02:08:08 2019]  ? task_done+0xb0/0xb0 [spl]
[Sun Jul 21 02:08:08 2019]  ? __kthread_parkme+0x70/0x70
[Sun Jul 21 02:08:08 2019]  ret_from_fork+0x35/0x40
[Sun Jul 21 02:08:08 2019] INFO: task vdev_open:22918 blocked for more than 120 seconds.
[Sun Jul 21 02:08:08 2019]       Tainted: P           O      5.0.15-1-pve #1
[Sun Jul 21 02:08:08 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sun Jul 21 02:08:08 2019] vdev_open       D    0 22918      2 0x80000000
[Sun Jul 21 02:08:08 2019] Call Trace:
[Sun Jul 21 02:08:08 2019]  __schedule+0x2d4/0x870
[Sun Jul 21 02:08:08 2019]  ? set_init_blocksize+0x80/0x80
[Sun Jul 21 02:08:08 2019]  ? get_disk_and_module+0x40/0x70
[Sun Jul 21 02:08:08 2019]  schedule+0x2c/0x70
[Sun Jul 21 02:08:08 2019]  schedule_timeout+0x258/0x360
[Sun Jul 21 02:08:08 2019]  wait_for_completion+0xb7/0x140
[Sun Jul 21 02:08:08 2019]  ? wake_up_q+0x80/0x80
[Sun Jul 21 02:08:08 2019]  call_usermodehelper_exec+0x14a/0x180
[Sun Jul 21 02:08:08 2019]  call_usermodehelper+0x98/0xb0
[Sun Jul 21 02:08:08 2019]  vdev_elevator_switch+0x112/0x1a0 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_disk_open+0x25f/0x410 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_open+0xa4/0x720 [zfs]
[Sun Jul 21 02:08:08 2019]  vdev_open_child+0x22/0x40 [zfs]
[Sun Jul 21 02:08:08 2019]  taskq_thread+0x2ec/0x4d0 [spl]
[Sun Jul 21 02:08:08 2019]  ? __switch_to_asm+0x40/0x70
[Sun Jul 21 02:08:08 2019]  ? wake_up_q+0x80/0x80
[Sun Jul 21 02:08:08 2019]  kthread+0x120/0x140
[Sun Jul 21 02:08:08 2019]  ? task_done+0xb0/0xb0 [spl]
[Sun Jul 21 02:08:08 2019]  ? __kthread_parkme+0x70/0x70
[Sun Jul 21 02:08:08 2019]  ret_from_fork+0x35/0x40
I have a slight hunch that this could be related to the block devices' I/O scheduler...

You could check the current one for all your SATA/SCSI "sXY" devices with the following shell one-liner:
Code:
for blk in /sys/block/s*; do echo -n "$blk: "; cat "$blk/queue/scheduler"; done
I'd guess that they're all using mq-deadline now; it may be worth trying to set them to "none" as a test...
(just `echo "none" > /sys/block/<BLKDEV>/queue/scheduler`)
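To apply that test to every sXY disk at once (as root), the check loop can be extended into a small helper. This is a sketch: `set_scheduler` is a hypothetical name, and its optional second argument (an alternative sysfs root) exists only so the logic can be dry-run against a fake directory tree. Like the manual `echo`, the change is lost on reboot.

```shell
#!/bin/sh
# Hypothetical helper: set the I/O scheduler of every /sys/block/s* device.
# Usage: set_scheduler SCHEDULER [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys)
# Needs root against the real /sys; the setting does not survive a reboot.
set_scheduler() {
    sched=$1
    root=${2:-/sys}
    for blk in "$root"/block/s*; do
        # Skip an unmatched glob and devices without a writable scheduler file.
        [ -w "$blk/queue/scheduler" ] || continue
        echo "$sched" > "$blk/queue/scheduler"
        # Show the result; real sysfs marks the active scheduler in [brackets].
        printf '%s: %s\n' "$blk" "$(cat "$blk/queue/scheduler")"
    done
}

# Example (as root): set_scheduler none
```

Re-running the check one-liner afterwards should show `[none]` selected for each disk.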
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
Hello, there is some incorrect information on the download page: the version shown is 6.0.4, but the ISO file is 6.0.1. It's not an issue, but it could confuse someone. Thanks.
Those two versions are not directly connected...

One is the ISO release version:
Code:
MAJOR.MINOR-ISORELEASE
The other is the version of pve-manager, the package that currently provides the whole web GUI and the API entry points, and it is:
Code:
MAJOR.MINOR-PACKAGERELEASE
"ISORELEASE" is only bumped when we release the same MAJOR.MINOR version again. This can be due to a bug in the installer or the ISO build process, which is not directly related to the Proxmox VE version we ship, or due to grave security issues that affect the installation process itself (like the "apt" bug in January, which prompted a second release of the 5.3 ISO).

The other one is the current pve-manager package version, which is bumped regularly over the course of a release.
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
One question about the Ceph upgrade: I have one OSD that was down and out during the upgrade (unfortunately, I was too lazy to remove it), and now it shows up as an outdated OSD (12.2.11). Can I upgrade to Nautilus, or should I get the OSD updated first? If so, how?
Thanks in advance and keep up the great work!
I mean, is it really down and out, i.e., gone for good? Then I'd just remove it before the upgrade.
 

guletz

Renowned Member
Apr 19, 2017
Brasov, Romania
Would you please include RAID-Z2 in this test? My configuration is a 6-disk RAID-Z2 with a Xeon 4110. Also, I'm seeing no difference with regular striped volumes, but the RAID-Z2 fio sequential-write performance has dropped to less than half.

A 50% performance regression is a big deal. The patch to re-enable SIMD on 5.0 kernels only landed a few days ago, so I imagine it will also need some time before it is released?

Hi,

As I understand it, SIMD is used for fletcher4 checksums (maybe I am wrong).

cat /proc/spl/kstat/zfs/fletcher_4_bench | grep fastest
fastest avx512f avx2

You can change the checksum algorithm to sha256 (create a test dataset with sha256 and see whether fio gets better results).
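The same check can be wrapped in a tiny helper to re-run after a kernel or ZFS update. In the "fastest" row the second field is the implementation chosen for native byte order: a SIMD one such as avx2 or avx512f suggests SIMD is in use, while scalar or superscalar suggests it is not. This is a sketch with a hypothetical function name; note that ZFS also uses SIMD for RAID-Z parity, which may matter more for the RAID-Z2 numbers above (there is an analogous vdev_raidz_bench kstat). The dataset commands in the comments are only examples.

```shell
#!/bin/sh
# Hypothetical helper: print the fletcher4 implementation ZFS picked as fastest
# for native byte order (second field of the "fastest" row).
# Usage: fastest_fletcher4 [FILE]   (FILE defaults to the SPL kstat file)
fastest_fletcher4() {
    awk '$1 == "fastest" { print $2 }' "${1:-/proc/spl/kstat/zfs/fletcher_4_bench}"
}

# A throwaway sha256 dataset for the fio comparison could look like this
# (pool and dataset names are examples; run as root):
#   zfs create -o checksum=sha256 rpool/cksum-test
#   ... run fio in /rpool/cksum-test (its default mountpoint) ...
#   zfs destroy rpool/cksum-test
```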

Good luck!
 

fabian

Proxmox Staff Member
Jan 7, 2016
Hi Fabian,

Maybe I don't understand this fully but the servers we run are not UEFI capable. Is there an alternative to GRUB in that case?
Yes, putting /boot on a non-ZFS filesystem does not require UEFI boot. It does create a single point of failure, though, and needs manual recovery to get the system bootable again.
 

vanes

New Member
Nov 23, 2018
You could check the current one for all your SATA/SCSI "sXY" devices with the following shell one-liner:
Code:
root@c236:~# for blk in /sys/block/s*; do echo -n "$blk: "; cat "$blk/queue/scheduler"; done

/sys/block/sda: [mq-deadline] none
/sys/block/sdb: [mq-deadline] none
/sys/block/sdc: [mq-deadline] none
/sys/block/sdd: [mq-deadline] none
/sys/block/sde: [mq-deadline] none
/sys/block/sdf: [mq-deadline] none
/sys/block/sdg: [mq-deadline] none
I'd guess that they're all using mq-deadline now; it may be worth trying to set them to "none" as a test...
(just `echo "none" > /sys/block/<BLKDEV>/queue/scheduler`)
I did this for every drive on a test server that had been online for 20 hours, and the scrub started fine without hanging the system.
As I am not a Linux expert, I need some advice on how to proceed with this problem.
Do I need to set the scheduler to none and live with it? These commands do not survive a reboot; how can I make this setting permanent? Is setting the scheduler to none the right way? What will it affect? I need your advice =)
Thanks.
 

guletz

Renowned Member
Apr 19, 2017
Brasov, Romania
Hi,

mq-deadline is better for HDDs/SSDs compared with none:

"Avoid using the none/noop I/O schedulers for a HDD as sorting requests on block addresses reduce the seek time latencies and neither of these I/O schedulers support this feature."
 

vanes

New Member
Nov 23, 2018
16
1
3
35
My disk configuration on both servers is a 4-HDD RAID10 ZFS rpool with UEFI boot, plus 2 SSDs attached, but the SSDs are not in use now. I disconnected the SSDs from one server and am going to test a scrub without them after some uptime. Does that make sense?
 

joblack

Member
Apr 16, 2017
Hi,

Thanks. I installed it last week.

I have found a potential bug: I deleted a VM, but the backup job still seems to want to back it up.

INFO: starting new backup job: vzdump 100 101 --mailnotification always --mode snapshot --storage local --quiet 1 --mailto some@email.de --compress gzip
ERROR: Backup of VM 100 failed - unable to find VM '100'
INFO: Failed at 2019-07-20 00:00:02
ERROR: Backup of VM 101 failed - unable to find VM '101'
INFO: Failed at 2019-07-20 00:00:02
INFO: Backup job finished with errors

TASK ERROR: job errors
---
Maybe there is still a bug?

Greetings
JB
 

guletz

Renowned Member
Apr 19, 2017
Brasov, Romania
My disk configuration on both servers is a 4-HDD RAID10 ZFS rpool with UEFI boot, plus 2 SSDs attached, but the SSDs are not in use now. I disconnected the SSDs from one server and am going to test a scrub without them after some uptime. Does that make sense?
Yes, zfs scrub will tell you if the pool is OK or not!
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
Do I need to set the scheduler to none and live with it?
It seems so, yes: if you want to scrub (which one should do occasionally), it's currently your only option.

These commands do not survive a reboot; how can I make this setting permanent?
Two general possibilities:
* add "elevator=none" to the kernel command line (this changes the default for all block devices)
* add udev rules that set the "none" scheduler for the ZFS-backed devices only.
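For the second option, one possible rule is sketched below. The helper name and the file name 60-ioscheduler.rules are arbitrary choices, and the rule as written matches all sd[a-z] disks rather than only the ZFS-backed ones; narrowing it further would need extra match keys (for example on ENV{ID_SERIAL}), which is not worked out here. Because it sets the scheduler per device through sysfs, it does not depend on the kernel honoring a boot parameter.

```shell
#!/bin/sh
# Hypothetical helper: print a udev rule that sets the given I/O scheduler on
# whole SATA/SCSI disks (sda..sdz) whenever they are added or changed.
udev_scheduler_rule() {
    printf 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="%s"\n' "$1"
}

# Install it (as root) and apply it to the disks that are already present:
#   udev_scheduler_rule none > /etc/udev/rules.d/60-ioscheduler.rules
#   udevadm control --reload-rules
#   udevadm trigger --subsystem-match=block --action=change
```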

The first can be done like this (for UEFI boot):
Code:
echo -n ' elevator=none' >> /etc/kernel/cmdline
pve-efiboot-tool refresh
Otherwise (GRUB boot), add the same "elevator=none" to the GRUB_CMDLINE_LINUX_DEFAULT variable in "/etc/default/grub" and run:
Code:
update-grub
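A slightly more defensive variant of these two edits is sketched below (helper names are hypothetical; GNU grep and sed assumed). The `echo -n ... >>` form relies on /etc/kernel/cmdline having no trailing newline; if it has one, the parameter lands on a second line, which the boot tooling may not pick up. The sketch appends to the end of the first line instead and does nothing if the parameter is already present. Also worth keeping in mind: with blk-mq the `elevator=` boot parameter may be ignored (later kernels print a warning that it no longer has any effect), so it is worth re-running the scheduler check loop after a reboot.

```shell
#!/bin/sh
# Hypothetical helpers; GNU grep and sed assumed. PARAM must not contain "/".

# add_kernel_param PARAM [FILE]
# Append PARAM to the first (only) line of the kernel command line file,
# unless it is already present. Run "pve-efiboot-tool refresh" afterwards.
add_kernel_param() {
    param=$1
    file=${2:-/etc/kernel/cmdline}
    grep -qwF -- "$param" "$file" && return 0
    sed -i "1s/\$/ $param/" "$file"
}

# add_grub_param PARAM [FILE]
# Append PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT, unless it is
# already present. Run "update-grub" afterwards.
add_grub_param() {
    param=$1
    file=${2:-/etc/default/grub}
    grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$file" && return 0
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

For example: `add_kernel_param elevator=none && pve-efiboot-tool refresh`, or `add_grub_param elevator=none && update-grub`.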
Is setting the scheduler to none the right way? What will it affect? I need your advice
"none" is what used to be "noop". While the multiqueue (mq) based schedulers are more modern and the preferred choice for most setups, it needs to be said that they are still relatively new, and that "none"/"noop" is not inherently bad. Especially with solid-state disks it should work well enough.

Some discussion regarding this change can be found at: https://github.com/zfsonlinux/zfs/pull/9042

For now I'd set it to "none" in your case, to ensure that the pool, and especially the scrub operation, works properly.
As newer ZFS and kernel versions arrive over the coming months, we can keep monitoring this and include a fix if one becomes available.
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
Hi,

Thanks. I installed it last week.

I have found a potential bug: I deleted a VM, but the backup job still seems to want to back it up.

INFO: starting new backup job: vzdump 100 101 --mailnotification always --mode snapshot --storage local --quiet 1 --mailto some@email.de --compress gzip
ERROR: Backup of VM 100 failed - unable to find VM '100'
INFO: Failed at 2019-07-20 00:00:02
ERROR: Backup of VM 101 failed - unable to find VM '101'
INFO: Failed at 2019-07-20 00:00:02
INFO: Backup job finished with errors

TASK ERROR: job errors
---
Maybe there is still a bug?

Greetings
JB
This is by design, to ensure that a re-created VM still gets backed up; we wanted to stay on the safe side here. But there's a feature request regarding this: https://bugzilla.proxmox.com/show_bug.cgi?id=1291
Patches that would allow "purging" a VM (i.e., also removing it from the backup jobs) are on the development mailing list, awaiting review.
 

Rosario Contarino

New Member
Jul 16, 2019
Hi all,

Freshly installed Proxmox VE 6 with no subscription (only the recommended repos).

Just ran

#> apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages have been kept back:
ceph-common librados2 libradosstriper1 librbd1 python-cephfs python-rados
python-rbd
The following packages will be upgraded:
ceph-fuse libcephfs2 linux-libc-dev pve-container
4 upgraded, 0 newly installed, 0 to remove and 7 not upgraded.
Need to get 2268 kB of archives.
After this operation, 7271 kB disk space will be freed.
Do you want to continue? [Y/n]

Would you advise proceeding?
Thanks
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
Would you advise proceeding?
First, please never use "apt-get upgrade" to upgrade any Proxmox-based product. It refuses to install new packages and to remove existing ones, both of which we sometimes need during stable upgrades.

Always use:
Code:
apt update
apt dist-upgrade
(note: you could replace "apt" with "apt-get", that's fine, but "apt" has a better user experience for humans, IMO, and it combines the most used commands from all the apt-* tools, for example "apt search").

Would you advise proceeding?
As for your actual question: yes, it's a normal upgrade. We recently moved this pve-container version to the public repos, and the other packages all look fine too.
 

Rosario Contarino

New Member
Jul 16, 2019
28
0
1
51
If I wanted to use Proxmox as the foundation for a hybrid cloud architecture (on premises plus third-party vendors, e.g. AWS, Azure, etc.), could I install a Proxmox cluster on, let's say, AWS and then use proxmove to move VMs and LXC containers between the on-premises cluster and the AWS cluster? Any drawbacks?
 

tin

Member
Aug 14, 2010
Northwest NSW, Australia
I'm trying to upgrade a Proxmox 5.4 box to 6.0. pve5to6 did not report any problems.
My apt run is stuck on:
Code:
Setting up lxc-pve (3.1.0-61)
I upgraded a standalone box yesterday and hit the same problem; I had to kill a few processes to get it moving again. I got the same problem again on the first of my home boxes. I didn't take any notes yesterday, but today I had the following:
Code:
"Setting up lxc-pve (3.1.0-61) ..."
Resolved by killing:
/bin/systemctl restart lxc-monitord.service lxc-net.service
/bin/sh - /usr/lib/x86_64-linux-gnu/lxc/lxc-net start
/bin/sh /var/lib/dpkg/info/lxc-pve.postinst configure 3.1.0-3


"Setting up pve-ha-manager (3.0-2) ..."
Resolved by killing:
/bin/systemctl restart pve-ha-lrm.service
Edit: After the dist-upgrade finished, I was left with several pve packages not configured. A reboot is required to get them to configure successfully; attempting to configure them before rebooting just results in the same freeze.
 
