that was indeed an important missing piece of information :) I am not sure what kind of backup archives that creates! file-level access requires some level of structure inside the backup archive:
- for host/ct/file-based backups, we have our own archive format (pxar) which is - so far - not...
the backup and reader sessions use HTTP/2, which probably changes the traffic pattern sufficiently from other workloads. we haven't had any reports yet of other traffic/connections being affected. it might also be a bad interaction between the HTTP/2 client/server code and the new behaviour of...
you can change the sync behaviour in the tuning options of the datastore; if you switch to a more consistent sync level, the missing chunks should no longer happen.
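a minimal sketch of doing that on the CLI (the datastore name `store1` and the chosen level are just examples; `file` is the most consistent level, `filesystem` a middle ground):

```
# switch the datastore to a more consistent sync level
proxmox-backup-manager datastore update store1 --tuning 'sync-level=filesystem'
# verify the setting
proxmox-backup-manager datastore show store1
```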
this sounds like there might be some bigger issue with the storage though..
how did you back up the windows system? using a live CD and a raw/image backup of its disk? how are you attempting to restore it? please provide more details, else nobody will understand what you are trying to do and why it fails ;)
we're in the same boat there. unfortunately we haven't managed to reproduce it at all so far, which likely means there is some additional factor network-wise that makes this much more likely to trigger on your systems.
that commit was also the first one we identified, but it seems that the fixes so far were incomplete.
we have one more in our queue that might be worth checking out, but if you still have the capacity, bisecting v6.17..v6.18 would be great! thanks a lot for the work you already put in, users...
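in case it helps, the standard flow for that looks roughly like this (sketch, run inside a mainline kernel checkout):

```
git bisect start
git bisect bad v6.18    # first version known to misbehave
git bisect good v6.17   # last version known to be fine
# build + boot the commit git checks out, test it, then mark it:
#   git bisect good   (or)   git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset
```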
could you first try running the "kvm" command printed by `qm showcmd 103`? if it also exits with that error, run it under strace - it might shed some light..
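something along these lines (sketch; `kvm ...` stands in for the exact command line `qm showcmd` prints, which will differ per VM):

```
# print the full kvm command line for VM 103
qm showcmd 103
# re-run that exact command under strace, logging syscalls to a file
strace -f -o /tmp/kvm-103.trace kvm ...
```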
this sounds like an instance of "hole" mishandling, which is often caused by devices lying about their support for discarding. changing the block size might just cause the data to be aligned differently by the guest OS and thus avoid the issue..
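one quick sanity check is to look at what the kernel thinks the device supports (sketch; `/dev/sdX` is a placeholder for the affected device):

```
# show discard granularity and limits - all-zero columns usually mean
# discard is not actually supported by the device
lsblk --discard /dev/sdX
```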
you could try running the clone under strace:
1...
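for what it's worth, attaching to an already-running copy could look like this (a sketch under the assumption that a `qemu-img convert` worker does the actual copying; `<PID>` is a placeholder):

```
# find the process doing the copy
pgrep -af qemu-img
# attach, follow children, and log syscalls with timestamps and durations
strace -f -ttT -p <PID> -o /tmp/clone.trace
```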
testing whether the https://kernel.ubuntu.com/mainline/v6.16/ and https://kernel.ubuntu.com/mainline/v6.16.12/ kernels show the issue on your system(s) would be highly appreciated - it would help narrow down the cause. we still cannot reproduce any issues on kernels after 6.17.4 in our test lab..
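installing one of those builds boils down to downloading the debs from the listing and installing them (sketch; file names differ per build and architecture, the globs below are only illustrative for amd64):

```
# after downloading the image + modules debs from the directory listing:
apt install ./linux-image-unsigned-6.16*_amd64.deb ./linux-modules-6.16*_amd64.deb
reboot
# confirm which kernel you booted into
uname -r
```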
the configs (storage and ceph) don't agree on the ports - the first mon has port 3300 in your ceph config, and 6789 in your storage.cfg. assuming qemu picks the first monitor, if it is not listening on 6789, that would explain the error ;)
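for comparison, an entry where the ports line up could look like this (sketch; storage id, pool and IPs are made up - 3300 is the msgr2 port, 6789 the legacy msgr1 one):

```
rbd: ceph-vm
        monhost 192.0.2.1:3300 192.0.2.2:3300 192.0.2.3:3300
        pool vm-pool
        content images
```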
yes, that sounds like something is blocked waiting for I/O, so the next point where the abort state would be checked is never reached..
is it possible something is throttling network connections after a certain amount of traffic?