Replication error I/O error

tomas12343

New Member
Jun 6, 2020
Tried to replicate a VM with 3 virtual HDDs totaling 2.5 TB of disk space. At first I had some storage problems (as I mentioned in my previous post; they were resolved). Now the replication runs, but I get an error on the second HDD. The log:

2020-06-12 02:52:00 208-0: start replication job
2020-06-12 02:52:00 208-0: guest => VM 208, running => 7308
2020-06-12 02:52:00 208-0: volumes => Disk5:vm-208-disk-0,Disk5:vm-208-disk-1,Disk5:vm-208-disk-2
2020-06-12 02:52:01 208-0: create snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-0
2020-06-12 02:52:01 208-0: create snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-1
2020-06-12 02:52:01 208-0: create snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-2
2020-06-12 02:52:01 208-0: using secure transmission, rate limit: none
2020-06-12 02:52:01 208-0: full sync 'Disk5:vm-208-disk-0' (__replicate_208-0_1591919520__)
2020-06-12 02:52:02 208-0: full send of Disk5/vm-208-disk-0@__replicate_208-0_1591919520__ estimated size is 922G
2020-06-12 02:52:02 208-0: total estimated size is 922G
2020-06-12 02:52:03 208-0: TIME SENT SNAPSHOT Disk5/vm-208-disk-0@__replicate_208-0_1591919520__
2020-06-12 02:52:03 208-0: Disk5/vm-208-disk-0 name Disk5/vm-208-disk-0 -
2020-06-12 02:52:03 208-0: volume 'Disk5/vm-208-disk-0' already exists
2020-06-12 02:52:03 208-0: warning: cannot send 'Disk5/vm-208-disk-0@__replicate_208-0_1591919520__': signal received
2020-06-12 02:52:03 208-0: cannot send 'Disk5/vm-208-disk-0': I/O error
2020-06-12 02:52:03 208-0: command 'zfs send -Rpv -- Disk5/vm-208-disk-0@__replicate_208-0_1591919520__' failed: exit code 1
2020-06-12 02:52:03 208-0: delete previous replication snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-0
2020-06-12 02:52:03 208-0: delete previous replication snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-1
2020-06-12 02:52:04 208-0: delete previous replication snapshot '__replicate_208-0_1591919520__' on Disk5:vm-208-disk-2
2020-06-12 02:52:04 208-0: end replication job with error: command 'set -o pipefail && pvesm export Disk5:vm-208-disk-0 zfs - -with-snapshots 1 -snapshot __replicate_208-0_1591919520__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=emron-backup' root@192.168.10.223 -- pvesm import Disk5:vm-208-disk-0 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 255

My best guess is that the previous replication failed on HDD 2 (on the backup storage I see only HDDs 0 and 1), and when the replication starts again it complains that HDD 0 already exists. But I cannot test this because the replication takes about 6-8 hours and I cannot wait at the PC that long (replication starts at midnight because the first run slows down the server), and I cannot find the previous replication logs.
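If that guess is right, I assume a leftover dataset on the target could be found (and, only after double-checking, removed) like this - I have not dared to run the destroy yet:

```
# on the target node (emron-backup): list datasets and snapshots for VM 208
zfs list -rt all Disk5 | grep vm-208

# if vm-208-disk-0 is there without a __replicate_* snapshot, it is
# presumably a stale leftover from the failed run; zfs destroy is
# irreversible, so only run this after verifying it really is a leftover
zfs destroy Disk5/vm-208-disk-0
```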
Any ideas??
 
check the journal on both nodes - there should be more information regarding the source of the I/O error
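for example, to narrow it down to the window of the failed run (timestamps taken from your log above):

```
# on the source node: messages around the failed 02:52 run
journalctl --since "2020-06-12 02:50" --until "2020-06-12 03:00"

# on the target node: same window, filtered for ZFS messages
journalctl --since "2020-06-12 02:50" --until "2020-06-12 03:00" | grep -i zfs
```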

I hope this helps!
Tried to, but I cannot find the journal on the destination.

What I did was start the replication and then deactivate it, so that I could see the log from the first run once it finished. The log is:

cannot receive new filesystem stream: out of space
2020-06-13 07:53:28 208-0: cannot open 'Disk5/vm-208-disk-2': dataset does not exist
2020-06-13 07:53:28 208-0: command 'zfs recv -F -- Disk5/vm-208-disk-2' failed: exit code 1
2020-06-13 07:53:28 208-0: delete previous replication snapshot '__replicate_208-0_1591994040__' on Disk5:vm-208-disk-0
2020-06-13 07:53:29 208-0: delete previous replication snapshot '__replicate_208-0_1591994040__' on Disk5:vm-208-disk-1
2020-06-13 07:53:31 208-0: delete previous replication snapshot '__replicate_208-0_1591994040__' on Disk5:vm-208-disk-2
2020-06-13 07:53:31 208-0: end replication job with error: command 'set -o pipefail && pvesm export Disk5:vm-208-disk-2 zfs - -with-snapshots 1 -snapshot __replicate_208-0_1591994040__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=emron-backup' root@192.168.10.223 -- pvesm import Disk5:vm-208-disk-2 zfs - -with-snapshots 1 -allow-rename 0' failed: exit code 1

I had a problem with space on the source disk: although there seemed to be enough space, when I started the replication there wasn't enough (the system reserves space for replication), so I added another HDD to the pool.
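If I understand the reservation behaviour correctly, it should be visible per dataset like this (dataset name taken from my log):

```
# volsize vs. refreservation shows how much space ZFS holds back
# for the zvol up front (thick provisioning)
zfs get volsize,refreservation,usedbyrefreservation,available Disk5/vm-208-disk-0
```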

Or could it be that the two disks are not sharing space, and the first HDD fills up during the replication?
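I suppose the per-disk allocation can be checked from the pool itself (pool name from my setup):

```
# per-vdev capacity: all top-level vdevs in one pool share its free space,
# so both disks should show ALLOC/FREE here
zpool list -v Disk5
```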
 
Tried to, but I cannot find the journal on the destination.
the journal can be read with `journalctl` - see `man journalctl`

cannot receive new filesystem stream: out of space
seems there is not enough space - as you analyzed correctly

check the output of:
* `zpool status`
* `zpool list`
* `zfs list`
* `zfs get all $dataset` - for one of the datasets you receive into (see the example below)
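for example, with one of the receiving datasets from your log filled in for `$dataset` and filtered to the space-related properties:

```
# reservation/quota/space properties are the ones relevant to
# 'cannot receive new filesystem stream: out of space'
zfs get all Disk5/vm-208-disk-2 | grep -Ei 'reserv|quota|used|avail'
```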
 
I am waiting for a new HDD for the destination pool. I will report back on how it goes. I really hope it is that simple!