Hi,
We have repeatedly seen failures during the regular nightly backup.
The symptoms are the following:
The vzdump backup fails, usually with a reason like this:
122: Jun 16 04:27:30 INFO: status: 14% (4352311296/30064771072), sparse 1% (408707072), duration 325, 0/0 MB/s
122: Jun 16 04:27:30 ERROR: vma_queue_write: write error - Broken pipe
122: Jun 16 04:27:30 INFO: aborting backup job
122: Jun 16 04:27:31 ERROR: Backup of VM 122 failed - vma_queue_write: write error - Broken pipe
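To see how often this happens across the nightly logs, something like the following could be used. This is only a rough sketch: the function name `find_failed_backups` and the regular expression are my own, not part of vzdump, and the pattern assumes the final error line always has the same shape as the one above.

```python
import re

# Matches the final per-VM failure line, e.g.
# "122: Jun 16 04:27:31 ERROR: Backup of VM 122 failed - vma_queue_write: write error - Broken pipe"
# (assumed format, based only on the log excerpt above)
FAILED_RE = re.compile(r"ERROR: Backup of VM (\d+) failed - (.+)$")


def find_failed_backups(log_lines):
    """Return a list of (vmid, reason) tuples, one per failed backup line."""
    failures = []
    for line in log_lines:
        match = FAILED_RE.search(line.rstrip())
        if match:
            failures.append((int(match.group(1)), match.group(2).strip()))
    return failures


if __name__ == "__main__":
    sample = [
        "122: Jun 16 04:27:30 ERROR: vma_queue_write: write error - Broken pipe",
        "122: Jun 16 04:27:30 INFO: aborting backup job",
        "122: Jun 16 04:27:31 ERROR: Backup of VM 122 failed - vma_queue_write: write error - Broken pipe",
    ]
    for vmid, reason in find_failed_backups(sample):
        print(vmid, reason)
```

Only the summary line ("Backup of VM ... failed") is matched, so each failed backup is counted once even though the underlying error is logged twice.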
At the same time, the kernel log inside the VM shows:
[1154217.813185] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1154217.813375] ata1.01: failed command: WRITE DMA
[1154217.813430] ata1.01: cmd ca/00:28:78:e7:06/00:00:00:00:00/f1 tag 0 dma 20480 out
[1154217.813430] res 40/00:01:00:00:00/00:00:00:00:00/f0 Emask 0x4 (timeout)
[1154217.813528] ata1.01: status: { DRDY }
[1154217.814646] ata1: soft resetting link
[1154217.816792] [sched_delayed] sched: RT throttling activated
[1154217.817713] INFO: task jbd2/sda1-8:160 blocked for more than 120 seconds.
[1154217.817765] Not tainted 3.13.0-8-generic #28-Ubuntu
[1154217.817802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1154217.817871] jbd2/sda1-8 D ffff8801bfd14440 0 160 2 0x00000000
[1154217.817874] ffff8801b29bdbc8 0000000000000002 ffff880036a00000 ffff8801b29bdfd8
[1154217.817876] 0000000000014440 0000000000014440 ffff880036a00000 ffff8801bfd14d00
[1154217.817878] ffff8801bffbc8e0 0000000000000002 ffffffff811e86a0 ffff8801b29bdc40
[1154217.817879] Call Trace:
...
....
etc.
Once this issue arises, the backup keeps failing until we restart the VM. After that it works again for a couple of days.
The server runs on SSDs in RAID-1, with PVE 3.4-6.
Any idea what could be the issue here?