Backup failed

BitSprocket

Member
Jul 16, 2012
4
0
21
All,

I've been having trouble with backups lately. I have a 500GB VM that I'm backing up nightly (max backup copies: 2) to a 1.5TB disk. It has been running fine for well over a year and the started failing complaining about the backup drive having broken pipes. I suspected the backup drive was failing so I replace it. It has been working fine for a couple of days now but suddenly last night's overnight backup failed with this:

105: Jul 07 00:00:02 INFO: Starting Backup of VM 105 (qemu)105: Jul 07 00:00:02 INFO: status = running
105: Jul 07 00:00:03 INFO: update VM 105: -lock backup
105: Jul 07 00:00:03 INFO: backup mode: snapshot
105: Jul 07 00:00:03 INFO: ionice priority: 7
105: Jul 07 00:00:03 INFO: skip unused drive 'backup:105/vm-105-disk-2.qcow2' (not included into backup)
105: Jul 07 00:00:03 INFO: creating archive '/backup/dump/vzdump-qemu-105-2013_07_07-00_00_02.vma.gz'
105: Jul 07 00:00:03 INFO: started backup task '4d3ba95b-3e9e-4026-99c5-a3c85fc02d71'
105: Jul 07 00:00:06 INFO: status: 0% (79822848/496068722688), sparse 0% (7544832), duration 3, 26/24 MB/s
105: Jul 07 00:00:50 INFO: status: 1% (5819006976/496068722688), sparse 1% (5399236608), duration 47, 130/7 MB/s
105: Jul 07 00:00:53 INFO: status: 2% (12369461248/496068722688), sparse 2% (11949690880), duration 50, 2183/0 MB/s
105: Jul 07 00:00:56 INFO: status: 3% (18904317952/496068722688), sparse 3% (18484547584), duration 53, 2178/0 MB/s
105: Jul 07 00:00:59 INFO: status: 5% (25438060544/496068722688), sparse 5% (25018290176), duration 56, 2177/0 MB/s
105: Jul 07 00:01:02 INFO: status: 6% (31658606592/496068722688), sparse 6% (31238836224), duration 59, 2073/0 MB/s
105: Jul 07 00:01:09 INFO: status: 7% (35412049920/496068722688), sparse 7% (34873106432), duration 66, 536/17 MB/s
105: Jul 07 00:03:42 INFO: status: 8% (39702036480/496068722688), sparse 7% (36121055232), duration 219, 28/19 MB/s
105: Jul 07 00:07:41 INFO: status: 9% (44651380736/496068722688), sparse 7% (36315267072), duration 458, 20/19 MB/s
105: Jul 07 00:11:34 INFO: status: 10% (49617960960/496068722688), sparse 7% (36423000064), duration 691, 21/20 MB/s
105: Jul 07 00:15:46 INFO: status: 11% (54577266688/496068722688), sparse 7% (36480749568), duration 943, 19/19 MB/s
105: Jul 07 00:19:42 INFO: status: 12% (59532640256/496068722688), sparse 7% (36645298176), duration 1179, 20/20 MB/s
105: Jul 07 00:23:36 INFO: status: 13% (64489259008/496068722688), sparse 7% (36968579072), duration 1413, 21/19 MB/s
105: Jul 07 00:27:46 INFO: status: 14% (69459181568/496068722688), sparse 7% (37246402560), duration 1663, 19/18 MB/s
105: Jul 07 00:32:12 INFO: status: 15% (74421501952/496068722688), sparse 7% (37466824704), duration 1929, 18/17 MB/s
105: Jul 07 00:36:32 INFO: status: 16% (79495954432/496068722688), sparse 7% (37750685696), duration 2189, 19/18 MB/s
105: Jul 07 00:40:30 INFO: status: 17% (84349943808/496068722688), sparse 7% (37888196608), duration 2427, 20/19 MB/s
105: Jul 07 00:44:45 INFO: status: 18% (89298960384/496068722688), sparse 7% (37949906944), duration 2682, 19/19 MB/s
105: Jul 07 00:48:47 INFO: status: 19% (94266064896/496068722688), sparse 7% (38134910976), duration 2924, 20/19 MB/s
105: Jul 07 00:52:54 INFO: status: 20% (99215998976/496068722688), sparse 7% (38325841920), duration 3171, 20/19 MB/s
105: Jul 07 00:56:59 INFO: status: 21% (104187822080/496068722688), sparse 7% (38525804544), duration 3416, 20/19 MB/s
105: Jul 07 01:01:07 INFO: status: 22% (109148241920/496068722688), sparse 7% (38730977280), duration 3664, 20/19 MB/s
105: Jul 07 01:05:16 INFO: status: 23% (114108661760/496068722688), sparse 7% (38750363648), duration 3913, 19/19 MB/s
105: Jul 07 01:09:26 INFO: status: 24% (119065280512/496068722688), sparse 7% (38848811008), duration 4163, 19/19 MB/s
105: Jul 07 01:13:29 INFO: status: 25% (124037103616/496068722688), sparse 7% (38873612288), duration 4406, 20/20 MB/s
105: Jul 07 01:17:32 INFO: status: 26% (128978518016/496068722688), sparse 7% (38886453248), duration 4649, 20/20 MB/s
105: Jul 07 01:21:41 INFO: status: 27% (133954142208/496068722688), sparse 7% (38902599680), duration 4898, 19/19 MB/s
105: Jul 07 01:25:42 INFO: status: 28% (138910760960/496068722688), sparse 7% (38910128128), duration 5139, 20/20 MB/s
105: Jul 07 01:30:05 INFO: status: 29% (143867379712/496068722688), sparse 7% (38930317312), duration 5402, 18/18 MB/s
105: Jul 07 01:32:36 INFO: status: 29% (146421710848/496068722688), sparse 7% (38947393536), duration 5553, 16/16 MB/s
105: Jul 07 01:32:36 ERROR: vma_co_write write error - Broken pipe
105: Jul 07 01:32:36 INFO: aborting backup job
105: Jul 07 01:32:38 ERROR: Backup of VM 105 failed - vma_co_write write error - Broken pipe


Looking at DMESG I am seeing some I/O errors:


sd 5:0:0:0: [sdc] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB)
sd 5:0:0:0: [sdc] 4096-byte physical blocks
sd 5:0:0:0: [sdc] Write Protect is off
sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: unknown partition table
sd 5:0:0:0: [sdc] Attached SCSI disk
sdc: sdc1
EXT3-fs (sdc1): using internal journal
EXT3-fs (sdc1): mounted filesystem with ordered data mode
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 1a c8 00 00 08 00
Buffer I/O error on device sdc1, logical block 74031705
lost page write due to I/O error on sdc1
sd 5:0:0:0: [sdc] killing request
sd 5:0:0:0: [sdc] Unhandled error code
sd 5:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 2a d0 00 03 f0 00
Buffer I/O error on device sdc1, logical block 74032218
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032219
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032220
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032221
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032222
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032223
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032224
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032225
EXT3-fs error (device sdc1): read_block_bitmap: Cannot read block bitmap - block_group = 2260, block_bitmap = 74055680
EXT3-fs (sdc1): I/O error while writing superblock
EXT3-fs error (device sdc1): read_block_bitmap: Cannot read block bitmap - block_group = 2260, block_bitmap = 74055680
EXT3-fs (sdc1): I/O error while writing superblock
lost page write due to I/O error on sdc1
Buffer I/O error on device sdc1, logical block 74032226
lost page write due to I/O error on sdc1
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 1a d0 00 04 00 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 1e d0 00 04 00 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 22 d0 00 04 00 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4d 26 d0 00 04 00 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4c cc a0 00 02 08 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4c ce b0 00 04 00 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4c d2 b0 00 00 08 00
sd 5:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 5:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
sd 5:0:0:0: [sdc] Add. Sense: No additional sense information
sd 5:0:0:0: [sdc] CDB: Write(10): 2a 00 23 4c d2 b8 00 04 00 00
end_request: I/O error, dev sdc, sector 592238264
end_request: I/O error, dev sdc, sector 592239288
end_request: I/O error, dev sdc, sector 592240312
end_request: I/O error, dev sdc, sector 592240320
end_request: I/O error, dev sdc, sector 592241344
end_request: I/O error, dev sdc, sector 592242368
end_request: I/O error, dev sdc, sector 592243392
end_request: I/O error, dev sdc, sector 592243400
end_request: I/O error, dev sdc, sector 592244408
end_request: I/O error, dev sdc, sector 592245432
end_request: I/O error, dev sdc, sector 592246456
end_request: I/O error, dev sdc, sector 592246464
end_request: I/O error, dev sdc, sector 592247488
end_request: I/O error, dev sdc, sector 592248512
end_request: I/O error, dev sdc, sector 592249536
end_request: I/O error, dev sdc, sector 592249544
end_request: I/O error, dev sdc, sector 592250568
end_request: I/O error, dev sdc, sector 592251592
end_request: I/O error, dev sdc, sector 592252608
end_request: I/O error, dev sdc, sector 592252616
end_request: I/O error, dev sdc, sector 592253640
end_request: I/O error, dev sdc, sector 592254664
JBD: Detected IO errors while flushing file data on sdc1
Aborting journal on device sdc1.
JBD: I/O error detected when updating journal superblock for sdc1.
JBD: Detected IO errors while flushing file data on sdc1
EXT3-fs (sdc1): error: ext3_journal_start_sb: Detected aborted journal
EXT3-fs (sdc1): error: remounting filesystem read-only
sd 5:0:0:0: [sdc] Synchronizing SCSI cache
EXT3-fs error (device sdc1): ext3_find_entry: reading directory #2 offset 0
sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 5:0:0:0: [sdc] Stopping disk
sd 5:0:0:0: [sdc] START_STOP FAILED
sd 5:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
EXT3-fs error (device sdc1): ext3_find_entry: reading directory #2 offset 0

This is a new drive on a different SATA controller but it seems to be having the same fate as the previous drive. I did notice an 'unknown partition table' which is odd since I just created and formatted the partition when I installed it.

Any ideas where to look next?

Thanks in advance //
 
it looks like there is a filesystem error.

from cli, try :
Code:
umount /dev/sdc1

# make sure that worked, then :
fsck  /dev/sdc1

Also what is the filesystem type? [ use df -T ]
 
I agree it *seems* to be a file system error of some variety. It's EXT3 and I just finished a fsck.ext3 -yf /dev/sdc1 which of course reported no errors. I did have to reboot the machine to get it to mount /dev/sdc1 as it was complaining that it was not a valid block device. It would appear that after all the i/o errors the kernel dropped the device and refused to remount it without a reboot.
 
@MasterTH - Yep, changed the cable when I changed the drive - thanks for the idea though...

@RobFantini - No raid involved.

Incidentally, after yesterday's reboot, the backup succeeded overnight. If history is an indicator, it will work for a day or two then fail again....
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!