[SOLVED] Error in tape backup when a tape is near full

jamarsa

Member
Mar 31, 2020
Hello, I am operating a tape library and have a media pool with several tapes in it. The expected behaviour (which I had in a previous version of PBS) was to switch tapes when the one currently in use became full. But for the last few months, whenever a tape is within a few hundred GB of its capacity, I get the following error:



Code:
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2023-12-31T01:02:54Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-04-28T00:02:07Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-05-19T00:02:29Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-05-26T00:02:22Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-06-02T00:02:13Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-06-03T00:02:02Z
2024-06-05T05:28:59+02:00: skip snapshot ns/WhiteCluster/vm/11006/2024-06-04T00:02:05Z
2024-06-05T05:28:59+02:00: backup snapshot "ns/WhiteCluster/vm/11006/2024-06-05T00:02:19Z"
2024-06-05T05:29:09+02:00: wrote 691 chunks (1123.02 MB at 113.18 MB/s)
2024-06-05T05:29:15+02:00: TASK ERROR: write filemark  failed - No Sense, Additional sense: Programmable early warning detected

The situation now is that I have to be extra vigilant when a tape is almost full, and mark it manually as full to avoid a failed backup.

Is there some way to avoid this? Perhaps sending some sort of command to the tape library to disable this warning, or make PBS switch tapes when this event is triggered?
 
Hi,

do you use the tapes or drive with a different application too?

the 'Programmable early warning detected' error occurs when an application sets a custom Programmable Early Warning Zone (PEWZ), but we don't set or check for that, so imo it can only happen when the drive is also used with a different application.

i'll see that i send patches for that anyway, since aborting on that error makes no sense either
maybe we'll try to set it to 0 anyway, but i'll check if that's feasible
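
To make the error in the task log concrete: "No Sense" with "Programmable early warning detected" corresponds to SCSI sense key 0h with additional sense code/qualifier 00h/07h. Below is a minimal sketch of how an application could recognize that condition in raw sense data instead of treating it as a fatal error. The function name `is_programmable_early_warning` is hypothetical, and this is not how PBS handles it; it only illustrates the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_NO_SENSE 0x00 /* sense key "No Sense" */
#define ASC_PEW            0x00 /* additional sense code */
#define ASCQ_PEW           0x07 /* qualifier: programmable early warning detected */

/* Hypothetical helper: returns true if the raw sense buffer reports
 * NO SENSE with ASC/ASCQ 00h/07h. Handles fixed-format (70h/71h) and
 * descriptor-format (72h/73h) sense data. */
bool is_programmable_early_warning(const uint8_t *sense, size_t len)
{
        uint8_t key, asc, ascq;

        if (sense == NULL || len < 1)
                return false;

        switch (sense[0] & 0x7F) {
        case 0x70:
        case 0x71:
                /* fixed format: key in byte 2, ASC/ASCQ in bytes 12/13 */
                if (len < 14)
                        return false;
                key  = sense[2] & 0x0F;
                asc  = sense[12];
                ascq = sense[13];
                break;
        case 0x72:
        case 0x73:
                /* descriptor format: key in byte 1, ASC/ASCQ in bytes 2/3 */
                if (len < 4)
                        return false;
                key  = sense[1] & 0x0F;
                asc  = sense[2];
                ascq = sense[3];
                break;
        default:
                return false;
        }

        return key == SENSE_KEY_NO_SENSE && asc == ASC_PEW && ascq == ASCQ_PEW;
}
```

A writer that sees this condition could, for example, finish the current chunk archive and request a new tape rather than abort the whole task.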
 
oh, in the meantime, could you maybe open a bug for this: https://bugzilla.proxmox.com ?
so we can more easily track it and don't forget (also you'll get a notification then when it's fixed)
 
Well yes, I use LTFS occasionally, downloaded from git and compiled. But it was installed and used in 2021, and the error started appearing much later. I have scanned the source, and it does mention setting the PEWS (Programmable Early Warning Space):
C:
/* Setup tape device.
 */
int ltfs_setup_device(struct ltfs_volume *vol)
{
        int ret;
        bool enabled;

        CHECK_ARG_NULL(vol, -LTFS_NULL_ARG);

        /* Check a cartrige is loaded or at lock position
           and suppress unnessesary senses before issueing mode select in follwing part */
        ret = tape_is_cartridge_loadable(vol->device);
        if (ret < 0)
                return ret;

        /* Set Programmable Early Warning Space so that half of
           Index partition space is reserved for index file. */
        ret = tape_set_pews(vol->device, vol->set_pew);

But the function is used both for setting a space, and clearing it (vol->set_pew, true or false).

C:
        if (set_value) {
                /* set PEW to half of capacity of index partition */
                half_of_max_p0 = cap.max_p0 / 2;
                pews = (uint16_t) (half_of_max_p0 < max_pews ? half_of_max_p0 : max_pews);
        }
        else {
                /* clear PEW value */
                pews = 0;
        }
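
The selection logic in that excerpt can be isolated as a small standalone function, which makes the two cases (set vs. clear) easier to see. This is only a sketch: `choose_pews` is a hypothetical name, and the parameter types are assumptions, since the excerpt does not show how `cap.max_p0` and `max_pews` are declared in LTFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of the excerpt above.
 * set_value: true to request a PEWZ, false to clear it
 * max_p0:    capacity of the index partition (type assumed)
 * max_pews:  largest value the 16-bit PEWS field may take (type assumed) */
uint16_t choose_pews(bool set_value, uint64_t max_p0, uint16_t max_pews)
{
        uint64_t half_of_max_p0;

        if (!set_value)
                return 0; /* PEWS = 0 clears the zone */

        /* half of the index partition, clamped to the field maximum */
        half_of_max_p0 = max_p0 / 2;
        return (uint16_t)(half_of_max_p0 < max_pews ? half_of_max_p0 : max_pews);
}
```

With these assumptions, `choose_pews(false, ...)` always yields 0, which matches the idea that the tools setting `vol->set_pew = false` write a PEWS of zero.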

From a search, it seems that PEW is activated on mounts (and not deactivated afterwards, though I'm not sure whether some code outside of ltfs.c does that), and deactivated by ltfsck, mkltfs and ltfsindextool.


Code:
/ltfs-git/ltfs# grep -R set_pew . 2>/dev/null
./src/utils/ltfsindextool.c:    vol->set_pew = false;
./src/utils/ltfsck.c:    vol->set_pew = false;
./src/utils/mkltfs.c:    vol->set_pew = false;
./src/utils/mkltfs.c:    vol->set_pew = false;
./src/libltfs/tape.c:int tape_set_pews(struct device_data *dev, bool set_value)
./src/libltfs/ltfs.h:    bool set_pew;                  /**< Set PEW value */
./src/libltfs/tape.h:int tape_set_pews(struct device_data *dev, bool set_value);
./src/libltfs/ltfs.c:    newvol->set_pew = true;
./src/libltfs/ltfs.c:    ret = tape_set_pews(vol->device, vol->set_pew);

So I'm going to load a tape and, without mounting it, run ltfsck and cross my fingers.
Is there a way to check whether the PEWZ is active on the drive, perhaps via tapeinfo, pmt, or some other SCSI command? Or is it a setting stored on the cartridge?
If the latter, it's strange, because I have only used a couple of cartridges for LTFS.
The LTO-8 Reference seems to indicate that it is a setting in the drive. And of course it warns that, if set, the PEWZ must be handled by all applications using the drive.

Rich (BB code):
4.5 Programmable early warning
When writing, the application client may need an indication prior to early warning to allow for the application client
to prepare to be ready for early warning (e.g., flush buffers in the application client).
Application clients that need this indication may request the device server to create a zone called the
programmable-early-warning zone (PEWZ) by setting the PEWS field (see 6.6.8) to the requested size of the
PEWZ. The EOP side of PEWZ is established at early-warning and extends towards BOP for a distance
indicated by the PEWS field.
If PEWZ is used, all applications that may access the drive when a PEWZ exists, should support PEWZ or there
is a risk of the application that does not support PEWZ detecting an unknown error or a diminished capacity
when the PROGRAMMABLE EARLY WARNING error is reported.
The REW bit in the Device Configuration mode page (see 6.6.11) shall have no effect on the device server
behavior in the PEWZ.

IBM LTO-8 Reference
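
On the question of checking whether a PEWZ is currently set: the PEWS field lives in the Device Configuration Extension mode page (page code 10h, subpage 01h), so one option is to read that page with MODE SENSE (for example with a tool such as sg_modes from sg3_utils) and decode the field. The sketch below decodes PEWS from a raw copy of that page. The function name `parse_pews` is hypothetical, and the layout used (PEWS as a big-endian 16-bit value in bytes 6 and 7 of the page) is my reading of the SSC mode page format; please verify it against section 6.6.8 of the drive reference before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the PEWS field from a raw Device
 * Configuration Extension mode page (page 10h, subpage 01h).
 * `page` must point at the start of the mode page itself, i.e. after
 * the mode parameter header and any block descriptors.
 * Returns the PEWS value (0 means no PEWZ), or -1 if the buffer does
 * not look like the expected page. */
int32_t parse_pews(const uint8_t *page, size_t len)
{
        if (page == NULL || len < 8)
                return -1;
        if ((page[0] & 0x3F) != 0x10) /* page code */
                return -1;
        if ((page[0] & 0x40) == 0)    /* SPF bit: sub-page format */
                return -1;
        if (page[1] != 0x01)          /* subpage code */
                return -1;

        /* assumed layout: PEWS in bytes 6..7, most significant byte first */
        return (int32_t)(((uint32_t)page[6] << 8) | page[7]);
}
```

A non-zero result would suggest that some application has requested a PEWZ on the drive; a result of 0 would suggest that none is configured.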
 
Hello again.

Indeed, after the ltfsck (which seems to clear the PEWZ) and letting the backups go ahead, I can confirm that the error no longer occurs and tapes are switched correctly.
Rich (BB code):
2024-06-21T06:02:48+02:00: end backup BackupReAL:"ns/WhiteCluster/vm/74000014/2024-06-21T01:30:04Z"
2024-06-21T06:02:48+02:00: percentage done: 78.40% (123/158 groups, 29/33 snapshots in group #124)
2024-06-21T06:02:48+02:00: backup snapshot "ns/WhiteCluster/vm/74000014/2024-06-21T02:00:05Z"
2024-06-21T06:02:53+02:00: wrote 254 chunks (395.84 MB at 91.83 MB/s)
2024-06-21T06:02:56+02:00: allocated new writable media '000008L8'
2024-06-21T06:02:56+02:00: eject current media
2024-06-21T06:03:31+02:00: trying to load media '000008L8' into drive '11CC7BA05B'
2024-06-21T06:04:35+02:00: found media label 000008L8 (06ec59f5-0499-4553-bc53-2bd63983f045)
2024-06-21T06:04:35+02:00: writing new media set label
2024-06-21T06:04:46+02:00: moving to end of media
2024-06-21T06:04:48+02:00: arrived at end of media
2024-06-21T06:04:48+02:00: write catalog for previous media: 2c2a6b63-9a0b-4157-a087-5e1758689120
2024-06-21T06:04:49+02:00: end backup BackupReAL:"ns/WhiteCluster/vm/74000014/2024-06-21T02:00:05Z"
2024-06-21T06:04:49+02:00: percentage done: 78.42% (123/158 groups, 30/33 snapshots in group #124)
2024-06-21T06:04:49+02:00: backup snapshot "ns/WhiteCluster/vm/74000014/2024-06-21T02:30:05Z"
2024-06-21T06:04:53+02:00: wrote 239 chunks (382.21 MB at 112.34 MB/s)
2024-06-21T06:04:53+02:00: end backup BackupReAL:"ns/WhiteCluster/vm/74000014/2024-06-21T02:30:05Z"
2024-06-21T06:04:53+02:00: percentage done: 78.44% (123/158 groups, 31/33 snapshots in group #124)

So, I can mark this as solved.

But perhaps I should request PEWZ detection as a feature on the Bugzilla page?
Maybe I should also report this on the LTFS page.

Thanks again for pointing to the possible cause.
 