decode element status failed - failed to fill whole buffer

Thanks!


root@pbs01:~/pbssrc/proxmox-backup# gdb --args /usr/bin/pmtx status --changer msl2024
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/pmtx...
Reading symbols from /usr/lib/debug/.build-id/4e/8defcdf3749384e96a6d1087b55c07862ebe37.debug...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/bin/pmtx.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) break rust_panic
Breakpoint 1 at 0x125bd3: file library/std/src/panicking.rs, line 640.
(gdb) run
Starting program: /usr/bin/pmtx status --changer msl2024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
using device /dev/tape/by-id/scsi-DEC80205XR
Transport Element (Griper) 0: Empty
Data Transfer Element (Drive) 0: Empty, Serial: HUE08455BM
Storage Element 1: VolumeTag("PBS001L4")
Storage Element 2: VolumeTag("PBS002L4")
Storage Element 3: VolumeTag("PBS003L4")
Storage Element 4: VolumeTag("PBS004L4")
Storage Element 5: VolumeTag("PBS005L4")
Storage Element 6: VolumeTag("PBS006L4")
Storage Element 7: VolumeTag("PBS007L4")
Storage Element 8: VolumeTag("PBS008L4")
Storage Element 9: VolumeTag("PBS009L4")
Storage Element 10: VolumeTag("PBS010L4")
Storage Element 11: VolumeTag("PBS011L4")
Storage Element 12: Empty
Storage Element 13: Empty
Storage Element 14: Empty
Storage Element 15: Empty
Storage Element 16: Empty
Storage Element 17: Empty
Storage Element 18: Empty
Storage Element 19: Empty
Storage Element 20: Empty
Storage Element 21: Empty
Storage Element 22: Empty
Storage Element 23: Empty
Storage Element 24: VolumeTag("CLN001CU")
[Inferior 1 (process 2581052) exited normally]
(gdb) thread apply all bt

Thread 1 (Thread 0x7ffff77476c0 (LWP 2581056) "pmtx"):
#0 std::panicking::rust_panic (msg=...) at library/std/src/panicking.rs:640
#1 0x0000555555679b8a in std::panicking::rust_panic_with_hook (payload=..., message=..., location=<optimized out>) at library/std/src/panicking.rs:610
#2 0x00005555556693d7 in std::panicking::begin_panic_handler::{{closure}} () at library/std/src/panicking.rs:497
#3 0x000055555566908c in std::sys_common::backtrace::__rust_end_short_backtrace<closure-0,!> (f=...) at library/std/src/sys_common/backtrace.rs:141
#4 0x00005555556796b2 in std::panicking::begin_panic_handler (info=0x7fffffffd418) at library/std/src/panicking.rs:493
#5 0x000055555569d131 in core::panicking::panic_fmt (fmt=...) at library/core/src/panicking.rs:92
#6 0x000055555569d25f in core::panicking::assert_failed::inner (kind=<optimized out>, left=..., right=..., args=<error reading variable: Cannot access memory at address 0x0>) at library/core/src/fmt/mod.rs:314
#7 0x00005555555a9c7b in core::panicking::assert_failed<usize,usize> (kind=core::panicking::AssertKind::Eq, left=0x7fffffffd660, right=0x5555556a07c0 <anon.8c282fdcb87b17243a4b531e6d7bcf58.63.llvm>, args=...) at /usr/src/rustc-1.52.1/library/core/src/panicking.rs:143
#8 0x00005555555b6fc7 in core::slice::{{impl}}::chunks_exact<u8> (self=..., chunk_size=0) at /usr/src/rustc-1.52.1/library/core/src/slice/mod.rs:863
#9 proxmox_backup::tape::changer::sg_pt_changer::decode_element_status_page::{{closure}} () at src/tape/changer/sg_pt_changer.rs:713
#10 proxmox_backup::tape::changer::sg_pt_changer::decode_element_status_page (_info=<optimized out>, data=..., start_element_address=<optimized out>) at src/tape/changer/sg_pt_changer.rs:676
#11 0x000055555559b139 in proxmox_backup::tape::changer::sg_pt_changer::get_element<std::fs::File> (inquiry=0x7fffffffdec8, sg_raw=0x7fffffffde70, element_type=<optimized out>, allocation_len=65536, retry=<optimized out>) at /usr/share/cargo/registry/proxmox-backup-2.0.7/src/tape/changer/sg_pt_changer.rs:345
#12 0x000055555559c93a in proxmox_backup::tape::changer::sg_pt_changer::read_element_status<std::fs::File> (file=<optimized out>) at /usr/share/cargo/registry/proxmox-backup-2.0.7/src/tape/changer/sg_pt_changer.rs:408
#13 0x000055555559058b in pmtx::status::{{closure}} () at src/bin/pmtx.rs:301
#14 pmtx::status (param=...) at src/bin/pmtx.rs:299
#15 pmtx::api_function_status (input_params=..., api_method_param=<optimized out>, rpc_env_param=...) at src/bin/pmtx.rs:274
#16 0x000055555558c1ef in core::ops::function::Fn::call<fn(serde_json::value::Value, &proxmox::api::router::ApiMethod, &mut RpcEnvironment) -> core::result::Result<serde_json::value::Value, anyhow::Error>,(serde_json::value::Value, &proxmox::api::router::ApiMethod, &mut RpcEnvironment)> () at /usr/src/rustc-1.52.1/library/core/src/ops/function.rs:70
#17 0x00005555555d5a2f in core::ops::function::impls::{{impl}}::call<(serde_json::value::Value, &proxmox::api::router::ApiMethod, &mut RpcEnvironment),Fn<(serde_json::value::Value, &proxmox::api::router::ApiMethod, &mut RpcEnvironment)>> (self=<optimized out>, args=...) at /usr/src/rustc-1.52.1/library/core/src/ops/function.rs:237
#18 proxmox::api::cli::command::handle_simple_command (prefix=..., cli_cmd=0x55555579b330, args=..., rpcenv=..., run=...) at /usr/share/cargo/registry/proxmox-0.12.1/src/api/cli/command.rs:109
#19 0x00005555555d7084 in proxmox::api::cli::command::handle_command (def=..., prefix=..., args=..., rpcenv=..., run=...) at /usr/share/cargo/registry/proxmox-0.12.1/src/api/cli/command.rs:333
#20 0x0000555555593ec8 in proxmox::api::cli::command::run_cli_command<proxmox::api::cli::CliCommandMap> (def=..., rpcenv=..., run=...) at /usr/share/cargo/registry/proxmox-0.12.1/src/api/cli/command.rs:418
#21 0x000055555558e6d0 in pmtx::main () at src/bin/pmtx.rs:484
(gdb)

no errors with mtx -f /dev/tape/by-id/scsi-DEC80205XR status
 
ok, that confirms my suspicion about where the panic comes from (i already sent a patch for that [0]), but the data the library sends still makes no sense...

no errors with /dev/tape/by-id/scsi-DEC80205XR
you mean with mtx?
also, does it work if you run it multiple times (like you had to with pmtx)?

if that works all the time, i'll have to look into the mtx source to check what we might still be doing wrong....

edit: add link to patch...

0: https://lists.proxmox.com/pipermail/pbs-devel/2021-July/003784.html
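for context, frame #8 in the backtrace above shows chunks_exact being called with chunk_size=0, which panics via an internal assert. a minimal sketch of the guard the patch adds (illustrative names, not the exact proxmox-backup code):

use anyhow::{bail, Error};

// bail out with an error instead of calling chunks_exact(0),
// which would panic inside the standard library
fn decode_descriptors(data: &[u8], descriptor_len: usize) -> Result<(), Error> {
    if descriptor_len == 0 {
        // the case the MSL2024 triggers intermittently: the page header
        // claims elements but reports a descriptor length of 0
        bail!("got elements, but descriptor length 0");
    }
    for descriptor in data.chunks_exact(descriptor_len) {
        // ... decode one element descriptor ...
        let _ = descriptor;
    }
    Ok(())
}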
 
compiled again from the master branch; now pmtx does the following:
Starting program: /usr/bin/pmtx status --changer msl2024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
using device /dev/tape/by-id/scsi-DEC80205XR
Transport Element (Griper) 0: Empty
Data Transfer Element (Drive) 0: Empty, Serial: HUE08455BM
Storage Element 1: VolumeTag("PBS001L4")
Storage Element 2: VolumeTag("PBS002L4")
Storage Element 3: VolumeTag("PBS003L4")
Storage Element 4: VolumeTag("PBS004L4")
Storage Element 5: VolumeTag("PBS005L4")
Storage Element 6: VolumeTag("PBS006L4")
Storage Element 7: VolumeTag("PBS007L4")
Storage Element 8: VolumeTag("PBS008L4")
Storage Element 9: VolumeTag("PBS009L4")
Storage Element 10: VolumeTag("PBS010L4")
Storage Element 11: VolumeTag("PBS011L4")
Storage Element 12: Empty
Storage Element 13: Empty
Storage Element 14: Empty
Storage Element 15: Empty
Storage Element 16: Empty
Storage Element 17: Empty
Storage Element 18: Empty
Storage Element 19: Empty
Storage Element 20: Empty
Storage Element 21: Empty
Storage Element 22: Empty
Storage Element 23: Empty
Storage Element 24: VolumeTag("CLN001CU")
[Inferior 1 (process 1728429) exited normally]
(gdb) run
Starting program: /usr/bin/pmtx status --changer msl2024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
using device /dev/tape/by-id/scsi-DEC80205XR
Error: decode element status failed - got elements, but descriptor length 0
[Inferior 1 (process 1728430) exited with code 0377]
(gdb) run
Starting program: /usr/bin/pmtx status --changer msl2024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
using device /dev/tape/by-id/scsi-DEC80205XR
Error: decode element status failed - got elements, but descriptor length 0
[Inferior 1 (process 1728431) exited with code 0377]
(gdb) run
Starting program: /usr/bin/pmtx status --changer msl2024
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
using device /dev/tape/by-id/scsi-DEC80205XR
Transport Element (Griper) 0: Empty
Data Transfer Element (Drive) 0: Empty, Serial: HUE08455BM
Storage Element 1: VolumeTag("PBS001L4")
Storage Element 2: VolumeTag("PBS002L4")
Storage Element 3: VolumeTag("PBS003L4")
Storage Element 4: VolumeTag("PBS004L4")
Storage Element 5: VolumeTag("PBS005L4")
Storage Element 6: VolumeTag("PBS006L4")
Storage Element 7: VolumeTag("PBS007L4")
Storage Element 8: VolumeTag("PBS008L4")
Storage Element 9: VolumeTag("PBS009L4")
Storage Element 10: VolumeTag("PBS010L4")
Storage Element 11: VolumeTag("PBS011L4")
Storage Element 12: Empty
Storage Element 13: Empty
Storage Element 14: Empty
Storage Element 15: Empty
Storage Element 16: Empty
Storage Element 17: Empty
Storage Element 18: Empty
Storage Element 19: Empty
Storage Element 20: Empty
Storage Element 21: Empty
Storage Element 22: Empty
Storage Element 23: Empty
Storage Element 24: VolumeTag("CLN001CU")
[Inferior 1 (process 1728432) exited normally]
(gdb)
 
compiled again from the master branch; now pmtx does the following:
yes, that's expected. in the patch i sent, i throw an error instead of letting the program run into the panic (which is never good!)

but the underlying problem seems to be that the library sends weird data, and i still do not know whether that's because the library has some problems, or because our code does something wrong (it works with many other libraries without problems....)

so far, i do not see what we could be doing wrong, looking at SCSI references for multiple libraries from hp/ibm/etc.
 
i analyzed the data returned from the library (sg_raw -r 64k /dev/tape/by-id/scsi-DEC80205XR B8 11 00 00 ff ff 00 ff ff ff 00 00 2>&1 | tee -a /root/test2.txt)

the returned data is the full 64 kB (65536 bytes) about as often as the error/panic happens, but all those extra bytes are just zeros. the first lines are the same as in the "correct" answers (which are 68 bytes for B8 11 00 00 ff ff 00 ff ff ff 00 00).

Received 68 bytes of data:
00 00 00 00 01 00 00 00 3c 01 80 00 34 00 00 00 34 .......<...4...4
10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
40 00 00 00 00 ....
SCSI Status: Good

Received 65536 bytes of data:
00 00 00 00 01 00 00 00 3c 01 80 00 34 00 00 00 34 .......<...4...4
10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
60 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
70 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
--//--
ffa0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffb0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffc0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
fff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
SCSI Status: Good
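
for reference: if i read the SMC spec correctly, the command is READ ELEMENT STATUS (opcode B8) with the VOLTAG bit set, element type 1 (medium transport), starting address 0, up to 0xffff elements and an allocation length of 0xffffff. decoding the fixed headers of the answer shows why only 68 bytes carry information (a small rust sketch for annotation, not the pbs decoder):

fn main() {
    // the first 16 bytes, identical in both answers above
    let b: [u8; 16] = [
        0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x3c, // element status data header
        0x01, 0x80, 0x00, 0x34, 0x00, 0x00, 0x00, 0x34, // element status page header
    ];

    // element status data header (8 bytes)
    let first_element = u16::from_be_bytes([b[0], b[1]]);         // 0
    let num_elements = u16::from_be_bytes([b[2], b[3]]);          // 1
    let report_bytes = u32::from_be_bytes([0, b[5], b[6], b[7]]); // 0x3c = 60

    // element status page header (8 bytes)
    let element_type = b[8];                                 // 1 = medium transport
    let pvoltag = (b[9] & 0x80) != 0;                        // primary volume tags present
    let descriptor_len = u16::from_be_bytes([b[10], b[11]]); // 0x34 = 52

    println!("first={first_element} count={num_elements} report={report_bytes}");
    println!("type={element_type} pvoltag={pvoltag} desc_len={descriptor_len}");

    // 8 byte data header + 60 byte report = 68 bytes of real payload, exactly
    // the size of the short answer; the 64 kB answer is the same 68 bytes
    // followed by zero padding up to the transfer length
    assert_eq!(8 + report_bytes as usize, 68);
}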

wouldn't it be a solution to accept a 64 kB answer and just cut the zeros off? :)
 
wouldn't it be a solution to accept a 64 kB answer and just cut the zeros off? :)
yes, and honestly i thought we did exactly that.... but i checked your sample data manually with our code, and.. we do not :(
i'll send a fix later today. this finally gave me the push to start writing some tests for the decoding of the element status page (this is what has caused the most problems so far...)
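
roughly, the idea of the fix (a sketch of the approach, not the committed code): trust the byte count the device reports in the element status data header and cut the buffer there, instead of assuming the whole transferred length is meaningful:

use anyhow::{bail, Error};

// trim the transferred buffer to the length the device actually reports,
// so trailing zero padding (like in the 64 kB answer above) is ignored
fn trim_to_reported_len(data: &[u8]) -> Result<&[u8], Error> {
    if data.len() < 8 {
        bail!("element status data too short");
    }
    // bytes 5..8 of the data header: byte count of the report (big endian)
    let report_bytes = u32::from_be_bytes([0, data[5], data[6], data[7]]) as usize;
    let total = 8 + report_bytes;
    if data.len() < total {
        bail!("truncated element status data");
    }
    Ok(&data[..total])
}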
 
It works perfectly now! at least for the previous "short" tasks like reload and clean drive. i will test everything else (backup etc.) and report later.
Thank you Dominik!

Edit: so far everything is good. the first backup job has been running for 22 hours, is at 45% on tape 9, and the speed is good. it will take another day or two (18 TB datastore).
and i had a wow moment: considering that the backups are already zstd compressed, LTO compression still managed to fit ~850 GB per tape on average so far.
though, my childish illusions of getting 18 TB to tape in a daily rotation with LTO-4 are gone :)
 
i would like to add here that firmware issues are a common problem with any HP/E hardware, old or new, always and everywhere. so i hope your efforts were not only useful for commodity hardware like mine (msl2024 + lto4), but will also have a positive impact on modern and more diverse hardware (as seen in the documentation here https://pbs.proxmox.com/docs/_images/pbs-gui-tape-drives.png ... a dream of a tape lib; not all small to medium businesses can afford that, even on the second-hand market) :)
btw, i recompiled the latest master (including commit 855b55dc14de3ad2af0307e3b1d8cb0ec978f9d2): listing the tape catalog is nearly instant now (vs. 12-15 seconds before). cool, and thank you again! (Really, many thanks. Regards, Arthur)
 
thanks for your report, great that it seems to work now :)
considering that the backups are zstd compressed, lto compression still managed to write ~850gb per tape average yet.
would not have expected that. ~6% additional compression on top of zstd (~850 GB vs. LTO-4's 800 GB native capacity) is not bad

tho, my childish illusions to have 18tb to tape in a daily rotation with lto4 are gone :)
well, 18 TB at LTO-4's maximum of 120 MB/s is about 42 hours, so yeah, that won't work ;)
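
back-of-the-envelope (decimal units, native speed without compression):

fn main() {
    let bytes = 18.0e12_f64; // 18 TB datastore
    let rate = 120.0e6_f64;  // LTO-4 native write speed, 120 MB/s
    println!("{:.1} hours", bytes / rate / 3600.0); // ~41.7 hours
}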

a dream of a tape lib, not all small to medium businesses can afford that even from second hand market) :)
well, to be honest i think those are 'virtual tape drives' simulated with quadstor (see here for details: https://pbs.proxmox.com/wiki/index.php/Installing_a_Virtual_Tape_Library )
listing of tape catalog is nearly instant now (vs. before 12-15 seconds).
great, we aimed for such improvements
thanks again for helping to debug
 
for sure, all thanks go to you and of course everyone on the proxmox team, who do a lot of work on such amazing products :)
 
