CephFS storage object unable to transfer files of 999 KiB or larger

Nicolas Lobariñas

Renowned Member
May 9, 2017
Hi, I have a strange case for you.
We have three clusters (production, contingency, and develop), all running PVE 7.4 and Ceph 16.2.15.
We use the contingency cluster to share ISO images for VM installations, mounting a CephFS storage object from the PVE GUI.
Recently, the VM-creation wizard on the develop cluster started to hang with any ISO image, and only on that cluster.
Copying files from the CephFS mounted at /mnt/pve/Instaladores/template/iso shows the same behavior:

Error shown after cancelling the VM creation, which sat "working" for several minutes:
Code:
command '/usr/bin/qemu-img info '--output=json' /mnt/pve/Instaladores/template/iso/debian-12.11.0-amd64-netinst.iso' failed: received interrupt
could not parse qemu-img info command output for '/mnt/pve/Instaladores/template/iso/debian-12.11.0-amd64-netinst.iso' - malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/share/perl5/PVE/Storage/Plugin.pm line 946.
TASK ERROR: unable to create VM 3048 - volume Instaladores:iso/debian-12.11.0-amd64-netinst.iso does not exist
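The GUI error is a downstream symptom: PVE runs `qemu-img info` on the ISO, and that read blocks on CephFS just like `cp` does, leaving empty output that the Perl storage plugin then fails to parse as JSON. The check can be reproduced by hand (path taken from the task log above); the `timeout` is only there so the shell stays usable:

```shell
# Reproduce the storage plugin's probe manually; on a healthy mount this
# prints JSON metadata immediately. The timeout guards against the same
# indefinite hang the GUI runs into.
timeout 15 /usr/bin/qemu-img info --output=json \
    /mnt/pve/Instaladores/template/iso/debian-12.11.0-amd64-netinst.iso \
    || echo "qemu-img blocked or failed"
```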

The copy also hangs when run from the console:
Code:
 [root@PVE-DEV /]# cp /mnt/pve/Instaladores/template/iso/debian-12.11.0-amd64-netinst.iso ~/

Copying files of 998 KiB or smaller works just fine:
Code:
[root@PVE-DEV /]# dd if=/dev/zero of=998K.txt  bs=998K  count=1
1+0 records in
1+0 records out
1021952 bytes (1.0 MB, 998 KiB) copied, 0.00361033 s, 283 MB/s

[root@PVE-DEV /]# cp 998K.txt /mnt/pve/Instaladores/template/iso/

Copying files of 999 KiB or larger fails:
Code:
 [root@PVE-DEV /]# dd if=/dev/zero of=999K.txt  bs=999K  count=1
1+0 records in
1+0 records out
1022976 bytes (1.0 MB, 999 KiB) copied, 0.0154679 s, 66.1 MB/s

[root@PVE-DEV /]#  cp 999K.txt /mnt/pve/Instaladores/template/iso/
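A loop like the following can confirm the exact threshold; this is a sketch, and DEST is an assumption that defaults to /tmp for a dry run — point it at the CephFS mount to reproduce the hang. Note that a cp stuck in uninterruptible sleep survives any signal, so the loop may stall at the first failing size (which itself identifies the threshold).

```shell
# Probe which file sizes copy successfully into DEST (assumption: set
# DEST=/mnt/pve/Instaladores/template/iso to test the CephFS mount).
DEST="${DEST:-/tmp/cephfs-probe}"
mkdir -p "$DEST"
for kb in 996 997 998 999 1000; do
    dd if=/dev/zero of="/tmp/probe-${kb}K.bin" bs=1K count="$kb" 2>/dev/null
    # timeout bounds the wait, but a cp blocked in D state cannot be
    # killed -- a TIMEOUT result may leave a stuck process behind.
    if timeout 10 cp "/tmp/probe-${kb}K.bin" "$DEST/"; then
        echo "${kb}K: OK"
    else
        echo "${kb}K: TIMEOUT/FAIL"
    fi
done
```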

In another console, the process status is "uninterruptible sleep" (D):
Code:
[root@PVE-DEV /]# ps ax | grep 999K
4130433 pts/0    D+     0:00 cp 999K.txt /mnt/pve/Instaladores/template/iso/
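For a process in D state, the kernel stack usually shows exactly where it is blocked (typically in ceph_* write or capability-wait functions when CephFS is the culprit). A diagnostic sketch, using the PID from the ps output above; reading /proc/PID/stack requires root:

```shell
pid=4130433   # PID of the stuck cp from the ps listing above
# Field 3 of /proc/PID/stat is the scheduler state; "D" = uninterruptible.
awk '{print "state:", $3}' "/proc/$pid/stat" 2>/dev/null || echo "no such pid"
# Kernel stack of the blocked task (root only):
cat "/proc/$pid/stack" 2>/dev/null
# Hung-task and ceph client messages, if any, land in the kernel log:
dmesg 2>/dev/null | grep -i -E 'hung|ceph' | tail -n 20
```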

We have tried upgrading Ceph to the latest version, restarting the MDS services, restarting each node, removing and re-creating the storage object... everything. No errors were logged.
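One common culprit for exactly this size-dependent symptom (small writes succeed, anything around 1 MiB hangs forever while the cluster reports HEALTH_OK) is an MTU mismatch on the storage network: small requests fit into small frames, while larger writes produce full-size frames that a misconfigured switch port silently drops. A quick check, assuming jumbo frames are in use on the storage network; substitute a real monitor/OSD address for the masked 10.x.x.x:

```shell
# MTUs configured on this node's interfaces:
ip link show | grep -o 'mtu [0-9]*' | sort -u
# Path test with fragmentation forbidden (-M do); 8972 = 9000 minus
# 28 bytes of IP+ICMP headers. Replace 10.x.x.x with a real OSD/mon IP.
ping -c 3 -M do -s 8972 10.x.x.x || echo "large frames are not getting through"
```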

/etc/pve/storage.cfg
Code:
cephfs: Instaladores
    path /mnt/pve/Instaladores
    content iso
    fs-name cephfs
    monhost 10.x.x.x 10.x.x.x 10.x.x.x
    prune-backups keep-all=1
    username admin

Versions:
Bash:
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 16.2.15-pve1
ceph-fuse: 16.2.15-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

On the server side:
Code:
[root@node1 ~]# ceph status
  cluster:
    id:     cdab5d3f-42a0-4cba-8e91-7a79c10404ed
    health: HEALTH_OK
 
  services:
    mon: 4 daemons, quorum node1,node2,node3,node4 (age 4d)
    mgr: node3(active, since 4d), standbys: node2, node1
    mds: 1/1 daemons up, 2 standby
    osd: 24 osds: 24 up (since 4d), 24 in (since 3M)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 169 pgs
    objects: 491.38k objects, 1.7 TiB
    usage:   5.0 TiB used, 15 TiB / 20 TiB avail
    pgs:     169 active+clean
 
  io:
    client:   37 KiB/s rd, 2.9 MiB/s wr, 9 op/s rd, 379 op/s wr

Code:
[root@PVE-CONT /]#  ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name    cephfs
epoch    264
flags    12
created    2020-12-02T11:21:08.352970-0300
modified    2025-05-29T14:16:11.172011-0300
tableserver    0
root    0
session_timeout    60
session_autoclose    300
max_file_size    1099511627776
required_client_features    {}
last_failure    0
last_failure_osd_epoch    46897
compat    compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds    1
in    0
up    {0=277564840}
failed    
damaged    
stopped    
data_pools    [7]
metadata_pool    8
inline_data    disabled
balancer    
standby_count_wanted    1
[mds.node1{0:277564840} state up:active seq 7 addr [v2:10.6.25.4:6800/1338729611,v1:10.6.25.4:6801/1338729611] compat {c=[1],r=[1],i=[7ff]}]
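It may also help to ask the active MDS what it is blocked on; a client write that never completes usually shows up in its in-flight operations. A sketch to run on node1 (the daemon name is taken from the status output above); the timeouts keep the CLI from hanging if the monitors are unreachable:

```shell
if command -v ceph >/dev/null 2>&1; then
    timeout 10 ceph health detail || true
    # Requests the MDS is currently processing; a stuck client write
    # tends to sit here indefinitely. Run on the node hosting the
    # active MDS daemon.
    timeout 10 ceph daemon mds.node1 dump_ops_in_flight || true
else
    echo "ceph CLI not available on this host"
fi
```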