pve > ceph > osd displays "partial read (500)"

RobFantini

'partial read (500)' displays in a rectangle.

The same happens when clicking on 'pools'.

I suspect it is due to running vzdump backups on the same network as Ceph, but the error continues an hour after the backup finishes. The network is 10G.

Does anyone have a clue what else could cause the error?
 
More info: a pve-zsync job that runs every 15 minutes had this occur for the 2nd time in 3 days:

Code:
Date: Fri, 20 Jan 2017 13:30:01
From: Cron Daemon <root@f..>
To: root@...
Subject: Cron <root@sys1> pve-zsync sync --source 10.2.2.65:111 --dest tank/pve-zsync/15Minutes --name
  pro4-15min --maxsnap 96 --method ssh

COMMAND:
  ssh root@10.2.2.65 -- pvesm path kvm-zfs:vm-111-disk-1
GET ERROR:
  Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).

Job --source 10.2.2.65:111 --name pro4-15min got an ERROR!!!
ERROR Message:

The next pve-zsync run worked.
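If the intermittent "Permission denied (publickey,password)" recurs, key-based auth to the source node can be checked non-interactively; a minimal sketch (BatchMode is a standard OpenSSH option that fails immediately instead of prompting for a password):

```shell
# Test that passwordless key auth to the pve-zsync source node works.
# BatchMode=yes makes ssh fail fast rather than fall back to a password prompt.
ssh -o BatchMode=yes root@10.2.2.65 true && echo "key auth OK"
```

If this prints nothing, the key exchange itself is failing and the cron job will hit the same error.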

Also, the log view works in PVE.
 
OK, I found a workaround. Still not sure of the cause:

Notes:
Ceph seems OK; the issue just seems to be the PVE screen:
Code:
s020  /fbc/adm # ceph --status
  cluster 63efaa45-7507-428f-9443-82a0a546b70d
  health HEALTH_OK
  monmap e3: 3 mons at {0=10.2.2.21:6789/0,1=10.2.2.10:6789/0,2=10.2.2.67:6789/0}
  election epoch 28, quorum 0,1,2 1,0,2
  osdmap e71: 6 osds: 6 up, 6 in
  flags sortbitwise,require_jewel_osds
  pgmap v794889: 192 pgs, 3 pools, 290 GB data, 75330 objects
  581 GB used, 2070 GB / 2651 GB avail
  192 active+clean
  client io 30584 B/s wr, 0 op/s rd, 2 op/s wr
The mons are OK per that.

For PVE, try:
Code:
systemctl restart ceph
That did not fix the issue in PVE.

Note: syslog has a lot of these, starting from the time of login to PVE:
Code:
Jan 21 10:29:44 s020 pvedaemon[21409]: partial read
Jan 21 10:29:48 s020 pvedaemon[28890]: partial read
Jan 21 10:29:51 s020 pvedaemon[21409]: partial read
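To gauge how often this is happening, the pvedaemon entries can be counted straight out of syslog; a sketch, assuming the Debian default log path /var/log/syslog:

```shell
# Count pvedaemon 'partial read' messages in the current syslog.
# /var/log/syslog is the Debian/PVE default; adjust if logs are elsewhere.
grep -c 'pvedaemon.*partial read' /var/log/syslog
```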

Try:
Code:
# systemctl restart pvedaemon

On that node, pve > ceph > osd and pools display normally.
Error fixed!

Not so on the other nodes. I did not wait more than a minute after fixing one node; it is possible the other nodes could have self-corrected after a while.

On all nodes, run systemctl restart pvedaemon.

That fixed the issue. Note there was no need to restart Ceph on the other nodes.
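The restart-on-all-nodes step can be scripted from one node over SSH; a minimal sketch, where the NODES list is hypothetical and must be replaced with your actual PVE node hostnames:

```shell
# Hypothetical node list -- substitute your real PVE node hostnames.
NODES="s020 s021 s022"
for n in $NODES; do
    # Restart the PVE API daemon on each node; this clears the stale
    # 'partial read (500)' state in the GUI without touching Ceph itself.
    ssh root@"$n" -- systemctl restart pvedaemon
done
```

Restarting pvedaemon is safe for running guests; it only affects the API/GUI layer.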