Hi,
Recently I have started using a Ceph cluster for HA purposes. As per the documentation, I created a 3-node cluster. For Ceph storage I added these SSDs:
Node1: 1 TB SSD
Node2: 960 GB SSD & 2.05 TB SSD
Node3: SSD yet to be connected.
On Ceph:
OSDs are created.
Created the storage pool.
Enabled monitors on all 3 nodes.
The storage pool is visible across all nodes (the commands used were roughly the ones shown below).
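For reference, this was done with the standard Proxmox tooling; a rough sketch of the commands, where /dev/sdb is only an example device name and not the exact disks listed above:
# on each node: create the monitor and the OSD(s)
pveceph mon create
pveceph osd create /dev/sdb
# once, from any node: create the RBD pool and register it as Proxmox storage
pveceph pool create Ceph_SDD --add_storages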
Created VM 100 on Node2.
Added an HA rule as well (roughly as sketched below).
Tested failover by shutting down Node2, and VM 100 was migrated to Node1.
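The HA resource was added roughly like this (exact options may differ from my setup):
# put VM 100 under HA control
ha-manager add vm:100 --state started
# check placement before and after the failover test
ha-manager status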
But I am not able to SSH to or open the console of the VM in Proxmox, and the task log shows the error below.
task started by HA resource agent
WARN: iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name 'Redhat9,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=b4bc0197-107b-4aef-a7b1-130724eccbfb' -smp '8,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/100.vnc,password=on' -cpu qemu64,+aes,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pni,+popcnt,+sse4.1,+sse4.2,+ssse3 -m 12048 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=7c5622f6-6e43-4c16-b0d8-ab6c1ed77d8c' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/100.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e6198908baa' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=rbd:Ceph_SDD/vm-100-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Ceph_SDD.keyring,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=BC:24:11:A7:37:BD,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256' -machine 'type=pc+pve0'' failed: got timeout
Logs:
root@ts-test-pve02:~# ceph osd dump
epoch 120
fsid 523a8451-d60c-4dd7-950f-05bf45fa6eb0
created 2024-03-14T19:01:58.624424+0000
modified 2024-03-15T10:55:36.387475+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 17
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client jewel
require_osd_release reef
stretch_mode_enabled false
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 18 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 2.99
pool 2 'Ceph_SDD' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 114 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 1.57
removed_snaps_queue [2~1]
max_osd 3
osd.0 up in weight 1 up_from 66 up_thru 118 down_at 63 last_clean_interval [60,62) [v2:10.40.1.12:6800/3107,v1:10.40.1.12:6801/3107] [v2:10.40.1.12:6802/3107,v1:10.40.1.12:6803/3107] exists,up 5d82642b-4c9b-4891-8865-4781dd3c203d
osd.1 up in weight 1 up_from 117 up_thru 117 down_at 116 last_clean_interval [96,114) [v2:10.40.1.60:6810/2220,v1:10.40.1.60:6811/2220] [v2:10.40.1.60:6812/2220,v1:10.40.1.60:6813/2220] exists,up 87d63ecd-39e8-4e57-8b08-e7693980e913
osd.2 up in weight 1 up_from 118 up_thru 118 down_at 117 last_clean_interval [96,114) [v2:10.40.1.60:6802/2215,v1:10.40.1.60:6803/2215] [v2:10.40.1.60:6804/2215,v1:10.40.1.60:6805/2215] exists,up d8650fbe-9a37-4cec-a5da-a13f510fc255
blocklist 10.40.1.66:6802/3861357805 expires 2024-03-16T09:47:01.255471+0000
blocklist 10.40.1.60:6800/1643008314 expires 2024-03-16T09:46:50.740797+0000
blocklist 10.40.1.12:6808/4183412188 expires 2024-03-16T07:47:14.385495+0000
blocklist 10.40.1.66:6803/1212910133 expires 2024-03-16T07:42:59.689316+0000
blocklist 10.40.1.66:6802/1212910133 expires 2024-03-16T07:42:59.689316+0000
blocklist 10.40.1.12:6808/874077101 expires 2024-03-16T09:46:37.759062+0000
blocklist 10.40.1.60:6801/2424504593 expires 2024-03-16T07:42:43.633404+0000
blocklist 10.40.1.60:6800/2424504593 expires 2024-03-16T07:42:43.633404+0000
blocklist 10.40.1.12:6808/1451316929 expires 2024-03-16T07:42:21.852580+0000
blocklist 10.40.1.12:0/3155762683 expires 2024-03-15T19:37:40.735922+0000
blocklist 10.40.1.60:0/700185328 expires 2024-03-15T19:52:24.323906+0000
blocklist 10.40.1.66:6801/86512 expires 2024-03-15T19:52:29.332788+0000
blocklist 10.40.1.12:6809/4183412188 expires 2024-03-16T07:47:14.385495+0000
blocklist 10.40.1.66:6800/86512 expires 2024-03-15T19:52:29.332788+0000
blocklist 10.40.1.60:0/2252979939 expires 2024-03-15T19:52:24.323906+0000
blocklist 10.40.1.12:0/2772258271 expires 2024-03-15T19:37:40.735922+0000
blocklist 10.40.1.12:0/3040310611 expires 2024-03-15T19:37:40.735922+0000
blocklist 10.40.1.60:0/3691436638 expires 2024-03-15T20:03:07.931684+0000
blocklist 10.40.1.60:6817/18985 expires 2024-03-15T20:03:07.931684+0000
blocklist 10.40.1.60:0/1438595603 expires 2024-03-15T20:03:07.931684+0000
blocklist 10.40.1.12:0/3407383225 expires 2024-03-15T19:58:46.092637+0000
# ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 3.6 TiB 3.6 TiB 3.2 GiB 3.2 GiB 0.08
TOTAL 3.6 TiB 3.6 TiB 3.2 GiB 3.2 GiB 0.08
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
.mgr 1 1 1.0 MiB 2 2.0 MiB 0 1.7 TiB
Ceph_SDD 2 128 1.5 GiB 406 3.0 GiB 0.09 1.7 TiB
root@ts-test-pve02:~#
pveceph lspools
┌──────────┬──────┬──────────┬────────┬─────────────┬────────────────┬───────────────────┬──────────────────────────┬───────────────────────────┬─────────────
│ Name │ Size │ Min Size │ PG Num │ min. PG Num │ Optimal PG Num │ PG Autoscale Mode │ PG Autoscale Target Size │ PG Autoscale Target Ratio │ Crush Rule N
╞══════════╪══════╪══════════╪════════╪═════════════╪════════════════╪═══════════════════╪══════════════════════════╪═══════════════════════════╪═════════════
│ .mgr │ 3 │ 2 │ 1 │ 1 │ 1 │ on │ │ │ replicated_r
├──────────┼──────┼──────────┼────────┼─────────────┼────────────────┼───────────────────┼──────────────────────────┼───────────────────────────┼─────────────
│ Ceph_SDD │ 3 │ 2 │ 128 │ │ 32 │ on │ │ │ replicated_r
└──────────┴──────┴──────────┴────────┴─────────────┴────────────────┴───────────────────┴──────────────────────────┴───────────────────────────┴─────────────
root@ts-test-pve02:~#
ceph -w
cluster:
id: 523a8451-d60c-4dd7-950f-05bf45fa6eb0
health: HEALTH_WARN
1/3 mons down, quorum ts-mum1-dsr02,ts-mum1-dsr56
Degraded data redundancy: 408/1224 objects degraded (33.333%), 125 pgs degraded, 129 pgs undersized
3 slow ops, oldest one blocked for 579 sec, daemons [osd.0,mon.ts-mum1-dsr02] have slow ops.
services:
mon: 3 daemons, quorum ts-mum1-dsr02,ts-mum1-dsr56 (age 9m), out of quorum: ts-mum1-dsr50
mgr: ts-mum1-dsr56(active, since 14h), standbys: ts-mum1-dsr02
osd: 3 osds: 3 up (since 45m), 3 in (since 4h)
data:
pools: 2 pools, 129 pgs
objects: 408 objects, 1.5 GiB
usage: 3.2 GiB used, 3.6 TiB / 3.6 TiB avail
pgs: 408/1224 objects degraded (33.333%)
124 active+undersized+degraded
4 active+undersized
1 active+undersized+degraded+laggy