Hi,
I have 3 Proxmox nodes running version 6.4.
Each node has 6 SSD drives dedicated to VM storage.
Each node is configured with a ZFS raidz1 pool.
On top of this ZFS pool I built a Gluster brick.
So I set up a dispersed Gluster volume with 3 bricks (redundancy 1), and it has worked flawlessly for the last 3 years.
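In case it helps, the pool and the volume were created more or less like this (disk names and the exact dataset layout are from memory, so treat them as approximate):
Code:
# on each node: one raidz1 pool over the 6 SSDs (device names approximate)
zpool create PVE03 raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
# brick dataset under the pool mountpoint
zfs create PVE03/stor03

# once, from one node: dispersed volume, 3 bricks, redundancy 1
gluster volume create DATASTORE disperse 3 redundancy 1 \
    stor01:/PVE01/stor01 stor02:/PVE02/stor02 stor03:/PVE03/stor03
gluster volume start DATASTORE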
Now the problem: I lost one brick (node 3).
Long story short: ZFS failed somehow, and I can't bring the pool up anymore: "zpool import PVE03" tells me to destroy and re-create the pool from a backup because of I/O errors.
Code:
root@pve03 ~ # zpool import PVE03
cannot import 'PVE03': I/O error
Destroy and re-create the pool from
a backup source.
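From what I've read, the less destructive things to try before giving up on the pool would be a read-only import and a rewind import, but I haven't run them yet because I'm not sure how safe they are on a pool in this state:
Code:
# list what ZFS can see without actually importing
zpool import

# try a read-only import (should not write to the pool)
zpool import -o readonly=on PVE03

# try a rewind import, discarding the last few transactions
zpool import -F PVE03

# check whether the I/O errors come from the disks themselves
dmesg | grep -iE "ata|nvme|i/o error"
smartctl -a /dev/sda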
So Gluster now sees only 2 bricks out of 3.
But it seems I've lost several VMs, and this is driving me crazy...
Code:
root@pve01 ~ # gluster volume status
Status of volume: DATASTORE
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor01:/PVE01/stor01                  49152     0          Y       1967
Brick stor02:/PVE02/stor02                  49152     0          Y       1991
Brick stor03:/PVE03/stor03                  N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       1976
Self-heal Daemon on stor03                  N/A       N/A        Y       2034
Self-heal Daemon on stor02                  N/A       N/A        Y       2000

Task Status of Volume DATASTORE
------------------------------------------------------------------------------
There are no active volume tasks
Code:
root@pve01 ~ # gluster volume heal DATASTORE info
Brick stor01:/PVE01/stor01
/images/114
/images/108/vm-108-disk-0.qcow2
/images
/images/104/vm-104-disk-0.qcow2
<gfid:979d2546-124f-4d1b-bd3d-b8ccfbcc2800>
<gfid:426e0911-f5c9-4bc9-982b-37c244887d4c>
/images/114/vm-114-disk-0.qcow2
/images/111/vm-111-disk-0.qcow2
/images/109/vm-109-disk-0.qcow2
<gfid:fdc23428-8e45-40c9-856d-1c3011c0153f>
/images/112/vm-112-disk-0.qcow2
/images/102/vm-102-disk-0.qcow2
/images/113/vm-113-disk-0.qcow2
<gfid:b779547b-5e5f-44f7-82a7-302d8864a3b5>
Status: Connected
Number of entries: 14
Brick stor02:/PVE02/stor02
/images/110/vm-110-disk-0.qcow2
/images/114
<gfid:56d65fcb-451d-4288-b7d2-4c9a85fa6f87>
/images
<gfid:3216ab3b-76bb-4da0-8b1b-2e1848ee7283>
<gfid:d642498f-2e2c-4caf-a037-3418f1fc908b>
<gfid:43af1e25-4559-4ac0-af31-e7b19a195e17>
<gfid:2a8f4b90-62f1-476c-b29e-39316361042f>
/images/105/vm-105-disk-0.qcow2
/images/103/vm-103-disk-0.qcow2
<gfid:eff0daaf-dcaf-4faa-8f8f-558cd2a0022b>
<gfid:d907ae20-ce9b-4121-85fe-e983ab8a7d51>
<gfid:1d1b492b-dc1e-4ef4-a7eb-6e474c96427d>
/images/106/vm-106-disk-0.qcow2
Status: Connected
Number of entries: 14
Brick stor03:/PVE03/stor03
Status: Transport endpoint is not connected
Number of entries: -
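To figure out which VMs are really affected, I was thinking of read-testing every image from the FUSE mount on a working node, something like this (the /mnt/pve/DATASTORE path is only an example, adjust to wherever the volume is actually mounted):
Code:
# from a node that still sees the volume, check every VM disk image
for img in /mnt/pve/DATASTORE/images/*/vm-*-disk-*.qcow2; do
    echo "== $img"
    qemu-img check "$img" || echo "FAILED: $img"
done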
Do you have any brilliant ideas for starting to debug this problem?
Thanks in advance