Gluster - ZFS problem

andrea68

Hi,

I have 3 Proxmox nodes, v6.4.

Every node has 6 SSD drives dedicated to VM storage.
Every node is configured with a ZFS raidz1 pool.
On top of this ZFS pool I built a Gluster brick.
So I set up a dispersed Gluster volume with 3 bricks (redundancy 1), and it worked flawlessly for the last 3 years.
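
For context, the layout on each node was created roughly along these lines (a sketch from memory; the disk IDs are placeholders, the brick paths match the volume status output below):

Code:
# on each node: one raidz1 pool over its six SSDs (pool named PVE01/PVE02/PVE03 per node)
zpool create PVE03 raidz1 \
    /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2 /dev/disk/by-id/ata-SSD3 \
    /dev/disk/by-id/ata-SSD4 /dev/disk/by-id/ata-SSD5 /dev/disk/by-id/ata-SSD6

# once, from any node: the dispersed volume over the three bricks
gluster volume create DATASTORE disperse 3 redundancy 1 \
    stor01:/PVE01/stor01 stor02:/PVE02/stor02 stor03:/PVE03/stor03
gluster volume start DATASTORE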
Now the problem: I lost one brick (node 3).
Long story short: ZFS failed somehow, and I can't bring the pool up any more: "zpool import PVE03" asks me to destroy and re-create it from scratch because of I/O errors.

Code:
root@pve03 ~ # zpool import PVE03
cannot import 'PVE03': I/O error
    Destroy and re-create the pool from
    a backup source.

So Gluster now sees only 2 bricks out of 3.
But it seems I've lost various VMs, and this is driving me crazy...

Code:
root@pve01 ~ # gluster volume status
Status of volume: DATASTORE
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor01:/PVE01/stor01                  49152     0          Y       1967
Brick stor02:/PVE02/stor02                  49152     0          Y       1991
Brick stor03:/PVE03/stor03                  N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       1976
Self-heal Daemon on stor03                  N/A       N/A        Y       2034
Self-heal Daemon on stor02                  N/A       N/A        Y       2000

Task Status of Volume DATASTORE
------------------------------------------------------------------------------
There are no active volume tasks

Code:
root@pve01 ~ # gluster volume heal DATASTORE info
Brick stor01:/PVE01/stor01
/images/114
/images/108/vm-108-disk-0.qcow2
/images
/images/104/vm-104-disk-0.qcow2
<gfid:979d2546-124f-4d1b-bd3d-b8ccfbcc2800>
<gfid:426e0911-f5c9-4bc9-982b-37c244887d4c>
/images/114/vm-114-disk-0.qcow2
/images/111/vm-111-disk-0.qcow2
/images/109/vm-109-disk-0.qcow2
<gfid:fdc23428-8e45-40c9-856d-1c3011c0153f>
/images/112/vm-112-disk-0.qcow2
/images/102/vm-102-disk-0.qcow2
/images/113/vm-113-disk-0.qcow2
<gfid:b779547b-5e5f-44f7-82a7-302d8864a3b5>
Status: Connected
Number of entries: 14

Brick stor02:/PVE02/stor02
/images/110/vm-110-disk-0.qcow2
/images/114
<gfid:56d65fcb-451d-4288-b7d2-4c9a85fa6f87>
/images
<gfid:3216ab3b-76bb-4da0-8b1b-2e1848ee7283>
<gfid:d642498f-2e2c-4caf-a037-3418f1fc908b>
<gfid:43af1e25-4559-4ac0-af31-e7b19a195e17>
<gfid:2a8f4b90-62f1-476c-b29e-39316361042f>
/images/105/vm-105-disk-0.qcow2
/images/103/vm-103-disk-0.qcow2
<gfid:eff0daaf-dcaf-4faa-8f8f-558cd2a0022b>
<gfid:d907ae20-ce9b-4121-85fe-e983ab8a7d51>
<gfid:1d1b492b-dc1e-4ef4-a7eb-6e474c96427d>
/images/106/vm-106-disk-0.qcow2
Status: Connected
Number of entries: 14

Brick stor03:/PVE03/stor03
Status: Transport endpoint is not connected
Number of entries: -



Do you have some brilliant idea on how to start debugging this problem?

Thanks in advance
 
Hi,
I would check the ZFS status with `zpool status`; this might give a hint about the pool's state. Also check the syslog (journalctl) and dmesg for anything that might help identify the cause of the issue.
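For example, something along these lines (adjust the pool and unit names to your setup):

Code:
zpool status -v                                          # pool health and per-device error counters
zpool import                                             # pools the system can see but has not imported
journalctl -b -u zfs-import-cache -u zfs-import@PVE03    # import attempts in this boot
dmesg -T | grep -iE 'ata|sd[a-z]|zfs|error'              # low-level disk errors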
 
Code:
root@pve03 ~ # zpool status
no pools available

----

root@pve03 ~ # zpool import
   pool: PVE03
     id: 9958204538773202748
  state: DEGRADED
status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
    fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

    PVE03                                              DEGRADED
      raidz1-0                                         DEGRADED
        ata-SAMSUNG_MZ7LM960HCHP-00003_S1YHNX0H410138  UNAVAIL
        ata-SAMSUNG_MZ7LM960HCHP-00003_S1YHNXAG813934  ONLINE
        ata-SAMSUNG_MZ7LM960HCHP-00003_S1YHNYAG600091  ONLINE
        ata-SAMSUNG_MZ7GE960HMHP-00003_S1M7NWAG305222  ONLINE
        ata-SAMSUNG_MZ7LM960HCHP-00003_S1YHNXAH308733  ONLINE
        ata-SAMSUNG_MZ7LM960HCHP-00003_S1YHNX0H408512  ONLINE

Every attempt to import the pool fails with an I/O error.

Log messages from the journal:


Code:
Feb 05 11:56:46 pve03 systemd[1]: Removed slice system-zfs\x2dimport.slice.
Feb 05 11:56:46 pve03 systemd[1]: zfs-share.service: Succeeded.
Feb 05 11:56:46 pve03 systemd[1]: zfs-zed.service: Succeeded.
Feb 05 12:01:40 pve03 systemd-modules-load[479]: Inserted module 'zfs'
Feb 05 12:01:41 pve03 systemd[1]: zfs-import@PVE03.service: Main process exited, code=exited, status=1/FAILURE
Feb 05 12:01:41 pve03 systemd[1]: zfs-import@PVE03.service: Failed with result 'exit-code'.
Feb 05 12:02:24 pve03 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Feb 05 12:02:24 pve03 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Feb 05 12:39:08 pve03 systemd-modules-load[470]: Inserted module 'zfs'
Feb 05 12:39:09 pve03 systemd[1]: zfs-import@PVE03.service: Main process exited, code=exited, status=1/FAILURE
Feb 05 12:39:09 pve03 systemd[1]: zfs-import@PVE03.service: Failed with result 'exit-code'.
Feb 05 12:39:53 pve03 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Feb 05 12:39:53 pve03 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Feb 06 00:24:01 pve03 CRON[12887]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi)
Feb 06 07:46:03 pve03 systemd[1]: zfs-share.service: Succeeded.
Feb 06 07:46:03 pve03 systemd[1]: Removed slice system-zfs\x2dimport.slice.
Feb 06 07:46:03 pve03 systemd[1]: zfs-zed.service: Succeeded.
Feb 06 07:48:31 pve03 systemd-modules-load[481]: Inserted module 'zfs'
Feb 06 07:48:32 pve03 systemd[1]: zfs-import@PVE03.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 07:48:32 pve03 systemd[1]: zfs-import@PVE03.service: Failed with result 'exit-code'.
Feb 06 07:48:50 pve03 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Feb 06 07:48:50 pve03 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Feb 06 08:32:47 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:32:47 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:32:47 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:32:47 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:34:48 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:34:48 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:34:48 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:34:48 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:36:49 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:36:49 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:36:49 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:36:49 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:38:50 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:38:50 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:38:50 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:38:50 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:40:51 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:40:51 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:40:51 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:40:51 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:42:51 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:42:51 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:42:51 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:42:51 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:44:52 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:44:52 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:44:52 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:44:52 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:46:53 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:46:53 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:46:53 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:46:53 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:48:54 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:48:54 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:48:54 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:48:54 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]
Feb 06 08:50:55 pve03 kernel:  spa_all_configs+0x3b/0x120 [zfs]
Feb 06 08:50:55 pve03 kernel:  zfs_ioc_pool_configs+0x1b/0x70 [zfs]
Feb 06 08:50:55 pve03 kernel:  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
Feb 06 08:50:55 pve03 kernel:  zfsdev_ioctl+0x54/0xe0 [zfs]

In dmesg this seems interesting:


Code:
[    2.339973] ata7: SATA link down (SStatus 0 SControl 300)
[    2.340224] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.340434] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.340658] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.340901] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.341139] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.341364] ata8: SATA link down (SStatus 0 SControl 300)
[    2.341574] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
 
Hello,

Thank you for the output!

Have you checked the health of the disks using smartctl?

Does importing the PVE03 pool with the -f flag return the same issue? zpool import -f PVE03
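
For example, a quick pass over all the pool members could look something like this (the device pattern is taken from your zpool import output above; adjust as needed):

Code:
for d in /dev/disk/by-id/ata-SAMSUNG_MZ7*; do
    case "$d" in *-part*) continue ;; esac      # skip partition links
    echo "== $d =="
    smartctl -H "$d"                            # overall SMART health verdict
    smartctl -A "$d" | grep -iE 'reallocated|pending|uncorrect|wear'   # key attributes
done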
 
Hello,

> Have you checked the health of the disks using smartctl?

All disks pass the smartctl check ...


[Attached screenshot: Schermata 2023-02-06 alle 13.45.46.jpg]

I intentionally formatted /dev/sda to see whether ZFS responds as I expect. Before that, the pool was not shown as degraded (but still could not be imported).

> Does importing the PVE03 pool with the -f flag return the same issue? zpool import -f PVE03


It fails as before.
I also tried:

zpool import -XF -m -f -o PVE03 -> same I/O error
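
For reference, the rescue-import variants I mean would look roughly like this in corrected form (-o expects a property such as readonly=on, so my command above was probably not parsed as I intended; I have not necessarily run every one of these):

Code:
zpool import -f -o readonly=on PVE03       # read-only import, the least risky attempt
zpool import -f -F PVE03                   # recovery mode: discard the last few transactions
zpool import -f -FX -m PVE03               # extreme rewind, tolerate a missing log device
zpool import -d /dev/disk/by-id -f PVE03   # point the scan at an explicit device directory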
 
