Failing SSD - what's the best way forward with migrating?

westrangers

Jan 2, 2025
Hi everyone, my Proxmox install runs on an old Samsung SSD 860 EVO that seems to be failing. I couldn't create a new container today, and a SMART check on /dev/sdb shows errors and bad blocks. I have some VMs on this disk, but they are backed up and not really critical, so I don't mind losing them. My main LXC/VM data and other files are on a ZFS mirror (2x Intel D3-S4610, sda+sdc).

I also have an unused Crucial MX500 SSD attached (sde), which I would like to migrate Proxmox to while I acquire a low-capacity, enterprise-grade SSD with PLP.

Questions:
  1. What would be the best way to move Proxmox somewhere safe, e.g. to the MX500? I see options like 'Move storage', but I can also imagine it would be better to just install a fresh Proxmox on the MX500 and recover the config. Is there an agreed way to retrieve the full existing config if I go the fresh-install route? If I use 'Move storage', is there a chance that some of the corrupted data would carry over to the new drive?
  2. Before migrating, do I need to disconnect the D3-S4610 drives holding the ZFS mirror, or is there no risk of anything happening to them?
Appreciate any tips.

Code:
root@mylab:/tank/backups/proxmox# smartctl -a /dev/sdb
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-18-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 500GB
Serial Number:    S62BNF0R928665Y
LU WWN Device Id: 5 002538 f4192f642
Firmware Version: SVT01B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Apr  4 20:03:23 2026 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  85) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   078   078   010    Pre-fail  Always       -       122
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       3845
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       783
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   078   078   010    Pre-fail  Always       -       122
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   078   078   010    Pre-fail  Always       -       122
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always       -       4
190 Airflow_Temperature_Cel 0x0032   072   056   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always       -       4
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       21
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       5501151562

SMART Error Log Version: 1
ATA Error Count: 4

Code:
root@mylab:/tank/backups/proxmox# lsblk -o NAME,FSTYPE,SIZE,MODEL,MOUNTPOINT,LABEL
NAME                                FSTYPE        SIZE MODEL                     MOUNTPOINT LABEL
sda                                               1.7T INTEL SSDSC2KG019T8
├─sda1                              zfs_member    1.7T                                      tank
└─sda9                                              8M
sdb                                             465.8G Samsung SSD 870 EVO 500GB
├─sdb1                                           1007K
├─sdb2                              vfat            1G                           /boot/efi
└─sdb3                              LVM2_member 464.8G
  ├─pve-swap                        swap            8G                           [SWAP]
  ├─pve-root                        ext4           96G                           /
  ├─pve-data_tmeta                                3.4G
  │ └─pve-data-tpool                            337.9G
  │   ├─pve-data                                337.9G
  │   ├─pve-vm--109--disk--0                        4M
  │   ├─pve-vm--104--disk--0                       40G
  │   ├─pve-vm--107--disk--0                       40G
  │   ├─pve-vm--104--cloudinit      iso9660         4M                                      config-2
  │   ├─pve-vm--105--cloudinit      iso9660         4M                                      config-2
  │   ├─pve-vm--107--cloudinit      iso9660         4M                                      config-2
  │   ├─pve-vm--106--cloudinit      iso9660         4M                                      config-2
  │   ├─pve-vm--108--cloudinit      iso9660         4M                                      config-2
  │   ├─pve-vm--104--state--Working               6.5G
  │   ├─pve-vm--107--state--Working               8.5G
  │   ├─pve-vm--111--disk--0                        4M
  │   └─pve-vm--111--disk--1                       48G
  └─pve-data_tdata                              337.9G
    └─pve-data-tpool                            337.9G
      ├─pve-data                                337.9G
      ├─pve-vm--109--disk--0                        4M
      ├─pve-vm--104--disk--0                       40G
      ├─pve-vm--107--disk--0                       40G
      ├─pve-vm--104--cloudinit      iso9660         4M                                      config-2
      ├─pve-vm--105--cloudinit      iso9660         4M                                      config-2
      ├─pve-vm--107--cloudinit      iso9660         4M                                      config-2
      ├─pve-vm--106--cloudinit      iso9660         4M                                      config-2
      ├─pve-vm--108--cloudinit      iso9660         4M                                      config-2
      ├─pve-vm--104--state--Working               6.5G
      ├─pve-vm--107--state--Working               8.5G
      ├─pve-vm--111--disk--0                        4M
      └─pve-vm--111--disk--1                       48G
sdc                                               1.7T INTEL SSDSC2KG019T8
├─sdc1                              zfs_member    1.7T                                      tank
└─sdc9                                              8M
sdd                                             465.8G Samsung SSD 860 EVO 500GB
├─sdd1                              ntfs          450M
├─sdd2                              vfat           99M
├─sdd3                                             16M
├─sdd4                              ntfs        292.8G
├─sdd5                              ntfs          521M
└─sdd6                              ntfs        171.9G                                      Disk
sde                                             465.8G CT500MX500SSD1
├─sde1                                            128M
└─sde2                              ntfs        465.6G                                      Disk
zd0                                               8.5G
zd16                                               40G
└─zd16p1                            ntfs           40G                                      Windows 2016
zd32                                               48G
├─zd32p1                            vfat          100M
├─zd32p2                                           16M
├─zd32p3                            ntfs         47.4G
└─zd32p4                            ntfs          522M
zd48                                               40G
└─zd48p1                            ntfs           40G                                      Windows 2019
zd64                                               40G
└─zd64p1                            ntfs           40G                                      Windows 2019
zd80                                               50G
├─zd80p1                            ext4           49G
├─zd80p2                                            1K
└─zd80p5                            swap          975M
zd96                                               40G
└─zd96p1                            ntfs           40G                                      Windows 2019
zd112                                              48G
├─zd112p1                           vfat          100M
├─zd112p2                                          16M
├─zd112p3                           ntfs         47.4G
└─zd112p4                           ntfs          522M
zd128                                              40G
└─zd128p1                           ntfs           40G                                      Windows 2016
zd144                                             8.5G
zd160                                              32G
├─zd160p1                                         512K
├─zd160p2                                           1G
└─zd160p3                           zfs_member     31G                                      pfSense
zd176                                              32G
├─zd176p1                           ext4         30.3G
├─zd176p2                                           1K
└─zd176p5                           swap          1.7G
 
I would perform an offline image clone of the 860 EVO to the MX500 and then make that the boot drive.
 
And considering that there are some reallocated sectors (123) on the current SSD, don't you see a risk of migrating corrupted data to the new SSD?
 
You are using filesystems without checksums (for all data) and therefore cannot be sure that data read without errors is actually correct. I would be concerned and would restore everything from known-good backups. But since you originally set it up with only a single copy and no checksums, maybe that's fine for you? Or maybe you can check the data manually by testing it?
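
If you want something more systematic than spot checks, one option is to compare file checksums between a known-good copy (for example, a backup extracted to a scratch directory) and the suspect copy. The helper names below are hypothetical, and this only works for plain files where you actually have a trusted reference; it is a rough sketch, not a substitute for restoring from backup.

```shell
# make_manifest DIR: print "sha256  ./relative/path" for every regular file,
# sorted by path so two manifests can be diffed line by line.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# compare_trees GOOD SUSPECT: show files that differ or exist on one side only.
# Returns 0 when both trees are identical, non-zero otherwise.
compare_trees() {
    a=$(mktemp) b=$(mktemp)
    make_manifest "$1" > "$a"
    make_manifest "$2" > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example (paths are placeholders):
# compare_trees /tank/restore-test/etc /mnt/clone/etc
```

Any line that shows up in the diff is a file worth restoring from backup rather than trusting.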
 
The bad sectors are marked in the controller of the old drive; the new drive will not mark them as bad, and the sectors which have already been reallocated should be fine. The question is whether the drive will last long enough to complete the cloning process.
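
For the clone itself, a tool that tolerates read errors is safer than plain dd. Below is a minimal sketch using GNU ddrescue, run from a live/rescue environment rather than the running Proxmox. The device names are assumptions taken from the lsblk output above and can change between boots, so re-check them first; the function names and map file path are made up for illustration.

```shell
# Refuse obviously wrong source/target combinations before touching a disk.
check_clone_args() {
    src=$1 dst=$2
    [ -n "$src" ] && [ -n "$dst" ] || { echo "usage: SRC DST" >&2; return 2; }
    [ "$src" != "$dst" ] || { echo "source and target are the same" >&2; return 1; }
    return 0
}

# Two-pass clone: copy the easily readable areas first, then go back and
# retry the bad areas. The map file lets an interrupted run resume.
clone_disk() {
    src=$1 dst=$2 map=$3
    check_clone_args "$src" "$dst" || return
    ddrescue -f -n "$src" "$dst" "$map" &&   # pass 1: skip the scraping phase
    ddrescue -f -r3 "$src" "$dst" "$map"     # pass 2: retry bad areas up to 3 times
}

# Example (overwrites the target, hence commented out):
# clone_disk /dev/sdb /dev/sde /root/clone.map
```

Keep the map file on storage that is neither the source nor the target. After the run, ddrescue's summary shows how much could not be read; anything non-zero there is data you should not trust on the clone.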

If you can't clone or don't trust the data, then the only option is to back up the Proxmox config folder(s), do a fresh install, and then restore the backed-up config.
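
A rough sketch of the backup step, assuming a typical single-node host: the path list is my guess at what matters (guest configs and storage.cfg live under /etc/pve) and the function name is hypothetical, so extend it with whatever else you have customised.

```shell
# backup_pve_config DEST_DIR [ROOT]
# Archive host config files that a fresh install would otherwise lose.
# ROOT defaults to / and exists mainly so the function can be tried on a copy.
backup_pve_config() {
    dest_dir=$1
    root=${2:-/}
    archive="$dest_dir/pve-config-$(date +%Y%m%d-%H%M%S).tar.gz"
    set --
    # Assumed list of relevant paths; only the ones that exist are archived.
    for p in etc/pve etc/network/interfaces etc/hosts etc/hostname \
             etc/fstab etc/vzdump.conf var/lib/pve-cluster/config.db; do
        if [ -e "$root/$p" ]; then set -- "$@" "$p"; fi
    done
    [ "$#" -gt 0 ] || { echo "nothing to back up under $root" >&2; return 1; }
    mkdir -p "$dest_dir" || return
    tar -C "$root" -czf "$archive" "$@" || return
    echo "$archive"
}

# Example: write the archive to the ZFS pool, which survives the reinstall.
# backup_pve_config /tank/backups/proxmox
```

On the new install I would not unpack this over / wholesale. Copy back the individual pieces you need (storage.cfg, the guest .conf files, network settings) and check each one, since the config was read from a disk that is already throwing errors.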