Proxmox hangs after backups (ZFS)

Gastondc

Dear all,

I have a problem with Proxmox: after the backups run, the server hangs. I do not know what to check; it is even difficult to connect to the server over SSH. I am pasting the output of "dmesg" and "journalctl -b".

Anything I should check? Thanks!


dmesg -T

[Wed Oct 24 20:38:18 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 24 20:38:18 2018] apt-get D 0 16223 15766 0x00000100
[Wed Oct 24 20:38:18 2018] Call Trace:
[Wed Oct 24 20:38:18 2018] __schedule+0x3e0/0x870
[Wed Oct 24 20:38:18 2018] schedule+0x36/0x80
[Wed Oct 24 20:38:18 2018] io_schedule+0x16/0x40
[Wed Oct 24 20:38:18 2018] cv_wait_common+0xb2/0x140 [spl]
[Wed Oct 24 20:38:18 2018] ? wait_woken+0x80/0x80
[Wed Oct 24 20:38:18 2018] __cv_wait_io+0x18/0x20 [spl]
[Wed Oct 24 20:38:18 2018] zio_wait+0x103/0x1b0 [zfs]
[Wed Oct 24 20:38:18 2018] zil_commit.part.14+0x4df/0x8b0 [zfs]
[Wed Oct 24 20:38:18 2018] zil_commit+0x17/0x20 [zfs]
[Wed Oct 24 20:38:18 2018] zvol_write+0x5a2/0x620 [zfs]
[Wed Oct 24 20:38:18 2018] ? avl_find+0x5f/0xa0 [zavl]
[Wed Oct 24 20:38:18 2018] zvol_request+0x24a/0x300 [zfs]
[Wed Oct 24 20:38:18 2018] ? SyS_madvise+0xa20/0xa20
[Wed Oct 24 20:38:18 2018] generic_make_request+0x123/0x2f0
[Wed Oct 24 20:38:18 2018] submit_bio+0x73/0x140
[Wed Oct 24 20:38:18 2018] ? submit_bio+0x73/0x140
[Wed Oct 24 20:38:18 2018] ? get_swap_bio+0xcf/0x100
[Wed Oct 24 20:38:18 2018] __swap_writepage+0x345/0x3b0
[Wed Oct 24 20:38:18 2018] ? __frontswap_store+0x73/0x100
[Wed Oct 24 20:38:18 2018] swap_writepage+0x34/0x90
[Wed Oct 24 20:38:18 2018] pageout.isra.53+0x1e5/0x330
[Wed Oct 24 20:38:18 2018] shrink_page_list+0x955/0xb70
[Wed Oct 24 20:38:18 2018] shrink_inactive_list+0x256/0x5e0
[Wed Oct 24 20:38:18 2018] ? next_arg+0x110/0x110
[Wed Oct 24 20:38:18 2018] shrink_node_memcg+0x365/0x780
[Wed Oct 24 20:38:18 2018] shrink_node+0xe1/0x310
[Wed Oct 24 20:38:18 2018] ? shrink_node+0xe1/0x310
[Wed Oct 24 20:38:18 2018] do_try_to_free_pages+0xef/0x360
[Wed Oct 24 20:38:18 2018] try_to_free_pages+0xf2/0x1b0
[Wed Oct 24 20:38:18 2018] __alloc_pages_slowpath+0x401/0xf10
[Wed Oct 24 20:38:18 2018] __alloc_pages_nodemask+0x25b/0x280
[Wed Oct 24 20:38:18 2018] alloc_pages_current+0x6a/0xe0
[Wed Oct 24 20:38:18 2018] new_slab+0x317/0x690
[Wed Oct 24 20:38:18 2018] ___slab_alloc+0x3c1/0x4e0
[Wed Oct 24 20:38:18 2018] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
[Wed Oct 24 20:38:18 2018] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
[Wed Oct 24 20:38:18 2018] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
[Wed Oct 24 20:38:18 2018] __slab_alloc+0x20/0x40
[Wed Oct 24 20:38:18 2018] ? __slab_alloc+0x20/0x40
[Wed Oct 24 20:38:18 2018] kmem_cache_alloc+0x178/0x1b0
[Wed Oct 24 20:38:18 2018] ? spl_kmem_cache_alloc+0x72/0x8c0 [spl]
[Wed Oct 24 20:38:18 2018] spl_kmem_cache_alloc+0x72/0x8c0 [spl]
[Wed Oct 24 20:38:18 2018] ? arc_buf_access+0x1ad/0x290 [zfs]
[Wed Oct 24 20:38:18 2018] ? dbuf_rele_and_unlock+0x27b/0x4b0 [zfs]
[Wed Oct 24 20:38:18 2018] ? _cond_resched+0x1a/0x50
[Wed Oct 24 20:38:18 2018] ? mutex_lock+0x12/0x40
[Wed Oct 24 20:38:18 2018] zio_create+0x42/0x490 [zfs]
[Wed Oct 24 20:38:18 2018] zio_null+0x2f/0x40 [zfs]
[Wed Oct 24 20:38:18 2018] zio_root+0x1e/0x20 [zfs]
[Wed Oct 24 20:38:18 2018] dmu_buf_hold_array_by_dnode+0xa1/0x470 [zfs]
[Wed Oct 24 20:38:18 2018] ? dnode_hold_impl+0x34d/0xbe0 [zfs]
[Wed Oct 24 20:38:18 2018] dmu_read_impl+0xa9/0x170 [zfs]
[Wed Oct 24 20:38:18 2018] dmu_read+0x58/0x90 [zfs]
[Wed Oct 24 20:38:18 2018] zfs_get_data+0x264/0x2a0 [zfs]
[Wed Oct 24 20:38:18 2018] zil_commit.part.14+0x451/0x8b0 [zfs]
[Wed Oct 24 20:38:18 2018] zil_commit+0x17/0x20 [zfs]
[Wed Oct 24 20:38:18 2018] zpl_writepages+0xd6/0x170 [zfs]
[Wed Oct 24 20:38:18 2018] do_writepages+0x1f/0x70
[Wed Oct 24 20:38:18 2018] __filemap_fdatawrite_range+0xc6/0x100
[Wed Oct 24 20:38:18 2018] filemap_write_and_wait_range+0x35/0x90
[Wed Oct 24 20:38:18 2018] zpl_fsync+0x3c/0xa0 [zfs]
[Wed Oct 24 20:38:18 2018] vfs_fsync_range+0x51/0xb0
[Wed Oct 24 20:38:18 2018] SyS_msync+0x182/0x200
[Wed Oct 24 20:38:18 2018] do_syscall_64+0x73/0x130
[Wed Oct 24 20:38:18 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Wed Oct 24 20:38:18 2018] RIP: 0033:0x7f26565417c0
[Wed Oct 24 20:38:18 2018] RSP: 002b:00007ffd8f1a9228 EFLAGS: 00000246 ORIG_RAX: 000000000000001a
[Wed Oct 24 20:38:18 2018] RAX: ffffffffffffffda RBX: 0000559860ddef60 RCX: 00007f26565417c0
[Wed Oct 24 20:38:18 2018] RDX: 0000000000000004 RSI: 00000000000042f0 RDI: 00007f2654208000
[Wed Oct 24 20:38:18 2018] RBP: 0000000000000000 R08: 00007ffd8f1a4ec7 R09: 0000000000000001
[Wed Oct 24 20:38:18 2018] R10: 0000000000000018 R11: 0000000000000246 R12: 00000000000042f0
[Wed Oct 24 20:38:18 2018] R13: 00000000000042f0 R14: 00007ffd8f1a9270 R15: 00007ffd8f1ad770
[Wed Oct 24 20:40:19 2018] INFO: task kswapd0:68 blocked for more than 120 seconds.
[Wed Oct 24 20:40:19 2018] Tainted: P O 4.15.18-7-pve #1
[Wed Oct 24 20:40:19 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Oct 24 20:40:19 2018] kswapd0 D 0 68 2 0x80000000
[Wed Oct 24 20:40:19 2018] Call Trace:
[Wed Oct 24 20:40:19 2018] __schedule+0x3e0/0x870
[Wed Oct 24 20:40:19 2018] schedule+0x36/0x80
[Wed Oct 24 20:40:19 2018] cv_wait_common+0x11e/0x140 [spl]
[Wed Oct 24 20:40:19 2018] ? wait_woken+0x80/0x80
[Wed Oct 24 20:40:19 2018] __cv_wait+0x15/0x20 [spl]
[Wed Oct 24 20:40:19 2018] zil_commit.part.14+0x86/0x8b0 [zfs]
[Wed Oct 24 20:40:19 2018] ? spl_kmem_free+0x33/0x40 [spl]
[Wed Oct 24 20:40:19 2018] ? zfs_range_unlock+0x1b3/0x2e0 [zfs]
[Wed Oct 24 20:40:19 2018] zil_commit+0x17/0x20 [zfs]
[Wed Oct 24 20:40:19 2018] zvol_write+0x5a2/0x620 [zfs]
[Wed Oct 24 20:40:19 2018] ? avl_find+0x5f/0xa0 [zavl]
[Wed Oct 24 20:40:19 2018] zvol_request+0x24a/0x300 [zfs]
[Wed Oct 24 20:40:19 2018] ? SyS_madvise+0xa20/0xa20
[Wed Oct 24 20:40:19 2018] generic_make_request+0x123/0x2f0
[Wed Oct 24 20:40:19 2018] submit_bio+0x73/0x140
[Wed Oct 24 20:40:19 2018] ? submit_bio+0x73/0x140
[Wed Oct 24 20:40:19 2018] ? get_swap_bio+0xcf/0x100
[Wed Oct 24 20:40:19 2018] __swap_writepage+0x345/0x3b0
[Wed Oct 24 20:40:19 2018] ? __frontswap_store+0x73/0x100
[Wed Oct 24 20:40:19 2018] swap_writepage+0x34/0x90
[Wed Oct 24 20:40:19 2018] pageout.isra.53+0x1e5/0x330
[Wed Oct 24 20:40:19 2018] shrink_page_list+0x955/0xb70
[Wed Oct 24 20:40:19 2018] shrink_inactive_list+0x256/0x5e0
[Wed Oct 24 20:40:19 2018] ? next_arg+0x110/0x110
[Wed Oct 24 20:40:19 2018] shrink_node_memcg+0x365/0x780
[Wed Oct 24 20:40:19 2018] shrink_node+0xe1/0x310
[Wed Oct 24 20:40:19 2018] ? shrink_node+0xe1/0x310
[Wed Oct 24 20:40:19 2018] kswapd+0x386/0x770
[Wed Oct 24 20:40:19 2018] kthread+0x105/0x140
[Wed Oct 24 20:40:19 2018] ? mem_cgroup_shrink_node+0x180/0x180
[Wed Oct 24 20:40:19 2018] ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Oct 24 20:40:19 2018] ret_from_fork+0x35/0x40


journalctl -b



Oct 24 20:39:08 pve pvestatd[30215]: status update time (6.144 seconds)
Oct 24 20:39:18 pve pvestatd[30215]: status update time (6.167 seconds)
Oct 24 20:39:28 pve pvestatd[30215]: status update time (6.137 seconds)
Oct 24 20:40:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 24 20:40:22 pve kernel: INFO: task kswapd0:68 blocked for more than 120 seconds.
Oct 24 20:40:22 pve kernel: Tainted: P O 4.15.18-7-pve #1
Oct 24 20:40:22 pve kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 24 20:40:22 pve kernel: kswapd0 D 0 68 2 0x80000000
Oct 24 20:40:22 pve kernel: Call Trace:
Oct 24 20:40:22 pve kernel: __schedule+0x3e0/0x870
Oct 24 20:40:22 pve kernel: schedule+0x36/0x80
Oct 24 20:40:22 pve kernel: cv_wait_common+0x11e/0x140 [spl]
Oct 24 20:40:22 pve kernel: ? wait_woken+0x80/0x80
Oct 24 20:40:22 pve kernel: __cv_wait+0x15/0x20 [spl]
Oct 24 20:40:22 pve kernel: zil_commit.part.14+0x86/0x8b0 [zfs]
Oct 24 20:40:22 pve kernel: ? spl_kmem_free+0x33/0x40 [spl]
Oct 24 20:40:22 pve kernel: ? zfs_range_unlock+0x1b3/0x2e0 [zfs]
Oct 24 20:40:22 pve kernel: zil_commit+0x17/0x20 [zfs]
Oct 24 20:40:22 pve kernel: zvol_write+0x5a2/0x620 [zfs]
Oct 24 20:40:22 pve kernel: ? avl_find+0x5f/0xa0 [zavl]
Oct 24 20:40:22 pve kernel: zvol_request+0x24a/0x300 [zfs]
Oct 24 20:40:22 pve kernel: ? SyS_madvise+0xa20/0xa20
Oct 24 20:40:22 pve kernel: generic_make_request+0x123/0x2f0
Oct 24 20:40:22 pve kernel: submit_bio+0x73/0x140
Oct 24 20:40:22 pve kernel: ? submit_bio+0x73/0x140
Oct 24 20:40:22 pve kernel: ? get_swap_bio+0xcf/0x100
Oct 24 20:40:22 pve kernel: __swap_writepage+0x345/0x3b0
Oct 24 20:40:22 pve kernel: ? __frontswap_store+0x73/0x100
Oct 24 20:40:22 pve kernel: swap_writepage+0x34/0x90
Oct 24 20:40:22 pve kernel: pageout.isra.53+0x1e5/0x330
Oct 24 20:40:22 pve kernel: shrink_page_list+0x955/0xb70
Oct 24 20:40:22 pve kernel: shrink_inactive_list+0x256/0x5e0
Oct 24 20:40:22 pve kernel: ? next_arg+0x110/0x110
Oct 24 20:40:22 pve kernel: shrink_node_memcg+0x365/0x780
Oct 24 20:40:22 pve kernel: shrink_node+0xe1/0x310
Oct 24 20:40:22 pve kernel: ? shrink_node+0xe1/0x310
Oct 24 20:40:22 pve kernel: kswapd+0x386/0x770
Oct 24 20:40:22 pve kernel: kthread+0x105/0x140
Oct 24 20:40:22 pve kernel: ? mem_cgroup_shrink_node+0x180/0x180
Oct 24 20:40:22 pve kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Oct 24 20:40:22 pve kernel: ret_from_fork+0x35/0x40
Oct 24 20:46:00 pve vzdump[26961]: VM 200 qmp command failed - VM 200 qmp command 'query-backup' failed - got timeout
Oct 24 21:08:46 pve smartd[28796]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 30 to 28
Oct 24 21:08:46 pve smartd[28796]: Device: /dev/sdc [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 28 to 27
Oct 24 21:38:36 pve sshd[17519]: Accepted password for root from 192.168.90.241 port 43492 ssh2
Oct 24 21:38:36 pve sshd[17519]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 24 21:38:36 pve systemd-logind[28805]: New session 231 of user root.
Oct 24 21:38:36 pve systemd[1]: Started Session 231 of user root.
Oct 24 21:38:47 pve rrdcached[29580]: flushing old values
Oct 24 21:38:47 pve rrdcached[29580]: rotating journals
Oct 24 21:38:47 pve rrdcached[29580]: started new journal /var/lib/rrdcached/journal/rrd.journal.1540427927.549079
Oct 24 21:38:47 pve rrdcached[29580]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1540420727.549127
 
Syslog

Oct 24 19:59:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 24 20:00:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 24 20:00:01 pve systemd[1]: Started Proxmox VE replication runner.
Oct 24 20:00:03 pve vzdump[26961]: VM 200 qmp command failed - VM 200 qmp command 'guest-fsfreeze-freeze' failed - got timeout
Oct 24 20:00:13 pve vzdump[26961]: VM 200 qmp command failed - VM 200 qmp command 'guest-fsfreeze-thaw' failed - got timeout
Oct 24 20:01:00 pve systemd[1]: Starting Proxmox VE replication runner...
Oct 24 20:01:10 pve systemd[1]: Started Proxmox VE replication runner.
Oct 24 20:02:00 pve systemd[1]: Starting Proxmox VE replication runner...
 
Hi,

Try first to check whether your ZFS pool is OK or not. Run this on the server as root:

zpool scrub rpool

Then go take a break, because this check can take a lot of time; it is best to run it over a weekend. You can see the status with this voodoo spell:

zpool status -v

When it is finished, you may see some errors (fatal or not).
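
If you want to watch the scrub without retyping the command, something like this should work (just a convenience sketch; it assumes your pool is named rpool, as above):

watch -n 60 'zpool status -v rpool'

Press Ctrl+C to stop watching; the scrub itself keeps running in the background either way.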
 
Guletz,

I ran the commands you recommended; the zpool scrub rpool command finished right away, very fast.

# zpool scrub rpool

# zpool status -v
pool: rpool
state: ONLINE
scan: scrub in progress since Sat Oct 27 14:42:19 2018
8.86M scanned out of 221G at 1008K/s, 63h43m to go
0B repaired, 0.00% done
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    sda2    ONLINE       0     0     0
    sdb2    ONLINE       0     0     0

errors: No known data errors

Any other ideas?
 
You are right. Between my bad English and being dumbfounded, I misread it.


root@pve:~# zpool status -v
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 2h27m with 0 errors on Sat Oct 27 17:10:08 2018
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    sda2    ONLINE       0     0     0
    sdb2    ONLINE       0     0     0

errors: No known data errors
 
OK. So it seems that your ZFS is OK.

Can you test your disks with smartctl?

smartctl -t long /dev/sda
smartctl -t long /dev/sdb

And the next day, post your SMART statistics with this:

smartctl -a /dev/sda
smartctl -a /dev/sdb
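
Note that the long test runs inside the drive firmware, so the command returns immediately and the disk stays usable in the meantime. If you want to see how far along a test is instead of guessing, you can poll the execution status (plain smartctl, nothing drive-specific; the remaining percentage counts down):

smartctl -c /dev/sda | grep -A1 'Self-test execution'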
 
Here is the smartctl report:


root@pve:~# smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-7-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: MB2000GDUNV
Serial Number: 55F1K1E0F1BA
LU WWN Device Id: 5 000039 64b702c57
Firmware Version: HPG3
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Oct 30 16:27:49 2018 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 278) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0007   100   100   050    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   100   100   002    Pre-fail  Always       -       6134
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       5446
 10 Spin_Retry_Count        0x0013   101   100   030    Pre-fail  Always       -       0
180 Unknown_HDD_Attribute   0x003b   100   100   001    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       33 (Min/Max 10/45)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%             5445  -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.






root@pve:~# smartctl -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-7-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: MB002000GWFGH
Serial Number: ZDS037Y7
LU WWN Device Id: 5 000c50 0a2e9deee
Firmware Version: HPG2
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Nov 2 12:13:28 2018 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 575) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 216) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x1025) SCT Status supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   063   044    Pre-fail  Always       -       202315896
  3 Spin_Up_Time            0x0003   096   096   070    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       185424557
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5487
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
180 Unknown_HDD_Attribute   0x003b   100   100   030    Pre-fail  Always       -       1541042792
194 Temperature_Celsius     0x0022   030   044   000    Old_age   Always       -       30 (0 12 0 0 0)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



I don't know how to interpret the Raw_Read_Error_Rate and Seek_Error_Rate values. Do I have a problem with sdb?

Thanks!
 
At this point that is not so relevant; as far as I know, Seagate drives pack several counters into those raw values, so large numbers are normal as long as the normalized VALUE stays above THRESH. Try to run a long SMART test on each HDD and then post the results again (smartctl -a /dev/sdX).

To run a SMART test, run this as root:

smartctl -t long /dev/sdX

where X={a,b}

Run smartctl -a /dev/sdX after 24 h, so you can be sure that the tests have finished.

Good luck
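
By the way, instead of waiting a fixed 24 h you can check directly whether a test has finished: the self-test log keeps one line per completed test. A sketch (your sdb output above says "SMART Self-test Log not supported", so this will only work on sda; for sdb look at the "Self-test execution status" field in the smartctl -a output instead):

smartctl -l selftest /dev/sda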
 
24 hours earlier I had already run smartctl -t long /dev/sda (and sdb).


This is one of the runs; check the date:

root@pve:~# smartctl -t long /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-7-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 278 minutes for test to complete.
Test will complete after Tue Oct 30 15:27:00 2018

Use smartctl -X to abort test.


Are you saying it did not run anyway?
 
I did not see that you had run a long SMART test; my fault. Reading your post again: can you run a manual backup for this VM/CT and, during that time, watch your swap on the node that runs this guest? You can run this as root:

watch "free -h"

What I guess is that while the backup runs you are running out of memory and going into swap. So watch whether swap usage increases during that time!
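
For what it is worth, the hung-task traces in your first post show swap_writepage() ending up in zvol_write(), i.e. the kernel is swapping onto a ZFS zvol, and under memory pressure that combination is a known deadlock trigger. So besides watching free -h, it may be worth checking where your swap lives (a sketch; adjust to your setup):

swapon --show        # swap on a /dev/zd* device means swap is on a zvol
zfs list -t volume   # look for a swap zvol in rpool

If swap is indeed on a zvol, lowering vm.swappiness (sysctl vm.swappiness=10, not persistent across reboots) and capping the ARC via the zfs_arc_max module option, so the host is not pushed into swap during backups, may help.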
 
