__pfx_process_timeout -

openaspace

Active Member
Sep 16, 2019
486
11
38
Italy
Hello I have continuous small freeze of the system with this log.
Proxmox run on zfs mirrored ssd on hba pci express card. this happen when there is filesystem intensive load.. but this not habppen if i move the intensive workload to the sas zfs mirrored/striped . What is happening?

Thank you

Linux 6.2.16-12-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-12 (2023-09-04T13:21Z)

Code:
Sep 06 00:45:15 prx1 kernel: INFO: task txg_sync:525 blocked for more than 120 seconds.
Sep 06 00:45:19 prx1 kernel:       Tainted: P           O       6.2.16-12-pve #1
Sep 06 00:45:19 prx1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 06 00:45:20 prx1 kernel: task:txg_sync        state:D stack:0     pid:525   ppid:2      flags:0x00004000
Sep 06 00:45:20 prx1 kernel: Call Trace:
Sep 06 00:45:20 prx1 kernel:  <TASK>
Sep 06 00:45:21 prx1 kernel:  __schedule+0x402/0x1510
Sep 06 00:45:21 prx1 kernel:  schedule+0x63/0x110
Sep 06 00:45:22 prx1 kernel:  schedule_timeout+0x95/0x170
Sep 06 00:45:22 prx1 kernel:  ? __pfx_process_timeout+0x10/0x10
Sep 06 00:45:22 prx1 kernel:  io_schedule_timeout+0x51/0x80
Sep 06 00:45:22 prx1 kernel:  __cv_timedwait_common+0x140/0x180 [spl]
Sep 06 00:45:22 prx1 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Sep 06 00:45:22 prx1 kernel:  __cv_timedwait_io+0x19/0x30 [spl]
Sep 06 00:45:22 prx1 kernel:  zio_wait+0x13d/0x300 [zfs]
Sep 06 00:45:22 prx1 kernel:  dsl_pool_sync+0xce/0x4e0 [zfs]
Sep 06 00:45:22 prx1 kernel:  spa_sync+0x560/0x1010 [zfs]
Sep 06 00:45:22 prx1 kernel:  ? spa_txg_history_init_io+0x120/0x130 [zfs]
Sep 06 00:45:23 prx1 kernel:  txg_sync_thread+0x21a/0x3e0 [zfs]
Sep 06 00:45:23 prx1 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
Sep 06 00:45:23 prx1 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Sep 06 00:45:23 prx1 kernel:  thread_generic_wrapper+0x5f/0x70 [spl]
Sep 06 00:45:23 prx1 kernel:  kthread+0xe9/0x110
Sep 06 00:45:23 prx1 kernel:  ? __pfx_kthread+0x10/0x10
Sep 06 00:45:23 prx1 kernel:  ret_from_fork+0x2c/0x50
Sep 06 00:45:23 prx1 kernel:  </TASK>
Sep 06 00:45:23 prx1 pve-firewall[5821]: firewall update time (7.990 seconds)
 
Last edited:
Can you please post more information about your disk Setup? Do you use SWAP on your system? What does arc_stat say? What does zpool status look like? Did you check that your RAM is working properly?
 

Code:
root@prx1:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi        13Gi        16Gi        64Mi       1.7Gi        17Gi
Swap:             0B          0B          0B

root@prx1:~# cat /proc/swaps
Filename                                Type            Size            Used            Priority

root@prx1:~# swapon --show


after 16hours of ram tests...
2023-09-09 23.36.43.jpg



Code:
root@prx1:~# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:25 with 0 errors on Tue Sep  5 02:00:01 2023
config:

        NAME                                                      STATE     READ WRITE CKSUM
        rpool                                                     ONLINE       0     0     0
          mirror-0                                                ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_240GB_191730800683-part3         ONLINE       0     0     0
            ata-SPCC_Solid_State_Disk_B57B079917B300189124-part3  ONLINE       0     0     0

errors: No known data errors

Code:
root@prx1:~# zpool status zfs-sas
  pool: zfs-sas
 state: ONLINE
  scan: scrub repaired 0B in 08:37:32 with 0 errors on Tue Sep  5 10:43:55 2023
config:

        NAME                        STATE     READ WRITE CKSUM
        zfs-sas                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000c50062e2f95b  ONLINE       0     0     0
            scsi-35000c50062e2f3f7  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-35000c50062e2f18f  ONLINE       0     0     0
            scsi-35000c50062e3329f  ONLINE       0     0     0

errors: No known data errors

Code:
root@prx1:~# zpool status zfs-sata1
  pool: zfs-sata1
 state: ONLINE
  scan: scrub repaired 0B in 03:12:30 with 0 errors on Sun Aug 13 03:43:32 2023
config:

        NAME                                  STATE     READ WRITE CKSUM
        zfs-sata1                             ONLINE       0     0     0
          mirror-0                            ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_96N9RNAAS  ONLINE       0     0     0
            ata-TOSHIBA_DT01ACA300_Y6NLJNRAS  ONLINE       0     0     0

errors: No known data errors

Code:
root@prx1:~# arcstat 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
23:42:33     0     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:34    80     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:35   324     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:36    13     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:37    33     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:38   271     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:39   252     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:40   986    29      3    29    3     0    0     0    0  7.6G  8.0G    15G
23:42:41   159     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:42     2     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:43   356     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:44   569     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:45   252     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:46    85     0      0     0    0     0    0     0    0  7.6G  8.0G    15G
23:42:47   791     3      1     3    1     0    0     3    2  7.6G  8.0G    15G
23:42:48  1.3K    45      4    45    4     0    0    29    5  7.6G  8.0G    15G
23:42:49  1.3K    21      2    20    2     1  100    14    3  7.6G  8.0G    15G
23:42:50  2.7K   156      6   129    5    27  100    40    3  7.6G  8.0G    15G
23:42:51  5.2K   253      5   222    5    31   33    72    3  7.7G  8.0G    15G
23:42:52  3.7K   260      8   244    7    16  100    61    3  7.7G  8.0G    15G
23:42:53  8.1K   382      5   327    5    55  100    98    2  7.7G  8.0G    15G
23:42:54   17K   610      4   528    4    82   15   158    2  7.7G  8.0G    15G
23:42:55   12K   506      4   421    4    85   14   138    2  7.7G  8.0G    15G
23:42:56  9.9K   293      3   198    3    95  100    36    1  7.7G  8.0G    15G
23:42:57   11K   158      2   107    1    51   58    85    1  7.7G  8.0G    15G
23:42:58   14K   183      2   169    2    14  100    55    3  7.7G  8.0G    15G
23:42:59  4.8K   251      6   223    5    28  100    82    3  7.8G  8.0G    15G
23:43:00  1.9K   161      9   161    9     0    0    23    3  7.8G  8.0G    15G
23:43:01  1.3K    67      6    67    6     0    0    25    3  7.8G  8.0G    15G

Code:
root@prx1:~# fio /root/fio.conf
test: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=64
...
fio-3.33
Starting 4 processes
Jobs: 4 (f=4): [m(4)][100.0%][r=2846MiB/s,w=934MiB/s][r=711,w=233 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2758559: Fri Sep  8 15:42:45 2023
  read: IOPS=167, BW=670MiB/s (702MB/s)(39.2GiB/60004msec)
    slat (usec): min=1320, max=82078, avg=4052.70, stdev=1472.26
    clat (usec): min=23, max=548390, avg=280998.55, stdev=43922.49
     lat (msec): min=3, max=553, avg=285.05, stdev=44.41
    clat percentiles (msec):
     |  1.00th=[  226],  5.00th=[  243], 10.00th=[  249], 20.00th=[  253],
     | 30.00th=[  257], 40.00th=[  266], 50.00th=[  271], 60.00th=[  275],
     | 70.00th=[  288], 80.00th=[  300], 90.00th=[  334], 95.00th=[  363],
     | 99.00th=[  472], 99.50th=[  506], 99.90th=[  531], 99.95th=[  542],
     | 99.99th=[  550]
   bw (  KiB/s): min=122880, max=876544, per=24.59%, avg=683211.63, stdev=112333.37, samples=119
   iops        : min=   30, max=  214, avg=166.54, stdev=27.40, samples=119
  write: IOPS=56, BW=224MiB/s (235MB/s)(13.1GiB/60004msec); 0 zone resets
    slat (usec): min=3068, max=32822, avg=5592.28, stdev=1792.55
    clat (msec): min=3, max=532, avg=280.97, stdev=42.37
     lat (msec): min=7, max=538, avg=286.57, stdev=43.00
    clat percentiles (msec):
     |  1.00th=[  226],  5.00th=[  245], 10.00th=[  249], 20.00th=[  253],
     | 30.00th=[  257], 40.00th=[  266], 50.00th=[  271], 60.00th=[  279],
     | 70.00th=[  288], 80.00th=[  300], 90.00th=[  334], 95.00th=[  363],
     | 99.00th=[  451], 99.50th=[  481], 99.90th=[  527], 99.95th=[  531],
     | 99.99th=[  535]
   bw (  KiB/s): min=32768, max=351552, per=24.16%, avg=228060.19, stdev=48081.96, samples=119
   iops        : min=    8, max=   85, avg=55.34, stdev=11.66, samples=119
  lat (usec)   : 50=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.06%, 100=0.10%
  lat (msec)   : 250=15.09%, 500=84.25%, 750=0.47%
  cpu          : usr=4.13%, sys=91.67%, ctx=11775, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=10047,3361,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
test: (groupid=0, jobs=1): err= 0: pid=2758560: Fri Sep  8 15:42:45 2023
  read: IOPS=170, BW=683MiB/s (717MB/s)(40.0GiB/60002msec)
    slat (usec): min=1393, max=41838, avg=3918.34, stdev=1315.29
    clat (usec): min=23, max=604238, avg=273349.08, stdev=49207.07
     lat (msec): min=2, max=610, avg=277.27, stdev=49.73
    clat percentiles (msec):
     |  1.00th=[  203],  5.00th=[  226], 10.00th=[  241], 20.00th=[  249],
     | 30.00th=[  253], 40.00th=[  257], 50.00th=[  264], 60.00th=[  271],
     | 70.00th=[  275], 80.00th=[  288], 90.00th=[  313], 95.00th=[  363],
     | 99.00th=[  485], 99.50th=[  535], 99.90th=[  600], 99.95th=[  600],
     | 99.99th=[  600]
   bw (  KiB/s): min=81920, max=933888, per=25.06%, avg=696195.03, stdev=119067.95, samples=119
   iops        : min=   20, max=  228, avg=169.73, stdev=29.04, samples=119
  write: IOPS=59, BW=236MiB/s (248MB/s)(13.8GiB/60002msec); 0 zone resets
    slat (usec): min=2882, max=49333, avg=5459.48, stdev=2074.90
    clat (msec): min=7, max=596, avg=272.88, stdev=47.85
     lat (msec): min=10, max=606, avg=278.34, stdev=48.74
    clat percentiles (msec):
     |  1.00th=[  205],  5.00th=[  226], 10.00th=[  241], 20.00th=[  249],
     | 30.00th=[  253], 40.00th=[  257], 50.00th=[  262], 60.00th=[  271],
     | 70.00th=[  275], 80.00th=[  288], 90.00th=[  313], 95.00th=[  359],
     | 99.00th=[  485], 99.50th=[  523], 99.90th=[  592], 99.95th=[  600],
     | 99.99th=[  600]
   bw (  KiB/s): min=40960, max=409600, per=25.59%, avg=241599.24, stdev=51924.35, samples=119
   iops        : min=   10, max=  100, avg=58.68, stdev=12.66, samples=119
  lat (usec)   : 50=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.03%, 50=0.05%, 100=0.09%
  lat (msec)   : 250=21.73%, 500=77.26%, 750=0.81%
  cpu          : usr=4.62%, sys=90.89%, ctx=11628, majf=0, minf=14
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=10250,3542,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
test: (groupid=0, jobs=1): err= 0: pid=2758561: Fri Sep  8 15:42:45 2023
  read: IOPS=169, BW=679MiB/s (712MB/s)(39.8GiB/60002msec)
    slat (usec): min=1351, max=71337, avg=3965.19, stdev=1649.73
    clat (usec): min=25, max=610934, avg=275656.11, stdev=50919.96
     lat (msec): min=3, max=615, avg=279.62, stdev=51.44
    clat percentiles (msec):
     |  1.00th=[  209],  5.00th=[  232], 10.00th=[  245], 20.00th=[  251],
     | 30.00th=[  253], 40.00th=[  257], 50.00th=[  262], 60.00th=[  268],
     | 70.00th=[  275], 80.00th=[  292], 90.00th=[  326], 95.00th=[  368],
     | 99.00th=[  531], 99.50th=[  567], 99.90th=[  600], 99.95th=[  609],
     | 99.99th=[  609]
   bw (  KiB/s): min=122880, max=891145, per=24.91%, avg=692204.44, stdev=121792.39, samples=119
   iops        : min=   30, max=  217, avg=168.72, stdev=29.71, samples=119
  write: IOPS=58, BW=232MiB/s (244MB/s)(13.6GiB/60002msec); 0 zone resets
    slat (usec): min=2583, max=41498, avg=5473.55, stdev=1993.95
    clat (msec): min=3, max=608, avg=275.96, stdev=51.83
     lat (msec): min=8, max=615, avg=281.43, stdev=52.58
    clat percentiles (msec):
     |  1.00th=[  209],  5.00th=[  232], 10.00th=[  245], 20.00th=[  251],
     | 30.00th=[  253], 40.00th=[  257], 50.00th=[  262], 60.00th=[  271],
     | 70.00th=[  275], 80.00th=[  292], 90.00th=[  330], 95.00th=[  368],
     | 99.00th=[  531], 99.50th=[  558], 99.90th=[  592], 99.95th=[  600],
     | 99.99th=[  609]
   bw (  KiB/s): min=49152, max=352256, per=25.19%, avg=237781.35, stdev=53436.02, samples=119
   iops        : min=   12, max=   86, avg=57.73, stdev=13.07, samples=119
  lat (usec)   : 50=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.05%, 100=0.10%
  lat (msec)   : 250=20.91%, 500=77.59%, 750=1.30%
  cpu          : usr=4.33%, sys=91.03%, ctx=11661, majf=0, minf=14
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=10186,3485,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
test: (groupid=0, jobs=1): err= 0: pid=2758562: Fri Sep  8 15:42:45 2023
  read: IOPS=170, BW=681MiB/s (714MB/s)(39.9GiB/60001msec)
    slat (usec): min=1151, max=94065, avg=3970.43, stdev=1644.05
    clat (usec): min=13, max=585914, avg=275584.93, stdev=51115.53
     lat (usec): min=1164, max=591057, avg=279555.37, stdev=51735.80
    clat percentiles (msec):
     |  1.00th=[  207],  5.00th=[  232], 10.00th=[  245], 20.00th=[  249],
     | 30.00th=[  253], 40.00th=[  255], 50.00th=[  257], 60.00th=[  264],
     | 70.00th=[  275], 80.00th=[  296], 90.00th=[  351], 95.00th=[  376],
     | 99.00th=[  481], 99.50th=[  527], 99.90th=[  575], 99.95th=[  584],
     | 99.99th=[  584]
   bw (  KiB/s): min=130810, max=899321, per=24.99%, avg=694293.22, stdev=123196.27, samples=119
   iops        : min=   31, max=  219, avg=169.26, stdev=30.07, samples=119
  write: IOPS=57, BW=230MiB/s (241MB/s)(13.4GiB/60001msec); 0 zone resets
    slat (usec): min=1606, max=53706, avg=5496.92, stdev=1998.93
    clat (usec): min=1179, max=582594, avg=276297.84, stdev=53049.74
     lat (msec): min=2, max=593, avg=281.79, stdev=53.80
    clat percentiles (msec):
     |  1.00th=[  207],  5.00th=[  232], 10.00th=[  245], 20.00th=[  249],
     | 30.00th=[  251], 40.00th=[  255], 50.00th=[  257], 60.00th=[  264],
     | 70.00th=[  275], 80.00th=[  292], 90.00th=[  355], 95.00th=[  380],
     | 99.00th=[  498], 99.50th=[  527], 99.90th=[  575], 99.95th=[  584],
     | 99.99th=[  584]
   bw (  KiB/s): min=57229, max=359728, per=24.79%, avg=234008.38, stdev=50462.01, samples=119
   iops        : min=   13, max=   87, avg=56.80, stdev=12.29, samples=119
  lat (usec)   : 20=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.03%, 20=0.04%, 50=0.07%
  lat (msec)   : 100=0.09%, 250=23.53%, 500=75.31%, 750=0.91%
  cpu          : usr=4.39%, sys=91.34%, ctx=11020, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=10221,3443,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2713MiB/s (2845MB/s), 670MiB/s-683MiB/s (702MB/s-717MB/s), io=159GiB (171GB), run=60001-60004msec
  WRITE: bw=922MiB/s (967MB/s), 224MiB/s-236MiB/s (235MB/s-248MB/s), io=54.0GiB (58.0GB), run=60001-60004msec
 
Last edited:
I use Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

and after set
Code:
  GNU nano 7.2 /etc/kernel/cmdline 
root=ZFS=rpool/ROOT/pve-1 boot=zfs mpt3sas.max_queue_depth=10000

the host speed read/write is lower, but the system appear more stable..

I can choose better value for mpt3sas.max_queue_depth=?
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!