Server stops with OOM when using an external USB disk

agi

Jun 2, 2023
Everything worked fine for more than a year. Recently, without any change on my part (except upgrading to the latest Proxmox firmware), several unprivileged containers that use CIFS to access a Samba share run by another LXC started crashing: either the container stops or it gets stuck.
The external disk is connected to the Proxmox host via USB, and the Samba container accesses it through a mount point.


The crash is reproducible. For example, I have two torrent servers, each in a different LXC, saving data to this disk. A few minutes after one of these servers starts downloading and using the disk, it stops with the following error messages.


Again, everything worked very well until recently.
What I have tried:
1. Suspecting the disk had become flaky, I replaced it.
2. Excluded the USB disk from the uas driver so that it uses usb-storage instead.
3. Gave all three LXCs more memory.
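One common way to do step 2 is the `usb-storage.quirks` kernel parameter, whose comma-separated entries have the form `VID:PID:flags`; the `u` flag means IGNORE_UAS, so the device is bound to usb-storage instead of uas. The Python sketch below just builds such a value. The helper name `uas_quirks` is hypothetical, and the bridge list is an assumption: the post does not say which IDs were excluded, so it uses the three SATA bridges from the lsusb output in this post.

```python
# Hypothetical helper: build the value for the usb-storage.quirks kernel
# parameter. The "u" flag (IGNORE_UAS) keeps the device on usb-storage
# instead of letting the uas driver bind to it.
def uas_quirks(ids):
    """ids: iterable of 'vvvv:pppp' strings (4-digit hex vendor:product)."""
    entries = []
    for vid_pid in ids:
        vid, pid = vid_pid.lower().split(":")
        if len(vid) != 4 or len(pid) != 4:
            raise ValueError(f"expected vvvv:pppp, got {vid_pid!r}")
        int(vid, 16)  # raises ValueError if not hex
        int(pid, 16)
        entries.append(f"{vid}:{pid}:u")
    return ",".join(entries)


# Assumption: which bridges were actually excluded is not stated in the post.
bridges = ["152d:0578", "174c:1153", "2537:1081"]  # JMS578, ASM1153, NS1081
print("usb-storage.quirks=" + uas_quirks(bridges))
# usb-storage.quirks=152d:0578:u,174c:1153:u,2537:1081:u
```

The resulting string can go on the kernel command line, or the part after `=` can be set as `options usb-storage quirks=...` in a file under /etc/modprobe.d.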


dmesg:

Code:
[70840.400635] Tasks state (memory values in pages):
[70840.400637] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[70840.400644] Out of memory and no killable processes...
[70840.842348] kworker/u24:5 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=3, oom_score_adj=0
[70840.842358] CPU: 10 PID: 733310 Comm: kworker/u24:5 Tainted: P           O       6.5.11-7-pve #1
[70840.842363] Hardware name: Apple Inc. MacBookPro16,1/Mac-E1008331FDC96864, BIOS 1916.40.8.0.0 (iBridge: 20.16.411.0.0,0) 09/29/2022
[70840.842367] Workqueue: writeback wb_workfn (flush-cifs-4)
[70840.842373] Call Trace:
[70840.842375]  <TASK>
[70840.842378]  dump_stack_lvl+0x48/0x70
[70840.842384]  dump_stack+0x10/0x20
[70840.842387]  dump_header+0x4f/0x260
[70840.842391]  out_of_memory+0x3c0/0x560
[70840.842394]  mem_cgroup_out_of_memory+0x145/0x170
[70840.842398]  try_charge_memcg+0x737/0x820
[70840.842402]  ? cgroup_rstat_updated+0xc8/0xe0
[70840.842406]  mem_cgroup_charge_skmem+0x40/0xf0
[70840.842410]  __sk_mem_raise_allocated+0xcc/0x4f0
[70840.842415]  ? alloc_pages+0x90/0x160
[70840.842418]  __sk_mem_schedule+0x38/0x60
[70840.842422]  tcp_wmem_schedule+0x41/0x90
[70840.842425]  tcp_sendmsg_locked+0x598/0xe30
[70840.842429]  tcp_sendmsg+0x2c/0x50
[70840.842432]  inet_sendmsg+0x42/0x80
[70840.842436]  sock_sendmsg+0x10d/0x140
[70840.842442]  smb_send_kvec+0x84/0x190 [cifs]
[70840.842494]  ? release_sock+0x8f/0xb0
[70840.842498]  __smb_send_rqst+0x427/0x700 [cifs]
[70840.842548]  smb_send_rqst+0x184/0x1e0 [cifs]
[70840.842593]  ? psi_task_switch+0xd3/0x240
[70840.842601]  cifs_call_async+0x144/0x330 [cifs]
[70840.842647]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[70840.842696]  smb2_async_writev+0x44e/0x6c0 [cifs]
[70840.842743]  ? cifs_extend_writeback+0x42a/0x5a0 [cifs]
[70840.842785]  ? __pfx_cifs_writedata_release+0x10/0x10 [cifs]
[70840.842830]  cifs_writepages_region+0xba0/0xcd0 [cifs]
[70840.842873]  ? cifs_writepages_region+0xba0/0xcd0 [cifs]
[70840.842919]  cifs_writepages+0xa5/0x110 [cifs]
[70840.842962]  do_writepages+0xcd/0x1e0
[70840.842966]  __writeback_single_inode+0x44/0x370
[70840.842970]  writeback_sb_inodes+0x211/0x510
[70840.842973]  ? blk_mq_run_hw_queue+0x154/0x210
[70840.842979]  __writeback_inodes_wb+0x54/0x100
[70840.842982]  ? queue_io+0x115/0x120
[70840.842985]  wb_writeback+0x2a8/0x320
[70840.842988]  wb_workfn+0x368/0x4d0
[70840.842991]  ? __schedule+0x405/0x1450
[70840.842995]  ? add_timer+0x20/0x40
[70840.843000]  process_one_work+0x23b/0x450
[70840.843004]  worker_thread+0x50/0x3f0
[70840.843008]  ? __pfx_worker_thread+0x10/0x10
[70840.843011]  kthread+0xef/0x120
[70840.843014]  ? __pfx_kthread+0x10/0x10
[70840.843017]  ret_from_fork+0x44/0x70
[70840.843021]  ? __pfx_kthread+0x10/0x10
[70840.843024]  ret_from_fork_asm+0x1b/0x30
[70840.843030]  </TASK>
[70840.843040] memory: usage 524548kB, limit 524288kB, failcnt 26749
[70840.843043] swap: usage 860kB, limit 524288kB, failcnt 0
[70840.843045] Memory cgroup stats for /lxc/120:
[70840.843177] anon 0
[70840.843180] file 473980928
[70840.843182] kernel 10801152
[70840.843183] kernel_stack 16384
[70840.843185] pagetables 0
[70840.843186] sec_pagetables 0
[70840.843188] percpu 640584
[70840.843190] sock 52355072
[70840.843191] vmalloc 36864
[70840.843192] shmem 0
[70840.843194] zswap 0
[70840.843195] zswapped 0
[70840.843196] file_mapped 0
[70840.843198] file_dirty 470175744
[70840.843200] file_writeback 3788800
[70840.843201] swapcached 0
[70840.843203] anon_thp 0
[70840.843204] file_thp 0
[70840.843205] shmem_thp 0
[70840.843207] inactive_anon 0
[70840.843208] active_anon 0
[70840.843210] inactive_file 473964544
[70840.843211] active_file 16384
[70840.843213] unevictable 0
[70840.843214] slab_reclaimable 9573000
[70840.843216] slab_unreclaimable 436736
[70840.843218] slab 10009736
[70840.843219] workingset_refault_anon 50099
[70840.843221] workingset_refault_file 2019572
[70840.843223] workingset_activate_anon 10031
[70840.843225] workingset_activate_file 578129
[70840.843226] workingset_restore_anon 10031
[70840.843228] workingset_restore_file 9940
[70840.843230] workingset_nodereclaim 2572
[70840.843232] pgscan 3274298
[70840.843233] pgsteal 3105556
[70840.843235] pgscan_kswapd 3048
[70840.843236] pgscan_direct 3271250
[70840.843238] pgscan_khugepaged 0
[70840.843239] pgsteal_kswapd 2694
[70840.843241] pgsteal_direct 3102862
[70840.843247] pgsteal_khugepaged 0
[70840.843249] pgfault 220552
[70840.843251] pgmajfault 37529
[70840.843252] pgrefill 3194145233
[70840.843254] pgactivate 126202
[70840.843255] pgdeactivate 0
[70840.843257] pglazyfree 0
[70840.843258] pglazyfreed 0
[70840.843260] zswpin 0
[70840.843261] zswpout 0
[70840.843263] thp_fault_alloc 0
[70840.843264] thp_collapse_alloc 0
[70840.843266] Tasks state (memory values in pages):
[70840.843268] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[70840.843275] Out of memory and no killable processes...
[70841.285381] kworker/u24:5 invoked oom-killer: gfp_mask=0x8cc0(GFP_KERNEL|__GFP_NOFAIL), order=3, oom_score_adj=0
[70841.285390] CPU: 10 PID: 733310 Comm: kworker/u24:5 Tainted: P           O       6.5.11-7-pve #1
[70841.285395] Hardware name: Apple Inc. MacBookPro16,1/Mac-E1008331FDC96864, BIOS 1916.40.8.0.0 (iBridge: 20.16.411.0.0,0) 09/29/2022
[70841.285399] Workqueue: writeback wb_workfn (flush-cifs-4)
[70841.285406] Call Trace:
[70841.285408]  <TASK>
[70841.285411]  dump_stack_lvl+0x48/0x70
[70841.285416]  dump_stack+0x10/0x20
[70841.285419]  dump_header+0x4f/0x260
[70841.285423]  out_of_memory+0x3c0/0x560
[70841.285426]  mem_cgroup_out_of_memory+0x145/0x170
[70841.285431]  try_charge_memcg+0x737/0x820
[70841.285435]  ? cgroup_rstat_updated+0xc8/0xe0
[70841.285439]  mem_cgroup_charge_skmem+0x40/0xf0
[70841.285443]  __sk_mem_raise_allocated+0x44c/0x4f0
[70841.285447]  ? alloc_pages+0x90/0x160
[70841.285450]  __sk_mem_schedule+0x38/0x60
[70841.285454]  tcp_wmem_schedule+0x41/0x90
[70841.285457]  tcp_sendmsg_locked+0x598/0xe30
[70841.285461]  tcp_sendmsg+0x2c/0x50
[70841.285464]  inet_sendmsg+0x42/0x80
[70841.285467]  sock_sendmsg+0x10d/0x140
[70841.285473]  smb_send_kvec+0x84/0x190 [cifs]
[70841.285547]  ? release_sock+0x8f/0xb0
[70841.285552]  __smb_send_rqst+0x427/0x700 [cifs]
[70841.285600]  smb_send_rqst+0x184/0x1e0 [cifs]
[70841.285645]  ? psi_task_switch+0xd3/0x240
[70841.285653]  cifs_call_async+0x144/0x330 [cifs]
[70841.285698]  ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[70841.285746]  smb2_async_writev+0x44e/0x6c0 [cifs]
[70841.285792]  ? cifs_extend_writeback+0x42a/0x5a0 [cifs]
[70841.285834]  ? __pfx_cifs_writedata_release+0x10/0x10 [cifs]
[70841.285879]  cifs_writepages_region+0xba0/0xcd0 [cifs]
[70841.285920]  ? cifs_writepages_region+0xba0/0xcd0 [cifs]
[70841.285965]  cifs_writepages+0xa5/0x110 [cifs]
[70841.286007]  do_writepages+0xcd/0x1e0
[70841.286011]  __writeback_single_inode+0x44/0x370
[70841.286015]  writeback_sb_inodes+0x211/0x510
[70841.286018]  ? blk_mq_run_hw_queue+0x154/0x210
[70841.286023]  __writeback_inodes_wb+0x54/0x100
[70841.286026]  ? queue_io+0x115/0x120
[70841.286029]  wb_writeback+0x2a8/0x320
[70841.286033]  wb_workfn+0x368/0x4d0
[70841.286035]  ? __schedule+0x405/0x1450
[70841.286039]  ? add_timer+0x20/0x40
[70841.286043]  process_one_work+0x23b/0x450
[70841.286075]  worker_thread+0x50/0x3f0
[70841.286079]  ? __pfx_worker_thread+0x10/0x10
[70841.286102]  kthread+0xef/0x120
[70841.286106]  ? __pfx_kthread+0x10/0x10
[70841.286110]  ret_from_fork+0x44/0x70
[70841.286114]  ? __pfx_kthread+0x10/0x10
[70841.286117]  ret_from_fork_asm+0x1b/0x30
[70841.286137]  </TASK>
[70841.286217] memory: usage 524548kB, limit 524288kB, failcnt 26768
[70841.286222] swap: usage 860kB, limit 524288kB, failcnt 0
[70841.286226] Memory cgroup stats for /lxc/120:
[70841.286467] anon 0
[70841.286472] file 473980928
[70841.286475] kernel 10801152
[70841.286477] kernel_stack 16384
[70841.286479] pagetables 0
[70841.286481] sec_pagetables 0
[70841.286483] percpu 640584
[70841.286485] sock 52355072
[70841.286486] vmalloc 36864
[70841.286490] shmem 0
[70841.286492] zswap 0
[70841.286494] zswapped 0
[70841.286496] file_mapped 0
[70841.286498] file_dirty 470175744
[70841.286505] file_writeback 3788800
[70841.286507] swapcached 0
[70841.286510] anon_thp 0
[70841.286512] file_thp 0
[70841.286514] shmem_thp 0
[70841.286516] inactive_anon 0
[70841.286518] active_anon 0
[70841.286526] inactive_file 473964544
[70841.286529] active_file 16384
[70841.286531] unevictable 0
[70841.286533] slab_reclaimable 9573000
[70841.286536] slab_unreclaimable 436736
[70841.286545] slab 10009736
[70841.286547] workingset_refault_anon 50099
[70841.286550] workingset_refault_file 2019572
[70841.286552] workingset_activate_anon 10031
[70841.286555] workingset_activate_file 578129
[70841.286557] workingset_restore_anon 10031
[70841.286560] workingset_restore_file 9940
[70841.286562] workingset_nodereclaim 2572
[70841.286565] pgscan 3274370
[70841.286567] pgsteal 3105556
[70841.286573] pgscan_kswapd 3048
[70841.286575] pgscan_direct 3271322
[70841.286578] pgscan_khugepaged 0
[70841.286580] pgsteal_kswapd 2694
[70841.286583] pgsteal_direct 3102862
[70841.286585] pgsteal_khugepaged 0
[70841.286587] pgfault 220552
[70841.286589] pgmajfault 37529
[70841.286591] pgrefill 3198310937
[70841.286598] pgactivate 126274
[70841.286600] pgdeactivate 0
[70841.286602] pglazyfree 0
[70841.286604] pglazyfreed 0
[70841.286606] zswpin 0
[70841.286608] zswpout 0
[70841.286610] thp_fault_alloc 0
[70841.286613] thp_collapse_alloc 0
[70841.286615] Tasks state (memory values in pages):
[70841.286618] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[70841.286638] Out of memory and no killable processes...
[70849.249401] vmbr0: port 11(veth120i0) entered disabled state
[70849.249645] veth120i0 (unregistering): left allmulticast mode
[70849.249650] veth120i0 (unregistering): left promiscuous mode
[70849.249654] vmbr0: port 11(veth120i0) entered disabled state
[70849.506818] audit: type=1400 audit(1704204892.276:39): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-120_</var/lib/lxc>" pid=738693 comm="apparmor_parser"
[71035.528026] CIFS: VFS: \\192.168.1.251 has not responded in 180 seconds. Reconnecting...
[71036.651932] EXT4-fs (dm-19): unmounting filesystem c3b7d1e7-84d0-44a8-b806-e81fb3c14f43.
[74400.219860] EXT4-fs (dm-19): mounted filesystem c3b7d1e7-84d0-44a8-b806-e81fb3c14f43 r/w with ordered data mode. Quota mode: none.
[74400.480855] audit: type=1400 audit(1704208443.324:40): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-120_</var/lib/lxc>" pid=773477 comm="apparmor_parser"
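The cgroup numbers in the OOM report are easier to read as fractions of the container's limit. The Python sketch below parses a trimmed excerpt of the first report for /lxc/120 (values copied verbatim) and expresses selected stat lines as a share of the 524288 kB limit. The function name `cgroup_breakdown` is hypothetical, and it assumes, as in this dmesg output, that the `memory:` header is in kB while the stat lines are in bytes.

```python
import re

# Trimmed excerpt of the first OOM report above (/lxc/120), values verbatim.
REPORT = """\
memory: usage 524548kB, limit 524288kB, failcnt 26749
anon 0
file 473980928
kernel 10801152
sock 52355072
file_dirty 470175744
file_writeback 3788800
"""


def cgroup_breakdown(report):
    """Return (limit_bytes, {stat: fraction_of_limit}) for an OOM report excerpt."""
    m = re.search(r"usage (\d+)kB, limit (\d+)kB", report)
    if m is None:
        raise ValueError("no 'memory: usage ... limit ...' line found")
    limit = int(m[2]) * 1024  # header is in kB, stat lines are in bytes
    stats = {}
    for line in report.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1]) / limit
    return limit, stats


limit, frac = cgroup_breakdown(REPORT)
for name in ("file_dirty", "sock", "anon"):
    print(f"{name:>10}: {frac[name]:6.1%} of the {limit // 2**20} MiB limit")
```

With these values `file_dirty` comes out at about 87.6% of the 512 MiB limit and `sock` at about 9.8%, while `anon` is 0. One reading is that the container's memory is almost entirely dirty page cache waiting for CIFS writeback, and the allocation that fails is the socket-buffer charge (`mem_cgroup_charge_skmem`) made by that same writeback; this is an interpretation of the log, not a confirmed diagnosis.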


lsusb
Code:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 008: ID 2109:d101 VIA Labs, Inc. USB Keyboard
Bus 001 Device 005: ID 1a40:0801 Terminus Technology Inc. USB 2.0 Hub
Bus 001 Device 003: ID 2109:2813 VIA Labs, Inc. VL813 Hub
Bus 001 Device 009: ID 18d1:9302 Google Inc.
Bus 001 Device 007: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 006: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 004: ID 174c:1153 ASMedia Technology Inc. ASM1153 SATA 3Gb/s bridge
Bus 001 Device 002: ID 1a86:8095 QinHeng Electronics USB Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 004 Device 004: ID 2537:1081 Norelsys NS1081
Bus 004 Device 003: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
Bus 004 Device 002: ID 2109:0813 VIA Labs, Inc. VL813 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub


lsusb -t
Code:
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 3: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 4: Dev 5, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
        |__ Port 2: Dev 6, If 0, Class=Vendor Specific Class, Driver=usbfs, 12M
        |__ Port 3: Dev 7, If 0, Class=Vendor Specific Class, Driver=usbfs, 12M
        |__ Port 4: Dev 9, If 0, Class=Vendor Specific Class, Driver=usbfs, 480M
    |__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 2: Dev 8, If 0, Class=Human Interface Device, Driver=usbhid, 480M
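To see which bridge chip sits behind each Mass Storage entry in the tree, the two listings can be joined on bus and device number. This Python sketch does that for trimmed excerpts of the output above. The helper names are hypothetical, and the regexes assume the exact line format shown here, which may differ between usbutils versions.

```python
import re

# Trimmed excerpts copied from the lsusb and lsusb -t output above.
LSUSB = """\
Bus 001 Device 004: ID 174c:1153 ASMedia Technology Inc. ASM1153 SATA 3Gb/s bridge
Bus 004 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 004 Device 004: ID 2537:1081 Norelsys NS1081
Bus 004 Device 003: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
"""

LSUSB_TREE = """\
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 3: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
        |__ Port 4: Dev 5, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
"""


def parse_lsusb(text):
    """Map (bus, device) -> (vid:pid, description) from plain lsusb output."""
    devices = {}
    for m in re.finditer(r"Bus (\d+) Device (\d+): ID (\w{4}:\w{4}) ?(.*)", text):
        devices[(int(m[1]), int(m[2]))] = (m[3], m[4].strip())
    return devices


def mass_storage_devices(tree, devices):
    """Yield (bus, dev, vid:pid, description, driver, speed) per Mass Storage interface."""
    bus = None
    for line in tree.splitlines():
        b = re.search(r"Bus (\d+)\.", line)
        if b:  # root hub line: remember which bus the following entries are on
            bus = int(b[1])
            continue
        m = re.search(r"Dev (\d+), If \d+, Class=([^,]+), Driver=([^,]+), (\d+M)", line)
        if m and m[2] == "Mass Storage":
            ids, desc = devices.get((bus, int(m[1])), ("?", "unknown"))
            yield bus, int(m[1]), ids, desc, m[3], m[4]


for row in mass_storage_devices(LSUSB_TREE, parse_lsusb(LSUSB)):
    print(row)
```

With these excerpts it lists the JMS578 and NS1081 on bus 4 at 5000M and the ASM1153 on bus 1 at 480M, all bound to usb-storage.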


journalctl on the affected server as the crash starts:
Code:
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:d981:98d9:1c1b:2831, port 6881 (>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:d981:98d9:1c1b:2831, port 6881 (>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:1e:ab57:d948:1ded, port 6881 (er>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:1e:ab57:d948:1ded, port 6881 (er>
Jan 02 17:45:42 torrent transmission-daemon[118]: [2024-01-02 17:45:42.218] Couldn't connect socket 28 to 2a03:ec00:b1a1:59a:19e4:ae2c:9378:3e85, port 1293 (e>
Jan 02 17:45:42 torrent transmission-daemon[118]: [2024-01-02 17:45:42.219] Couldn't connect socket 28 to 2a03:ec00:b1a3:16a8:a461:cb89:5b3c:b598, port 40228 >
Jan 02 17:45:43 torrent transmission-daemon[118]: [2024-01-02 17:45:43.219] Couldn't connect socket 34 to 2a03:ec00:b9a1:3b41:b0a9:5831:48e6:d08e, port 1 (err>
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b199:3043:b8fb:2002:2428:6de5, port 36585 >
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b1a1:3c3:e068:7d7e:4162:948b, port 6881 (e>
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b1a3:16a8:17c2:29a6:b6df:e3d9, port 40228 >
Jan 02 17:45:45 torrent transmission-daemon[118]: [2024-01-02 17:45:45.219] Couldn't connect socket 38 to
 
Hi, I had some problems with the JMicron JMS578 controller, which had a hardware energy-saving mode and an emulated serial number. Disks randomly disconnected, and I had to reboot to get them back.

After disabling spindown and correcting the naming with a firmware update, I think it is working correctly now.


JMicron JMS578
Firmware generator tool:
https://zibri.github.io/JMS579/index2.html

I used the windows tool on this page to update:
https://gbatemp.net/threads/how-to-...nclosure-black-screen-lock-music-stop.569158/

There are some Linux tools for firmware updates, but I did not get them to work; I think I had an ARM binary.
 
