Everything worked fine for more than a year. Recently, without any change on my part (except for upgrading to the latest Proxmox version), several unprivileged containers that use CIFS to access a Samba share run by another LXC started crashing. The affected container either stops or gets stuck.
The external disk is connected to the Proxmox host via USB, and the Samba container accesses it through a mount point.
The crash is consistent. For example, I have two torrent servers, each in a different LXC, saving data to this disk. A few minutes after one of these servers starts downloading and using the disk, it stops with the following error messages.
Again, everything worked very well until recently.
What I have tried:
1. Thought the disk might have gone bad - replaced the disk.
2. Switched the USB disk from the uas driver to usb-storage.
3. Gave all three LXCs more memory.
dmesg:
Code:
[70840.400635] Tasks state (memory values in pages):
[70840.400637] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[70840.400644] Out of memory and no killable processes...
[70840.842348] kworker/u24:5 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=3, oom_score_adj=0
[70840.842358] CPU: 10 PID: 733310 Comm: kworker/u24:5 Tainted: P O 6.5.11-7-pve #1
[70840.842363] Hardware name: Apple Inc. MacBookPro16,1/Mac-E1008331FDC96864, BIOS 1916.40.8.0.0 (iBridge: 20.16.411.0.0,0) 09/29/2022
[70840.842367] Workqueue: writeback wb_workfn (flush-cifs-4)
[70840.842373] Call Trace:
[70840.842375] <TASK>
[70840.842378] dump_stack_lvl+0x48/0x70
[70840.842384] dump_stack+0x10/0x20
[70840.842387] dump_header+0x4f/0x260
[70840.842391] out_of_memory+0x3c0/0x560
[70840.842394] mem_cgroup_out_of_memory+0x145/0x170
[70840.842398] try_charge_memcg+0x737/0x820
[70840.842402] ? cgroup_rstat_updated+0xc8/0xe0
[70840.842406] mem_cgroup_charge_skmem+0x40/0xf0
[70840.842410] __sk_mem_raise_allocated+0xcc/0x4f0
[70840.842415] ? alloc_pages+0x90/0x160
[70840.842418] __sk_mem_schedule+0x38/0x60
[70840.842422] tcp_wmem_schedule+0x41/0x90
[70840.842425] tcp_sendmsg_locked+0x598/0xe30
[70840.842429] tcp_sendmsg+0x2c/0x50
[70840.842432] inet_sendmsg+0x42/0x80
[70840.842436] sock_sendmsg+0x10d/0x140
[70840.842442] smb_send_kvec+0x84/0x190 [cifs]
[70840.842494] ? release_sock+0x8f/0xb0
[70840.842498] __smb_send_rqst+0x427/0x700 [cifs]
[70840.842548] smb_send_rqst+0x184/0x1e0 [cifs]
[70840.842593] ? psi_task_switch+0xd3/0x240
[70840.842601] cifs_call_async+0x144/0x330 [cifs]
[70840.842647] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[70840.842696] smb2_async_writev+0x44e/0x6c0 [cifs]
[70840.842743] ? cifs_extend_writeback+0x42a/0x5a0 [cifs]
[70840.842785] ? __pfx_cifs_writedata_release+0x10/0x10 [cifs]
[70840.842830] cifs_writepages_region+0xba0/0xcd0 [cifs]
[70840.842873] ? cifs_writepages_region+0xba0/0xcd0 [cifs]
[70840.842919] cifs_writepages+0xa5/0x110 [cifs]
[70840.842962] do_writepages+0xcd/0x1e0
[70840.842966] __writeback_single_inode+0x44/0x370
[70840.842970] writeback_sb_inodes+0x211/0x510
[70840.842973] ? blk_mq_run_hw_queue+0x154/0x210
[70840.842979] __writeback_inodes_wb+0x54/0x100
[70840.842982] ? queue_io+0x115/0x120
[70840.842985] wb_writeback+0x2a8/0x320
[70840.842988] wb_workfn+0x368/0x4d0
[70840.842991] ? __schedule+0x405/0x1450
[70840.842995] ? add_timer+0x20/0x40
[70840.843000] process_one_work+0x23b/0x450
[70840.843004] worker_thread+0x50/0x3f0
[70840.843008] ? __pfx_worker_thread+0x10/0x10
[70840.843011] kthread+0xef/0x120
[70840.843014] ? __pfx_kthread+0x10/0x10
[70840.843017] ret_from_fork+0x44/0x70
[70840.843021] ? __pfx_kthread+0x10/0x10
[70840.843024] ret_from_fork_asm+0x1b/0x30
[70840.843030] </TASK>
[70840.843040] memory: usage 524548kB, limit 524288kB, failcnt 26749
[70840.843043] swap: usage 860kB, limit 524288kB, failcnt 0
[70840.843045] Memory cgroup stats for /lxc/120:
[70840.843177] anon 0
[70840.843180] file 473980928
[70840.843182] kernel 10801152
[70840.843183] kernel_stack 16384
[70840.843185] pagetables 0
[70840.843186] sec_pagetables 0
[70840.843188] percpu 640584
[70840.843190] sock 52355072
[70840.843191] vmalloc 36864
[70840.843192] shmem 0
[70840.843194] zswap 0
[70840.843195] zswapped 0
[70840.843196] file_mapped 0
[70840.843198] file_dirty 470175744
[70840.843200] file_writeback 3788800
[70840.843201] swapcached 0
[70840.843203] anon_thp 0
[70840.843204] file_thp 0
[70840.843205] shmem_thp 0
[70840.843207] inactive_anon 0
[70840.843208] active_anon 0
[70840.843210] inactive_file 473964544
[70840.843211] active_file 16384
[70840.843213] unevictable 0
[70840.843214] slab_reclaimable 9573000
[70840.843216] slab_unreclaimable 436736
[70840.843218] slab 10009736
[70840.843219] workingset_refault_anon 50099
[70840.843221] workingset_refault_file 2019572
[70840.843223] workingset_activate_anon 10031
[70840.843225] workingset_activate_file 578129
[70840.843226] workingset_restore_anon 10031
[70840.843228] workingset_restore_file 9940
[70840.843230] workingset_nodereclaim 2572
[70840.843232] pgscan 3274298
[70840.843233] pgsteal 3105556
[70840.843235] pgscan_kswapd 3048
[70840.843236] pgscan_direct 3271250
[70840.843238] pgscan_khugepaged 0
[70840.843239] pgsteal_kswapd 2694
[70840.843241] pgsteal_direct 3102862
[70840.843247] pgsteal_khugepaged 0
[70840.843249] pgfault 220552
[70840.843251] pgmajfault 37529
[70840.843252] pgrefill 3194145233
[70840.843254] pgactivate 126202
[70840.843255] pgdeactivate 0
[70840.843257] pglazyfree 0
[70840.843258] pglazyfreed 0
[70840.843260] zswpin 0
[70840.843261] zswpout 0
[70840.843263] thp_fault_alloc 0
[70840.843264] thp_collapse_alloc 0
[70840.843266] Tasks state (memory values in pages):
[70840.843268] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[70840.843275] Out of memory and no killable processes...
[70841.285381] kworker/u24:5 invoked oom-killer: gfp_mask=0x8cc0(GFP_KERNEL|__GFP_NOFAIL), order=3, oom_score_adj=0
[70841.285390] CPU: 10 PID: 733310 Comm: kworker/u24:5 Tainted: P O 6.5.11-7-pve #1
[70841.285395] Hardware name: Apple Inc. MacBookPro16,1/Mac-E1008331FDC96864, BIOS 1916.40.8.0.0 (iBridge: 20.16.411.0.0,0) 09/29/2022
[70841.285399] Workqueue: writeback wb_workfn (flush-cifs-4)
[70841.285406] Call Trace:
[70841.285408] <TASK>
[70841.285411] dump_stack_lvl+0x48/0x70
[70841.285416] dump_stack+0x10/0x20
[70841.285419] dump_header+0x4f/0x260
[70841.285423] out_of_memory+0x3c0/0x560
[70841.285426] mem_cgroup_out_of_memory+0x145/0x170
[70841.285431] try_charge_memcg+0x737/0x820
[70841.285435] ? cgroup_rstat_updated+0xc8/0xe0
[70841.285439] mem_cgroup_charge_skmem+0x40/0xf0
[70841.285443] __sk_mem_raise_allocated+0x44c/0x4f0
[70841.285447] ? alloc_pages+0x90/0x160
[70841.285450] __sk_mem_schedule+0x38/0x60
[70841.285454] tcp_wmem_schedule+0x41/0x90
[70841.285457] tcp_sendmsg_locked+0x598/0xe30
[70841.285461] tcp_sendmsg+0x2c/0x50
[70841.285464] inet_sendmsg+0x42/0x80
[70841.285467] sock_sendmsg+0x10d/0x140
[70841.285473] smb_send_kvec+0x84/0x190 [cifs]
[70841.285547] ? release_sock+0x8f/0xb0
[70841.285552] __smb_send_rqst+0x427/0x700 [cifs]
[70841.285600] smb_send_rqst+0x184/0x1e0 [cifs]
[70841.285645] ? psi_task_switch+0xd3/0x240
[70841.285653] cifs_call_async+0x144/0x330 [cifs]
[70841.285698] ? __pfx_smb2_writev_callback+0x10/0x10 [cifs]
[70841.285746] smb2_async_writev+0x44e/0x6c0 [cifs]
[70841.285792] ? cifs_extend_writeback+0x42a/0x5a0 [cifs]
[70841.285834] ? __pfx_cifs_writedata_release+0x10/0x10 [cifs]
[70841.285879] cifs_writepages_region+0xba0/0xcd0 [cifs]
[70841.285920] ? cifs_writepages_region+0xba0/0xcd0 [cifs]
[70841.285965] cifs_writepages+0xa5/0x110 [cifs]
[70841.286007] do_writepages+0xcd/0x1e0
[70841.286011] __writeback_single_inode+0x44/0x370
[70841.286015] writeback_sb_inodes+0x211/0x510
[70841.286018] ? blk_mq_run_hw_queue+0x154/0x210
[70841.286023] __writeback_inodes_wb+0x54/0x100
[70841.286026] ? queue_io+0x115/0x120
[70841.286029] wb_writeback+0x2a8/0x320
[70841.286033] wb_workfn+0x368/0x4d0
[70841.286035] ? __schedule+0x405/0x1450
[70841.286039] ? add_timer+0x20/0x40
[70841.286043] process_one_work+0x23b/0x450
[70841.286075] worker_thread+0x50/0x3f0
[70841.286079] ? __pfx_worker_thread+0x10/0x10
[70841.286102] kthread+0xef/0x120
[70841.286106] ? __pfx_kthread+0x10/0x10
[70841.286110] ret_from_fork+0x44/0x70
[70841.286114] ? __pfx_kthread+0x10/0x10
[70841.286117] ret_from_fork_asm+0x1b/0x30
[70841.286137] </TASK>
[70841.286217] memory: usage 524548kB, limit 524288kB, failcnt 26768
[70841.286222] swap: usage 860kB, limit 524288kB, failcnt 0
[70841.286226] Memory cgroup stats for /lxc/120:
[70841.286467] anon 0
[70841.286472] file 473980928
[70841.286475] kernel 10801152
[70841.286477] kernel_stack 16384
[70841.286479] pagetables 0
[70841.286481] sec_pagetables 0
[70841.286483] percpu 640584
[70841.286485] sock 52355072
[70841.286486] vmalloc 36864
[70841.286490] shmem 0
[70841.286492] zswap 0
[70841.286494] zswapped 0
[70841.286496] file_mapped 0
[70841.286498] file_dirty 470175744
[70841.286505] file_writeback 3788800
[70841.286507] swapcached 0
[70841.286510] anon_thp 0
[70841.286512] file_thp 0
[70841.286514] shmem_thp 0
[70841.286516] inactive_anon 0
[70841.286518] active_anon 0
[70841.286526] inactive_file 473964544
[70841.286529] active_file 16384
[70841.286531] unevictable 0
[70841.286533] slab_reclaimable 9573000
[70841.286536] slab_unreclaimable 436736
[70841.286545] slab 10009736
[70841.286547] workingset_refault_anon 50099
[70841.286550] workingset_refault_file 2019572
[70841.286552] workingset_activate_anon 10031
[70841.286555] workingset_activate_file 578129
[70841.286557] workingset_restore_anon 10031
[70841.286560] workingset_restore_file 9940
[70841.286562] workingset_nodereclaim 2572
[70841.286565] pgscan 3274370
[70841.286567] pgsteal 3105556
[70841.286573] pgscan_kswapd 3048
[70841.286575] pgscan_direct 3271322
[70841.286578] pgscan_khugepaged 0
[70841.286580] pgsteal_kswapd 2694
[70841.286583] pgsteal_direct 3102862
[70841.286585] pgsteal_khugepaged 0
[70841.286587] pgfault 220552
[70841.286589] pgmajfault 37529
[70841.286591] pgrefill 3198310937
[70841.286598] pgactivate 126274
[70841.286600] pgdeactivate 0
[70841.286602] pglazyfree 0
[70841.286604] pglazyfreed 0
[70841.286606] zswpin 0
[70841.286608] zswpout 0
[70841.286610] thp_fault_alloc 0
[70841.286613] thp_collapse_alloc 0
[70841.286615] Tasks state (memory values in pages):
[70841.286618] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[70841.286638] Out of memory and no killable processes...
[70849.249401] vmbr0: port 11(veth120i0) entered disabled state
[70849.249645] veth120i0 (unregistering): left allmulticast mode
[70849.249650] veth120i0 (unregistering): left promiscuous mode
[70849.249654] vmbr0: port 11(veth120i0) entered disabled state
[70849.506818] audit: type=1400 audit(1704204892.276:39): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-120_</var/lib/lxc>" pid=738693 comm="apparmor_parser"
[71035.528026] CIFS: VFS: \\192.168.1.251 has not responded in 180 seconds. Reconnecting...
[71036.651932] EXT4-fs (dm-19): unmounting filesystem c3b7d1e7-84d0-44a8-b806-e81fb3c14f43.
[74400.219860] EXT4-fs (dm-19): mounted filesystem c3b7d1e7-84d0-44a8-b806-e81fb3c14f43 r/w with ordered data mode. Quota mode: none.
[74400.480855] audit: type=1400 audit(1704208443.324:40): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-120_</var/lib/lxc>" pid=773477 comm="apparmor_parser"
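For anyone reading the dump, here is a minimal Python sketch that puts the cgroup numbers in perspective. It is only an illustration: the function name `parse_memcg` and the hard-coded excerpt are assumptions made for this example, and the values are copied from the first "Memory cgroup stats for /lxc/120" block above. It converts the counters to MiB and shows each one as a share of the container's memory limit.

```python
import re

# Lines copied verbatim from the dmesg output above (container 120).
DMESG_EXCERPT = """\
[70840.843040] memory: usage 524548kB, limit 524288kB, failcnt 26749
[70840.843177] anon 0
[70840.843180] file 473980928
[70840.843182] kernel 10801152
[70840.843189] sock 52355072
[70840.843198] file_dirty 470175744
[70840.843200] file_writeback 3788800
"""


def parse_memcg(text):
    """Return the memcg counters from a dmesg excerpt, all in bytes."""
    stats = {}
    for line in text.splitlines():
        # Drop the leading "[seconds.micros]" kernel timestamp.
        body = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)
        m = re.match(r"memory: usage (\d+)kB, limit (\d+)kB", body)
        if m:
            # The summary line is in kB; the per-counter lines are in bytes.
            stats["usage"] = int(m.group(1)) * 1024
            stats["limit"] = int(m.group(2)) * 1024
            continue
        m = re.fullmatch(r"(\w+) (\d+)", body)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


stats = parse_memcg(DMESG_EXCERPT)
limit = stats["limit"]
for key in ("usage", "anon", "file", "file_dirty", "file_writeback", "sock", "kernel"):
    mib = stats[key] / 2**20
    share = 100 * stats[key] / limit
    print(f"{key:15s} {mib:8.1f} MiB  {share:5.1f}% of limit")
```

If I read the numbers correctly, the limit is 512 MiB, `anon` is 0, and dirty page cache (`file_dirty`) alone is roughly 88% of the limit, with socket buffers (`sock`) taking most of the rest. That would fit the "Out of memory and no killable processes" message, since there is no process memory left to reclaim by killing anything.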
lsusb
Code:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 008: ID 2109:d101 VIA Labs, Inc. USB Keyboard
Bus 001 Device 005: ID 1a40:0801 Terminus Technology Inc. USB 2.0 Hub
Bus 001 Device 003: ID 2109:2813 VIA Labs, Inc. VL813 Hub
Bus 001 Device 009: ID 18d1:9302 Google Inc.
Bus 001 Device 007: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 006: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 004: ID 174c:1153 ASMedia Technology Inc. ASM1153 SATA 3Gb/s bridge
Bus 001 Device 002: ID 1a86:8095 QinHeng Electronics USB Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 005: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
Bus 004 Device 004: ID 2537:1081 Norelsys NS1081
Bus 004 Device 003: ID 152d:0578 JMicron Technology Corp. / JMicron USA Technology Corp. JMS578 SATA 6Gb/s
Bus 004 Device 002: ID 2109:0813 VIA Labs, Inc. VL813 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
lsusb -t
Code:
/: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
/: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 1: Dev 3, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
|__ Port 3: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
|__ Port 4: Dev 5, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=usb-storage, 480M
|__ Port 2: Dev 6, If 0, Class=Vendor Specific Class, Driver=usbfs, 12M
|__ Port 3: Dev 7, If 0, Class=Vendor Specific Class, Driver=usbfs, 12M
|__ Port 4: Dev 9, If 0, Class=Vendor Specific Class, Driver=usbfs, 480M
|__ Port 4: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 1: Dev 5, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 8, If 0, Class=Human Interface Device, Driver=usbhid, 480M
journalctl on affected server while the crash starts:
Code:
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:d981:98d9:1c1b:2831, port 6881 (>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:d981:98d9:1c1b:2831, port 6881 (>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:1e:ab57:d948:1ded, port 6881 (er>
Jan 02 17:44:41 torrent transmission-daemon[118]: [2024-01-02 17:44:41.222] Couldn't connect socket 23 to 2001:4451:8374:4600:1e:ab57:d948:1ded, port 6881 (er>
Jan 02 17:45:42 torrent transmission-daemon[118]: [2024-01-02 17:45:42.218] Couldn't connect socket 28 to 2a03:ec00:b1a1:59a:19e4:ae2c:9378:3e85, port 1293 (e>
Jan 02 17:45:42 torrent transmission-daemon[118]: [2024-01-02 17:45:42.219] Couldn't connect socket 28 to 2a03:ec00:b1a3:16a8:a461:cb89:5b3c:b598, port 40228 >
Jan 02 17:45:43 torrent transmission-daemon[118]: [2024-01-02 17:45:43.219] Couldn't connect socket 34 to 2a03:ec00:b9a1:3b41:b0a9:5831:48e6:d08e, port 1 (err>
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b199:3043:b8fb:2002:2428:6de5, port 36585 >
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b1a1:3c3:e068:7d7e:4162:948b, port 6881 (e>
Jan 02 17:45:44 torrent transmission-daemon[118]: [2024-01-02 17:45:44.219] Couldn't connect socket 37 to 2a03:ec00:b1a3:16a8:17c2:29a6:b6df:e3d9, port 40228 >
Jan 02 17:45:45 torrent transmission-daemon[118]: [2024-01-02 17:45:45.219] Couldn't connect socket 38 to