Setup:
HPE ProLiant MicroServer Gen10 Plus with 48GB ECC RAM (Memtest passes)
4x 4TB drives in a ZFS pool on Proxmox 8.1.10
Latest TrueNAS Scale in a VM with 32GB RAM, a 30GB system disk, and a 3.2TB data disk
Inside TrueNAS:
1 dataset on ZFS (encrypted)
Exposed to the LAN over SMB
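One thing I'm unsure about is the memory accounting: the VM gets 32GB, and as far as I understand the ZFS ARC on the Proxmox host can by default grow to half of physical RAM (24GB of my 48GB), so 32 + 24 = 56GB could in theory be requested on a 48GB machine. To see the current ARC size and limit on the host, I believe something like this works:
Code:
# current ARC size and target, as reported by the ZFS tools
arc_summary | grep -i "arc size"
# 0 here means the default cap (half of RAM on Linux, as far as I know)
cat /sys/module/zfs/parameters/zfs_arc_max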
When I copy files to the share, the TrueNAS VM crashes. The likelihood increases with larger files (starting around 2GB), but it doesn't happen every time.
It seems to happen more often with Windows and macOS clients than with Debian-based clients.
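A simple way to reproduce it from a Linux client is copying one big file onto the mounted share, along these lines (/mnt/truenas is just an example mount point):
Code:
# generate an ~8GB test file and copy it onto the SMB mount
dd if=/dev/urandom of=/tmp/bigfile bs=1M count=8192
cp /tmp/bigfile /mnt/truenas/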
What could be wrong?
When the crash happens, the host logs look as follows:
Code:
Mar 28 22:27:14 pve pvedaemon[1662]: worker exit
Mar 28 22:27:14 pve pvedaemon[1660]: worker 1662 finished
Mar 28 22:27:14 pve pvedaemon[1660]: starting 1 worker(s)
Mar 28 22:27:14 pve pvedaemon[1660]: worker 1143811 started
Mar 28 22:29:42 pve pveproxy[1138713]: worker exit
Mar 28 22:29:42 pve pveproxy[1678]: worker 1138713 finished
Mar 28 22:29:42 pve pveproxy[1678]: starting 1 worker(s)
Mar 28 22:29:42 pve pveproxy[1678]: worker 1144290 started
Mar 28 22:36:22 pve pvedaemon[1142148]: <root@pam> successful auth for user 'root@pam'
Mar 28 22:41:09 pve pveproxy[1141573]: worker exit
Mar 28 22:41:09 pve pveproxy[1678]: worker 1141573 finished
Mar 28 22:41:09 pve pveproxy[1678]: starting 1 worker(s)
Mar 28 22:41:09 pve pveproxy[1678]: worker 1146432 started
Mar 28 23:02:54 pve smartd[1166]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 82 to 83
Mar 28 23:02:55 pve smartd[1166]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 82 to 83
Mar 28 23:04:52 pve kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Mar 28 23:04:54 pve kernel: CPU: 1 PID: 836931 Comm: CPU 0/KVM Tainted: P O 6.5.13-1-pve #1
Mar 28 23:04:54 pve kernel: Hardware name: HPE ProLiant MicroServer Gen10 Plus/ProLiant MicroServer Gen10 Plus, BIOS U48 06/24/2020
Mar 28 23:04:54 pve kernel: Call Trace:
Mar 28 23:04:54 pve kernel: <TASK>
Mar 28 23:04:54 pve kernel: dump_stack_lvl+0x48/0x70
Mar 28 23:04:54 pve kernel: dump_stack+0x10/0x20
Mar 28 23:04:54 pve kernel: dump_header+0x4f/0x260
Mar 28 23:04:54 pve kernel: oom_kill_process+0x10d/0x1c0
Mar 28 23:04:54 pve kernel: out_of_memory+0x270/0x560
Mar 28 23:04:54 pve kernel: __alloc_pages+0x114f/0x12e0
Mar 28 23:04:54 pve kernel: alloc_pages+0x90/0x160
Mar 28 23:04:54 pve kernel: folio_alloc+0x1d/0x60
Mar 28 23:04:54 pve kernel: filemap_alloc_folio+0xfd/0x110
Mar 28 23:04:54 pve kernel: __filemap_get_folio+0xd8/0x230
Mar 28 23:04:55 pve kernel: filemap_fault+0x584/0x9f0
Mar 28 23:04:55 pve kernel: __do_fault+0x36/0x150
Mar 28 23:04:55 pve kernel: do_fault+0x266/0x3e0
Mar 28 23:04:55 pve kernel: __handle_mm_fault+0x6cc/0xc30
Mar 28 23:04:55 pve kernel: handle_mm_fault+0x164/0x360
Mar 28 23:04:55 pve kernel: do_user_addr_fault+0x212/0x6a0
Mar 28 23:04:55 pve kernel: exc_page_fault+0x83/0x1b0
Mar 28 23:04:55 pve kernel: asm_exc_page_fault+0x27/0x30
Mar 28 23:04:55 pve kernel: RIP: 0033:0x7e45252afc5b
Mar 28 23:04:55 pve kernel: Code: Unable to access opcode bytes at 0x7e45252afc31.
Mar 28 23:04:55 pve kernel: RSP: 002b:00007e452098cf30 EFLAGS: 00010246
Mar 28 23:04:55 pve kernel: RAX: 0000000000000000 RBX: 00005c675d8f5b10 RCX: 00007e45252afc5b
Mar 28 23:04:55 pve kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b
Mar 28 23:04:55 pve kernel: RBP: 000000000000ae80 R08: 00005c675b0acc90 R09: 0000000000000000
Mar 28 23:04:55 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Mar 28 23:04:55 pve kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Mar 28 23:04:55 pve kernel: </TASK>
Mar 28 23:04:55 pve kernel: Mem-Info:
Mar 28 23:04:55 pve kernel: active_anon:773252 inactive_anon:4694662 isolated_anon:0
active_file:46 inactive_file:49 isolated_file:0
unevictable:768 dirty:3 writeback:3
slab_reclaimable:6376 slab_unreclaimable:724886
mapped:11799 shmem:12732 pagetables:13226
sec_pagetables:6357 bounce:0
kernel_misc_reclaimable:0
free:53335 free_pcp:195 free_cma:0
Mar 28 23:04:55 pve kernel: Node 0 active_anon:3093944kB inactive_anon:18777712kB active_file:0kB inactive_file:196kB unevictable:3072kB isolated(anon):0kB isolated(file):0kB mapped:47196kB dirty:12kB writeback:12kB shmem:50928kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 9019392kB writeback_tmp:0kB kernel_stack:5424kB pagetables:52904kB sec_pagetables:25428kB all_unreclaimable? no
Mar 28 23:04:55 pve kernel: Node 0 DMA free:11264kB boost:0kB min:24kB low:36kB high:48kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 28 23:04:55 pve kernel: lowmem_reserve[]: 0 1776 39874 39874 39874
Mar 28 23:04:55 pve kernel: Node 0 DMA32 free:155772kB boost:0kB min:3008kB low:4824kB high:6640kB reserved_highatomic:0KB active_anon:60880kB inactive_anon:1652912kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1944276kB managed:1877488kB mlocked:0kB bounce:0kB free_pcp:1184kB local_pcp:0kB free_cma:0kB
Mar 28 23:04:55 pve kernel: lowmem_reserve[]: 0 0 38097 38097 38097
Mar 28 23:04:55 pve kernel: Node 0 Normal free:40280kB boost:0kB min:64548kB low:103560kB high:142572kB reserved_highatomic:0KB active_anon:3039532kB inactive_anon:17118332kB active_file:684kB inactive_file:268kB unevictable:3072kB writepending:24kB present:39829504kB managed:39020580kB mlocked:0kB bounce:0kB free_pcp:700kB local_pcp:0kB free_cma:0kB
Mar 28 23:04:55 pve kernel: lowmem_reserve[]: 0 0 0 0 0
Mar 28 23:04:55 pve kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Mar 28 23:04:55 pve kernel: Node 0 DMA32: 2*4kB (UM) 2*8kB (UM) 17*16kB (UM) 1*32kB (M) 23*64kB (UM) 35*128kB (UM) 33*256kB (UM) 32*512kB (UM) 29*1024kB (UM) 14*2048kB (U) 16*4096kB (UM) = 155016kB
Mar 28 23:04:55 pve kernel: Node 0 Normal: 2766*4kB (UME) 175*8kB (ME) 73*16kB (ME) 657*32kB (UME) 49*64kB (U) 2*128kB (U) 2*256kB (U) 2*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 39584kB
Mar 28 23:04:55 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 28 23:04:55 pve kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 28 23:04:55 pve kernel: 12792 total pagecache pages
Mar 28 23:04:55 pve kernel: 0 pages in swap cache
Mar 28 23:04:55 pve kernel: Free swap = 0kB
Mar 28 23:04:55 pve kernel: Total swap = 0kB
Mar 28 23:04:55 pve kernel: 10447443 pages RAM
Mar 28 23:04:55 pve kernel: 0 pages HighMem/MovableOnly
Mar 28 23:04:55 pve kernel: 219086 pages reserved
Mar 28 23:04:55 pve kernel: 0 pages hwpoisoned
Mar 28 23:04:55 pve kernel: Tasks state (memory values in pages):
Mar 28 23:04:55 pve kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 28 23:04:55 pve kernel: [ 546] 0 546 12365 288 106496 0 -250 systemd-journal
Mar 28 23:04:55 pve kernel: [ 573] 0 573 6737 472 73728 0 -1000 systemd-udevd
Mar 28 23:04:55 pve kernel: [ 1145] 103 1145 1969 96 57344 0 0 rpcbind
Mar 28 23:04:55 pve kernel: [ 1154] 101 1154 2307 160 57344 0 -900 dbus-daemon
Mar 28 23:04:55 pve kernel: [ 1158] 0 1158 38187 128 61440 0 -1000 lxcfs
Mar 28 23:04:55 pve kernel: [ 1161] 0 1161 1766 83 49152 0 0 ksmtuned
Mar 28 23:04:55 pve kernel: [ 1164] 0 1164 69539 128 81920 0 0 pve-lxc-syscall
Mar 28 23:04:55 pve kernel: [ 1166] 0 1166 3018 512 61440 0 0 smartd
Mar 28 23:04:55 pve kernel: [ 1167] 0 1167 4289 288 73728 0 0 systemd-logind
Mar 28 23:04:55 pve kernel: [ 1168] 0 1168 1327 64 57344 0 0 qmeventd
Mar 28 23:04:55 pve kernel: [ 1169] 0 1169 583 64 40960 0 -1000 watchdog-mux
Mar 28 23:04:55 pve kernel: [ 1171] 0 1171 60164 416 98304 0 0 zed
Mar 28 23:04:55 pve kernel: [ 1403] 0 1403 1256 128 49152 0 0 lxc-monitord
Mar 28 23:04:55 pve kernel: [ 1422] 0 1422 1468 64 49152 0 0 agetty
Mar 28 23:04:55 pve kernel: [ 1463] 0 1463 3853 352 77824 0 -1000 sshd
Mar 28 23:04:55 pve kernel: [ 1481] 100 1481 4715 170 69632 0 0 chronyd
Mar 28 23:04:55 pve kernel: [ 1483] 100 1483 2633 210 65536 0 0 chronyd
Mar 28 23:04:55 pve kernel: [ 1605] 0 1605 10663 165 73728 0 0 master
Mar 28 23:04:55 pve kernel: [ 1607] 104 1607 10775 224 73728 0 0 qmgr
Mar 28 23:04:55 pve kernel: [ 1609] 0 1609 181841 359 172032 0 0 rrdcached
Mar 28 23:04:55 pve kernel: [ 1617] 0 1617 205475 14882 405504 0 0 pmxcfs
Mar 28 23:04:55 pve kernel: [ 1623] 0 1623 1652 96 53248 0 0 cron
Mar 28 23:04:55 pve kernel: [ 1631] 0 1631 39329 24215 303104 0 0 pve-firewall
Mar 28 23:04:55 pve kernel: [ 1634] 0 1634 38044 22884 294912 0 0 pvestatd
Mar 28 23:04:55 pve kernel: [ 1636] 0 1636 615 32 45056 0 0 bpfilter_umh
Mar 28 23:04:55 pve kernel: [ 1660] 0 1660 58325 33546 405504 0 0 pvedaemon
Mar 28 23:04:55 pve kernel: [ 1677] 0 1677 54844 27594 356352 0 0 pve-ha-crm
Mar 28 23:04:55 pve kernel: [ 1678] 33 1678 58652 33918 458752 0 0 pveproxy
Mar 28 23:04:55 pve kernel: [ 1684] 33 1684 20204 12640 192512 0 0 spiceproxy
Mar 28 23:04:55 pve kernel: [ 1686] 0 1686 54685 27434 376832 0 0 pve-ha-lrm
Mar 28 23:04:55 pve kernel: [ 1805] 0 1805 53630 27734 364544 0 0 pvescheduler
Mar 28 23:04:55 pve kernel: [ 836879] 0 836879 1249927 623936 5791744 0 0 kvm
Mar 28 23:04:55 pve kernel: [ 895097] 0 895097 19796 96 61440 0 0 pvefw-logger
Mar 28 23:04:55 pve kernel: [ 895102] 33 895102 20266 12669 176128 0 0 spiceproxy work
Mar 28 23:04:55 pve kernel: [1128213] 0 1128213 8506345 4932965 41132032 0 0 kvm
Mar 28 23:04:55 pve kernel: [1141159] 0 1141159 60436 33931 421888 0 0 pvedaemon worke
Mar 28 23:04:55 pve kernel: [1142148] 0 1142148 60466 33995 425984 0 0 pvedaemon worke
Mar 28 23:04:55 pve kernel: [1142797] 33 1142797 60883 34467 438272 0 0 pveproxy worker
Mar 28 23:04:55 pve kernel: [1143811] 0 1143811 60436 33867 421888 0 0 pvedaemon worke
Mar 28 23:04:55 pve kernel: [1144290] 33 1144290 60871 34435 438272 0 0 pveproxy worker
Mar 28 23:04:55 pve kernel: [1145678] 104 1145678 10764 224 77824 0 0 pickup
Mar 28 23:04:55 pve kernel: [1146432] 33 1146432 60670 34115 434176 0 0 pveproxy worker
Mar 28 23:04:55 pve kernel: [1150823] 0 1150823 1366 64 45056 0 0 sleep
Mar 28 23:04:55 pve kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=qemu.slice,mems_allowed=0,global_oom,task_memcg=/qemu.slice/100.scope,task=kvm,pid=1128213,uid=0
Mar 28 23:04:55 pve kernel: Out of memory: Killed process 1128213 (kvm) total-vm:34025380kB, anon-rss:19731476kB, file-rss:384kB, shmem-rss:0kB, UID:0 pgtables:40168kB oom_score_adj:0
Mar 28 23:04:55 pve systemd[1]: 100.scope: A process of this unit has been killed by the OOM killer.
Mar 28 23:04:55 pve systemd[1]: 100.scope: Failed with result 'oom-kill'.
Mar 28 23:04:55 pve systemd[1]: 100.scope: Consumed 52min 42.828s CPU time.
Mar 28 23:04:55 pve kernel: oom_reaper: reaped process 1128213 (kvm), now anon-rss:0kB, file-rss:256kB, shmem-rss:0kB
Mar 28 23:05:20 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Mar 28 23:05:20 pve kernel: zd16: p1 p2 p3
Mar 28 23:05:20 pve kernel: tap100i0 (unregistering): left allmulticast mode
Mar 28 23:05:20 pve kernel: fwbr100i0: port 2(tap100i0) entered disabled state
Mar 28 23:05:23 pve kernel: zd0: p1 p2
Mar 28 23:05:23 pve qmeventd[1151017]: Starting cleanup for 100
Mar 28 23:05:23 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 28 23:05:23 pve kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Mar 28 23:05:23 pve kernel: fwln100i0 (unregistering): left allmulticast mode
Mar 28 23:05:23 pve kernel: fwln100i0 (unregistering): left promiscuous mode
Mar 28 23:05:23 pve kernel: fwbr100i0: port 1(fwln100i0) entered disabled state
Mar 28 23:05:23 pve kernel: fwpr100p0 (unregistering): left allmulticast mode
Mar 28 23:05:23 pve kernel: fwpr100p0 (unregistering): left promiscuous mode
Mar 28 23:05:23 pve kernel: vmbr0: port 2(fwpr100p0) entered disabled state
Mar 28 23:05:23 pve qmeventd[1151017]: Finished cleanup for 100
Mar 28 23:17:02 pve CRON[1153171]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 28 23:17:02 pve CRON[1153172]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 28 23:17:02 pve CRON[1153171]: pam_unix(cron:session): session closed for user root
Mar 28 23:32:54 pve smartd[1166]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
Mar 28 23:32:54 pve smartd[1166]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 40
Mar 29 00:00:15 pve systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...
Mar 29 00:00:15 pve systemd[1]: Starting logrotate.service - Rotate log files...
Mar 29 00:00:16 pve systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
Mar 29 00:00:16 pve systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Mar 29 00:00:16 pve systemd[1]: Finished dpkg-db-backup.service - Daily dpkg database backup service.
Mar 29 00:00:17 pve pveproxy[1160946]: send HUP to 1678
Mar 29 00:00:17 pve pveproxy[1678]: received signal HUP
Mar 29 00:00:17 pve pveproxy[1678]: server closing
Mar 29 00:00:17 pve pveproxy[1678]: server shutdown (restart)
Mar 29 00:00:17 pve systemd[1]: Reloaded pveproxy.service - PVE API Proxy Server.
Mar 29 00:00:17 pve systemd[1]: Reloading spiceproxy.service - PVE SPICE Proxy Server...
Mar 29 00:00:18 pve spiceproxy[1160950]: send HUP to 1684
Mar 29 00:00:18 pve spiceproxy[1684]: received signal HUP
Mar 29 00:00:18 pve spiceproxy[1684]: server closing
Mar 29 00:00:18 pve spiceproxy[1684]: server shutdown (restart)
Mar 29 00:00:18 pve systemd[1]: Reloaded spiceproxy.service - PVE SPICE Proxy Server.
Mar 29 00:00:18 pve systemd[1]: Stopping pvefw-logger.service - Proxmox VE firewall logger...
Mar 29 00:00:18 pve pvefw-logger[895097]: received terminate request (signal)
Mar 29 00:00:18 pve pvefw-logger[895097]: stopping pvefw logger
Mar 29 00:00:18 pve spiceproxy[1684]: restarting server
Mar 29 00:00:18 pve spiceproxy[1684]: starting 1 worker(s)
Mar 29 00:00:18 pve spiceproxy[1684]: worker 1160959 started
Mar 29 00:00:18 pve systemd[1]: pvefw-logger.service: Deactivated successfully.
Mar 29 00:00:18 pve systemd[1]: Stopped pvefw-logger.service - Proxmox VE firewall logger.
Mar 29 00:00:18 pve systemd[1]: pvefw-logger.service: Consumed 4.588s CPU time.
Mar 29 00:00:18 pve systemd[1]: Starting pvefw-logger.service - Proxmox VE firewall logger...
Mar 29 00:00:18 pve pvefw-logger[1160961]: starting pvefw logger
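Reading the tail of the log myself: the host OOM killer terminated the kvm process of VM 100, and the host apparently runs without swap (Free swap = 0kB). If the host ARC is competing with the VM's 32GB, I assume the workaround would be to cap the ARC, roughly like this (untested sketch, the 16GB value is picked arbitrarily):
Code:
# /etc/modprobe.d/zfs.conf -- cap the host ARC at 16 GiB (16 * 1024^3 bytes)
options zfs zfs_arc_max=17179869184
followed by update-initramfs -u and a reboot (or writing the value to /sys/module/zfs/parameters/zfs_arc_max at runtime). Does that sound like the right direction, or is something else going on here?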