XFS possible memory allocation deadlock on ext4 container

seechiller

Renowned Member
Jan 5, 2014
Hello,

I've migrated my old Proxmox server to a new system running on 4.4. The new system uses two RAID 10 arrays, formatted with XFS. Many times a month I get:

Code:
[11127866.527660] XFS: loop5(22218) possible memory allocation deadlock size 44960 in kmem_alloc (mode:0x2400240)

As soon as I get the error above, one container stops responding; all other containers and VMs run without any problem, so the problem cannot be the file system on which the disk images are stored. The container with this problem runs a MySQL server with a lot of transactions (log server).

As soon as I enter
Code:
echo 2 > /proc/sys/vm/drop_caches
the container runs again.
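
(For reference, my understanding of the drop_caches values, as documented in the kernel sysctl docs: 1 frees only the page cache, 2 frees reclaimable slab objects such as dentries and inodes, 3 frees both.)
Code:
# 1 = free page cache only
echo 1 > /proc/sys/vm/drop_caches
# 2 = free reclaimable slab objects (dentries and inodes)
echo 2 > /proc/sys/vm/drop_caches
# 3 = free both page cache and slab objects
echo 3 > /proc/sys/vm/drop_caches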


I found another thread in which it's recommended to defragment, but that doesn't seem to be the problem here. Fragmentation factor:
Code:
root@pm:~# xfs_db -r -c "frag -f" /dev/mapper/pve-root
actual 44325, ideal 41709, fragmentation factor 5.90%

root@pm:~# xfs_db -r -c "frag -f" /dev/sdb1
actual 8832243, ideal 8117641, fragmentation factor 8.09%
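
(If fragmentation were the issue, the usual fix would be an online defrag with xfs_fsr; I did not run it here since the fragmentation factor is already low. Roughly like this, assuming xfs_fsr is installed, and with the two-hour limit only as an example:)
Code:
# online defrag of a mounted XFS filesystem, verbose, stop after 2 hours
xfs_fsr -v -t 7200 /dev/mapper/pve-root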


I ran this:
Code:
root@pm:~# blkid |grep loop5
/dev/loop5: UUID="314dcf4a-e614-4559-94aa-11e3b0c5230f" TYPE="ext4"
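
(To double-check which file on the host backs the loop device -- the container's raw image sits on one of the XFS arrays -- something like this should work, assuming a reasonably recent losetup:)
Code:
# show which image file backs the loop device
losetup -l /dev/loop5
# or read it from sysfs
cat /sys/block/loop5/loop/backing_file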

So now I'm really confused: why does this XFS error happen for a disk image of type ext4?

I never had any problem with this container on the old hardware with ext4.

Thanks for any hint!
 
Nope, but the other guy seems to have exactly the same problem; he uses SATA, I use only SAS and SSDs. Today I had to drop_caches twice, which is driving me crazy; I'm thinking about switching back to the old hardware.

In the other post I read that maybe too much RAM could be a problem. I have 128GB, should I go down to 32GB or so?

BTW:
Code:
root@pm:~# cat /proc/slabinfo |grep xfs_inode
xfs_inode         1163546 1165722   1088   30    8 : tunables    0    0    0 : slabdata 120491 120491    0
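
(Rough estimate of what that slab alone consumes -- object count times object size from the slabinfo columns, ignoring per-slab overhead:)
Code:
# num_objs (column 3) * objsize (column 4), in MiB
awk '/xfs_inode/ {printf "%.0f MiB\n", $3*$4/1024/1024}' /proc/slabinfo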
 
What is the output of
free -h

and

slabtop

before you need to flush the caches?
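
(If it is hard to catch the state right before the hang, a simple logging loop like this could capture both every few minutes -- the path and interval are only examples:)
Code:
# log memory usage and the largest slab caches every 5 minutes
while true; do
    date
    free -h
    slabtop -o -s c | head -n 15
    sleep 300
done >> /root/mem-before-hang.log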
 
OK, it happened again. Here is the output from the host while the problem exists:

Code:
root@pm:~# free -h
             total       used       free     shared    buffers     cached
Mem:          125G       123G       2.3G       502M       1.3G       104G
-/+ buffers/cache:        17G       108G
Swap:          31G       2.6G        29G

Code:
 Active / Total Objects (% used)    : 15905599 / 18551354 (85.7%)
 Active / Total Slabs (% used)      : 634700 / 634700 (100.0%)
 Active / Total Caches (% used)     : 100 / 165 (60.6%)
 Active / Total Size (% used)       : 6033187.81K / 6361920.01K (94.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.34K / 18.50K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
7416786 5023107  67%    0.10K 190174       39    760696K buffer_head
3300465 3292384  99%    0.19K  78583       42    628664K dentry
1995408 1994598  99%    1.05K 196677       30   6293664K ext4_inode_cache
1800589 1693479  94%    0.57K  64468       28   1031488K radix_tree_node
1272666 1271626  99%    1.06K  47203       30   1510496K xfs_inode
387008 386081  99%    0.06K   6047       64     24188K kmalloc-64
330684 316368  95%    0.04K   3242      102     12968K ext4_extent_status
216600 216502  99%    0.20K   5415       40     43320K vm_area_struct
193052 115798  59%    0.15K   3643       53     29144K xfs_ili
163968 159577  97%    0.19K   3904       42     31232K kmalloc-192
144984 143172  98%    0.38K   3452       42     55232K mnt_cache
121044 116796  96%    0.09K   2882       42     11528K kmalloc-96
109956 109213  99%    0.08K   2156       51      8624K anon_vma
 98432  91932  93%    0.03K    769      128      3076K kmalloc-32
 77690  77472  99%    0.12K   2285       34      9140K kernfs_node_cache
 71264  63895  89%    0.12K   2227       32      8908K kmalloc-128
 67584  64241  95%    0.02K    264      256      1056K kmalloc-16
 61285  60677  99%    0.05K    721       85      2884K ftrace_event_field
 57344  57220  99%    0.14K   2048       28      8192K btrfs_path
 48858  48450  99%    0.04K    479      102      1916K Acpi-Namespace
 45120  44981  99%    0.50K   1410       32     22560K kmalloc-512
 40432  40064  99%    0.55K   1444       28     23104K inode_cache
 33792  30010  88%    0.25K   1056       32      8448K kmalloc-256
 30214  30167  99%    0.64K    664       49     21248K shmem_inode_cache
 28004  28004 100%    0.75K   2663       42     85216K fuse_inode
 25280  24832  98%    0.06K    395       64      1580K ext4_free_data
 24576  24576 100%    0.01K     48      512       192K kmalloc-8
 21973  21973 100%    0.05K    301       73      1204K Acpi-Parse
 19880  19829  99%    0.07K    355       56      1420K Acpi-Operand
 15453  15198  98%    0.31K    303       51      4848K nf_conntrack_26
 15136  14828  97%    0.18K    344       44      2752K xfs_log_ticket
 14790  14382  97%    0.31K    290       51      4640K nf_conntrack_1
 14118  13130  93%    0.10K    362       39      1448K blkdev_ioc
 13952  13952 100%    0.03K    109      128       436K jbd2_revoke_record_s
 13719  13366  97%    0.31K    269       51      4304K nf_conntrack_22
 13410  13114  97%    0.61K    303       52      9696K proc_inode_cache
 12648  12196  96%    0.31K    248       51      3968K nf_conntrack_23
 12342  12087  97%    0.31K    242       51      3872K nf_conntrack_29
 11872  11872 100%    0.07K    212       56       848K ext4_io_end
 11684  11684 100%    0.09K    254       46      1016K trace_event_file
 11424  11255  98%    0.12K    336       34      1344K jbd2_journal_head
 11360  10361  91%    1.00K    355       32     11360K kmalloc-1024
 10388  10143  97%    0.32K    212       49      3392K request_sock_TCP
  9195   9195 100%    0.62K    250       51      8000K sock_inode_cache
  8316   8184  98%    0.36K    189       44      3024K blkdev_requests
  8294   8033  96%    0.27K    286       29      2288K tw_sock_TCPv6
  8262   8109  98%    0.31K    162       51      2592K nf_conntrack_51
  7936   7936 100%    0.02K     31      256       124K jbd2_revoke_table_s
  7840   7560  96%    0.23K    224       35      1792K cfq_queue
  6987   6987 100%    0.31K    137       51      2192K nf_conntrack_21
  6732   6732 100%    0.31K    132       51      2112K bio-2
  5829   5539  95%    0.27K    201       29      1608K tw_sock_TCP
  5712   5712 100%    0.31K    112       51      1792K nf_conntrack_8
  5559   5202  93%    0.31K    109       51      1744K nf_conntrack_36
  5440   5440 100%    0.02K     32      170       128K numa_policy

and after:
Code:
echo 1 > /proc/sys/vm/drop_caches

this:

Code:
root@pm:~# free -h
             total       used       free     shared    buffers     cached
Mem:          125G        16G       109G       502M       5.1M       1.2G
-/+ buffers/cache:        15G       110G
Swap:          31G       2.6G        29G

Code:
 Active / Total Objects (% used)    : 9702084 / 10929785 (88.8%)
 Active / Total Slabs (% used)      : 436832 / 436832 (100.0%)
 Active / Total Caches (% used)     : 100 / 165 (60.6%)
 Active / Total Size (% used)       : 4894711.02K / 5456993.47K (89.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.50K / 18.50K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
3300297 3289825  99%    0.19K  78579       42    628632K dentry
1995408 1994656  99%    1.05K 196677       30   6293664K ext4_inode_cache
1537032 638153  41%    0.57K  55029       28    880464K radix_tree_node
1272666 1271804  99%    1.06K  47203       30   1510496K xfs_inode
362624 334183  92%    0.06K   5666       64     22664K kmalloc-64
329766 324708  98%    0.04K   3233      102     12932K ext4_extent_status
193052 115977  60%    0.15K   3643       53     29144K xfs_ili
181960 165440  90%    0.20K   4549       40     36392K vm_area_struct
163926 159907  97%    0.19K   3903       42     31224K kmalloc-192
146796  21630  14%    0.10K   3764       39     15056K buffer_head
144984 143194  98%    0.38K   3452       42     55232K mnt_cache
120750 116071  96%    0.09K   2875       42     11500K kmalloc-96
 98432  90134  91%    0.03K    769      128      3076K kmalloc-32
 93789  84704  90%    0.08K   1839       51      7356K anon_vma
 77690  77472  99%    0.12K   2285       34      9140K kernfs_node_cache
 71328  65104  91%    0.12K   2229       32      8916K kmalloc-128
 67584  64241  95%    0.02K    264      256      1056K kmalloc-16
 61285  60761  99%    0.05K    721       85      2884K ftrace_event_field
 57344  57220  99%    0.14K   2048       28      8192K btrfs_path
 48858  48450  99%    0.04K    479      102      1916K Acpi-Namespace
 40432  40064  99%    0.55K   1444       28     23104K inode_cache
 33792  24680  73%    0.25K   1056       32      8448K kmalloc-256
 32416  26701  82%    0.50K   1013       32     16208K kmalloc-512
 30214  30167  99%    0.64K    664       49     21248K shmem_inode_cache
 28004  28004 100%    0.75K   2663       42     85216K fuse_inode
 25280  24832  98%    0.06K    395       64      1580K ext4_free_data
 24576  24576 100%    0.01K     48      512       192K kmalloc-8
 22119  22119 100%    0.05K    303       73      1212K Acpi-Parse
 19880  19829  99%    0.07K    355       56      1420K Acpi-Operand
 16320  15688  96%    0.31K    320       51      5120K nf_conntrack_26
 15136  14828  97%    0.18K    344       44      2752K xfs_log_ticket
 14943  14688  98%    0.31K    293       51      4688K nf_conntrack_22
 14790  14382  97%    0.31K    290       51      4640K nf_conntrack_1
 14079  13126  93%    0.10K    361       39      1444K blkdev_ioc
 13952  13952 100%    0.03K    109      128       436K jbd2_revoke_record_s
 13380  12211  91%    0.61K    298       52      9536K proc_inode_cache
 12342  12087  97%    0.31K    242       51      3872K nf_conntrack_29
 12087  11785  97%    0.31K    237       51      3792K nf_conntrack_23
 11872  11872 100%    0.07K    212       56       848K ext4_io_end
 11684  11684 100%    0.09K    254       46      1016K trace_event_file
 11424  11255  98%    0.12K    336       34      1344K jbd2_journal_head
 11360  10173  89%    1.00K    355       32     11360K kmalloc-1024
 10388  10143  97%    0.32K    212       49      3392K request_sock_TCP
  9069   8821  97%    0.62K    244       51      7808K sock_inode_cache
  8439   8294  98%    0.27K    291       29      2328K tw_sock_TCPv6
  8316   8184  98%    0.36K    189       44      3024K blkdev_requests
  8262   8109  98%    0.31K    162       51      2592K nf_conntrack_51
  7936   7936 100%    0.02K     31      256       124K jbd2_revoke_table_s
  7840   7560  96%    0.23K    224       35      1792K cfq_queue
  6987   6987 100%    0.31K    137       51      2192K nf_conntrack_21
  6732   6732 100%    0.31K    132       51      2112K bio-2
  5829   5597  96%    0.27K    201       29      1608K tw_sock_TCP
  5712   5712 100%    0.31K    112       51      1792K nf_conntrack_8
  5559   5202  93%    0.31K    109       51      1744K nf_conntrack_36
  5440   5440 100%    0.02K     32      170       128K numa_policy

It looks like all memory is consumed (cached) while the problem exists.
 
Look at this:
cached: 104G

I think this is what is causing your problem: whenever the kernel flushes this tremendous cache to your SAS drives (well, not all of it), all other IO will be blocked.

You do not need to reduce the RAM of your server; you just need to ask it to cache a little bit less, so it can do this cache flush more quickly.

For this you can have a look at
https://cromwell-intl.com/linux/performance-tuning/disks.html

and follow the hints under "improve latency for interactive system"
for dirty_ratio, dirty_background_ratio, and vfs_cache_pressure.
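
(As an illustration only -- the right values depend on your workload -- those sysctls could be set like this:)
Code:
# example values only, tune for your workload
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
sysctl -w vm.vfs_cache_pressure=200

# to persist across reboots, add to /etc/sysctl.conf:
#   vm.dirty_background_ratio = 5
#   vm.dirty_ratio = 10
#   vm.vfs_cache_pressure = 200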
 
