Bug 12737 - vfs_acl_xattr - failure to get ACL on Linux if memory is fragmented
Summary: vfs_acl_xattr - failure to get ACL on Linux if memory is fragmented
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: VFS Modules (show other bugs)
Version: 4.6.0
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-08 21:19 UTC by Uri Simchoni
Modified: 2017-04-21 07:00 UTC (History)
2 users (show)

See Also:


Attachments
git-am fix for 4.6.next (10.13 KB, patch)
2017-04-18 07:32 UTC, Uri Simchoni
jra: review+
Details
git-am fix for 4.5.next (10.00 KB, patch)
2017-04-18 07:34 UTC, Uri Simchoni
jra: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Uri Simchoni 2017-04-08 21:19:26 UTC
We've come across incidents where we see on the console of our NAS
devices something like this:

smbd: page allocation failure: order:4, mode:0xc0d0
Pid: 6443, comm: smbd Tainted: P         C O 3.2.54 #1
Call Trace:
 [<ffffffff810b6345>] warn_alloc_failed+0xf5/0x140
 [<ffffffff810b7822>] ? drain_pages+0x32/0x90
 [<ffffffff810b78d0>] ? page_alloc_cpu_notify+0x50/0x50
 [<ffffffff810b78e1>] ? drain_local_pages+0x11/0x20
 [<ffffffff810b70c1>] __alloc_pages_nodemask+0x481/0x6d0
 [<ffffffff810e707f>] alloc_pages_current+0x7f/0xe0
 [<ffffffff810b5a99>] __get_free_pages+0x9/0x40
 [<ffffffff810ee3a8>] __kmalloc+0xe8/0xf0
 [<ffffffff811154e8>] getxattr+0x98/0x130
 [<ffffffff8110371d>] ? do_path_lookup+0x2d/0xc0
 [<ffffffff810eeb15>] ? kmem_cache_free+0x15/0x90
 [<ffffffff810ff92e>] ? putname+0x2e/0x40
 [<ffffffff8110453f>] ? user_path_at_empty+0x5f/0xa0
 [<ffffffff810a0ec0>] ? call_rcu_sched+0x10/0x20
 [<ffffffff8106d25a>] ? __put_cred+0x3a/0x50
 [<ffffffff81115654>] sys_getxattr+0x54/0x80
 [<ffffffff8105db54>] ? sys_setresgid+0x84/0x120
 [<ffffffff8169ced2>] system_call_fastpath+0x16/0x1b

...and a bunch of memory stats.

Our analysis is that:

1. Samba's vfs_acl_xattr() algorithm for getting an extended attribute
is "try with a 1K buffer and if that doesn't work try with 64K". This is
presumably an optimistic strategy to save system calls.

2. In older Linux (prior to 3.4), getxattr() used to require a
physically-contiguous (kmalloc'd) buffer whose size equals the max size
requested by the user. So even though no Linux in-tree file system
supports EA's larger than 4K, the getxattr() system call would try to
allocate 64K if Samba passes this number.

3. On busy servers, memory gets fragmented over time, and that's the
failure we've been seeing (notice the "order:4" - that means it tried
2^4 pages or 64K). That causes the getxattr to fail with ENOMEM.

4. In newer kernels (since commit 44c82498), it still tries the kmalloc,
but falls back to vmalloc. I'm not sure why kmalloc first, maybe that's
also some optimistic stragegy...

5. So in newer kernels, the 64K still incurs some extra-allocation
overhead, but at least it doesn't fail.

6. Since the initial kmalloc is done with GFP_KERNEL, I think there's a
middle case where the kernel would try to evict pages, and that might
cause a performance hit, all for memory that's not really required by
any in-tree file system (only ZFS supports 64K on Linux AFAIK).

Bottom line - Samba works well with recent Linux kernels, but we might
be taking a performance hit by asking for memory we probably don't need.
We might want to do the initial getxattr with 4K, or have a 4K step
between the 1K and the 64K.
Comment 1 Uri Simchoni 2017-04-09 09:46:31 UTC
Some corrections:
1. Many file systems support EAs of more than 4K, even on Linux. XFS is one of them.

2. The Linux fallback to vmalloc for getxattr came around kernel 3.6 in commit 779302e6
Comment 2 Uri Simchoni 2017-04-18 07:32:33 UTC
Created attachment 13157 [details]
git-am fix for 4.6.next
Comment 3 Uri Simchoni 2017-04-18 07:34:06 UTC
Created attachment 13158 [details]
git-am fix for 4.5.next

The 4.5.next fix is identical to the master / 4.6 fix, except that the inclusion of the test in tests.py had to be tweaked.
Comment 4 Jeremy Allison 2017-04-18 17:38:45 UTC
Comment on attachment 13157 [details]
git-am fix for 4.6.next

LGTM.
Comment 5 Jeremy Allison 2017-04-18 17:49:13 UTC
Reassigning to Karolin for inclusion in 4.6.next, 4.5.next.
Comment 6 Karolin Seeger 2017-04-19 09:34:20 UTC
(In reply to Jeremy Allison from comment #5)
Pushed to autobuild-v4-{6,5}-test.
Comment 7 Karolin Seeger 2017-04-21 07:00:26 UTC
(In reply to Karolin Seeger from comment #6)
Pushed to both branches.
Closing out bug report.

Thanks!