We've come across incidents where we see something like this on the console of our NAS devices:

smbd: page allocation failure: order:4, mode:0xc0d0
Pid: 6443, comm: smbd Tainted: P C O 3.2.54 #1
Call Trace:
 [<ffffffff810b6345>] warn_alloc_failed+0xf5/0x140
 [<ffffffff810b7822>] ? drain_pages+0x32/0x90
 [<ffffffff810b78d0>] ? page_alloc_cpu_notify+0x50/0x50
 [<ffffffff810b78e1>] ? drain_local_pages+0x11/0x20
 [<ffffffff810b70c1>] __alloc_pages_nodemask+0x481/0x6d0
 [<ffffffff810e707f>] alloc_pages_current+0x7f/0xe0
 [<ffffffff810b5a99>] __get_free_pages+0x9/0x40
 [<ffffffff810ee3a8>] __kmalloc+0xe8/0xf0
 [<ffffffff811154e8>] getxattr+0x98/0x130
 [<ffffffff8110371d>] ? do_path_lookup+0x2d/0xc0
 [<ffffffff810eeb15>] ? kmem_cache_free+0x15/0x90
 [<ffffffff810ff92e>] ? putname+0x2e/0x40
 [<ffffffff8110453f>] ? user_path_at_empty+0x5f/0xa0
 [<ffffffff810a0ec0>] ? call_rcu_sched+0x10/0x20
 [<ffffffff8106d25a>] ? __put_cred+0x3a/0x50
 [<ffffffff81115654>] sys_getxattr+0x54/0x80
 [<ffffffff8105db54>] ? sys_setresgid+0x84/0x120
 [<ffffffff8169ced2>] system_call_fastpath+0x16/0x1b

...and a bunch of memory stats.

Our analysis:

1. Samba's vfs_acl_xattr algorithm for fetching an extended attribute is "try with a 1K buffer, and if that doesn't work, try again with 64K". This is presumably an optimistic strategy to save system calls.

2. In older Linux kernels (prior to 3.4), getxattr() required a physically contiguous (kmalloc'd) buffer whose size equals the maximum size requested by the user. So even though no Linux in-tree file system supports EAs larger than 4K, the getxattr() system call would try to allocate 64K if Samba passed that number.

3. On busy servers, memory gets fragmented over time, and that's the failure we've been seeing (note the "order:4" - it tried to allocate 2^4 pages, i.e. 64K, of physically contiguous memory). That causes the getxattr() to fail with ENOMEM.

4. In newer kernels (since commit 44c82498), the kmalloc is still tried first, but there is a fallback to vmalloc. I'm not sure why kmalloc first; maybe that's also an optimistic strategy.

5. So in newer kernels, the 64K request still incurs some extra allocation overhead, but at least it doesn't fail.

6. Since the initial kmalloc is done with GFP_KERNEL, I think there's a middle case where the kernel would try to evict pages, and that might cause a performance hit - all for memory that's not really required by any in-tree file system (only ZFS supports 64K EAs on Linux, AFAIK).

Bottom line: Samba works well with recent Linux kernels, but we might be taking a performance hit by asking for memory we probably don't need. We might want to do the initial getxattr() with 4K, or add a 4K step between the 1K and the 64K.
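The two-step fetch described in point 1 can be sketched as follows. This is illustrative Python, not Samba's actual C code; fake_getxattr is a made-up stand-in for the getxattr(2) system call, which fails with ERANGE when the caller's buffer is too small:

```python
import errno

def fake_getxattr(value, buf_size):
    """Stand-in for getxattr(2): return the attribute value if the
    caller's buffer is large enough, else fail with ERANGE (the real
    syscall's behavior for an undersized buffer)."""
    if buf_size < len(value):
        raise OSError(errno.ERANGE, "buffer too small")
    return value

def get_ea_two_step(value):
    """Samba-style optimistic fetch: try a 1K buffer first, and retry
    with 64K only if the attribute turns out to be larger.  The 64K
    retry is what pushed pre-3.4 kernels into an order-4 kmalloc."""
    for buf_size in (1024, 65536):
        try:
            return fake_getxattr(value, buf_size)
        except OSError as e:
            if e.errno != errno.ERANGE:
                raise
    raise OSError(errno.ERANGE, "attribute larger than 64K")
```

A small ACL blob is satisfied by the 1K attempt; anything between 1K and 64K costs exactly one extra call with the large buffer - and it is that large buffer, not the attribute's actual size, that old kernels tried to kmalloc.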
Some corrections:

1. Many file systems support EAs larger than 4K, even on Linux. XFS is one of them.

2. The Linux fallback to vmalloc for getxattr() came around kernel 3.6, in commit 779302e6.
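The proposed mitigation (an intermediate 4K attempt before jumping to 64K) only changes the list of buffer sizes tried. A minimal sketch, again using a hypothetical fake_getxattr stand-in rather than the real syscall:

```python
import errno

def fake_getxattr(value, buf_size):
    """Stand-in for getxattr(2): succeed only if the buffer fits,
    otherwise fail with ERANGE."""
    if buf_size < len(value):
        raise OSError(errno.ERANGE, "buffer too small")
    return value

def get_ea_stepped(value):
    """Proposed escalation: 1K -> 4K -> 64K.  Attributes up to 4K
    (the common case for NT ACL blobs on most file systems) are
    satisfied without ever asking the kernel for a 64K buffer."""
    for buf_size in (1024, 4096, 65536):
        try:
            return fake_getxattr(value, buf_size)
        except OSError as e:
            if e.errno != errno.ERANGE:
                raise
    raise OSError(errno.ERANGE, "attribute larger than 64K")
```

The trade-off is one extra ERANGE round trip for attributes between 1K and 4K, in exchange for never requesting the 64K buffer unless the attribute genuinely needs it.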
Created attachment 13157 [details] git-am fix for 4.6.next
Created attachment 13158 [details] git-am fix for 4.5.next The 4.5.next fix is identical to the master / 4.6 fix, except that the inclusion of the test in tests.py had to be tweaked.
Comment on attachment 13157 [details] git-am fix for 4.6.next LGTM.
Reassigning to Karolin for inclusion in 4.6.next, 4.5.next.
(In reply to Jeremy Allison from comment #5) Pushed to autobuild-v4-{6,5}-test.
(In reply to Karolin Seeger from comment #6) Pushed to both branches. Closing out bug report. Thanks!