Created attachment 17150 [details]
Bug 14952 fixed a segmentation fault in paged_results which was introduced by 4d99cab6172 while fixing CVE-2020-10730.
We found another issue in the paged_results module, which causes segmentation faults that are triggered in real life e.g. when an Administrator uses ADUC to perform wildcard searches for accounts. In that case we see LDAP search requests like this in the LDB traces:
ldbsearch -H "ldap://$(hostname -f)" -s sub '(&(|(&(objectCategory=person)
1.2.840.113518.104.22.1684:=14)))(anr=a*))' objectClass userAccountControl name
description -P --controls "paged_results:0:16" ## or similar
Metze suggested debugging this with a direct ldbsearch against the sam.ldb and we found that simple paged searches like
ldbsearch -H sam.ldb -s sub '(|(foo=bar)(anr=c*))' \
ldbsearch -H sam.ldb -s sub '(!(anr==Administrator))' foo \
triggers undefined behavior that frequently ends up in a segmentation fault. As usual with memory corruption, the exact behavior varies on details like the exact search filter and the attributes requested.
The common thing is, that the `anr` (Ambiguous Name Resolution) is used in the search filter and that paged_results are requested and return more than a single page. In that situation, the core dump is triggered in a subsequent call to resolve_oids, which attempts to use a nested ldb_parse_tree structure that was generated by anr.c and stored by paged_results in its private_data for later reference. Unfortunately the code of paged_results attaches the required talloc memory chunk to the `req`, which seems to get freed in between requested result pages.
The attached gdb output shows the backtrace (generated with Samba 4.13.13, but the issue persists in later versions too).
I'll attach a proposal for a patch.
Created attachment 17151 [details]
The attached patch fixed the problem in my tests.
Created attachment 17152 [details]
WIP patches for master
thanks for the information!
Can you try if this patch would also fix the problem for you?
Yes, works and avoids the memory leak that I proposed.
Created attachment 17153 [details]
The attached patch fixes one regression from your patch: In case I don't specify any particular attributes with ldbsearch, with the new patch the search aborts like this:
root@test:~/ ldbsearch -H sam.ldb '(|(foo=bar)(anr=a*))' \
search failed - ldb out of memory at ../../source4/dsdb/samdb/ldb_modules/paged_results.c:751
Using the --trace option I see:
control: 1.2.840.113522.214.171.1249 crit:1 data:yes
So, in this case req->op.search.attrs is NULL and that should not be mistaken for a talloc oom condition. The attached patch fixed this in my test.
Created attachment 17156 [details]
Unfortunately Metzes patch (Comment 2 plus mine from Comment 4) does not fix the issues completely. A colleague of mine chose a different page size of 2 and then the issue occurred again, e.g. with:
ldbsearch -H /var/lib/samba/private/sam.ldb \
"(|(objectclass=person)(anr=Administrator))" objectClass \
Attached you can find a gdb bt full plus some observations throughout some points in the call stack.
I checked again with my initial path proposal (which may be bad for other reasons) and it fixed the issue.
In my testing, this bug is fixed by commit 5f0590362c5c0c5ee20503a67467f9be2d50e73b:
CVE-2021-3670 dsdb/anr: Do a copy of the potentially anr query before starting to modify it
RN: Do not modify the caller-supplied memory in the anr=* handling to
allow clear logging of the actual caller request after it has been processed.
which appears in Samba 4.14 and 4.15, but is not in 4.13.
(In reply to Joseph Sutton from comment #6)
Arvid, can you please check it 5f0590362c5c0c5ee20503a67467f9be2d50e73b
works for 4.13 as well? So that we can close this?
Segfaults are still happening with ldbsearches like
-H /var/lib/samba/private/sam.ldb \
The only case this patch fixed is the ldbsearch with anr and page size 1.
With every other number of page sizes the samba process still crashes.
Since the default page size in ADUAC uses 64, this patch won't help IRL.
I wanted to add some test cases in source4/dsdb/tests/python/vlv.py to make it reproducible but was unable to figure out where/how/when these test cases get called. If you could give a hint, I could offer adding the tests as a patch. Btw, what do you prefer, attaching patches here or creating a pull request in Gitlab?
Patches for master should start in GitLab per https://wiki.samba.org/index.php/Contribute
(In reply to Andrew Bartlett from comment #9)
Is this a request to me? I don't get it. I only reported the issue and suggested an area in the source code that may be relevant to the issue. But the issue is still unsolved and of practical relevance, as a simple ADUC search triggers this, and as the service segfaults, there are no results returned.
(In reply to Arvid Requate from comment #8)
I just tried this on master and it doesn't segfault, even under valgrind.
Is this only an issue on older versions?
I did a quick check with Samba 4.18.4 and couldn't reproduce it either, neither with the synthetic ldbsearch examples given above nor with the ADUC user search (which uses anr). Strange but in a good way :-)
I've been looking over this carefully, and I will likely take the memory correctness patches anyway and add a shallow copy of the tree (because it is easily implemented).
I'm pretty sure the patch referenced in comment #6 is the correct one, as the way that the ldb_parse_tree would corrupt is by being modified without being copied, such that the new children in the tree are not part of the memory held long-term by the talloc_steal().
Arvid, Do you know what version you were testing with in comment #8?
With Samba 4.13.13 I can reproduce this, and it is fixed once 5f0590362c5c0c5ee20503a67467f9be2d50e73b is applied.
Given this, my analysis and comment #12 I'm going to close this, but open a related bug and MR for the memory handling improvements.
Yes, 4.13.13 it was.
This bug was referenced in samba master: