Bug 10387 - net ads search on high latency networks can return a partial list with no error indication.
Summary: net ads search on high latency networks can return a partial list with no err...
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.0
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-18 00:31 UTC by Jeremy Allison
Modified: 2014-03-25 09:30 UTC (History)
1 user (show)

See Also:


Attachments
git-am fix that went into master. (2.23 KB, patch)
2014-01-24 19:06 UTC, Jeremy Allison
ddiss: review+
Details
Updated with Daniel's signed-off-by. (2.28 KB, patch)
2014-02-25 20:24 UTC, Jeremy Allison
ddiss: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2014-01-18 00:31:50 UTC
From Daniel Liberman:

                I have a small patch to propose. We had a small problem with
‘net ads search’es on some customer networks (definitely WANs). Apparently when
the latency is too high, if we query for all the accounts in an Active
Directory domain with a large number of users/groups, it will return just part
of the accounts (instead of the whole set), but not all of them. Running with
detailed debugs, it gives us a clue why:
 
ads_do_paged_search_args: ldap_search_with_timeout((sAMAccountType=805306368))
-> Time limit exceeded
 
                After that I started to try and reproduce the problem here. So
reduced ldap timeout to 1 second on smb.conf, and simulated a high latency
(~200ms) in our linux router between the client and the domain controller. I
can reproduce the scenario consistently. The real problem is: when just a
subset of the accounts is returned, the error code from the net ads search
command is 0, and no message is printed to stderr. Some other times no account
is returned, and we have the error message, and it returns -1. So, basically,
we were mislead to believe the command worked, when it didn’t.
 
                After debugging a lot I realized the problem: inside
ads_do_search_all_args(), if the first call to ads_do_paged_search_args(), the
proper error status is returned. And in this case no account is returned, so
this scenario is OK.

                But, if the execution is already inside the loop to get all the
accounts doing several calls to ads_do_paged_search_args(), and one of these
calls times out, the status returned is from the *first* call, so success. This
causes net_ads_search() to interpret the return from ads_do_search_retry() as
success and print all the accounts returned, but it’s only a subset.
 
                This patch attached fixes this behavior: it returns the proper
error code on ads_do_search_all_args(). I also modified net_ads_search() to
dump all the replies found so far anyway, and then return -1 in case of an
error.

                Just another piece of information: something very similar must
happen on “net ads user” calls (and others). We have had this problem in the
past, several times, and never got to the bottom of the issue: it returned only
a subset of the accounts, but the result from the command was not an error. I
always thought it returned cached information, or something. Since (now I know
that) net ads user also uses the LDAP search mechanism, we could even apply the
same “fix” to other commands.
Comment 1 Jeremy Allison 2014-01-18 00:32:14 UTC
I'll upload the patch shortly once I've got it into master.

Jeremy.
Comment 2 Jeremy Allison 2014-01-24 19:06:33 UTC
Created attachment 9616 [details]
git-am fix that went into master.

Applies cleanly to 4.0.next, 4.1.next.

Jeremy.
Comment 3 Jeremy Allison 2014-02-24 21:40:31 UTC
Comment on attachment 9616 [details]
git-am fix that went into master.

Sigh. This got dropped :-(.
Adding more reviewers - already gone into master.

Jeremy.
Comment 4 David Disseldorp 2014-02-25 10:04:33 UTC
Comment on attachment 9616 [details]
git-am fix that went into master.

The code change looks good, but the commit is missing the authors signed-off-by tag.
Comment 5 Jeremy Allison 2014-02-25 19:39:06 UTC
Pinged Daniel to get his signed-off-by.

Jeremy.
Comment 6 Jeremy Allison 2014-02-25 20:24:19 UTC
Created attachment 9723 [details]
Updated with Daniel's signed-off-by.
Comment 7 David Disseldorp 2014-02-26 09:53:06 UTC
Comment on attachment 9723 [details]
Updated with Daniel's signed-off-by.

LGTM.
Comment 8 Jeremy Allison 2014-02-26 17:09:20 UTC
Re-assigning to Karolin for inclusion in 4.0.next, 4.1.next.

Jeremy.
Comment 9 Karolin Seeger 2014-03-10 15:14:02 UTC
Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Comment 10 Karolin Seeger 2014-03-25 09:30:05 UTC
(In reply to comment #9)
> Pushed to autobuild-v4-1-test and autobuild-v4-0-test.

Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!