Bug 8910 - resolve_ads() code can return zero addresses and miss valid DC IP addresses.
Summary: resolve_ads() code can return zero addresses and miss valid DC IP addresses.
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Winbind (show other bugs)
Version: unspecified
Hardware: All All
: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
: 8893 (view as bug list)
Depends on:
Blocks: 8595
  Show dependency treegraph
Reported: 2012-04-30 20:13 UTC by Jeremy Allison
Modified: 2020-01-07 23:31 UTC (History)
4 users (show)

See Also:

git-am fix for 3.6.x. (12.20 KB, patch)
2012-04-30 22:10 UTC, Jeremy Allison
vl: review+
git-am fix for 3.5.x. (11.37 KB, patch)
2012-04-30 23:38 UTC, Jeremy Allison
vl: review+
Fix for 3.6. (704 bytes, patch)
2012-05-30 18:27 UTC, Ira Cooper
jra: review-
Modified fix. (506 bytes, patch)
2012-05-30 18:36 UTC, Jeremy Allison
no flags Details
Additional fix for 3.6.next. (1.15 KB, patch)
2012-05-30 21:34 UTC, Jeremy Allison
ira: review+

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Allison 2012-04-30 20:13:32 UTC
Found here @ Google. Bugs in the resolve_ads() code can cause this function to return null addresses, and also to return a limited list of valid IP addresses given a DC name.

Fixes for 3.5.x and 3.6.x to follow, once the patch has gone through autobuild into master.

Comment 1 Jeremy Allison 2012-04-30 22:10:56 UTC
Created attachment 7510 [details]
git-am fix for 3.6.x.

Michael, please review for 3.6.next. This is an important one as it can cause invalid IP addresses to be used when looking up DC names. Thanks,
Comment 2 Jeremy Allison 2012-04-30 23:38:44 UTC
Created attachment 7511 [details]
git-am fix for 3.5.x.
Comment 3 Jeremy Allison 2012-05-02 15:58:13 UTC
Reporter confirms this patch fixes the issue on site.

Comment 4 Jeremy Allison 2012-05-02 15:58:50 UTC
Comment on attachment 7510 [details]
git-am fix for 3.6.x.

Trying to get some attention for this fix :-).
Comment 5 Volker Lendecke 2012-05-09 14:06:56 UTC
Jeremy, I WILL ack it once you gave me the test vectors I asked for offline.
Comment 6 Jeremy Allison 2012-05-09 14:36:30 UTC
Understood - I just didn't want to forget this for release.
Comment 7 Jeremy Allison 2012-05-09 20:52:45 UTC
Info from original reporter (names and IP addresses changed to protect internal network):

  resolve_ads: Attempting to resolve DCs for XX.YY.FOO.BAR.COM using DNS
[2012/04/27 16:03:11.805623,  4] libads/dns.c:432(ads_dns_lookup_srv)
  ads_dns_lookup_srv: 14 records returned in the answer section.
[2012/04/27 16:03:16.367389,  5] libsmb/namecache.c:106(namecache_store)
  namecache_store: storing 17 addresses for XX.YY.FOO.BAR.COM#1c:,,,, 1001:1111:1111:1000:0:1111:1111:1111],,,,,,,,
[2012/04/27 16:03:16.368613,  4] libsmb/namequery.c:2054(get_dc_list)
  get_dc_list: returning 17 ip addresses in an ordered list
[2012/04/27 16:03:16.368652,  4] libsmb/namequery.c:2055(get_dc_list)
  get_dc_list: :0 :0 1001:1111:1111:1000:0:1111:1111:1111:389 :0 :0

This shows the position of the added zeros by the flawed code. I can construct a test vector from this.
Comment 8 Jeremy Allison 2012-05-17 15:49:45 UTC
*** Bug 8893 has been marked as a duplicate of this bug. ***
Comment 9 Jeremy Allison 2012-05-21 21:43:33 UTC
FYI. Patch to test remove_duplicate_addrs2() pushed to autobuild in master. It processes a set of test vectors interspaced with duplicates and zero addresses to a single list.

I'll post the git ref once it's passed autobuild.

Comment 10 Jeremy Allison 2012-05-21 23:47:08 UTC
Ok, git commit c531aac27c433e0eb068a8a4f0a6c90cad2e44fa in master adds the torture test LOCAL-remove_duplicate_addrs2.

Comment 11 Jeremy Allison 2012-05-21 23:47:54 UTC
Comment on attachment 7510 [details]
git-am fix for 3.6.x.

Ok, now the test is in master - can I get an ack for this fix for 3.6.next please ?

Comment 12 Volker Lendecke 2012-05-23 11:57:30 UTC
I fully believe you that the new code is right. Sorry to benit-picking, but for me the old code also survives that test. My wish would have been to get a test vector that fails against the old code and that succeeds against the new code. If you are telling me this is impossible to get, I will just ack this code,just to get this off my back.
Comment 13 Jeremy Allison 2012-05-23 19:24:37 UTC
Hmmm. I don't have a copy of the exact return values from the internal network that show the old code failing but new code succeeding. I'm not saying I can't get it, I'd have to get them to put the old version back with additional debugs and re-write the test case, but at this point (and as the new code is working and has been demonstrated as such on 2 different sites) I think it's good to go.

Just my 2 cents.

Comment 14 Jeremy Allison 2012-05-23 19:56:38 UTC
I could look at the logic of the old and new code and construct a test vector that explicitly will fail against the old code based on the understanding I gained in writing the new. Would that do ?

Comment 15 Jeremy Allison 2012-05-25 16:14:44 UTC
Karolin please push for 3.5.next and 3.6.next - thanks !
Comment 16 Karolin Seeger 2012-05-26 20:17:55 UTC
Pushed to v3-5-test and v3-6-test.
Closing out bug report.

Comment 17 Ira Cooper 2012-05-30 14:37:30 UTC
We've got a problem with this "fix" locally, net join can't find domain controllers.  :(

79658018ec73755bbd495963f977af61b4497bfb is the bad commit, for us.  I'm looking at why.  A reproduction of the other case would help me make sure I don't "break it the other way."

At the least, count me as - on the review ;).

Comment 18 Volker Lendecke 2012-05-30 14:44:37 UTC
Is it possible that you add DEBUGs to the code to extract the before/after arrays of IPs, just to have another test vector?
Comment 19 Jeremy Allison 2012-05-30 16:00:04 UTC
Thanks for testing Ira. All other people reporting the fix worked were on Linux. I'll examine as a priority.
Comment 20 Jeremy Allison 2012-05-30 16:08:04 UTC
The loop unrolling fix in 79658018ec73755bbd495963f977af61b4497bfb is the one most likely to have differences between Linux and Solaris. This isn't the code that modifies the removal of extra zero addresses (thank goodness) so it isn't a problem in the memcpy/array manipulation code - this is new functionality when we only have a name but no IP address.

Ira and I are working on this issue on IRC. More details when we have them.

Comment 21 Ira Cooper 2012-05-30 18:27:47 UTC
Created attachment 7609 [details]
Fix for 3.6.

This fixes the loop initialization for 3.6.
Comment 22 Jeremy Allison 2012-05-30 18:36:59 UTC
Created attachment 7610 [details]
Modified fix.

I think this is what it should be...

Comment 23 Jeremy Allison 2012-05-30 18:37:51 UTC
Comment on attachment 7609 [details]
Fix for 3.6.

Not quite enough. Modified fix below.
Comment 24 Jeremy Allison 2012-05-30 18:43:34 UTC
Ah - now I know why this passed tests @ Google and @ Dell. They were using the git-am fix for 3.5.x, which correctly uses a for() loop construct here. I screwed this up when forward porting to 3.6.x (and master).

Comment 25 Jeremy Allison 2012-05-30 18:45:41 UTC
Just goes to show there's no alternative to good old fashioned testing :-).

The torture test couldn't have caught that as this problem depends on specific returns from a DNS server, which we don't do in make test. Plus the core array modification code was correct, it was iterating over the array that was in error.

Comment 26 Jeremy Allison 2012-05-30 21:34:36 UTC
Created attachment 7611 [details]
Additional fix for 3.6.next.

Additional fix that went into master because I screwed up the forward port of the 3.5.x fix (which was tested on multiple sites) to the v3.6.x fix.

Ira has now tested this. Ira please review. This additional fix is needed for v3-6-test but *NOT* for v3-5-test.


Comment 27 Jeremy Allison 2012-05-30 21:37:14 UTC
Re-assigning to Karolin for inclusion in 3.6.next.

Karolin, please apply attachment 7611 [details] to v3-6-test, but not to v3-5-test.

Thanks, and sorry for the mistake !

Comment 28 Karolin Seeger 2012-05-31 18:40:32 UTC
(In reply to comment #27)
> Re-assigning to Karolin for inclusion in 3.6.next.
> Karolin, please apply attachment 7611 [details] to v3-6-test, but not to v3-5-test.
> Thanks, and sorry for the mistake !
> Jeremy.

Pushed to v3-6-test.
Closing out bug report.