Jerry, I have recompiled 3.0.23b, with full debugging info, on SuSE Linux Enterprise Server 8. I have stepped through a 'net ads info' command within GDB, and now know where the code is bombing-out. Well, we knew that before as the logfile told us; anyway... [2006/08/09 15:34:45, 5] libsmb/namequery.c:resolve_ads(1037) resolve_hosts: Attempting to resolve DC's for xxx.yyy.zzz.com using DNS [2006/08/09 15:34:45, 4] libads/dns.c:ads_dns_lookup_srv(286) ads_dns_lookup_srv: 21 records returned in the answer section. [2006/08/09 15:34:45, 1] libads/dns.c:ads_dns_parse_rr_srv(173) ads_dns_parse_rr_srv: Failed to parse RR record [2006/08/09 15:34:45, 1] libads/dns.c:ads_dns_lookup_srv(313) ads_dns_lookup_srv: Failed to parse answer record! It seems that we correctly get 21 domain controller records returned for my global AD domain. When trawling through the returned answers in ads_dns_lookup_srv() we eventually seem to overflow some memory bounds? Anyway, we start here in dns.c:ads_dns_lookup_srv(): 303 /* now we are at the answer section */ 304 305 for ( rrnum=0; rrnum<answer_count; rrnum++ ) { 306 if ( !ads_dns_parse_rr_srv( ctx, buffer, buffer+resp_len, &p, &dcs[rrnum] ) ) { 307 DEBUG(1,("ads_dns_lookup_srv: Failed to parse answer record!\n")); 308 return NT_STATUS_UNSUCCESSFUL; 309 } 310 } We manage to loop over ten or so DNS answers, before function ads_dns_parse_rr_srv() calls ads_dns_parse_rr() and fails. The latter function is returning 'False' here: 115 /* pull the name from the answer */ 116 117 namelen = dn_expand( start, end, p, hostname, sizeof(hostname) ); 118 if ( namelen < 0 ) { 119 return -1; 120 } 121 p += namelen; 122 rr->hostname = talloc_strdup( ctx, hostname ); 123 124 /* check that we have space remaining */ 125 126 if ( PTR_DIFF(p+10, end) > 0 ) 127 return False; I will dig deeper to try find the root cause. This function does seem to operate correctly on Solaris 8 with all 21 DC parsed correctly...
I think that I may have found the problem. On Solaris, the res_query() function can (and does) return an answer length bigger than the buffer supplied; and you're supposed to retry with a bigger buffer. Your code does just that. But, on my SLES server the res_query() is returning 512 -- the exact same size as the buffer supplied; not bigger as per Solaris. I wonder if, in that case, one should retry with a larger buffer? Trouble is, your code bases the new buffer size on the value previously returned by res_query() and in this Linux case we have no hint as to how much we've potentially overflowed by...
This fixes the issue for me! # diff -urN libads/dns.c libads/dns.c.orig --- libads/dns.c 2006-08-11 14:50:14.000000000 +1000 +++ libads/dns.c.orig 2006-08-11 14:45:53.000000000 +1000 @@ -230,7 +230,7 @@ NTSTATUS ads_dns_lookup_srv( TALLOC_CTX *ctx, const char *name, struct dns_rr_srv **dclist, int *numdcs ) { uint8 *buffer = NULL; - size_t buf_len = 0; + size_t buf_len; int resp_len = NS_PACKETSZ; struct dns_rr_srv *dcs = NULL; int query_count, answer_count, auth_count, additional_count; @@ -249,7 +249,7 @@ if ( buffer ) TALLOC_FREE( buffer ); - buf_len += ( NS_PACKETSZ * sizeof(uint8) ); + buf_len = resp_len * sizeof(uint8); if ( (buffer = TALLOC_ARRAY(ctx, uint8, buf_len)) == NULL ) { DEBUG(0,("ads_dns_lookup_srv: talloc() failed!\n")); @@ -261,7 +261,7 @@ TALLOC_FREE( buffer ); return NT_STATUS_UNSUCCESSFUL; } - } while ( buf_len <= resp_len && resp_len < MAX_DNS_PACKET_SIZE ); + } while ( buf_len < resp_len && resp_len < MAX_DNS_PACKET_SIZE ); p = buffer;
This is tad cleaner? And I have the files in the correct order this time!... # diff -urN libads/dns.c.orig libads/dns.c --- libads/dns.c.orig 2006-08-11 14:45:53.000000000 +1000 +++ libads/dns.c 2006-08-11 15:13:28.000000000 +1000 @@ -230,8 +230,8 @@ NTSTATUS ads_dns_lookup_srv( TALLOC_CTX *ctx, const char *name, struct dns_rr_srv **dclist, int *numdcs ) { uint8 *buffer = NULL; - size_t buf_len; - int resp_len = NS_PACKETSZ; + size_t buf_len = NS_PACKETSZ; + int resp_len; struct dns_rr_srv *dcs = NULL; int query_count, answer_count, auth_count, additional_count; uint8 *p = buffer; @@ -246,10 +246,10 @@ of large replies */ do { - if ( buffer ) + if ( buffer ) { TALLOC_FREE( buffer ); - - buf_len = resp_len * sizeof(uint8); + buf_len += ( NS_PACKETSZ * sizeof(uint8) ); + } if ( (buffer = TALLOC_ARRAY(ctx, uint8, buf_len)) == NULL ) { DEBUG(0,("ads_dns_lookup_srv: talloc() failed!\n")); @@ -261,7 +261,7 @@ TALLOC_FREE( buffer ); return NT_STATUS_UNSUCCESSFUL; } - } while ( buf_len < resp_len && resp_len < MAX_DNS_PACKET_SIZE ); + } while ( buf_len <= resp_len && resp_len < MAX_DNS_PACKET_SIZE ); p = buffer;
I just looked at the source code RPM for SLES8's GLIBC... Interesting comment in the 'resolv/res_send.c' file: if (truncating) { /* * Flush rest of answer so connection stays in synch. */ anhp->tc = 1; len = resplen - anssiz; while (len != 0) { char junk[PACKETSZ]; n = read(statp->_vcsock, junk, (len > sizeof junk) ? sizeof junk : len); if (n > 0) len -= n; else break; } /* Dirty fix - avoid read buffer overruns in applications * that naively assume the length returned by res_* * is always less than or equal the answer buffer size. * Simply truncating the answer here surely beats fixing * all calls of res_* in all applications */ resplen = anssiz; } This is 'glibc-2.2.5', and I'd suggest that the 'Dirty Fix' above is to blame for the DNS oddities described previously. I guess that someone at SuSE (of old) may have had some part to play?!... Indeed, there's a file called 'glibc-2.2-resolver2.patch' included in the source RPM. Guess the contents?!... --- resolv/res_send.c Fri Jan 5 03:35:42 2001 +++ resolv/res_send.c Thu Sep 26 09:59:59 2002 @@ -805,6 +805,13 @@ else break; } + /* Dirty fix - avoid read buffer overruns in applications + * that naively assume the length returned by res_* + * is always less than or equal the answer buffer size. + * Simply truncating the answer here surely beats fixing + * all calls of res_* in all applications + */ + resplen = anssiz; } Anyway, I'll leave it your good self to decide on the proper course of action. I'm loath to suggest a patch to Samba for what is essentially a SuSE specific oddity? Or maybe other's do the same thing? I doubt it? Anyway, it's a fairly ancient very of Linux too...
Created attachment 2503
just an "ancient suse specific bug" :-) ... closing out for us. Thanks for your research anyhow,