Hello,

Recently, ethereal started crashing on startup. I traced it down to a memory dereference error in the Samba WINS NSS library: one argument of strcmp is bogus.

Program received signal SIGSEGV, Segmentation fault.
0x4207fa78 in strcmp () from /lib/i686/libc.so.6
(gdb) bt
#0  0x4207fa78 in strcmp () from /lib/i686/libc.so.6
#1  0x40466a8e in ms_fnmatch_lanman1 (pattern=0xbfffeff0 "eth0", string=0xffffffff <Address 0xffffffff out of bounds>) at lib/ms_fnmatch.c:117
#2  0x40466b98 in ms_fnmatch (pattern=0xbfffeff0 "eth0", string=0xffffffff <Address 0xffffffff out of bounds>) at lib/ms_fnmatch.c:145
#3  0x404431d2 in interpret_interface (token=0xbfffeff0 "eth0") at lib/interface.c:110
#4  0x40443853 in load_interfaces () at lib/interface.c:212
#5  0x40412b2f in nss_wins_init () at nsswitch/wins.c:89
#6  0x40412c14 in lookup_byname_backend (name=0x882ff78 "eng02", count=0xbffff164) at nsswitch/wins.c:124
#7  0x40412d8a in _nss_wins_gethostbyname_r (name=0x882ff78 "eng02", he=0x42132dc0, buffer=0x8830ee8 "\n", buflen=1024, errnop=0x42130b60, h_errnop=0xbffff1e8) at nsswitch/wins.c:287
#8  0x420f83f1 in gethostbyname_r@@GLIBC_2.1.2 () from /lib/i686/libc.so.6
#9  0x420f7dbd in gethostbyname () from /lib/i686/libc.so.6
#10 0x402f82b4 in XUnlockDisplay () from /usr/X11R6/lib/libX11.so.6
#11 0x402f956e in _X11TransConnect () from /usr/X11R6/lib/libX11.so.6
#12 0x402c06d6 in _X11TransConnectDisplay () from /usr/X11R6/lib/libX11.so.6
#13 0x402cf43b in XOpenDisplay () from /usr/X11R6/lib/libX11.so.6
#14 0x40241b13 in gdk_init_check () from /usr/lib/libgdk-1.2.so.0
#15 0x4019cca2 in gtk_init_check () from /usr/lib/libgtk-1.2.so.0
---Type <return> to continue, or q <return> to quit---q

I traced this further up the stack, and the bogus value comes from a failed call to memdup in load_interfaces(). (This is from the 3.0 version of Samba.)

http://samba.org/doxygen/samba/lib_2interface_8c-source.html

00184         /* probe the kernel for interfaces */
00185         total_probed = get_interfaces(ifaces, MAX_INTERFACES);
00186
00187         if (total_probed > 0) {
>>>>>>>>>>> 00188                 probed_ifaces = memdup(ifaces, sizeof(ifaces[0])*total_probed);
00189         }
00190
00191         /* if we don't have a interfaces line then use all broadcast capable
00192            interfaces except loopback */
00193         if (!ptr || !*ptr || !**ptr) {
00194                 if (total_probed <= 0) {
00195                         DEBUG(0,("ERROR: Could not determine network interfaces, you must use a interfaces config line\n"));
00196                         exit(1);
00197                 }
00198                 for (i=0;i<total_probed;i++) {
00199                         if (probed_ifaces[i].netmask.s_addr != allones_ip.s_addr &&
00200                             probed_ifaces[i].ip.s_addr != loopback_ip.s_addr) {

The first problem is that the result of this allocation is not checked. (Someone else found this bug in 2001 but it was never fixed:)

http://lists.samba.org/archive/samba-technical/2001-August/000485.html

The second problem is that on my machine, ethereal is linked against the SNMP libraries, which also have a memdup() function. You can see from below that something is causing Samba to call the SNMP memdup instead of the lib/util.c memdup. Since the arguments are different, the call fails.

Here are the startup libraries for ethereal. Note that libnss_wins.so does not appear here: the NSS routines bring it in later. I think this might be why memdup is coming from the SNMP libs in preference to libnss_wins.so.

[root@p26 samba]# ldd /usr/sbin/ethereal
        libsnmp.so.0 => /usr/lib/libsnmp.so.0 (0x4002a000)
        libcrypto.so.2 => /lib/libcrypto.so.2 (0x40085000)
        libpcap.so.0.6.2 => /usr/lib/libpcap.so.0.6.2 (0x40148000)
        libgtk-1.2.so.0 => /usr/lib/libgtk-1.2.so.0 (0x40163000)
        libgdk-1.2.so.0 => /usr/lib/libgdk-1.2.so.0 (0x40291000)
        libgmodule-1.2.so.0 => /usr/lib/libgmodule-1.2.so.0 (0x402c6000)
        libglib-1.2.so.0 => /usr/lib/libglib-1.2.so.0 (0x402c9000)
        libdl.so.2 => /lib/libdl.so.2 (0x402ee000)
        libXi.so.6 => /usr/X11R6/lib/libXi.so.6 (0x402f1000)
        libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x402f9000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x40306000)
        libm.so.6 => /lib/i686/libm.so.6 (0x403db000)
        libz.so.1 => /usr/lib/libz.so.1 (0x403fd000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[root@p26 samba]#

p26 [~/rpms] nm /usr/lib/libsnmp.so.0 | grep memdup
0003b240 T memdup
p26 [~/rpms] nm /lib/libnss_wins.so | grep memdup
00063254 T memdup
00063168 T smb_xmemdup
0006772c T talloc_memdup

This computer is running Red Hat 7.2, and the version of Samba is 2.2.7 from Red Hat ("samba-2.2.7-3.7.3.src.rpm"), but I believe the problem would exist on a more modern version of Linux, and with Samba v3. We use winbind etc. to allow users to log in with their Windows usernames.

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/EXTEL.COM.AU/mitch_davis/rpms/BUILD/ethereal-0.9.16/ethereal

Breakpoint 9, load_interfaces () at lib/interface.c:191
191             probed_ifaces = memdup(ifaces, sizeof(ifaces[0])*total_probed);
(gdb) s
memdup (to=0xbfffe3e0, from=0x48 <Address 0x48 out of bounds>, size=3221221640) at /home/users/r/rs/rstory/src/net-snmp-5.0.7/snmplib/tools.c:210
210     /home/users/r/rs/rstory/src/net-snmp-5.0.7/snmplib/tools.c: No such file or directory.
        in /home/users/r/rs/rstory/src/net-snmp-5.0.7/snmplib/tools.c
(gdb)

Although I don't think it's the correct fix, I can make the problem go away with this patch:

--- samba-2.2.7/source/include/proto.h.memdup   Mon Jul 19 14:32:29 2004
+++ samba-2.2.7/source/include/proto.h  Mon Jul 19 14:33:20 2004
@@ -552,7 +552,8 @@
 void *smb_xmemdup(const void *p, size_t size);
 char *smb_xstrdup(const char *s);
 int smb_xvasprintf(char **ptr, const char *format, va_list ap);
-void *memdup(void *p, size_t size);
+void *samba_memdup(void *p, size_t size);
+#define memdup(p, size) samba_memdup(p, size)
 char *myhostname(void);
 char *lock_path(char *name);
 char *pid_path(char *name);

This disambiguates the memdup reference without altering the calling code.

Hope this helps,

Mitch.
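
To illustrate the first problem (the unchecked allocation), here is a minimal sketch of what a checked copy could look like. It is not the Samba source: struct iface_sketch and copy_probed_ifaces() are hypothetical stand-ins for the real interface structure and for the relevant part of load_interfaces(), and samba_memdup only borrows its name from the patch above (the const-qualified argument is my own assumption).

```c
#include <stdlib.h>
#include <string.h>

/* Prefixed duplicate helper, so the name cannot collide with a memdup()
 * exported by another library. Returns NULL on bad input or allocation
 * failure. Sketch only; not the implementation in lib/util.c. */
void *samba_memdup(const void *p, size_t size)
{
    void *copy;

    if (p == NULL || size == 0)
        return NULL;

    copy = malloc(size);
    if (copy == NULL)
        return NULL;

    memcpy(copy, p, size);
    return copy;
}

/* Hypothetical stand-in for the interface structure. */
struct iface_sketch {
    unsigned long ip;
    unsigned long netmask;
};

/* Hypothetical stand-in for the copy step in load_interfaces().
 * Returns the number of entries copied, 0 if nothing was probed,
 * or -1 if the copy failed. *out is NULL unless entries were copied. */
int copy_probed_ifaces(const struct iface_sketch *ifaces, int total_probed,
                       struct iface_sketch **out)
{
    *out = NULL;

    if (total_probed <= 0)
        return 0;

    *out = samba_memdup(ifaces, sizeof(ifaces[0]) * (size_t)total_probed);
    if (*out == NULL)
        return -1;  /* caller must not touch the array in this case */

    return total_probed;
}
```

With a check like this, a failed or misdirected copy would surface as an error return in load_interfaces() rather than as a bogus pointer passed on to strcmp later.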
*** Bug 1540 has been marked as a duplicate of this bug. ***
I discussed this with a friend of mine who's in the OS division of a major vendor. He says that libnss_wins.so should not export so many symbols; it should only export the ones the NSS layer in libc expects. He gives this example:

xxx@chook 1032> nm -D /usr/lib/libnss_dns.so | awk '$2=="T"'
0000000000002350 T _nss_dns_gethostbyaddr_r
0000000000001f60 T _nss_dns_gethostbyname2_r
0000000000003fa0 T _nss_dns_gethostbyname_r
0000000000004310 T _nss_dns_getnetbyaddr_r
0000000000004080 T _nss_dns_getnetbyname_r

I had a look at the other NSS libs, and the pattern is repeated, so there is certainly no chance of a namespace conflict there. libnss_wins.so and libnss_winbind.so are the only libraries in the NSS set which export more than the necessary _nss_*_* symbols.

I suggest that some jiggery-pokery is needed so that libnss_wins.so is generated with far fewer visible dynamic symbols, and that the names internal to libnss_wins.so need to be munged or scoped somehow so there can be no collision with some other library. (Run nm on one of the other libraries and see the "t" symbols, as compared to "T".)

Mitch.
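
As a rough sketch of the scoping idea (not a patch against Samba), the fragment below marks an internal helper with hidden ELF visibility so that only the entry point stays exported. All names here (EXAMPLE_INTERNAL, example_memdup, nss_example_dup_name) are hypothetical, and the visibility attribute assumes GCC 4.0 or later or a compatible compiler; on older toolchains a linker version script would be the closer equivalent.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: GCC >= 4 (or a compatible compiler) is available.
 * Elsewhere the macro expands to nothing and the helper stays exported. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define EXAMPLE_INTERNAL __attribute__((visibility("hidden")))
#else
#define EXAMPLE_INTERNAL
#endif

/* Internal helper: usable from every source file of the library,
 * but intended to be left out of its dynamic symbol table. */
EXAMPLE_INTERNAL void *example_memdup(const void *p, size_t size);

EXAMPLE_INTERNAL void *example_memdup(const void *p, size_t size)
{
    void *copy;

    if (p == NULL || size == 0)
        return NULL;
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, p, size);
    return copy;
}

/* Hypothetical exported entry point; a real NSS module would export
 * only its _nss_<service>_<function> symbols.
 * Returns 0 on success, -1 on failure (with *out set to NULL). */
int nss_example_dup_name(const char *name, char **out)
{
    if (out == NULL)
        return -1;
    *out = NULL;
    if (name == NULL)
        return -1;

    *out = example_memdup(name, strlen(name) + 1);
    return (*out != NULL) ? 0 : -1;
}
```

Built into a shared object this way, example_memdup should show up as a local "t" symbol in plain nm output and should not be listed by nm -D, so a same-named function in another library could not be bound in its place.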
The latest version of Samba uses the -Bsymbolic option when linking libnss_winbind. I think this mitigates the problem somewhat.
Hmm, if I remember correctly, the .spec file from the Red Hat RPM *removes* the -Bsymbolic. Not sure what the reasoning is, but that may be the cause. :-(
Please retest against 3.0.20a (the current SAMBA_3_0_RELEASE branch), which will be publicly available next week.