Bug 5697 - nmbd spins in reload_interfaces when only loopback has an IPv4 address
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: Nmbd
Version: 3.2.1
Hardware: PPC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
Reported: 2008-08-18 07:10 UTC by Ted Percival
Modified: 2008-08-20 08:28 UTC

Attachments
Patch (2.71 KB, patch)
2008-08-19 19:01 UTC, Jeremy Allison

Description Ted Percival 2008-08-18 07:10:14 UTC
Using Debian samba package version 2:3.2.1-1.

When my machine is somewhere with no network connection available, nmbd starts spinning the CPU in reload_interfaces() within about a minute.

#0  0x10092f90 in load_interfaces ()
#1  0x100269ac in reload_interfaces ()
#2  0x1002802c in main ()

I think this is because I have several interfaces that are "UP" (according to ifconfig), but none have an IPv4 address.

To reproduce:
eth0 is unplugged (UP BROADCAST MULTICAST) but without an IPv4 or IPv6 address.
eth1 is associated to an open wireless network, with an IPv4 address and a Link-scope IPv6 address.

There are two other interfaces, lo and wmaster0. Their details are below in case they are involved; I'm not sure that they are.

After the system has been in this state for a while (maybe a few minutes), run
`ip addr del dev eth1 10.0.0.15`. (This seems to be what NetworkManager does when I'm not in range of any network that it chooses to associate with.)

Then within about 60 seconds, nmbd will begin to spin. When run with debug level 3, it continually logs this block of messages:

[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(206)
  reload_interfaces: ignoring non IPv4 interface.
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(222)
  reload_interfaces: Ignoring loopback interface 127.0.0.1
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(206)
  reload_interfaces: ignoring non IPv4 interface.
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface lo ip=::1 bcast=::1 netmask=ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface lo ip=127.0.0.1 bcast=127.255.255.255 netmask=255.0.0.0
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface eth1 ip=fe80::211:24ff:fec7:eadd%eth1 bcast=fe80::ffff:ffff:ffff:ffff%eth1 netmask=ffff:ffff:ffff:ffff::
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(206)
  reload_interfaces: ignoring non IPv4 interface.
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(222)
  reload_interfaces: Ignoring loopback interface 127.0.0.1
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(206)
  reload_interfaces: ignoring non IPv4 interface.
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface lo ip=::1 bcast=::1 netmask=ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface lo ip=127.0.0.1 bcast=127.255.255.255 netmask=255.0.0.0
[2008/08/18 21:59:16,  2] lib/interface.c:add_interface(334)
  added interface eth1 ip=fe80::211:24ff:fec7:eadd%eth1 bcast=fe80::ffff:ffff:ffff:ffff%eth1 netmask=ffff:ffff:ffff:ffff::
[2008/08/18 21:59:16,  2] nmbd/nmbd.c:reload_interfaces(206)

The other two interfaces reported by `/sbin/ifconfig` (mentioned earlier) are:
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

wmaster0  Link encap:UNSPEC  HWaddr 00-11-22-33-44-55-66-77-00-00-00-00-00-00-00-00  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

(I've omitted the counters for brevity and modified the wmaster0 MAC address).

I am using NetworkManager 0.6.6-2 for managing network connections on a machine with a wired NIC (eth0) and wireless NIC (eth1).

Using Linux 2.6.26-rc9.
Comment 1 Ted Percival 2008-08-18 08:16:09 UTC
Here is my smb.conf's interfaces line:
  interfaces = lo eth*

Looking through the code history and bug comments, I noticed that this infinite loop was introduced by the fix for bug #5267.
Comment 2 Jeremy Allison 2008-08-18 14:52:19 UTC
Do you ever see the message:

                        DEBUG(0,("reload_interfaces: "
                                "No subnets to listen to. Waiting..\n"));

in the log?

From your description of your setup, the loop here:

195         /* find any interfaces that need adding */
196         for (n=iface_count() - 1; n >= 0; n--) {

in nmbd/nmbd.c should exit with the variable subnetlist == NULL.
Thus it should wait 5 seconds and try again (call load_interfaces) until it gets an IPv4 interface. Do you see these 5-second pauses?

Once your IPv4 address is deleted, the code at lines 248-279 should delete the subnet record for it, leaving subnetlist == NULL.

Can you please attach to the process under a debugger and walk through this code, showing the variable states?

Jeremy.
Comment 3 Ted Percival 2008-08-18 22:20:40 UTC
I looked through the code and discovered what's going on.

It only happens when a loopback interface (lo) is specified in the config file. Automatic interface detection in load_interfaces() excludes `lo` because it doesn't have the IFF_BROADCAST flag.

The subnet addition/removal code in reload_interfaces is fine. In particular the subnet *adding* checks that
  (a) only IPv4 addresses are used; and
  (b) Loopback addresses are not used

However down in
> /* We need to wait if there are no subnets... */

there is this code:
> while (iface_count_v4() == 0 && !got_sig_term) {
>   sleep(5);
>   load_interfaces();
> }

The problem is that iface_count_v4() does not exclude loopback interfaces, so lo's 127.0.0.1 address makes iface_count_v4() return 1. The loop condition is therefore false and the sleep(5) never executes. As long as there has not been a SIGTERM, the code gets down to
> goto try_again;
and we're in loopyville.
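
The spin can be modelled in a few lines. This is only a sketch under stated assumptions, not the nmbd source and not the attached patch: struct iface_model, count_v4_any(), count_v4_non_loopback() and sleeps_in_passes() are hypothetical names, the interface list is treated as static, and one "sleep" per pass stands in for the real sleep(5) / load_interfaces() loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry in the interface list. */
struct iface_model {
    bool is_ipv4;
    bool is_loopback;
};

typedef int (*count_fn)(const struct iface_model *, size_t);

/* Models the behaviour described above: 127.0.0.1 is counted too. */
static int count_v4_any(const struct iface_model *ifs, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ifs[i].is_ipv4) {
            count++;
        }
    }
    return count;
}

/* One possible alternative: count only IPv4 addresses that can back a subnet. */
static int count_v4_non_loopback(const struct iface_model *ifs, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (ifs[i].is_ipv4 && !ifs[i].is_loopback) {
            count++;
        }
    }
    return count;
}

/*
 * Runs at most max_passes iterations of a simplified "try_again" loop and
 * returns how many of them would have slept. Zero sleeps while no usable
 * subnet exists means the loop is busy-spinning.
 */
static int sleeps_in_passes(const struct iface_model *ifs, size_t n,
                            count_fn wait_count, int max_passes)
{
    int sleeps = 0;

    assert(max_passes >= 0);
    for (int pass = 0; pass < max_passes; pass++) {
        if (count_v4_non_loopback(ifs, n) > 0) {
            break; /* a subnet could be added, nothing to wait for */
        }
        if (wait_count(ifs, n) == 0) {
            sleeps++; /* stands in for sleep(5); load_interfaces(); */
        }
        /* otherwise fall straight through: goto try_again */
    }
    return sleeps;
}
```

Fed the three addresses from the log (lo ::1, lo 127.0.0.1, and eth1's link-local IPv6 address), the wait condition based on count_v4_any() never sleeps, whereas the same loop driven by count_v4_non_loopback() sleeps on every pass.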
Comment 4 Jeremy Allison 2008-08-19 19:01:24 UTC
Created attachment 3495
Patch

This should fix it. Thanks a *lot* for your help and analysis on this - I've been trying to track this one down for ages and your work pinpointed it perfectly !
Jeremy.
Comment 5 Karolin Seeger 2008-08-20 08:28:45 UTC
Closing out bug report.
Please re-open if it is still an issue for you.

Thanks for reporting!