Bug 13607 - Should winbind also wait for network-online?
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind
Version: 4.8.4
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
Reported: 2018-09-06 18:30 UTC by Andreas Hasenack
Modified: 2022-06-13 10:23 UTC
Description Andreas Hasenack 2018-09-06 18:30:23 UTC
A scenario was brought to my attention (https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1789097) where winbind runs without smbd or nmbd: the only thing in use is the libnss_wins.so.2 NSS module, for name resolution.

When there is no wins server specified in smb.conf, winbind seems to not detect changes in the network interfaces. If winbind starts up before the network is ready, it stays running, but won't "recover" if a network interface gets an IP later on. Name resolution via libnss_wins doesn't work.

If, however, there is a "wins server = x.x.x.x" entry in smb.conf, then even if winbind starts up before the network is available, name resolution (via wins this time, not broadcast) will work as soon as the network comes up.

I've seen commits to add network-online.target to the systemd service files for smb (a3d248f284eb2e5f4fe886310e481b28c9f1c392) and, before that, for nmb and samba (0e571054a61e9de69190ae023199d1670e097e88). Should the same be done to winbind's service file?
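
For illustration, the ordering being asked about can be tried locally with a systemd drop-in. This is a minimal sketch, not the upstream change: the unit name winbind.service is the one used by the Ubuntu package and may differ elsewhere, and the helper name write_dropin and the file name wait-online.conf are made up for this example.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders a unit after network-online.target.
# write_dropin is a hypothetical helper, not part of Samba or systemd.
write_dropin() {
    unit="$1"                          # e.g. winbind.service (assumed unit name)
    base="${2:-/etc/systemd/system}"   # drop-in root, overridable for a dry run
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/wait-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage (as root), then reload systemd so the drop-in is picked up:
#   write_dropin winbind.service && systemctl daemon-reload
```

Note that this only affects ordering at boot. It does nothing for an interface that loses and regains its address while winbind is already running.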


Steps to reproduce:
a) small smb.conf with no wins server entry. Something like:
[global]
        log file = /var/log/samba/log.%m
        logging = file
        map to guest = Bad User
        max log size = 1000
        obey pam restrictions = Yes
        pam password change = Yes
        panic action = /usr/share/samba/panic-action %d
        passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* .
        passwd program = /usr/bin/passwd %u
        server role = standalone server
        server string = %h server (Samba, Ubuntu)
        unix password sync = Yes
        usershare allow guests = Yes
        idmap config * : backend = tdb
        debug level = 4

b) do not install or enable the smbd, samba or nmbd services, as they have the network-online target already and winbind has a config to start after nmbd (if it exists)

c) make sure the wins nss module is installed. Then change the hosts line in /etc/nsswitch.conf to this:
hosts:          files wins dns

d) get console access, if this is a remote machine or a vm

e) Remove the IP from the interface, essentially breaking the network

f) restart winbind

g) confirm name resolution via broadcast is broken:
ping -c 1 <somenetbiosname>

h) add the IP back to the interface (for example, dhclient <nic>)

i) confirm name resolution via broadcast is still broken:
ping -c 1 <somenetbiosname>

If your smb.conf has a "wins server" directive pointing at a wins server in your network, then step (i) will work without restarting winbind.
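
Steps (e) through (i) can be sketched as a script. This is only a sketch under a few assumptions: the service is called winbind as on Ubuntu, dhclient is the DHCP client, and the helper names resolves and reproduce are made up here. It checks resolution with getent hosts instead of ping, so a host that resolves but does not answer pings is not counted as a failure. Run it from a console as root, since it removes the addresses from the interface.

```shell
#!/bin/sh
# Sketch of steps (e) through (i). The NIC and NetBIOS name are placeholders
# passed in by the caller; nothing here is specific to one machine.

# resolves: hypothetical helper, succeeds if the name resolves through NSS.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

reproduce() {
    nic="$1"
    name="$2"
    ip addr flush dev "$nic"        # (e) remove the IP from the interface
    systemctl restart winbind       # (f) unit name is an assumption
    if resolves "$name"; then       # (g) should fail while there is no IP
        echo "unexpected: name resolves without an IP"
    fi
    dhclient "$nic"                 # (h) add the IP back
    if resolves "$name"; then       # (i) the bug: this still fails
        echo "resolved: not reproduced"
    else
        echo "still broken: reproduced"
    fi
}

# Usage: reproduce ens3 bionic-smb2
```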
Comment 1 Volker Lendecke 2018-09-07 09:44:06 UTC
We treat a WINS server as dead for 10 minutes. Can you try a "net cache flush" between h) and i)?
Comment 2 Andreas Hasenack 2018-09-10 13:35:47 UTC
Note that when a wins server is specified in smb.conf, this problem does not happen.

So without a wins server in smb.conf, i.e., using only broadcast name resolution, calling "net cache flush" has no visible effect.

Please note the sequence of events:
- NIC has no IP
- winbind starts
- name resolution via broadcast fails (expected)
- NIC gets an IP
- name resolution via broadcast still fails (unexpected)

Here, in this state, the only thing that "fixes" it is a winbind restart. "net cache flush" had no effect.
Comment 3 Volker Lendecke 2018-09-10 13:51:34 UTC
Next try: Can you send a SIGHUP after the network started? This should trigger re-scanning the kernel for network interfaces. If that works, it should be possible to add a netlink listener that triggers the scan whenever interfaces come and go.
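
Until such a listener exists inside winbindd, the same effect could be approximated from outside the daemon. The following is a rough sketch, not a proposed fix: it watches address changes with ip monitor and sends SIGHUP on each one. It assumes the oldest winbindd process is the parent, and the helper names is_addr_event and watch_and_hup are made up for this example.

```shell
#!/bin/sh
# Sketch: send SIGHUP to the parent winbindd whenever an address is added
# or removed. An external stopgap, not the in-daemon netlink listener.

# is_addr_event: hypothetical helper, succeeds for `ip -o monitor address`
# lines that report an IPv4 or IPv6 address change.
is_addr_event() {
    case "$1" in
        *" inet "*|*" inet6 "*) return 0 ;;
        *) return 1 ;;
    esac
}

watch_and_hup() {
    # -o prints each event on a single line
    ip -o monitor address | while IFS= read -r line; do
        if is_addr_event "$line"; then
            # -o selects the oldest match (assumed to be the parent),
            # -x requires an exact process name match
            pkill -HUP -o -x winbindd
        fi
    done
}

# Usage (as root): watch_and_hup
```

This signals on every address event, including removals, which is more often than strictly needed but keeps the sketch simple.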
Comment 4 Andreas Hasenack 2018-09-10 14:26:44 UTC
That worked. After sending HUP to the parent process, the broadcast name resolution succeeded:

root@bionic-wins:~# ip a show dev ens3
2: ens3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:f1:92:89 brd ff:ff:ff:ff:ff:ff

root@bionic-wins:~# ping -c 1 bionic-smb2
ping: bionic-smb2: Temporary failure in name resolution

root@bionic-wins:~# dhclient ens3

root@bionic-wins:~# ps fxaw|grep winbind
 2088 pts/1    S+     0:00      \_ grep --color=auto winbind
 1509 ?        Ss     0:00 /usr/sbin/winbindd --foreground --no-process-group
 1532 ?        S      0:00  \_ /usr/sbin/winbindd --foreground --no-process-group

root@bionic-wins:~# ping -c 1 bionic-smb2
ping: bionic-smb2: Temporary failure in name resolution

root@bionic-wins:~# kill -HUP 1509

root@bionic-wins:~# ping -c 1 bionic-smb2
PING bionic-smb2 (192.168.122.104) 56(84) bytes of data.
64 bytes from bionic-smb2 (192.168.122.104): icmp_seq=1 ttl=64 time=0.833 ms

--- bionic-smb2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.833/0.833/0.833/0.000 ms
Comment 5 Christian Ehrhardt 2022-06-13 10:23:28 UTC
Hi,
Andreas confirmed back then that the SIGHUP-triggered rescan resolves the issue.

Was anything added to later versions as a follow-up to this, or did it fall through the cracks?