Bug 7836 - A newly added printer isn't visbile to clients
Summary: A newly added printer isn't visbile to clients
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.5
Classification: Unclassified
Component: Printing (show other bugs)
Version: 3.5.6
Hardware: Other Linux
: P3 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks: 7803
  Show dependency treegraph
 
Reported: 2010-12-01 16:31 UTC by Lars Müller
Modified: 2011-05-23 18:41 UTC (History)
3 users (show)

See Also:


Attachments
Initial smbd log without the dummy0 printer (3.4.3) (750.39 KB, text/plain)
2010-12-03 14:52 UTC, Lars Müller
no flags Details
smbd log after the printer dummy0 got added (3.4.3) (4.44 KB, text/plain)
2010-12-03 14:52 UTC, Lars Müller
no flags Details
after first SIGHUP (3.4.3) (71.81 KB, text/plain)
2010-12-03 15:03 UTC, Lars Müller
no flags Details
after 2nd SIGHUP (3.4.3) (709.58 KB, text/plain)
2010-12-03 15:04 UTC, Lars Müller
no flags Details
Initial smbd log without the dummy0 printer (3.5.6) (298.64 KB, text/plain)
2010-12-05 13:04 UTC, Lars Müller
no flags Details
smbd log after the printer dummy0 got added (3.5.6) (213.10 KB, text/plain)
2010-12-05 13:06 UTC, Lars Müller
no flags Details
after first SIGHUP (3.5.6) (225.90 KB, text/plain)
2010-12-05 13:07 UTC, Lars Müller
no flags Details
after 2nd SIGHUP (3.5.6) (225.90 KB, text/plain)
2010-12-05 13:08 UTC, Lars Müller
no flags Details
after 3rd SIGHUP (3.5.6) (239.09 KB, text/plain)
2010-12-05 13:09 UTC, Lars Müller
no flags Details
after kill of the running and new launch of smbd (3.5.6) (520.46 KB, text/plain)
2010-12-05 13:11 UTC, Lars Müller
no flags Details
Fix race in cups async printer services reload. (6.99 KB, patch)
2010-12-20 02:48 UTC, Lars Müller
lars: review-
Details
reload printers test script (1.34 KB, text/plain)
2010-12-27 12:17 UTC, David Disseldorp
no flags Details
Patch set to make pcap cache changes visible after async refactor (33.43 KB, patch)
2011-01-06 09:09 UTC, Lars Müller
no flags Details
bso7836_fix_pcap_races_rb2.patch (31.58 KB, patch)
2011-01-07 13:12 UTC, David Disseldorp
no flags Details
bso7836_34t.patch (11.67 KB, patch)
2011-01-10 09:30 UTC, David Disseldorp
asn: review+
Details
bso7836_35t.patch (11.67 KB, patch)
2011-01-10 09:30 UTC, David Disseldorp
asn: review+
Details
bso7836_34t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch (1.62 KB, patch)
2011-01-17 13:28 UTC, David Disseldorp
gd: review+
asn: review+
Details
bso7836_35t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch (1.62 KB, patch)
2011-01-17 13:29 UTC, David Disseldorp
gd: review+
asn: review+
Details
3-5-test.patch (12.67 KB, patch)
2011-05-23 13:06 UTC, Guenther Deschner
asn: review+
gd: review+
Details
v3-4-test.patch (12.67 KB, patch)
2011-05-23 13:08 UTC, Guenther Deschner
asn: review+
gd: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lars Müller 2010-12-01 16:31:58 UTC
This issue applies to 3.4, 3.5, and 3.6.

The used printing system is CUPS.

Known way to work around the issue: send a SIGHUP twice to smbd

To reproduce you have to add a new printer via the lpadmin command

   lpadmin -p dummy0 -E -v /dev/null

Use "lpstat -v | grep dummy" to check if the printer was added.

Next call "smbclient -N -L localhost | grep dummy" to see if we got the printer.

From the logs we see cups_cache_reload_async() gets called and the intended read and write callbacks are executed for the printer in question.

From the log we see also pcap_cache_add_specific() gets called.
Comment 1 Lars Müller 2010-12-03 11:02:11 UTC
The function chain lp_servicenumber() -> add_a_service() -> hash_a_service() -> lp_add_printer() isn't triggered.

But we see these functions being called in the case a second SIGHUP is sent.

And without lp_add_printer() being called our new printer isn't available.

Now the next question is why isn't it triggered already before?

Therefore I like to understand what's causing the trigger or how this is expected to work.
Comment 2 Lars Müller 2010-12-03 14:52:09 UTC
Created attachment 6104 [details]
Initial smbd log without the dummy0 printer (3.4.3)
Comment 3 Lars Müller 2010-12-03 14:52:50 UTC
Created attachment 6105 [details]
smbd log after the printer dummy0 got added (3.4.3)
Comment 4 Lars Müller 2010-12-03 15:03:31 UTC
Created attachment 6106 [details]
after first SIGHUP (3.4.3)
Comment 5 Lars Müller 2010-12-03 15:04:19 UTC
Created attachment 6107 [details]
after 2nd SIGHUP (3.4.3)
Comment 6 Lars Müller 2010-12-05 12:59:39 UTC
For 3.5.6 this looks even worse.  Even a third SIGHUP doesn't make the dummy0 printer visible.

Only killing the running smbd and launching a new one showed the new printer.

Stacked log files of the 3.5.6 case follow.
Comment 7 Lars Müller 2010-12-05 13:04:52 UTC
Created attachment 6112 [details]
Initial smbd log without the dummy0 printer (3.5.6)
Comment 8 Lars Müller 2010-12-05 13:06:29 UTC
Created attachment 6113 [details]
smbd log after the printer dummy0 got added (3.5.6)
Comment 9 Lars Müller 2010-12-05 13:07:45 UTC
Created attachment 6114 [details]
after first SIGHUP (3.5.6)
Comment 10 Lars Müller 2010-12-05 13:08:48 UTC
Created attachment 6115 [details]
after 2nd SIGHUP (3.5.6)
Comment 11 Lars Müller 2010-12-05 13:09:44 UTC
Created attachment 6116 [details]
after 3rd SIGHUP (3.5.6)
Comment 12 Lars Müller 2010-12-05 13:11:52 UTC
Created attachment 6117 [details]
after kill of the running and new launch of smbd (3.5.6)

Now we see the dummy0 printer after a smbclient -L
Comment 13 Lars Müller 2010-12-20 02:48:01 UTC
Created attachment 6154 [details]
Fix race in cups async printer services reload.

Call post_cache_fill_fn() after cups_cache_reload() succeeded.

Patch written in pair programming with David Disseldorp <ddiss at suse dot de>
Comment 14 Lars Müller 2010-12-20 02:51:30 UTC
BTW adding the PID to the log file via setting "debug pid = Yes" had been quite useful.

Why aren't we doing this by default?  In particular since we're using async code this wide spread nowadays?
Comment 15 David Disseldorp 2010-12-21 08:31:33 UTC
Hmm, looks like the attached patch is not a simple rebase to master, an
additional race condition is introduced with commit
7022554 - s3-printing: Use printer list tdb in pcap.

Prior to commit 7022554 each smbd retained it's own pcap cache,
therefore ordering of events between smbd processes during pcap
update was not important. A two process example:
1) smbd1 receives HUP
2) smbd1 forks child1 to obtain cups printer info
3) smbd1 reloads printer shares prior to child1 completion (stale pcap)
4) smbd2 receives HUP
5) smbd2 forks child to obtain cups printer info
6) smbd2 reloads printer shares prior to child2 completion (stale pcap)
7) child1 completion, smbd1 pcap is updated
8) child2 completion, smbd2 pcap is updated

The simple fix for the above race condition was to have smbd only
reload printer shares following child completion, such that:
1) smbd1 receives HUP
2) smbd1 forks child1 to obtain cups printer info
3) smbd2 receives HUP
4) smbd2 forks child to obtain cups printer info
5) child1 completion, smbd1 pcap is updated
6) smbd1 reloads printer shares after child1 completion (fresh pcap)
7) child2 completion, smbd2 pcap is updated
8) smbd2 reloads printer shares after child2 completion (fresh pcap)



Since commit 7022554, smbds share a pcap cache (printer_list.tdb),
therefore ordering of events between smbd1 and smbd2 *is* important. A
two process example:
1) smbd1 receives HUP
2) smbd1 checks whether pcap needs refresh, it does
3) smbd1 marks pcap as refreshed
4) smbd1 forks child1 to obtain cups printer info
5) smbd1 reloads printer shares prior to child1 completion (stale pcap)
6) smbd2 receives HUP
7) smbd2 checks whether pcap needs refresh, it does not (due to step 3)
8) smbd2 reloads printer shares prior to child1 completion (stale pcap)
9) child1 completion, pcap is updated

In this case both smbd1 and smbd2 are reliant on the pcap update
performed on child1 completion. The fix used in 3.4 is suitable for
ensuring smbd1 obtains fresh pcap data for print shares, however
determining when smbd2 should reload printer shares is not so straight
forward.
Comment 16 David Disseldorp 2010-12-22 09:23:17 UTC
(In reply to comment #15)
> Since commit 7022554, smbds share a pcap cache (printer_list.tdb),
> therefore ordering of events between smbd1 and smbd2 *is* important. A
> two process example:
> 1) smbd1 receives HUP
> 2) smbd1 checks whether pcap needs refresh, it does
> 3) smbd1 marks pcap as refreshed
> 4) smbd1 forks child1 to obtain cups printer info
> 5) smbd1 reloads printer shares prior to child1 completion (stale pcap)
> 6) smbd2 receives HUP
> 7) smbd2 checks whether pcap needs refresh, it does not (due to step 3)
> 8) smbd2 reloads printer shares prior to child1 completion (stale pcap)
> 9) child1 completion, pcap is updated
> 
> In this case both smbd1 and smbd2 are reliant on the pcap update
> performed on child1 completion. The fix used in 3.4 is suitable for
> ensuring smbd1 obtains fresh pcap data for print shares, however
> determining when smbd2 should reload printer shares is not so straight
> forward.
> 

Thinking about this a little more, it may be worth moving the pcap cache reload code to the parent (listener) smbd process such that:
- the parent smbd adds a pcap_reload event handler which gets kicked on SIGHUP, smbcontrol reload and pcap cache time expiry
- the pcap_reload event handler reloads printer_list.tdb from the pcap backend and emits MSG_PCAP_CHANGED on completion
- child smbds register a message handler for MSG_PCAP_CHANGED which reads printer_list.tdb and updates printer services accordingly

This wouldn't be a small change, but would have the following advantages from the current architecture:
- removal of the race condition outlined in comment 15
- only one process requests printer information from the pcap backend
- printer_list.tdb has only a single writer

Does this sound sane?
Comment 17 David Disseldorp 2010-12-27 12:17:33 UTC
Created attachment 6170 [details]
reload printers test script
Comment 18 Lars Müller 2010-12-29 08:02:54 UTC
Comment on attachment 6154 [details]
Fix race in cups async printer services reload.

This version of the patch got superseded.
Comment 19 Lars Müller 2011-01-06 09:09:56 UTC
Created attachment 6186 [details]
Patch set to make pcap cache changes visible after async refactor

This patch set consists of:

0001-s3-printing-reload-shares-after-pcap-cache-fill.patch
0002-s3-printing-Initiate-pcap-reload-from-parent-smbd.patch
0003-s3-printing-remove-old-entries-in-pcap_cache_replace.patch
0004-s3-printing-remove-printer_list_need_refresh.patch
0005-s3-printing-use-time_mono-in-housekeeping-functions.patch

and is taken from oss.sgi.com/ddiss/samba branch bso7836_fix_pcap_races
Comment 20 Lars Müller 2011-01-06 09:22:02 UTC
Comment on attachment 6154 [details]
Fix race in cups async printer services reload.

Obsoleted by attachment #6186 [details]
Comment 21 Lars Müller 2011-01-07 07:20:09 UTC
This issue is a release show stopper for 3.6.0.
Comment 22 Jeremy Allison 2011-01-07 11:14:34 UTC
I know. I'm planning to do the review today (I have a cold/flu :-).
Jeremy.
Comment 23 David Disseldorp 2011-01-07 13:03:23 UTC
(In reply to comment #22)
> I know. I'm planning to do the review today (I have a cold/flu :-).

Thanks Jeremy. I've rebased against master and incorporated changes following feedback from bj@SerNet.DE and asn@samba.org. New branch is:

git://oss.sgi.com/ddiss/samba bso7836_fix_pcap_races_rb2

Get well soon :-)
Comment 24 David Disseldorp 2011-01-07 13:12:56 UTC
Created attachment 6188 [details]
bso7836_fix_pcap_races_rb2.patch
Comment 25 Lars Müller 2011-01-07 17:25:00 UTC
Comment on attachment 6186 [details]
Patch set to make pcap cache changes visible after async refactor

This patch got obsoleted by attachment #6188 [details]
Comment 26 David Disseldorp 2011-01-10 09:25:47 UTC
The fix for 3.4 and 3.5 is less invasive, as smbd processes maintain separate printcap caches rather than sharing a common printer_list.tdb. See:

git://oss.sgi.com/ddiss/samba bso7836_34t
git://oss.sgi.com/ddiss/samba bso7836_35t

Both branches include our change to reload shares after pcap cache fill, as well as a further commit which adds a parent smbd housekeeping function.
Comment 27 David Disseldorp 2011-01-10 09:30:00 UTC
Created attachment 6190 [details]
bso7836_34t.patch
Comment 28 David Disseldorp 2011-01-10 09:30:47 UTC
Created attachment 6191 [details]
bso7836_35t.patch
Comment 29 Andreas Schneider 2011-01-12 10:26:48 UTC
Comment on attachment 6190 [details]
bso7836_34t.patch

The patch looks fine for me.
Comment 30 Andreas Schneider 2011-01-12 10:28:01 UTC
Comment on attachment 6191 [details]
bso7836_35t.patch

The patch looks good.
Comment 31 Lars Müller 2011-01-14 06:30:43 UTC
Karolin, as both patches got a positive reviewe please merge attachment #6190 [details] into the 3.4 and #6191 into the 3.5 branch.
Comment 32 Karolin Seeger 2011-01-15 10:22:46 UTC
(In reply to comment #31)
> Karolin, as both patches got a positive reviewe please merge attachment #6190 [details]
> into the 3.4 and #6191 into the 3.5 branch.
> 

Done.
Closing out bug report.

Thanks!
Comment 33 Karolin Seeger 2011-01-15 12:07:37 UTC
Reverted both patches for now, because "s3-printing: reload shares after pcap cache fill" broke 'make test' on my system (v3-4-test).

Could you please investigate again?

Thanks!
Comment 34 Karolin Seeger 2011-01-15 12:38:22 UTC
(In reply to comment #33)
> Reverted both patches for now, because "s3-printing: reload shares after pcap
> cache fill" broke 'make test' on my system (v3-4-test).

v3-5-test seems to be affected also.

Comment 35 David Disseldorp 2011-01-17 04:28:53 UTC
(In reply to comment #33)
> Reverted both patches for now, because "s3-printing: reload shares after pcap
> cache fill" broke 'make test' on my system (v3-4-test).
> 
> Could you please investigate again?

Will do, thanks Karolin.
Comment 36 David Disseldorp 2011-01-17 13:20:58 UTC
two asserts introduced by the 'reload shares after pcap cache fill' change should be removed in the 3.4 and 3.5 branches:

    pcap_cache_loaded() assertions were added to the (re)load_printers()
    functions, to ensure the caller had called pcap_cache_reload() prior to
    reloading printer shares.
    
    The problem is, pcap_cache_loaded() returns false if the the pcap_cache
    contains no printer entries. i.e. pcap_cache_reload() has run but not
    detected any printers.
    
    Remove these assertions, correct call ordering is already enforced.

These assertions do not cause problems in master, as pcap_cache_loaded() is determined by the existence of a time stamp in printer_list.tdb. The time stamp is added by printer_list_mark_reload() in which gets called in pcap_cache_reload().

I've pushed a new commit to both bso7836_34t and bso7836_35t branches which removes the asserts.
Comment 37 David Disseldorp 2011-01-17 13:28:58 UTC
Created attachment 6208 [details]
bso7836_34t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch
Comment 38 David Disseldorp 2011-01-17 13:29:43 UTC
Created attachment 6209 [details]
bso7836_35t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch
Comment 39 Karolin Seeger 2011-01-19 08:08:59 UTC
Re-assigning to Günther for patch review.
Comment 40 Guenther Deschner 2011-05-11 15:26:50 UTC
David, I'm confused, is it now a 3.5/3.4 issue or something for master/3.6 ?
Comment 41 David Disseldorp 2011-05-11 16:02:15 UTC
(In reply to comment #40)
> David, I'm confused, is it now a 3.5/3.4 issue or something for master/3.6 ?

The fix for this bug made it into master and 3.6.

The 3.5 and 3.4 changes are different, as each process maintains it's own pcap cache rather than the shared printer_list.tdb in 3.6+. Those changes have not yet made it into the tree.
Comment 42 Guenther Deschner 2011-05-11 16:18:25 UTC
ok, thanks.
Comment 43 Guenther Deschner 2011-05-18 14:19:51 UTC
Comment on attachment 6208 [details]
bso7836_34t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch

looks good
Comment 44 Guenther Deschner 2011-05-18 14:19:59 UTC
Comment on attachment 6209 [details]
bso7836_35t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch

looks good
Comment 45 Guenther Deschner 2011-05-18 14:21:50 UTC
karolin, please add to 3.4 and 3.5. thanks.
Comment 46 Andreas Schneider 2011-05-18 14:40:07 UTC
Comment on attachment 6209 [details]
bso7836_35t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch

Looks good
Comment 47 Karolin Seeger 2011-05-18 17:57:11 UTC
(In reply to comment #45)
> karolin, please add to 3.4 and 3.5. thanks.

Both patches do not apply to the current branches:

user@host:/data/git/samba/v3-5-test> git am bso7836_35t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch
Applying: s3-printing: remove pcap_cache_loaded asserts
error: patch failed: source3/printing/load.c:57
error: source3/printing/load.c: patch does not apply
error: patch failed: source3/smbd/server.c:760
error: source3/smbd/server.c: patch does not apply
Patch failed at 0001 s3-printing: remove pcap_cache_loaded asserts

user@host:/data/git/samba/v3-4-test> git am bso7836_34t-0001-s3-printing-remove-pcap_cache_loaded-asserts.patch
Applying: s3-printing: remove pcap_cache_loaded asserts
error: patch failed: source3/printing/load.c:57
error: source3/printing/load.c: patch does not apply
error: patch failed: source3/smbd/server.c:715
error: source3/smbd/server.c: patch does not apply
Patch failed at 0001 s3-printing: remove pcap_cache_loaded asserts

Have I missed something?
Comment 48 David Disseldorp 2011-05-20 10:47:17 UTC
Hi Karolin,

(In reply to comment #47)
>
> Both patches do not apply to the current branches:

The patches were on top of the initial changes which were reverted (rightly so) following make test failure.

One option is to revert the revert commits, then pull in the "remove pcap_cache_loaded asserts" patch. i.e in v3.5-test:

ddiss@lutze:~/isms/samba/source3> git log --pretty=oneline origin/v3-5-test |grep Revert|head -n 2
b6268f507fa3276c2ef22c58bad400a3fed48cd9 Revert "s3-printing: update parent smbd pcap cache"
e4579eab7fe3eab7a5209e6de74e6fd2f53099d0 Revert "s3-printing: reload shares after pcap cache fill"

> git revert e4579eab7fe3eab7a5209e6de74e6fd2f53099d0
> git revert b6268f507fa3276c2ef22c58bad400a3fed48cd9

I've pushed two new branches:

  git://oss.sgi.com/ddiss/samba bso7836_35t_rr
and
  git://oss.sgi.com/ddiss/samba bso7836_34t_rr

  Revert "Revert "s3-printing: reload shares after pcap cache fill""
  Revert "Revert "s3-printing: update parent smbd pcap cache""
  s3-printing: remove pcap_cache_loaded asserts
Comment 49 Guenther Deschner 2011-05-23 13:06:56 UTC
Created attachment 6474 [details]
3-5-test.patch
Comment 50 Guenther Deschner 2011-05-23 13:08:38 UTC
Created attachment 6475 [details]
v3-4-test.patch
Comment 51 Guenther Deschner 2011-05-23 13:22:07 UTC
Karolin, please add to 3.4 and 3.5, thanks.
Comment 52 Karolin Seeger 2011-05-23 18:41:07 UTC
(In reply to comment #51)
> Karolin, please add to 3.4 and 3.5, thanks.

Pushed, closing out bug report.

Thanks!