Bug 13992 - SAMBA RPC share error in SAMBA Stretch 4.5.16 and Buster 4.9.5
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: unspecified
Hardware: x64 Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-14 06:38 UTC by Nick Paterakis
Modified: 2019-12-09 02:22 UTC

See Also:


Attachments
Error Log File (922.64 KB, text/plain)
2019-06-14 06:38 UTC, Nick Paterakis
samba debug output (15.24 KB, text/plain)
2019-06-27 06:17 UTC, Paul Wise
extra samba config file (172 bytes, text/plain)
2019-06-27 06:17 UTC, Paul Wise

Description Nick Paterakis 2019-06-14 06:38:14 UTC
Created attachment 15249 [details]
Error Log File

Hi there. We recently upgraded some client appliances from Debian Jessie to Debian Stretch and ran into a problem with Samba along the way. We found that net rpc share allowedusers, as shipped with Samba in Debian Stretch (4.5.16) and Buster (4.9.5), only returns info about which users are allowed to mount the first share on the server.

When running the command this is the significant error message:

> $ net usersidlist | net rpc share allowedusers -d10 -U user@password
> -S10.10.10.1 |& grep secdesc
> Could not query secdesc for share sharename
 
The difference in the protocol trace is that the old Samba sends one NetShareGetInfo RPC call for each share, but the new one sends a call only for the first share returned by the NetShareEnumAll RPC call, which returns the full list of shares in both cases.

When debugging this in gdb, it looks like dcerpc_srvsvc_NetShareGetInfo is called for each of the shares but returns 0xC000020C (NT_STATUS_CONNECTION_DISCONNECTED) for everything except the first share.
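
For reference, the status value above can be split into its fields. This is a minimal sketch that assumes the standard 32-bit NTSTATUS layout (2-bit severity, 12-bit facility, 16-bit code); ntstatus_fields is a hypothetical helper name and not part of Samba.

```shell
# Hypothetical helper: split a 32-bit NTSTATUS value into its fields.
# Assumes the usual layout: bits 31-30 severity, 27-16 facility, 15-0 code.
ntstatus_fields() {
    status=$(( $1 ))
    severity=$(( (status >> 30) & 0x3 ))    # 0=success 1=informational 2=warning 3=error
    facility=$(( (status >> 16) & 0xFFF ))
    code=$(( status & 0xFFFF ))
    printf 'severity=%d facility=%d code=0x%04X\n' "$severity" "$facility" "$code"
}

ntstatus_fields 0xC000020C
```

A severity of 3 marks the value as an error status rather than a warning or informational one.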

Also attaching a log file from our local test harness where we've replicated the issue.

The workaround, which works, is to downgrade to the Samba packages from Jessie.
Comment 1 Louis 2019-06-26 10:02:30 UTC
Hi,

I would start by updating your smb.conf; the version you are running is not correct.

You're on Debian, so can you run this script? It gives us a better overview of your settings.

wget -O - https://raw.githubusercontent.com/thctlo/samba4/master/samba-collect-debug-info.sh | bash
Then post the output, anonymized where needed.

For the smb.conf, I suggest going here:
https://wiki.samba.org/index.php/Setting_up_Samba_as_a_Domain_Member
Start reading at: Setting up a Basic smb.conf File

And since you are upgrading from jessie (Samba 4.2), also read:
https://wiki.samba.org/index.php/Updating_Samba

I'm not saying this is not a bug, but we want correct settings before we call it a bug.
Comment 2 Paul Wise 2019-06-27 06:17:27 UTC
Created attachment 15266 [details]
samba debug output
Comment 3 Paul Wise 2019-06-27 06:17:46 UTC
Created attachment 15267 [details]
extra samba config file
Comment 4 Paul Wise 2019-06-27 06:23:36 UTC
I've attached the debug output from the script.

The script missed the include directive we use in smb.conf, so I attached the included file too.

The script seems to think that winbindd is not running but it is:

    $ systemctl status winbind
    ● winbind.service - Samba Winbind Daemon
       Loaded: loaded (/lib/systemd/system/winbind.service; enabled; vendor preset: enabled)
       Active: active (running) since Thu 2019-06-13 08:05:55 ACST; 2 weeks 0 days ago
         Docs: man:winbindd(8)
               man:samba(7)
               man:smb.conf(5)
     Main PID: 822 (winbindd)
       Status: "winbindd: ready to serve connections..."
        Tasks: 5 (limit: 4915)
       CGroup: /system.slice/winbind.service
               ├─ 822 /usr/sbin/winbindd
               ├─1023 /usr/sbin/winbindd
               ├─3152 /usr/sbin/winbindd
               ├─3153 /usr/sbin/winbindd
               └─3155 /usr/sbin/winbindd

As far as I can tell we have configured samba and joined the domain correctly.
Comment 5 Nick Paterakis 2019-07-11 00:44:27 UTC
Hi SAMBA team - are you able to update this ticket and assess the additional diag information provided for a solution please?
Comment 6 Nick Paterakis 2019-08-23 02:00:52 UTC
Hi - we've been patiently awaiting an update to this ticket and followed the last set of diagnostic steps to provide additional info as requested. Can we now please escalate this for action?
Comment 7 Andrew Bartlett 2019-08-23 02:42:02 UTC
(In reply to Nick Paterakis from comment #6)

The escalation path is here:
https://www.samba.org/samba/support/
Comment 8 Nick Paterakis 2019-08-23 03:31:36 UTC
We would prefer to keep the resolution within the bug raised. We provided diagnostic information as requested and are keen for the dev team to assess it and respond as to when a solution will be available.
Comment 9 Andrew Bartlett 2019-08-27 00:41:08 UTC
(In reply to Nick Paterakis from comment #8)
To be clear, the Samba Bugzilla is a bug tracking system, not a resource allocation system.

However, to help progress this:

Nothing in the info provided so far suggests a trivial fix, but perhaps more detail could be found if you turn up the log level.

My best suggestion, if you have the time, is to bisect between the two versions using git bisect and determine which patch broke the behaviour. 

That may help make the issue clear, which may in turn make this an attractive bug for a developer to take on.

Otherwise, my previous comment holds: Samba is significantly supported by those who support the companies that employ Samba developers.
Comment 10 Paul Wise 2019-08-27 01:11:44 UTC
Turning up the log level doesn't add any more messages.

I'll take a look at bisecting the issue.
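
A bisect like the one suggested in comment 9 could be automated with git bisect run. The sketch below only shows the verdict logic such a check script might use. bisect_exit_code is a hypothetical helper, and the refs and check.sh in the trailing comments are placeholders rather than anything taken from this report. It relies on git bisect run treating exit code 0 as good, 1 as bad and 125 as "skip this commit".

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` check script.
# $1 = exit status of the net command, $2 = file holding its output.
# Prints the exit code the check script should use:
#   0 = good commit, 1 = bad commit, 125 = skip (cannot be judged)
bisect_exit_code() {
    net_status=$1
    log=$2
    if [ "$net_status" -ge 128 ]; then
        # Killed by a signal (for example a segfault): neither good nor bad.
        echo 125
    elif grep -q 'Could not query secdesc' "$log"; then
        # The error message from the original report: mark the commit bad.
        echo 1
    else
        echo 0
    fi
}

# Assumed usage from inside a Samba git checkout (refs are placeholders):
#   git bisect start <bad-ref> <good-ref>
#   git bisect run ./check.sh   # check.sh builds, runs net, then exits
#                               # with "$(bisect_exit_code "$status" "$log")"
#   git bisect reset
```

Mapping crashes to 125 keeps a commit that fails in some unrelated way from being counted as either good or bad.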
Comment 11 Paul Wise 2019-08-27 04:43:59 UTC
The results of the bisect are that the following three commits are involved:

The first one (dc4a6a980a1) fails in the same way as our initial tests.

The second one (c939552b7e5) segfaults after printing the first user.

The third one (0d0d9820531) succeeds just like samba 4.2 from Debian jessie.

I'm inclined to think that the second one is responsible for this issue, since it also has a failure after the first user, but a failure of a different kind; SEGFAULT instead of connection errors.

commit dc4a6a980a16f7effb2b422ad6332936f457546c (refs/bisect/bad)
Author: Jeremy Allison <jra@samba.org>
Date:   Tue Jun 13 16:56:48 2017 -0700

    s3: libsmb: Correctly save and restore connection tcon in smbclient, smbcacls and smbtorture3.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12831
    
    Signed-off-by: Jeremy Allison <jra@samba.org>
    Reviewed-by: Richard Sharpe <realrichardsharpe@gmail.com>
    (cherry picked from commit bd31d538a26bb21cbb53986a6105204da4392e2d)

commit c939552b7e52396ab78419ae0706759ff3ca30a3 (refs/bisect/skip-c939552b7e52396ab78419ae0706759ff3ca30a3)
Author: Jeremy Allison <jra@samba.org>
Date:   Tue Jun 13 16:37:39 2017 -0700

    s3: libsmb: Correctly do lifecycle management on cli->smb1.tcon and cli->smb2.tcon.
    
    Treat them identically. Create them on demand after for a tcon call,
    and delete them on a tdis call.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12831
    
    Signed-off-by: Jeremy Allison <jra@samba.org>
    Reviewed-by: Richard Sharpe <realrichardsharpe@gmail.com>
    (cherry picked from commit 50f50256aa8805921c42d0f9f2f8f89d06d9bd93)

commit 0d0d9820531aca17a5300f4e4eb47f07a999aaca (refs/bisect/good-0d0d9820531aca17a5300f4e4eb47f07a999aaca)
Author: Jeremy Allison <jra@samba.org>
Date:   Tue Jun 13 16:36:54 2017 -0700

    s3: libsmb: Fix cli_state_has_tcon() to cope with SMB2 connections.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12831
    
    Signed-off-by: Jeremy Allison <jra@samba.org>
    Reviewed-by: Richard Sharpe <realrichardsharpe@gmail.com>
    (cherry picked from commit c9178ed9cc69b9089292db28ac1a0b7a0519bc2c)
Comment 12 Paul Wise 2019-08-27 04:52:01 UTC
These commits are between 4.5.10 and 4.5.11.
Comment 13 Andrew Bartlett 2019-08-27 06:23:58 UTC
(In reply to Paul Wise from comment #12)
Can you still reproduce this on current master? I have a feeling I've seen this fixed already, but can't find it trivially.

If not, a tour of the git history for recent fixes to similar issues elsewhere *might* give a clue as to the fix.
Comment 14 Paul Wise 2019-08-27 07:27:46 UTC
I'm still able to reproduce the issue with git master (b406b928242).
Comment 15 Paul Wise 2019-08-27 07:50:37 UTC
I've taken a look at recent history but I didn't see anything related.
Comment 16 Paul Wise 2019-08-28 05:43:17 UTC
If I check out the first commit (dc4a6a980a1) and then revert the second one (c939552b7e5) then I get the correct results, so I think the issue is definitely caused by the second commit (c939552b7e5).
Comment 17 Paul Wise 2019-08-28 05:55:32 UTC
Looking at the first commit, I see that in it, the show_userlist function is missing a call to cli_state_restore_tcon in the error path while other functions do not miss that call. Testing if that changes anything.
Comment 18 Paul Wise 2019-08-28 05:56:01 UTC
That didn't fix the issue.
Comment 19 Paul Wise 2019-08-28 08:49:17 UTC
After some debugging:

It appears that the NT_STATUS_CONNECTION_DISCONNECTED value originates from rpccli_bh_raw_call_send when it detects that the stream is not connected. The stream is marked as not connected when rpccli_bh_is_connected calls rpccli_is_connected, which indirectly calls rpc_tstream_is_connected, which calls tstream_pending_bytes, which indirectly calls tstream_smbXcli_np_pending_bytes, which calls smbXcli_conn_is_connected, which notices that the cli_nps->conn pointer is NULL and returns false to indicate the stream is not connected.

The conn pointer appears to become NULL in tstream_smbXcli_np_ref_destructor, which is called from TALLOC_FREE(cli->smb2.tcon) in cli_tree_connect_send called from cli_tree_connect. Commenting out that line somehow fixes the issue, but that clearly isn't the correct fix.
Comment 20 Paul Wise 2019-08-31 08:54:29 UTC
In summary:

The issue is that in cli_tree_connect_send, TALLOC_FREE(cli->smb2.tcon) clears the connection but smbXcli_tcon_create(cli) on the next line doesn't set the connection to something other than NULL.

The workaround for this issue is to pass one share on the command-line instead of passing no shares or passing more than one share.
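
The one-share-per-invocation workaround can be scripted roughly as follows. This is a sketch under assumptions: allowedusers_per_share is a hypothetical wrapper, and it assumes that the first positional argument to net rpc share allowedusers is the file holding the net usersidlist output, followed by the share name. Check the net(8) manual page for your version before relying on it.

```shell
# Hypothetical wrapper: query each share in its own net invocation.
# $1 = server, $2 = credentials for -U, remaining arguments = share names.
allowedusers_per_share() {
    server=$1
    creds=$2
    shift 2
    sidlist=$(mktemp) || return 1
    # Collect the user/SID list once and reuse it for every share.
    net usersidlist > "$sidlist"
    for share in "$@"; do
        # Assumed argument order: <user list file> <share name>.
        net rpc share allowedusers -S "$server" -U "$creds" "$sidlist" "$share"
    done
    rm -f "$sidlist"
}

# Example call (server, credentials and share names are placeholders):
#   allowedusers_per_share 10.10.10.1 'user%password' share1 share2
```

Each share gets a fresh connection this way, so the tree connect that happens when moving to a second share within a single run is never reached.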
Comment 21 Paul Wise 2019-09-08 05:52:42 UTC
I tried saving and restoring the conn pointer before/after the problematic free but that didn't help.
Comment 22 Nick Paterakis 2019-12-09 02:22:52 UTC
Hi Samba team - it's been a little while since we provided you with tech/debug detail affirming the bug condition. Are we likely to see a resolution soon? This has blocked us from moving our clients to a new platform.