Bug 10720 - error: Unable to convert first SID
error: Unable to convert first SID
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other
unspecified
x86 Linux
: P5 regression
: 4.2
Assigned To: Garming Sam
Samba QA Contact
:
: 10957 11062 (view as bug list)
Depends on:
Blocks: 11017
  Show dependency treegraph
 
Reported: 2014-07-16 15:19 UTC by Nick Semenkovich
Modified: 2015-01-20 11:26 UTC (History)
5 users (show)

See Also:


Attachments
smb.conf (1.52 KB, text/plain)
2014-07-17 15:37 UTC, Nick Semenkovich
no flags Details
50k lines, "log level=10", machine AIO15, domain=CORP (2.03 MB, text/plain)
2014-07-21 19:15 UTC, Nick Semenkovich
no flags Details
log. & log.w* (233.40 KB, application/gzip)
2014-07-22 15:47 UTC, Nick Semenkovich
no flags Details
Gencache dump (8.18 KB, text/plain)
2014-10-10 04:43 UTC, Garming Sam
no flags Details
Secrets.tdb dump (2.30 KB, application/octet-stream)
2014-10-10 04:45 UTC, Garming Sam
no flags Details
idmap.ldb dump (6.59 KB, text/plain)
2014-10-10 04:49 UTC, Garming Sam
no flags Details
Temporary fix for caching issue (1.88 KB, patch)
2014-10-17 03:50 UTC, Garming Sam
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Semenkovich 2014-07-16 15:19:53 UTC
In a relatively recent git build, samba throws a lot of "Unable to convert first SID ..." errors, and users are unable to connect to any samba shares (and encounter lots of other errors).

All clients are Windows 8.1 machines & this seems to occur intermittently (50% of the time?).

The error was introduced some time between:

9c19532 "s3-shadow-copy2: Protect against already converted names" <---- WORKING
and
085c7a7 "smbd: do_lock_cancel does not need "blr" anymore" <---- ERROR (90% sure)

It's also broken in the recent:

776c7ed "samlogon_cache: avoid overwriting info3->base.full_name.string." <---- ERROR


The error is:


[2014/07/16 10:01:31.089670,  0] ../source4/auth/unix_token.c:79(security_token_to_unix_token)
  Unable to convert first SID (S-1-5-21-1516742906-2559259289-1948011451-1114) in user token to a UID.  Conversion was returned as type 2, full token:
[2014/07/16 10:01:31.089759,  0] ../libcli/security/security_token.c:63(security_token_debug)
  Security token SIDs (7):
    SID[  0]: S-1-5-21-1516742906-2559259289-1948011451-1114
    SID[  1]: S-1-5-21-1516742906-2559259289-1948011451-513
    SID[  2]: S-1-1-0
    SID[  3]: S-1-5-2
    SID[  4]: S-1-5-11
    SID[  5]: S-1-5-32-545
    SID[  6]: S-1-5-32-554
   Privileges (0x          800000):
    Privilege[  0]: SeChangeNotifyPrivilege
   Rights (0x             400):
    Right[  0]: SeRemoteInteractiveLogonRight
[2014/07/16 10:01:31.126032,  0] ../source4/auth/unix_token.c:79(security_token_to_unix_token)
  Unable to convert first SID (S-1-5-21-1516742906-2559259289-1948011451-1114) in user token to a UID.  Conversion was returned as type 2, full token:
[2014/07/16 10:01:31.126104,  0] ../libcli/security/security_token.c:63(security_token_debug)
  Security token SIDs (7):
    SID[  0]: S-1-5-21-1516742906-2559259289-1948011451-1114
    SID[  1]: S-1-5-21-1516742906-2559259289-1948011451-513
    SID[  2]: S-1-1-0
    SID[  3]: S-1-5-2
    SID[  4]: S-1-5-11
    SID[  5]: S-1-5-32-545
    SID[  6]: S-1-5-32-554
   Privileges (0x          800000):
    Privilege[  0]: SeChangeNotifyPrivilege
   Rights (0x             400):
    Right[  0]: SeRemoteInteractiveLogonRight
[2014/07/16 10:01:31.226184,  0] ../source4/auth/unix_token.c:79(security_token_to_unix_token)
  Unable to convert first SID (S-1-5-21-1516742906-2559259289-1948011451-1114) in user token to a UID.  Conversion was returned as type 2, full token:
[2014/07/16 10:01:31.226261,  0] ../libcli/security/security_token.c:63(security_token_debug)
  Security token SIDs (7):
    SID[  0]: S-1-5-21-1516742906-2559259289-1948011451-1114
    SID[  1]: S-1-5-21-1516742906-2559259289-1948011451-513
    SID[  2]: S-1-1-0
    SID[  3]: S-1-5-2
    SID[  4]: S-1-5-11
    SID[  5]: S-1-5-32-545
    SID[  6]: S-1-5-32-554
   Privileges (0x          800000):
    Privilege[  0]: SeChangeNotifyPrivilege
   Rights (0x             400):
    Right[  0]: SeRemoteInteractiveLogonRight
[2014/07/16 10:01:53.038474,  2] ../source3/smbd/server.c:457(remove_child_pid)
  Could not find child 27639 -- ignoring
Comment 1 Nick Semenkovich 2014-07-17 15:29:40 UTC
Slowly working through a bisect here -- this makes samba >085c7a7 completely unusable for us.


$ git bisect start
# good: [9c195321345846195e465158f88dad5fd7699389] s3-shadow-copy2: Protect against already converted names
git bisect good 9c195321345846195e465158f88dad5fd7699389
# bad: [085c7a71fde37cb2a67a4c67673cd22f8acf0861] smbd: do_lock_cancel does not need "blr" anymore
git bisect bad 085c7a71fde37cb2a67a4c67673cd22f8acf0861
# good: [fbd418d2f666c3e2af31d34667b63c03fdca6ca6] make: Add gdbtestenv target.
git bisect good fbd418d2f666c3e2af31d34667b63c03fdca6ca6
# good: [95a55df021b3f112a18c64a5f5897182ae8b7df8] winbindd: Allow the AD-DC to call getdcname
git bisect good 95a55df021b3f112a18c64a5f5897182ae8b7df8
...
Comment 2 Volker Lendecke 2014-07-17 15:33:34 UTC
(In reply to comment #1)
> Slowly working through a bisect here -- this makes samba >085c7a7 completely
> unusable for us.
> 
> 
> $ git bisect start
> # good: [9c195321345846195e465158f88dad5fd7699389] s3-shadow-copy2: Protect
> against already converted names
> git bisect good 9c195321345846195e465158f88dad5fd7699389
> # bad: [085c7a71fde37cb2a67a4c67673cd22f8acf0861] smbd: do_lock_cancel does not
> need "blr" anymore
> git bisect bad 085c7a71fde37cb2a67a4c67673cd22f8acf0861
> # good: [fbd418d2f666c3e2af31d34667b63c03fdca6ca6] make: Add gdbtestenv target.
> git bisect good fbd418d2f666c3e2af31d34667b63c03fdca6ca6
> # good: [95a55df021b3f112a18c64a5f5897182ae8b7df8] winbindd: Allow the AD-DC to
> call getdcname
> git bisect good 95a55df021b3f112a18c64a5f5897182ae8b7df8
> ...

It is really 085c7a7???? Wow, I am really stunned that this commit breaks login.

Can you send a debug level 10 log and your smb.conf file?
Comment 3 Nick Semenkovich 2014-07-17 15:34:47 UTC
> > $ git bisect start
> > # good: [9c195321345846195e465158f88dad5fd7699389] s3-shadow-copy2: Protect
> > against already converted names
> > git bisect good 9c195321345846195e465158f88dad5fd7699389
> > # bad: [085c7a71fde37cb2a67a4c67673cd22f8acf0861] smbd: do_lock_cancel does not
> > need "blr" anymore
> > git bisect bad 085c7a71fde37cb2a67a4c67673cd22f8acf0861
> > # good: [fbd418d2f666c3e2af31d34667b63c03fdca6ca6] make: Add gdbtestenv target.
> > git bisect good fbd418d2f666c3e2af31d34667b63c03fdca6ca6
> > # good: [95a55df021b3f112a18c64a5f5897182ae8b7df8] winbindd: Allow the AD-DC to
> > call getdcname
> > git bisect good 95a55df021b3f112a18c64a5f5897182ae8b7df8
> > ...
> 
> It is really 085c7a7???? Wow, I am really stunned that this commit breaks
> login.
> 
> Can you send a debug level 10 log and your smb.conf file?

Still bisecting -- so I believe it's somewhere between:

95a55df021b3f112a18c64a5f5897182ae8b7df8 winbindd: Allow the AD-DC to <--- GOOD
085c7a71fde37cb2a67a4c67673cd22f8acf0861 smbd: do_lock_cancel does not <--- BAD
Comment 4 Nick Semenkovich 2014-07-17 15:37:03 UTC
Created attachment 10119 [details]
smb.conf
Comment 5 Volker Lendecke 2014-07-18 07:57:05 UTC
Any news? Do you have a full debug level 10 log of that failure?
Comment 6 Nick Semenkovich 2014-07-18 14:36:53 UTC
(In reply to comment #5)
> Any news? Do you have a full debug level 10 log of that failure?

Still bisecting. A bit of a slow process as machine is used during much of the week.

(Currently running/bisecting @ 27ef395)
Comment 7 Nick Semenkovich 2014-07-21 19:05:01 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Any news? Do you have a full debug level 10 log of that failure?
> 
> Still bisecting. A bit of a slow process as machine is used during much of the
> week.
> 
> (Currently running/bisecting @ 27ef395)

Bisected more, I think I have a level 10 log now (uploading soon).


The error is *much* more intermittent than I thought, which makes this a bit tricky. I *think* the commit is now between:

[GOOD] 5d069a04 selftest: Make the wbinfo userinfo tests work properly...
and
[ BAD] 27ef3957 smbd: Remove an outdated comment


There are four commits in that range:

https://git.samba.org/?p=samba.git;a=commitdiff;h=f3710320c
https://git.samba.org/?p=samba.git;a=commitdiff;h=af7f88721
https://git.samba.org/?p=samba.git;a=commitdiff;h=da3a798
https://git.samba.org/?p=samba.git;a=commitdiff;h=0b77cd96
Comment 8 Nick Semenkovich 2014-07-21 19:15:07 UTC
Created attachment 10137 [details]
50k lines, "log level=10", machine AIO15, domain=CORP
Comment 9 Volker Lendecke 2014-07-22 09:51:15 UTC
(In reply to comment #8)
> Created attachment 10137 [details]
> 50k lines, "log level=10", machine AIO15, domain=CORP

So this is very likely due to the change to winbindd. For some reason the SID -1133 is being seen as converted to a group ID instead of a user ID. Sorry to send you into another round, but for this we would need to see all the winbind log files (log.w*) with debug level 10. Also, "net cache list | grep 1133" could reveal some valuable information.

Thanks a lot!

Volker
Comment 10 Nick Semenkovich 2014-07-22 15:42:04 UTC
> So this is very likely due to the change to winbindd. For some reason the SID
> -1133 is being seen as converted to a group ID instead of a user ID. Sorry to
> send you into another round, but for this we would need to see all the winbind
> log files (log.w*) with debug level 10. Also, "net cache list | grep 1133"
> could reveal some valuable information.
> 
> Thanks a lot!
> 
> Volker

The *current* `net cache list` (running on the *GOOD* 5d069a0):

Key: IDMAP/SID2XID/S-1-5-21-1516742906-2559259289-1948011451-1133        Timeout: Mon Jul 28 14:05:39 2014       Value: 3000053:B  
Key: IDMAP/UID2SID/3000053       Timeout: Mon Jul 28 14:05:39 2014       Value: S-1-5-21-1516742906-2559259289-1948011451-1133  
Key: IDMAP/GID2SID/3000053       Timeout: Mon Jul 28 14:05:39 2014       Value: S-1-5-21-1516742906-2559259289-1948011451-1133


I'll attach the log.w* files (they don't show much) and the "log." file.
Comment 11 Nick Semenkovich 2014-07-22 15:47:29 UTC
Created attachment 10145 [details]
log. & log.w*
Comment 12 Volker Lendecke 2014-07-24 08:50:40 UTC
(In reply to comment #11)
> Created attachment 10145 [details]
> log. & log.w*

There isn't really much, right. If you get into this situation, is it repeatable at all? This to me seems a bit like a blocker.
Comment 13 Nick Semenkovich 2014-07-24 16:16:01 UTC
> There isn't really much, right. If you get into this situation, is it
> repeatable at all? This to me seems a bit like a blocker.

I haven't had a chance to debug more this week -- but OTOH:

- This is definitely intermittent
- When a machine is affected, it seems to be persistently broken (users can't connect to the AD controller or any shares), though users can logon (presumably with cached credentials)
- Affected machines remain broken across multiple reboots
- Affected machines sometimes recover after I revert Samba versions -- and then aren't affected later on. e.g. Computer1 breaks on HEAD -> revert to 085c7a7 & Computer1 works -> move back to HEAD and Computer1 still works fine.


(This network has 20 ~identical fully patched Win 8.1 machines.)

Let me know if there's any additional data I can provide.
Comment 14 Volker Lendecke 2014-07-25 08:31:20 UTC
This is not a client issue, it is definitely a server issue. If you have a client in such a situation and at that point the problem is repeatable, we need to capture the full call chain. smbd with debug level 10 log, net cache list, winbind with all debug level 10 logs. That's our only chance, sorry.
Comment 15 Nick Semenkovich 2014-07-25 18:36:36 UTC
(In reply to comment #14)
> This is not a client issue, it is definitely a server issue. If you have a
> client in such a situation and at that point the problem is repeatable, we need
> to capture the full call chain. smbd with debug level 10 log, net cache list,
> winbind with all debug level 10 logs. That's our only chance, sorry.

So this would be `net cache list` plus all the log files w/ a smb.conf of:

        [global]
        log level = 10
        log file = /ramcache/log.%U

Sound about right? Need anything else?
Comment 16 Volker Lendecke 2014-07-28 14:00:18 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > This is not a client issue, it is definitely a server issue. If you have a
> > client in such a situation and at that point the problem is repeatable, we need
> > to capture the full call chain. smbd with debug level 10 log, net cache list,
> > winbind with all debug level 10 logs. That's our only chance, sorry.
> 
> So this would be `net cache list` plus all the log files w/ a smb.conf of:
> 
>         [global]
>         log level = 10
>         log file = /ramcache/log.%U
> 
> Sound about right? Need anything else?

I'm not entirely sure where winbind in your configuration puts its log files, but having all those is essential too. You can bump up the debug level of a running process with "smbcontrol <pid> debug 10". That might be essential here because a winbind restart might fix the problem, not sure.
Comment 17 Nick Semenkovich 2014-09-14 18:51:36 UTC
Finally had a chance to get back to this.

This is still occurring on the latest HEAD at:

7bc2e2d0d4a6706769a436798aa29d18a8f7c221
messaging_dgm: Don't expose the messaging_dgm_context master
author	Volker Lendecke <vl@samba.org>	
Wed, 10 Sep 2014 14:13:18 +0000 (16:13 +0200)


Attaching level 10 logs from 7bc2e2d0d4a + net cache list
Comment 18 Nick Semenkovich 2014-09-14 19:08:37 UTC
Logs are very large at level 10. Full .tar.gz at: http://web.mit.edu/semenko/Public/bug-10720.tar.gz

Client machine: aio4
Username: lisa
Domain controller: runway
Comment 19 Nick Semenkovich 2014-10-02 15:34:07 UTC
I'm still seeing this as of a git head from yesterday (6c05cd3e89) and it makes samba entirely unusable, with a percentage of clients completely failing to access any domain services ("Security ID structure invalid") and the same samba logs as below.

(We remain stuck right before f3710320cef | https://git.samba.org/?p=samba.git;a=commit;h=f3710320cef475ebac561882c8fdaf8e51c8b7c3)


I've added some additional logs of this (with varying log levels 5-10, since some are a bit cleaner to browse) here:
http://web.mit.edu/semenko/Public/varying-log-levels-bug-10720.tar.gz



In addition to the "Unable to convert first SID" error, I also see some attempts to map SID S-0-0, which results in "Could not convert sid S-0-0: NT_STATUS_NONE_MAPPED" while clients are trying map a shared/mapped network drive called "DROPBOX".


Happy to debug in any other way, test patches, etc.
Comment 20 Volker Lendecke 2014-10-07 10:04:04 UTC
Andrew, the commit Nick points at is around the winbind3 work you did. Please take a look.
Comment 21 ChuanQi 2014-10-09 23:31:21 UTC
Hi,

I have encountered the same problem too. Is this problem only happen to the latest samba 4 by updating to the latest version through GIT update? Anyone try to downgrade the version to official release version and solve this issues?

This issues randomly has users and groups security tab corrupted. Is there any work around?

Regards,

CQ
Comment 22 Garming Sam 2014-10-10 04:43:17 UTC
Created attachment 10337 [details]
Gencache dump

I've managed to coincidentally run into what seems to be the same issue. For me it's not the first SID though, and it's for enterprise admins. 

It seems to be an issue with the cache. My configuration was Samba as the AD-DC and attached is the dump of the cache and to follow are secrets.tdb and idmap.tdb. 

The suspects in question are the uid to sid functions.
Comment 23 Garming Sam 2014-10-10 04:45:16 UTC
Created attachment 10338 [details]
Secrets.tdb dump
Comment 24 Garming Sam 2014-10-10 04:49:03 UTC
Created attachment 10339 [details]
idmap.ldb dump

Line 42 contains the offending SID for enterprise admins. It's marked here as ID_TYPE_BOTH.
Comment 25 Nick Semenkovich 2014-10-16 15:20:25 UTC
This remains an issue testing with yesterday's head:

> b8fce2c65b
libcli/smb: use lib/crypto SHA512 functions, do not depend on heimdal.
author	Günther Deschner <gd@samba.org>	
Mon, 13 Oct 2014 18:01:05 +0000 (20:01 +0200)
committer	Günther Deschner <gd@samba.org>	
Tue, 14 Oct 2014 11:13:08 +0000 (13:13 +0200)
Comment 26 Garming Sam 2014-10-17 03:50:17 UTC
Created attachment 10353 [details]
Temporary fix for caching issue

This is just a patch which currently disables the use of the idmap cache in some cases. This isn't really ideal generally, but if Nick wouldn't mind seeing if this fixes the issue, it would be much appreciated. 

Some basic testing on my own machine indicates that the issue no longer occurs with this patch in place. 

Nick, you'll have to remove gencache.tdb and gencache_notrans.tdb before starting up Samba and checking it works.
Comment 27 Nick Semenkovich 2014-10-17 04:42:34 UTC
Patch applied cleanly to head. Tentatively appears to be working -- will know for sure tomorrow. Thanks Garming!
Comment 28 Nick Semenkovich 2014-10-18 03:59:28 UTC
Looks like this is stable, applied atop > b8fce2c65b.
Comment 29 Andrew Bartlett 2014-11-25 00:56:03 UTC
*** Bug 10957 has been marked as a duplicate of this bug. ***
Comment 30 Andrew Bartlett 2014-11-25 00:58:18 UTC
To clear the invalid cache entry (with or without the patch), use:

net cache flush

You can see the invalid cache entry with:

net cache list

(look for SIDs that belong to groups, with type U).

We expect to have an improved patch shortly, that avoids us squashing ID_TYPE_BOTH mappings to only UID or GID mappings.
Comment 31 Karolin Seeger 2014-11-27 10:58:12 UTC
Is this a showstopper for 4.2.0?
Comment 32 Stefan Metzmacher 2014-11-29 10:10:05 UTC
This is a 4.2.0 blocker.

Garming, Andrew: should we just do with the temporary fix and clean it up later?

Are the changes in source3/winbindd/idmap_util.c needed?
idmap_backends_unixid_to_sid() may reset
map.xid.type = ID_TYPE_BOTH.

I think the pdb_interface.c changes are all we need.
Comment 33 Stefan Metzmacher 2014-12-18 15:36:06 UTC
fixed in v4-2-test
Comment 34 Andrew Bartlett 2015-01-20 11:26:44 UTC
*** Bug 11062 has been marked as a duplicate of this bug. ***