Bug 12833 - Profile (ntuser.dat) locked "forever" when shutting down but not when logging off then shutting down
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
Reported: 2017-06-10 01:46 UTC by Jobst Schmalenbach
Modified: 2018-01-13 08:11 UTC
Description Jobst Schmalenbach 2017-06-10 01:46:53 UTC
I have had this problem in both the 3.6.X and 4.2.X streams of Samba.
I am also not the only one having this problem; it is reported on a variety of distros, e.g. Ubuntu and CentOS.

It is really annoying when

 * you have to reboot after an (un)install
 * you have to reboot after an upgrade
 * you shut down, then move to another workstation to continue working there

The behavior also differs between the two cases:

 * when you SHUTDOWN while logged in, then restart and log in again, you get
   the dreaded "preparing your profile" and are logged in with a temporary
   profile. This is because ntuser.dat stays locked for a long time; the lock
   can last forever, or until you run "/etc/rc.d/init.d/smb reload", which
   frees it.

 * when you LOGOUT then SHUTDOWN, the ntuser.dat file is unlocked about 5s
   later and there is no problem with logging in again.

I consider this a bug because the locking behaves differently depending on whether you shut down directly or log out and then shut down.

I also consider it a bug because it is time-dependent. When you shut down in the evening, come back the next morning and log in, there is no problem, as the lock will be gone by then. I have observed oplocks on ntuser.dat for more than three hours by running "lsof | grep -i USERNAME | grep smbd" or "lsof | grep -i ntuser" - I had to go home then.
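
The lsof check above could be scripted roughly as follows. This is only a sketch, not part of the original report: it assumes lsof's machine-readable field output ("-F pcn", one field per line, prefixed with p for PID, c for command, n for file name), and the function names parse_lsof_fields and smbd_holders are hypothetical. lsof usually needs root to see files opened by smbd.

```python
import subprocess


def parse_lsof_fields(text, needle="ntuser.dat", command="smbd"):
    """Return sorted PIDs of `command` processes holding a file matching `needle`.

    `text` is expected to be the output of `lsof -F pcn`.
    """
    needle = needle.lower()
    pids = set()
    pid = None
    cmd = None
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":        # a new process set starts here
            pid, cmd = int(value), None
        elif tag == "c":      # command name of the current process
            cmd = value
        elif tag == "n" and cmd == command and needle in value.lower():
            pids.add(pid)     # an open file whose name matches
    return sorted(pids)


def smbd_holders(needle="ntuser.dat"):
    """Run lsof for smbd processes and return the PIDs holding `needle`."""
    # lsof exits non-zero when nothing matches, so the return code is ignored.
    result = subprocess.run(
        ["lsof", "-c", "smbd", "-F", "pcn"],
        capture_output=True,
        text=True,
    )
    return parse_lsof_fields(result.stdout, needle)
```

Calling smbd_holders() on the server after the client has shut down would list the smbd PIDs that still have an ntuser.dat open; an empty list means the file has been released.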

I also consider shutting down while logged in to be essentially the same as logging out and then shutting down.

I know you can also set "oplocks=no" to work around this problem, but then why is it OK when logging out and then shutting down? Also, there is nothing wrong with locks; I can see there is a lot of merit in them. For example, a person logs in at one workstation, then goes to another station and works there, shuts down, and continues to work on the first station - this has never led to any problems over the last ten years .... due to locks.

IMHO a cleaner should be running to clean up these things, i.e. the cleaner is started when the person logs out and makes sure everything is in order ... 

I have also done this test and there was no problem or corruption of the files involved:

 * logged in and waited for everything to work
 * just opened one file, made a change, saved
 * then shutdown and started the machine again
 * found the PID of the process holding on to NTUSER.DAT
 * gave it a whack with "kill -9 PID"
 * logged in

There was absolutely no problem ...

Why hold on to the lock in the first place?
Comment 1 Jobst Schmalenbach 2017-06-10 01:50:10 UTC
Forgot to add:

* https://lists.samba.org/archive/samba/2017-January/206133.html
* mailing list samba@lists.samba.org,
  Subject Domain Logout, then domain login again, profile corrupt -> replaced by TEMP profile
  date: 9.06.2017
Comment 2 Stefan Metzmacher 2017-06-12 08:20:52 UTC
I guess the problem is that we send an oplock/lease break to the old connection
and don't get an ack in time. As the TCP timeouts are too long to detect the
broken connection and Windows may not send a TCP RST, smbd believes the connection
is still connected and just downgrades the oplock while keeping the file
open, which causes the NT_STATUS_SHARING_VIOLATION.

The work towards multi-channel support will hopefully fix that,
as it will detect the broken connection much sooner and close the file.
Comment 3 Stefan Metzmacher 2017-06-12 08:25:42 UTC
(In reply to Stefan Metzmacher from comment #2)

Something like the following in the [global] section:

socket options = TCP_KEEPCNT=4 TCP_KEEPIDLE=240 TCP_KEEPINTVL=15

might detect the broken connection sooner.
It starts sending the first TCP keepalive after
being idle for 4 minutes (240s) and continues to send
3 additional keepalives every 15s until the broken connection
is finally detected after 5 minutes.
Comment 4 Stefan Metzmacher 2017-06-12 08:35:48 UTC
(In reply to Stefan Metzmacher from comment #3)

I guess it's giving up after 4 minutes and 45 seconds...

Depending on how fast your machines reboot, you may need to adjust the values.
The lowest useful values would be:

socket options = TCP_KEEPCNT=5 TCP_KEEPIDLE=30 TCP_KEEPINTVL=1

As the OPLOCK_BREAK_TIMEOUT is 30 seconds and smbd forces
a downgrade after OPLOCK_BREAK_TIMEOUT*2.
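
The timing of the two "socket options" settings from comments 3 and 4 can be worked out as below. This is a sketch under an assumption that is not stated in the thread: Linux keepalive behaviour as described in tcp(7), where the first probe goes out after TCP_KEEPIDLE idle seconds, one probe follows every TCP_KEEPINTVL seconds, and the connection is dropped one interval after the TCP_KEEPCNT-th unanswered probe. The helper names are hypothetical.

```python
OPLOCK_BREAK_TIMEOUT = 30  # seconds, as quoted in comment 4


def last_probe_seconds(idle, interval, count):
    """Idle time at which the final keepalive probe is sent."""
    return idle + interval * (count - 1)


def detection_seconds(idle, interval, count):
    """Idle time at which the kernel gives up on the connection."""
    return idle + interval * count


settings = {
    "comment 3": (240, 15, 4),  # TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT
    "comment 4": (30, 1, 5),
}

for label, (idle, interval, count) in settings.items():
    dead = detection_seconds(idle, interval, count)
    print(
        f"{label}: last probe at {last_probe_seconds(idle, interval, count)}s, "
        f"connection dropped at {dead}s, "
        f"within 2*OPLOCK_BREAK_TIMEOUT: {dead <= 2 * OPLOCK_BREAK_TIMEOUT}"
    )
```

Under these assumptions the first setting sends its last probe after 285s (4 minutes 45 seconds) and drops the connection after 300s (5 minutes), while the second drops it after 35s, which is inside the 60s OPLOCK_BREAK_TIMEOUT*2 window.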
Comment 5 Jobst Schmalenbach 2017-06-14 01:10:54 UTC
(In reply to Stefan Metzmacher from comment #2)

I know that Roaming Profiles can be a) a PITA and b) different sizes depending on what people store in them. So far I have succeeded in storing almost everything in the HOME share ... but there is always that one stupid programmer using a hard-coded path, and I cannot win - meaning loads of data stored in the profile. I know some games that do this :-(((((

Because of the different sizes, Windows needs to get more or less data across when logging off (which is part of shutting down), which is why the entire process does not always take the same time - which, granted, makes it tricky.

I would guess that Windows has some sort of "hey I am finished now" flag that is sent from the workstation to the server after logoff - so why not capture this flag and release the lock?

Why does the server need to keep the lock? A logoff is a logoff, meaning that the user does not want to be logged in anymore. Also once you click that button for logoff there is no going back ....

I checked what happens to the file (lsof). When you log off, the lock is released immediately ... when you shut down, it is not? That seems strange to me.
Comment 6 Stefan Metzmacher 2017-06-14 01:44:36 UTC
(In reply to Jobst Schmalenbach from comment #5)

If the client really sends the logoff, everything is fine.
The problem comes when the client reboots and neither
sends a logoff nor closes the TCP connection.
Comment 7 Sysadmin HTL-Leonding 2018-01-12 21:56:56 UTC
(In reply to Stefan Metzmacher from comment #6)

When you shut down client 1 and then try to log in at client 2, the profile is still locked long after client 1 has been shut down.

This issue seems to cause trouble for many people (see links below). On TechNet some people report that this also happens when the server is a Windows Server 2012 R2 (I haven't verified this myself) - so this seems to be an issue with at least Windows 10, occurring when it is shut down or restarted on fast computers (with an SSD).

Some people claim that a shutdown script sleeping for about 15 seconds has helped them (Windows 10 probably closes the connection then/releases the locks).

Tried it with Samba 4.7.4:
After shutting down a Windows 10 1703 x64 client with current monthly rollups (fast startup disabled, without the workaround mentioned above), a
netstat -a -n -o | grep IP_OF_SHUTDOWN_COMPUTER
on the server still shows the connection in the ESTABLISHED state with a very long keepalive timeout. The connection stayed visible in netstat on the server for much longer than OPLOCK_BREAK_TIMEOUT * 2 (if it really defaults to 30 seconds).

I don't know whether the suggested TCP_KEEPALIVE options have any negative side effects (e.g. for Windows clients going to standby/hibernate without releasing the locks), but they seem to help for real shutdowns/reboots. After the keepalive has expired, the connection is closed, the lock is freed and you can log in on the next client without any noticeable issues.

If they do have negative side effects, maybe it would be possible to create an smb.conf option that applies these keepalive options only to connections to the profile share? Users having trouble with standby/hibernate clients could have them use another share path without the short keepalive delay.

The best solution would be for Microsoft to fix this on the Windows clients, but nobody knows whether that will happen.

https://forge.univention.org/bugzilla/show_bug.cgi?id=41759
https://lists.samba.org/archive/samba/2017-January/206235.html
https://social.technet.microsoft.com/Forums/en-US/ba54df5f-4356-4cee-b724-303aaad51266/file-locks-persisting-after-shutdownrestart-causes-serious-roaming-profile-redirected-folder?forum=win10itprogeneral
https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1392647