Bug 10671 - Samba file corruption as a result of failed lock check
Samba file corruption as a result of failed lock check
Status: RESOLVED FIXED
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
4.1.8
x64 Linux
: P5 critical
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-06-24 09:27 UTC by Antun Horvat
Modified: 2014-07-17 18:09 UTC (History)
4 users (show)

See Also:


Attachments
Wireshark Network Trace (198.53 KB, application/octet-stream)
2014-06-24 10:25 UTC, Antun Horvat
no flags Details
git-am fix for master. (5.79 KB, patch)
2014-06-24 21:49 UTC, Jeremy Allison
vl: review+
Details
git-am fix for 4.1.next. (5.81 KB, patch)
2014-06-24 21:50 UTC, Jeremy Allison
vl: review+
Details
git-am fix for 4.1.next with vl's cleanup, without smbtorture test. (3.35 KB, patch)
2014-06-26 10:12 UTC, David Disseldorp
jra: review+
vl: review+
metze: review+
Details
Additional patch for v4-1-test (ignores samba3.smb2.oplock.exclusive5) (857 bytes, patch)
2014-07-15 10:59 UTC, Stefan Metzmacher
jra: review+
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Antun Horvat 2014-06-24 09:27:02 UTC
There is an issue that I can reproduce ever time on 4.1.6 and 4.1.8 and probably on other smbd daemons which in certain scenarios can corrupt (truncate) files.

An example of such situation involves two PC-s (PC1 and PC2).
PC1 has installed Office 2003 and 2010, and PC2 has only wordpad installed.

Now, open the same folder on samba share with both PCs and with PC1 create empty word file "WORD". Now, with PC2 open that file with wordpad, and after that WITH Word IN PC1. 

Now, when you have in parallel both files opened, click save on wordpad which will result with error window saying that the file is already opened and used by other process


THE ISSUE HERE IS THAT SAMBA IN BACKGROUND CALLS OPEN with O_TRUNC WHICH DESTROYS ALL PAST FILE CONTENTS. 

I tried this with oplocks enabled and disabled. 

If you need logs please tell me. I did not include the logs since the problem can be easily reproduced on any machine.
Comment 1 Volker Lendecke 2014-06-24 10:14:17 UTC
HMMM. It might well be you just found a problem indeed.... :-(

Can you send a wireshark trace of the failure?
Comment 2 Antun Horvat 2014-06-24 10:25:57 UTC
Created attachment 10046 [details]
Wireshark Network Trace
Comment 3 Volker Lendecke 2014-06-24 11:46:47 UTC
This is a bug that came in with 4.1. Use 4.0 as an emergency workaround.
Comment 4 Antun Horvat 2014-06-24 11:49:36 UTC
Is there some temporary workaround for 4.1 or should I wait official patch for this bug?
Comment 5 Volker Lendecke 2014-06-24 12:10:55 UTC
Sorry, the only workaround is to downgrade to 4.0.
Comment 6 Antun Horvat 2014-06-24 12:27:52 UTC
Ok, thank you for your help
Comment 7 Jeremy Allison 2014-06-24 18:01:34 UTC
(In reply to comment #6)
> Ok, thank you for your help

Yep. This is a bad bug - we'll get a fix for you to test ASAP !

Thanks for reporting.

Jeremy.
Comment 8 Jeremy Allison 2014-06-24 21:49:25 UTC
Created attachment 10049 [details]
git-am fix for master.

Actual fix based on Volker's work but adds regression test.

The S_ISFIFO check may not be strictly necessary, but doesn't hurt (might make the code a bit more complex than it needs to be) - Volker, feel free to remove that part if you wish.

Jeremy.
Comment 9 Jeremy Allison 2014-06-24 21:50:36 UTC
Created attachment 10050 [details]
git-am fix for 4.1.next.

Back-ported from the master patch.
Comment 10 Antun Horvat 2014-06-24 23:02:40 UTC
I have tested the patch on freshly compiled 4.1.9 version, and I can confirm that this patch does resolve this issue.
Comment 11 Jeremy Allison 2014-06-24 23:11:04 UTC
(In reply to comment #10)
> I have tested the patch on freshly compiled 4.1.9 version, and I can confirm
> that this patch does resolve this issue.

Great ! Thanks for your help in reviewing. Let's wait until Volker gets a chance to review and if he's happy we'll get this into master + a release asap !

Cheers,

Jeremy.
Comment 12 Volker Lendecke 2014-06-25 08:54:13 UTC
Comment on attachment 10049 [details]
git-am fix for master.

This is certainly good to go. I've added to minor patches on top of this, sending it to samba-technical for (hopefully quick) review before pushing
Comment 13 Jeremy Allison 2014-06-25 17:45:46 UTC
Re-assigning to Karolin for inclusion in 4.1.next.

Jeremy.
Comment 14 David Disseldorp 2014-06-26 10:12:15 UTC
Created attachment 10054 [details]
git-am fix for 4.1.next with vl's cleanup, without smbtorture test.

I'd like to keep both master and the 4.1 branch consistent. Jeremy, do you mind if we merge both your fix and Volker's cleanup patch? Please see attached.
Comment 15 Jeremy Allison 2014-06-26 18:20:30 UTC
Comment on attachment 10054 [details]
git-am fix for 4.1.next with vl's cleanup, without smbtorture test.

LGTM.
Comment 16 David Disseldorp 2014-06-26 18:29:16 UTC
Comment on attachment 10050 [details]
git-am fix for 4.1.next.

Thanks for the feedback Volker and Jeremy - marking previous patch obsolete.
Comment 17 Karolin Seeger 2014-07-01 07:12:35 UTC
(In reply to comment #15)
> Comment on attachment 10054 [details]
> git-am fix for 4.1.next with vl's cleanup, without smbtorture test.
> 
> LGTM.

Pushed to autobuild-v4-1-test.
Comment 18 Guenther Deschner 2014-07-01 13:39:00 UTC
Do we know if samba versions prior 4.1.x are affected as well ?
Comment 19 Volker Lendecke 2014-07-01 13:44:12 UTC
(In reply to comment #18)
> Do we know if samba versions prior 4.1.x are affected as well ?

4.1 only
Comment 20 Guenther Deschner 2014-07-01 14:59:48 UTC
Thanks for confirmation, Volker.
Comment 21 Stefan Metzmacher 2014-07-15 10:15:42 UTC
Comment on attachment 10054 [details]
git-am fix for 4.1.next with vl's cleanup, without smbtorture test.

This fails autobuild:

[499/1593 in 41m17s] samba3.smb2.oplock(s3dc)
EXCLUSIVE5: open with exclusive oplock
second open with attributes only and NTCREATEX_DISP_OVERWRITE_IF dispostion causes oplock break
Acking to level II [0x01] in oplock handler
Acking to none [0x00] in oplock handler
Acking to none [0x00] in oplock handler
UNEXPECTED(failure): samba3.smb2.oplock.exclusive5(s3dc)
REASON: _StringException: _StringException: (../source4/torture/smb2/oplock.c:710): wrong value for break_info.count got 0x2 - should be 0x1

FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)

A summary with detailed information can be found in:
  ./bin/ab/summary
ERROR: test failed with exit code 1
Comment 22 Stefan Metzmacher 2014-07-15 10:18:59 UTC
(In reply to comment #21)
> Comment on attachment 10054 [details]
> git-am fix for 4.1.next with vl's cleanup, without smbtorture test.
> 
> This fails autobuild:
> 
> [499/1593 in 41m17s] samba3.smb2.oplock(s3dc)
> EXCLUSIVE5: open with exclusive oplock
> second open with attributes only and NTCREATEX_DISP_OVERWRITE_IF dispostion
> causes oplock break
> Acking to level II [0x01] in oplock handler
> Acking to none [0x00] in oplock handler
> Acking to none [0x00] in oplock handler
> UNEXPECTED(failure): samba3.smb2.oplock.exclusive5(s3dc)
> REASON: _StringException: _StringException:
> (../source4/torture/smb2/oplock.c:710): wrong value for break_info.count got
> 0x2 - should be 0x1
> 
> FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)
> 
> A summary with detailed information can be found in:
>   ./bin/ab/summary
> ERROR: test failed with exit code 1

We need to backport something like:

commit 20669d4a75386eef4fdcea07fb99812c4e09de13
Author: Volker Lendecke <vl@samba.org>
Date:   Thu Sep 26 16:15:31 2013 -0700

    smbd: Fix raw.batch.exclusive[59]
    
    The level we have to break to depend on the breakers create_disposition:
    If we overwrite, we have to break to none.
    
    This patch overloads the "op_type" field in the break message we send
    across to the smbd holding the oplock with the oplock level we want to
    break to. Because it depends on the create_disposition in the breaking
    open, only the breaker can make that decision. We might want to use
    a different mechanism for this in the future, but for now using the
    op_type field seems acceptable to me.
    
    Signed-off-by: Volker Lendecke <vl@samba.org>
    Reviewed-by: Stefan Metzmacher <metze@samba.org>

This was part of a larger patchset:
git log -15 cdd232cc06a5652ad9f6800d5baf017632099cd8

Or we decide to ignore the failure...
Comment 23 Volker Lendecke 2014-07-15 10:45:04 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > Comment on attachment 10054 [details] [details]
> > git-am fix for 4.1.next with vl's cleanup, without smbtorture test.
> > 
> > This fails autobuild:
> > 
> > [499/1593 in 41m17s] samba3.smb2.oplock(s3dc)
> > EXCLUSIVE5: open with exclusive oplock
> > second open with attributes only and NTCREATEX_DISP_OVERWRITE_IF dispostion
> > causes oplock break
> > Acking to level II [0x01] in oplock handler
> > Acking to none [0x00] in oplock handler
> > Acking to none [0x00] in oplock handler
> > UNEXPECTED(failure): samba3.smb2.oplock.exclusive5(s3dc)
> > REASON: _StringException: _StringException:
> > (../source4/torture/smb2/oplock.c:710): wrong value for break_info.count got
> > 0x2 - should be 0x1
> > 
> > FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)
> > 
> > A summary with detailed information can be found in:
> >   ./bin/ab/summary
> > ERROR: test failed with exit code 1
> 
> We need to backport something like:
> 
> commit 20669d4a75386eef4fdcea07fb99812c4e09de13
> Author: Volker Lendecke <vl@samba.org>
> Date:   Thu Sep 26 16:15:31 2013 -0700
> 
>     smbd: Fix raw.batch.exclusive[59]
> 
>     The level we have to break to depend on the breakers create_disposition:
>     If we overwrite, we have to break to none.
> 
>     This patch overloads the "op_type" field in the break message we send
>     across to the smbd holding the oplock with the oplock level we want to
>     break to. Because it depends on the create_disposition in the breaking
>     open, only the breaker can make that decision. We might want to use
>     a different mechanism for this in the future, but for now using the
>     op_type field seems acceptable to me.
> 
>     Signed-off-by: Volker Lendecke <vl@samba.org>
>     Reviewed-by: Stefan Metzmacher <metze@samba.org>
> 
> This was part of a larger patchset:
> git log -15 cdd232cc06a5652ad9f6800d5baf017632099cd8
> 
> Or we decide to ignore the failure...

To be honest -- here I think we should opt to add this to knownfail. During my leases work I already found quite a few cases where Samba breaks oplocks differently from Windows, this would add just one more. Backporting all those fixes is beyond this bug I guess.
Comment 24 Stefan Metzmacher 2014-07-15 10:58:26 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > (In reply to comment #21)
> > > Comment on attachment 10054 [details] [details] [details]
> > > git-am fix for 4.1.next with vl's cleanup, without smbtorture test.
> > > 
> > > This fails autobuild:
> > > 
> > > [499/1593 in 41m17s] samba3.smb2.oplock(s3dc)
> > > EXCLUSIVE5: open with exclusive oplock
> > > second open with attributes only and NTCREATEX_DISP_OVERWRITE_IF dispostion
> > > causes oplock break
> > > Acking to level II [0x01] in oplock handler
> > > Acking to none [0x00] in oplock handler
> > > Acking to none [0x00] in oplock handler
> > > UNEXPECTED(failure): samba3.smb2.oplock.exclusive5(s3dc)
> > > REASON: _StringException: _StringException:
> > > (../source4/torture/smb2/oplock.c:710): wrong value for break_info.count got
> > > 0x2 - should be 0x1
> > > 
> > > FAILED (1 failures, 0 errors and 0 unexpected successes in 0 testsuites)
> > > 
> > > A summary with detailed information can be found in:
> > >   ./bin/ab/summary
> > > ERROR: test failed with exit code 1
> > 
> > We need to backport something like:
> > 
> > commit 20669d4a75386eef4fdcea07fb99812c4e09de13
> > Author: Volker Lendecke <vl@samba.org>
> > Date:   Thu Sep 26 16:15:31 2013 -0700
> > 
> >     smbd: Fix raw.batch.exclusive[59]
> > 
> >     The level we have to break to depend on the breakers create_disposition:
> >     If we overwrite, we have to break to none.
> > 
> >     This patch overloads the "op_type" field in the break message we send
> >     across to the smbd holding the oplock with the oplock level we want to
> >     break to. Because it depends on the create_disposition in the breaking
> >     open, only the breaker can make that decision. We might want to use
> >     a different mechanism for this in the future, but for now using the
> >     op_type field seems acceptable to me.
> > 
> >     Signed-off-by: Volker Lendecke <vl@samba.org>
> >     Reviewed-by: Stefan Metzmacher <metze@samba.org>
> > 
> > This was part of a larger patchset:
> > git log -15 cdd232cc06a5652ad9f6800d5baf017632099cd8
> > 
> > Or we decide to ignore the failure...
> 
> To be honest -- here I think we should opt to add this to knownfail. During my
> leases work I already found quite a few cases where Samba breaks oplocks
> differently from Windows, this would add just one more. Backporting all those
> fixes is beyond this bug I guess.

Sounds ok for me.
Comment 25 Stefan Metzmacher 2014-07-15 10:59:39 UTC
Created attachment 10110 [details]
Additional patch for v4-1-test (ignores samba3.smb2.oplock.exclusive5)
Comment 26 Jeremy Allison 2014-07-15 19:30:58 UTC
Comment on attachment 10110 [details]
Additional patch for v4-1-test (ignores samba3.smb2.oplock.exclusive5)

I'm OK with this for 4.1.next.
Comment 27 Jeremy Allison 2014-07-15 19:31:26 UTC
Karolin, please try and push again for 4.1.next - thanks !

Jeremy.
Comment 28 Karolin Seeger 2014-07-17 06:52:25 UTC
(In reply to comment #27)
> Karolin, please try and push again for 4.1.next - thanks !
> 
> Jeremy.

Pushed additional patch to autobuild-v4-1-test.
Thanks for investigating!
Comment 29 Karolin Seeger 2014-07-17 18:09:16 UTC
Pushed to v4-1-test.
Closing out bug report.

Thanks!