Bug 9058 - Files not deleted, smbstatus shows "Segmentation fault"
Files not deleted, smbstatus shows "Segmentation fault"
Status: RESOLVED FIXED
Product: Samba 3.6
Classification: Unclassified
Component: File services
3.6.6
x86 Windows 7
: P5 critical
: ---
Assigned To: Karolin Seeger
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-23 09:49 UTC by Georg Weickelt
Modified: 2012-08-30 06:29 UTC (History)
9 users (show)

See Also:


Attachments
smbstatus segfaulting - strace of (122.99 KB, text/plain)
2012-08-01 22:24 UTC, Craig Falconer
no flags Details
valgrind (199.45 KB, text/plain)
2012-08-07 13:24 UTC, maurer
no flags Details
wireshark try to delete paper.pdf (23.35 KB, application/octet-stream)
2012-08-07 13:26 UTC, maurer
no flags Details
debug 10 trying to delte paper.pdf (403.85 KB, text/plain)
2012-08-07 13:27 UTC, maurer
no flags Details
valgrind smbstatus with debug rpm installed (185.11 KB, text/plain)
2012-08-07 19:25 UTC, maurer
no flags Details
.zip of open_file binary (+source) (24.52 KB, application/octet-stream)
2012-08-21 21:42 UTC, Ken Harris
no flags Details
Raw patch that fixes the problem here. (1.97 KB, patch)
2012-08-22 00:21 UTC, Jeremy Allison
no flags Details
Updated fix that correctly types 'i' as int, not uint32_t. (1.96 KB, patch)
2012-08-22 00:24 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.next. (2.39 KB, patch)
2012-08-22 18:53 UTC, Jeremy Allison
vl: review+
Details
wireshark trace on samba server (314.71 KB, application/octet-stream)
2012-08-23 07:09 UTC, maurer
no flags Details
debug10 file log.%m (4.06 MB, text/plain)
2012-08-23 07:15 UTC, maurer
no flags Details
debug10 log.smbd (376.02 KB, text/plain)
2012-08-23 07:16 UTC, maurer
no flags Details
Patch (8.75 KB, patch)
2012-08-23 14:16 UTC, Volker Lendecke
no flags Details
git-am version of Volker's fix with metze's additional change. (8.63 KB, patch)
2012-08-23 16:47 UTC, Jeremy Allison
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Georg Weickelt 2012-07-23 09:49:23 UTC
I have some problems running samba 3.6.6-115.1-2824-SUSE-SL12.1-x86_64. Some files are not deletable via Windows.
Smbstatus shows some lines and last line is: "Segmentation fault (core dumped)"
There are other users with similiar problems, see: https://lists.samba.org/archive/samba/2012-July/168342.html
Comment 1 Georg Weickelt 2012-07-23 11:53:45 UTC
Without knowing what ist means, the stacktrace of smbstatus:

#0  0x00007ffb7bdfb121 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007ffb7d647ba5 in traverse_fn (_state=0x7fffabae4110, rec=<optimized out>) at locking/locking.c:1768
#2  traverse_fn (rec=<optimized out>, _state=0x7fffabae4110) at locking/locking.c:1735
#3  0x00007ffb7d682ffc in db_tdb_traverse_read_func (tdb=<optimized out>, kbuf=..., dbuf=..., private_data=<optimized out>) at lib/dbwrap_tdb.c:282
#4  0x00007ffb7c11197c in ?? () from /usr/lib64/libtdb.so.1
#5  0x00007ffb7c111aec in tdb_traverse_read () from /usr/lib64/libtdb.so.1
#6  0x00007ffb7d683246 in db_tdb_traverse_read (db=0x7ffb7d99d2c0, f=0x7ffb7d647af0 <traverse_fn>, private_data=0x7fffabae4110) at lib/dbwrap_tdb.c:296
#7  0x00007ffb7d64ba3b in share_mode_forall (fn=<optimized out>, private_data=<optimized out>) at locking/locking.c:1794
#8  0x00007ffb7d646ecb in main (argc=<optimized out>, argv=<optimized out>) at utils/status.c:473
Comment 2 Craig Falconer 2012-08-01 22:24:37 UTC
Created attachment 7734 [details]
smbstatus segfaulting - strace of
Comment 3 Craig Falconer 2012-08-01 22:34:37 UTC
We've had the same kind of issue here with Samba 3.6.6 compiled from source.

smbstatus -L or -B produce a segfault as per my attachment.

Deleting /var/lib/samba/locking.tdb and restarting samba fixes it for a short time but the issues return within hours.

Windows clients see symptoms like:
* I delete a file and it goes away but immediately reappears on a refresh.
* Sometimes the next day the file has actually deleted.
* User could not reliably do an import with MS excel, possibly related to temporary files being created but not deleted.


I can provide copies of a broken and a working locking.tdb if required - please advise.

# ll `locate locking.tdb`
-rw-r--r-- 1 root root 860160 Aug  2 10:31 /var/lib/samba/locking.tdb
-rw-r--r-- 1 root root  40200 Aug  2 10:02 /var/lib/samba/save/locking.tdb

# file `locate locking.tdb`
/var/lib/samba/locking.tdb:      TDB database version 6, little-endian hash size 10007 bytes
/var/lib/samba/save/locking.tdb: TDB database version 6, little-endian hash size 10007 bytes

(where save is the borked one)


Downgrading to samba3x-3.5.10-0.109.el5_8  from elrepo seems to have fixed the problem.
Comment 4 Georg Weickelt 2012-08-02 05:22:11 UTC
see also this description:
https://lists.samba.org/archive/samba/2012-August/168515.html
Comment 5 Huibert 2012-08-03 09:36:49 UTC
Same here,
seems locking error occurrs when opened Word/Excel which give problems re-saving
document (typical message, file in use, can not ovewrite error while saving. Depends on the choice of oplock-settings).
After this, smbstatus crashes.
Re-witten or attempted re-written Word/Excel files tend to vanish from the server after you would stop or re-run smbd. The process itself (smbd) keeps running when experiencing the problem and keeps operating; no crash.

Info: 
Linux 3.2.5-3.fc16.x86_64 #1 SMP Thu Feb 9 01:24:38 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
smbd 3.6.6 built from sources.

kill(29325, SIG_0)                      = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2917, ...}) = 0
write(1, "29325        99         DENY_NON"..., 14229325        99         DENY_NONE  0x100080    RDONLY     NONE             /xx/xx/xx/xx Wed Aug  1 18:33:02 2012
) = 142
kill(29325, SIG_0)                      = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2917, ...}) = 0
write(1, "29325        99         DENY_NON"..., 14229325        99         DENY_NONE  0x100080    RDONLY     NONE             /xx/xx/xx/xx   Wed Aug  1 13:40:10 2012
) = 142
--- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f9fea5feb5c} (Segmentation fault) ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)


smb.conf:
[global]
        kernel oplocks = no
        oplocks = no
        level2 oplocks = no

# but any choice here does not improve...
Comment 6 Huibert 2012-08-03 09:38:43 UTC
Additional info:
max protocol = SMB2 + x64 system
Comment 7 maurer 2012-08-05 19:30:02 UTC
Hi

I can confirm both problems

Centos-6.3 x86_64
samba-3.6.6 (compiled from source)
security=ADS
max protocol= smb2

We have updated from 3.6.4 to 3.6.6  
and after the update smbstatus sometimes crashes with
 
Aug  5 20:36:21 rmc-euphrat kernel: smbstatus[29512]: segfault at 7f7e036aaab0 ip 00007f7d08830b5f sp 00007fff2d9e80a8 error 4 in libc-2.12.so[7f7d086ff000+189000]

and sometimes files could not be deleted by the user any more


Regards

Hansjörg
Comment 8 maurer 2012-08-05 19:38:16 UTC
one additional comment

I have been able to reproduce the problem with a Latex GUI  TeXnicCenter on a testsystem .
With TeXnicCenter I was unable to write to a existing file any more.

Disabling smb2 on the testsystem let the write succeed

Regards

Hansjörg
Comment 9 Volker Lendecke 2012-08-06 08:25:07 UTC
Can you run that under valgrind?
Comment 10 Georg Weickelt 2012-08-07 05:30:44 UTC
No, I'm sorry-I do not know how valgrid works.
Comment 11 Volker Lendecke 2012-08-07 06:26:30 UTC
Just install valgrind (possibly via "zypper install valgrind"), and run

valgrind smbstatus

and post the resulting output.
Comment 12 maurer 2012-08-07 13:22:34 UTC
Hi Volker

sorry for the delay, I am on vacation and I needed a user to reproduce it on a testsystem.
But now I have a valgrind run (two with different options) and a debug 10 and wireshark output,

The interesting point was, that the user initially was able to delete the file
paper.pdf.
After the core dump of smbstatus he could not delete the file any more.
It disappears, but after an F5 it is back again (Windows7)

The debug 10 and wireshark is from this deletion attempt

Regrads

Hnasjörg
Comment 13 maurer 2012-08-07 13:24:18 UTC
Created attachment 7741 [details]
valgrind

two runs
valgrind smbstatus 
and 
valgrind --tool=memcheck --leak-check=full -v --num-callers=20 --trace-children=yes --leak-check=full --show-reachable=yes smbstatus
Comment 14 maurer 2012-08-07 13:26:17 UTC
Created attachment 7742 [details]
wireshark try to delete  paper.pdf
Comment 15 maurer 2012-08-07 13:27:40 UTC
Created attachment 7743 [details]
debug 10 trying to delte paper.pdf
Comment 16 Volker Lendecke 2012-08-07 15:20:07 UTC
(In reply to comment #13)
> Created attachment 7741 [details]
> valgrind
> 
> two runs
> valgrind smbstatus 
> and 
> valgrind --tool=memcheck --leak-check=full -v --num-callers=20
> --trace-children=yes --leak-check=full --show-reachable=yes smbstatus

Crap, no debug symbols. Can you install the debug package for smbstatus (wherever that might be) as well?
Comment 17 Huibert 2012-08-07 15:33:08 UTC
Loaded 3.6.7 from sources: solved my problem, have tested it now for half a day.
No more smbstatus crashes and using "level2 oplocks = yes" sufficient for Word/Excel, it seems.
Rgds. Huibert.
Comment 18 maurer 2012-08-07 19:25:57 UTC
Created attachment 7744 [details]
valgrind smbstatus with debug rpm installed

I could trigger the error only after I have deleted a file in the share
Bevor it, smbstatus works fine
Comment 19 Georg Weickelt 2012-08-16 09:54:21 UTC
It also happens with version 3.6.7 and  deactivated SMB2 (maybe not so often). So there is no workaround at this time.
Comment 20 Christian Ambach 2012-08-16 11:31:59 UTC
Would be interesting to know if this is a regression in the 3.6.x series. Can somebody test with an earlier version (e.g. 3.6.5)?
Comment 21 maurer 2012-08-16 11:34:12 UTC
with 3.6.5 it works fine
The problem starts with 3.6.6
We downgraded to 3.6.5 again
Comment 22 Dmitry Miksir 2012-08-16 16:24:34 UTC
Look like I got same issue. It's reproduce very easy for me: create new xlsx file, open this file, close this file (even without save), try to delete file.
If I try to access this "deleted" file using smbclient - error message NT_STATUS_DELETE_PENDING.
File physically deleted only after restart smb.
Tested on 3.6.7. 
3.6.5 work nice.
Windows 7 x64, office 2010, samba-based domain.

smb.conf

[global]
        dos charset = CP866
        workgroup = *****
        server string = File Server BELKA
        interfaces = eth0, lo
        bind interfaces only = Yes
        passdb backend = ldapsam:ldap://ldap.*****.int/
        log level = 3
        syslog = 0
        log file = /var/log/samba/log.%m
        max log size = 50000
        debug pid = Yes
        debug uid = Yes
        max protocol = SMB2
        time server = Yes
        add machine script = sudo /usr/sbin/smbldap-useradd -w "%u"
        logon path = \\%N\profiles\%U\%a
        logon drive = H:
        logon script = logon.cmd %U
        domain logons = Yes
        os level = 240
        preferred master = Yes
        domain master = Yes
        wins support = Yes
        ldap admin dn = cn=samba,dc=*****
        ldap delete dn = Yes
        ldap group suffix = ou=groups
        ldap idmap suffix = ou=idmap
        ldap machine suffix = ou=people
        ldap passwd sync = yes
        ldap suffix = ou=samba,dc=*****
        ldap ssl = no
        ldap user suffix = ou=people
        admin users = root
        create mask = 0700
        directory mask = 0700
        map acl inherit = Yes
        nt acl support = Yes
        ea support = yes
        hide dot files = No
        hide special files = Yes
        #hide unreadable = Yes
        delete veto files = Yes
        veto files = /*.eml/*.nws/*.{*}/._*/.DS_Store/
        #veto oplock files = /*.doc/*.xls/*.docx/*.xlsx/*.DOC/*.XLS/*.DOCX/*.XLSX/
        map archive = No
        map readonly = no
        map system = no
        #store dos attributes = yes
        follow symlinks = No
        profile acls = No
        #socket options = TCP_NODELAY SO_SNDBUF=65536 SO_RCVBUF=65536 IPTOS_LOWDELAY
        load printers = yes
        printing = cups
        printcap name = cups
        disable spoolss = no
        cups options = raw
        #use client driver = yes
        cups server = localhost:631

[homes]
        comment = Home Directories
        read only = No
        browseable = No


[profiles]
        path = /storage/profiles
        read only = No
        store dos attributes = Yes
        create mask = 0600
        directory mask = 0700
        browseable = no
        guest ok = no
        printable = no
        profile acls = yes
        csc policy = disable

[netlogon]
        comment = The domain logon service
        path = /var/lib/samba/netlogon
        public = no
        writeable = no
        browsable = no

[printers]
        comment = All Printers
        path = /var/spool/samba
        use client driver = yes
        printable = yes
        print ok = yes
        browseable = yes
        public = yes
        guest ok = yes
        writable = no
        read only = yes


[print$]
        comment = Printer Drivers
        path = /var/lib/samba/drivers
        browseable = yes
        guest ok = no
        read only = yes
        write list = root, miksir

[Projects]
        path = /storage/projects
        read only = No
        vfs objects = full_audit
        vfs_full_audit:success = aio_write chflags ftruncate mkdir mknod pwrite rename rmdir unlink write
        vfs_full_audit:failure = none
        full_audit:success = aio_write chflags ftruncate mkdir mknod pwrite rename rmdir unlink write
        full_audit:failure = none
        full_audit:facility = local5
        invalid users = backup

Any kind of play with options (disable oplock, full_audit etc) not helps.

Share use ACL's

getfacl /storage/projects
getfacl: Removing leading '/' from absolute path names
# file: storage/projects
# owner: miksir
# group: Domain\040Users
user::rwx
group::---
group:Domain\040Users:rwx
mask::rwx
other::---
default:user::rwx
default:group::---
default:group:Domain\040Users:rwx
default:mask::rwx
default:other::---

Also I have lvl10 log of this situation, can mail it.
Comment 23 Dmitry Miksir 2012-08-16 16:25:52 UTC
PS: I'm used this RPM 
ftp://ftp.sernet.de/pub/samba/3.6/centos/6/i386
Comment 24 maurer 2012-08-20 08:22:29 UTC
Steps to reproduce

samba 3.6.6 und Cents-6.3 x_86_64
Client Windows 7 x64 with MS Excel 2010

smb.conf below

* Connect with MS Explorer to samba server as user maurerh to \\ftp\tmp\maurerh
* RightClick "New Microsoft Excel Sheet" (Translatet from germany"

[root@rmc-donau x86_64]# smbstatus

Samba version 3.6.6-itsd.el6
PID     Username      Group         Machine
-------------------------------------------------------------------
11375     maurerh       dlr_maurerh_p  129.247.189.18 (129.247.189.18)
11376     maurerh       dlr_maurerh_p  10.97.0.152  (10.97.0.152)

Service      pid     machine       Connected at
-------------------------------------------------------
tmp          11376   10.97.0.152   Mon Aug 20 10:11:59 2012
ftp          11375   129.247.189.18  Mon Aug 20 10:11:58 2012

Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   outgoing   Mon Aug 20 10:12:00 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:11:59 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:12:00 2012
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:17:44 2012
11376        7740       DENY_NONE  0x120196    WRONLY     BATCH            /home_local/tmp   maurerh/Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:17:47 2012
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   .   Mon Aug 20 10:11:59 2012
11376        7740       DENY_ALL   0x100080    RDONLY     NONE             /home_local/tmp   .   Mon Aug 20 10:13:11 2012


* RMB on the Excel File -> Open

[root@rmc-donau x86_64]# smbstatus

Samba version 3.6.6-itsd.el6
PID     Username      Group         Machine
-------------------------------------------------------------------
11375     maurerh       dlr_maurerh_p  129.247.189.18 (129.247.189.18)
11376     maurerh       dlr_maurerh_p  10.97.0.152  (10.97.0.152)

Service      pid     machine       Connected at
-------------------------------------------------------
tmp          11376   10.97.0.152   Mon Aug 20 10:11:59 2012
ftp          11375   129.247.189.18  Mon Aug 20 10:11:58 2012

Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   outgoing   Mon Aug 20 10:12:00 2012
11376        7740       DENY_WRITE 0x13019f    RDWR       BATCH            /home_local/tmp   maurerh/~$Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:11:59 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:12:00 2012
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:17:44 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_WRITE 0x12019f    RDWR       NONE             /home_local/tmp   maurerh/Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh/Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh/Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh/Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   .   Mon Aug 20 10:11:59 2012

* just print to the save Icon

[root@rmc-donau x86_64]# smbstatus

Samba version 3.6.6-itsd.el6
PID     Username      Group         Machine
-------------------------------------------------------------------
11375     maurerh       dlr_maurerh_p  129.247.189.18 (129.247.189.18)
11376     maurerh       dlr_maurerh_p  10.97.0.152  (10.97.0.152)

Service      pid     machine       Connected at
-------------------------------------------------------
tmp          11376   10.97.0.152   Mon Aug 20 10:11:59 2012
ftp          11375   129.247.189.18  Mon Aug 20 10:11:58 2012

Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   outgoing   Mon Aug 20 10:12:00 2012
11376        7740       DENY_WRITE 0x13019f    RDWR       BATCH            /home_local/tmp   maurerh/~$Microsoft Excel-Arbeitsblatt (neu).xlsx   Mon Aug 20 10:18:21 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:11:59 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:12:00 2012
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:17:44 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012
Segmentation fault (core dumped)


* Close Excel (*.tmp and Excel File remain)

* Delete the Files in Explorer, Press F5, *.tmp reappears

* restart smbd -> file disappears



[root@rmc-donau x86_64]#  valgrind smbstatus                                                                                       
==11536== Memcheck, a memory error detector                                                                                        
==11536== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.                                                          
==11536== Using Valgrind-3.6.0 and LibVEX; rerun with -h for copyright info                                                        
==11536== Command: smbstatus                                                                                                       
==11536==                                                                                                                          

Samba version 3.6.6-itsd.el6
PID     Username      Group         Machine                        
-------------------------------------------------------------------
11375     maurerh       dlr_maurerh_p  129.247.189.18 (129.247.189.18)
11376     maurerh       dlr_maurerh_p  10.97.0.152  (10.97.0.152)     

Service      pid     machine       Connected at
-------------------------------------------------------
tmp          11376   10.97.0.152   Mon Aug 20 10:11:59 2012
ftp          11375   129.247.189.18  Mon Aug 20 10:11:58 2012

Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   outgoing   Mon Aug 20 10:12:00 2012
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:11:59 2012       
11375        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/ftp   .   Mon Aug 20 10:12:00 2012       
11376        7740       DENY_NONE  0x100081    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:17:44 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:18:21 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:20:44 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:20:44 2012 
11376        7740       DENY_NONE  0x100080    RDONLY     NONE             /home_local/tmp   maurerh   Mon Aug 20 10:20:44 2012 
==11536== Invalid read of size 1                                                                                                
==11536==    at 0x4C287D2: strlen (mc_replace_strmem.c:282)                                                                     
==11536==    by 0x15FA50: traverse_fn (locking.c:1768)                                                                          
==11536==    by 0x19B0AB: db_tdb_traverse_read_func (dbwrap_tdb.c:282)
==11536==    by 0x1EEAE4: tdb_traverse_internal (traverse.c:190)
==11536==    by 0x1EEC5B: tdb_traverse_read (traverse.c:229)
==11536==    by 0x19B525: db_tdb_traverse_read (dbwrap_tdb.c:296)
==11536==    by 0x15F99A: share_mode_forall (locking.c:1794)
==11536==    by 0x15EFCD: main (status.c:473)
==11536==  Address 0x8c561a04 is not stack'd, malloc'd or (recently) free'd
==11536==
==11536==
==11536== Process terminating with default action of signal 11 (SIGSEGV)
==11536==  Access not within mapped region at address 0x8C561A04
==11536==    at 0x4C287D2: strlen (mc_replace_strmem.c:282)
==11536==    by 0x15FA50: traverse_fn (locking.c:1768)
==11536==    by 0x19B0AB: db_tdb_traverse_read_func (dbwrap_tdb.c:282)
==11536==    by 0x1EEAE4: tdb_traverse_internal (traverse.c:190)
==11536==    by 0x1EEC5B: tdb_traverse_read (traverse.c:229)
==11536==    by 0x19B525: db_tdb_traverse_read (dbwrap_tdb.c:296)
==11536==    by 0x15F99A: share_mode_forall (locking.c:1794)
==11536==    by 0x15EFCD: main (status.c:473)
==11536==  If you believe this happened as a result of a stack
==11536==  overflow in your program's main thread (unlikely but
==11536==  possible), you can try to increase the size of the
==11536==  main thread stack using the --main-stacksize= flag.
==11536==  The main thread stack size used in this run was 10485760.
==11536==
==11536== HEAP SUMMARY:
==11536==     in use at exit: 117,188 bytes in 1,203 blocks
==11536==   total heap usage: 6,110 allocs, 4,907 frees, 2,547,378 bytes allocated
==11536==
==11536== LEAK SUMMARY:
==11536==    definitely lost: 1,112 bytes in 5 blocks
==11536==    indirectly lost: 90,914 bytes in 980 blocks
==11536==      possibly lost: 12,646 bytes in 109 blocks
==11536==    still reachable: 12,516 bytes in 109 blocks
==11536==         suppressed: 0 bytes in 0 blocks
==11536== Rerun with --leak-check=full to see details of leaked memory
==11536==
==11536== For counts of detected and suppressed errors, rerun with: -v
==11536== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 6 from 6)
Segmentation fault (core dumped)




 





[global]
        workgroup = XXX
        realm = INTRA.XXX.DE
        netbios name = FTPSERVER
        server string = RM-FTP-Server
        interfaces = 127.0.0.1, eth0
        bind interfaces only = Yes
        security = ADS
        password server = *
        username map = /etc/samba/smbusers
        log level = 1
        syslog = 0
        log file = /var/log/samba/log.%m
        printcap name = /dev/null
        machine password timeout = 604800
        os level = 25
        preferred master = No
        local master = No
        domain master = No
        dns proxy = No
        encrypt passwords = yes

        idmap config * : backend = tdb
        idmap config * : range = 1000001-1999999

        idmap config XXX : backend  = ad
        idmap config XXX : schema_mode = rfc2307
        idmap config XXX : readonly = yes
        idmap config XXX : range = 1000-1000000

        max protocol = smb2


        wins server =
        create mask = 0664
        directory mask = 0775
        use sendfile = Yes
        hide dot files = No
        map archive = No
        dont descend = lost+found
        load printers= no
        printing = bsd
        printcap name = /dev/null




[tmp]
        path = /home_local/tmp
        comment = tmp-Share
        browseable = yes
        writeable = yes
        wide links = no
Comment 25 Björn Baumbach 2012-08-20 15:47:25 UTC
I tried to reproduce this issue in my testenv with CentOS and Samba 3.6.6 without success.
I can create, write, rewrite and delete all files. I'm also able to delete the ~$ files.

Office 2010 does not remove these ~$ files automatically, but I see the same on a share on a Windows 2008 r2 server. After the second open and re-write, Office does not delete them. Typically Windows users don't see these files, because the files have the hidden flag AND they are so called "protected operating system files".

If you want to hide the files you need to enable the "store dos attributes" option.

In the next step I'll build 3.6.6 from source on a clean CentOS 6.3 64bit and try it again.

best regards
Björn
Comment 26 maurer 2012-08-21 06:38:27 UTC
Hi

I think the key point to the problem is, that if I open a new Excel file with Office2010 on a 3.6.6 server with max protocol = smb2 , and klick on the save button, a following smbstatus dumbs core

Can you reproduce this?

Without max protocol = smb2 
- smbstatus did not dump core 
- the *.tmp file is automatically removed  on Close of Excel

With max protocol = smb2 
- smbstatus dumps core
- the *.tmp file stays on disk after Close of Excel
- after a restart of smbd it disappears

I think this fits to the obervation of  NT_STATUS_DELETE_PENDING at comment #22

Can I provide further information concerning the core dump of smbstatus?
 
Regards

Hansjörg
Comment 27 Volker Lendecke 2012-08-21 18:29:34 UTC
It's really bad, but no matter how hard we try, we don't get this reproduced. We spent 2 full person days now trying to catch that problem. I think someone suffering from this needs to contract someone from http://www.samba.org/samba/support to be able to remotely log in and see the problem happening.
Comment 28 maurer 2012-08-21 19:07:56 UTC
If you want me to provide you ssh access to a machine, where we can reproduce the problem, let me know and I will contact you with the required information
Comment 29 YOUZHONG YANG 2012-08-21 20:29:34 UTC
Similar issue(not sure if it's the same one) can be reproduced in 3.6.6 but not in 3.6.4:

1. compile the following program, say we get open_file.exe
#include <stdio.h>
#include <tchar.h>
#include <windows.h>

int _tmain(int argc, _TCHAR* argv[])
{
       HANDLE h;
       
       h = CreateFile(argv[1], 
              GENERIC_READ, 
              FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE,
              NULL, OPEN_EXISTING, 0, NULL);
       printf("hit <cr>\n");
       getchar();
       CloseHandle(h);

       return 0;
}

2. set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters\DirectoryCacheLifetime to 0 

3. run "open_file <an existing file on the samba share>". Don't press Enter key.

4. In Windows Explorer, delete the above file, and then refresh the folder, the file reappears.

5. run smbstatus, it crashes with the following stack trace:

core '/var/core/core.smbstatus.3176' of 3176:   /.../samba/bin/smbstatus
 fffffd7ffd2afcf4 strlen () + 14
 000000000049cf25 db_tdb_traverse_read_func () + 55
 fffffd7fec73abda tdb_traverse_internal () + ba
 fffffd7fec73ad3d tdb_traverse_read () + 6d
 000000000049d166 db_tdb_traverse_read () + 56
 00000000004682cf share_mode_forall () + 2f
 00000000004ebe04 main () + 7e4
 00000000004613ac _start () + 6c

Thanks to Ken Harris for the reproduction steps.
Comment 30 Volker Lendecke 2012-08-21 20:46:32 UTC
Can you send me the binary?
Comment 31 Jeremy Allison 2012-08-21 21:06:48 UTC
Attach the binary to the bug then we can both look at it :-).

Thanks,

Jeremy.
Comment 32 Ken Harris 2012-08-21 21:42:42 UTC
Created attachment 7801 [details]
.zip of open_file binary (+source)

Here's the test program binary for reproduction as per Youzhong's instructions. The program holds a handle to a file (argv[1]) until you hit <cr>. Enjoy! -ken
Comment 33 Jeremy Allison 2012-08-21 23:47:33 UTC
Ok, reproduced this - working on a fix right now...

Jeremy.
Comment 34 Jeremy Allison 2012-08-22 00:05:40 UTC
traverse_fn() inside source3/locking/locking.c wasn't correctly updated when we moved to NDR encoding for the delete on close entries.

This is my fault (I remember doing this back-port). I'll fix asap.

Sorry for the bug.

Jeremy.
Comment 35 Jeremy Allison 2012-08-22 00:21:55 UTC
Created attachment 7802 [details]
Raw patch that fixes the problem here.

Ok, here's the fix. I forgot to update the traverse_fn() to correctly use the parse_share_modes() function when I added the backport of the NDR encoding of the delete on close tokens.

So you'd never see this bug unless you had delete on close tokens stored in the locking.tdb, which is why we couldn't reproduce it.

Please test and if everyone reports it fixes the issue (and Volker gives a + review) then we're good to go.

Sorry for the error.

Jeremy.
Comment 36 Jeremy Allison 2012-08-22 00:24:45 UTC
Created attachment 7803 [details]
Updated fix that correctly types 'i' as int, not uint32_t.
Comment 37 maurer 2012-08-22 06:35:56 UTC
Hi

thanks for the fix.
I applied it to 3.6.7 and smbstatus did not core dump any more now.

But the excel tmp file  maurerh/769C49BC.tmp could still not be deleted form the share.
When I try to delete it from the client, after an F5 it reappears and it still is on the linux filesystem.
If I restart smb it disappears automatically form the client and from the filesystem.

Should I take an additional wireshark trace or level10 log?

Regards

Hansjörg
Comment 38 Volker Lendecke 2012-08-22 15:21:56 UTC
(In reply to comment #37)
> Hi
> 
> thanks for the fix.
> I applied it to 3.6.7 and smbstatus did not core dump any more now.
> 
> But the excel tmp file  maurerh/769C49BC.tmp could still not be deleted form
> the share.
> When I try to delete it from the client, after an F5 it reappears and it still
> is on the linux filesystem.
> If I restart smb it disappears automatically form the client and from the
> filesystem.
> 
> Should I take an additional wireshark trace or level10 log?

A debug level 10 log would be interesting. As it is probably very large, what we need is the timestamp when you tried to delete the file, so that we can find it in the logs. Also, we need the full logs, with "max log size = 0".

Volker
Comment 39 Jeremy Allison 2012-08-22 18:53:34 UTC
Created attachment 7809 [details]
git-am fix for 3.6.next.

Fix for the smbstatus crash for 3.6.next.
Comment 40 Jeremy Allison 2012-08-22 21:12:38 UTC
Re-assigning to Karolin to get the smbstatus fix merged now vl has approved the fix. If we can get good data on the Excel issue we can revisit this.

Jeremy.
Comment 41 maurer 2012-08-23 07:06:57 UTC
Hi

took a level10 debug and a wireshark capture
node 10.97.0.152 (no action from other nodes)
First column is the time i noted while performing the action 
I tried to identifiy the action in the wireshark trace on the samba server and noted the exact time stamp an Packet Number from wireshark in order to make it easier to find it in the debug file



8:21:45		open off Microsoft Excel-Arbeitsmappe and edit
8:22:10 	Save ( ~Packet 480 or 08:22:11.525647 in log.10.97.0.152_debug) 
8:22:20 	Close of Excel F2E79813.tmp still here (without max protocol = smb2 it is deleted (~Paket 962 08:22:21.361378 in log.10.97.0.152_debug )

                [2012/08/23 08:23:22.495651,  5]  smbd/close.c:398(close_remove_share_mode)
                close_remove_share_mode: file maurerh/F2E79813.tmp. Delete on close was set - deleting file.


8:22:50 	manual delete off F2E79813.tmp ( ~ Paket 1062 08:22:49.404191 in  log.10.97.0.152_debug)
8:23:05 	Press F5 F2E79813.tmp still here ( ~ Paket 1091 08:23:06.114077 in  log.10.97.0.152_debug )

                automaticalliy deleted on restart of smbd at about 08:23:22.495651
Comment 42 maurer 2012-08-23 07:09:44 UTC
Created attachment 7812 [details]
wireshark trace on samba server
Comment 43 maurer 2012-08-23 07:15:31 UTC
Created attachment 7813 [details]
debug10 file log.%m
Comment 44 maurer 2012-08-23 07:16:20 UTC
Created attachment 7814 [details]
debug10 log.smbd
Comment 45 Ken Harris 2012-08-23 14:12:26 UTC
(In reply to comment #42)
> Created attachment 7812 [details]
> wireshark trace on samba server

Hi: FWIW, I looked at smb_wireshark trace and noticed the following:

1. Packet #338 comprises compounded CREATE and IOCTL requests on "Microsoft Excel-Arbeitsblatt (neu).xlsx"
2. Packet #339 contains the replies to #338, including a handle (76d83ddb-3d04-d7d0-db3d-000000000000) which is _never_ closed in this wireshare trace.
3. Packet #560 contains a SETINFO RENAME request, which renames "Microsoft Excel-Arbeitsblatt (neu).xlsx" -> "F2E79813.tmp"
4. Packet #580 contains the SETINFO DELETE request.

So long as the handle returned in #338 remains unclosed, I would expect "F2E79813.tmp" to remain undeleted and inaccessible; but what changed between 3.6.5 and 3.6.6 that would influence the client in this way? 

Hansjörg, perhaps you could do a wireshark capture with 3.6.5 and compare?

Hope this helps,
-Ken
Comment 46 Volker Lendecke 2012-08-23 14:16:41 UTC
Created attachment 7816 [details]
Patch

Can you please try the attached patch?
Comment 47 Volker Lendecke 2012-08-23 14:18:33 UTC
(In reply to comment #45)
> (In reply to comment #42)
> > Created attachment 7812 [details] [details]
> > wireshark trace on samba server
> 
> Hi: FWIW, I looked at smb_wireshark trace and noticed the following:
> 
> 1. Packet #338 comprises compounded CREATE and IOCTL requests on "Microsoft
> Excel-Arbeitsblatt (neu).xlsx"
> 2. Packet #339 contains the replies to #338, including a handle
> (76d83ddb-3d04-d7d0-db3d-000000000000) which is _never_ closed in this
> wireshare trace.
> 3. Packet #560 contains a SETINFO RENAME request, which renames "Microsoft
> Excel-Arbeitsblatt (neu).xlsx" -> "F2E79813.tmp"
> 4. Packet #580 contains the SETINFO DELETE request.
> 
> So long as the handle returned in #338 remains unclosed, I would expect
> "F2E79813.tmp" to remain undeleted and inaccessible; but what changed between
> 3.6.5 and 3.6.6 that would influence the client in this way? 
> 
> Hansjörg, perhaps you could do a wireshark capture with 3.6.5 and compare?
> 
> Hope this helps,

Yep, came to exactly the same conclusion. The patch just posted should fix it.

Thanks!

Volker
Comment 48 Volker Lendecke 2012-08-23 14:22:20 UTC
Probably the 

-       if (req->compat_chain_fsp) {

in 99807792d743b7f258f0d433689b6231b260bddf is the one that caused this. Not sure how this applies in the big picture of the file_id changes, but the error is due to smb2_ioctl not transferring the chain_fsp in a compound request properly
Comment 49 Volker Lendecke 2012-08-23 14:25:49 UTC
BTW, Hans-Jörg, thanks for the patience and exact logfiles.
Comment 50 Stefan Metzmacher 2012-08-23 16:04:56 UTC
(In reply to comment #46)
> Created attachment 7816 [details]
> Patch
> 
> Can you please try the attached patch?

The 'default:' branch should look like this

+               in_fsp = file_fsp_smb2(req, in_file_id_persistent,
+                                       in_file_id_volatile);
+               if (in_fsp == NULL) {
+                       return smbd_smb2_request_error(req, NT_STATUS_FILE_CLOSED);
+               }

Like it's done in master, otherwise you revert
parts of 99807792d743b7f258f0d433689b6231b260bddf and
in_fsp is always NULL.
Comment 51 Jeremy Allison 2012-08-23 16:47:23 UTC
Created attachment 7817 [details]
git-am version of Volker's fix with metze's additional change.

Metze, Volker - this is the combination of Volker's patch and your requested change. Let me know if it's correct.

Testers, please try this one and report back if it fixes the error.

Thanks !

Jeremy.
Comment 52 maurer 2012-08-24 05:51:05 UTC
Thank you, i am offsite at the moment, will test it on monday
Comment 53 Jeremy Allison 2012-08-24 19:33:16 UTC
Marking this a "blocker" for next release. At least we have a patch to be tested :-).
Comment 54 Volker Lendecke 2012-08-26 20:36:35 UTC
Comment on attachment 7817 [details]
git-am version of Volker's fix with metze's additional change.

Fixes my test here. But we should wait for the reporters to confirm.
Comment 55 maurer 2012-08-27 06:05:35 UTC
Hi

tested it and the tmp files are removed now.
Actually now when excel is open I do not see a persistent *.tmp file any more 
but a file named
~$Microsoft Excel-Arbeitsblatt (neu).xlsx
which is removed after the close of excel

Thank you very much for your attention and for the impressiv quick fix.

I know it is not common or even not polite with so many followers to raise the focus on another strange problem that exists in the whole 3.6.x series.

It is bug 
https://bugzilla.samba.org/show_bug.cgi?id=8881
It is nor critical because a workarround 
( https://bugzilla.samba.org/show_bug.cgi?id=8881#c3 )
exists, but a little bit anoying...

If someone would be able to take a look at it, I can do further testing

Regards

Hansjörg
Comment 56 Karolin Seeger 2012-08-27 08:42:14 UTC
Pushed to v3-6-test.
Will be included in the next bugfix release.
Closing out bug report.

Thanks!
Comment 57 YOUZHONG YANG 2012-08-28 16:31:48 UTC
(In reply to comment #56)
> Pushed to v3-6-test.
> Will be included in the next bugfix release.
> Closing out bug report.
> 
> Thanks!

Does it contain the fix for smbstatus crash? Thanks.
Comment 58 Karolin Seeger 2012-08-29 07:15:05 UTC
(In reply to comment #57)
> (In reply to comment #56)
> > Pushed to v3-6-test.
> > Will be included in the next bugfix release.
> > Closing out bug report.
> > 
> > Thanks!
> 
> Does it contain the fix for smbstatus crash? Thanks.

Yes, exactly.
Comment 59 Ira Cooper 2012-08-29 19:38:06 UTC
Karolin,

I just tested this on a local system.  It currently crashes, using v3-6-test.

It needs the patch from Jeremy that Volker +'d "git-am fix for 3.6.next."

Due to the crash still being there, I am reopening the issue.

-Ira
Comment 60 Karolin Seeger 2012-08-30 06:29:47 UTC
(In reply to comment #59)
> Karolin,
> 
> I just tested this on a local system.  It currently crashes, using v3-6-test.
> 
> It needs the patch from Jeremy that Volker +'d "git-am fix for 3.6.next."
> 
> Due to the crash still being there, I am reopening the issue.
> 
> -Ira

Ira, thanks a lot for catching this one!
Pushed to v3-6-test.
Closing out bug report.

Thanks!