Bug 6651 - smbd SIGSEGV when breaking oplocks
Summary: smbd SIGSEGV when breaking oplocks
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: File services (show other bugs)
Version: 3.4.0
Hardware: Other Linux
: P3 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL: http://bugs.debian.org/541171
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-20 03:42 UTC by Debian samba package maintainers (PUBLIC MAILING LIST)
Modified: 2020-12-11 07:08 UTC (History)
3 users (show)

See Also:


Attachments
Proposed fix for 3.4.1 and master. (1.64 KB, patch)
2009-08-20 19:31 UTC, Jeremy Allison
no flags Details
Updated patch. (3.79 KB, patch)
2009-08-21 16:30 UTC, Jeremy Allison
no flags Details
Git formatted patch for 3.4.1. (4.34 KB, patch)
2009-08-21 17:20 UTC, Jeremy Allison
no flags Details
Attempt to find the crash. (526 bytes, patch)
2009-08-24 23:14 UTC, Jeremy Allison
no flags Details
Next part of the fix (for master) (1.11 KB, patch)
2009-09-03 09:58 UTC, Jeremy Allison
no flags Details
Hopefully last part of the fix... (2.25 KB, patch)
2009-09-03 09:59 UTC, Jeremy Allison
no flags Details
Full backport to v3-4-test (274.50 KB, patch)
2009-09-04 06:29 UTC, Stefan Metzmacher
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Debian samba package maintainers (PUBLIC MAILING LIST) 2009-08-20 03:42:50 UTC
Reported in Debian BTS (this is a copy/paste, but you may want to go back to the original bug report, seel "URL"):

Package: samba
Version: 2:3.4.0-2+b1

Hello,
  whenever I try to access same file from both Linux locally and from 
XP remotely (for example building software from same tree, once on 
Linux, once on Windows), samba commits suicide in oplocks code:

push_file_id_24 (buf=0x7fffdacbf2c0 "\22\b", id=0x30) at lib/file_id.c:71
71      lib/file_id.c: No such file or directory.
        in lib/file_id.c
(gdb) bt
#0  push_file_id_24 (buf=0x7fffdacbf2c0 "\22\b", id=0x30) at 
lib/file_id.c:71
#1  0x00000000006fc295 in break_kernel_oplock (msg_ctx=0x1d1e500, 
fsp=0x0) at smbd/oplock.c:43
#2  0x00000000006d4f97 in tevent_common_check_signal (ev=0x1d1e440) at 
../lib/tevent/tevent_signal.c:311
#3  0x00000000006d34b5 in run_events (ev=0x7fffdacbf2c0, selrtn=-1, 
read_fds=0x7fffdacbf4f0, write_fds=0x7fffdacbf470) at lib/events.c:89
#4  0x0000000000509a23 in smbd_server_connection_loop_once () at 
smbd/process.c:798
#5  smbd_process () at smbd/process.c:2151
#6  0x0000000000943e95 in smbd_accept_connection (ev=<value optimized 
out>, fde=<value optimized out>, flags=<value optimized out>, 
private_data=<value optimized out>) at smbd/server.c:394
#7  0x00000000006d35b6 in run_events (ev=0x1d1e440, selrtn=1, 
read_fds=0x7fffdacbf8e0, write_fds=0x7fffdacbf860) at lib/events.c:126
#8  0x00000000006d37ff in s3_event_loop_once (ev=0x1d1e440, 
location=<value optimized out>) at lib/events.c:185
#9  0x00000000006d3bdc in _tevent_loop_once (ev=0x1d1e440, 
location=0xa78def "smbd/server.c:680") at ../lib/tevent/tevent.c:478
#10 0x000000000094492d in smbd_parent_loop (argc=<value optimized out>, 
argv=<value optimized out>) at smbd/server.c:680
#11 main (argc=<value optimized out>, argv=<value optimized out>) at 
smbd/server.c:1250

tevent_common_check_signal invoked this tevent_signal:

(gdb) print *se
$3 = {prev = 0x0, next = 0x1cb8af0, event_ctx = 0x1d1e440, signum = 35, 
sa_flags = 4, handler = 0x6fc8f0 <linux_oplock_signal_handler>, 
private_data = 0x1d28060, handler_name = 0x9f4705 
"linux_oplock_signal_handler",
  location = 0x9f46ed "smbd/oplock_linux.c:210", additional_data = 
0x1d37720}

For signal 35 it says:

(gdb) print i
$14 = 35
(gdb) print sig_state->signal_count[35]
$19 = {count = 27, seen = 25}
(gdb) print count
$20 = 2

So count was 2, and it matches with expected state.  But code to call 
handlers says:

for (j=0;j<count;j++) {
  /* note the use of the sig_info array as a
     ring buffer */
  int ofs = ((count-1) + j) % SA_INFO_QUEUE_COUNT;
  se->handler(ev, se, i, 1,
              (void*)&sig_state->sig_info[i][ofs],
              se->private_data);
}

So for count=2 it will try to access elements (2-1)+0 = 1, and (2-1)+1 = 
 2.  But elements 0 & 1 are ones which are valid:

(gdb) print sig_state->sig_info[35][0]._sifields._sigpoll.si_fd
$43 = 31
(gdb) print sig_state->sig_info[35][1]._sifields._sigpoll.si_fd
$44 = 32
(gdb) print sig_state->sig_info[35][2]._sifields._sigpoll.si_fd
$45 = 1918967843

I admit that I have troubles understanding what this code tries to do, 
but as far as I can tell second index to sig_info[i][xxx] array should 
be using some different indices.

As far as I can tell attached patch could do the trick.  Unfortunately 
Samba is FTBFS on my box (seems to be completely confused about talloc.h 
& libtalloc.so) so I could not even compile-test it...
                                                                Petr

[samba.diff (text/plain, inline)]
diff -urN samba-3.4.0.orig/lib/tevent/tevent_signal.c samba-3.4.0/lib/tevent/tevent_signal.c
--- samba-3.4.0.orig/lib/tevent/tevent_signal.c 2009-07-03 04:21:14.000000000 -0700
+++ samba-3.4.0/lib/tevent/tevent_signal.c      2009-08-11 20:50:35.000000000 -0700
@@ -32,8 +32,8 @@
 
 #define NUM_SIGNALS 64
 
-/* maximum number of SA_SIGINFO signals to hold in the queue */
-#define SA_INFO_QUEUE_COUNT 100
+/* maximum number of SA_SIGINFO signals to hold in the queue.  Must be power of two. */
+#define SA_INFO_QUEUE_COUNT 64
 
 struct sigcounter {
        uint32_t count;
@@ -44,6 +44,9 @@
 #define SIG_SEEN(s, n) (s).seen += (n)
 #define SIG_PENDING(s) ((s).seen != (s).count)
 
+#define SIG_QUEUE_IDX(sig) (sig_state->signal_count[(sig)].count % SA_INFO_QUEUE_COUNT)
+#define SIG_DEQUEUE_IDX(sig, idx) ((sig_state->signal_count[(sig)].seen + (idx)) % SA_INFO_QUEUE_COUNT)
+
 struct tevent_common_signal_list {
        struct tevent_common_signal_list *prev, *next;
        struct tevent_signal *se;
@@ -70,10 +73,7 @@
 */
 static uint32_t sig_count(struct sigcounter s)
 {
-       if (s.count >= s.seen) {
-               return s.count - s.seen;
-       }
-       return 1 + (0xFFFFFFFF & ~(s.seen - s.count));
+       return s.count - s.seen;
 }
 
 /*
@@ -97,7 +97,7 @@
                                              void *uctx)
 {
        uint32_t count = sig_count(sig_state->signal_count[signum]);
-       sig_state->sig_info[signum][count] = *info;
+       sig_state->sig_info[signum][SIG_QUEUE_IDX(signum)] = *info;
 
        tevent_common_signal_handler(signum);
 
@@ -307,21 +307,11 @@
                                for (j=0;j<count;j++) {
                                        /* note the use of the sig_info array as a
                                           ring buffer */
-                                       int ofs = ((count-1) + j) % SA_INFO_QUEUE_COUNT;
+                                       int ofs = SIG_DEQUEUE_IDX(i, j);
                                        se->handler(ev, se, i, 1, 
-                                                   (void*)&sig_state->sig_info[i][ofs], 
+                                                   (void*)&sig_state->sig_info[i][ofs],
                                                    se->private_data);
                                }
-                               if (SIG_PENDING(sig_state->sig_blocked[i])) {
-                                       /* we'd filled the queue, unblock the
-                                          signal now */
-                                       sigset_t set;
-                                       sigemptyset(&set);
-                                       sigaddset(&set, i);
-                                       SIG_SEEN(sig_state->sig_blocked[i], 
-                                                sig_count(sig_state->sig_blocked[i]));
-                                       sigprocmask(SIG_UNBLOCK, &set, NULL);
-                               }
                                if (se->sa_flags & SA_RESETHAND) {
                                        talloc_free(se);
                                }
@@ -335,6 +325,35 @@
                }
                SIG_SEEN(sig_state->signal_count[i], count);
                SIG_SEEN(sig_state->got_signal, count);
+#ifdef SA_SIGINFO
+               /* It does not matter whether we first look at pending, and then
+                  increment seen, or vice versa, there will be always race.
+
+                  If we first unblock signal, and then increment seen, it can
+                  happen that hundreds of signals will get delivered immediately
+                  after unblocking - and queue will overflow, as end condition
+                  checks for count+1==MAX, and we are already beyond that point.
+
+                  If instead we first increment seen, and then do unblocking, it is
+                  instead possible that (at least two) signals which got us into
+                  blocked state were just delivered after incrementing seen.  In
+                  that case we'll errorneously enable signal, although queue is
+                  full, again potentially overflowing queue.
+
+                  As I can trigger first crash by sitting in gdb for long time,
+                  while I cannot easily trigger second, let's first increment
+                  seen, and then look at pending. */
+               if (SIG_PENDING(sig_state->sig_blocked[i])) {
+                       /* we'd filled the queue, unblock the
+                          signal now */
+                       sigset_t set;
+                       sigemptyset(&set);
+                       sigaddset(&set, i);
+                       SIG_SEEN(sig_state->sig_blocked[i], 
+                                sig_count(sig_state->sig_blocked[i]));
+                       sigprocmask(SIG_UNBLOCK, &set, NULL);
+               }
+#endif
        }
 
        return 1;
Comment 1 Jeremy Allison 2009-08-20 19:30:47 UTC
Wonderful bug report - thanks. I have a fix for this. Can you tell me exactly how you repeated this so I can confirm my fix ?

Jeremy.
Comment 2 Jeremy Allison 2009-08-20 19:31:30 UTC
Created attachment 4580 [details]
Proposed fix for 3.4.1 and master.
Comment 3 Jeremy Allison 2009-08-20 19:31:59 UTC
This is a blocker for 3.4.1 - once I have a reproducible test case to ensure my patch is good.
Jeremy.
Comment 4 Petr Vandrovec 2009-08-20 22:27:40 UTC
I'm not doing anything special.  Inside VMware just check out sources, and do 'make' on both Linux & Windows from same tree.

For a bit simpler environment you just need to trigger two oplock breaks concurrently.  What works for me is:  on Linux box where samba server runs do:

while true; do touch q & touch r; done

and on Windows box get test.bat

@echo off
:1
type %1
goto :1

and then run 'test q' in one cmd and 'test r' in other one.  Sooner or later (on my box with test above crash happens in less than 10 seconds) your Windows should say that 'The specified network name is no longer available' and (at least on Debian) you get an email that things did not work for Samba.
Comment 5 Stefan Metzmacher 2009-08-21 06:46:49 UTC
Jeremy,

I don't understand why your patch would be correct,
just from reading the patch. I'll try to have a look
at it when it's applied. I hope to have time for it on monday.

metze
Comment 6 Jeremy Allison 2009-08-21 11:37:44 UTC
Yes, it's not obvious. The key is that sig_state->sig_info[signum][j] is NOT A RING BUFFER (despite the comment, which I why I removed it :-). It's actually a regular buffer that is filled in from position zero and reset to zero entries on every invocation of tevent_common_check_signal().

When a signal comes in and is handled by tevent_common_signal_handler_info(), sig_count() returns the next empty position in 
sig_state->sig_info[signum][], and then copies the signal info into it.

So the loop in tevent_common_check_signal() must go from zero to count (as that's what is filled in), and if the buffer was full so the signal was blocked it must not re-enable signals until the buffer is drained again by the SIG_SEEN() macro on sig_state->signal_count[signum].

I'm testing now, and will commit today. Feel free to examine closely when it's done.

Jeremy.

Comment 7 Petr Vandrovec 2009-08-21 15:17:55 UTC
Well, I'm not sure you can get away with saying it is not ring buffer - I think that you have to fix it to make it behave like ring buffer (i.e. what my patch tried to do).

I have no idea what's locking rules in smbd, but let's say that you get 5 signals from kernel about breaking leases.  They'll be at slots 0-4.  You start processing them, and while you are doing so, you'll receive 5 more notifications.  Signal handler will put them at slots 5-9.  Now you complete processing, and bump seen pointer by 5.  You'll notice that there are 5 requests still pending, and so ... you'll process slots 0-4 again, instead of processing slots 5-9.
Comment 8 Jeremy Allison 2009-08-21 15:25:46 UTC
That's an excellent point and a case I didn't consider (incoming signals whilst within tevent_common_check_signal(). I'll take another look at your patch, but I don't think your changes to sig_count() are correct, they don't take into account the wrapping at the 32-bit boundary (that's why I rejected it initially).

So you're correctl unless we block signals whilst calling tevent_common_check_signal() then sig_state->sig_info[signum][] must act as a ring buffer. I think a combination of the two patches may be correct....

Updated patch to follow.

Jeremy.
Comment 9 Jeremy Allison 2009-08-21 16:30:22 UTC
Created attachment 4585 [details]
Updated patch.

Ok, this is an update on the previous patch. As Petr Vandrovec pointed out, sig_state->sig_info[signum][] has to be treated as a ring buffer. The key to the change is that on receipt of a signal the siginfo_t data isn't stored into:

int count = sig_count(sig_state->signal_count[signum])
sig_state->sig_info[signum][count] = *info

but is stored into (count + sig_state->signal_count[signum].seen MOD buffer size) (SA_INFO_QUEUE_COUNT) instead. sig_state->signal_count[signum].seen MOD buffer size is the index of the first unprocessed siginfo_t data.

When processing the received signals inside tevent_common_check_signal() the same calculation is used to find the first entry in the ring buffer to process. The other changes are the same.

The only additional modification (I found this useful when hand-testing the fix) is to zero fill the siginfo_t array when allocating, and memset each processed entry back to zero after it was processed. This will make finding runtime errors in this code easier, as if a handler is ever called with a siginfo_t struct that is zero filled we know we have a logic error.

I'm pretty satisfied with this fix, and plan to commit one it's passed testing (unless Petr Vandrovec can spot any more logic errors or race conditions I've missed).

Petr, Metze and Volker, please review - thanks !

Jeremy.
Comment 10 Jeremy Allison 2009-08-21 17:17:43 UTC
Pushed to master. Please review.

Another problem with your patch Petr was that your modification to
tevent_common_signal_handler_info() used SIG_QUEUE_IDX (defiend as #define
SIG_QUEUE_IDX(sig) (sig_state->signal_count[(sig)].count %
SA_INFO_QUEUE_COUNT) to index the position to store the incoming signal
siginfo_t. However, you need to use the .seen element as the base of the index
for storage (+count). Your index should have been SIG_DEQUEUE_IDX(sig, count)
(using your definition of SIG_DEQUEUE_IDX) to store in incoming siginfo_t.

I preferred to make these calculations explicit in the patch I pushed.

git formatted patch for 3.4.1 to follow.

Jeremy.
Comment 11 Jeremy Allison 2009-08-21 17:20:42 UTC
Created attachment 4586 [details]
Git formatted patch for 3.4.1.
Comment 12 Petr Vandrovec 2009-08-21 17:27:59 UTC
Thanks.  I'll test it as soon as it arrives to Debian (or when I figure out what's different between my box and everybody else) - as I wrote in original patch, it was not even compile-tested as Samba does not build on my box due to some confusion where talloc.h lives...
Comment 13 Jeremy Allison 2009-08-21 17:44:52 UTC
It would really help if you could test it now :-). It works here, but as you found and did a wonderful job of identifying the issue I'd really trust your testing. Can you just remove the separate talloc library from your debian box (Samba comes with its own version of talloc) ?

Jeremy.
Comment 14 Petr Vandrovec 2009-08-21 21:08:54 UTC
Checked out master from git, built it with default options without Debian patches (it did build...), and it survived in my build environment for about 40 minutes, so I think it is fixed.
Comment 15 Jeremy Allison 2009-08-21 22:41:25 UTC
Thanks a *lot* for doing that test, much appreciated. Metze, Volker, once it passes your review I think it's good for 3.4.1.
Jeremy.
Comment 16 Petr Vandrovec 2009-08-24 21:32:39 UTC
I'm afraid that there is still some problem...  Unfortunately I have no idea where that thing dumped core file - definitely not into /usr/local/samba/var/cores/smbd, that one is empty...

[2009/08/23 01:18:08,  1] smbd/service.c:1050(make_connection_snum)
  petr-win2 (::ffff:10.20.86.92) connect to service rootfs initially as user petr (uid=884, gid=201) (pid 14668)
[2009/08/24 19:22:07,  0] lib/fault.c:41(fault_report)
  ===============================================================
[2009/08/24 19:22:07,  0] lib/fault.c:42(fault_report)
  INTERNAL ERROR: Signal 11 in pid 14668 (3.4.0-GIT-f9c0f88-devel)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2009/08/24 19:22:07,  0] lib/fault.c:44(fault_report)
  
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2009/08/24 19:22:07,  0] lib/fault.c:45(fault_report)
  ===============================================================
[2009/08/24 19:22:07,  0] lib/util.c:1447(smb_panic)
  PANIC (pid 14668): internal error
[2009/08/24 19:22:07,  0] lib/util.c:1551(log_stack_trace)
  BACKTRACE: 17 stack frames:
   #0 /usr/local/samba/sbin/smbd(log_stack_trace+0x1c) [0x7fc8dc27ec3f]
   #1 /usr/local/samba/sbin/smbd(smb_panic+0x55) [0x7fc8dc27ed41]
   #2 /usr/local/samba/sbin/smbd [0x7fc8dc26fa19]
   #3 /lib/libpthread.so.0 [0x7fc8da6be720]
   #4 /usr/local/samba/sbin/smbd(push_file_id_24+0) [0x7fc8dc2a29d4]
   #5 /usr/local/samba/sbin/smbd(break_kernel_oplock+0x25) [0x7fc8dc2baa88]
   #6 /usr/local/samba/sbin/smbd [0x7fc8dc2bb12c]
   #7 /usr/local/samba/sbin/smbd(tevent_common_check_signal+0xf7) [0x7fc8dc28ecb1]
   #8 /usr/local/samba/sbin/smbd(run_events+0x30) [0x7fc8dc28d2ff]
   #9 /usr/local/samba/sbin/smbd(smbd_process+0x82f) [0x7fc8dc0a2f00]
   #10 /usr/local/samba/sbin/smbd [0x7fc8dc58b0aa]
   #11 /usr/local/samba/sbin/smbd(run_events+0x130) [0x7fc8dc28d3ff]
   #12 /usr/local/samba/sbin/smbd [0x7fc8dc28d670]
   #13 /usr/local/samba/sbin/smbd(_tevent_loop_once+0x87) [0x7fc8dc28d9e7]
   #14 /usr/local/samba/sbin/smbd(main+0xf85) [0x7fc8dc58c1ed]
   #15 /lib/libc.so.6(__libc_start_main+0xe6) [0x7fc8d8e6b5c6]
   #16 /usr/local/samba/sbin/smbd [0x7fc8dc02e7d9]
[2009/08/24 19:22:07,  0] lib/util.c:1452(smb_panic)
  smb_panic(): calling panic action [/usr/share/samba/panic-action 14668]
[2009/08/24 19:22:08,  0] lib/util.c:1460(smb_panic)
  smb_panic(): action returned status 0
[2009/08/24 19:22:08,  0] lib/fault.c:321(dump_core)
  dumping core in /usr/local/samba/var/cores/smbd
[2009/08/24 19:22:08,  1] smbd/service.c:1050(make_connection_snum)
  petr-win2 (::ffff:10.20.86.92) connect to service petr initially as user petr (uid=884, gid=201) (pid 2220)

I suspect that it died because "counter.seen % SA_INFO_QUEUE_COUNT" after I managed to break 2^32 leases or something like that - in my patch I changed SA_INFO_QUEUE_COUNT to 64 to make sure it is power of two, but in yours it stayed at 100...  Though I have no idea whether my build could break 2^32 leases during past three days, it looks like quite a lot, so maybe some other explanation is necessary.  Unfortunately without core I have no hints available.
Comment 17 Jeremy Allison 2009-08-24 23:14:31 UTC
Created attachment 4595 [details]
Attempt to find the crash.

I can't see any reason why SA_INFO_QUEUE_COUNT needs to be a power of two. The '%' operator simply returns the returns the remainder of counter.seen divided SA_INFO_QUEUE_COUNT, ensuring the offset remains within 0 to (SA_INFO_QUEUE_COUNT-1).

The problem isn't occurring in the signal handler, it's occurring when we don't find an fsp matching a file descriptor. I've added the attached patch to master to help track this condition down.

In order to get a debug case, compile with -g and then add the line:

panic action = /bin/sleep 999999

into the [global] section of your smb.conf. When the panic happens the process will be left to allow you to attach to it with gdb and troubleshoot why linux_oplock_signal_handler() is getting invoked with an invalid fd (I'm guessing it'll be zero).

Jeremy.
Comment 18 Petr Vandrovec 2009-08-24 23:51:00 UTC
But it means that it will jump from 95 (4294967295 % 100) to 0 when wraparound happens, and so 96th queued entry will replace 0th - it will believe it can queue 100 entries, but instead of using slots 0-99 it will use slots 0-95, and then slots 0-3 again...

Unfortunately it seems that by default Samba does not provide debug symbols, so I'll have to rebuild and try again :-(
Comment 19 Petr Vandrovec 2009-08-25 02:42:44 UTC
And it defaults to -pie which cannot be debugged on Debian systems as only RedHat has PIE patches in gdb.  Oh well, another try.  That one failed because clean build did not clean lib/tevent, so I had no symbols there.  So you can see it is nicely reproducible...

Looks like race between client closing file and server receiving oplock break notification - notification gets queued, client closes file, server tries to process notification, and dies because fd is already closed (what if fd would be already reopened for some other client, would it mean that server would ask client to release oplock it never asked for?)

(gdb) bt
#0  0x00007f422af970b5 in __libc_waitpid (pid=26462, stat_loc=0x7fff6cf7f70c, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:32
#1  0x00007f422af386b1 in do_system (line=0x2e09ee0 "/bin/sleep 99999") at ../sysdeps/posix/system.c:149
#2  0x000000000070f556 in smb_panic (why=<value optimized out>) at lib/util.c:1453
#3  0x00000000006feec0 in fault_report (sig=1) at lib/fault.c:47
#4  sig_fault (sig=1) at lib/fault.c:70
#5  <signal handler called>
#6  push_file_id_24 (buf=0x7fff6cf7fc00 "", id=0x30) at lib/file_id.c:63
#7  0x000000000074cf75 in break_kernel_oplock (msg_ctx=0x2df36e0, fsp=0x0) at smbd/oplock.c:43
#8  0x000000000072016e in tevent_common_check_signal (ev=0x2d8c430) at ../lib/tevent/tevent_signal.c:326
#9  0x000000000071e5e5 in run_events (ev=0x7fff6cf7fc00, selrtn=0, read_fds=0x0, write_fds=0x0) at lib/events.c:94
#10 0x0000000000527cd7 in smbd_server_connection_loop_once () at smbd/process.c:790
#11 smbd_process () at smbd/process.c:2232
#12 0x0000000000a20fbe in smbd_accept_connection (ev=<value optimized out>, fde=<value optimized out>, flags=<value optimized out>, private_data=<value optimized out>) at smbd/server.c:404
#13 0x000000000071e6e6 in run_events (ev=0x2d8c430, selrtn=1, read_fds=0x7fff6cf80110, write_fds=0x7fff6cf80090) at lib/events.c:131
#14 0x000000000071e94f in s3_event_loop_once (ev=0x2d8c430, location=<value optimized out>) at lib/events.c:194
#15 0x000000000071ed40 in _tevent_loop_once (ev=0x2d8c430, location=0xbe5cd0 "smbd/server.c:692") at ../lib/tevent/tevent.c:488
#16 0x0000000000a21a59 in smbd_parent_loop (argc=<value optimized out>, argv=<value optimized out>) at smbd/server.c:692
#17 main (argc=<value optimized out>, argv=<value optimized out>) at smbd/server.c:1267
(gdb) info locals
se = 0x2e03bb0
count = 2
sl = <value optimized out>
next = 0x0
clear_processed_siginfo = <value optimized out>
i = 35
(gdb) print sig_state->signal_count[35]
$2 = {count = 547, seen = 529}
(gdb) print sig_state->sig_info[35][29]
$3 = {si_signo = 35, si_errno = 0, si_code = 3, ... _sigpoll = { si_band = 1089, si_fd = 46}}}
(gdb) print sig_state->sig_info[35][30]
$4 = {si_signo = 35, si_errno = 0, si_code = 3, ... _sigpoll = { si_band = 1089, si_fd = 44}}}

petr-dev3:/usr/src/git/samba/source3# ls -la /proc/25557/fd/44
lr-x------ 1 root root 64 2009-08-25 00:32 /proc/25557/fd/44 -> /home/petr/vmk1/bora/public/vmware_pack_end.h
petr-dev3:/usr/src/git/samba/source3# ls -la /proc/25557/fd/46
ls: cannot access /proc/25557/fd/46: No such file or directory

Just for confirmation that [29] is indeed first used event:

(gdb) print sig_state->sig_info[35][28]
$7 = {si_signo = 0, si_errno = 0, si_code = 0, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, 
        sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
      si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}

count is 2 as there were 2 oplock break request queued when code started processing them.  Before I attached gdb apparently 16 more got queued.
Comment 20 Jeremy Allison 2009-08-25 10:52:12 UTC
Ok, I think I see the point about the wraparound point. Unfortunately I'm out on vacation/business trip until Sept. 5th. As this code is only in master (not released code) it's not an urgent fix to get into the tree so we can take the time to get it right. I'll try and take a look over this on vacation. I do see the point about the race also - this is handled differently in the earlier branches (the race was explicitly handled). I think the panic when fsp can't be found is incorrect - it should just log a message here and ignore the error (as the client may have closed the file). Re-use of the fd is another problem - we need to remove any pending signal fd on close. I'll look into this one also either on my trip or when I return.

Thanks for persevering on this.

Jeremy.
Comment 21 Jeremy Allison 2009-08-25 11:03:44 UTC
By the way, I'm also learning a lot by working with you (Petr), and I just wanted you to know I appreciate that !

Thanks,

Jeremy.
Comment 22 Petr Vandrovec 2009-08-25 18:04:24 UTC
Do not worry.  I'm on vacation from tomorrow until Sep 15th, so I won't bring more bad news for next three weeks unless I'm forced to work from vacation...
Comment 23 Karolin Seeger 2009-09-01 04:03:05 UTC
(In reply to comment #20)
> As this code is only in master (not
> released code) it's not an urgent fix to get into the tree so we can take the
> time to get it right.

Okay, lowering security as this is not a showstopper for 3.4.1 any longer.

Comment 24 Stefan Metzmacher 2009-09-01 04:30:50 UTC
I think the code in v3-4-test and master is the same regarding
this oplock code.
Comment 25 Karolin Seeger 2009-09-02 03:40:01 UTC
Ok, then it's a showstopper, again...
Comment 26 Jeremy Allison 2009-09-02 07:23:41 UTC
Ok, I have a fix for this. I will commit to master once I'm back online in Chicago (Sept. 3rd) - I'm working briefly in an airport lounge at the moment :-). Sorry for the delay.
Jeremy.
Comment 27 Jeremy Allison 2009-09-03 09:58:30 UTC
Created attachment 4639 [details]
Next part of the fix (for master)
Comment 28 Jeremy Allison 2009-09-03 09:59:52 UTC
Created attachment 4640 [details]
Hopefully last part of the fix...
Comment 29 Jeremy Allison 2009-09-03 10:01:59 UTC
Here are the commit comments for the (final?) two parts of the patch:

    Another part of the fix for bug 6651 - smbd SIGSEGV when breaking oplocks.
    SA_INFO_QUEUE_COUNT *MUST* be a power of 2, in order for the ring buffer
    wrap to work correctly at the 32 bit boundary. Thanks to Petr
    Vandrovec <petr@vandrovec.name> for this.

    Hopefully last part of the fix for bug 6651 - smbd SIGSEGV when breaking oplocks.
    This one is subtle. There is a race condition where a signal can be
    queued for oplock break, and then the file can be closed by the client
    before the signal can be processed. Currently if this occurs we panic
    (we can't match an incoming signal fd with a fsp pointer). Simply log
    the error (at debug level 10 right now, might be too much) and then
    return without processing the break request. It looks like there is
    another race condition with this fix, but here's why it won't happen.
    If the signal was pending (caused by a kernel oplock break from a
    local file open), and the client closed the file and then re-opened
    another file which happened to use the same file descriptor as the
    file just closed, then theoretically the oplock break requests could
    be processed on the wrong fd. Here's why this should be very rare..
    Processing a pending signal always take precedence over an incoming
    network request, so as long as the client close request is non-chained
    then the break signal should always be harmlessly processed *before*
    the open can be called. If the open is chained onto the close, and
    the fd on the new open is the same as the old closed fd, then it's
    possible this race will occur. However, all that will happen is that
    we'll lose the oplock on this file. A shame, but not a fatal event.

I'll port these to 3.4.1 once I've finished my commitments at the conf. Please review in the meantime. Thanks !

Jeremy.
Comment 30 Stefan Metzmacher 2009-09-04 06:29:57 UTC
Created attachment 4647 [details]
Full backport to v3-4-test

This patch brings lib/tevent in sync with master
and contains the 2 fixes from smbd/oplock_linux.c

metze
Comment 31 Michael Adam 2009-09-08 09:26:51 UTC
The patchset attached by metze seems to be so big because it contains the
update of the config.sub/config.guess files. But this is useful on its own
(and requested in another bug report anyways). 

In this case, I think it is the right thing to sync the tevent changes
(instead of taking the pain of developing a more minimal version of the patch).

Michael
Comment 32 Jeremy Allison 2009-09-08 16:11:54 UTC
+1 looks good to me (makes the 3.4.1 tevent_signal identical to master).
Re-assigning to Karolin for inclusion.
Jeremy.
Comment 33 Karolin Seeger 2009-09-09 05:28:48 UTC
Pushed, will be included in 3.4.1.
Closing out bug report.

Thanks!