Bug 10797 - smbd panic at find_oplock_types
Summary: smbd panic at find_oplock_types
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: SMB2 (show other bugs)
Version: 3.6.12
Hardware: All All
: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
: 10283 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-09-05 23:26 UTC by Hemanth
Modified: 2014-09-30 18:32 UTC (History)
2 users (show)

See Also:


Attachments
SMB2 torture oplock test to recreate the issue. (3.93 KB, application/octet-stream)
2014-09-06 19:46 UTC, Hemanth
no flags Details
git-am fix for 4.1.next (1.24 KB, patch)
2014-09-12 21:59 UTC, Jeremy Allison
vl: review+
Details
Patch (690 bytes, patch)
2014-09-29 08:18 UTC, Volker Lendecke
jra: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Hemanth 2014-09-05 23:26:23 UTC
I am able to recreate this smb panic with the help of smbtorture test. This issue actually related to alternate data streams. In one of our customer core dumps, we could see the file is having alternate data stream.

Here are the steps to recreate the panic:

1) create some file say oplock.txt. 
2) create a stream on the file. ex: oplock.txt:stream1
3) Open file oplock.txt with BATCH_OPLOCK. do not close the handle
4) Open the stream file(oplock.txt:stream1) with or without oplock. Do not close the handle here as well.
5) Open the base file (oplock.txt) again. You will see smb panic.

Just to remind we are using samba 3.6.12+ stack.

I have reproduced this issue on a freenas server as well which is running samba 3.6.12 stack. But did not see much changes in Master code specific to this area. We should be hitting the same issue.

Log snippet during panic on freenas:

  parse_share_modes: owrt: Fri Sep  5 14:53:12 2014 PDT, cwrt: Wed Dec 31 16:00:00 1969 PST, ntok: 0, num_share_modes: 2
[2014/09/05 15:01:38.881370, 10] locking/locking.c:725(parse_share_modes)
  parse_share_modes: share_mode_entry[0]:  pid = 5178, share_access = 0x7, private_options = 0x0, access_mask = 0x120089, mid = 0x4, type= 0x2, gen_id = 1836070363, uid = 0, flags = 0, file
_id 1e7aff2c:3616f:0, name_hash = 0x2ab3c7
[2014/09/05 15:01:38.881384, 10] locking/locking.c:725(parse_share_modes)
  parse_share_modes: share_mode_entry[1]:  pid = 5178, share_access = 0x7, private_options = 0x0, access_mask = 0x0, mid = 0x0, type= 0x0, gen_id = 1836070364, uid = 0, flags = 0, file_id 1
e7aff2c:3616f:0, name_hash = 0x2ab3c7
[2014/09/05 15:01:38.881394,  0] lib/util.c:1117(smb_panic)
  PANIC (pid 5178): Bad no oplock entry.


I think this is what its happening..

When we successfully open base file step#3, we register a shared mode lock with BATCH_OPLOCK on base file. When we try to open the stream file in step#4, we are doing an internal open on base file with access mask as zero, oplock as zero(NO_OPLOCK) and req as NULL in create_file_unixpath().

In create_file_unixpath()
....
		/* Open the base file. */
		status = create_file_unixpath(conn, NULL, smb_fname_base, 0,
					      FILE_SHARE_READ
					      | FILE_SHARE_WRITE
					      | FILE_SHARE_DELETE,
					      base_create_disposition,
					      0, 0, 0, 0, 0, NULL, NULL,
					      &base_fsp, NULL);
		TALLOC_FREE(smb_fname_base);
...

But we avoid breaking existing batch oplock as we treat this as internal open.  In open_file_ntcreate()..
/* First pass - send break only on batch oplocks. */
		if ((req != NULL) && ===> we skip if req is NULL.
				delay_for_batch_oplocks(fsp,
					req->mid,
					oplock_request,
					batch_entry)) {
			schedule_defer_open(lck, request_time, req);
			TALLOC_FREE(lck);
			return NT_STATUS_SHARING_VIOLATION;
		}

But this entry is also getting added to share mode lock list for the base file. Now we have a BATCH_OPLOCK and NO_OPLOCK on base file oplock.txt. Next open request on base file is causing to traverse the list and leading to panic situation.


I have observed that we are actually setting the flag "INTERNAL_OPEN_ONLY" in create_file_unixpath when we see the req as NULL.
...
	if (req == NULL) {
		oplock_request |= INTERNAL_OPEN_ONLY;
	}
...

But in open_file_ntcreate(), we are unconditionally clearing this INTERNAL_OPEN_ONLY flag.

...
	/* Ensure no SAMBA_PRIVATE bits can be set. */
	fsp->oplock_type = (oplock_request & ~SAMBA_PRIVATE_OPLOCK_MASK);
....

Due to this, we never treated this shared lock entry as internal.

In find_oplock_types() we have the checks..

...
if ((oplock_request & INTERNAL_OPEN_ONLY) || is_stat_open(fsp->access_mask)) {
		return;
	}
....
		if (lck->share_modes[i].op_type == NO_OPLOCK &&
				is_stat_open(lck->share_modes[i].access_mask)) {
			/* We ignore stat opens in the table - they
			   always have NO_OPLOCK and never get or
			   cause breaks. JRA. */
			continue;
		}
...

This particular shared lock is skipping all these checks. It was neither having INTERNAL_OPEN_FLAG nor a stat open(since access mask is zero). 

I want to know..
a) If its intentional to clear the SAMBA_PRIVATE_OPLOCK_MASK before adding the lock to share mode list.
b) Why are we not using access mask as (SYNCHRONIZE_ACCESS|FILE_READ_ATTRIBUTES| FILE_WRITE_ATTRIBUTES) which will mark the open as stat_open and will allow to skip this share lock check which will avoid panic.
Comment 1 Hemanth 2014-09-05 23:31:38 UTC
stack looks like this but line numbers will not be matched with exact 3.6.12 as we taken few critical fixes in between.

(gdb) bt
#0  0x0000000802cf3ffc in thr_kill () from /lib/libc.so.7
#1  0x0000000802d8f58b in abort () from /lib/libc.so.7
#2  0x00000000007925d1 in dump_core () at lib/fault.c:414
#3  0x00000000007a246f in smb_panic (why=<optimized out>) at lib/util.c:1133
#4  0x0000000000502276 in find_oplock_types (fsp=<optimized out>, oplock_request=<optimized out>, lck=0x803781e50, pp_batch=0x7fffffffc5d0, pp_ex_or_batch=0x7fffffffc5d8,
    got_level2=0x7fffffffc62d, got_no_oplock=0x7fffffffc62e) at smbd/open.c:1112
#5  0x0000000000507c5e in open_file_ntcreate (fsp=<optimized out>, pinfo=<optimized out>, private_flags=<optimized out>, oplock_request=0, new_dos_attributes=0,
    create_options=<optimized out>, create_disposition=<optimized out>, share_access=<optimized out>, access_mask=65664, req=<optimized out>, conn=<optimized out>) at smbd/open.c:1942
#6  create_file_unixpath (conn=0x803753c50, req=0x803f28e00, smb_fname=0x803f28f90, access_mask=65664, share_access=7, create_disposition=1, create_options=<optimized out>,
    file_attributes=0, oplock_request=0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7fffffffc6d8, pinfo=0x7fffffffc6e4) at smbd/open.c:3426
#7  0x0000000000508f4b in create_file_default (conn=0x803753c50, req=0x803f28e00, root_dir_fid=0, smb_fname=0x803f28f90, access_mask=65664, share_access=7, create_disposition=1,
    create_options=2097216, file_attributes=0, oplock_request=0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7fffffffe328, pinfo=0x7fffffffe0d4) at smbd/open.c:3814
Comment 2 Hemanth 2014-09-06 19:46:21 UTC
Created attachment 10262 [details]
SMB2 torture oplock test to recreate the issue.
Comment 3 Jeremy Allison 2014-09-12 21:59:48 UTC
Created attachment 10285 [details]
git-am fix for 4.1.next

Fix that went into master - applies cleanly to 4.1.next. Torture test would need to be back-ported to 4.1.x (doesn't apply).

Fix would need to be back-ported to 4.0.x also, not sure we need to do that but I'll let Volker decide.

Jeremy.
Comment 4 Jeremy Allison 2014-09-13 16:49:47 UTC
Re-assigning to Karolin for inclusion in 4.1.next.
Jeremy.
Comment 5 Karolin Seeger 2014-09-17 09:32:53 UTC
Pushed to autobuild-v4-1-test.
Comment 6 Karolin Seeger 2014-09-27 17:53:13 UTC
It seems like this patch breaks make test.
Could someone please investigate?

Thanks!

From stdout:

[499/1596 in 40m15s] samba3.smb2.oplock(s3dc)
0: Opening stream: oplock_test\test_stream1.txt:Stream One:$DATA with 9
1: Opening stream: oplock_test\test_stream1.txt::$DATA with 9
2: Opening stream: oplock_test\test_stream1.txt:Stream One:$DATA with 8
3: Opening stream: oplock_test\test_stream1.txt::$DATA with 8
Opening base file: oplock_test\test_stream1.txt with 9
4: Opening stream: oplock_test\test_stream1.txt:Stream One:$DATA with 9
Opening base file: oplock_test\test_stream1.txt with 9
5: Opening stream: oplock_test\test_stream1.txt::$DATA with 9
Acking to level II [0x01] in oplock handler
Opening base file: oplock_test\test_stream1.txt with 9
6: Opening stream: oplock_test\test_stream1.txt:Stream One:$DATA with 8
Opening base file: oplock_test\test_stream1.txt with 9
7: Opening stream: oplock_test\test_stream1.txt::$DATA with 8
Acking to level II [0x01] in oplock handler
Opening stream: oplock_test\test_stream1.txt:Stream One:$DATA with 8
Opening base file: oplock_test\test_stream1.txt with 9
Opening stream again: oplock_test\test_stream1.txt with 9
Acking to level II [0x01] in oplock handler
UNEXPECTED(uxsuccess): samba3.smb2.oplock.stream1(s3dc)
REASON: Content-Type: text/plain;charset=utf8
reason
0

FAILED (0 failures, 0 errors and 1 unexpected successes in 0 testsuites)

A summary with detailed information can be found in:
  ./bin/ab/summary
ERROR: test failed with exit code 1

From stderr:
Traceback (most recent call last):
  File "/memdisk/kseeger/a41/b901549/samba/selftest/filter-subunit", line 93, in
<module>
    ret = subunithelper.parse_results(msg_ops, statistics, sys.stdin)
  File "/memdisk/kseeger/a41/b901549/samba/selftest/subunithelper.py", line 183,
in parse_results
    msg_ops.output_msg(l)
  File "/memdisk/kseeger/a41/b901549/samba/selftest/subunithelper.py", line 263,
in output_msg
    sys.stdout.write(msg)
IOError: [Errno 32] Broken pipe
/memdisk/kseeger/a41/b901549/samba/selftest/selftest.pl: PID[9659]: Got SIGPIPE
teardown environments.
teardown_env(plugin_s4_dc)
samba: EOF on stdin - terminating
samba child process 38790 exited with value 0
teardown_env(dc)
teardown_env(member)
smbd child process 32474 exited with value 0
nmbd child process 32472 exited with value 0
winbindd child process 32473 exited with value 0
teardown_env(s4member)
Exiting pid 33181 on SIGTERM
smbd child process 32655 exited with value 0
nmbd child process 32653 exited with value 0
winbindd child process 32654 exited with value 0
teardown_env(maptoguest)
smbd child process 34210 exited with value 0
nmbd child process 34208 exited with value 0
winbindd child process 34209 exited with value 0
teardown_env(s3dc)
smbd child process 27828 exited with value 0
nmbd child process 27826 exited with value 0
winbindd child process 27827 exited with value 0
teardown_env(simpleserver)
smbd child process 31850 exited with value 0
nmbd child process 31848 exited with value 0
winbindd child process 31849 exited with value 0
teardown_env(ktest)
smbd child process 34000 exited with value 0
nmbd child process 33998 exited with value 0
winbindd child process 33999 exited with value 0
perl(9659)-+-pstree(61865)
           |-samba(28187)
           `-samba(33181)
/memdisk/kseeger/a41/b901549/samba/selftest/selftest.pl: PID[9659]: Exiting...
make: *** [test] Error 1
Comment 7 Volker Lendecke 2014-09-29 08:18:44 UTC
Created attachment 10318 [details]
Patch
Comment 8 Volker Lendecke 2014-09-29 12:45:26 UTC
(In reply to comment #7)
> Created attachment 10318 [details]
> Patch

This patch should fix it
Comment 9 Jeremy Allison 2014-09-29 18:30:16 UTC
Comment on attachment 10318 [details]
Patch

Ooohh. Unexpected success is good :-).

LGTM !
Comment 10 Jeremy Allison 2014-09-29 18:31:03 UTC
Re-assigning back to Karolin for inclusion in 4.1.next. A test that was previously failing now passing is always good :-).
Comment 11 Karolin Seeger 2014-09-29 19:03:37 UTC
Thanks a lot, Volker and Jeremy!
Comment 12 Karolin Seeger 2014-09-29 19:03:56 UTC
Pushed both patches to autobuild-v4-1-test.
Comment 13 Hemanth 2014-09-29 23:21:24 UTC
*** Bug 10283 has been marked as a duplicate of this bug. ***
Comment 14 Karolin Seeger 2014-09-30 18:32:27 UTC
Pushed to v4-1-test.
Closing out bug report.

Thanks!