Bug 14672 - smbd panic when two clients open same file
Summary: smbd panic when two clients open same file
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.12.9
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-15 17:21 UTC by Lev
Modified: 2021-04-20 10:09 UTC (History)
2 users (show)

See Also:


Attachments
smbd logs (219.89 KB, application/x-zip-compressed)
2021-03-15 17:21 UTC, Lev
no flags Details
smbd 4.14.0 logs (2.13 MB, application/x-zip-compressed)
2021-03-16 08:46 UTC, Lev
no flags Details
Possible patch for master (and older versions) (1.32 KB, patch)
2021-03-16 20:41 UTC, Ralph Böhme
no flags Details
Possible patch for master (and older versions) (3.39 KB, patch)
2021-03-17 15:37 UTC, Ralph Böhme
no flags Details
git-am fix for 4.14.next, 4.13.next, 4.12.next. (3.98 KB, patch)
2021-03-18 18:18 UTC, Jeremy Allison
slow: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lev 2021-03-15 17:21:24 UTC
Created attachment 16549 [details]
smbd logs

I have an smb share with enabled kernel oplocks. Two clients open/close the same file in a loop:

>>> while True:
...  f = open('s:/file')
...  f.close()

One of the smbd processes panics on double-free:

[2021/03/15 18:11:58.687372,  0, pid=26201, effective(65534, 65534), real(65534, 0)] ../../source3/lib/popt_common.c:68(popt_s3_talloc_log_fn)
  talloc: access after free error - first free may be at ../../source3/smbd/open.c:2909
[2021/03/15 18:11:58.687839,  0, pid=26201, effective(65534, 65534), real(65534, 0)] ../../source3/lib/popt_common.c:68(popt_s3_talloc_log_fn)
  Bad talloc magic value - access after free

(gdb) bt
#0  0x00007f2f1c0ebf47 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2f1c0ed8b1 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f2f1feec915 in dump_core () at ../../source3/lib/dumpcore.c:338
#3  0x00007f2f1fedc240 in smb_panic_s3 (why=0x7f2f1f872680 "Bad talloc magic value - access after free") at ../../source3/lib/util.c:861
#4  0x00007f2f209ce6e0 in smb_panic (why=0x7f2f1f872680 "Bad talloc magic value - access after free") at ../../lib/util/fault.c:174
#5  0x00007f2f1f86dcb1 in talloc_abort (reason=0x7f2f1f872680 "Bad talloc magic value - access after free") at ../../lib/talloc/talloc.c:508
#6  0x00007f2f1f86dcc4 in talloc_abort_access_after_free () at ../../lib/talloc/talloc.c:513
#7  0x00007f2f1f86dd49 in talloc_chunk_from_ptr (ptr=0x7f2f04d05d00) at ../../lib/talloc/talloc.c:530
#8  0x00007f2f1f870171 in _talloc_free (ptr=0x7f2f04d05d00, location=0x7f2f206bd480 "../../source3/smbd/open.c:2811") at ../../lib/talloc/talloc.c:1767
#9  0x00007f2f204d1dac in poll_open_fn (ev=0x7f2f04cb2560, te=0x7f2f04d1cf60, current_time=..., private_data=0x7f2f04da33a0) at ../../source3/smbd/open.c:2811
#10 0x00007f2f1f25d9fc in tevent_common_invoke_timer_handler (te=0x7f2f04d1cf60, current_time=..., removed=0x0) at ../../lib/tevent/tevent_timed.c:370
#11 0x00007f2f1f25dbff in tevent_common_loop_timer_delay (ev=0x7f2f04cb2560) at ../../lib/tevent/tevent_timed.c:442
#12 0x00007f2f1f25f8b9 in epoll_event_loop (epoll_ev=0x7f2f04c2c580, tvalp=0x7fff85b401a0) at ../../lib/tevent/tevent_epoll.c:667
#13 0x00007f2f1f2601cf in epoll_event_loop_once (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_epoll.c:937
#14 0x00007f2f1f25c931 in std_event_loop_once (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_standard.c:110
#15 0x00007f2f1f2542a4 in _tevent_loop_once (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:772
#16 0x00007f2f1f2545f6 in tevent_common_loop_wait (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:895
#17 0x00007f2f1f25c9d3 in std_event_loop_wait (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent_standard.c:141
#18 0x00007f2f1f254699 in _tevent_loop_wait (ev=0x7f2f04cb2560, location=0x7f2f206cd478 "../../source3/smbd/process.c:4170") at ../../lib/tevent/tevent.c:914
#19 0x00007f2f205056a1 in smbd_process (ev_ctx=0x7f2f04cb2560, msg_ctx=0x7f2f04cf9320, sock_fd=9, interactive=false) at ../../source3/smbd/process.c:4170
#20 0x0000563b62d082bb in smbd_accept_connection (ev=0x7f2f04cb2560, fde=0x7f2f04cfa9e0, flags=1, private_data=0x7f2f04d656c0) at ../../source3/smbd/server.c:1019

The issue is easily reproducible if I add sleep(1) into poll_open_done() and run the simple script above from two clients.

The 1st free was in poll_open_done(). I guess the problem is that poll_open_done() frees open_rec (and its child open_rec->watch_req), but not open_rec->te. Deferred open_file() can fail again with NT_STATUS_NETWORK_BUSY, so the open will be deferred for the 2nd time. But now the original timer can fire and call poll_open_fn() that will try to free open_rec->watch_req again.

Attached are logs from both smbd processes.
Comment 1 Ralph Böhme 2021-03-15 18:57:00 UTC
With which versions are you getting the crashes?
Comment 2 Lev 2021-03-16 08:45:59 UTC
(In reply to Ralph Böhme from comment #1)

I was using 4.12.9, but now reproduced also with 4.14.0. Attached new logs.
-Lev.
Comment 3 Lev 2021-03-16 08:46:35 UTC
Created attachment 16550 [details]
smbd 4.14.0 logs
Comment 4 Ralph Böhme 2021-03-16 20:41:35 UTC
Created attachment 16552 [details]
Possible patch for master (and older versions)

Lev, can you test with this patch? It fixes the issue in my environment where I was able to reproduce the problem.
Comment 5 Ralph Böhme 2021-03-17 15:37:26 UTC
Created attachment 16554 [details]
Possible patch for master (and older versions)

Previous version was incomplete and did not fix the root cause of the crash, this one should do it.

The core fix is making sure we don't leave a dangling pointer in poll_open_done() when freeing the sharemode watch subreq. We have a copy of that pointer we just freed in open_rec->watch_req and the open retry timer function access this pointer to free the watch_req which will either crash or do things much worse, like freeing a completely random talloc object (seen crashes caused by this on another system).
Comment 6 Lev 2021-03-18 08:51:58 UTC
(In reply to Ralph Böhme from comment #5)
Thanks, Ralph, I applied the patchset in my environment and it fixed the panic.
-Lev.
Comment 7 Samba QA Contact 2021-03-18 18:05:04 UTC
This bug was referenced in samba master:

065ed088b3d5710c288e46a5bf1e063f9a29c8cc
171a58ff3e8ee07cf5d7af08eabcb4a7379e7ce5
591c9196962b695b01c0d86918b8f8a263e9665c
Comment 8 Jeremy Allison 2021-03-18 18:18:17 UTC
Created attachment 16557 [details]
git-am fix for 4.14.next, 4.13.next, 4.12.next.

Cherry-picked from master.
Comment 9 Ralph Böhme 2021-03-18 18:54:05 UTC
Reassigning to Karolin for inclusion in 4.13 and 4.14.
Comment 10 Karolin Seeger 2021-03-31 09:10:13 UTC
Pushed to autobuild-v4-{14,13}-test.
Comment 11 Samba QA Contact 2021-03-31 10:14:12 UTC
This bug was referenced in samba v4-13-test:

f8d67bc3d7d4c2cf1a16a67072fdd097044072dd
27cd9103dc68bf5a23026eb1be75127f0bc831cd
42e7b36454db64120b9940c42592f2fa6d668ad7
Comment 12 Samba QA Contact 2021-03-31 11:11:45 UTC
This bug was referenced in samba v4-14-test:

da71738e987310f19d63d2b575c354156dadbf8f
ebec84c886e7d7807609a3492e02fd1347898aba
051585ef3616fc845ebbcbcf18c46f9d0cab2b00
Comment 13 Karolin Seeger 2021-04-01 10:28:54 UTC
Pushed to both branches.
Closing out bug report.

Thanks!
Comment 14 Samba QA Contact 2021-04-20 10:09:42 UTC
This bug was referenced in samba v4-14-stable (Release samba-4.14.3):

da71738e987310f19d63d2b575c354156dadbf8f
ebec84c886e7d7807609a3492e02fd1347898aba
051585ef3616fc845ebbcbcf18c46f9d0cab2b00