I see the same error on 3 almost identical (but completely separate) servers. They are all running SerNet Samba 4.1.4-7 on Ubuntu 12.04.4 LTS. They are all stand-alone domain controllers with a few file shares. The busiest one (about 15 concurrent users) had 17 of these crashes during the past hour. I have attached a fragment of the log file that shows the error and a stack trace. I can provide full log files with a higher log level, but I can't post them on a public site because they contain sensitive data (user names, file names, etc.). I'm not completely sure whether this is related: we are also currently having lots of problems with stubborn file locks on .mdb databases and .exe files on the shares, and even on policy XML files on sysvol. We have only been able to release them by looking up the smbd process that has the file open and doing a "kill -9" on it.
Created attachment 9636 [details] Fragment of log.smbd that shows the error.
Created attachment 9637 [details] smb.conf
Ok, anyone on the CC list able to run this with debug symbols or (even better) with debug symbols under valgrind? Also, someone able to get us a network trace? Thanks, Volker
Created attachment 9638 [details] full backtrace with debugging symbols
I have attached one backtrace with debug symbols. More can be found @ http://oele.net/smbd-backtraces/
I've just seen a similar crash on my Debian Wheezy machine (as mentioned on the samba mailing list on Monday). I've downgraded it to 4.0.14-8, but it seems the crashes are not completely gone. The attached log shows the crash for this 4.0 version. Sorry that it's only log level 3; I've increased it to 10 while waiting for the next crash. I just want to point out that 4.1 does not seem to be the only affected version.
Created attachment 9645 [details] Samba 4.0(!) log fragment for similar crash
Unfortunately I seem to be unable to get any useful output from valgrind. The server does not have enough memory to reproduce the problem when samba is running under valgrind. The machine either runs completely out of memory and becomes unresponsive or, when I limit the number of smb processes, it keeps running without the error occurring. Any ideas? What kind of network trace is needed? BTW, this is really starting to become a huge problem for us. If there is anything I can do to accelerate the resolution of this bug, *please* let me know. I'm also online on the samba IRC channels as 'Oele'.
My workaround for now was to downgrade further: I'm currently using 4.0.12, and this seems to run stably. I don't know if this is an option for you.
Created attachment 9646 [details] preliminary patch
Metze a few days ago fixed one talloc hierarchy problem. Although I haven't positively verified that this patch fixes this particular bug, I would be happy about feedback whether this patch does anything good/bad for you here.
Philipp: Unfortunately I cannot find Ubuntu or Debian packages of that version on the SerNet site. Did you compile samba from source?
Volker: Unfortunately that patch does not solve the problem. I guess there are two ways to get to the bottom of this problem:
1) Find out how to reproduce this in a test environment. I don't know how, but I could start by adding more Win 8.1 clients to my test domain and try doing 'random things' on them.
and/or
2) Make sure my production server has enough memory to run valgrind. If needed, I can give you shell access to this box. This option seems to be the most certain one.
Do you agree? Any other ideas?
Hello, I have the same problems as mentioned.
Server: Ubuntu 12.04 64bit - Samba 4.1.4 (PDC)
Clients: 5 x Win 8.1 Pro 64bit, 3 x Win 7 Pro 64bit
Sander: I have 5 Windows 8.1 Pro clients and it's a matter of minutes before errors start to show when all of them are in use. I'm currently working on better access so I can contribute logs or a live environment to test, so please mention when something is needed. I will try my best.
Hello, same problems on my installation: SerNet Samba 4.1.4 on openSUSE 12.1 as AD DC. It was migrated from 3.6.x to 4.1.4 via classicupgrade and now runs as AD DC. I had no problems with 3.6.x and the same clients. There are no problems with the 30 Windows 7 clients, but smbd crashes at logon from the 3 Windows 8.1 clients. After setting the Windows 8.1 clients to SMB1 only (http://support.microsoft.com/kb/2696547/de), they can log on without crashing smbd. I cannot provide a crash dump because this is a production environment, and in my test environment I have no Win 8.1 yet :-(
Created attachment 9656 [details] Valgrind output
I replaced the server that had max 4 GB memory with another machine that has 26 GB. I am now able to run samba under valgrind without the server collapsing. The problem does not occur when valgrind is running. As soon as I run samba without valgrind, the problem occurs within 1-2 minutes. Valgrind does show the attached output when running with "valgrind --leak-check=full --trace-children=yes samba". I don't know how useful this is.
Created attachment 9658 [details] another valgrind log
More valgrind output; this time samba was running with "-i -M single". This one does show "definitely lost" errors.
Created attachment 9659 [details] another valgrind log
Sorry, this is the correct log.
Created attachment 9660 [details] Patch
This should fix a memory buffer overwrite which *might* affect this. Please give it a try, the valgrind output was very valuable for this, thanks!
I get this compilation error. Looks like SVAL expects only 2 arguments?
[2824/3853] Compiling source3/smbd/smb2_notify.c
../source3/smbd/smb2_ioctl_network_fs.c: In function ‘fsctl_validate_neg_info’:
../source3/smbd/smb2_ioctl_network_fs.c:397:62: error: macro "SVAL" passed 3 arguments, but takes just 2
../source3/smbd/smb2_ioctl_network_fs.c:397:2: error: ‘SVAL’ undeclared (first use in this function)
../source3/smbd/smb2_ioctl_network_fs.c:397:2: note: each undeclared identifier is reported only once for each function it appears in
../source3/smbd/smb2_ioctl_network_fs.c:398:56: error: macro "SVAL" passed 3 arguments, but takes just 2
Waf: Leaving directory `/root/sernet-samba-src/samba-4.1.4/bin'
Build failed: -> task failed (err #1): {task: cc smb2_ioctl_network_fs.c -> smb2_ioctl_network_fs_91.o}
make[2]: *** [all] Error 1
make[2]: Leaving directory `/root/sernet-samba-src/samba-4.1.4'
make[1]: *** [override_dh_auto_build] Error 2
make[1]: Leaving directory `/root/sernet-samba-src/samba-4.1.4'
make: *** [build] Error 2
gna, it should be SSVAL instead of SVAL. Patch to follow :-)
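For readers wondering why the compiler complained: as far as I understand Samba's byte-order macros, SVAL(buf, pos) reads a 16-bit little-endian value and takes two arguments, while SSVAL(buf, pos, val) stores one and takes three. The sketch below mirrors that difference with hypothetical stand-in functions (demo_sval and demo_ssval are made-up names, not Samba's actual macro definitions), so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SVAL(buf, pos):
 * READ a 16-bit little-endian value at byte offset pos. */
static uint16_t demo_sval(const uint8_t *buf, size_t pos)
{
    assert(buf != NULL);
    return (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
}

/* Hypothetical stand-in for SSVAL(buf, pos, val):
 * STORE a 16-bit value at byte offset pos, low byte first. */
static void demo_ssval(uint8_t *buf, size_t pos, uint16_t val)
{
    assert(buf != NULL);
    buf[pos]     = (uint8_t)(val & 0xFF);
    buf[pos + 1] = (uint8_t)(val >> 8);
}
```

Calling the read form with a third "value" argument is exactly the kind of mistake the build log shows; the store form is the one that accepts it. Neither stand-in checks the buffer length, so the caller has to make sure pos + 1 is inside the buffer.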
Created attachment 9661 [details] Patch next try ;-)
Well, that might have been it. It's been running for over 30 minutes without a single crash now. Will keep you posted! Thanks a lot for your effort, it's really appreciated!
Just for the sake of completeness: You can download "old" versions from SerNet here: https://download.sernet.de/packages/samba/old/ ; I just added the lines
deb "https://XXX:XXX@download.sernet.de/packages/samba/old/4.0/deb/4.0.12-8/debian" wheezy main
deb-src "https://XXX:XXX@download.sernet.de/packages/samba/old/4.0/deb/4.0.12-8/debian" wheezy main
to my /etc/apt/sources.list and did the upgrade using aptitude.
Thanks Philipp, that's good to know! I still haven't seen a single crash; it has been running for 4 hours now. I'm a bit reluctant to draw definitive conclusions yet because there are no 'real' users right now - only idle workstations that are doing stuff on the server for whatever reason (?). On the other hand, during the last few nights that was enough to crash the server every few minutes. If the system 'survives' next Monday, I'm pretty confident that the problem is fixed without any side effects ;)
I have the same error with Samba 4.1.4. I did not see the problem on version 4.1.2.
*** Error in `/usr/bin/smbd': free(): invalid next size (fast): 0x00007f94c05f30d0 ***
*** Error in `/usr/bin/smbd': malloc(): memory corruption: 0x00007f94c05f3150 ***
The error appears on shares with a lot of files or folders. I have 2 shares with a lot of files in the share's root; if I disable the biggest of them, no more faults occur. The error also shows up without even opening the shares, just when browsing to the server.
I should probably clarify that the problem occurs with Windows 8 (or rather 8.1) clients.
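Both glibc messages above typically mean that heap bookkeeping data was overwritten, which would be consistent with a write past the end of an allocated buffer. For anyone who wants to count such crashes in their own log.smbd, here is a minimal sketch of a line matcher; the function name is_heap_corruption_line is hypothetical and the marker strings are simply the two messages quoted in this report, so other glibc versions may word them differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The two glibc heap-corruption messages quoted in this report. */
static const char *const heap_error_markers[] = {
    "free(): invalid next size",
    "malloc(): memory corruption",
};

/* Return true if the log line contains one of the markers above. */
static bool is_heap_corruption_line(const char *line)
{
    size_t n = sizeof(heap_error_markers) / sizeof(heap_error_markers[0]);

    assert(line != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strstr(line, heap_error_markers[i]) != NULL) {
            return true;
        }
    }
    return false;
}
```

Feeding each line of the log through this function and counting the hits gives a rough crash count per log file; it does not tell you which code path caused the corruption.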
Created attachment 9668 [details] first error log debug level 10
Created attachment 9669 [details] second error log debug level 10
(In reply to comment #24)
> I have same error with Samba 4.1.4.
> on version 4.1.2 was not seen problems.
>
>
> *** Error in `/usr/bin/smbd': free(): invalid next size (fast):
> 0x00007f94c05f30d0 ***
> *** Error in `/usr/bin/smbd': malloc(): memory corruption: 0x00007f94c05f3150
> ***
>
> The error appears on the shares with a lot of files or folders.
>
> I have 2 shares with a lot of files in the share`s root , if I disable the
> biggest of them is no more faults occur. And the error is presented without
> even opening the most shares, but just when entering the server.
>
> And it should probably clarify the problem is in Windows 8 (or rather I 8.1).

Is this with or without the patch in https://bugzilla.samba.org/attachment.cgi?id=9661 ?
(In reply to comment #27)
> (In reply to comment #24)
> > I have same error with Samba 4.1.4.
> > on version 4.1.2 was not seen problems.
> >
> >
> > *** Error in `/usr/bin/smbd': free(): invalid next size (fast):
> > 0x00007f94c05f30d0 ***
> > *** Error in `/usr/bin/smbd': malloc(): memory corruption: 0x00007f94c05f3150
> > ***
> >
> > The error appears on the shares with a lot of files or folders.
> >
> > I have 2 shares with a lot of files in the share`s root , if I disable the
> > biggest of them is no more faults occur. And the error is presented without
> > even opening the most shares, but just when entering the server.
> >
> > And it should probably clarify the problem is in Windows 8 (or rather I 8.1).
>
> Is this with or without the patch in
>
> https://bugzilla.samba.org/attachment.cgi?id=9661
>
> ?

I applied the patch (Debian testing). Tested with several "real" and virtualized Win 7 and 8.1 Clients; no problems so far any more.
We haven't seen any problems today. I think we can consider this bug fixed! Thank you Volker!
Created attachment 9671 [details] git-am fix for 4.1.next.
Cherry-pick from master.
Created attachment 9672 [details] git-am fix for 4.0.next.
Back-ported to 4.0.next (location changed). Volker, please review! Thanks, Jeremy.
Karo, please pick the two patches for 4.1 and 4.0. Thanks, Volker
*** Bug 10441 has been marked as a duplicate of this bug. ***
(In reply to comment #32)
> Karo, please pick the two patches for 4.1 and 4.0.
>
> Thanks,
>
> Volker

Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Pushed to v4-1-test and v4-0-test. Closing out bug report. Thanks!
*** Bug 10408 has been marked as a duplicate of this bug. ***