Bug 10415 - *** glibc detected *** /usr/sbin/smbd: free(): invalid next size (fast)
Summary: *** glibc detected *** /usr/sbin/smbd: free(): invalid next size (fast)
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.1.4
Hardware: x64 Linux
Importance: P5 major
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Duplicates: 10408 10441
Depends on:
Blocks:
 
Reported: 2014-02-03 15:57 UTC by Sander Plas
Modified: 2014-03-11 21:02 UTC
CC List: 6 users

See Also:


Attachments
Fragment of log.smbd that shows the error. (149.35 KB, text/plain)
2014-02-03 15:58 UTC, Sander Plas
no flags
smb.conf (1.16 KB, text/plain)
2014-02-03 16:01 UTC, Sander Plas
no flags
full backtrace with debugging symbols (14.69 KB, text/plain)
2014-02-04 16:46 UTC, Sander Plas
no flags
Samba 4.0(!) log fragment for similar crash (88.92 KB, text/plain)
2014-02-05 15:25 UTC, Philipp Thunen
no flags
preliminary patch (1.22 KB, patch)
2014-02-06 15:48 UTC, Volker Lendecke
no flags
Valgrind output (75.85 KB, text/plain)
2014-02-07 14:27 UTC, Sander Plas
no flags
another valgrind log (112.54 KB, text/plain)
2014-02-07 15:16 UTC, Sander Plas
no flags
another valgrind log (334.03 KB, text/plain)
2014-02-07 15:22 UTC, Sander Plas
no flags
Patch (1.05 KB, patch)
2014-02-07 15:39 UTC, Volker Lendecke
no flags
Patch (1.04 KB, patch)
2014-02-07 16:03 UTC, Volker Lendecke
no flags
first error log debug level 10 (378.25 KB, text/plain)
2014-02-08 11:31 UTC, Vladimir
no flags
second error log debug level 10 (350.48 KB, text/plain)
2014-02-08 11:33 UTC, Vladimir
no flags
git-am fix for 4.1.next. (1.29 KB, patch)
2014-02-10 18:09 UTC, Jeremy Allison
vl: review+
git-am fix for 4.0.next. (1.21 KB, patch)
2014-02-10 18:10 UTC, Jeremy Allison
vl: review+

Description Sander Plas 2014-02-03 15:57:57 UTC
I see the same error on 3 almost identical (but completely separate) servers. They are all running SerNet Samba 4.1.4-7 on Ubuntu 12.04.4 LTS.

They are all stand-alone domain controllers with a few file shares. 

The busiest one of them (+/- 15 concurrent users) had 17 of these crashes during the past hour. 

I have attached a fragment of the logfile that shows the error and a stack trace. 

I can provide full log files with a higher log level, but I can't post them on a public site because they contain sensitive data (user names, file names, etc.).

Not completely sure if this is related: we are also currently having lots of problems with stubborn file locks on .mdb databases and .exe files on the shares, and even on policy XML files on sysvol. We have only been able to release them by looking up the smbd process that has the file open and doing a "kill -9" on it.
Comment 1 Sander Plas 2014-02-03 15:58:44 UTC
Created attachment 9636 [details]
Fragment of log.smbd that shows the error.
Comment 2 Sander Plas 2014-02-03 16:01:08 UTC
Created attachment 9637 [details]
smb.conf
Comment 3 Volker Lendecke 2014-02-04 11:05:23 UTC
Ok, anyone on the CC list able to run this with debug symbols or (even better) with debug symbols under valgrind? Also, someone able to get us a network trace?

Thanks,

Volker
Comment 4 Sander Plas 2014-02-04 16:46:30 UTC
Created attachment 9638 [details]
full backtrace with debugging symbols
Comment 5 Sander Plas 2014-02-04 16:47:28 UTC
I have attached one backtrace with debug symbols. 

More can be found at http://oele.net/smbd-backtraces/
Comment 6 Philipp Thunen 2014-02-05 15:24:29 UTC
I've just seen a similar crash on my Debian Wheezy machine (as mentioned on the samba mailing list on Monday). I've downgraded it to 4.0.14-8, but the crashes do not seem to be completely gone. The attached log shows the crash for this 4.0 version. Sorry that it's only log level 3; I've increased it to 10 while waiting for the next crash. I just want to mention that 4.1 is not the only version that seems to be affected.
Comment 7 Philipp Thunen 2014-02-05 15:25:06 UTC
Created attachment 9645 [details]
Samba 4.0(!) log fragment for similar crash
Comment 8 Sander Plas 2014-02-05 22:34:37 UTC
Unfortunately I seem to be unable to get any useful output from valgrind.

The server does not have enough memory to reproduce the problem when samba is running in valgrind.

The machine either completely runs out of memory and becomes unresponsive or, when I limit the number of smb processes, it keeps on running without the error occurring.

Any ideas?

What kind of network trace is needed?

BTW, this is really starting to become a huge problem for us. If there is anything I can do to accelerate the resolution of this bug, *please* let me know.

I'm also online on the Samba IRC channels as 'Oele'.
Comment 9 Philipp Thunen 2014-02-06 10:52:36 UTC
My workaround for the moment was to downgrade further; I'm now using 4.0.12, and this seems to run stable. I don't know if this is an option for you at the moment.
Comment 10 Volker Lendecke 2014-02-06 15:48:29 UTC
Created attachment 9646 [details]
preliminary patch

Metze a few days ago fixed one talloc hierarchy problem. Although I haven't positively verified that this patch fixes this particular bug, I would be happy about feedback whether this patch does anything good/bad for you here.
Comment 11 Sander Plas 2014-02-07 09:31:33 UTC
Philipp: Unfortunately I cannot find Ubuntu or Debian packages of that version on the SerNet site. Did you compile samba from source?

Volker: Unfortunately that patch does not solve the problem.

I guess there are two ways to get to the bottom of this problem:

1) Find out how to reproduce this in a test environment. I don't know how, but I could start by adding more Win 8.1 clients to my test domain and try doing 'random things' on them.

and/or

2) Make sure my production server has enough memory to run valgrind. If needed, I can give you shell access to this box. This option seems to be the most certain one.

Do you agree? Any other ideas?
Comment 12 Marc 2014-02-07 09:50:09 UTC
Hello, same mentioned problems for me as well. 

Server: Ubuntu 12.04 64bit - Samba 4.1.4 (PDC)
Clients: 5 x Win 8.1 Pro 64bit, 3 x Win 7 Pro 64bit

Sander: I have 5 Windows 8.1 Pro clients, and it's a matter of minutes before errors start to show when all of them are in use.

I'm currently working on better access so I can contribute logs or a live environment to test, so please mention when something is needed. I will try my best.
Comment 13 Ingo Göppert 2014-02-07 10:29:31 UTC
Hello, same problems on my installation: SerNet Samba 4.1.4 on openSUSE 12.1 as AD DC. It was migrated from 3.6.x to 4.1.4 via classicupgrade and now runs as AD DC. I had no problems with 3.6.x and the same clients.

No problems with the 30 Windows 7 clients, but smbd crashes at logon from the 3 Windows 8.1 clients. After setting the Windows 8.1 clients to SMB1 only (http://support.microsoft.com/kb/2696547/de), the Win 8.1 clients can log on without crashing smbd.

I cannot provide a crash dump because this is a production environment, and in my test environment I have no Win 8.1 yet :-(
Comment 14 Sander Plas 2014-02-07 14:27:15 UTC
Created attachment 9656 [details]
Valgrind output

Replaced the server that had max 4 GB memory with another machine that has 26 GB. 

I am able to run samba in valgrind now without the server collapsing. 

The problem does not occur when valgrind is running. As soon as I run samba without valgrind, the problem occurs within 1-2 minutes.

Valgrind does show the attached output when running with "valgrind --leak-check=full --trace-children=yes samba". I don't know how useful this is.
Comment 15 Sander Plas 2014-02-07 15:16:30 UTC
Created attachment 9658 [details]
another valgrind log

More valgrind output; this time samba was running with "-i -M single". This one does show "definitely lost" errors.
Comment 16 Sander Plas 2014-02-07 15:22:21 UTC
Created attachment 9659 [details]
another valgrind log

Sorry, this is the correct log.
Comment 17 Volker Lendecke 2014-02-07 15:39:01 UTC
Created attachment 9660 [details]
Patch

This should fix a memory buffer overwrite which *might* affect this.

Please give it a try, the valgrind output was very valuable for this, thanks!
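
For readers following along: a "free(): invalid next size" abort typically means something wrote past the end of a heap buffer and clobbered the allocator's bookkeeping for the neighbouring chunk, which glibc only notices at a later free() or malloc(). The sketch below is not the attached patch; it only illustrates the general defence of checking the offset against the buffer length before storing a 16-bit value. The helper name `checked_store_u16` and the 4-byte buffer are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Samba): store a 16-bit value in
 * little-endian order at buf[pos], but only if both bytes fit inside
 * the len-byte buffer. Returns false and writes nothing otherwise.
 */
static bool checked_store_u16(uint8_t *buf, size_t len, size_t pos, uint16_t val)
{
	if (buf == NULL || len < 2 || pos > len - 2) {
		return false; /* the write would run past the end of buf */
	}
	buf[pos] = (uint8_t)(val & 0xFF);
	buf[pos + 1] = (uint8_t)(val >> 8);
	return true;
}

/* Usage sketch: the last valid offset in a 4-byte buffer is 2. */
static void checked_store_usage(void)
{
	uint8_t out[4] = {0, 0, 0, 0};

	assert(checked_store_u16(out, sizeof(out), 2, 0xBEEF));
	assert(out[2] == 0xEF && out[3] == 0xBE);

	/* Offset 3 would need a byte at index 4, so the store is refused. */
	assert(!checked_store_u16(out, sizeof(out), 3, 0x1234));
	assert(out[3] == 0xBE);
}
```

An unchecked store at offset 3 would have silently written one byte beyond the buffer, which is the sort of small overrun that can go unnoticed until the allocator trips over it.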
Comment 18 Sander Plas 2014-02-07 15:51:15 UTC
I get this compilation error. Looks like SVAL expects only 2 arguments? 

[2824/3853] Compiling source3/smbd/smb2_notify.c
../source3/smbd/smb2_ioctl_network_fs.c: In function ‘fsctl_validate_neg_info’:
../source3/smbd/smb2_ioctl_network_fs.c:397:62: error: macro "SVAL" passed 3 arguments, but takes just 2
../source3/smbd/smb2_ioctl_network_fs.c:397:2: error: ‘SVAL’ undeclared (first use in this function)
../source3/smbd/smb2_ioctl_network_fs.c:397:2: note: each undeclared identifier is reported only once for each function it appears in
../source3/smbd/smb2_ioctl_network_fs.c:398:56: error: macro "SVAL" passed 3 arguments, but takes just 2
Waf: Leaving directory `/root/sernet-samba-src/samba-4.1.4/bin'
Build failed:  -> task failed (err #1): 
	{task: cc smb2_ioctl_network_fs.c -> smb2_ioctl_network_fs_91.o}
make[2]: *** [all] Error 1
make[2]: Leaving directory `/root/sernet-samba-src/samba-4.1.4'
make[1]: *** [override_dh_auto_build] Error 2
make[1]: Leaving directory `/root/sernet-samba-src/samba-4.1.4'
make: *** [build] Error 2
Comment 19 Volker Lendecke 2014-02-07 15:59:37 UTC
gna, it should be SSVAL instead of SVAL. Patch to follow :-)
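
As background to the mix-up: in Samba's byte-order helpers, SVAL reads a 16-bit little-endian value from a buffer (two arguments), while SSVAL stores one (three arguments), which is why the compiler rejected SVAL with three arguments. The sketch below uses plain functions with made-up names (`demo_sval`, `demo_ssval`) as stand-ins; it is an illustration of the read/store pair, not Samba's actual macro definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for SVAL(buf, pos): read a 16-bit
 * little-endian value starting at buf[pos]. Two arguments.
 */
static uint16_t demo_sval(const uint8_t *buf, size_t pos)
{
	assert(buf != NULL);
	return (uint16_t)(buf[pos] | ((uint16_t)buf[pos + 1] << 8));
}

/*
 * Hypothetical stand-in for SSVAL(buf, pos, val): store a 16-bit
 * value in little-endian order starting at buf[pos]. Three arguments.
 */
static void demo_ssval(uint8_t *buf, size_t pos, uint16_t val)
{
	assert(buf != NULL);
	buf[pos] = (uint8_t)(val & 0xFF);
	buf[pos + 1] = (uint8_t)(val >> 8);
}

/* Usage sketch: a value stored with demo_ssval reads back with demo_sval. */
static void demo_usage(void)
{
	uint8_t buf[4] = {0, 0, 0, 0};

	demo_ssval(buf, 1, 0x0302);
	assert(buf[1] == 0x02 && buf[2] == 0x03); /* low byte first */
	assert(demo_sval(buf, 1) == 0x0302);
}
```

Neither stand-in checks that pos + 1 is inside the buffer; the caller has to guarantee that, just as with the real macros.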
Comment 20 Volker Lendecke 2014-02-07 16:03:27 UTC
Created attachment 9661 [details]
Patch

next try ;-)
Comment 21 Sander Plas 2014-02-07 16:58:54 UTC
Well, that might have been it. It's been running for over 30 minutes without a single crash now. 

Will keep you posted! Thanks a lot for your effort, it's really appreciated!
Comment 22 Philipp Thunen 2014-02-07 19:06:19 UTC
Just for the sake of completeness: You can download "old" versions from SerNet here: https://download.sernet.de/packages/samba/old/ ; I just added the lines
deb "https://XXX:XXX@download.sernet.de/packages/samba/old/4.0/deb/4.0.12-8/debian" wheezy main
deb-src "https://XXX:XXX@download.sernet.de/packages/samba/old/4.0/deb/4.0.12-8/debian" wheezy main
to my /etc/apt/sources.list and did the upgrade using aptitude.
Comment 23 Sander Plas 2014-02-07 20:24:10 UTC
Thanks Philipp, that's good to know! 

I still haven't seen a single crash; it has been running for 4 hours now. 

I'm a bit reluctant to draw definitive conclusions yet because there are no 'real' users right now - only idle workstations that are doing stuff on the server for whatever reason (?). 

On the other hand, during the last few nights that was enough to crash the server every few minutes. 

If the system 'survives' next Monday, I'm pretty confident that the problem is fixed without any side effects ;)
Comment 24 Vladimir 2014-02-08 11:25:58 UTC
I have the same error with Samba 4.1.4.
With version 4.1.2 I saw no problems.


*** Error in `/usr/bin/smbd': free(): invalid next size (fast): 0x00007f94c05f30d0 ***
*** Error in `/usr/bin/smbd': malloc(): memory corruption: 0x00007f94c05f3150 ***

The error appears on shares with a lot of files or folders.

I have 2 shares with a lot of files in the share's root; if I disable the bigger of them, no more faults occur. The error appears even without opening the shares, just when connecting to the server.

I should probably clarify that the problem occurs with Windows 8 (or rather 8.1) clients.
Comment 25 Vladimir 2014-02-08 11:31:44 UTC
Created attachment 9668 [details]
first error log debug level 10
Comment 26 Vladimir 2014-02-08 11:33:16 UTC
Created attachment 9669 [details]
second error log debug level 10
Comment 27 Volker Lendecke 2014-02-08 14:32:19 UTC
(In reply to comment #24)
Is this with or without the patch in 

https://bugzilla.samba.org/attachment.cgi?id=9661

?
Comment 28 Stefan Buckmann 2014-02-09 12:17:49 UTC
(In reply to comment #27)

I applied the patch (Debian testing). Tested with several "real" and virtualized Win 7 and 8.1 clients; no more problems so far.
Comment 29 Sander Plas 2014-02-10 16:33:21 UTC
We haven't seen any problems today. I think we can consider this bug fixed! 

Thank you Volker!
Comment 30 Jeremy Allison 2014-02-10 18:09:48 UTC
Created attachment 9671 [details]
git-am fix for 4.1.next.

Cherry-pick from master.
Comment 31 Jeremy Allison 2014-02-10 18:10:52 UTC
Created attachment 9672 [details]
git-am fix for 4.0.next.

Back-ported to 4.0.next (location changed).

Volker please review !

Thanks,

Jeremy.
Comment 32 Volker Lendecke 2014-02-10 19:45:09 UTC
Karo, please pick the two patches for 4.1 and 4.0.

Thanks,

Volker
Comment 33 sascha 2014-02-13 15:32:11 UTC
*** Bug 10441 has been marked as a duplicate of this bug. ***
Comment 34 Karolin Seeger 2014-02-14 19:05:08 UTC
(In reply to comment #32)
> Karo, please pick the two patches for 4.1 and 4.0.
> 
> Thanks,
> 
> Volker

Pushed to autobuild-v4-1-test and autobuild-v4-0-test.
Comment 35 Karolin Seeger 2014-02-16 16:12:52 UTC
Pushed to v4-1-test and v4-0-test.
Closing out bug report.

Thanks!
Comment 36 Justin Maggard 2014-03-11 21:02:11 UTC
*** Bug 10408 has been marked as a duplicate of this bug. ***