The Samba-Bugzilla – Bug 4276
Corrupted tdb file causes a loop in tdb_traverse, hence 100% CPU used by smbd and/or nmbd
Last modified: 2009-07-15 04:00:13 UTC
Our servers run samba-3.0.10-1.4E.2. From time to time smbd and/or nmbd use 100% of the CPU, even at weekends when the load is very low (basically no users on the server). As far as I know there have been many reports of similar problems (high CPU usage by smbd and nmbd), and the suspected cause was file locking (spin locks).
But when I ran "tdbdump /var/cache/samba/gencache.tdb", it produced output endlessly even though gencache.tdb was only 49K. The same file on another server is fine.
Obviously gencache.tdb was corrupted and tdbdump looped endlessly on the corrupted file. Whatever corrupted gencache.tdb may be another bug, at least in samba-3.0.10, and I am not sure whether it has been fixed.
I looked into the code in the tdb directory and found that the corrupted tdb file causes a loop in tdb_traverse() in tdb.c. The underlying reason is that tdb_next_lock(), called from tdb_traverse(), keeps returning the same two offsets, belonging to two records in the corrupted file, over and over. No wonder tdbdump was looping as well... The same thing happens in smbd and nmbd, which makes them consume all the CPU!
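
To illustrate the kind of guard that would stop such a walk, here is a minimal sketch. It is not the tdb code: `toy_rec`, `walk_chain` and the in-memory array are hypothetical stand-ins for on-disk records and their next offsets. The assumption is that a chain can never contain more records than the file holds, so a walk that takes more steps than that must be going around a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an on-disk record: only the link is modelled.
 * Index 0 ends the chain, the way a zero offset would. */
struct toy_rec {
    size_t next;
};

/* Walk the chain starting at `start` in an array of `nrecs` slots.
 * Returns the number of records visited, or -1 if a link leaves the
 * array or the walk takes more steps than there are slots, which can
 * only happen when the links are corrupted. */
long walk_chain(const struct toy_rec *recs, size_t nrecs, size_t start)
{
    size_t steps = 0;
    size_t cur = start;

    assert(recs != NULL);

    while (cur != 0) {
        if (cur >= nrecs || steps >= nrecs)
            return -1;      /* bad link, or more steps than records */
        steps++;
        cur = recs[cur].next;
    }
    return (long)steps;
}
```

With two records whose links point at each other, the unguarded version of this loop never terminates; the guarded version gives up after `nrecs` steps and reports an error instead.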
The same thing happened when I tested the corrupted file with samba-3.0.23d. Maybe the samba developers have already fixed the problem that corrupts tdb files. But it is definitely a bug in how samba handles a corrupted tdb file...
As the information in gencache.tdb may be sensitive, I can provide the file if required, provided my boss approves :)
Please let me know if you need further information.
I checked the corrupted tdb file. The loop is caused by two records whose "(tdb_off) next" fields point to each other.
So maybe it is not inappropriate to say it's a bug in how samba deals with this kind of corrupted tdb file.
The strange thing is how samba created such a "loop" tdb file :)
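
Two records linking to each other form the smallest possible cycle, and a cycle of any length can be detected without extra memory using Floyd's two-cursor method. The following is only a sketch under stated assumptions: `next_off`, `step` and `chain_has_loop` are hypothetical names, and the array models each record's next link by index rather than reading the real tdb record layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: next_off[i] is the "next" link of record i,
 * and 0 terminates the chain. An out-of-range link is treated as
 * end of chain in this sketch. */
static size_t step(const size_t *next_off, size_t n, size_t cur)
{
    return (cur == 0 || cur >= n) ? 0 : next_off[cur];
}

/* Floyd's cycle detection: the fast cursor follows two links per
 * round, the slow cursor one. They can only meet again if the chain
 * loops back on itself. */
bool chain_has_loop(const size_t *next_off, size_t n, size_t start)
{
    size_t slow = start;
    size_t fast = start;

    assert(next_off != NULL);

    while (fast != 0) {
        slow = step(next_off, n, slow);
        fast = step(next_off, n, step(next_off, n, fast));
        if (fast != 0 && slow == fast)
            return true;
    }
    return false;
}
```

For the case described here, an array such as `{0, 2, 1}` (record 1 points to record 2 and record 2 points back to record 1) is reported as looping, while `{0, 2, 0}` is a healthy two-record chain.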
(In reply to comment #1)
> I checked the corrupted tdb file, the reason caused loop is two records's
> "(tdb_off ) next" point to each other.
> So maybe it is not unappropriate to say it's a bug of samba dealing with such
> kind of corrupted tdb file.
> The strange is how samba create such "loop" tdb file :)
Correction: "So maybe it is not unappropriate" should read "So maybe it is inappropriate".
Just to ack this bug: Yes, it is known. Jeremy Allison is right now working on that issue.
(In reply to comment #3)
> Just to ack this bug: Yes, it is known. Jeremy Allison is right now working on
> that issue.
Thanks Volker and Jeremy.
I will do as much testing as possible, but as the problem happens on our production servers, I may not be able to do much testing there (such as running newly compiled test code).
On the other hand, it may not be easy to simulate the production environment to reproduce the "loop" tdb files, as they don't occur frequently on the production servers...
Your version sounds like Red Hat, so it's unlikely that you have your tdbs on reiserfs. Just wanted to ask, because reiserfs likes a tdb for lunch and another one for dinner.... :-)
Our server is Red Hat Enterprise Linux ES release 4 (Nahant Update 3) with kernel 2.6.9-34.ELsmp, samba is samba-3.0.10-1.4E.2, and the filesystem is ext3.
I have the same problem with samba 3.0.32, kernel 22.214.171.124, on Slackware 11.
A power cut left a corrupted /var/cache/samba/printing/lp.tdb, and smbd uses 99.9% of the CPU.
I have to kill smbd and nmbd and delete lp.tdb.
After restarting samba, everything works fine again.
*** This bug has been marked as a duplicate of bug 5105 ***