Bug 4276 - corrupted tdb file causes loop in tdb_traverse, hence 100% CPU used by smbd or/and nmbd
Summary: corrupted tdb file causes loop in tdb_traverse, hence 100% CPU used by smbd o...
Status: RESOLVED DUPLICATE of bug 5105
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: nmbd
Version: 3.0.23d
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Volker Lendecke
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-01 23:52 UTC by Hank Lin
Modified: 2009-07-15 04:00 UTC

See Also:


Attachments

Description Hank Lin 2006-12-01 23:52:18 UTC
Our servers use samba-3.0.10-1.4E.2. From time to time smbd and/or nmbd uses 100% of the CPU, even at weekends when the load is very low (basically no users on the server). As far as I know there have been many reports of similar problems (high CPU usage by smbd and nmbd), and the cause was suspected to be a file lock (spin lock) problem.

But when I ran "tdbdump /var/cache/samba/gencache.tdb", I found that it produced output endlessly even though gencache.tdb was only 49K. The same file on another server is OK.

Obviously gencache.tdb was corrupted, and tdbdump ran wild on the corrupted file. The corruption itself may be caused by another bug, at least in samba-3.0.10, and I am not sure whether that has been fixed.

I looked into the code in the tdb directory and found that the corrupted tdb file causes a loop in tdb_traverse() in tdb.c. More precisely, tdb_next_lock(), called from tdb_traverse(), keeps returning the same two alternating offsets of two records in the corrupted file. No wonder tdbdump was looping as well... The same thing happens in smbd and nmbd, which makes them consume all CPU resources!
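
To make the failure mode above concrete, here is a minimal sketch of a chain walk that gives up instead of spinning. It is not the tdb code: rec_sketch, off_sketch_t and walk_chain_bounded() are hypothetical names, and the records live in an in-memory array rather than in a file. The only idea it illustrates is that a chain can never legitimately contain more records than exist, so exceeding that bound means the links are corrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for an on-disk record. Only the
 * "next" link matters here; offset 0 marks the end of a chain. */
typedef uint32_t off_sketch_t;

struct rec_sketch {
    off_sketch_t next;
};

/* Walk the chain starting at `head` in a table of `nrecs` slots.
 * Returns the number of records visited, or -1 if a link points
 * outside the table or the walk needs more steps than there are
 * slots, which can only happen when the chain loops. */
long walk_chain_bounded(const struct rec_sketch *recs, size_t nrecs,
                        off_sketch_t head)
{
    long steps = 0;
    off_sketch_t off = head;

    assert(recs != NULL);

    while (off != 0) {
        if (off >= nrecs) {
            return -1; /* link points outside the table */
        }
        if ((size_t)steps >= nrecs) {
            return -1; /* more steps than slots: the chain loops */
        }
        steps++;
        off = recs[off].next;
    }
    return steps;
}
```

With a table where record 2 links to 3 and record 3 links back to 2, the walk returns -1 after a bounded number of steps instead of running forever.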

The same thing happened when I tested the corrupted file with samba-3.0.23d. Maybe the Samba developers have already fixed the problem that corrupts the tdb file. But it is definitely a bug that Samba loops when it encounters a corrupted tdb file...

As the information in gencache.tdb may be sensitive, I can provide the file if required, as long as my boss approves :)

Please let me know if you need further information.
Comment 1 Hank Lin 2006-12-02 05:33:19 UTC
I checked the corrupted tdb file: the loop is caused by two records whose "(tdb_off) next" fields point to each other.

So maybe it is not unappropriate to say it's a bug of samba dealing with such kind of corrupted tdb file.

The strange thing is how Samba creates such a "loop" tdb file :)
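
Two records whose next links point to each other form a cycle of length two. As an illustration only, the sketch below detects any such cycle with Floyd's tortoise-and-hare technique. link_sketch, step_link() and chain_has_cycle() are hypothetical names, the records are modelled as an in-memory array, and nothing here is taken from the Samba sources or from the eventual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record model: just the "next" link. Offset 0 ends a chain. */
struct link_sketch {
    uint32_t next;
};

/* Follow one link. An end-of-chain or out-of-range offset yields 0,
 * so a wild pointer is treated as termination, not as a cycle. */
static uint32_t step_link(const struct link_sketch *recs, size_t nrecs,
                          uint32_t off)
{
    if (off == 0 || off >= nrecs) {
        return 0;
    }
    return recs[off].next;
}

/* Floyd's cycle detection: `slow` advances one link per round and
 * `fast` advances two. If they meet on a non-zero offset, the chain
 * loops. Returns 1 if a cycle is found, 0 if the chain terminates. */
int chain_has_cycle(const struct link_sketch *recs, size_t nrecs,
                    uint32_t head)
{
    uint32_t slow = head;
    uint32_t fast = head;

    assert(recs != NULL);

    while (fast != 0) {
        slow = step_link(recs, nrecs, slow);
        fast = step_link(recs, nrecs, step_link(recs, nrecs, fast));
        if (fast != 0 && slow == fast) {
            return 1;
        }
    }
    return 0;
}
```

This needs no extra memory and no knowledge of how many records exist, which is why it is a common choice for checking linked structures of unknown size.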
Comment 2 Hank Lin 2006-12-02 05:34:41 UTC
(In reply to comment #1)
> I checked the corrupted tdb file: the loop is caused by two records whose
> "(tdb_off) next" fields point to each other.
> 
> So maybe it is not unappropriate to say it's a bug of samba dealing with such
> kind of corrupted tdb file.
> 
> The strange thing is how Samba creates such a "loop" tdb file :)
> 

So maybe it is not unappropriate 
-----should be:
So maybe it is unappropriate 
Comment 3 Volker Lendecke 2006-12-02 05:51:13 UTC
Just to ack this bug: Yes, it is known. Jeremy Allison is right now working on that issue.

Volker
Comment 4 Hank Lin 2006-12-02 20:33:07 UTC
(In reply to comment #3)
> Just to ack this bug: Yes, it is known. Jeremy Allison is right now working on
> that issue.
> 
> Volker
> 

Thanks Volker and Jeremy. 

I will do as much testing as possible, but since the problem happens on our production servers, I may not be able to do much testing there (such as running newly compiled test code).

On the other hand, it may not be easy to simulate the production environment to reproduce the "loop" tdb files, as they do not occur frequently on production servers...
Comment 5 Volker Lendecke 2006-12-03 02:30:20 UTC
Your version sounds like RedHat, so it's unlikely that you have your tdb's on reiserfs. Just wanted to ask, because reiserfs likes a tdb for lunch and another one for dinner.... :-)

Volker
Comment 6 Hank Lin 2006-12-03 16:16:32 UTC
Yes, Volker.

Our server is Red Hat Enterprise Linux ES release 4 (Nahant Update 3) with kernel 2.6.9-34.ELsmp, Samba is samba-3.0.10-1.4E.2, and the filesystem is ext3.
Comment 7 Claudio Romero 2009-06-29 19:09:13 UTC
I have the same problem with Samba 3.0.32, kernel 2.4.36.6, Slackware 11.

A power cut left a corrupted /var/cache/samba/printing/lp.tdb, and
smbd uses 99.9% of the CPU.

I have to kill smbd and nmbd and delete lp.tdb.
After restarting Samba, everything works fine again.

regards
Claudio Romero
Comment 8 Volker Lendecke 2009-07-15 04:00:13 UTC

*** This bug has been marked as a duplicate of bug 5105 ***