4732 – infinite loop in tdb_store / tdb_allocate

Bug 4732 - infinite loop in tdb_store / tdb_allocate

Summary: infinite loop in tdb_store / tdb_allocate

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 3.0
Classification:	Unclassified
Component:	Printing (show other bugs)
Version:	3.0.25a
Hardware:	x64 Linux

Importance:	P3 major
Target Milestone:	none
Assignee:	Samba Bugzilla Account
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-06-22 17:38 UTC by Kris Jurka
Modified:	2012-01-09 11:05 UTC (History)
CC List:	1 user (show)

See Also:

Attachments
tdb file it's looping over (28.00 KB, application/octet-stream) 2007-06-22 17:57 UTC, Kris Jurka	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Kris Jurka 2007-06-22 17:38:07 UTC

When printing from a WinXP or Win2K machine to a CUPS exported printer I'm getting an infinite loop in tdb_store/tdb_allocate.  I attached to the stuck process with gdb and got a number of backtraces that seem to be looping around tdb_store / tdb_allocate.  This looks similar to Bug 4276, but I don't believe I have any corrupt tdb files.  That is "find /var -name '*.tdb' -exec tdbdump {} \;" doesn't go into an infinite loop.

https://bugzilla.samba.org/show_bug.cgi?id=4276

#0  0x000000000061db38 in rec_free_read (tdb=0xa6b6d0, off=24752,
    rec=0x7fffe552a900) at tdb/common/freelist.c:34
#1  0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0,
    length=<value optimized out>, rec=0x7fffe552a900)
    at tdb/common/freelist.c:289
#2  0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key=
      {dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf=
      {dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#3  0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650",
    updating=1) at printing/printing.c:925
#4  0x000000000062a8f6 in print_queue_update_with_lock (
    sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0,
    lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "")
    at printing/printing.c:1332
#5  0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>,
    src=<value optimized out>, buf=<value optimized out>,
    msglen=<value optimized out>, private_data=<value optimized out>)
    at printing/printing.c:1374
#6  0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#7  0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#8  0x00000000006bb5ef in main (argc=<value optimized out>,
    argv=0x7fffe552bed8) at smbd/server.c:1074

#0  0x000000000061edd5 in tdb_oob (tdb=0xa6b6d0, len=24504, probe=0)
    at tdb/common/io.c:38
#1  0x000000000061e8e2 in tdb_read (tdb=0xa6b6d0, off=24480,
    buf=0x7fffe552a900, len=24, cv=0) at tdb/common/io.c:116
#2  0x000000000061db35 in rec_free_read (tdb=0xa6b6d0, off=24504, rec=0x0)
    at tdb/common/freelist.c:34
#3  0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0,
    length=<value optimized out>, rec=0x7fffe552a900)
    at tdb/common/freelist.c:289
#4  0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key=
      {dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf=
      {dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#5  0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650",
    updating=1) at printing/printing.c:925
#6  0x000000000062a8f6 in print_queue_update_with_lock (
    sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0,
    lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "")
    at printing/printing.c:1332
#7  0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>,
    src=<value optimized out>, buf=<value optimized out>,
    msglen=<value optimized out>, private_data=<value optimized out>)
    at printing/printing.c:1374
#8  0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#9  0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#10 0x00000000006bb5ef in main (argc=<value optimized out>,
    argv=0x7fffe552bed8) at smbd/server.c:1074

Comment 1 Jeremy Allison 2007-06-22 17:41:59 UTC

Can you attach the tdb file to this bug report please, or if it's too big make it available somewhere ? This will help investigate.
Jeremy.

Comment 2 Kris Jurka 2007-06-22 17:43:25 UTC

Which tdb file?  I can't tell which one it's working on.

Comment 3 Jeremy Allison 2007-06-22 17:49:47 UTC

If you've got the process in gdb you can print out the file descriptor number in pdb->tdb->fd and then look in /proc/<processid>/fd/ for the fd number. This will appear as a symlink to the filename.
Jeremy.

Comment 4 Kris Jurka 2007-06-22 17:57:42 UTC

Created attachment 2778 [details]
tdb file it's looping over

Comment 5 Jeremy Allison 2007-06-22 18:36:06 UTC

Yes, this file has a corrupted freelist containing a loop.
The problem is - how did it get like this.

Comment 6 Jeremy Allison 2007-06-22 18:43:19 UTC

What version of Linux are you using, and what filesystem are these tdb's stored on ?
Jeremy.

Comment 7 Kris Jurka 2007-06-22 22:33:52 UTC

It's Debian unstable on an ext3 filesystem.  I have had a pair of power failures in the past week or two that may have been the cause.  Is there anything else to be gained from my system in its present state?  Can I just delete the .tdb file and things will be well again?

Comment 8 Richard Sharpe 2009-01-18 16:46:08 UTC

Hmmm, I notice a number of pages out on the web with stuff like:

   rec_free_read bad magic 0x42424242

This is something I am also now looking into.

Seems like TDB free list corruption happens reasonably often.

Comment 9 Richard Sharpe 2009-01-19 21:40:34 UTC

For the bug I am looking at I have what is claimed to have been the corrupt TDB file. However, the offsets for the free list seem wrong.

Comment 10 Richard Sharpe 2009-01-20 17:56:48 UTC

Based on the corruption I have seen in a winbindd_idmap.tdb file from a customer site their machine crashed in the middle of the file being written out, leaving it in a corrupt state.

My suggestion would be to convert these critical operations to using TDBs transactional facilities (tdb_transaction_start, tdb_transaction_commit, tdb_transaction_cancel).

Certainly for the idmap file it would be unusual to get more than one new entry in the file every minute (or even tens of minutes).

Comment 11 Andreas Schneider 2012-01-09 11:05:16 UTC

This is probably fixed with a newer Samba version. If you still see the error. Please open a new bug with 'debug level = 10' logfiles of the new Samba version 3.5 or 3.6. Thanks!