Bug 4732 - infinite loop in tdb_store / tdb_allocate
Summary: infinite loop in tdb_store / tdb_allocate
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: Printing (show other bugs)
Version: 3.0.25a
Hardware: x64 Linux
: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-06-22 17:38 UTC by Kris Jurka
Modified: 2012-01-09 11:05 UTC (History)
1 user (show)

See Also:


Attachments
tdb file it's looping over (28.00 KB, application/octet-stream)
2007-06-22 17:57 UTC, Kris Jurka
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kris Jurka 2007-06-22 17:38:07 UTC
When printing from a WinXP or Win2K machine to a CUPS exported printer I'm getting an infinite loop in tdb_store/tdb_allocate.  I attached to the stuck process with gdb and got a number of backtraces that seem to be looping around tdb_store / tdb_allocate.  This looks similar to Bug 4276, but I don't believe I have any corrupt tdb files.  That is "find /var -name '*.tdb' -exec tdbdump {} \;" doesn't go into an infinite loop.

https://bugzilla.samba.org/show_bug.cgi?id=4276

#0  0x000000000061db38 in rec_free_read (tdb=0xa6b6d0, off=24752,
    rec=0x7fffe552a900) at tdb/common/freelist.c:34
#1  0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0,
    length=<value optimized out>, rec=0x7fffe552a900)
    at tdb/common/freelist.c:289
#2  0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key=
      {dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf=
      {dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#3  0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650",
    updating=1) at printing/printing.c:925
#4  0x000000000062a8f6 in print_queue_update_with_lock (
    sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0,
    lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "")
    at printing/printing.c:1332
#5  0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>,
    src=<value optimized out>, buf=<value optimized out>,
    msglen=<value optimized out>, private_data=<value optimized out>)
    at printing/printing.c:1374
#6  0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#7  0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#8  0x00000000006bb5ef in main (argc=<value optimized out>,
    argv=0x7fffe552bed8) at smbd/server.c:1074

#0  0x000000000061edd5 in tdb_oob (tdb=0xa6b6d0, len=24504, probe=0)
    at tdb/common/io.c:38
#1  0x000000000061e8e2 in tdb_read (tdb=0xa6b6d0, off=24480,
    buf=0x7fffe552a900, len=24, cv=0) at tdb/common/io.c:116
#2  0x000000000061db35 in rec_free_read (tdb=0xa6b6d0, off=24504, rec=0x0)
    at tdb/common/freelist.c:34
#3  0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0,
    length=<value optimized out>, rec=0x7fffe552a900)
    at tdb/common/freelist.c:289
#4  0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key=
      {dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf=
      {dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#5  0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650",
    updating=1) at printing/printing.c:925
#6  0x000000000062a8f6 in print_queue_update_with_lock (
    sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0,
    lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "")
    at printing/printing.c:1332
#7  0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>,
    src=<value optimized out>, buf=<value optimized out>,
    msglen=<value optimized out>, private_data=<value optimized out>)
    at printing/printing.c:1374
#8  0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#9  0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#10 0x00000000006bb5ef in main (argc=<value optimized out>,
    argv=0x7fffe552bed8) at smbd/server.c:1074
Comment 1 Jeremy Allison 2007-06-22 17:41:59 UTC
Can you attach the tdb file to this bug report please, or if it's too big make it available somewhere ? This will help investigate.
Jeremy.

Comment 2 Kris Jurka 2007-06-22 17:43:25 UTC
Which tdb file?  I can't tell which one it's working on.
Comment 3 Jeremy Allison 2007-06-22 17:49:47 UTC
If you've got the process in gdb you can print out the file descriptor number in pdb->tdb->fd and then look in /proc/<processid>/fd/ for the fd number. This will appear as a symlink to the filename.
Jeremy.
Comment 4 Kris Jurka 2007-06-22 17:57:42 UTC
Created attachment 2778 [details]
tdb file it's looping over
Comment 5 Jeremy Allison 2007-06-22 18:36:06 UTC
Yes, this file has a corrupted freelist containing a loop.
The problem is - how did it get like this.
Comment 6 Jeremy Allison 2007-06-22 18:43:19 UTC
What version of Linux are you using, and what filesystem are these tdb's stored on ?
Jeremy.
Comment 7 Kris Jurka 2007-06-22 22:33:52 UTC
It's Debian unstable on an ext3 filesystem.  I have had a pair of power failures in the past week or two that may have been the cause.  Is there anything else to be gained from my system in its present state?  Can I just delete the .tdb file and things will be well again?
Comment 8 Richard Sharpe 2009-01-18 16:46:08 UTC
Hmmm, I notice a number of pages out on the web with stuff like:

   rec_free_read bad magic 0x42424242

This is something I am also now looking into.

Seems like TDB free list corruption happens reasonably often.
Comment 9 Richard Sharpe 2009-01-19 21:40:34 UTC
For the bug I am looking at I have what is claimed to have been the corrupt TDB file. However, the offsets for the free list seem wrong.
Comment 10 Richard Sharpe 2009-01-20 17:56:48 UTC
Based on the corruption I have seen in a winbindd_idmap.tdb file from a customer site their machine crashed in the middle of the file being written out, leaving it in a corrupt state.

My suggestion would be to convert these critical operations to using TDBs transactional facilities (tdb_transaction_start, tdb_transaction_commit, tdb_transaction_cancel).

Certainly for the idmap file it would be unusual to get more than one new entry in the file every minute (or even tens of minutes).
Comment 11 Andreas Schneider 2012-01-09 11:05:16 UTC
This is probably fixed with a newer Samba version. If you still see the error. Please open a new bug with 'debug level = 10' logfiles of the new Samba version 3.5 or 3.6. Thanks!