When printing from a WinXP or Win2K machine to a CUPS exported printer I'm getting an infinite loop in tdb_store/tdb_allocate. I attached to the stuck process with gdb and got a number of backtraces that seem to be looping around tdb_store / tdb_allocate. This looks similar to Bug 4276, but I don't believe I have any corrupt tdb files. That is "find /var -name '*.tdb' -exec tdbdump {} \;" doesn't go into an infinite loop.

https://bugzilla.samba.org/show_bug.cgi?id=4276

#0 0x000000000061db38 in rec_free_read (tdb=0xa6b6d0, off=24752, rec=0x7fffe552a900) at tdb/common/freelist.c:34
#1 0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0, length=<value optimized out>, rec=0x7fffe552a900) at tdb/common/freelist.c:289
#2 0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key={dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf={dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#3 0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650", updating=1) at printing/printing.c:925
#4 0x000000000062a8f6 in print_queue_update_with_lock (sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0, lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "") at printing/printing.c:1332
#5 0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>, src=<value optimized out>, buf=<value optimized out>, msglen=<value optimized out>, private_data=<value optimized out>) at printing/printing.c:1374
#6 0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#7 0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#8 0x00000000006bb5ef in main (argc=<value optimized out>, argv=0x7fffe552bed8) at smbd/server.c:1074

#0 0x000000000061edd5 in tdb_oob (tdb=0xa6b6d0, len=24504, probe=0) at tdb/common/io.c:38
#1 0x000000000061e8e2 in tdb_read (tdb=0xa6b6d0, off=24480, buf=0x7fffe552a900, len=24, cv=0) at tdb/common/io.c:116
#2 0x000000000061db35 in rec_free_read (tdb=0xa6b6d0, off=24504, rec=0x0) at tdb/common/freelist.c:34
#3 0x000000000061e0bb in tdb_allocate (tdb=0xa6b6d0, length=<value optimized out>, rec=0x7fffe552a900) at tdb/common/freelist.c:289
#4 0x000000000061d2cc in tdb_store (tdb=0xa6b6d0, key={dptr = 0x7fffe552a980 "UPDATING/ml1650", dsize = 15}, dbuf={dptr = 0x7fffe552aaa0 "I#", dsize = 4}, flag=1) at tdb/common/tdb.c:514
#5 0x0000000000628b59 in set_updating_pid (sharename=0x7fffe552b990 "ml1650", updating=1) at printing/printing.c:925
#6 0x000000000062a8f6 in print_queue_update_with_lock (sharename=0x7fffe552b990 "ml1650", current_printif=0x9a57c0, lpq_command=0x7fffe552b590 "ml1650", lprm_command=0x7fffe552b190 "") at printing/printing.c:1332
#7 0x000000000062bf65 in print_queue_receive (msg_type=<value optimized out>, src=<value optimized out>, buf=<value optimized out>, msglen=<value optimized out>, private_data=<value optimized out>) at printing/printing.c:1374
#8 0x0000000000614ad0 in message_dispatch () at lib/messages.c:531
#9 0x00000000006295c0 in start_background_queue () at printing/printing.c:1430
#10 0x00000000006bb5ef in main (argc=<value optimized out>, argv=0x7fffe552bed8) at smbd/server.c:1074
Can you attach the tdb file to this bug report please, or if it's too big, make it available somewhere? This will help us investigate. Jeremy.
Which tdb file? I can't tell which one it's working on.
If you've got the process in gdb you can print out the file descriptor number in pdb->tdb->fd and then look in /proc/<processid>/fd/ for an entry with that number. The entry is a symlink to the filename. Jeremy.
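The /proc lookup described above can also be scripted. This is only a minimal sketch, assuming Linux and a pid and fd number you have already obtained (for example from gdb); the helper name fd_target is made up for illustration.

```python
import os

def fd_target(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points to (Linux only).

    fd_target is a hypothetical helper name; pid and fd are whatever
    values you read from the stuck process.
    """
    # Each entry under /proc/<pid>/fd is a symlink to the open file.
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Reading another user's process this way normally needs root or the same uid as the process.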
Created attachment 2778 [details] tdb file it's looping over
Yes, this file has a corrupted freelist containing a loop. The problem is: how did it get like this?
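To illustrate why a loop in the freelist makes the allocator spin forever, and how such a loop can be detected without hanging, here is a rough sketch using Floyd's tortoise-and-hare walk. It works on an in-memory model only: the dict next_of (offset to next offset, with 0 ending the chain) and the offsets used below are assumptions for illustration, and nothing here parses a real tdb file.

```python
def freelist_has_loop(head, next_of):
    """Detect a cycle in a chain of offsets using Floyd's algorithm.

    head    -- first offset in the chain (0 means an empty chain)
    next_of -- hypothetical dict mapping an offset to the next offset;
               a missing entry or 0 is treated as the end of the chain
    """
    slow = fast = head
    while fast != 0:
        # The fast pointer advances two links per round.
        fast = next_of.get(fast, 0)
        if fast == 0:
            return False
        fast = next_of.get(fast, 0)
        # The slow pointer advances one link per round.
        slow = next_of.get(slow, 0)
        if fast != 0 and slow == fast:
            return True
    return False
```

For example, freelist_has_loop(100, {100: 200, 200: 100}) reports a loop, while a chain that reaches 0 does not. A naive walk that simply follows the next pointers would never return on the first input.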
What version of Linux are you using, and what filesystem are these tdbs stored on? Jeremy.
It's Debian unstable on an ext3 filesystem. I have had a pair of power failures in the past week or two that may have been the cause. Is there anything else to be gained from my system in its present state? Can I just delete the .tdb file and things will be well again?
Hmmm, I notice a number of pages out on the web with messages like "rec_free_read bad magic 0x42424242". This is something I am also now looking into. It seems TDB free list corruption happens reasonably often.
For the bug I am looking at I have what is claimed to have been the corrupt TDB file. However, the offsets for the free list seem wrong.
Based on the corruption I have seen in a winbindd_idmap.tdb file from a customer site, their machine crashed in the middle of the file being written out, leaving it in a corrupt state. My suggestion would be to convert these critical operations to use TDB's transactional facilities (tdb_transaction_start, tdb_transaction_commit, tdb_transaction_cancel). The extra cost should be acceptable: for the idmap file it would be unusual to get more than one new entry every minute (or even tens of minutes).
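As a rough sketch of the start / commit / cancel pattern suggested above: the db argument is a hypothetical handle whose method names simply mirror the C functions mentioned (tdb_transaction_start and friends), plus an assumed store method. It is not taken from the Samba source.

```python
def store_in_transaction(db, key, value):
    """Store one record inside a transaction.

    db is a hypothetical handle exposing transaction_start(),
    transaction_commit(), transaction_cancel() and store(key, value);
    these names are assumptions modelled on the C calls.
    """
    db.transaction_start()
    try:
        db.store(key, value)
    except Exception:
        # Roll back so a failed write leaves the database unchanged.
        db.transaction_cancel()
        raise
    # Only a successful commit makes the new record visible.
    db.transaction_commit()
```

The point of the pattern is that a crash or error between start and commit should leave the old contents intact rather than a half-written free list.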
This is probably fixed in a newer Samba version. If you still see the error, please open a new bug with 'debug level = 10' log files from Samba 3.5 or 3.6. Thanks!