Bug 7610 - winbindd_cache.tdb grows too large when scaled
winbindd_cache.tdb grows too large when scaled
Product: Samba 3.5
Classification: Unclassified
Component: Winbind
Other Linux
: P3 regression
: ---
Assigned To: Karolin Seeger
Samba QA Contact
Depends on:
  Show dependency treegraph
Reported: 2010-08-09 09:56 UTC by Dwight Croonenberghs
Modified: 2011-05-18 18:29 UTC (History)
2 users (show)

See Also:

Fix for 3.4.x. (4.19 KB, patch)
2010-12-08 18:25 UTC, Jeremy Allison
no flags Details
git-am fix for 3.5.next (2.14 KB, patch)
2011-05-06 21:42 UTC, Jeremy Allison
idra: review+

Note You need to log in before you can comment on or make changes to this bug.
Description Dwight Croonenberghs 2010-08-09 09:56:43 UTC
The file winbindd_cache.tdb grows too large when connecting to a Domain Controler containing 58000 users and 4500 groups.

I verified the problem using tdbtool and observed that windbindd_cache.tdb required 23 MB to hold  227K records.  Database integrity was OK.  With no modification to Samba v3.4.8 the resulting file sizes used to store this data were 20 times more than needed: winbindd_cache.tdb (484 MB), winbindd_cache.tdb.bak (764 MB), winbindd_cache.tdb.bak.old (485 MB).

I patched the root cause of this bug such that resulting .tdb file size was reduced to about 1.25 times the actual tdb record storage requirements.


diff -ur samba-3.4.8/lib/tdb/common/freelist.c samba-3.4.8/lib/tdb/common/freelist.c.1
--- samba-3.4.8/lib/tdb/common/freelist.c       2010-07-06 14:49:33.000000000 -0600
+++ samba-3.4.8/lib/tdb/common/freelist.c.1     2010-07-06 14:51:29.000000000 -0600
@@ -351,7 +351,7 @@
        /* we didn't find enough space. See if we can expand the
           database and if we can then try again */
-       if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
+       if (tdb_expand(tdb, sizeof(*rec)) == 0)
                goto again;
        tdb_unlock(tdb, -1, F_WRLCK);
Comment 1 Jeremy Allison 2010-09-30 16:35:22 UTC
Ok, I've taken a close look at this and the patch looks wrong.

tdb_allocate() takes a tdb_len_t length parameter, which specifies the amount of space we need to expand the tdb by. The sizeof(*rec) is then added into the new length requested as this contains the header info needed by the new record.

Simply changing the length calculation from "length + sizeof(*rec)" to "sizeof(*rec)" will end up asking for too small a size.

The reason you patch still works in the tdb code is the following code in tdb_expand():

        /* always make room for at least 100 more records, and at
           least 25% more space. Round the database up to a multiple
           of the page size */
        new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
        size = TDB_ALIGN(new_size, tdb->page_size) - tdb->map_size;

which always expands to allow new 100 records of the requested size (to stop fragmentation of the free list inside the tdb).

Inside the Samba 3.0.x code the same calculation in tdb is:

        /* always make room for at least 10 more records, and round
           the database up to a multiple of the page size */
        size = TDB_ALIGN(tdb->map_size + size*10, tdb->page_size) - tdb->map_size;

So it's asking for 10 times fewer records on each expansion. The use of tdb in printing (and the freelist fragmentation problems) were the reason for these changes. Can you try changing the "size*100" to "size*10" in tdb_expand() to see if this gives you the same effect. This might be able to be tuning parameter.


Comment 2 Dwight Croonenberghs 2010-09-30 22:04:19 UTC
(In reply to comment #1)
Our trigger point may have been a large record but even so the size that is passed was not a record size but in our experience something much larger.  I witnessed the tdb file growing in multi-megabyte steps.  The giant tdb files were produced in about 10 minutes containing 95% empty space.  The bug we saw was that the left size of MAX was growing larger than the right size even when '.tdb' was over 10 MB. I verified this all with instrumentation and that is how I located the root cause location.  I wanted to give you the root cause location to consider and you have for your purposes decided to keep that code in place. 

I have another even more optimal solution which I prefer over my first suggestion.  The following few line fix would solve our problem and keep the tdb file from growing by no more than 1MB at a time. This offers more assurances than tuning experimentation.  I like this because a 24 MB tdb file would be left holding 23 MB or more of data (95% data).  If someone at some time in the future needs to grow by more than a 1 MB step the tuning parameter is available to adjust in MB.  And if someone adjusts the original size input, either up or down, we still have no worries.

This following proposed code change is all very clear don't you think?

new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
/* restrict growth to 1 MB or less */
opt_size = MIN(new_size, tdb->map_size + (1 x 1024 x 1024));
size = TDB_ALIGN(opt_size, tdb->page_size) - tdb->map_size;

Comment 3 Jeremy Allison 2010-12-08 18:25:27 UTC
Created attachment 6128 [details]
Fix for 3.4.x.

Here is a patch that adds a new call to the tdb library.

void tdb_set_filesize_expansion_factor(truct tdb_context *tdb,
                        unsigned int record_number_expansion_factor,
                        double min_filesize_expansion_factor);

What this does is allows a calling application to control
how tdb does filesize expansion when needing more free space.
By default it uses the 100x record size, or 1.25 filesize,
whichever is greater.

I added this call into the winbindd_cache.c code to limit
the winbindd_cache.tdb expansion to 10x record size, and
no filesize factor.

This should apply cleanly to 3.4.8 (I based it on
the v3-4-test git tree).

Can you let me know if this fixes your problem  ?

If so I'll propose it as an addition to the "official"
TDB API going forward.

Comment 4 Chris Green 2011-05-03 01:20:40 UTC
In my view, the underlying problem is that the overallocation formula (100 times the requested addition) scales badly when the value to be cached is large. For example, a moderately large domain of 10,000 user accounts can result in a user list of approximately 1.5 MB; attempting to cache this results in a 150MB allocation, which is quite unnecessary.

We made the following patch to tdb/common/io.c to create a sliding scale (values smaller than 10KB result in a 100x overallocation, decreasing to 3x for values of 1MB or larger).

Index: source/lib/tdb/common/io.c
--- source/lib/tdb/common/io.c  (revision 49961)
+++ source/lib/tdb/common/io.c  (revision 49962)
@@ -290,6 +290,22 @@
        return 0;

+#ifdef RAPID
+/* PR#9525
+ * The resize strategy in tdb_expand breaks down for very large records.
+ * Some domains generate user lists of 2MB or so, which would lead to
+ * massive resizes if the original "100 records" strategy were followed.
+ * Instead, we will use a sliding scale from 100 times for small records
+ * down to 3 for the largest records.
+ */
+static tdb_off_t tdb_scaled_resize(tdb_off_t record_size)
+       if (record_size < 10000) return 100;
+       else if (record_size < 100000) return 33;
+       else if (record_size < 1000000) return 10;
+       else /* record_size >= 1MB */ return 3;

 /* expand the database at least size bytes by expanding the underlying
    file and doing the mmap again if necessary */
@@ -305,11 +321,18 @@

        /* must know about any previous expansions by another process */
        tdb->methods->tdb_oob(tdb, tdb->map_size + 1, 1);
+#ifdef RAPID
+       /* Make room for a reasonably-sized number of records,
+        * rather than a fixed 100.
+        */
+        new_size = MAX(tdb->map_size + tdb_scaled_resize(size), tdb->map_size * 1.25);
        /* always make room for at least 100 more records, and at
            least 25% more space. Round the database up to a multiple
            of the page size */
        new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
+       /* Always round up to a multiple of the page size. */
        size = TDB_ALIGN(new_size, tdb->page_size) - tdb->map_size;

        if (!(tdb->flags & TDB_INTERNAL))
Comment 5 Jeremy Allison 2011-05-06 21:42:37 UTC
Created attachment 6437 [details]
git-am fix for 3.5.next

Here is the fix that went into master and 3.6.0. Can you review to see if it fixes your problem. Pushback if you feel we need finer granularity than this patch provides.

Comment 6 Jeremy Allison 2011-05-06 21:43:04 UTC
Raised this to blocker, as this must be addressed before another 3.5.x goes out.
Comment 7 Jeremy Allison 2011-05-18 18:24:58 UTC
idra and Rusty both ok'ed this fix for 3.5.x, so re-assigning to Karolin for inclusion in 3.5.9.

Comment 8 Karolin Seeger 2011-05-18 18:29:31 UTC
(In reply to comment #7)
> idra and Rusty both ok'ed this fix for 3.5.x, so re-assigning to Karolin for
> inclusion in 3.5.9.
> Jeremy.

Pushed to v3-5-test.
Closing out bug report.