The file winbindd_cache.tdb grows too large when connecting to a Domain Controller containing 58000 users and 4500 groups. I verified the problem using tdbtool and observed that winbindd_cache.tdb required 23 MB to hold 227K records. Database integrity was OK. With no modification to Samba v3.4.8, the resulting file sizes used to store this data were 20 times more than needed: winbindd_cache.tdb (484 MB), winbindd_cache.tdb.bak (764 MB), winbindd_cache.tdb.bak.old (485 MB). I patched the root cause of this bug such that the resulting .tdb file size was reduced to about 1.25 times the actual tdb record storage requirements.

diff -ur samba-3.4.8/lib/tdb/common/freelist.c samba-3.4.8/lib/tdb/common/freelist.c.1
--- samba-3.4.8/lib/tdb/common/freelist.c	2010-07-06 14:49:33.000000000 -0600
+++ samba-3.4.8/lib/tdb/common/freelist.c.1	2010-07-06 14:51:29.000000000 -0600
@@ -351,7 +351,7 @@
 	/* we didn't find enough space. See if we can expand the
 	   database and if we can then try again */
-	if (tdb_expand(tdb, length + sizeof(*rec)) == 0)
+	if (tdb_expand(tdb, sizeof(*rec)) == 0)
 		goto again;
 fail:
 	tdb_unlock(tdb, -1, F_WRLCK);
Ok, I've taken a close look at this and the patch looks wrong. tdb_allocate() takes a tdb_len_t length parameter, which specifies the amount of space we need to expand the tdb by. The sizeof(*rec) is then added into the new length requested, as this contains the header info needed by the new record. Simply changing the length calculation from "length + sizeof(*rec)" to "sizeof(*rec)" will end up asking for too small a size. The reason your patch still works in the tdb code is the following code in tdb_expand():

	/* always make room for at least 100 more records, and at
	   least 25% more space. Round the database up to a multiple
	   of the page size */
	new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
	size = TDB_ALIGN(new_size, tdb->page_size) - tdb->map_size;

which always expands to allow 100 new records of the requested size (to stop fragmentation of the free list inside the tdb). Inside the Samba 3.0.x code the same calculation in tdb is:

	/* always make room for at least 10 more records, and round
	   the database up to a multiple of the page size */
	size = TDB_ALIGN(tdb->map_size + size*10, tdb->page_size) - tdb->map_size;

So it's asking for 10 times fewer records on each expansion. The use of tdb in printing (and the freelist fragmentation problems) were the reason for these changes. Can you try changing the "size*100" to "size*10" in tdb_expand() to see if this gives you the same effect? This might be able to be a tuning parameter.

Jeremy.
(In reply to comment #1)

Our trigger point may have been a large record, but even so, the size that is passed was not a record size but, in our experience, something much larger. I witnessed the tdb file growing in multi-megabyte steps. The giant tdb files were produced in about 10 minutes and contained 95% empty space. The bug we saw was that the left side of the MAX was growing larger than the right side even when the .tdb was over 10 MB. I verified all of this with instrumentation, and that is how I located the root cause. I wanted to give you the root cause location to consider, and you have, for your purposes, decided to keep that code in place.

I have another, even more optimal, solution which I prefer over my first suggestion. The following few-line fix would solve our problem and keep the tdb file from growing by more than 1 MB at a time. This offers more assurance than tuning experimentation. I like this because a 24 MB tdb file would be left holding 23 MB or more of data (95% data). If someone at some time in the future needs to grow by more than a 1 MB step, the tuning parameter is available to adjust in MB. And if someone adjusts the original size input, either up or down, we still have no worries. The following proposed code change is all very clear, don't you think?

	new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
	/* restrict growth to 1 MB or less */
	opt_size = MIN(new_size, tdb->map_size + (1024*1024));
	size = TDB_ALIGN(opt_size, tdb->page_size) - tdb->map_size;
Created attachment 6128 [details]
Fix for 3.4.x.

Here is a patch that adds a new call to the tdb library:

	void tdb_set_filesize_expansion_factor(struct tdb_context *tdb,
			unsigned int record_number_expansion_factor,
			double min_filesize_expansion_factor);

This allows a calling application to control how tdb does filesize expansion when needing more free space. By default it uses 100x the record size, or 1.25x the filesize, whichever is greater. I added this call into the winbindd_cache.c code to limit the winbindd_cache.tdb expansion to 10x the record size, and no filesize factor.

This should apply cleanly to 3.4.8 (I based it on the v3-4-test git tree). Can you let me know if this fixes your problem? If so, I'll propose it as an addition to the "official" TDB API going forward.

Jeremy.
In my view, the underlying problem is that the overallocation formula (100 times the requested addition) scales badly when the value to be cached is large. For example, a moderately large domain of 10,000 user accounts can result in a user list of approximately 1.5 MB; attempting to cache this results in a 150 MB allocation, which is quite unnecessary. We made the following patch to tdb/common/io.c to create a sliding scale (values smaller than 10 KB result in a 100x overallocation, decreasing to 3x for values of 1 MB or larger).

Index: source/lib/tdb/common/io.c
===================================================================
--- source/lib/tdb/common/io.c	(revision 49961)
+++ source/lib/tdb/common/io.c	(revision 49962)
@@ -290,6 +290,22 @@
 	return 0;
 }
 
+#ifdef RAPID
+/* PR#9525
+ * The resize strategy in tdb_expand breaks down for very large records.
+ * Some domains generate user lists of 2MB or so, which would lead to
+ * massive resizes if the original "100 records" strategy were followed.
+ * Instead, we will use a sliding scale from 100 times for small records
+ * down to 3 for the largest records.
+ */
+static tdb_off_t tdb_scaled_resize(tdb_off_t record_size)
+{
+	if (record_size < 10000) return 100;
+	else if (record_size < 100000) return 33;
+	else if (record_size < 1000000) return 10;
+	else /* record_size >= 1MB */ return 3;
+}
+#endif
 
 /* expand the database at least size bytes by expanding the underlying
    file and doing the mmap again if necessary */
@@ -305,11 +321,18 @@
 
 	/* must know about any previous expansions by another process */
 	tdb->methods->tdb_oob(tdb, tdb->map_size + 1, 1);
 
+#ifdef RAPID
+	/* Make room for a reasonably-sized number of records,
+	 * rather than a fixed 100.
+	 */
+	new_size = MAX(tdb->map_size + size * tdb_scaled_resize(size),
+		       tdb->map_size * 1.25);
+#else
 	/* always make room for at least 100 more records, and at
 	   least 25% more space. Round the database up to a multiple
 	   of the page size */
 	new_size = MAX(tdb->map_size + size*100, tdb->map_size * 1.25);
+#endif
+
+	/* Always round up to a multiple of the page size. */
 	size = TDB_ALIGN(new_size, tdb->page_size) - tdb->map_size;
 
 	if (!(tdb->flags & TDB_INTERNAL))
Created attachment 6437 [details]
git-am fix for 3.5.next

Here is the fix that went into master and 3.6.0. Can you review it to see if it fixes your problem? Push back if you feel we need finer granularity than this patch provides.

Jeremy.
Raised this to blocker, as this must be addressed before another 3.5.x goes out. Jeremy.
idra and Rusty both ok'ed this fix for 3.5.x, so re-assigning to Karolin for inclusion in 3.5.9. Jeremy.
(In reply to comment #7)
> idra and Rusty both ok'ed this fix for 3.5.x, so re-assigning to Karolin for
> inclusion in 3.5.9.
>
> Jeremy.

Pushed to v3-5-test. Closing out bug report. Thanks!