memcache_init takes a max_size parameter that defines the maximum allowed size for the cache. When adding talloc objects through memcache_add_talloc, the talloc object becomes part of the cache, but only the size of a pointer to the object is accounted in the actual size tracking. This needs to be fixed to also add the size of the talloc object. Patch to follow.
https://lists.samba.org/archive/samba-technical/2019-March/133115.html
Created attachment 15021 [details] git-am fix for master. Note, this bug depends on the fixes in bug: https://bugzilla.samba.org/show_bug.cgi?id=13871 otherwise it reliably crashes in: make test TESTS=samba3.base.rw1 This probably means the patchset should also increase the default stat cache size parameter.
Comment on attachment 15021 [details] git-am fix for master. This patch is buggy I think. Don't use.
OK, I'm coming to the conclusion that this just won't work, sorry :-(. Adding copious debug messages to memcache show that talloc objects given to the cache can change talloc size whilst being in there :-(. I'm guessing that some pointers to sub-objects in the talloc tree are being retained outsize - which means that talloc_total_size() on the incoming pointer can be different when the TALLOC_FREE is being called. I don't know exactly where this is being done, but I really suspect it is being done. I'll upload the (intentionally horrid :-) patchset that dumps out size calculations whenever cache->size is being modified so you can take a look. To test, add the git-am fix associated with this bug, then patch the debugs on top, then run: make test TESTS=samba3.base.rw1 >&/tmp/l1 it'll crash (in the added smb_panic) and then you can grep for: grep 'cache->size' /tmp/l1 to see what I mean.
Created attachment 15023 [details] Nasty debug patch for investigating the problem :-).
I found the problem. In the codepath that replaces an existing talloced object with another talloc object, i subtracted the size of the replaced talloc object from the cache size, but forgot to add the size of the new object. With that fix, the rw1 test passes. For additional safety, i think we might also want to store the size of the talloc size in memcache_element, to ensure that any changes (e.g. allocation of talloc children) does not change the size that is subtracted later from the memcache usage.
Can you upload the fixed version for me to take a look at ? Yes, I'd already gotten to the 'store initial talloc_size' solution too :-). In that case we can remove any calls to talloc_total_size() outside of the first one that adds in the new memory. Of course no one *should* be modifying memory that's been given to the memcache, but we don't prevent it. And as I found out, the results of getting the size calculation wrong can be bad :-). I think in order to add this code in we need to make the generic memcache size at least 16MB by default, as now accounting for the talloc stored sizes may change smbd behavior in unexpected ways.
I sent the update here: https://lists.samba.org/archive/samba-technical/2019-April/133173.html The default size of the pool has not been updated since, i am not sure what a sensible increase would be. Adding a number is the easy part, maybe we can wait for some feedback on samba-technical
Update: https://lists.samba.org/archive/samba-technical/2019-April/133175.html
Created attachment 15050 [details] patches for 4.9
Created attachment 15051 [details] patches for 4.10
Re-assigning to Karolin for inclusion in 4.9.next, 4.10.next.
(In reply to Jeremy Allison from comment #13) Pushed to autobuild-v4-{9,10}-test.
(In reply to Karolin Seeger from comment #14) Pushed to both branches. Closing out bug report. Thanks!