The Samba-Bugzilla – Bug 8010
str_checksum often returns same value for different strings [Patch]
Last modified: 2011-03-23 19:59:40 UTC
Created attachment 6290 [details]
Patch which fixes the problem
I have a folder with ~90 photos: IMG_XXXX.JPG where XXXX is a four digit number, almost consecutive (photos from camera for one day).
Current implementation gives about 30 different checksums for this set of files.
Problem mitigates itself in Gnome's nautilus which uses inode numbers (which are generated by this function) when it counts total size of directory. It treats same inode numbers as hardlinks and counts them once. As the result total folder size is 1/3 of sum of sizes of files.
Gnome bug: https://bugzilla.gnome.org/show_bug.cgi?id=598920
Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/nautilus/+bug/453747
I've created a small patch which uses Fowler–Noll–Vo hash function to generate inode numbers. It seems to be better distributes. Also, for platforms with 64bit inode number this patch generates number with all 64 bits used reducing possibility of collisions.
Created attachment 6291 [details]
Does the attached patch also help? We recently added a better and fast hash function to tdb, which might also help here.
BTW, this would be 3.6 only, the jenkins hash was not yet there in 3.5
Created attachment 6293 [details]
Patch with Bob Jenkins hash function to generate inode numbers
This patch is for latest 3.5, it uses Bob Jekins cache from 3.6 to generate inode numbers
And -- does that solve your problem as well?
(In reply to comment #4)
> And -- does that solve your problem as well?
Yes, both my patches solve problem I have - tested on ubuntu, with their samba-3.5.4 package.
I didn't test 3.6 though, but there is no reason for it not to be fine too.
Created attachment 6295 [details]
git-am patch for 3.5.next
Here is the same patch as in attachment 6293 [details] (Patch with Bob Jenkins hash function to generate inode numbers) but with tdb_jenkins_hash() renamed to just jenkins_hash() in order to avoid conflicts if a newer tdb library is used.
Volker, please evaluate for 3.5.next and re-assign to Karolin if you agree.
I might be missing something but there seems to be no 'jenkins_hash.c' file in this patch.
Created attachment 6297 [details]
git-am patch for 3.5.next
git fumbled-fingers - thanks for catching that ! This should be the full patch.
(In reply to comment #8)
> Created attachment 6297 [details]
> git-am patch for 3.5.next
> git fumbled-fingers - thanks for catching that ! This should be the full patch.
Pushed to v3-5-test.
Closing out bug report.