Bug 8010 - str_checksum often returns same value for different strings [Patch]
Summary: str_checksum often returns same value for different strings [Patch]
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.5
Classification: Unclassified
Component: libsmbclient (show other bugs)
Version: unspecified
Hardware: All All
: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-15 04:55 UTC by Nikolay Martynov
Modified: 2011-03-23 19:59 UTC (History)
1 user (show)

See Also:


Attachments
Patch which fixes the problem (2.61 KB, patch)
2011-03-15 04:55 UTC, Nikolay Martynov
no flags Details
Patch (1.06 KB, patch)
2011-03-15 08:37 UTC, Volker Lendecke
no flags Details
Patch with Bob Jenkins hash function to generate inode numbers (16.02 KB, patch)
2011-03-15 21:14 UTC, Nikolay Martynov
no flags Details
git-am patch for 3.5.next (2.03 KB, patch)
2011-03-16 18:30 UTC, Jeremy Allison
no flags Details
git-am patch for 3.5.next (15.79 KB, patch)
2011-03-16 20:02 UTC, Jeremy Allison
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nikolay Martynov 2011-03-15 04:55:55 UTC
Created attachment 6290 [details]
Patch which fixes the problem

I have a folder with ~90 photos: IMG_XXXX.JPG where XXXX is a four digit number, almost consecutive (photos from camera for one day).
Current implementation gives about 30 different checksums for this set of files.

Problem mitigates itself in Gnome's nautilus which uses inode numbers (which are generated by this function) when it counts total size of directory. It treats same inode numbers as hardlinks and counts them once. As the result total folder size is 1/3 of sum of sizes of files.
Gnome bug: https://bugzilla.gnome.org/show_bug.cgi?id=598920
Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/nautilus/+bug/453747

I've created a small patch which uses Fowler–Noll–Vo hash function to generate inode numbers. It seems to be better distributes. Also, for platforms with 64bit inode number this patch generates number with all 64 bits used reducing possibility of collisions.
Comment 1 Volker Lendecke 2011-03-15 08:37:11 UTC
Created attachment 6291 [details]
Patch

Does the attached patch also help? We recently added a better and fast hash function to tdb, which might also help here.
Comment 2 Volker Lendecke 2011-03-15 08:38:32 UTC
BTW, this would be 3.6 only, the jenkins hash was not yet there in 3.5
Comment 3 Nikolay Martynov 2011-03-15 21:14:54 UTC
Created attachment 6293 [details]
Patch with Bob Jenkins hash function to generate inode numbers

This patch is for latest 3.5, it uses Bob Jekins cache from 3.6 to generate inode numbers
Comment 4 Volker Lendecke 2011-03-15 21:31:53 UTC
And -- does that solve your problem as well?

Volker
Comment 5 Nikolay Martynov 2011-03-15 21:42:30 UTC
(In reply to comment #4)
> And -- does that solve your problem as well?
> 
> Volker

Ah, sorry.
Yes, both my patches solve problem I have - tested on ubuntu, with their samba-3.5.4 package.
I didn't test 3.6 though, but there is no reason for it not to be fine too.
Comment 6 Jeremy Allison 2011-03-16 18:30:48 UTC
Created attachment 6295 [details]
git-am patch for 3.5.next

Here is the same patch as in attachment 6293 [details] (Patch with Bob Jenkins hash function to generate inode numbers) but with tdb_jenkins_hash() renamed to just jenkins_hash() in order to avoid conflicts if a newer tdb library is used.
Volker, please evaluate for 3.5.next and re-assign to Karolin if you agree.

Jeremy.
Comment 7 Nikolay Martynov 2011-03-16 19:41:19 UTC
Hi,

  I might be missing something but there seems to be no 'jenkins_hash.c' file in this patch.
Comment 8 Jeremy Allison 2011-03-16 20:02:05 UTC
Created attachment 6297 [details]
git-am patch for 3.5.next

git fumbled-fingers - thanks for catching that ! This should be the full patch.
Jeremy.
Comment 9 Karolin Seeger 2011-03-23 19:59:40 UTC
(In reply to comment #8)
> Created attachment 6297 [details]
> git-am patch for 3.5.next
> 
> git fumbled-fingers - thanks for catching that ! This should be the full patch.
> Jeremy.

Pushed to v3-5-test.
Closing out bug report.

Thanks!