Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorDerrick Stolee <dstolee@microsoft.com>2020-05-11 14:56:11 +0300
committerJunio C Hamano <gitster@pobox.com>2020-05-11 19:33:56 +0300
commit88093289cdcfe99c5a3d7ef6f36ee45aa3018824 (patch)
treede313c91da6df5a188929293717ac501a10cf229 /bloom.c
parent891c17c95497712126700cdf5cd887bdb0ff3d55 (diff)
Documentation: changed-path Bloom filters use byte words
In Documentation/technical/commit-graph-format.txt, the definition of the BIDX chunk specifies the length is a number of 8-byte words. During development we discovered that using 8-byte words in the Murmur3 hash algorithm causes issues with big-endian versus little- endian machines. Thus, the hash algorithm was adapted to work on a byte-by-byte basis. However, this caused a change in the definition of a "word" in bloom.h. Now, a "word" is a single byte, which allows filters to be as small as two bytes. These length-two filters are demonstrated in t0095-bloom.sh, and a larger filter of length 25 is demonstrated as well. The original point of using 8-byte words was for alignment reasons. It also presented opportunities for extremely sparse Bloom filters when there were a small number of changes at a commit, creating a very low false-positive rate. However, modifying the format at this point is unlikely to be a valuable exercise. Also, this use of single-byte granularity does present opportunities to save space. It is unclear if 8-byte alignment of the filters would present any meaningful performance benefits. Modify the format document to reflect reality. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'bloom.c')
0 files changed, 0 insertions, 0 deletions