Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-10Merge branch 'jk/oidhash'Junio C Hamano
Code clean-up to remove hardcoded SHA-1 hash from many places. * jk/oidhash: hashmap: convert sha1hash() to oidhash() hash.h: move object_id definition from cache.h khash: rename oid helper functions khash: drop sha1-specific map types pack-bitmap: convert khash_sha1 maps into kh_oid_map delta-islands: convert island_marks khash to use oids khash: rename kh_oid_t to kh_oid_set khash: drop broken oid_map typedef object: convert create_object() to use object_id object: convert internal hash_obj() to object_id object: convert lookup_object() to use object_id object: convert lookup_unknown_object() to use object_id pack-objects: convert locate_object_entry_hash() to object_id pack-objects: convert packlist_find() to use object_id pack-bitmap-write: convert some helpers to use object_id upload-pack: rename a "sha1" variable to "oid" describe: fix accidental oid/hash type-punning
2019-07-10Merge branch 'ds/close-object-store'Junio C Hamano
The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing
2019-07-10Merge branch 'ds/commit-graph-write-refactor'Junio C Hamano
Renamed from commit-graph-format-v2 and changed scope. * ds/commit-graph-write-refactor: commit-graph: extract write_commit_graph_file() commit-graph: extract copy_oids_to_commits() commit-graph: extract count_distinct_commits() commit-graph: extract fill_oids_from_all_packs() commit-graph: extract fill_oids_from_commit_hex() commit-graph: extract fill_oids_from_packs() commit-graph: create write_commit_graph_context commit-graph: remove Future Work section commit-graph: collapse parameters into flags commit-graph: return with errors during write commit-graph: fix the_repository reference
2019-06-20object: convert create_object() to use object_idJeff King
There are no callers left of create_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: use raw_object_store when closingDerrick Stolee
The close_commit_graph() method took a repository struct, but then only uses the raw_object_store within. Change the function prototype to make the method more flexible. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract write_commit_graph_file()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract write_commit_graph_file() that takes all of the information in the context struct and writes the data to a commit-graph file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract copy_oids_to_commits()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract copy_oids_to_commits(), which fills the commits list with the distinct commits from the oids list. During this loop, it also counts the number of "extra" edges from octopus merges. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract count_distinct_commits()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract count_distinct_commits(), which sorts the oids list, then iterates through to find duplicates. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract fill_oids_from_all_packs()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_all_packs() that reads all pack-files for commits and fills the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract fill_oids_from_commit_hex()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. Extract fill_oids_from_commit_hex() that reads the given commit id list and fille the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: extract fill_oids_from_packs()Derrick Stolee
The write_commit_graph() method is too complex, so we are extracting helper functions one by one. This extracts fill_oids_from_packs() that reads the given pack-file list and fills the oid list in the context. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: create write_commit_graph_contextDerrick Stolee
The write_commit_graph() method is too large and complex. To simplify it, we should extract several helper functions. However, we will risk repeating a lot of declarations related to progress incidators and object id or commit lists. Create a new write_commit_graph_context struct that contains the core data structures used in this process. Replace the other local variables with the values inside the context object. Following this change, we will start to lift code segments wholesale out of the write_commit_graph() method and into helper functions. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: collapse parameters into flagsDerrick Stolee
The write_commit_graph() and write_commit_graph_reachable() methods currently take two boolean parameters: 'append' and 'report_progress'. As we update these methods, adding more parameters this way becomes cluttered and hard to maintain. Collapse these parameters into a 'flags' parameter, and adjust the callers to provide flags as necessary. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12commit-graph: return with errors during writeDerrick Stolee
The write_commit_graph() method uses die() to report failure and exit when confronted with an unexpected condition. This use of die() in a library function is incorrect and is now replaced by error() statements and an int return type. Return zero on success and a negative value on failure. Now that we use 'goto cleanup' to jump to the terminal condition on an error, we have new paths that could lead to uninitialized values. New initializers are added to correct for this. The builtins 'commit-graph', 'gc', and 'commit' call these methods, so update them to check the return value. Test that 'git commit-graph write' returns a proper error code when hitting a failure condition in write_commit_graph(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19Merge branch 'js/commit-graph-parse-leakfix'Junio C Hamano
Leakfix. * js/commit-graph-parse-leakfix: commit-graph: fix memory leak
2019-05-08Merge branch 'nd/sha1-name-c-wo-the-repository'Junio C Hamano
Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...
2019-05-07commit-graph: fix memory leakJosh Steadmon
Free the commit graph when verify_commit_graph_lite() reports an error. Credit to OSS-Fuzz for finding this leak. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16commit.cocci: refactor code, avoid double rewriteNguyễn Thái Ngọc Duy
"maybe" pointer in 'struct commit' is tricky because it can be lazily initialized to take advantage of commit-graph if available. This makes it not safe to access directly. This leads to a rule in commit.cocci to rewrite 'x->maybe_tree' to 'get_commit_tree(x)'. But that rule alone could lead to incorrectly rewrite assignments, e.g. from x->maybe_tree = yes to get_commit_tree(x) = yes Because of this we have a second rule to revert this effect. Szeder found out that we could do better by performing the assignment rewrite rule first, then the remaining is read-only access and handled by the current first rule. For this to work, we need to transform "x->maybe_tree = y" to something that does NOT contain "x->maybe_tree" to avoid the original first rule. This is where set_commit_tree() comes in. Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01commit-graph: improve & i18n error messagesÆvar Arnfjörð Bjarmason
Change the error emitted when a commit-graph file is corrupt so that we actually mention the commit-graph, e.g. change errors like: error: improper chunk offset 0000000000385e0c To: error: commit-graph improper chunk offset 0000000000385e0c As discussed in the commits leading up to this one the commit-graph machinery is now used by common commands like "status". If the graph was corrupt we'd often emit some error that gave no indication what was wrong. Now some of them are still cryptic, but they'll at least mention "commit-graph" to give the user a hint as to where to look. While I'm at it mark some of the strings that hadn't been marked for translation. It's clear from the commit history and the code that this was merely forgotten at the time, and wasn't intentional.p5 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01commit-graph write: don't die if the existing graph is corruptÆvar Arnfjörð Bjarmason
When the commit-graph is written we end up calling parse_commit(). This will in turn invoke code that'll consult the existing commit-graph about the commit, if the graph is corrupted we die. We thus get into a state where a failing "commit-graph verify" can't be followed-up with a "commit-graph write" if core.commitGraph=true is set, the graph either needs to be manually removed to proceed, or core.commitGraph needs to be set to "false". Change the "commit-graph write" codepath to use a new parse_commit_no_graph() helper instead of parse_commit() to avoid this. The latter will call repo_parse_commit_internal() with use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit graph with commit parsing", 2018-04-10). Not using the old graph at all slows down the writing of the new graph by some small amount, but is a sensible way to prevent an error in the existing commit-graph from spreading. Just fixing the current issue would be likely to result in code that's inadvertently broken in the future. New code might use the commit-graph at a distance. To detect such cases introduce a "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our corruption tests, and test that a "write/verify" combo works after every one of our current test cases where we now detect commit-graph corruption. Some of the code changes here might be strictly unnecessary, e.g. I was unable to find cases where the parse_commit() called from write_graph_chunk_data() didn't exit early due to "item->object.parsed" being true in repo_parse_commit_internal() (before the use_commit_graph=1 has any effect). But let's also convert those cases for good measure, we do not have exhaustive tests for all possible types of commit-graph corruption. This might need to be re-visited if we learn to write the commit-graph incrementally, but probably not. Hopefully we'll just start by finding out what commits we have in total, then read the old graph(s) to see what they cover, and finally write a new graph file with everything that's missing. In that case the new graph writing code just needs to continue to use e.g. a parse_commit() that doesn't consult the existing commit-graphs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01commit-graph: don't pass filename to load_commit_graph_one_fd_st()Ævar Arnfjörð Bjarmason
An earlier change implemented load_commit_graph_one_fd_st() in a way that was bug-compatible with earlier code in terms of the "graph file %s is too small" error message printing out the path to the commit-graph (".git/objects/info/commit-graph"). But change that, because: * A function that takes an already-open file descriptor also needing the filename isn't very intuitive. * The vast majority of errors we might emit when loading the graph come from parse_commit_graph(), which doesn't report the filename. Let's not do that either in this case for consistency. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01commit-graph: don't early exit(1) on e.g. "git status"Ævar Arnfjörð Bjarmason
Make the commit-graph loading code work as a library that returns an error code instead of calling exit(1) when the commit-graph is corrupt. This means that e.g. "status" will now report commit-graph corruption as an "error: [...]" at the top of its output, but then proceed to work normally. This required splitting up the load_commit_graph_one() function so that the code that deals with open()-ing and stat()-ing the graph can now be called independently as open_commit_graph(). This is needed because "commit-graph verify" where the graph doesn't exist isn't an error. See the third paragraph in 283e68c72f ("commit-graph: add 'verify' subcommand", 2018-06-27). There's a bug in that logic where we conflate the intended ENOENT with other errno values (e.g. EACCES), but this change doesn't address that. That'll be addressed in a follow-up change. I'm then splitting most of the logic out of load_commit_graph_one() into load_commit_graph_one_fd_st(), which allows for providing an existing file descriptor and stat information to the loading code. This isn't strictly needed, but it would be redundant and confusing to open() and stat() the file twice for some of the codepaths, this allows for calling open_commit_graph() followed by load_commit_graph_one_fd_st(). The "graph_file" still needs to be passed to that function for the the "graph file %s is too small" error message. This leaves load_commit_graph_one() unused by everything except the internal prepare_commit_graph_one() function, so let's mark it as "static". If someone needs it in the future we can remove the "static" attribute. I could also rewrite its sole remaining user ("prepare_commit_graph_one()") to use load_commit_graph_one_fd_st() instead, but let's leave it at this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01commit-graph: fix segfault on e.g. "git status"Ævar Arnfjörð Bjarmason
When core.commitGraph=true is set, various common commands now consult the commit graph. Because the commit-graph code is very trusting of its input data, it's possibly to construct a graph that'll cause an immediate segfault on e.g. "status" (and e.g. "log", "blame", ...). In some other cases where git immediately exits with a cryptic error about the graph being broken. The root cause of this is that while the "commit-graph verify" sub-command exhaustively verifies the graph, other users of the graph simply trust the graph, and will e.g. deference data found at certain offsets as pointers, causing segfaults. This change does the bare minimum to ensure that we don't segfault in the common fill_commit_in_graph() codepath called by e.g. setup_revisions(), to do this instrument the "commit-graph verify" tests to always check if "status" would subsequently segfault. This fixes the following tests which would previously segfault: not ok 50 - detect low chunk count not ok 51 - detect missing OID fanout chunk not ok 52 - detect missing OID lookup chunk not ok 53 - detect missing commit data chunk Those happened because with the commit-graph enabled setup_revisions() would eventually call fill_commit_in_graph(), where e.g. g->chunk_commit_data is used early as an offset (and will be 0x0). With this change we get far enough to detect that the graph is broken, and show an error instead. E.g.: $ git status; echo $? error: commit-graph is missing the Commit Data chunk 1 That also sucks, we should *warn* and not hard-fail "status" just because the commit-graph is corrupt, but fixing is left to a follow-up change. A side-effect of changing the reporting from graph_report() to error() is that we now have an "error: " prefix for these even for "commit-graph verify". Pseudo-diff before/after: $ git commit-graph verify -commit-graph is missing the Commit Data chunk +error: commit-graph is missing the Commit Data chunk Changing that is OK. Various errors it emits now early on are prefixed with "error: ", moving these over and changing the output doesn't break anything. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'ab/commit-graph-write-progress'Junio C Hamano
The codepath to show progress meter while writing out commit-graph file has been improved. * ab/commit-graph-write-progress: commit-graph write: emit a percentage for all progress commit-graph write: add itermediate progress commit-graph write: remove empty line for readability commit-graph write: add more descriptive progress output commit-graph write: show progress for object search commit-graph write: more descriptive "writing out" output commit-graph write: add "Writing out" progress output commit-graph: don't call write_graph_chunk_extra_edges() unnecessarily commit-graph: rename "large edges" to "extra edges"
2019-02-06Merge branch 'ab/commit-graph-write-optim'Junio C Hamano
The codepath to write out commit-graph has been optimized by following the usual pattern of visiting objects in in-pack order. * ab/commit-graph-write-optim: commit-graph write: use pack order when finding commits
2019-02-06Merge branch 'js/commit-graph-chunk-table-fix'Junio C Hamano
The codepath to read from the commit-graph file attempted to read past the end of it when the file's table-of-contents was corrupt. * js/commit-graph-chunk-table-fix: Makefile: correct example fuzz build commit-graph: fix buffer read-overflow commit-graph, fuzz: add fuzzer for commit-graph
2019-02-06Merge branch 'sb/more-repo-in-api'Junio C Hamano
The in-core repository instances are passed through more codepaths. * sb/more-repo-in-api: (23 commits) t/helper/test-repository: celebrate independence from the_repository path.h: make REPO_GIT_PATH_FUNC repository agnostic commit: prepare free_commit_buffer and release_commit_memory for any repo commit-graph: convert remaining functions to handle any repo submodule: don't add submodule as odb for push submodule: use submodule repos for object lookup pretty: prepare format_commit_message to handle arbitrary repositories commit: prepare logmsg_reencode to handle arbitrary repositories commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare get_commit_buffer to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit: allow parse_commit* to handle any repo object: parse_object to honor its repository argument object-store: prepare has_{sha1, object}_file to handle any repo object-store: prepare read_object_file to deal with any repo ...
2019-01-29Merge branch 'bc/sha-256'Junio C Hamano
Add sha-256 hash and plug it through the code to allow building Git with the "NewHash". * bc/sha-256: hash: add an SHA-256 implementation using OpenSSL sha256: add an SHA-256 implementation using libgcrypt Add a base implementation of SHA-256 support commit-graph: convert to using the_hash_algo t/helper: add a test helper to compute hash speed sha1-file: add a constant for hash block size t: make the sha1 test-tool helper generic t: add basic tests for our SHA-1 implementation cache: make hashcmp and hasheq work with larger hashes hex: introduce functions to print arbitrary hashes sha1-file: provide functions to look up hash algorithms sha1-file: rename algorithm to "sha1"
2019-01-24commit-graph write: emit a percentage for all progressÆvar Arnfjörð Bjarmason
Follow-up 01ca387774 ("commit-graph: split up close_reachable() progress output", 2018-11-19) by making the progress bars in close_reachable() report a completion percentage. This fixes the last occurrence where in the commit graph writing where we didn't report that. The change in 01ca387774 split up the 1x progress bar in close_reachable() into 3x, but left them as dumb counters without a percentage completion. Fixing that is easy, and the only reason it wasn't done already is because that commit was rushed in during the v2.20.0 RC period to fix the unrelated issue of over-reporting commit numbers. See [1] and follow-ups for ML activity at the time and [2] for an alternative approach where the progress bars weren't split up. Now for e.g. linux.git we'll emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. Expanding reachable commits in commit graph: 100% (815990/815980), done. Computing commit graph generation numbers: 100% (815983/815983), done. Writing out commit graph in 4 passes: 100% (3263932/3263932), done. 1. https://public-inbox.org/git/20181119202300.18670-1-avarab@gmail.com/ 2. https://public-inbox.org/git/20181122153922.16912-11-avarab@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: add itermediate progressÆvar Arnfjörð Bjarmason
Add progress output to sections of code between "Annotating[...]" and "Computing[...]generation numbers". This can collectively take 5-10 seconds on a large enough repository. On a test repository with I have with ~7 million commits and ~50 million objects we'll now emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (124763727/124763727), done. Loading known commits in commit graph: 100% (18989461/18989461), done. Expanding reachable commits in commit graph: 100% (18989507/18989461), done. Clearing commit marks in commit graph: 100% (18989507/18989507), done. Counting distinct commits in commit graph: 100% (18989507/18989507), done. Finding extra edges in commit graph: 100% (18989507/18989507), done. Computing commit graph generation numbers: 100% (7250302/7250302), done. Writing out commit graph in 4 passes: 100% (29001208/29001208), done. Whereas on a medium-sized repository such as linux.git these new progress bars won't have time to kick in and as before and we'll still emit output like: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. Expanding reachable commits in commit graph: 815990, done. Computing commit graph generation numbers: 100% (815983/815983), done. Writing out commit graph in 4 passes: 100% (3263932/3263932), done. The "Counting distinct commits in commit graph" phase will spend most of its time paused at "0/*" as we QSORT(...) the list. That's not optimal, but at least we don't seem to be stalling anymore most of the time. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: remove empty line for readabilityÆvar Arnfjörð Bjarmason
Remove the empty line between a QSORT(...) and the subsequent oideq() for-loop. This makes it clearer that the QSORT(...) is being done so that we can run the oideq() loop on adjacent OIDs. Amends code added in 08fd81c9b6 ("commit-graph: implement write_commit_graph()", 2018-04-02). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: add more descriptive progress outputÆvar Arnfjörð Bjarmason
Make the progress output shown when we're searching for commits to include in the graph more descriptive. This amends code I added in 7b0f229222 ("commit-graph write: add progress output", 2018-09-17). Now, on linux.git, we'll emit this sort of output in the various modes we support: $ git commit-graph write Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. [...] # Actually we don't emit this since this takes almost no time at # all. But if we did (s/_delayed//) we'd show: $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits Finding commits for commit graph from 630 refs: 100% (630/630), done. [...] $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack Finding commits for commit graph in 3 packs: 6529159, done. [...] The middle on of those is going to be the output users might see in practice, since it'll be emitted when they get the commit graph via gc.writeCommitGraph=true. But as noted above you need a really large number of refs for this message to show. It'll show up on a test repository I have with ~165k refs: Finding commits for commit graph from 165203 refs: 100% (165203/165203), done. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: show progress for object searchÆvar Arnfjörð Bjarmason
Show the percentage progress for the "Finding commits for commit graph" phase for the common case where we're operating on all packs in the repository, as "commit-graph write" or "gc" will do. Before we'd emit on e.g. linux.git with "commit-graph write": Finding commits for commit graph: 6529159, done. [...] And now: Finding commits for commit graph: 100% (6529159/6529159), done. [...] Since the commit graph only includes those commits that are packed (via for_each_packed_object(...)) the approximate_object_count() returns the actual number of objects we're going to process. Still, it is possible due to a race with "gc" or another process maintaining packs that the number of objects we're going to process is lower than what approximate_object_count() reported. In that case we don't want to stop the progress bar short of 100%. So let's make sure it snaps to 100% at the end. The inverse case is also possible and more likely. I.e. that a new pack has been added between approximate_object_count() and for_each_packed_object(). In that case the percentage will go beyond 100%, and we'll do nothing to snap it back to 100% at the end. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: more descriptive "writing out" outputÆvar Arnfjörð Bjarmason
Make the "Writing out" part of the progress output more descriptive. Depending on the shape of the graph we either make 3 or 4 passes over it. Let's present this information to the user in case they're wondering what this number, which is much larger than their number of commits, has to do with writing out the commit graph. Now e.g. on linux.git we emit: $ ~/g/git/git --exec-path=$HOME/g/git -C ~/g/linux commit-graph write Finding commits for commit graph: 6529159, done. Expanding reachable commits in commit graph: 815990, done. Computing commit graph generation numbers: 100% (815983/815983), done. Writing out commit graph in 4 passes: 100% (3263932/3263932), done. A note on i18n: Why are we using the Q_() function and passing a number & English text for a singular which'll never be used? Because the plural rules of translated languages may not match those of English, and to use the plural function we need to use this format. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph write: add "Writing out" progress outputÆvar Arnfjörð Bjarmason
Add progress output to be shown when we're writing out the commit-graph, this adds to the output already added in 7b0f229222 ("commit-graph write: add progress output", 2018-09-17). As noted in that commit most of the progress output isn't displayed on small repositories, but before this change we'd noticeably hang for 2-3 seconds at the end on medium sized repositories such as linux.git. Now we'll instead show output like this, and reduce the human-observable times at which we're not producing progress output: $ ~/g/git/git --exec-path=$HOME/g/git -C ~/g/2015-04-03-1M-git commit-graph write Finding commits for commit graph: 13064614, done. Expanding reachable commits in commit graph: 1000447, done. Computing commit graph generation numbers: 100% (1000447/1000447), done. Writing out commit graph: 100% (3001341/3001341), done. This "Writing out" number is 3x or 4x the number of commits, depending on the graph we're processing. A later change will make this explicit to the user. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24commit-graph: don't call write_graph_chunk_extra_edges() unnecessarilySZEDER Gábor
The optional 'Extra Edge List' chunk of the commit graph file stores parent information for commits with more than two parents. Since the chunk is optional, write_commit_graph() looks through all commits to find those with more than two parents, and then writes the commit graph file header accordingly, i.e. if there are no such commits, then there won't be a 'Extra Edge List' chunk written, only the three mandatory chunks. However, when it later comes to writing actual chunk data, write_commit_graph() unconditionally invokes write_graph_chunk_extra_edges(), even when it was decided earlier that that chunk won't be written. Strictly speaking there is no bug here, because write_graph_chunk_extra_edges() won't write anything if it doesn't find any commits with more than two parents, but then it unnecessarily and in vain looks through all commits once again in search for such commits. Don't call write_graph_chunk_extra_edges() when that chunk won't be written to spare an unnecessary iteration over all commits. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-22commit-graph: rename "large edges" to "extra edges"SZEDER Gábor
The optional 'Large Edge List' chunk of the commit graph file stores parent information for commits with more than two parents, and the names of most of the macros, variables, struct fields, and functions related to this chunk contain the term "large edges", e.g. write_graph_chunk_large_edges(). However, it's not a really great term, as the edges to the second and subsequent parents stored in this chunk are not any larger than the edges to the first and second parents stored in the "main" 'Commit Data' chunk. It's the number of edges, IOW number of parents, that is larger compared to non-merge and "regular" two-parent merge commits. And indeed, two functions in 'commit-graph.c' have a local variable called 'num_extra_edges' that refer to the same thing, and this "extra edges" term is much better at describing these edges. So let's rename all these references to "large edges" in macro, variable, function, etc. names to "extra edges". There is a GRAPH_OCTOPUS_EDGES_NEEDED macro as well; for the sake of consistency rename it to GRAPH_EXTRA_EDGES_NEEDED. We can do so safely without causing any incompatibility issues, because the term "large edges" doesn't come up in the file format itself in any form (the chunk's magic is {'E', 'D', 'G', 'E'}, there is no 'L' in there), but only in the specification text. The string "large edges", however, does come up in the output of 'git commit-graph read' and in tests looking at its input, but that command is explicitly documented as debugging aid, so we can change its output and the affected tests safely. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-22commit-graph write: use pack order when finding commitsÆvar Arnfjörð Bjarmason
Slightly optimize the "commit-graph write" step by using FOR_EACH_OBJECT_PACK_ORDER with for_each_object_in_pack(). See commit [1] and [2] for the facility and a similar optimization for "cat-file". On Linux it is around 5% slower to run: echo 1 >/proc/sys/vm/drop_caches && cat .git/objects/pack/* >/dev/null && git cat-file --batch-all-objects --batch-check --unordered Than the same thing with the "cat" omitted. This is as expected, since we're iterating in pack order and the "cat" is extra work. Before this change the opposite was true of "commit-graph write". We were 6% faster if we first ran "cat" to efficiently populate the FS cache for our sole big pack on linux.git, than if we had populated it via for_each_object_in_pack(). Now we're 3% faster without the "cat" instead. My tests were done on an unloaded Linux 3.10 system with 10 runs for each. Derrick Stolee did his own tests on Windows[3] showing a 2% improvement with a high degree of accuracy. 1. 736eb88fdc ("for_each_packed_object: support iterating in pack-order", 2018-08-10) 2. 0750bb5b51 ("cat-file: support "unordered" output for --batch-all-objects", 2018-08-10) 3. https://public-inbox.org/git/f71fa868-25e8-a9c9-46a6-611b987f1a8f@gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-19Merge branch 'ds/commit-graph-assert-missing-parents'Junio C Hamano
Tightening error checking in commit-graph writer. * ds/commit-graph-assert-missing-parents: commit-graph: writing missing parents is a BUG
2019-01-16commit-graph: fix buffer read-overflowJosh Steadmon
fuzz-commit-graph identified a case where Git will read past the end of a buffer containing a commit graph if the graph's header has an incorrect chunk count. A simple bounds check in parse_commit_graph() prevents this. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-16commit-graph, fuzz: add fuzzer for commit-graphJosh Steadmon
Break load_commit_graph_one() into a new function, parse_commit_graph(). The latter function operates on arbitrary buffers, which makes it suitable as a fuzzing target. Since parse_commit_graph() is only called by load_commit_graph_one() (and the fuzzer described below), we omit error messages that would be duplicated by the caller. Adds fuzz-commit-graph.c, which provides a fuzzing entry point compatible with libFuzzer (and possibly other fuzzing engines). Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15Merge branch 'ab/commit-graph-progress-fix'Junio C Hamano
* ab/commit-graph-progress-fix: commit-graph: split up close_reachable() progress output
2019-01-03commit-graph: writing missing parents is a BUGDerrick Stolee
When writing a commit-graph, we write GRAPH_MISSING_PARENT if the parent's object id does not appear in the list of commits to be written into the commit-graph. This was done as the initial design allowed commits to have missing parents, but the final version requires the commit-graph to be closed under reachability. Thus, this GRAPH_MISSING_PARENT value should never be written. However, there are reasons why it could be written! These range from a bug in the reachable-closure code to a memory error causing the binary search into the list of object ids to fail. In either case, we should fail fast and avoid writing the commit-graph file with bad data. Remove the GRAPH_MISSING_PARENT constant in favor of the constant GRAPH_EDGE_LAST_MASK, which has the same value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28commit-graph: convert remaining functions to handle any repoStefan Beller
Convert all functions to handle arbitrary repositories in commit-graph.c that are used by functions taking a repository argument already. Notable exclusion is write_commit_graph and its local functions as that only works on the_repository. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-20commit-graph: split up close_reachable() progress outputÆvar Arnfjörð Bjarmason
Amend the progress output added in 7b0f229222 ("commit-graph write: add progress output", 2018-09-17) so that the total numbers it reports aren't higher than the total number of commits anymore. See [1] for a bug report pointing that out. When I added this I wasn't intending to provide an accurate count, but just have some progress output to show the user the command wasn't hanging[2]. But since we are showing numbers, let's make them accurate. The progress descriptions were suggested by Derrick Stolee in [3]. As noted in [2] we are unlikely to show anything except the "Expanding reachable..." message even on fairly large repositories such as linux.git. On a test repository I have with north of 7 million commits all of these are displayed. Two of them don't show up for long, but as noted in [5] future-proofing this for if the loops become more expensive in the future makes sense. 1. https://public-inbox.org/git/20181010203738.GE23446@szeder.dev/ 2. https://public-inbox.org/git/87pnwhea8y.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/f7a0cbee-863c-61d3-4959-5cec8b43c705@gmail.com/ 4. https://public-inbox.org/git/20181015160545.GG19800@szeder.dev/ 5. https://public-inbox.org/git/87murle8da.fsf@evledraar.gmail.com/ Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-14commit-graph: convert to using the_hash_algobrian m. carlson
Instead of using hard-coded constants for object sizes, use the_hash_algo to look them up. In addition, use a function call to look up the object ID version and produce the correct value. For now, we use version 1, which means to use the default algorithm used in the rest of the repository. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13sha1-file: use an object_directory for the main object dirJeff King
Our handling of alternate object directories is needlessly different from the main object directory. As a result, many places in the code basically look like this: do_something(r->objects->objdir); for (odb = r->objects->alt_odb_list; odb; odb = odb->next) do_something(odb->path); That gets annoying when do_something() is non-trivial, and we've resorted to gross hacks like creating fake alternates (see find_short_object_filename()). Instead, let's give each raw_object_store a unified list of object_directory structs. The first will be the main store, and everything after is an alternate. Very few callers even care about the distinction, and can just loop over the whole list (and those who care can just treat the first element differently). A few observations: - we don't need r->objects->objectdir anymore, and can just mechanically convert that to r->objects->odb->path - object_directory's path field needs to become a real pointer rather than a FLEX_ARRAY, in order to fill it with expand_base_dir() - we'll call prepare_alt_odb() earlier in many functions (i.e., outside of the loop). This may result in us calling it even when our function would be satisfied looking only at the main odb. But this doesn't matter in practice. It's not a very expensive operation in the first place, and in the majority of cases it will be a noop. We call it already (and cache its results) in prepare_packed_git(), and we'll generally check packs before loose objects. So essentially every program is going to call it immediately once per program. Arguably we should just prepare_alt_odb() immediately upon setting up the repository's object directory, which would save us sprinkling calls throughout the code base (and forgetting to do so has been a source of subtle bugs in the past). But I've stopped short of that here, since there are already a lot of other moving parts in this patch. - Most call sites just get shorter. The check_and_freshen() functions are an exception, because they have entry points to handle local and nonlocal directories separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13rename "alternate_object_database" to "object_directory"Jeff King
In preparation for unifying the handling of alt odb's and the normal repo object directory, let's use a more neutral name. This patch is purely mechanical, swapping the type name, and converting any variables named "alt" to "odb". There should be no functional change, but it will reduce the noise in subsequent diffs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'ds/commit-graph-leakfix'Junio C Hamano
Code clean-up. * ds/commit-graph-leakfix: commit-graph: reduce initial oid allocation builtin/commit-graph.c: UNLEAK variables commit-graph: clean up leaked memory during write
2018-10-16Merge branch 'ds/commit-graph-with-grafts'Junio C Hamano
The recently introduced commit-graph auxiliary data is incompatible with mechanisms such as replace & grafts that "breaks" immutable nature of the object reference relationship. Disable optimizations based on its use (and updating existing commit-graph) when these incompatible features are in use in the repository. * ds/commit-graph-with-grafts: commit-graph: close_commit_graph before shallow walk commit-graph: not compatible with uninitialized repo commit-graph: not compatible with grafts commit-graph: not compatible with replace objects test-repository: properly init repo commit-graph: update design document refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback refs.c: migrate internal ref iteration to pass thru repository argument