Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-23Merge branch 'tb/pack-bitmap-traversal-with-boundary'Junio C Hamano
The object traversal using reachability bitmap done by "pack-object" has been tweaked to take advantage of the fact that using "boundary" commits as representative of all the uninteresting ones can save quite a lot of object enumeration. * tb/pack-bitmap-traversal-with-boundary: pack-bitmap.c: use commit boundary during bitmap traversal pack-bitmap.c: extract `fill_in_bitmap()` object: add object_array initializer helper function
2023-05-08object: add object_array initializer helper functionTaylor Blau
The object_array API has an OBJECT_ARRAY_INIT macro, but lacks a function to initialize an object_array at a given location in memory. Introduce `object_array_init()` to implement such a function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24hash-ll.h: split out of hash.h to remove dependency on repository.hElijah Newren
hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object.h: move some inline functions and defines from cache.hElijah Newren
The object_type() inline function is very tied to the enum object_type declaration within object.h, and just seemed to make more sense to live there. That makes S_ISGITLINK and some other defines make sense to go with it, as well as the create_ce_mode() and canon_mode() inline functions. Move all these inline functions and defines from cache.h to object.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24object.h: stop depending on cache.h; make cache.h depend on object.hElijah Newren
Things should be able to depend on object.h without pulling in all of cache.h. Move an enum to allow this. Note that a couple files previously depended on things brought in through cache.h indirectly (revision.h -> commit.h -> object.h -> cache.h). As such, this change requires making existing dependencies more explicit in half a dozen files. The inclusion of strbuf.h in some headers if of particular note: these headers directly embedded a strbuf in some new structs, meaning they should have been including strbuf.h all along but were indirectly getting the necessary definitions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07parse_object(): allow skipping hash checkJeff King
The parse_object() function checks the object hash of any object it parses. This is a nice feature, as it means we may catch bit corruption during normal use, rather than waiting for specific fsck operations. But it also can be slow. It's particularly noticeable for blobs, where except for the hash check, we could return without loading the object contents at all. Now one may wonder what is the point of calling parse_object() on a blob in the first place then, but usually it's not intentional: we were fed an oid from somewhere, don't know the type, and want an object struct. For commits and trees, the parsing is usually helpful; we're about to look at the contents anyway. But this is less true for blobs, where we may be collecting them as part of a reachability traversal, etc, and don't actually care what's in them. And blobs, of course, tend to be larger. We don't want to just throw out the hash-checks for blobs, though. We do depend on them in some circumstances (e.g., rev-list --verify-objects uses parse_object() to check them). It's only the callers that know how they're going to use the result. And so we can help them by providing a special flag to skip the hash check. We could just apply this to blobs, as they're going to be the main source of performance improvement. But if a caller doesn't care about checking the hash, we might as well skip it for other object types, too. Even though we can't avoid reading the object contents, we can still skip the actual hash computation. If this seems like it is making Git a little bit less safe against corruption, it may be. But it's part of a series of tradeoffs we're already making. For instance, "rev-list --objects" does not open the contents of blobs it prints. And when a commit graph is present, we skip opening most commits entirely. The important thing will be to use this flag in cases where it's safe to skip the check. For instance, when serving a pack for a fetch, we know the client will fully index the objects and do a connectivity check itself. There's little to be gained from the server side re-hashing a blob itself. And indeed, most of the time we don't! The revision machinery won't open up a blob reached by traversal, but only one requested directly with a "want" line. So applied properly, this new feature shouldn't make anything less safe in practice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19revision: allow --ancestry-path to take an argumentElijah Newren
We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-03reflog: libify delete reflog function and helpersJohn Cai
Currently stash shells out to reflog in order to delete refs. In an effort to reduce how much we shell out to a subprocess, libify the functionality that stash needs into reflog.c. Add a reflog_delete function that is pretty much the logic in the while loop in builtin/reflog.c cmd_reflog_delete(). This is a function that builtin/reflog.c and builtin/stash.c can both call. Also move functions needed by reflog_delete and export them. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28*.[ch] *_INIT macros: use { 0 } for a "zero out" idiomÆvar Arnfjörð Bjarmason
In C it isn't required to specify that all members of a struct are zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will accomplish that. Let's also change code that provided N zero'd fields to just provide one, and change e.g. "{ NULL }" to "{ 0 }" for consistency. I.e. even if the first member is a pointer let's use "0" instead of "NULL". The point of using "0" consistently is to pick one, and to not have the reader wonder why we're not using the same pattern everywhere. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-30builtin/pack-objects.c: remove duplicate hash lookupTaylor Blau
In the original code from 08cdfb1337 (pack-objects --keep-unreachable, 2007-09-16), we add each object to the packing list with type `obj->type`, where `obj` comes from `lookup_unknown_object()`. Unless we had already looked up and parsed the object, this will be `OBJ_NONE`. That's fine, since oe_set_type() sets the type_valid bit to '0', and we determine the real type later on. So the only thing we need from the object lookup is access to the `flags` field so that we can mark that we've added the object with `OBJECT_ADDED` to avoid adding it again (we can just pass `OBJ_NONE` directly instead of grabbing it from the object). But add_object_entry() already rejects duplicates! This has been the behavior since 7a979d99ba (Thin pack - create packfile with missing delta base., 2006-02-19), but 08cdfb1337 didn't take advantage of it. Moreover, to do the OBJECT_ADDED check, we have to do a hash lookup in `obj_hash`. So we can drop the lookup_unknown_object() call completely, *and* the OBJECT_ADDED flag, too, since the spot we're touching here is the only location that checks it. In the end, we perform the same number of hash lookups, but with the added bonus that we don't waste memory allocating an OBJ_NONE object (if we were traversing, we'd need it eventually, but the whole point of this code path is not to traverse). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29object.h: add lookup_object_by_type() functionJeff King
In some cases it's useful for efficiency reasons to get the type of an object before deciding whether to parse it, but we still want an object struct. E.g., in reachable.c, bitmaps give us the type, but we just want to mark flags on each object. Likewise, we may loop over every object and only parse tags in order to peel them; checking the type first lets us avoid parsing the non-tags. But our lookup_blob(), etc, functions make getting an object struct annoying: we have to call the right function for every type. And we cannot just use the generic lookup_object(), because it only returns an already-seen object; it won't allocate a new object struct. Let's provide a function that dispatches to the correct lookup_* function based on a run-time type. In fact, reachable.c already has such a helper, so we'll just make that public. I did change the return type from "void *" to "struct object *". While the former is a clever way to avoid casting inside the function, it's less safe and less informative to people reading the function declaration. The next commit will add a new caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29object.h: expand docstring for lookup_unknown_object()Jeff King
The lookup_unknown_object() system is not often used and is somewhat confusing. Let's try to explain it a bit more (which is especially important as I'm adding a related but slightly different function in the next commit). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16Merge branch 'jt/push-negotiation'Junio C Hamano
"git push" learns to discover common ancestor with the receiving end over protocol v2. * jt/push-negotiation: send-pack: support push negotiation fetch: teach independent negotiation (no packfile) fetch-pack: refactor command and capability write fetch-pack: refactor add_haves() fetch-pack: refactor process_acks()
2021-05-05fetch: teach independent negotiation (no packfile)Jonathan Tan
Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13lookup_unknown_object(): take a repository argumentJeff King
All of the other lookup_foo() functions take a repository argument, but lookup_unknown_object() was never converted, and it uses the_repository internally. Let's fix that. We could leave a wrapper that uses the_repository, but there aren't that many calls, so we'll just convert them all. I looked briefly at each site to see if we had a repository struct (besides the_repository) we could pass, but none of them do (so this conversion to pass the_repository is a pure noop in each case, though it does take us one step closer to eventually getting rid of the_repository). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31object: allow clear_commit_marks_all to handle any repoRené Scharfe
Allow callers to specify the repository to use. Rename the function to repo_clear_commit_marks to document its new scope. No functional change intended. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17maintenance: add auto condition for commit-graph taskDerrick Stolee
Instead of writing a new commit-graph in every 'git maintenance run --auto' process (when maintenance.commit-graph.enalbed is configured to be true), only write when there are "enough" commits not in a commit-graph file. This count is controlled by the maintenance.commit-graph.auto config option. To compute the count, use a depth-first search starting at each ref, and leaving markers using the SEEN flag. If this count reaches the limit, then terminate early and start the task. Otherwise, this operation will peel every ref and parse the commit it points to. If these are all in the commit-graph, then this is typically a very fast operation. Users with many refs might feel a slow-down, and hence could consider updating their limit to be very small. A negative value will force the step to run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10Merge branch 'tb/fix-persistent-shallow' into masterJunio C Hamano
When "fetch.writeCommitGraph" configuration is set in a shallow repository and a fetch moves the shallow boundary, we wrote out broken commit-graph files that do not match the reality, which has been corrected. * tb/fix-persistent-shallow: commit.c: don't persist substituted parents when unshallowing
2020-07-09commit.c: don't persist substituted parents when unshallowingTaylor Blau
Since 37b9dcabfc (shallow.c: use '{commit,rollback}_shallow_file', 2020-04-22), Git knows how to reset stat-validity checks for the $GIT_DIR/shallow file, allowing it to change between a shallow and non-shallow state in the same process (e.g., in the case of 'git fetch --unshallow'). However, when $GIT_DIR/shallow changes, Git does not alter or remove any grafts (nor substituted parents) in memory. This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph set to true. Ordinarily in a shallow repository (and before 37b9dcabfc, even in this case), commit_graph_compatible() would return false, indicating that the repository should not be used to write a commit-graphs (since commit-graph files cannot represent a shallow history). But since 37b9dcabfc, in an --unshallow operation that check succeeds. Thus even though the repository isn't shallow any longer (that is, we have all of the objects), the in-core representation of those objects still has munged parents at the shallow boundaries. When the commit-graph write proceeds, we use the incorrect parentage, producing wrong results. There are two ways for a user to work around this: either (1) set 'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after unshallowing. One way to fix this would be to reset the parsed object pool entirely (flushing the cache and thus preventing subsequent reads from modifying their parents) after unshallowing. That would produce a problem when callers have a now-stale reference to the old pool, and so this patch implements a different approach. Instead, attach a new bit to the pool, 'substituted_parent', which indicates if the repository *ever* stored a commit which had its parents modified (i.e., the shallow boundary prior to unshallowing). This bit needs to be sticky because all reads subsequent to modifying a commit's parents are unreliable when unshallowing. Modify the check in 'commit_graph_compatible' to take this bit into account, and correctly avoid generating commit-graphs in this case, thus solving the bug. Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Reported-by: Jay Conrod <jayconrod@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07Merge branch 'rs/pack-bits-in-object-better'Junio C Hamano
By renumbering object flag bits, "struct object" managed to lose bloated inter-field padding. * rs/pack-bits-in-object-better: revision: reallocate TOPO_WALK object flags
2020-07-07Merge branch 'bc/http-push-flagsfix'Junio C Hamano
The code to push changes over "dumb" HTTP had a bad interaction with the commit reachability code due to incorrect allocation of object flag bits, which has been corrected. * bc/http-push-flagsfix: http-push: ensure unforced pushes fail when data would be lost
2020-06-24revision: reallocate TOPO_WALK object flagsRené Scharfe
The bit fields in struct object have an unfortunate layout. Here's what pahole reports on x86_64 GNU/Linux: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ /* XXX 28 bits hole, try to pack */ /* Force alignment to the next boundary: */ unsigned int :0; unsigned int flags:29; /* 4: 0 4 */ /* XXX 3 bits hole, try to pack */ struct object_id oid; /* 8 32 */ /* size: 40, cachelines: 1, members: 4 */ /* sum members: 32 */ /* sum bitfield members: 33 bits, bit holes: 2, sum bit holes: 31 bits */ /* last cacheline: 40 bytes */ }; Notice the 1+3+29=33 bits in bit fields and 28+3=31 bits in holes. There are holes inside the flags bit field as well -- while some object flags are used for more than one purpose, 22, 23 and 24 are still free. Use 23 and 24 instead of 27 and 28 for TOPO_WALK_EXPLORED and TOPO_WALK_INDEGREE. This allows us to reduce FLAG_BITS by one so that all bitfields combined fit into a single 32-bit slot: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ unsigned int flags:28; /* 0: 4 4 */ struct object_id oid; /* 4 32 */ /* size: 36, cachelines: 1, members: 4 */ /* last cacheline: 36 bytes */ }; With this tight packing the size of struct object is reduced by 10%. Other architectures probably benefit as well. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24http-push: ensure unforced pushes fail when data would be lostbrian m. carlson
When we push using the DAV-based protocol, the client is the one that performs the ref updates and therefore makes the checks to see whether an unforced push should be allowed. We make this check by determining if either (a) we lack the object file for the old value of the ref or (b) the new value of the ref is not newer than the old value, and in either case, reject the push. However, the ref_newer function, which performs this latter check, has an odd behavior due to the reuse of certain object flags. Specifically, it will incorrectly return false in its first invocation and then correctly return true on a subsequent invocation. This occurs because the object flags used by http-push.c are the same as those used by commit-reach.c, which implements ref_newer, and one piece of code misinterprets the flags set by the other. Note that this does not occur in all cases. For example, if the example used in the tests is changed to use one repository instead of two and rewind the head to add a commit, the test passes and we correctly reject the push. However, the example provided does trigger this behavior, and the code has been broken in this way since at least Git 2.0.0. To solve this problem, let's move the two sets of object flags so that they don't overlap, since we're clearly using them at the same time. The new set should not conflict with other usage because other users are either builtin code (which is not compiled into git http-push) or upload-pack (which we similarly do not use here). Reported-by: Michael Ward <mward@smartsoftwareinc.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-18object: drop parsed_object_pool->commit_countAbhishek Kumar
14ba97f8 (alloc: allow arbitrary repositories for alloc functions, 2018-05-15) introduced parsed_object_pool->commit_count to keep count of commits per repository and was used to assign commit->index. However, commit-slab code requires commit->index values to be unique and a global count would be correct, rather than a per-repo count. Let's introduce a static counter variable, `parsed_commits_count` to keep track of parsed commits so far. As commit_count has no use anymore, let's also drop it from the struct. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10revision: --show-pulls adds helpful mergesDerrick Stolee
The default file history simplification of "git log -- <path>" or "git rev-list -- <path>" focuses on providing the smallest set of commits that first contributed a change. The revision walk greatly restricts the set of walked commits by visiting only the first TREESAME parent of a merge commit, when one exists. This means that portions of the commit-graph are not walked, which can be a performance benefit, but can also "hide" commits that added changes but were ignored by a merge resolution. The --full-history option modifies this by walking all commits and reporting a merge commit as "interesting" if it has _any_ parent that is not TREESAME. This tends to be an over-representation of important commits, especially in an environment where most merge commits are created by pull request completion. Suppose we have a commit A and we create a commit B on top that changes our file. When we merge the pull request, we create a merge commit M. If no one else changed the file in the first-parent history between M and A, then M will not be TREESAME to its first parent, but will be TREESAME to B. Thus, the simplified history will be "B". However, M will appear in the --full-history mode. However, suppose that a number of topics T1, T2, ..., Tn were created based on commits C1, C2, ..., Cn between A and M as follows: A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn \ \ \ \ / / / / \ \__.. \ \/ ..__T1 / Tn \ \__.. /\ ..__T2 / \_____________________B \____________________/ If the commits T1, T2, ... Tn did not change the file, then all of P1 through Pn will be TREESAME to their first parent, but not TREESAME to their second. This means that all of those merge commits appear in the --full-history view, with edges that immediately collapse into the lower history without introducing interesting single-parent commits. The --simplify-merges option was introduced to remove these extra merge commits. By noticing that the rewritten parents are reachable from their first parents, those edges can be simplified away. Finally, the commits now look like single-parent commits that are TREESAME to their "only" parent. Thus, they are removed and this issue does not cause issues anymore. However, this also ends up removing the commit M from the history view! Even worse, the --simplify-merges option requires walking the entire history before returning a single result. Many Git users are using Git alongside a Git service that provides code storage alongside a code review tool commonly called "Pull Requests" or "Merge Requests" against a target branch. When these requests are accepted and merged, they typically create a merge commit whose first parent is the previous branch tip and the second parent is the tip of the topic branch used for the request. This presents a valuable order to the parents, but also makes that merge commit slightly special. Users may want to see not only which commits changed a file, but which pull requests merged those commits into their branch. In the previous example, this would mean the users want to see the merge commit "M" in addition to the single- parent commit "C". Users are even more likely to want these merge commits when they use pull requests to merge into a feature branch before merging that feature branch into their trunk. In some sense, users are asking for the "first" merge commit to bring in the change to their branch. As long as the parent order is consistent, this can be handled with the following rule: Include a merge commit if it is not TREESAME to its first parent, but is TREESAME to a later parent. These merges look like the merge commits that would result from running "git pull <topic>" on a main branch. Thus, the option to show these commits is called "--show-pulls". This has the added benefit of showing the commits created by closing a pull request or merge request on any of the Git hosting and code review platforms. To test these options, extend the standard test example to include a merge commit that is not TREESAME to its first parent. It is surprising that that option was not already in the example, as it is instructive. In particular, this extension demonstrates a common issue with file history simplification. When a user resolves a merge conflict using "-Xours" or otherwise ignoring one side of the conflict, they create a TREESAME edge that probably should not be TREESAME. This leads users to become frustrated and complain that "my change disappeared!" In my experience, showing them history with --full-history and --simplify-merges quickly reveals the problematic merge. As mentioned, this option is expensive to compute. The --show-pulls option _might_ show the merge commit (usually titled "resolving conflicts") more quickly. Of course, this depends on the user having the correct parent order, which is backwards when using "git pull master" from a topic branch. There are some special considerations when combining the --show-pulls option with --simplify-merges. This requires adding a new PULL_MERGE object flag to store the information from the initial TREESAME comparisons. This helps avoid dropping those commits in later filters. This is covered by a test, including how the parents can be simplified. Since "struct object" has already ruined its 32-bit alignment by using 33 bits across parsed, type, and flags member, let's not make it worse. PULL_MERGE is used in revision.c with the same value (1u<<15) as REACHABLE in commit-graph.c. The REACHABLE flag is only used when writing a commit-graph file, and a revision walk using --show-pulls does not happen in the same process. Care must be taken in the future to ensure this remains the case. Update Documentation/rev-list-options.txt with significant details around this option. This requires updating the example in the History Simplification section to demonstrate some of the problems with TREESAME second parents. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-13pack-bitmap: fix leak of haves/wants object listsJeff King
When we do a bitmap-aware revision traversal, we create an object_list for each of the "haves" and "wants" tips. After creating the result bitmaps these are no longer needed or used, but we never free the list memory. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-25commit-graph: fix writing first commit-graph during fetchDerrick Stolee
The previous commit includes a failing test for an issue around fetch.writeCommitGraph and fetching in a repo with a submodule. Here, we fix that bug and set the test to "test_expect_success". The problem arises with this set of commands when the remote repo at <url> has a submodule. Note that --recurse-submodules is not needed to demonstrate the bug. $ git clone <url> test $ cd test $ git -c fetch.writeCommitGraph=true fetch origin Computing commit graph generation numbers: 100% (12/12), done. BUG: commit-graph.c:886: missing parent <hash1> for commit <hash2> Aborted (core dumped) As an initial fix, I converted the code in builtin/fetch.c that calls write_commit_graph_reachable() to instead launch a "git commit-graph write --reachable --split" process. That code worked, but is not how we want the feature to work long-term. That test did demonstrate that the issue must be something to do with internal state of the 'git fetch' process. The write_commit_graph() method in commit-graph.c ensures the commits we plan to write are "closed under reachability" using close_reachable(). This method walks from the input commits, and uses the UNINTERESTING flag to mark which commits have already been visited. This allows the walk to take O(N) time, where N is the number of commits, instead of O(P) time, where P is the number of paths. (The number of paths can be exponential in the number of commits.) However, the UNINTERESTING flag is used in lots of places in the codebase. This flag usually means some barrier to stop a commit walk, such as in revision-walking to compare histories. It is not often cleared after the walk completes because the starting points of those walks do not have the UNINTERESTING flag, and clear_commit_marks() would stop immediately. This is happening during a 'git fetch' call with a remote. The fetch negotiation is comparing the remote refs with the local refs and marking some commits as UNINTERESTING. I tested running clear_commit_marks_many() to clear the UNINTERESTING flag inside close_reachable(), but the tips did not have the flag, so that did nothing. It turns out that the calculate_changed_submodule_paths() method is at fault. Thanks, Peff, for pointing out this detail! More specifically, for each submodule, the collect_changed_submodules() runs a revision walk to essentially do file-history on the list of submodules. That revision walk marks commits UNININTERESTING if they are simplified away by not changing the submodule. Instead, I finally arrived on the conclusion that I should use a flag that is not used in any other part of the code. In commit-reach.c, a number of flags were defined for commit walk algorithms. The REACHABLE flag seemed like it made the most sense, and it seems it was not actually used in the file. The REACHABLE flag was used in early versions of commit-reach.c, but was removed by 4fbcca4 (commit-reach: make can_all_from_reach... linear, 2018-07-20). Add the REACHABLE flag to commit-graph.c and use it instead of UNINTERESTING in close_reachable(). This fixes the bug in manual testing. Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Jeff King <peff@peff.net> Helped-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert create_object() to use object_idJeff King
There are no callers left of create_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_object() to use object_idJeff King
There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_unknown_object() to use object_idJeff King
There are no callers left of lookup_unknown_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using spatchDenton Liu
There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02revision.c: generation-based topo-order algorithmDerrick Stolee
The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. This can dramatically reduce the computation time to write a fixed number of commits, such as when limiting with "-n <N>" or filling the first page of a pager. When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING, then it may terminate without walking all reachable commits. This does not occur if we do not specify UNINTERESTING commits. 2. Run sort_in_topological_order(), which is an implementation of Kahn's algorithm. It first iterates through the entire set of important commits and computes the in-degree of each (plus one, as we use 'zero' as a special value here). Then, we walk the commits in priority order, adding them to the priority queue if and only if their in-degree is one. As we remove commits from this priority queue, we decrement the in-degree of their parents. 3. While we are peeling commits for output, get_revision_1() uses pop_commit on the full list of commits computed by sort_in_topological_order(). In the new algorithm, these three steps correspond to three different commit walks. We run these walks simultaneously, and advance each only as far as necessary to satisfy the requirements of the 'higher order' walk. We know when we can pause each walk by using generation numbers from the commit- graph feature. Recall that the generation number of a commit satisfies: * If the commit has at least one parent, then the generation number is one more than the maximum generation number among its parents. * If the commit has no parent, then the generation number is one. There are two special generation numbers: * GENERATION_NUMBER_INFINITY: this value is 0xffffffff and indicates that the commit is not stored in the commit-graph and the generation number was not previously calculated. * GENERATION_NUMBER_ZERO: this value (0) is a special indicator to say that the commit-graph was generated by a version of Git that does not compute generation numbers (such as v2.18.0). Since we use generation_numbers_enabled() before using the new algorithm, we do not need to worry about GENERATION_NUMBER_ZERO. However, the existence of GENERATION_NUMBER_INFINITY implies the following weaker statement than the usual we expect from generation numbers: If A and B are commits with generation numbers gen(A) and gen(B) and gen(A) < gen(B), then A cannot reach B. Thus, we will walk in each of our stages until the "maximum unexpanded generation number" is strictly lower than the generation number of a commit we are about to use. The walks are as follows: 1. EXPLORE: using the explore_queue priority queue (ordered by maximizing the generation number), parse each reachable commit until all commits in the queue have generation number strictly lower than needed. During this walk, update the UNINTERESTING flags as necessary. 2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in- degree of each parent for each commit that is walked. Since we walk in order of decreasing generation number, we know that discovering an in-degree value of 0 means the value for that commit was not initialized, so should be initialized to two. (Recall that in-degree value "1" is what we use to say a commit is ready for output.) As we iterate the parents of a commit during this walk, ensure the EXPLORE walk has walked beyond their generation numbers. 3. TOPO: using the topo_queue priority queue (ordered based on the sort_order given, which could be commit-date, author- date, or typical topo-order which treats the queue as a LIFO stack), remove a commit from the queue and decrement the in-degree of each parent. If a parent has an in-degree of one, then we add it to the topo_queue. Before we decrement the in-degree, however, ensure the INDEGREE walk has walked beyond that generation number. The implementations of these walks are in the following methods: * explore_walk_step and explore_to_depth * indegree_walk_step and compute_indegrees_to_depth * next_topo_commit and expand_topo_walk These methods have some patterns that may seem strange at first, but they are probably carry-overs from their equivalents in limit_list and sort_in_topological_order. One thing that is missing from this implementation is a proper way to stop walking when the entire queue is UNINTERESTING, so this implementation is not enabled by comparisions, such as in 'git rev-list --topo-order A..B'. This can be updated in the future. In my local testing, I used the following Git commands on the Linux repository in three modes: HEAD~1 with no commit-graph, HEAD~1 with a commit-graph, and HEAD with a commit-graph. This allows comparing the benefits we get from parsing commits from the commit-graph and then again the benefits we get by restricting the set of commits we walk. Test: git rev-list --topo-order -100 HEAD HEAD~1, no commit-graph: 6.80 s HEAD~1, w/ commit-graph: 0.77 s HEAD, w/ commit-graph: 0.02 s Test: git rev-list --topo-order -100 HEAD -- tools HEAD~1, no commit-graph: 9.63 s HEAD~1, w/ commit-graph: 6.06 s HEAD, w/ commit-graph: 0.06 s This speedup is due to a few things. First, the new generation- number-enabled algorithm walks commits on order of the number of results output (subject to some branching structure expectations). Since we limit to 100 results, we are running a query similar to filling a single page of results. Second, when specifying a path, we must parse the root tree object for each commit we walk. The previous benefits from the commit-graph are entirely from reading the commit-graph instead of parsing commits. Since we need to parse trees for the same number of commits as before, we slow down significantly from the non-path-based query. For the test above, I specifically selected a path that is changed frequently, including by merge commits. A less-frequently-changed path (such as 'README') has similar end-to-end time since we need to walk the same number of commits (before determining we do not have 100 hits). However, get the benefit that the output is presented to the user as it is discovered, much the same as a normal 'git log' command (no '--topo-order'). This is an improved user experience, even if the command has the same runtime. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'ds/reachable'Junio C Hamano
The code for computing history reachability has been shuffled, obtained a bunch of new tests to cover them, and then being improved. * ds/reachable: commit-reach: correct accidental #include of C file commit-reach: use can_all_from_reach commit-reach: make can_all_from_reach... linear commit-reach: replace ref_newer logic test-reach: test commit_contains test-reach: test can_all_from_reach_with_flags test-reach: test reduce_heads test-reach: test get_merge_bases_many test-reach: test is_descendant_of test-reach: test in_merge_bases test-reach: create new test tool for ref_newer commit-reach: move can_all_from_reach_with_flags upload-pack: generalize commit date cutoff upload-pack: refactor ok_to_give_up() upload-pack: make reachable() more generic commit-reach: move commit_contains from ref-filter commit-reach: move ref_newer from remote.c commit.h: remove method declarations commit-reach: move walk methods from commit.c
2018-08-15Add missing includes and forward declarationsElijah Newren
I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-03Merge branch 'sb/object-store-lookup'Junio C Hamano
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-08-03Merge branch 'jt/fetch-pack-negotiator'Junio C Hamano
Code restructuring and a small fix to transport protocol v2 during fetching. * jt/fetch-pack-negotiator: fetch-pack: introduce negotiator API fetch-pack: move common check and marking together fetch-pack: make negotiation-related vars local fetch-pack: use ref adv. to prune "have" sent fetch-pack: directly end negotiation if ACK ready fetch-pack: clear marks before re-marking fetch-pack: split up everything_local()
2018-07-25Merge branch 'jt/partial-clone-fsck-connectivity'Junio C Hamano
Partial clone support of "git clone" has been updated to correctly validate the objects it receives from the other side. The server side has been corrected to send objects that are directly requested, even if they may match the filtering criteria (e.g. when doing a "lazy blob" partial clone). * jt/partial-clone-fsck-connectivity: clone: check connectivity even if clone is partial upload-pack: send refs' objects despite "filter"
2018-07-21commit-reach: move can_all_from_reach_with_flagsDerrick Stolee
There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The can_all_from_reach_with_flags method is used in a stateful way by upload-pack.c. The parameters are very flexible, so we will be able to use its commit walking logic for many other callers. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-21commit-reach: move walk methods from commit.cDerrick Stolee
There are several commit walks in the codebase. Group them together into a new commit-reach.c file and corresponding header. After we group these walks into one place, we can reduce duplicate logic by calling equivalent methods. The method declarations in commit.h are not touched by this commit and will be moved in a following commit. Many consumers need to point to commit-reach.h and that would bloat this commit. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-09upload-pack: send refs' objects despite "filter"Jonathan Tan
A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object.c: allow parse_object to handle arbitrary repositoriesStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object.c: allow parse_object_buffer to handle arbitrary repositoriesStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29commit.c: migrate the commit buffer to the parsed object storeStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: allow lookup_object to handle arbitrary repositoriesStefan Beller
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: allow object_as_type to handle arbitrary repositoriesStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to object_as_typeStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to parse_object_bufferStefan Beller
Add a repository argument to allow the callers of parse_object_buffer to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to lookup_objectStefan Beller
Add a repository argument to allow callers of lookup_object to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to parse_objectStefan Beller
Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>