Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2018-05-08 09:59:20 +0300
committerJunio C Hamano <gitster@pobox.com>2018-05-08 09:59:20 +0300
commitb10edb2df55241b2e042b3d5473537904d09d193 (patch)
tree0e456f3d65f1268c364a8d6d2b463456f23e222a
parent4f4d0b42bae8091fd989aa6d7db72ecdd86aa36b (diff)
parent7547b95b4fbb8591726b1d9381c176cc27fc6aea (diff)
Merge branch 'ds/commit-graph'
Precompute and store information necessary for ancestry traversal in a separate file to optimize graph walking. * ds/commit-graph: commit-graph: implement "--append" option commit-graph: build graph from starting commits commit-graph: read only from specific pack-indexes commit: integrate commit graph with commit parsing commit-graph: close under reachability commit-graph: add core.commitGraph setting commit-graph: implement git commit-graph read commit-graph: implement git-commit-graph write commit-graph: implement write_commit_graph() commit-graph: create git-commit-graph builtin graph: add commit graph design document commit-graph: add format document csum-file: refactor finalize_hashfile() method csum-file: rename hashclose() to finalize_hashfile()
-rw-r--r--.gitignore1
-rw-r--r--Documentation/config.txt4
-rw-r--r--Documentation/git-commit-graph.txt94
-rw-r--r--Documentation/technical/commit-graph-format.txt97
-rw-r--r--Documentation/technical/commit-graph.txt163
-rw-r--r--Makefile2
-rw-r--r--alloc.c1
-rw-r--r--builtin.h1
-rw-r--r--builtin/commit-graph.c171
-rw-r--r--builtin/index-pack.c2
-rw-r--r--builtin/pack-objects.c6
-rw-r--r--bulk-checkin.c4
-rw-r--r--cache.h1
-rw-r--r--command-list.txt1
-rw-r--r--commit-graph.c741
-rw-r--r--commit-graph.h46
-rw-r--r--commit.c3
-rw-r--r--commit.h3
-rw-r--r--config.c5
-rw-r--r--contrib/completion/git-completion.bash2
-rw-r--r--csum-file.c10
-rw-r--r--csum-file.h9
-rw-r--r--environment.c1
-rw-r--r--fast-import.c2
-rw-r--r--git.c1
-rw-r--r--pack-bitmap-write.c2
-rw-r--r--pack-write.c5
-rw-r--r--packfile.c4
-rw-r--r--packfile.h2
-rwxr-xr-xt/t5318-commit-graph.sh224
30 files changed, 1587 insertions, 21 deletions
diff --git a/.gitignore b/.gitignore
index 8d9a458555..b2a1ae4a1d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -35,6 +35,7 @@
/git-clone
/git-column
/git-commit
+/git-commit-graph
/git-commit-tree
/git-config
/git-count-objects
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..cb4c93f74a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,10 @@ core.notesRef::
This setting defaults to "refs/notes/commits", and it can be overridden by
the `GIT_NOTES_REF` environment variable. See linkgit:git-notes[1].
+core.commitGraph::
+ Enable git commit graph feature. Allows reading from the
+ commit-graph file.
+
core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644
index 0000000000..4c97b555cc
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,94 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph read' [--object-dir <dir>]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+ Use given directory for the location of packfiles and commit graph
+ file. This parameter exists to specify the location of an alternate
+ that only has the objects directory, not a full .git directory. The
+ commit graph file is expected to be at <dir>/info/commit-graph and
+ the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified pack-indexes. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
++
+With the `--append` option, include all commits that are present in the
+existing commit-graph file.
+
+'read'::
+
+Read a graph file given by the commit-graph file and output basic
+details about the graph file. Used for debugging purposes.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+* Write a graph file, extending the current graph file using commits
+* in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
+* Write a graph file containing all commits in the current
+* commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --append
+------------------------------------------------
+
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..ad6af8105c
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+ generation number 1; commits with parents have generation number
+ one more than the maximum generation number of its parents. We
+ reserve zero as special, and can be used to mark a generation
+ number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+ the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position withing the list of commit OIDs. We
+use the most-significant bit for special purposes, so we can store at most
+(1 << 31) - 1 (around 2 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+ 4-byte signature:
+ The signature is: {'C', 'G', 'P', 'H'}
+
+ 1-byte version number:
+ Currently, the only valid version is 1.
+
+ 1-byte Hash Version (1 = SHA-1)
+ We infer the hash length (H) from this value.
+
+ 1-byte number (C) of "chunks"
+
+ 1-byte (reserved for later use)
+ Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+ (C + 1) * 12 bytes listing the table of contents for the chunks:
+ First 4 bytes describe the chunk id. Value 0 is a terminating label.
+ Other 8 bytes provide the byte-offset in current file for chunk to
+ start. (Chunks are ordered contiguously in the file, so you can infer
+ the length using the next chunk position if necessary.) Each chunk
+ ID appears at most once.
+
+ The remaining data in the body is described one chunk at a time, and
+ these chunks may be given in any order. Chunks are required unless
+ otherwise specified.
+
+CHUNK DATA:
+
+ OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+ The ith entry, F[i], stores the number of OIDs with first
+ byte at most i. Thus F[255] stores the total
+ number of commits (N).
+
+ OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+ The OIDs for all commits in the graph, sorted in ascending order.
+
+ Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+ * The first H bytes are for the OID of the root tree.
+ * The next 8 bytes are for the positions of the first two parents
+ of the ith commit. Stores value 0xffffffff if no parent in that
+ position. If there are more than two parents, the second value
+ has its most-significant bit on and the other bits store an array
+ position into the Large Edge List chunk.
+ * The next 8 bytes store the generation number of the commit and
+ the commit time in seconds since EPOCH. The generation number
+ uses the higher 30 bits of the first 4 bytes, while the commit
+ time uses the 32 bits of the second 4 bytes, along with the lowest
+ 2 bits of the lowest byte, storing the 33rd and 34th bit of the
+ commit time.
+
+ Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+ This list of 4-byte values store the second through nth parents for
+ all octopus merges. The second parent value in the commit data stores
+ an array position within this list along with the most-significant bit
+ on. Starting at that array position, iterate through this list of commit
+ positions for the parents until reaching a value with the most-significant
+ bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+ H-byte HASH-checksum of all of the above.
diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..0550c6d0dc
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,163 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in lexi-
+cographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+ the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+ If A and B are commits with generation numbers N and M, respectively,
+ and N <= M, then A cannot reach B. That is, we know without searching
+ that B is not an ancestor of A because it is further from a root commit
+ than A.
+
+ Conversely, when checking if A is an ancestor of B, then we only need
+ to walk commits until all commits on the walk boundary have generation
+ number at most N. If we walk commits using a priority queue seeded by
+ generation numbers, then we always expand the boundary commit with highest
+ generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+ If A and B are commits with commit time X and Y, respectively, and
+ X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+ .git/objects/info directory. This could be stored in the info directory
+ of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+ so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+ be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+ necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+ are not currently computed, so all generation numbers will be marked as
+ 0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+ walks aware of generation numbers to gain the performance benefits they
+ enable. This will mostly be accomplished by swapping a commit-date-ordered
+ priority queue with one ordered by generation number. The following
+ operations are important candidates:
+
+ - paint_down_to_common()
+ - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+ object for a commit. This passes through lookup_tree() and consequently
+ lookup_object(). Also, it calls lookup_commit() when loading the parents.
+ These method calls check the ODB for object existence, even if the
+ consumer does not need the content. For example, we do not need the
+ tree contents when computing merge bases. Now that commit parsing is
+ removed from the computation time, these lookup operations are the
+ slowest operations keeping graph walks from being fast. Consider
+ loading these objects without verifying their existence in the ODB and
+ only loading them fully when consumers need them. Consider a method
+ such as "ensure_tree_loaded(commit)" that fully loads a tree before
+ using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+ When this feature stabilizes enough to recommend to most users, we should
+ add automatic graph writes to common operations that create many commits.
+ For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+ commands.
+
+- A server could provide a commit graph file as part of the network protocol
+ to avoid extra calculations by clients. This feature is only of benefit if
+ the user is willing to trust the file, because verifying the file is correct
+ is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+ Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+ An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+ Discussion about generation numbers on commits and how they interact
+ with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+ More discussion about generation numbers and not storing them inside
+ commit objects. A valuable quote:
+
+ "I think we should be moving more in the direction of keeping
+ repo-local caches for optimizations. Reachability bitmaps have been
+ a big performance win. I think we should be doing the same with our
+ properties of commits. Not just generation numbers, but making it
+ cheap to access the graph structure without zlib-inflating whole
+ commit objects (i.e., packv4 or something like the "metapacks" I
+ proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+ A patch to remove the ahead-behind calculation from 'status'.
diff --git a/Makefile b/Makefile
index 46a215dc89..f89f2dba5b 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += color.o
LIB_OBJS += column.o
LIB_OBJS += combine-diff.o
LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
LIB_OBJS += compat/obstack.o
LIB_OBJS += compat/terminal.o
LIB_OBJS += config.o
@@ -995,6 +996,7 @@ BUILTIN_OBJS += builtin/clone.o
BUILTIN_OBJS += builtin/column.o
BUILTIN_OBJS += builtin/commit-tree.o
BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
BUILTIN_OBJS += builtin/config.o
BUILTIN_OBJS += builtin/count-objects.o
BUILTIN_OBJS += builtin/credential.o
diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+ c->graph_pos = COMMIT_NOT_FROM_GRAPH;
return c;
}
diff --git a/builtin.h b/builtin.h
index 3f3fdfc281..4e0f64723e 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
extern int cmd_clean(int argc, const char **argv, const char *prefix);
extern int cmd_column(int argc, const char **argv, const char *prefix);
extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
extern int cmd_config(int argc, const char **argv, const char *prefix);
extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 0000000000..37420ae0fd
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,171 @@
+#include "builtin.h"
+#include "config.h"
+#include "dir.h"
+#include "lockfile.h"
+#include "parse-options.h"
+#include "commit-graph.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+ N_("git commit-graph [--object-dir <objdir>]"),
+ N_("git commit-graph read [--object-dir <objdir>]"),
+ N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+ NULL
+};
+
+static const char * const builtin_commit_graph_read_usage[] = {
+ N_("git commit-graph read [--object-dir <objdir>]"),
+ NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+ N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
+ NULL
+};
+
+static struct opts_commit_graph {
+ const char *obj_dir;
+ int stdin_packs;
+ int stdin_commits;
+ int append;
+} opts;
+
+static int graph_read(int argc, const char **argv)
+{
+ struct commit_graph *graph = NULL;
+ char *graph_name;
+
+ static struct option builtin_commit_graph_read_options[] = {
+ OPT_STRING(0, "object-dir", &opts.obj_dir,
+ N_("dir"),
+ N_("The object directory to store the graph")),
+ OPT_END(),
+ };
+
+ argc = parse_options(argc, argv, NULL,
+ builtin_commit_graph_read_options,
+ builtin_commit_graph_read_usage, 0);
+
+ if (!opts.obj_dir)
+ opts.obj_dir = get_object_directory();
+
+ graph_name = get_commit_graph_filename(opts.obj_dir);
+ graph = load_commit_graph_one(graph_name);
+
+ if (!graph)
+ die("graph file %s does not exist", graph_name);
+ FREE_AND_NULL(graph_name);
+
+ printf("header: %08x %d %d %d %d\n",
+ ntohl(*(uint32_t*)graph->data),
+ *(unsigned char*)(graph->data + 4),
+ *(unsigned char*)(graph->data + 5),
+ *(unsigned char*)(graph->data + 6),
+ *(unsigned char*)(graph->data + 7));
+ printf("num_commits: %u\n", graph->num_commits);
+ printf("chunks:");
+
+ if (graph->chunk_oid_fanout)
+ printf(" oid_fanout");
+ if (graph->chunk_oid_lookup)
+ printf(" oid_lookup");
+ if (graph->chunk_commit_data)
+ printf(" commit_metadata");
+ if (graph->chunk_large_edges)
+ printf(" large_edges");
+ printf("\n");
+
+ return 0;
+}
+
+static int graph_write(int argc, const char **argv)
+{
+ const char **pack_indexes = NULL;
+ int packs_nr = 0;
+ const char **commit_hex = NULL;
+ int commits_nr = 0;
+ const char **lines = NULL;
+ int lines_nr = 0;
+ int lines_alloc = 0;
+
+ static struct option builtin_commit_graph_write_options[] = {
+ OPT_STRING(0, "object-dir", &opts.obj_dir,
+ N_("dir"),
+ N_("The object directory to store the graph")),
+ OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+ N_("scan pack-indexes listed by stdin for commits")),
+ OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+ N_("start walk at commits listed by stdin")),
+ OPT_BOOL(0, "append", &opts.append,
+ N_("include all commits already in the commit-graph file")),
+ OPT_END(),
+ };
+
+ argc = parse_options(argc, argv, NULL,
+ builtin_commit_graph_write_options,
+ builtin_commit_graph_write_usage, 0);
+
+ if (opts.stdin_packs && opts.stdin_commits)
+ die(_("cannot use both --stdin-commits and --stdin-packs"));
+ if (!opts.obj_dir)
+ opts.obj_dir = get_object_directory();
+
+ if (opts.stdin_packs || opts.stdin_commits) {
+ struct strbuf buf = STRBUF_INIT;
+ lines_nr = 0;
+ lines_alloc = 128;
+ ALLOC_ARRAY(lines, lines_alloc);
+
+ while (strbuf_getline(&buf, stdin) != EOF) {
+ ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+ lines[lines_nr++] = strbuf_detach(&buf, NULL);
+ }
+
+ if (opts.stdin_packs) {
+ pack_indexes = lines;
+ packs_nr = lines_nr;
+ }
+ if (opts.stdin_commits) {
+ commit_hex = lines;
+ commits_nr = lines_nr;
+ }
+ }
+
+ write_commit_graph(opts.obj_dir,
+ pack_indexes,
+ packs_nr,
+ commit_hex,
+ commits_nr,
+ opts.append);
+
+ return 0;
+}
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+ static struct option builtin_commit_graph_options[] = {
+ OPT_STRING(0, "object-dir", &opts.obj_dir,
+ N_("dir"),
+ N_("The object directory to store the graph")),
+ OPT_END(),
+ };
+
+ if (argc == 2 && !strcmp(argv[1], "-h"))
+ usage_with_options(builtin_commit_graph_usage,
+ builtin_commit_graph_options);
+
+ git_config(git_default_config, NULL);
+ argc = parse_options(argc, argv, prefix,
+ builtin_commit_graph_options,
+ builtin_commit_graph_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+
+ if (argc > 0) {
+ if (!strcmp(argv[0], "read"))
+ return graph_read(argc, argv);
+ if (!strcmp(argv[0], "write"))
+ return graph_write(argc, argv);
+ }
+
+ usage_with_options(builtin_commit_graph_usage,
+ builtin_commit_graph_options);
+}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 78e66b9986..a2cd29d8f4 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1271,7 +1271,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
nr_objects - nr_objects_initial);
stop_progress_msg(&progress, msg.buf);
strbuf_release(&msg);
- hashclose(f, tail_hash, 0);
+ finalize_hashfile(f, tail_hash, 0);
hashcpy(read_hash, pack_hash);
fixup_pack_header_footer(output_fd, pack_hash,
curr_pack, nr_objects,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4bdae5a1d8..24b1c6c5dd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,11 +837,11 @@ static void write_pack_file(void)
* If so, rewrite it like in fast-import
*/
if (pack_to_stdout) {
- hashclose(f, oid.hash, CSUM_CLOSE);
+ finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
} else if (nr_written == nr_remaining) {
- hashclose(f, oid.hash, CSUM_FSYNC);
+ finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
- int fd = hashclose(f, oid.hash, 0);
+ int fd = finalize_hashfile(f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
nr_written, oid.hash, offset);
close(fd);
diff --git a/bulk-checkin.c b/bulk-checkin.c
index de1f4040c7..c0bc8de107 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -36,9 +36,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
unlink(state->pack_tmp_name);
goto clear_exit;
} else if (state->nr_written == 1) {
- hashclose(state->f, oid.hash, CSUM_FSYNC);
+ finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
} else {
- int fd = hashclose(state->f, oid.hash, 0);
+ int fd = finalize_hashfile(state->f, oid.hash, 0);
fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
state->nr_written, oid.hash,
state->offset);
diff --git a/cache.h b/cache.h
index 6f14cfbb6f..601a49c79f 100644
--- a/cache.h
+++ b/cache.h
@@ -806,6 +806,7 @@ extern char *git_replace_ref_base;
extern int fsync_object_files;
extern int core_preload_index;
+extern int core_commit_graph;
extern int core_apply_sparse_checkout;
extern int precomposed_unicode;
extern int protect_hfs;
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean mainporcelain
git-clone mainporcelain init
git-column purehelpers
git-commit mainporcelain history
+git-commit-graph plumbingmanipulators
git-commit-tree plumbingmanipulators
git-config ancillarymanipulators
git-count-objects ancillaryinterrogators
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 0000000000..3fc1e0da27
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,741 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+#include "object-store.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+ GRAPH_OID_LEN + 8)
+
+char *get_commit_graph_filename(const char *obj_dir)
+{
+ return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static struct commit_graph *alloc_commit_graph(void)
+{
+ struct commit_graph *g = xcalloc(1, sizeof(*g));
+ g->graph_fd = -1;
+
+ return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+ void *graph_map;
+ const unsigned char *data, *chunk_lookup;
+ size_t graph_size;
+ struct stat st;
+ uint32_t i;
+ struct commit_graph *graph;
+ int fd = git_open(graph_file);
+ uint64_t last_chunk_offset;
+ uint32_t last_chunk_id;
+ uint32_t graph_signature;
+ unsigned char graph_version, hash_version;
+
+ if (fd < 0)
+ return NULL;
+ if (fstat(fd, &st)) {
+ close(fd);
+ return NULL;
+ }
+ graph_size = xsize_t(st.st_size);
+
+ if (graph_size < GRAPH_MIN_SIZE) {
+ close(fd);
+ die("graph file %s is too small", graph_file);
+ }
+ graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+ data = (const unsigned char *)graph_map;
+
+ graph_signature = get_be32(data);
+ if (graph_signature != GRAPH_SIGNATURE) {
+ error("graph signature %X does not match signature %X",
+ graph_signature, GRAPH_SIGNATURE);
+ goto cleanup_fail;
+ }
+
+ graph_version = *(unsigned char*)(data + 4);
+ if (graph_version != GRAPH_VERSION) {
+ error("graph version %X does not match version %X",
+ graph_version, GRAPH_VERSION);
+ goto cleanup_fail;
+ }
+
+ hash_version = *(unsigned char*)(data + 5);
+ if (hash_version != GRAPH_OID_VERSION) {
+ error("hash version %X does not match version %X",
+ hash_version, GRAPH_OID_VERSION);
+ goto cleanup_fail;
+ }
+
+ graph = alloc_commit_graph();
+
+ graph->hash_len = GRAPH_OID_LEN;
+ graph->num_chunks = *(unsigned char*)(data + 6);
+ graph->graph_fd = fd;
+ graph->data = graph_map;
+ graph->data_len = graph_size;
+
+ last_chunk_id = 0;
+ last_chunk_offset = 8;
+ chunk_lookup = data + 8;
+ for (i = 0; i < graph->num_chunks; i++) {
+ uint32_t chunk_id = get_be32(chunk_lookup + 0);
+ uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+ int chunk_repeated = 0;
+
+ chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+ if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+ error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+ (uint32_t)chunk_offset);
+ goto cleanup_fail;
+ }
+
+ switch (chunk_id) {
+ case GRAPH_CHUNKID_OIDFANOUT:
+ if (graph->chunk_oid_fanout)
+ chunk_repeated = 1;
+ else
+ graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+ break;
+
+ case GRAPH_CHUNKID_OIDLOOKUP:
+ if (graph->chunk_oid_lookup)
+ chunk_repeated = 1;
+ else
+ graph->chunk_oid_lookup = data + chunk_offset;
+ break;
+
+ case GRAPH_CHUNKID_DATA:
+ if (graph->chunk_commit_data)
+ chunk_repeated = 1;
+ else
+ graph->chunk_commit_data = data + chunk_offset;
+ break;
+
+ case GRAPH_CHUNKID_LARGEEDGES:
+ if (graph->chunk_large_edges)
+ chunk_repeated = 1;
+ else
+ graph->chunk_large_edges = data + chunk_offset;
+ break;
+ }
+
+ if (chunk_repeated) {
+ error("chunk id %08x appears multiple times", chunk_id);
+ goto cleanup_fail;
+ }
+
+ if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+ {
+ graph->num_commits = (chunk_offset - last_chunk_offset)
+ / graph->hash_len;
+ }
+
+ last_chunk_id = chunk_id;
+ last_chunk_offset = chunk_offset;
+ }
+
+ return graph;
+
+cleanup_fail:
+ munmap(graph_map, graph_size);
+ close(fd);
+ exit(1);
+}
+
+/* global storage */
+static struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+ char *graph_name;
+
+ if (commit_graph)
+ return;
+
+ graph_name = get_commit_graph_filename(obj_dir);
+ commit_graph = load_commit_graph_one(graph_name);
+
+ FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+ struct alternate_object_database *alt;
+ char *obj_dir;
+
+ if (prepare_commit_graph_run_once)
+ return;
+ prepare_commit_graph_run_once = 1;
+
+ obj_dir = get_object_directory();
+ prepare_commit_graph_one(obj_dir);
+ prepare_alt_odb(the_repository);
+ for (alt = the_repository->objects->alt_odb_list;
+ !commit_graph && alt;
+ alt = alt->next)
+ prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+ if (!commit_graph)
+ return;
+
+ if (commit_graph->graph_fd >= 0) {
+ munmap((void *)commit_graph->data, commit_graph->data_len);
+ commit_graph->data = NULL;
+ close(commit_graph->graph_fd);
+ }
+
+ FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+ return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+ g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+ uint64_t pos,
+ struct commit_list **pptr)
+{
+ struct commit *c;
+ struct object_id oid;
+ hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+ c = lookup_commit(&oid);
+ if (!c)
+ die("could not find commit %s", oid_to_hex(&oid));
+ c->graph_pos = pos;
+ return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+ struct object_id oid;
+ uint32_t edge_value;
+ uint32_t *parent_data_ptr;
+ uint64_t date_low, date_high;
+ struct commit_list **pptr;
+ const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+ item->object.parsed = 1;
+ item->graph_pos = pos;
+
+ hashcpy(oid.hash, commit_data);
+ item->tree = lookup_tree(&oid);
+
+ date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
+ date_low = get_be32(commit_data + g->hash_len + 12);
+ item->date = (timestamp_t)((date_high << 32) | date_low);
+
+ pptr = &item->parents;
+
+ edge_value = get_be32(commit_data + g->hash_len);
+ if (edge_value == GRAPH_PARENT_NONE)
+ return 1;
+ pptr = insert_parent_or_die(g, edge_value, pptr);
+
+ edge_value = get_be32(commit_data + g->hash_len + 4);
+ if (edge_value == GRAPH_PARENT_NONE)
+ return 1;
+ if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+ pptr = insert_parent_or_die(g, edge_value, pptr);
+ return 1;
+ }
+
+ parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+ 4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+ do {
+ edge_value = get_be32(parent_data_ptr);
+ pptr = insert_parent_or_die(g,
+ edge_value & GRAPH_EDGE_LAST_MASK,
+ pptr);
+ parent_data_ptr++;
+ } while (!(edge_value & GRAPH_LAST_EDGE));
+
+ return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+ if (!core_commit_graph)
+ return 0;
+ if (item->object.parsed)
+ return 1;
+
+ prepare_commit_graph();
+ if (commit_graph) {
+ uint32_t pos;
+ int found;
+ if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+ pos = item->graph_pos;
+ found = 1;
+ } else {
+ found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+ }
+
+ if (found)
+ return fill_commit_in_graph(item, commit_graph, pos);
+ }
+
+ return 0;
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+ struct commit **commits,
+ int nr_commits)
+{
+ int i, count = 0;
+ struct commit **list = commits;
+
+ /*
+ * Write the first-level table (the list is sorted,
+ * but we use a 256-entry lookup to be able to avoid
+ * having to do eight extra binary search iterations).
+ */
+ for (i = 0; i < 256; i++) {
+ while (count < nr_commits) {
+ if ((*list)->object.oid.hash[0] != i)
+ break;
+ count++;
+ list++;
+ }
+
+ hashwrite_be32(f, count);
+ }
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+ struct commit **commits, int nr_commits)
+{
+ struct commit **list = commits;
+ int count;
+ for (count = 0; count < nr_commits; count++, list++)
+ hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+ struct commit **commits = table;
+ return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+ struct commit **commits, int nr_commits)
+{
+ struct commit **list = commits;
+ struct commit **last = commits + nr_commits;
+ uint32_t num_extra_edges = 0;
+
+ while (list < last) {
+ struct commit_list *parent;
+ int edge_value;
+ uint32_t packedDate[2];
+
+ parse_commit(*list);
+ hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+ parent = (*list)->parents;
+
+ if (!parent)
+ edge_value = GRAPH_PARENT_NONE;
+ else {
+ edge_value = sha1_pos(parent->item->object.oid.hash,
+ commits,
+ nr_commits,
+ commit_to_sha1);
+
+ if (edge_value < 0)
+ edge_value = GRAPH_PARENT_MISSING;
+ }
+
+ hashwrite_be32(f, edge_value);
+
+ if (parent)
+ parent = parent->next;
+
+ if (!parent)
+ edge_value = GRAPH_PARENT_NONE;
+ else if (parent->next)
+ edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+ else {
+ edge_value = sha1_pos(parent->item->object.oid.hash,
+ commits,
+ nr_commits,
+ commit_to_sha1);
+ if (edge_value < 0)
+ edge_value = GRAPH_PARENT_MISSING;
+ }
+
+ hashwrite_be32(f, edge_value);
+
+ if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+ do {
+ num_extra_edges++;
+ parent = parent->next;
+ } while (parent);
+ }
+
+ if (sizeof((*list)->date) > 4)
+ packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+ else
+ packedDate[0] = 0;
+
+ packedDate[1] = htonl((*list)->date);
+ hashwrite(f, packedDate, 8);
+
+ list++;
+ }
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+ struct commit **commits,
+ int nr_commits)
+{
+ struct commit **list = commits;
+ struct commit **last = commits + nr_commits;
+ struct commit_list *parent;
+
+ while (list < last) {
+ int num_parents = 0;
+ for (parent = (*list)->parents; num_parents < 3 && parent;
+ parent = parent->next)
+ num_parents++;
+
+ if (num_parents <= 2) {
+ list++;
+ continue;
+ }
+
+ /* Since num_parents > 2, this initializer is safe. */
+ for (parent = (*list)->parents->next; parent; parent = parent->next) {
+ int edge_value = sha1_pos(parent->item->object.oid.hash,
+ commits,
+ nr_commits,
+ commit_to_sha1);
+
+ if (edge_value < 0)
+ edge_value = GRAPH_PARENT_MISSING;
+ else if (!parent->next)
+ edge_value |= GRAPH_LAST_EDGE;
+
+ hashwrite_be32(f, edge_value);
+ }
+
+ list++;
+ }
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+ const struct object_id *a = (const struct object_id *)_a;
+ const struct object_id *b = (const struct object_id *)_b;
+ return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+ struct commit **list;
+ int nr;
+ int alloc;
+};
+
+struct packed_oid_list {
+ struct object_id *list;
+ int nr;
+ int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data)
+{
+ struct packed_oid_list *list = (struct packed_oid_list*)data;
+ enum object_type type;
+ off_t offset = nth_packed_object_offset(pack, pos);
+ struct object_info oi = OBJECT_INFO_INIT;
+
+ oi.typep = &type;
+ if (packed_object_info(pack, offset, &oi) < 0)
+ die("unable to get type of object %s", oid_to_hex(oid));
+
+ if (type != OBJ_COMMIT)
+ return 0;
+
+ ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+ oidcpy(&(list->list[list->nr]), oid);
+ list->nr++;
+
+ return 0;
+}
+
+static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+{
+ struct commit_list *parent;
+ for (parent = commit->parents; parent; parent = parent->next) {
+ if (!(parent->item->object.flags & UNINTERESTING)) {
+ ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+ oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
+ oids->nr++;
+ parent->item->object.flags |= UNINTERESTING;
+ }
+ }
+}
+
+static void close_reachable(struct packed_oid_list *oids)
+{
+ int i;
+ struct commit *commit;
+
+ for (i = 0; i < oids->nr; i++) {
+ commit = lookup_commit(&oids->list[i]);
+ if (commit)
+ commit->object.flags |= UNINTERESTING;
+ }
+
+ /*
+ * As this loop runs, oids->nr may grow, but not more
+ * than the number of missing commits in the reachable
+ * closure.
+ */
+ for (i = 0; i < oids->nr; i++) {
+ commit = lookup_commit(&oids->list[i]);
+
+ if (commit && !parse_commit(commit))
+ add_missing_parents(oids, commit);
+ }
+
+ for (i = 0; i < oids->nr; i++) {
+ commit = lookup_commit(&oids->list[i]);
+
+ if (commit)
+ commit->object.flags &= ~UNINTERESTING;
+ }
+}
+
+void write_commit_graph(const char *obj_dir,
+ const char **pack_indexes,
+ int nr_packs,
+ const char **commit_hex,
+ int nr_commits,
+ int append)
+{
+ struct packed_oid_list oids;
+ struct packed_commit_list commits;
+ struct hashfile *f;
+ uint32_t i, count_distinct = 0;
+ char *graph_name;
+ int fd;
+ struct lock_file lk = LOCK_INIT;
+ uint32_t chunk_ids[5];
+ uint64_t chunk_offsets[5];
+ int num_chunks;
+ int num_extra_edges;
+ struct commit_list *parent;
+
+ oids.nr = 0;
+ oids.alloc = approximate_object_count() / 4;
+
+ if (append) {
+ prepare_commit_graph_one(obj_dir);
+ if (commit_graph)
+ oids.alloc += commit_graph->num_commits;
+ }
+
+ if (oids.alloc < 1024)
+ oids.alloc = 1024;
+ ALLOC_ARRAY(oids.list, oids.alloc);
+
+ if (append && commit_graph) {
+ for (i = 0; i < commit_graph->num_commits; i++) {
+ const unsigned char *hash = commit_graph->chunk_oid_lookup +
+ commit_graph->hash_len * i;
+ hashcpy(oids.list[oids.nr++].hash, hash);
+ }
+ }
+
+ if (pack_indexes) {
+ struct strbuf packname = STRBUF_INIT;
+ int dirlen;
+ strbuf_addf(&packname, "%s/pack/", obj_dir);
+ dirlen = packname.len;
+ for (i = 0; i < nr_packs; i++) {
+ struct packed_git *p;
+ strbuf_setlen(&packname, dirlen);
+ strbuf_addstr(&packname, pack_indexes[i]);
+ p = add_packed_git(packname.buf, packname.len, 1);
+ if (!p)
+ die("error adding pack %s", packname.buf);
+ if (open_pack_index(p))
+ die("error opening index for %s", packname.buf);
+ for_each_object_in_pack(p, add_packed_commits, &oids);
+ close_pack(p);
+ }
+ strbuf_release(&packname);
+ }
+
+ if (commit_hex) {
+ for (i = 0; i < nr_commits; i++) {
+ const char *end;
+ struct object_id oid;
+ struct commit *result;
+
+ if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+ continue;
+
+ result = lookup_commit_reference_gently(&oid, 1);
+
+ if (result) {
+ ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+ oidcpy(&oids.list[oids.nr], &(result->object.oid));
+ oids.nr++;
+ }
+ }
+ }
+
+ if (!pack_indexes && !commit_hex)
+ for_each_packed_object(add_packed_commits, &oids, 0);
+
+ close_reachable(&oids);
+
+ QSORT(oids.list, oids.nr, commit_compare);
+
+ count_distinct = 1;
+ for (i = 1; i < oids.nr; i++) {
+ if (oidcmp(&oids.list[i-1], &oids.list[i]))
+ count_distinct++;
+ }
+
+ if (count_distinct >= GRAPH_PARENT_MISSING)
+ die(_("the commit graph format cannot write %d commits"), count_distinct);
+
+ commits.nr = 0;
+ commits.alloc = count_distinct;
+ ALLOC_ARRAY(commits.list, commits.alloc);
+
+ num_extra_edges = 0;
+ for (i = 0; i < oids.nr; i++) {
+ int num_parents = 0;
+ if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+ continue;
+
+ commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+ parse_commit(commits.list[commits.nr]);
+
+ for (parent = commits.list[commits.nr]->parents;
+ parent; parent = parent->next)
+ num_parents++;
+
+ if (num_parents > 2)
+ num_extra_edges += num_parents - 1;
+
+ commits.nr++;
+ }
+ num_chunks = num_extra_edges ? 4 : 3;
+
+ if (commits.nr >= GRAPH_PARENT_MISSING)
+ die(_("too many commits to write graph"));
+
+ graph_name = get_commit_graph_filename(obj_dir);
+ fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+ if (fd < 0) {
+ struct strbuf folder = STRBUF_INIT;
+ strbuf_addstr(&folder, graph_name);
+ strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+ if (mkdir(folder.buf, 0777) < 0)
+ die_errno(_("cannot mkdir %s"), folder.buf);
+ strbuf_release(&folder);
+
+ fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+ if (fd < 0)
+ die_errno("unable to create '%s'", graph_name);
+ }
+
+ f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+ hashwrite_be32(f, GRAPH_SIGNATURE);
+
+ hashwrite_u8(f, GRAPH_VERSION);
+ hashwrite_u8(f, GRAPH_OID_VERSION);
+ hashwrite_u8(f, num_chunks);
+ hashwrite_u8(f, 0); /* unused padding byte */
+
+ chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+ chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+ chunk_ids[2] = GRAPH_CHUNKID_DATA;
+ if (num_extra_edges)
+ chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+ else
+ chunk_ids[3] = 0;
+ chunk_ids[4] = 0;
+
+ chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+ chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+ chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+ chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+ chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+ for (i = 0; i <= num_chunks; i++) {
+ uint32_t chunk_write[3];
+
+ chunk_write[0] = htonl(chunk_ids[i]);
+ chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+ chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+ hashwrite(f, chunk_write, 12);
+ }
+
+ write_graph_chunk_fanout(f, commits.list, commits.nr);
+ write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+ write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+ write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+ close_commit_graph();
+ finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+ commit_lock_file(&lk);
+
+ free(oids.list);
+ oids.alloc = 0;
+ oids.nr = 0;
+}
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644
index 0000000000..e1d8580c98
--- /dev/null
+++ b/commit-graph.h
@@ -0,0 +1,46 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ * 1. tree object
+ * 2. date
+ * 3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
+struct commit_graph {
+ int graph_fd;
+
+ const unsigned char *data;
+ size_t data_len;
+
+ unsigned char hash_len;
+ unsigned char num_chunks;
+ uint32_t num_commits;
+ struct object_id oid;
+
+ const uint32_t *chunk_oid_fanout;
+ const unsigned char *chunk_oid_lookup;
+ const unsigned char *chunk_commit_data;
+ const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
+void write_commit_graph(const char *obj_dir,
+ const char **pack_indexes,
+ int nr_packs,
+ const char **commit_hex,
+ int nr_commits,
+ int append);
+
+#endif
diff --git a/commit.c b/commit.c
index ca474a7c11..57049118a5 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
#include "cache.h"
#include "tag.h"
#include "commit.h"
+#include "commit-graph.h"
#include "pkt-line.h"
#include "utf8.h"
#include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
+ if (parse_commit_in_graph(item))
+ return 0;
buffer = read_object_file(&item->object.oid, &type, &size);
if (!buffer)
return quiet_on_missing ? -1 :
diff --git a/commit.h b/commit.h
index 0fb8271665..e57ae4b583 100644
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
#include "string-list.h"
#include "pretty.h"
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
struct commit_list {
struct commit *item;
struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
timestamp_t date;
struct commit_list *parents;
struct tree *tree;
+ uint32_t graph_pos;
};
extern int save_commit_buffer;
diff --git a/config.c b/config.c
index 5687e2d23e..40d46b3325 100644
--- a/config.c
+++ b/config.c
@@ -1293,6 +1293,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.commitgraph")) {
+ core_commit_graph = git_config_bool(var, value);
+ return 0;
+ }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 01dd9ff07a..86a13fca28 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -875,6 +875,7 @@ __git_list_porcelain_commands ()
check-ref-format) : plumbing;;
checkout-index) : plumbing;;
column) : internal helper;;
+ commit-graph) : plumbing;;
commit-tree) : plumbing;;
count-objects) : infrequent;;
credential) : credentials;;
@@ -2345,6 +2346,7 @@ _git_config ()
core.bigFileThreshold
core.checkStat
core.commentChar
+ core.commitGraph
core.compression
core.createObject
core.deltaBaseCacheLimit
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..53ce37f7ca 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
}
}
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
{
int fd;
@@ -61,11 +61,11 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
the_hash_algo->final_fn(f->buffer, &f->ctx);
if (result)
hashcpy(result, f->buffer);
- if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
- /* write checksum and close fd */
+ if (flags & CSUM_HASH_IN_STREAM)
flush(f, f->buffer, the_hash_algo->rawsz);
- if (flags & CSUM_FSYNC)
- fsync_or_die(f->fd, f->name);
+ if (flags & CSUM_FSYNC)
+ fsync_or_die(f->fd, f->name);
+ if (flags & CSUM_CLOSE) {
if (close(f->fd))
die_errno("%s: sha1 file error on close", f->name);
fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..c5a2e335e7 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -26,14 +26,15 @@ struct hashfile_checkpoint {
extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
-/* hashclose flags */
-#define CSUM_CLOSE 1
-#define CSUM_FSYNC 2
+/* finalize_hashfile flags */
+#define CSUM_CLOSE 1
+#define CSUM_FSYNC 2
+#define CSUM_HASH_IN_STREAM 4
extern struct hashfile *hashfd(int fd, const char *name);
extern struct hashfile *hashfd_check(const char *name);
extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
extern void hashwrite(struct hashfile *, const void *, unsigned int);
extern void hashflush(struct hashfile *f);
extern void crc32_begin(struct hashfile *);
diff --git a/environment.c b/environment.c
index fd970b81bd..80542e2b4f 100644
--- a/environment.c
+++ b/environment.c
@@ -65,6 +65,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
char *notes_ref_name;
int grafts_replace_parents = 1;
+int core_commit_graph;
int core_apply_sparse_checkout;
int merge_log_config = -1;
int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
diff --git a/fast-import.c b/fast-import.c
index 05d1079d23..34edf3fb8f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -973,7 +973,7 @@ static void end_packfile(void)
struct tag *t;
close_pack_windows(pack_data);
- hashclose(pack_file, cur_pack_oid.hash, 0);
+ finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
pack_data->pack_name, object_count,
cur_pack_oid.hash, pack_size);
diff --git a/git.c b/git.c
index 70798f6837..bab6bbfded 100644
--- a/git.c
+++ b/git.c
@@ -392,6 +392,7 @@ static struct cmd_struct commands[] = {
{ "clone", cmd_clone },
{ "column", cmd_column, RUN_SETUP_GENTLY },
{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+ { "commit-graph", cmd_commit_graph, RUN_SETUP },
{ "commit-tree", cmd_commit_tree, RUN_SETUP | NO_PARSEOPT },
{ "config", cmd_config, RUN_SETUP_GENTLY | DELAY_PAGER_CONFIG },
{ "count-objects", cmd_count_objects, RUN_SETUP },
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 41ae27fb19..9d1bb054bb 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -534,7 +534,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
if (options & BITMAP_OPT_HASH_CACHE)
write_hash_cache(f, index, index_nr);
- hashclose(f, NULL, CSUM_FSYNC);
+ finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
if (adjust_shared_perm(tmp_file.buf))
die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index d775c7406d..a9d46bc03f 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
}
hashwrite(f, sha1, the_hash_algo->rawsz);
- hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
- ? CSUM_CLOSE : CSUM_FSYNC));
+ finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+ ((opts->flags & WRITE_IDX_VERIFY)
+ ? 0 : CSUM_FSYNC));
return index_name;
}
diff --git a/packfile.c b/packfile.c
index 0bc67d0e00..6c3ddc3c31 100644
--- a/packfile.c
+++ b/packfile.c
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
}
}
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
{
close_pack_windows(p);
close_pack_fd(p);
@@ -1869,7 +1869,7 @@ int has_pack_index(const unsigned char *sha1)
return 1;
}
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
{
uint32_t i;
int r = 0;
diff --git a/packfile.h b/packfile.h
index a92c0b241c..9c2f885994 100644
--- a/packfile.h
+++ b/packfile.h
@@ -65,6 +65,7 @@ extern void close_pack_index(struct packed_git *);
extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
extern void close_all_packs(struct raw_object_store *o);
extern void unuse_pack(struct pack_window **);
extern void clear_delta_base_cache(void);
@@ -154,6 +155,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
struct packed_git *pack,
uint32_t pos,
void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
/*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 0000000000..a380419b65
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,224 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+ mkdir full &&
+ cd "$TRASH_DIRECTORY/full" &&
+ git init &&
+ git config core.commitGraph true &&
+ objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git commit-graph write --object-dir . &&
+ test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+ cd "$TRASH_DIRECTORY/full" &&
+ for i in $(test_seq 3)
+ do
+ test_commit $i &&
+ git branch commits/$i
+ done &&
+ git repack
+'
+
+graph_git_two_modes() {
+ git -c core.graph=true $1 >output
+ git -c core.graph=false $1 >expect
+ test_cmp output expect
+}
+
+graph_git_behavior() {
+ MSG=$1
+ DIR=$2
+ BRANCH=$3
+ COMPARE=$4
+ test_expect_success "check normal git operations: $MSG" '
+ cd "$TRASH_DIRECTORY/$DIR" &&
+ graph_git_two_modes "log --oneline $BRANCH" &&
+ graph_git_two_modes "log --topo-order $BRANCH" &&
+ graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+ graph_git_two_modes "branch -vv" &&
+ graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+ '
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
+graph_read_expect() {
+ OPTIONAL=""
+ NUM_CHUNKS=3
+ if test ! -z $2
+ then
+ OPTIONAL=" $2"
+ NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+ fi
+ cat >expect <<- EOF
+ header: 43475048 1 1 $NUM_CHUNKS 0
+ num_commits: $1
+ chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+ EOF
+ git commit-graph read >output &&
+ test_cmp expect output
+}
+
+test_expect_success 'write graph' '
+ cd "$TRASH_DIRECTORY/full" &&
+ graph1=$(git commit-graph write) &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "3"
+'
+
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
+test_expect_success 'Add more commits' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git reset --hard commits/1 &&
+ for i in $(test_seq 4 5)
+ do
+ test_commit $i &&
+ git branch commits/$i
+ done &&
+ git reset --hard commits/2 &&
+ for i in $(test_seq 6 7)
+ do
+ test_commit $i &&
+ git branch commits/$i
+ done &&
+ git reset --hard commits/2 &&
+ git merge commits/4 &&
+ git branch merge/1 &&
+ git reset --hard commits/4 &&
+ git merge commits/6 &&
+ git branch merge/2 &&
+ git reset --hard commits/3 &&
+ git merge commits/5 commits/7 &&
+ git branch merge/3 &&
+ git repack
+'
+
+# Current graph structure:
+#
+# __M3___
+# / | \
+# 3 M1 5 M2 7
+# |/ \|/ \|
+# 2 4 6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with merges' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git commit-graph write &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
+test_expect_success 'Add one more commit' '
+ cd "$TRASH_DIRECTORY/full" &&
+ test_commit 8 &&
+ git branch commits/8 &&
+ ls $objdir/pack | grep idx >existing-idx &&
+ git repack &&
+ ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+# 8
+# |
+# __M3___
+# / | \
+# 3 M1 5 M2 7
+# |/ \|/ \|
+# 2 4 6
+# |___/____/
+# 1
+
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'write graph with new commit' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git commit-graph write &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'write graph with nothing new' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git commit-graph write &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from latest pack with closure' '
+ cd "$TRASH_DIRECTORY/full" &&
+ cat new-idx | git commit-graph write --stdin-packs &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from commits with closure' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git tag -a -m "merge" tag/merge merge/2 &&
+ git rev-parse tag/merge >commits-in &&
+ git rev-parse merge/1 >>commits-in &&
+ cat commits-in | git commit-graph write --stdin-commits &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'build graph from commits with append' '
+ cd "$TRASH_DIRECTORY/full" &&
+ git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&
+ test_path_is_file $objdir/info/commit-graph &&
+ graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
+test_expect_success 'setup bare repo' '
+ cd "$TRASH_DIRECTORY" &&
+ git clone --bare --no-local full bare &&
+ cd bare &&
+ git config core.commitGraph true &&
+ baredir="./objects"
+'
+
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
+test_expect_success 'write graph in bare repo' '
+ cd "$TRASH_DIRECTORY/bare" &&
+ git commit-graph write &&
+ test_path_is_file $baredir/info/commit-graph &&
+ graph_read_expect "11" "large_edges"
+'
+
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
+test_done