diff options
author | Zeger-Jan van de Weg <git@zjvandeweg.nl> | 2019-04-08 17:45:14 +0300 |
---|---|---|
committer | Zeger-Jan van de Weg <git@zjvandeweg.nl> | 2019-04-15 13:09:29 +0300 |
commit | 2c02a335b917c62dc6caa3213ec4e6f2b5dc1091 (patch) | |
tree | 83a3e1904efc2caddb8abc10be7d7a1c1d15be5c | |
parent | fb06aa871d117443f3012f3e7548e374972e68fd (diff) |
Add technical documentation about delta islands
To prevent only a few people to have the understanding of Gitaly, the
feature is explained on a technical level. Thus it lives in the Gitaly
repository as these are features that end users don't have an interface
to other than their usual interface, `git fetch` or `git clone`.
-rw-r--r-- | doc/README.md | 4 | ||||
-rw-r--r-- | doc/delta_islands.md | 55 |
2 files changed, 59 insertions, 0 deletions
diff --git a/doc/README.md b/doc/README.md index 65217c61c..a3a2b29af 100644 --- a/doc/README.md +++ b/doc/README.md @@ -23,3 +23,7 @@ means that Gitaly is not highly available. How this will be solved is described [in the HA design document](doc/design_ha.md) For configuration please read [praefects configuration documentation](doc/configuration/praefect.md). + +#### Technical explanations + +- [Delta Islands](doc/delta_islands.md) diff --git a/doc/delta_islands.md b/doc/delta_islands.md new file mode 100644 index 000000000..4182f1140 --- /dev/null +++ b/doc/delta_islands.md @@ -0,0 +1,55 @@ +## Delta Islands + +Most of the time, an object in a Git repository is stored as a delta against +another object. This means that if a blob is stored once, and only one line is +changed, the storage requirements for the repository does not come close to the +two individual blobs. This helps Git also, when a client is requesting data during +a fetch. The transmitted bytes again, is much less than the combined size of all +the objects. + +Now when a third blob of the same file is created, it too has its delta +calculated against the second blob, only stored as delta, to itself. These three +blobs now form a delta chain. Git stores these delta chains in [pack files][git-pack], +and when a `repack` is executed these chains might be recalculated if one of the +blobs isn't required anymore, or if a better delta base is discovered. + +In the case of a git fetch, it might happen that the client doesn't have a set +of branches that, during repack, were detected to share many of the same contents +and thus form a delta chain. Git can then decide to send the full delta chain. + +In practice, these delta chains jump between branches, tags, and other refs. When +a client initiates a fetch, it's usually not interested in any of the other +refs. Further more it might create a security issue when objects are shared +between between repositories. This will invalidate the delta chain on disk, and +Git will, and during the fetch request, Git will recalculate the diffs for the +objects later in the chain it does want. + +Delta islands try to solve this by creating islands of objects which the delta +detection algorithm can use to create a delta against. For example, all branches +could be a namespace, or island. When a client fetches, the likelihood of the +chain being valid is much greater. This prevents Git from reconstructing the +full objects, which improves the load on the server and latency for the fetch. + +The drawback of this feature is that the packs on disk are potentially +larger as it's not always the case the optimal object can be used as delta base. + +### In GitLab + +Delta Island relies on Git version 2.20 or later, which GitLab is expected to +use from version 11.11 onwards. The change on the Gitaly side is limited to +setting a config option [when repacking][delta-config]. This option is set at +runtime to prevent having to write the configuration file for all repositories. + +User impact of this feature includes faster fetches, as Git on the server does +less work and reuses previous work better. + +#### Further reading + +As usually the case, the [tests of Git][git-delta-test] provide a good overview +of how the feature works. + +[git-delta-test]: https://github.com/git/git/blob/041f5ea1cf987a4068ef5f39ba0a09be85952064/t/t5320-delta-islands.sh +[git-pack]: https://git-scm.com/docs/git-pack-objects +[delta-mr]: https://gitlab.com/gitlab-org/gitaly/merge_requests/1110 +[delta-config]: https://gitlab.com/gitlab-org/gitaly/merge_requests/1110/diffs#e01aecd9d7ee43aee1959795092f852d07a1e7ed_55_78 + |