
gitlab.com/gitlab-org/gitaly.git
author    Patrick Steinhardt <psteinhardt@gitlab.com>    2023-04-13 15:32:30 +0300
committer Patrick Steinhardt <psteinhardt@gitlab.com>    2023-04-13 16:23:44 +0300
commit    21d61f7d517a1034ec4de8655a9138a3a3c32103 (patch)
tree      2119bc64ac4e8e5d0e44b82c6a41511b77aa2b22
parent    3644054ad177816057b76b40526e4338c51facdb (diff)
git/housekeeping: Reduce frequency of full repacks
For different reasons we need to perform regular full repacks both in normal repositories and in object pools. These full repacks are guided by a cooldown period so that we'll perform them only in case the last full repack is longer ago than the cooldown period.

For object pools, the reason we do full repacks is to refresh deltas so that they again honor our delta islands. This is not all that important to users and should not be noticeable in general when we do this less frequently. Consequently, we only perform a full repack once every week.

For normal repositories full repacks are mostly done in order to guarantee that objects will get evicted into cruft packs so that they can be expired and thus deleted. This _is_ something that both we and our customers care about given that it can be directly equated to the disk space that is required. It is thus prudent that we perform this on a more regular basis so that objects get deleted quickly.

That being said, there is interplay between the stale object grace period (which is 14 days) and the cooldown period (which is 1 day). Effectively, assuming that a repository gets daily optimization jobs, and with the knowledge in mind that we need to perform two full repacks in order to evict an unreachable object, objects will get deleted after 14 to 15 days:

1. The first full repack on days 0 to 1 will evict the unreachable object into a cruft pack.

2. We wait 14 days and will thus land either on day 14 or 15.

3. We perform a second full repack to expire the object that is part of the cruft pack.

This interplay between both periods is important, because it means that we can make trade-offs between the cooldown period and the stale object grace period without actually impacting the median time to deletion:

- Increasing the cooldown period means we need to perform fewer full repacks, thus saving on resources. Conversely, decreasing the cooldown period means more regular full repacks and thus more resource usage.

- Increasing the grace period means we have a longer time window in which we avoid racy access to Git objects, with the downside of more disk space use. Decreasing the grace period means we are more likely to hit racy access to Git objects, but evict objects and thus save disk space more regularly.

Now optimizing the cooldown period is something we're very keen to do because it directly impacts how many resources we and our customers need to provision for machines. On the other hand, the grace period is mostly there to avoid racy access to Git objects, and two weeks feels excessive for that.

So long story short, this commit changes our strategy to increase the full repack cooldown period to 5 days instead of 1 day while decreasing the stale object grace period from 14 days to 7 days to counteract the longer time-to-deletion for stale objects. This means objects will get deleted 12 to 17 days after becoming unreachable, with a median value of 14.5 days. This is the exact same median value as previously, so the time-to-deletion should not change in practice. On the other hand, it allows us to greatly save on compute resources by reducing the frequency of full repacks to one fifth.

Furthermore, as the repack cooldown periods for normal repositories and object pools are now almost the same, let's merge them so that we have one less special case to think about.

Changelog: changed
-rw-r--r--  internal/git/housekeeping/optimization_strategy.go       46
-rw-r--r--  internal/git/housekeeping/optimization_strategy_test.go   4
-rw-r--r--  internal/git/stats/repository_info.go                     4
3 files changed, 25 insertions, 29 deletions
diff --git a/internal/git/housekeeping/optimization_strategy.go b/internal/git/housekeeping/optimization_strategy.go
index 290441beb..851702b87 100644
--- a/internal/git/housekeeping/optimization_strategy.go
+++ b/internal/git/housekeeping/optimization_strategy.go
@@ -12,10 +12,7 @@ import (
const (
// FullRepackCooldownPeriod is the cooldown period that needs to pass since the last full
// repack before we consider doing another full repack.
- FullRepackCooldownPeriod = 24 * time.Hour
- // FullRepackCooldownPeriodForPools is the same as FullRepackCooldownPeriod, but specific to
- // object pools.
- FullRepackCooldownPeriodForPools = 7 * 24 * time.Hour
+ FullRepackCooldownPeriod = 5 * 24 * time.Hour
)
// OptimizationStrategy is an interface to determine which parts of a repository should be
@@ -93,12 +90,7 @@ func (s HeuristicalOptimizationStrategy) ShouldRepackObjects(ctx context.Context
// declared as unreachable and when the pruning grace period starts as it impacts
// usage quotas. So with this simple policy we can tell customers that we evict and
// expire unreachable objects on a regular schedule.
- if !s.info.IsObjectPool && nonCruftPackfilesCount > 1 && timeSinceLastFullRepack > FullRepackCooldownPeriod {
- cfg.Strategy = RepackObjectsStrategyFullWithCruft
- cfg.CruftExpireBefore = s.expireBefore
- return true, cfg
- }
-
+ //
// On the other hand, for object pools, we also need to perform regular full
// repacks. The reason is different though, as we don't ever delete objects from
// pool repositories anyway.
@@ -112,21 +104,25 @@ func (s HeuristicalOptimizationStrategy) ShouldRepackObjects(ctx context.Context
// regress over time as new objects are pulled into the pool repository.
//
// So we perform regular full repacks in the repository to ensure that the delta
- // islands will be "freshened" again. As this is nothing that would be visible to
- // the end user (except for performance), it should be fine to perform the repack a
- // lot less frequent than we perform the full repacks in non-object-pools.
- //
- // If geometric repacks ever learn to take delta islands into account we can get rid
- // of this condition and only do geometric repacks.
- if s.info.IsObjectPool && nonCruftPackfilesCount > 1 && timeSinceLastFullRepack > FullRepackCooldownPeriodForPools {
- // Using cruft packs would be pointless here as we don't ever want to expire
- // unreachable objects. And we don't want to explode unreachable objects
- // into loose objects either: for one that'd be inefficient, and second
- // they'd only get soaked up by the next geometric repack anyway.
- //
- // So instead, we do a full repack that appends unreachable objects to the
- // end of the new packfile.
- cfg.Strategy = RepackObjectsStrategyFullWithUnreachable
+ // islands will be "freshened" again. If geometric repacks ever learn to take delta
+ // islands into account we can get rid of this condition and only do geometric
+ // repacks.
+ if nonCruftPackfilesCount > 1 && timeSinceLastFullRepack > FullRepackCooldownPeriod {
+ if s.info.IsObjectPool {
+ // Using cruft packs would be pointless here as we don't ever want
+ // to expire unreachable objects. And we don't want to explode
+ // unreachable objects into loose objects either: for one that'd be
+ // inefficient, and second they'd only get soaked up by the next
+ // geometric repack anyway.
+ //
+ // So instead, we do a full repack that appends unreachable objects
+ // to the end of the new packfile.
+ cfg.Strategy = RepackObjectsStrategyFullWithUnreachable
+ } else {
+ cfg.Strategy = RepackObjectsStrategyFullWithCruft
+ cfg.CruftExpireBefore = s.expireBefore
+ }
+
return true, cfg
}
diff --git a/internal/git/housekeeping/optimization_strategy_test.go b/internal/git/housekeeping/optimization_strategy_test.go
index 5e862b65b..f9c9c50e0 100644
--- a/internal/git/housekeeping/optimization_strategy_test.go
+++ b/internal/git/housekeeping/optimization_strategy_test.go
@@ -193,7 +193,7 @@ func testHeuristicalOptimizationStrategyShouldRepackObjects(t *testing.T, ctx co
// normal repositories, but have a longer grace
// period for the next repack.
Count: 2,
- LastFullRepack: time.Now().Add(-FullRepackCooldownPeriodForPools + time.Hour),
+ LastFullRepack: time.Now().Add(-FullRepackCooldownPeriod + time.Hour),
MultiPackIndex: stats.MultiPackIndexInfo{
Exists: true,
PackfileCount: 2,
@@ -221,7 +221,7 @@ func testHeuristicalOptimizationStrategyShouldRepackObjects(t *testing.T, ctx co
// repositories should get a full repack in case
// they have more than a single packfile.
Count: 2,
- LastFullRepack: time.Now().Add(-FullRepackCooldownPeriodForPools),
+ LastFullRepack: time.Now().Add(-FullRepackCooldownPeriod),
MultiPackIndex: stats.MultiPackIndexInfo{
Exists: true,
PackfileCount: 2,
diff --git a/internal/git/stats/repository_info.go b/internal/git/stats/repository_info.go
index 910fb47b6..aa5de1c84 100644
--- a/internal/git/stats/repository_info.go
+++ b/internal/git/stats/repository_info.go
@@ -20,8 +20,8 @@ import (
const (
// StaleObjectsGracePeriod is time delta that is used to indicate cutoff wherein an object
- // would be considered old. Currently this is set to being 2 weeks (2 * 7days * 24hours).
- StaleObjectsGracePeriod = -14 * 24 * time.Hour
+ // would be considered old. Currently this is set to being 7 days.
+ StaleObjectsGracePeriod = -7 * 24 * time.Hour
// FullRepackTimestampFilename is the name of the file that is used as a timestamp for the
// last repack that happened in the repository. Whenever a full repack happens, Gitaly will