Diffstat (limited to 'doc/administration/gitaly/gitaly_geo_capabilities.md'):
 doc/administration/gitaly/gitaly_geo_capabilities.md | 41
 1 file changed, 41 insertions, 0 deletions
diff --git a/doc/administration/gitaly/gitaly_geo_capabilities.md b/doc/administration/gitaly/gitaly_geo_capabilities.md
new file mode 100644
index 00000000000..e4147eec162
--- /dev/null
+++ b/doc/administration/gitaly/gitaly_geo_capabilities.md
@@ -0,0 +1,41 @@
+---
+stage: Systems
+group: Gitaly
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
+---
+
+# Gitaly and Geo capabilities
+
+It is common to want the most available, quickly recoverable, highly performant,
+and fully resilient solution for your data. However, there are trade-offs.
+
+The following tables can help you choose the right combination of capabilities for your requirements.
+
+## Gitaly capabilities
+
+| Capability | Availability | Recoverability | Data Resiliency | Performance | Risks/Trade-offs |
+|------------|--------------|----------------|-----------------|-------------|-----------------|
+|Gitaly Cluster | Very high - tolerant of node failures | RTO for a single node of 10 seconds, with no manual intervention | Data is stored on multiple nodes | Good - writes can take slightly longer because of voting, but read distribution improves read speeds | **Trade-off** - Slight decrease in write speed in exchange for redundant, strongly consistent storage. **Risks** - [Does not support snapshot backups](../gitaly/index.md#snapshot-backup-and-recovery-limitations); the GitLab backup task can be slow for large data sets. |
+|Gitaly Shards | Single storage location is a single point of failure | Only the failed shards need to be restored | Single point of failure | Good - repositories can be allocated to shards to spread load | **Trade-off** - Repositories must be manually assigned to shards to balance load and storage space. **Risks** - Each shard is a single point of failure, so a single-node failure relies on the recovery process. |
+|Gitaly + NFS | Single storage location is a single point of failure | A single-node failure requires restoration from backup | Single point of failure | Average - NFS is not well suited to large numbers of small reads and writes, which can degrade performance | **Trade-off** - Familiar administration, though NFS is not well suited to Git's demands. **Risks** - NFS compatibility issues have caused many very poor customer experiences. |
+
+## Geo capabilities
+
+If your availability needs to span multiple zones or multiple locations, read about [Geo](../geo/index.md).
+
+| Capability | Availability | Recoverability | Data Resiliency | Performance | Risks/Trade-offs |
+|------------|--------------|----------------|-----------------|-------------|-----------------|
+|Geo | Depends on the architecture of the Geo site. Secondaries can be deployed in single-node and multi-node configurations. | Eventually consistent. The recovery point depends on replication lag, which depends on factors such as network speed. Geo supports failover from a primary to a secondary site using manual commands that can be scripted. | Geo replicates 100% of planned data types and verifies 50%. See the [limitations table](../geo/replication/datatypes.md#limitations-on-replicationverification) for more detail. | Improves read and clone times for users of a secondary. | Geo is not intended to replace other backup and restore solutions. Because of replication lag and the possibility of replicating bad data from a primary, customers should also take regular backups of their primary site and test the restore process. |
+
+## Scenarios for failure modes and available mitigation paths
+
+The following table outlines failure modes and mitigation paths for the product offerings detailed in the tables above. Gitaly Cluster installations (marked with an asterisk) are assumed to use an odd-numbered replication factor of 3 or greater.
+
+| Gitaly Mode | Loss of Single Gitaly Node | Application / Data Corruption | Regional Outage (Loss of Instance) | Notes |
+| ----------- | -------------------------- | ----------------------------- | ---------------------------------- | ----- |
+| Single Gitaly Node | Downtime - Must restore from backup | Downtime - Must restore from backup | Downtime - Must wait for outage to end | |
+| Single Gitaly Node + Geo Secondary | Downtime - Must restore from backup, or can perform a manual failover to secondary | Downtime - Must restore from backup; errors could have propagated to secondary | Manual intervention - fail over to Geo secondary | |
+| Sharded Gitaly Install | Partial Downtime - Only repositories on the impacted node are affected; must restore from backup | Partial Downtime - Only repositories on the impacted node are affected; must restore from backup | Downtime - Must wait for outage to end | |
+| Sharded Gitaly Install + Geo Secondary | Partial Downtime - Only repositories on the impacted node are affected; must restore from backup, or could perform a manual failover to secondary for impacted repositories | Partial Downtime - Only repositories on the impacted node are affected; must restore from backup, and errors could have propagated to secondary | Manual intervention - fail over to Geo secondary | |
+| Gitaly Cluster Install* | No Downtime - Swaps repository primary to another node after 10 seconds | Not applicable; all writes are voted on by multiple Gitaly Cluster nodes | Downtime - Must wait for outage to end | Snapshot backups for Gitaly Cluster nodes are not supported at this time |
+| Gitaly Cluster Install* + Geo Secondary | No Downtime - Swaps repository primary to another node after 10 seconds | Not applicable; all writes are voted on by multiple Gitaly Cluster nodes | Manual intervention - fail over to Geo secondary | Snapshot backups for Gitaly Cluster nodes are not supported at this time |
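The assumption that Gitaly Cluster uses an odd replication factor of 3 or greater, together with the claim that writes are voted on by multiple nodes, can be illustrated with a generic majority-quorum calculation. This is a minimal sketch assuming a simple strict-majority vote (quorum = floor(n/2) + 1); it is not Praefect's actual transaction or voting implementation, and the helper names `majority_quorum`, `tolerated_failures`, and `write_accepted` are hypothetical.

```python
# Hypothetical illustration of a strict-majority quorum.
# Assumption: a write commits when a strict majority of replicas agree.
# This is NOT Praefect's real voting logic, only a generic model.


def majority_quorum(replication_factor: int) -> int:
    """Smallest number of agreeing votes that forms a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Nodes that can be lost while a strict majority can still vote."""
    return replication_factor - majority_quorum(replication_factor)


def write_accepted(votes_in_agreement: int, replication_factor: int) -> bool:
    """In this simplified model, a write commits only with a majority."""
    return votes_in_agreement >= majority_quorum(replication_factor)


if __name__ == "__main__":
    # Compare odd and even replication factors side by side.
    for n in range(1, 7):
        print(
            f"replication factor {n}: quorum {majority_quorum(n)}, "
            f"tolerates {tolerated_failures(n)} node failure(s)"
        )
```

Under this model, moving from 3 to 4 replicas adds a node but still tolerates only one failure, while 5 replicas tolerate two. That is one way to see why an odd replication factor is the efficient choice for a vote-based design.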