diff options
author | Zeger-Jan van de Weg <git@zjvandeweg.nl> | 2018-04-24 10:11:24 +0300 |
---|---|---|
committer | Zeger-Jan van de Weg <git@zjvandeweg.nl> | 2018-05-28 10:04:43 +0300 |
commit | 34f435ae69b7c88025d40f273e1ccfeba979f5bb (patch) | |
tree | fced9dc9040fab6a0f57c7475ad6dc9e0960b1d5 | |
parent | 307f8ab481eeae2ab2a3d69f9aeadbfabc1140d9 (diff) |
Use Go-Git for the FindCommit RPC
FindCommit is called very often, to an extend that it's a problem to
have all these requests go through Gitaly at the moment. This made the
Gitaly team invest a lot of time in clientside N + 1 problems. Eventhough
this was fruitful, the optimalisations weren't enough to bring the number
of RPC/s down to a level where the RPC could be called 100% of the time.
The current way of obtaining the commit information is by shelling out
to the git binary, using `git log -z` with extensive use of format
options. Shelling out comes at a runtime cost, and by using a native
Golang implementation of git this cost could be avoided. The parent
commit introduced src-d/go-git as a dependency.
The intent is to swap out the git implemenation without the need for any
proto, or client-side changes, and also be fully compatible with the
shelling out. Things to check, before these commits can be merged to
master include:
1. Shelling out includes `GIT_OBJECT_DIRECTORY` and
`GIT_ALTERNATE_OBJECT_DIRECTORY`, and sets the values in the execution
environment. To what extend FindCommit requires these values, and how to
set these values when using go-git are unanswered questions at the
moment.
2. The full test suite of GitLab-CE and GitLab-EE should be able to pass
with mininal changes to those codebases.
The main reason to swap out implemenations is performance, so
gitaly-bench was updated to be able benchmark the FindCommit RPC in:
https://gitlab.com/gitlab-org/gitaly-bench/merge_requests/5
Both Gitaly's were started with the same configuration, apart from the
port. The shell out implemenation was listening on :9999, the go-git
implementation on :19999. Output is truncated, but the commands are not
for reproducibilty.
```
$ go version
go version go1.10.1 darwin/amd64
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 17.3454
Average QPS: 57.65
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 0.9546
Average QPS: 1047.55
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit
-revision "4a24d82dbca5c11c61556f3b35ca472b7463187e"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 17.7700
Average QPS: 56.27
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit
-revision "4a24d82dbca5c11c61556f3b35ca472b7463187e"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 1.3640
Average QPS: 733.12
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit
-revision "HEAD~25"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 17.5492
Average QPS: 56.98
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit
-revision "HEAD~25"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 3.2684
Average QPS: 305.96
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit
-revision "feature_conflict"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 18.8795
Average QPS: 52.97
Errors: 0
Percent errors: 0.00
$ go run gitaly-bench.go -iterations 100 -repo
gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit
-revision "feature_conflict"
Stats:
Average: 0.000000
Total requests: 1000
Elapsed Time (sec): 1.4138
Average QPS: 707.33
Errors: 0
Percent errors: 0.00
```
Data shows that across the board the go-git implementation is faster
than shelling out to the git binary. The order of magnitude faster
varies strongly however. From being 18 times faster, to 'just' 5 times
faster. The difference can be explained by the type of revision that is
passed as argument to the RPC. The relative revision, HEAD~25, is
slowest, my asumption being that the implemenation of walking the
history is not optimal. The shelling out implemenation is highly
consistent in its timings.
This change has one notable side effect; logging is greatly reduced,
as shelling out is limited. The internal wrappers around shelling out
log heavily, improving visibilty.
-rw-r--r-- | changelogs/unreleased/zj-go-git-find-commit.yml | 5 | ||||
-rw-r--r-- | internal/adapters/gogit/commit.go | 80 | ||||
-rw-r--r-- | internal/adapters/gogit/commit_test.go | 107 | ||||
-rw-r--r-- | internal/featureflag/grpc_header.go | 8 | ||||
-rw-r--r-- | internal/service/commit/find_commit.go | 65 | ||||
-rw-r--r-- | internal/service/commit/find_commit_test.go | 24 |
6 files changed, 280 insertions, 9 deletions
diff --git a/changelogs/unreleased/zj-go-git-find-commit.yml b/changelogs/unreleased/zj-go-git-find-commit.yml new file mode 100644 index 000000000..e8b3dfd6b --- /dev/null +++ b/changelogs/unreleased/zj-go-git-find-commit.yml @@ -0,0 +1,5 @@ +--- +title: Use Go-Git for the FindCommit RPC +merge_request: 691 +author: +type: added diff --git a/internal/adapters/gogit/commit.go b/internal/adapters/gogit/commit.go new file mode 100644 index 000000000..2d044b96b --- /dev/null +++ b/internal/adapters/gogit/commit.go @@ -0,0 +1,80 @@ +package gogit + +import ( + "strings" + + "github.com/golang/protobuf/ptypes/timestamp" + pb "gitlab.com/gitlab-org/gitaly-proto/go" + "gitlab.com/gitlab-org/gitaly/internal/helper" + "google.golang.org/grpc/codes" + "google.golang.org/grpc/status" + + git "gopkg.in/src-d/go-git.v4" + "gopkg.in/src-d/go-git.v4/plumbing" + "gopkg.in/src-d/go-git.v4/plumbing/object" +) + +// FindCommit resolves the revision and returns the commit found +func FindCommit(repoPath, revision string) (*pb.GitCommit, error) { + repo, err := git.PlainOpen(repoPath) + if err != nil { + return nil, status.Error(codes.Internal, "FindCommit: unabled to open the repository") + } + + h, err := repo.ResolveRevision(plumbing.Revision(revision)) + if err != nil { + return nil, err + } + + commit, err := object.GetCommit(repo.Storer, *h) + if err != nil { + return nil, err + } + + pbCommit := commitToProtoCommit(commit) + + return pbCommit, nil +} + +func commitToProtoCommit(commit *object.Commit) *pb.GitCommit { + // Ripped from internal/git/log/commitmessage.go getCommitMessage() + // It seems to send the same bits twice? + var subject string + if split := strings.SplitN(commit.Message, "\n", 2); len(split) == 2 { + subject = strings.TrimRight(split[0], "\r\n") + } + body := commit.Message + + parentIds := make([]string, len(commit.ParentHashes)) + for i, hash := range commit.ParentHashes { + parentIds[i] = hash.String() + } + + author := pb.CommitAuthor{ + Name: []byte(commit.Author.Name), + Email: []byte(commit.Author.Email), + Date: ×tamp.Timestamp{Seconds: commit.Author.When.Unix()}, + } + committer := pb.CommitAuthor{ + Name: []byte(commit.Committer.Name), + Email: []byte(commit.Author.Email), + Date: ×tamp.Timestamp{Seconds: commit.Author.When.Unix()}, + } + + byteBody := []byte(body) + newCommit := &pb.GitCommit{ + Id: commit.ID().String(), + Subject: []byte(subject), + Body: byteBody, + BodySize: int64(len(byteBody)), + Author: &author, + Committer: &committer, + ParentIds: parentIds, + } + + if threshold := helper.MaxCommitOrTagMessageSize; int(newCommit.BodySize) > threshold { + newCommit.Body = newCommit.Body[:threshold] + } + + return newCommit +} diff --git a/internal/adapters/gogit/commit_test.go b/internal/adapters/gogit/commit_test.go new file mode 100644 index 000000000..a363f4ddb --- /dev/null +++ b/internal/adapters/gogit/commit_test.go @@ -0,0 +1,107 @@ +package gogit + +import ( + "testing" + + "github.com/golang/protobuf/ptypes/timestamp" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + pb "gitlab.com/gitlab-org/gitaly-proto/go" + "gitlab.com/gitlab-org/gitaly/internal/testhelper" +) + +func TestFindCommit(t *testing.T) { + _, repoPath, cleanupFn := testhelper.NewTestRepo(t) + defer cleanupFn() + + testCases := []struct { + revision string + commit *pb.GitCommit + }{ + { + revision: "7975be0116940bf2ad4321f79d02a55c5f7779aa", + commit: &pb.GitCommit{ + Id: "7975be0116940bf2ad4321f79d02a55c5f7779aa", + Subject: []byte("Merge branch 'rd-add-file-larger-than-1-mb' into 'master'"), + Body: []byte("Merge branch 'rd-add-file-larger-than-1-mb' into 'master'\n\nAdd file larger than 1 mb\n\nSee merge request gitlab-org/gitlab-test!32"), + Author: &pb.CommitAuthor{ + Name: []byte("Douwe Maan"), + Email: []byte("douwe@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1523259684}, + }, + Committer: &pb.CommitAuthor{ + Name: []byte("Douwe Maan"), + Email: []byte("douwe@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1523259684}, + }, + ParentIds: []string{"60ecb67744cb56576c30214ff52294f8ce2def98", "c84ff944ff4529a70788a5e9003c2b7feae29047"}, + BodySize: 129, + }, + }, + { + revision: "gitaly-windows-1251", + commit: &pb.GitCommit{ + Id: "c809470461118b7bcab850f6e9a7ca97ac42f8ea", + Subject: []byte("\304\356\341\340\342\350\362\374 \364\340\351\353\373 \342 \352\356\344\350\360\356\342\352\340\365 Windows-1251 \350 UTF-8"), + Body: []byte("\304\356\341\340\342\350\362\374 \364\340\351\353\373 \342 \352\356\344\350\360\356\342\352\340\365 Windows-1251 \350 UTF-8\n"), + Author: &pb.CommitAuthor{ + Name: []byte("Jacob Vosmaer"), + Email: []byte("jacob@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1512132977}, + }, + Committer: &pb.CommitAuthor{ + Name: []byte("Jacob Vosmaer"), + Email: []byte("jacob@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1512132977}, + }, + ParentIds: []string{"e63f41fe459e62e1228fcef60d7189127aeba95a"}, + BodySize: 49, + }, + }, + { + revision: "gitaly-non-utf8-commit", + commit: &pb.GitCommit{ + Id: "0999bb770f8dc92ab5581cc0b474b3e31a96bf5c", + Subject: []byte("Hello\360world"), + Body: []byte("Hello\360world\n"), + Author: &pb.CommitAuthor{ + Name: []byte("Jacob Vosmaer"), + Email: []byte("jacob@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1517328273}, + }, + Committer: &pb.CommitAuthor{ + Name: []byte("Jacob Vosmaer"), + Email: []byte("jacob@gitlab.com"), + Date: ×tamp.Timestamp{Seconds: 1517328273}, + }, + ParentIds: []string{"60ecb67744cb56576c30214ff52294f8ce2def98"}, + BodySize: 12, + }, + }, + } + + for _, tc := range testCases { + foundCommit, err := FindCommit(repoPath, tc.revision) + require.NoError(t, err) + + assert.Equal(t, tc.commit, foundCommit) + } +} + +func TestFindCommitFailures(t *testing.T) { + _, repoPath, cleanupFn := testhelper.NewTestRepo(t) + defer cleanupFn() + + revisions := []string{ + // Branch resolving to a commit with the same sha: https://github.com/src-d/go-git/issues/823 + "1942eed5cc108b19c7405106e81fa96125d0be22", + // Anotated tags: https://github.com/src-d/go-git/issues/772 + "v1.0.0", + } + + for _, rev := range revisions { + _, err := FindCommit(repoPath, rev) + + assert.Errorf(t, err, "no ref found", "") + } +} diff --git a/internal/featureflag/grpc_header.go b/internal/featureflag/grpc_header.go index 8d63fb41f..28d3c978c 100644 --- a/internal/featureflag/grpc_header.go +++ b/internal/featureflag/grpc_header.go @@ -19,8 +19,7 @@ func IsEnabled(ctx context.Context, flag string) bool { return false } - headerKey := fmt.Sprintf("gitaly-feature-%s", flag) - val, ok := md[headerKey] + val, ok := md[HeaderKey(flag)] if !ok { return false } @@ -32,3 +31,8 @@ func IsEnabled(ctx context.Context, flag string) bool { func IsDisabled(ctx context.Context, flag string) bool { return !IsEnabled(ctx, flag) } + +// HeaderKey returns the feature flag key to be used in the metadata map +func HeaderKey(flag string) string { + return fmt.Sprintf("gitaly-feature-%s", flag) +} diff --git a/internal/service/commit/find_commit.go b/internal/service/commit/find_commit.go index d56f3eb13..6c73687a3 100644 --- a/internal/service/commit/find_commit.go +++ b/internal/service/commit/find_commit.go @@ -1,24 +1,81 @@ package commit import ( - "gitlab.com/gitlab-org/gitaly/internal/git/log" - pb "gitlab.com/gitlab-org/gitaly-proto/go" + + "github.com/prometheus/client_golang/prometheus" + "gitlab.com/gitlab-org/gitaly/internal/adapters/gogit" + "gitlab.com/gitlab-org/gitaly/internal/featureflag" "gitlab.com/gitlab-org/gitaly/internal/git" + "gitlab.com/gitlab-org/gitaly/internal/git/alternates" + "gitlab.com/gitlab-org/gitaly/internal/git/log" + "golang.org/x/net/context" "google.golang.org/grpc/codes" "google.golang.org/grpc/status" ) +var ( + findCommitRequests = prometheus.NewCounterVec( + prometheus.CounterOpts{ + Name: "gitaly_find_commit_requests_total", + Help: "Counter of FindCommit requests, separated by backend", + }, + []string{"backend", "status"}, + ) +) + +func init() { + prometheus.MustRegister(findCommitRequests) +} + func (s *server) FindCommit(ctx context.Context, in *pb.FindCommitRequest) (*pb.FindCommitResponse, error) { if err := git.ValidateRevision(in.GetRevision()); err != nil { return nil, status.Errorf(codes.InvalidArgument, "FindCommit: revision: %v", err) } - commit, err := log.GetCommit(ctx, in.GetRepository(), string(in.GetRevision()), "") + repo := in.GetRepository() + revision := in.GetRevision() + + if featureflag.IsEnabled(ctx, "gogit-findcommit") { + commit, err := gogitFindCommit(repo, revision) + if err == nil { + findCommitRequests.WithLabelValues("go-git", "OK").Inc() + return &pb.FindCommitResponse{Commit: commit}, nil + } + findCommitRequests.WithLabelValues("go-git", "Fail").Inc() + } + + commit, err := shelloutFindCommit(ctx, repo, revision) + if err == nil { + findCommitRequests.WithLabelValues("spawn-git", "OK").Inc() + } else { + findCommitRequests.WithLabelValues("spawn-git", "Fail").Inc() + } + + return &pb.FindCommitResponse{Commit: commit}, err +} + +func gogitFindCommit(repo *pb.Repository, revision []byte) (*pb.GitCommit, error) { + repoPath, _, err := alternates.PathAndEnv(repo) + if err != nil { + return nil, status.Error(codes.InvalidArgument, "FindCommit: repository path not found") + } + + var commit *pb.GitCommit + commit, err = gogit.FindCommit(repoPath, string(revision)) + if err != nil { + return nil, err + } + + return commit, nil +} + +func shelloutFindCommit(ctx context.Context, repo *pb.Repository, revision []byte) (*pb.GitCommit, error) { + commit, err := log.GetCommit(ctx, repo, string(revision), "") if err != nil { return nil, err } - return &pb.FindCommitResponse{Commit: commit}, nil + return commit, err } diff --git a/internal/service/commit/find_commit_test.go b/internal/service/commit/find_commit_test.go index ab5b8d0a8..6e8b522ec 100644 --- a/internal/service/commit/find_commit_test.go +++ b/internal/service/commit/find_commit_test.go @@ -8,11 +8,13 @@ import ( "github.com/golang/protobuf/ptypes/timestamp" "github.com/stretchr/testify/require" pb "gitlab.com/gitlab-org/gitaly-proto/go" + "gitlab.com/gitlab-org/gitaly/internal/featureflag" "gitlab.com/gitlab-org/gitaly/internal/git/log" "gitlab.com/gitlab-org/gitaly/internal/helper" "gitlab.com/gitlab-org/gitaly/internal/testhelper" "golang.org/x/net/context" "google.golang.org/grpc/codes" + "google.golang.org/grpc/metadata" ) func TestSuccessfulFindCommitRequest(t *testing.T) { @@ -206,6 +208,7 @@ func TestSuccessfulFindCommitRequest(t *testing.T) { }, } + allCommits := []*pb.GitCommit{} for _, testCase := range testCases { t.Run(testCase.description, func(t *testing.T) { request := &pb.FindCommitRequest{ @@ -216,13 +219,28 @@ func TestSuccessfulFindCommitRequest(t *testing.T) { ctx, cancel := context.WithCancel(context.Background()) defer cancel() response, err := client.FindCommit(ctx, request) - if err != nil { - t.Fatal(err) - } + require.NoError(t, err) require.Equal(t, testCase.commit, response.Commit, "mismatched commits") + allCommits = append(allCommits, response.Commit) }) } + + ctx = metadata.NewOutgoingContext( + ctx, + metadata.New(map[string]string{featureflag.HeaderKey("gogit-findcommit"): "true"}), + ) + + for i, testCase := range testCases { + request := &pb.FindCommitRequest{ + Repository: testRepo, + Revision: []byte(testCase.revision), + } + + response, err := client.FindCommit(ctx, request) + require.NoError(t, err) + require.Equal(t, allCommits[i], response.Commit) + } } func TestFailedFindCommitRequest(t *testing.T) { |