Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitaly.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorZeger-Jan van de Weg <git@zjvandeweg.nl>2018-04-24 10:11:24 +0300
committerZeger-Jan van de Weg <git@zjvandeweg.nl>2018-05-28 10:04:43 +0300
commit34f435ae69b7c88025d40f273e1ccfeba979f5bb (patch)
treefced9dc9040fab6a0f57c7475ad6dc9e0960b1d5
parent307f8ab481eeae2ab2a3d69f9aeadbfabc1140d9 (diff)
Use Go-Git for the FindCommit RPC
FindCommit is called very often, to an extend that it's a problem to have all these requests go through Gitaly at the moment. This made the Gitaly team invest a lot of time in clientside N + 1 problems. Eventhough this was fruitful, the optimalisations weren't enough to bring the number of RPC/s down to a level where the RPC could be called 100% of the time. The current way of obtaining the commit information is by shelling out to the git binary, using `git log -z` with extensive use of format options. Shelling out comes at a runtime cost, and by using a native Golang implementation of git this cost could be avoided. The parent commit introduced src-d/go-git as a dependency. The intent is to swap out the git implemenation without the need for any proto, or client-side changes, and also be fully compatible with the shelling out. Things to check, before these commits can be merged to master include: 1. Shelling out includes `GIT_OBJECT_DIRECTORY` and `GIT_ALTERNATE_OBJECT_DIRECTORY`, and sets the values in the execution environment. To what extend FindCommit requires these values, and how to set these values when using go-git are unanswered questions at the moment. 2. The full test suite of GitLab-CE and GitLab-EE should be able to pass with mininal changes to those codebases. The main reason to swap out implemenations is performance, so gitaly-bench was updated to be able benchmark the FindCommit RPC in: https://gitlab.com/gitlab-org/gitaly-bench/merge_requests/5 Both Gitaly's were started with the same configuration, apart from the port. The shell out implemenation was listening on :9999, the go-git implementation on :19999. Output is truncated, but the commands are not for reproducibilty. ``` $ go version go version go1.10.1 darwin/amd64 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 17.3454 Average QPS: 57.65 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 0.9546 Average QPS: 1047.55 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit -revision "4a24d82dbca5c11c61556f3b35ca472b7463187e" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 17.7700 Average QPS: 56.27 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit -revision "4a24d82dbca5c11c61556f3b35ca472b7463187e" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 1.3640 Average QPS: 733.12 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit -revision "HEAD~25" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 17.5492 Average QPS: 56.98 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit -revision "HEAD~25" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 3.2684 Average QPS: 305.96 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:9999 find-commit -revision "feature_conflict" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 18.8795 Average QPS: 52.97 Errors: 0 Percent errors: 0.00 $ go run gitaly-bench.go -iterations 100 -repo gitlab-org/gitlab-test.git -host tcp://localhost:19999 find-commit -revision "feature_conflict" Stats: Average: 0.000000 Total requests: 1000 Elapsed Time (sec): 1.4138 Average QPS: 707.33 Errors: 0 Percent errors: 0.00 ``` Data shows that across the board the go-git implementation is faster than shelling out to the git binary. The order of magnitude faster varies strongly however. From being 18 times faster, to 'just' 5 times faster. The difference can be explained by the type of revision that is passed as argument to the RPC. The relative revision, HEAD~25, is slowest, my asumption being that the implemenation of walking the history is not optimal. The shelling out implemenation is highly consistent in its timings. This change has one notable side effect; logging is greatly reduced, as shelling out is limited. The internal wrappers around shelling out log heavily, improving visibilty.
-rw-r--r--changelogs/unreleased/zj-go-git-find-commit.yml5
-rw-r--r--internal/adapters/gogit/commit.go80
-rw-r--r--internal/adapters/gogit/commit_test.go107
-rw-r--r--internal/featureflag/grpc_header.go8
-rw-r--r--internal/service/commit/find_commit.go65
-rw-r--r--internal/service/commit/find_commit_test.go24
6 files changed, 280 insertions, 9 deletions
diff --git a/changelogs/unreleased/zj-go-git-find-commit.yml b/changelogs/unreleased/zj-go-git-find-commit.yml
new file mode 100644
index 000000000..e8b3dfd6b
--- /dev/null
+++ b/changelogs/unreleased/zj-go-git-find-commit.yml
@@ -0,0 +1,5 @@
+---
+title: Use Go-Git for the FindCommit RPC
+merge_request: 691
+author:
+type: added
diff --git a/internal/adapters/gogit/commit.go b/internal/adapters/gogit/commit.go
new file mode 100644
index 000000000..2d044b96b
--- /dev/null
+++ b/internal/adapters/gogit/commit.go
@@ -0,0 +1,80 @@
+package gogit
+
+import (
+ "strings"
+
+ "github.com/golang/protobuf/ptypes/timestamp"
+ pb "gitlab.com/gitlab-org/gitaly-proto/go"
+ "gitlab.com/gitlab-org/gitaly/internal/helper"
+ "google.golang.org/grpc/codes"
+ "google.golang.org/grpc/status"
+
+ git "gopkg.in/src-d/go-git.v4"
+ "gopkg.in/src-d/go-git.v4/plumbing"
+ "gopkg.in/src-d/go-git.v4/plumbing/object"
+)
+
+// FindCommit resolves the revision and returns the commit found
+func FindCommit(repoPath, revision string) (*pb.GitCommit, error) {
+ repo, err := git.PlainOpen(repoPath)
+ if err != nil {
+ return nil, status.Error(codes.Internal, "FindCommit: unabled to open the repository")
+ }
+
+ h, err := repo.ResolveRevision(plumbing.Revision(revision))
+ if err != nil {
+ return nil, err
+ }
+
+ commit, err := object.GetCommit(repo.Storer, *h)
+ if err != nil {
+ return nil, err
+ }
+
+ pbCommit := commitToProtoCommit(commit)
+
+ return pbCommit, nil
+}
+
+func commitToProtoCommit(commit *object.Commit) *pb.GitCommit {
+ // Ripped from internal/git/log/commitmessage.go getCommitMessage()
+ // It seems to send the same bits twice?
+ var subject string
+ if split := strings.SplitN(commit.Message, "\n", 2); len(split) == 2 {
+ subject = strings.TrimRight(split[0], "\r\n")
+ }
+ body := commit.Message
+
+ parentIds := make([]string, len(commit.ParentHashes))
+ for i, hash := range commit.ParentHashes {
+ parentIds[i] = hash.String()
+ }
+
+ author := pb.CommitAuthor{
+ Name: []byte(commit.Author.Name),
+ Email: []byte(commit.Author.Email),
+ Date: &timestamp.Timestamp{Seconds: commit.Author.When.Unix()},
+ }
+ committer := pb.CommitAuthor{
+ Name: []byte(commit.Committer.Name),
+ Email: []byte(commit.Author.Email),
+ Date: &timestamp.Timestamp{Seconds: commit.Author.When.Unix()},
+ }
+
+ byteBody := []byte(body)
+ newCommit := &pb.GitCommit{
+ Id: commit.ID().String(),
+ Subject: []byte(subject),
+ Body: byteBody,
+ BodySize: int64(len(byteBody)),
+ Author: &author,
+ Committer: &committer,
+ ParentIds: parentIds,
+ }
+
+ if threshold := helper.MaxCommitOrTagMessageSize; int(newCommit.BodySize) > threshold {
+ newCommit.Body = newCommit.Body[:threshold]
+ }
+
+ return newCommit
+}
diff --git a/internal/adapters/gogit/commit_test.go b/internal/adapters/gogit/commit_test.go
new file mode 100644
index 000000000..a363f4ddb
--- /dev/null
+++ b/internal/adapters/gogit/commit_test.go
@@ -0,0 +1,107 @@
+package gogit
+
+import (
+ "testing"
+
+ "github.com/golang/protobuf/ptypes/timestamp"
+ "github.com/stretchr/testify/assert"
+ "github.com/stretchr/testify/require"
+ pb "gitlab.com/gitlab-org/gitaly-proto/go"
+ "gitlab.com/gitlab-org/gitaly/internal/testhelper"
+)
+
+func TestFindCommit(t *testing.T) {
+ _, repoPath, cleanupFn := testhelper.NewTestRepo(t)
+ defer cleanupFn()
+
+ testCases := []struct {
+ revision string
+ commit *pb.GitCommit
+ }{
+ {
+ revision: "7975be0116940bf2ad4321f79d02a55c5f7779aa",
+ commit: &pb.GitCommit{
+ Id: "7975be0116940bf2ad4321f79d02a55c5f7779aa",
+ Subject: []byte("Merge branch 'rd-add-file-larger-than-1-mb' into 'master'"),
+ Body: []byte("Merge branch 'rd-add-file-larger-than-1-mb' into 'master'\n\nAdd file larger than 1 mb\n\nSee merge request gitlab-org/gitlab-test!32"),
+ Author: &pb.CommitAuthor{
+ Name: []byte("Douwe Maan"),
+ Email: []byte("douwe@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1523259684},
+ },
+ Committer: &pb.CommitAuthor{
+ Name: []byte("Douwe Maan"),
+ Email: []byte("douwe@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1523259684},
+ },
+ ParentIds: []string{"60ecb67744cb56576c30214ff52294f8ce2def98", "c84ff944ff4529a70788a5e9003c2b7feae29047"},
+ BodySize: 129,
+ },
+ },
+ {
+ revision: "gitaly-windows-1251",
+ commit: &pb.GitCommit{
+ Id: "c809470461118b7bcab850f6e9a7ca97ac42f8ea",
+ Subject: []byte("\304\356\341\340\342\350\362\374 \364\340\351\353\373 \342 \352\356\344\350\360\356\342\352\340\365 Windows-1251 \350 UTF-8"),
+ Body: []byte("\304\356\341\340\342\350\362\374 \364\340\351\353\373 \342 \352\356\344\350\360\356\342\352\340\365 Windows-1251 \350 UTF-8\n"),
+ Author: &pb.CommitAuthor{
+ Name: []byte("Jacob Vosmaer"),
+ Email: []byte("jacob@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1512132977},
+ },
+ Committer: &pb.CommitAuthor{
+ Name: []byte("Jacob Vosmaer"),
+ Email: []byte("jacob@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1512132977},
+ },
+ ParentIds: []string{"e63f41fe459e62e1228fcef60d7189127aeba95a"},
+ BodySize: 49,
+ },
+ },
+ {
+ revision: "gitaly-non-utf8-commit",
+ commit: &pb.GitCommit{
+ Id: "0999bb770f8dc92ab5581cc0b474b3e31a96bf5c",
+ Subject: []byte("Hello\360world"),
+ Body: []byte("Hello\360world\n"),
+ Author: &pb.CommitAuthor{
+ Name: []byte("Jacob Vosmaer"),
+ Email: []byte("jacob@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1517328273},
+ },
+ Committer: &pb.CommitAuthor{
+ Name: []byte("Jacob Vosmaer"),
+ Email: []byte("jacob@gitlab.com"),
+ Date: &timestamp.Timestamp{Seconds: 1517328273},
+ },
+ ParentIds: []string{"60ecb67744cb56576c30214ff52294f8ce2def98"},
+ BodySize: 12,
+ },
+ },
+ }
+
+ for _, tc := range testCases {
+ foundCommit, err := FindCommit(repoPath, tc.revision)
+ require.NoError(t, err)
+
+ assert.Equal(t, tc.commit, foundCommit)
+ }
+}
+
+func TestFindCommitFailures(t *testing.T) {
+ _, repoPath, cleanupFn := testhelper.NewTestRepo(t)
+ defer cleanupFn()
+
+ revisions := []string{
+ // Branch resolving to a commit with the same sha: https://github.com/src-d/go-git/issues/823
+ "1942eed5cc108b19c7405106e81fa96125d0be22",
+ // Anotated tags: https://github.com/src-d/go-git/issues/772
+ "v1.0.0",
+ }
+
+ for _, rev := range revisions {
+ _, err := FindCommit(repoPath, rev)
+
+ assert.Errorf(t, err, "no ref found", "")
+ }
+}
diff --git a/internal/featureflag/grpc_header.go b/internal/featureflag/grpc_header.go
index 8d63fb41f..28d3c978c 100644
--- a/internal/featureflag/grpc_header.go
+++ b/internal/featureflag/grpc_header.go
@@ -19,8 +19,7 @@ func IsEnabled(ctx context.Context, flag string) bool {
return false
}
- headerKey := fmt.Sprintf("gitaly-feature-%s", flag)
- val, ok := md[headerKey]
+ val, ok := md[HeaderKey(flag)]
if !ok {
return false
}
@@ -32,3 +31,8 @@ func IsEnabled(ctx context.Context, flag string) bool {
func IsDisabled(ctx context.Context, flag string) bool {
return !IsEnabled(ctx, flag)
}
+
+// HeaderKey returns the feature flag key to be used in the metadata map
+func HeaderKey(flag string) string {
+ return fmt.Sprintf("gitaly-feature-%s", flag)
+}
diff --git a/internal/service/commit/find_commit.go b/internal/service/commit/find_commit.go
index d56f3eb13..6c73687a3 100644
--- a/internal/service/commit/find_commit.go
+++ b/internal/service/commit/find_commit.go
@@ -1,24 +1,81 @@
package commit
import (
- "gitlab.com/gitlab-org/gitaly/internal/git/log"
-
pb "gitlab.com/gitlab-org/gitaly-proto/go"
+
+ "github.com/prometheus/client_golang/prometheus"
+ "gitlab.com/gitlab-org/gitaly/internal/adapters/gogit"
+ "gitlab.com/gitlab-org/gitaly/internal/featureflag"
"gitlab.com/gitlab-org/gitaly/internal/git"
+ "gitlab.com/gitlab-org/gitaly/internal/git/alternates"
+ "gitlab.com/gitlab-org/gitaly/internal/git/log"
+
"golang.org/x/net/context"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
+var (
+ findCommitRequests = prometheus.NewCounterVec(
+ prometheus.CounterOpts{
+ Name: "gitaly_find_commit_requests_total",
+ Help: "Counter of FindCommit requests, separated by backend",
+ },
+ []string{"backend", "status"},
+ )
+)
+
+func init() {
+ prometheus.MustRegister(findCommitRequests)
+}
+
func (s *server) FindCommit(ctx context.Context, in *pb.FindCommitRequest) (*pb.FindCommitResponse, error) {
if err := git.ValidateRevision(in.GetRevision()); err != nil {
return nil, status.Errorf(codes.InvalidArgument, "FindCommit: revision: %v", err)
}
- commit, err := log.GetCommit(ctx, in.GetRepository(), string(in.GetRevision()), "")
+ repo := in.GetRepository()
+ revision := in.GetRevision()
+
+ if featureflag.IsEnabled(ctx, "gogit-findcommit") {
+ commit, err := gogitFindCommit(repo, revision)
+ if err == nil {
+ findCommitRequests.WithLabelValues("go-git", "OK").Inc()
+ return &pb.FindCommitResponse{Commit: commit}, nil
+ }
+ findCommitRequests.WithLabelValues("go-git", "Fail").Inc()
+ }
+
+ commit, err := shelloutFindCommit(ctx, repo, revision)
+ if err == nil {
+ findCommitRequests.WithLabelValues("spawn-git", "OK").Inc()
+ } else {
+ findCommitRequests.WithLabelValues("spawn-git", "Fail").Inc()
+ }
+
+ return &pb.FindCommitResponse{Commit: commit}, err
+}
+
+func gogitFindCommit(repo *pb.Repository, revision []byte) (*pb.GitCommit, error) {
+ repoPath, _, err := alternates.PathAndEnv(repo)
+ if err != nil {
+ return nil, status.Error(codes.InvalidArgument, "FindCommit: repository path not found")
+ }
+
+ var commit *pb.GitCommit
+ commit, err = gogit.FindCommit(repoPath, string(revision))
+ if err != nil {
+ return nil, err
+ }
+
+ return commit, nil
+}
+
+func shelloutFindCommit(ctx context.Context, repo *pb.Repository, revision []byte) (*pb.GitCommit, error) {
+ commit, err := log.GetCommit(ctx, repo, string(revision), "")
if err != nil {
return nil, err
}
- return &pb.FindCommitResponse{Commit: commit}, nil
+ return commit, err
}
diff --git a/internal/service/commit/find_commit_test.go b/internal/service/commit/find_commit_test.go
index ab5b8d0a8..6e8b522ec 100644
--- a/internal/service/commit/find_commit_test.go
+++ b/internal/service/commit/find_commit_test.go
@@ -8,11 +8,13 @@ import (
"github.com/golang/protobuf/ptypes/timestamp"
"github.com/stretchr/testify/require"
pb "gitlab.com/gitlab-org/gitaly-proto/go"
+ "gitlab.com/gitlab-org/gitaly/internal/featureflag"
"gitlab.com/gitlab-org/gitaly/internal/git/log"
"gitlab.com/gitlab-org/gitaly/internal/helper"
"gitlab.com/gitlab-org/gitaly/internal/testhelper"
"golang.org/x/net/context"
"google.golang.org/grpc/codes"
+ "google.golang.org/grpc/metadata"
)
func TestSuccessfulFindCommitRequest(t *testing.T) {
@@ -206,6 +208,7 @@ func TestSuccessfulFindCommitRequest(t *testing.T) {
},
}
+ allCommits := []*pb.GitCommit{}
for _, testCase := range testCases {
t.Run(testCase.description, func(t *testing.T) {
request := &pb.FindCommitRequest{
@@ -216,13 +219,28 @@ func TestSuccessfulFindCommitRequest(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
response, err := client.FindCommit(ctx, request)
- if err != nil {
- t.Fatal(err)
- }
+ require.NoError(t, err)
require.Equal(t, testCase.commit, response.Commit, "mismatched commits")
+ allCommits = append(allCommits, response.Commit)
})
}
+
+ ctx = metadata.NewOutgoingContext(
+ ctx,
+ metadata.New(map[string]string{featureflag.HeaderKey("gogit-findcommit"): "true"}),
+ )
+
+ for i, testCase := range testCases {
+ request := &pb.FindCommitRequest{
+ Repository: testRepo,
+ Revision: []byte(testCase.revision),
+ }
+
+ response, err := client.FindCommit(ctx, request)
+ require.NoError(t, err)
+ require.Equal(t, allCommits[i], response.Commit)
+ }
}
func TestFailedFindCommitRequest(t *testing.T) {