diff options
author | Yorick Peterse <yorickpeterse@gmail.com> | 2018-06-04 17:20:01 +0300 |
---|---|---|
committer | Yorick Peterse <yorickpeterse@gmail.com> | 2018-06-04 19:33:56 +0300 |
commit | 71ed7987d3a7ab16677cb2b87721f391e8daaf13 (patch) | |
tree | adf08481293e0542eb1ccc7e648dd6281ac6cdcc /lib/gitlab/github_import | |
parent | 9c2961947826442e780285cb551583b09cf6dae9 (diff) |
Perform pull request IO work outside a transaction
When importing a GitHub pull request we would perform all work in a
single database transaction. This is less than ideal, because we perform
various slow Git operations when creating a merge request. This in turn
can lead to many DB connections being used, while just waiting for an IO
operation to complete.
To work around this, we now move most of the heavy lifting out of the
database transaction. Some extra error handling is added to ensure we
can resume importing a partially imported pull request, instead of just
throwing an error.
This commit also changes the specs for IssueImporter so they don't rely
on deprecated RSpec methods.
Diffstat (limited to 'lib/gitlab/github_import')
-rw-r--r-- | lib/gitlab/github_import/importer/pull_request_importer.rb | 56 |
1 files changed, 42 insertions, 14 deletions
diff --git a/lib/gitlab/github_import/importer/pull_request_importer.rb b/lib/gitlab/github_import/importer/pull_request_importer.rb index 49d859f9624..b2f6cb7ad19 100644 --- a/lib/gitlab/github_import/importer/pull_request_importer.rb +++ b/lib/gitlab/github_import/importer/pull_request_importer.rb @@ -22,15 +22,22 @@ module Gitlab end def execute - if (mr_id = create_merge_request) - issuable_finder.cache_database_id(mr_id) + mr, already_exists = create_merge_request + + if mr + insert_git_data(mr, already_exists) + issuable_finder.cache_database_id(mr.id) end end # Creates the merge request and returns its ID. # # This method will return `nil` if the merge request could not be - # created. + # created, otherwise it will return an Array containing the following + # values: + # + # 1. A MergeRequest instance. + # 2. A boolean indicating if the MR already exists. def create_merge_request author_id, author_found = user_finder.author_id_for(pull_request) @@ -69,21 +76,42 @@ module Gitlab merge_request_id = GithubImport .insert_and_return_id(attributes, project.merge_requests) - merge_request = project.merge_requests.find(merge_request_id) - - # These fields are set so we can create the correct merge request - # diffs. - merge_request.source_branch_sha = pull_request.source_branch_sha - merge_request.target_branch_sha = pull_request.target_branch_sha - - merge_request.keep_around_commit - merge_request.merge_request_diffs.create - - merge_request.id + [project.merge_requests.find(merge_request_id), false] end rescue ActiveRecord::InvalidForeignKey # It's possible the project has been deleted since scheduling this # job. In this case we'll just skip creating the merge request. + [] + rescue ActiveRecord::RecordNotUnique + # It's possible we previously created the MR, but failed when updating + # the Git data. In this case we'll just continue working on the + # existing row. + [project.merge_requests.find_by(iid: pull_request.iid), true] + end + + def insert_git_data(merge_request, already_exists = false) + # These fields are set so we can create the correct merge request + # diffs. + merge_request.source_branch_sha = pull_request.source_branch_sha + merge_request.target_branch_sha = pull_request.target_branch_sha + + merge_request.keep_around_commit + + # MR diffs normally use an "after_save" hook to pull data from Git. + # All of this happens in the transaction started by calling + # create/save/etc. This in turn can lead to these transactions being + # held open for much longer than necessary. To work around this we + # first save the diff, then populate it. + diff = + if already_exists + merge_request.merge_request_diffs.take + else + merge_request.merge_request_diffs.build + end + + diff.importing = true + diff.save + diff.save_git_content end end end |