From a8722750985a53cc502a66ae3d68a9e42c7fdb98 Mon Sep 17 00:00:00 2001
From: Jeff King
Date: Wed, 27 Aug 2014 13:01:28 -0400
Subject: teach fast-export an --anonymize option

Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).

This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:

  git fast-export --anonymize --all |
  perl -pe 's/\d+/X/g' |
  sort -u |
  less

which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).

In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself).
Here are numbers for git.git:

  $ time git fast-export --anonymize --all \
         --tag-of-filtered-object=drop >output
  real    0m2.883s
  user    0m2.828s
  sys     0m0.052s

  $ gzip output
  $ ls -lh output.gz | awk '{print $5}'
  2.9M

Signed-off-by: Jeff King
Signed-off-by: Junio C Hamano
---
 Documentation/git-fast-export.txt | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'Documentation/git-fast-export.txt')

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 221506b04b..52831faca9 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -105,6 +105,12 @@ marks the same across runs.
 	in the commit (as opposed to just listing the files which are
 	different from the commit's first parent).
 
+--anonymize::
+	Replace all refnames, paths, blob contents, commit and tag
+	messages, names, and email addresses in the output with
+	anonymized data, while still retaining the shape of history and
+	of the stored tree.
+
 --refspec::
 	Apply the specified refspec to each ref exported. Multiple of them can
 	be specified.
-- 
cgit v1.2.3
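
The commit message says the anonymized stream "can recreate such a
repository" but does not show that step. The following is only a sketch
of the round trip, not part of the patch: it builds a hypothetical
throwaway repository, exports it with --anonymize, and replays the
stream with git fast-import. The directory names, the file name, and
the identity values are invented for illustration, and it assumes a git
recent enough to include this option.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the private repository (all names made up).
git init -q original
git -C original config user.name "Real Name"
git -C original config user.email "real@example.com"
echo "secret contents" >original/secret.txt
git -C original add secret.txt
git -C original commit -q -m "secret message"

# Write the anonymized history to a single stream that can be reviewed.
git -C original fast-export --anonymize --all >anon-stream

# Replay the stream into a fresh, empty repository.
git init -q anon
git -C anon fast-import --quiet <anon-stream

# Crude leak check: the original file name, contents, and message all
# contain the word "secret", so it should not appear in the stream.
if grep -q "secret" anon-stream; then
	echo "stream still contains original data" >&2
	exit 1
fi

# The anonymized repository should have as many commits as the original.
git -C anon rev-list --all --count
```

The grep is only a spot check for this toy repository; for real data,
the perl/sort pipeline from the commit message gives a fuller view of
what the stream contains.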