diff options
author | Phillip Wood <phillip.wood@dunelm.org.uk> | 2023-03-20 19:10:00 +0300 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2023-03-20 20:58:53 +0300 |
commit | 1f2e05f0b794d9e4b1cf07d63c9efd1325893ecc (patch) | |
tree | 10c0e7e8ec953611b620a05392ddd49e6dcc484a /t/t3070-wildmatch.sh | |
parent | 768bb238c4843bf52847773a621de4dffa6b9ab5 (diff) |
wildmatch: fix exponential behavior
When dowild() cannot match a '*' or '/**/' wildcard then it must return
WM_ABORT_TO_STARSTAR or WM_ABORT_ALL respectively. Failure to observe
this results in unnecessary backtracking and the time taken for a failed
match increases exponentially with the number of wildcards in the
pattern [1]. Unfortunately in some instances dowild() returns WM_NOMATCH
for a failed match resulting in long match times for patterns containing
multiple wildcards as can be seen in the following benchmark.
(Note that the timings in the Benchmark 1 are really measuring the time
to execute test-tool rather than the time to match the pattern)
Benchmark 1: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"
Time (mean ± σ): 22.8 ms ± 1.7 ms [User: 12.1 ms, System: 10.6 ms]
Range (min … max): 19.4 ms … 26.9 ms 113 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"
Time (mean ± σ): 5.244 s ± 0.228 s [User: 5.229 s, System: 0.010 s]
Range (min … max): 4.969 s … 5.707 s 10 runs
Warning: Ignoring non-zero exit code.
Summary
't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"' ran
230.37 ± 20.04 times faster than 't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"'
The security implications are limited as it only affects operations that
are potentially DoS vectors. For example by creating a blob containing
such a pattern a malicious user can exploit this behavior to use large
amounts of CPU time on a remote server by pushing the blob and then
creating a new clone with --filter=sparse:oid. However this filter type
is usually disabled as it is known to consume large amounts of CPU time
even without this bug.
The WM_MATCH changed in the first hunk of this patch comes from the
original implementation imported from rsync in 5230f605e1 (Import
wildmatch from rsync, 2012-10-15). Compared to the others converted here
it is fairly harmless as it only triggers at the end of the pattern and
so will only cause a single unnecessary backtrack. The others introduced
by 6f1a31f0aa (wildmatch: advance faster in <asterisk> + <literal>
patterns, 2013-01-01) and 46983441ae (wildmatch: make a special case for
"*/" with FNM_PATHNAME, 2013-01-01) are more pernicious and will cause
exponential behavior.
A new test is added to protect against future regressions.
[1] https://research.swtch.com/glob
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t3070-wildmatch.sh')
-rwxr-xr-x | t/t3070-wildmatch.sh | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 5d871fde96..b91a7cb712 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -431,4 +431,13 @@ match 1 1 1 1 'a' '[B-a]' match 0 1 0 1 'z' '[Z-y]' match 1 1 1 1 'Z' '[Z-y]' +test_expect_success 'matching does not exhibit exponential behavior' ' + test-tool wildmatch wildmatch \ + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab \ + "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a" & + pid=$! && + sleep 2 && + ! kill $! +' + test_done |