author | Taylor Blau <me@ttaylorr.com> | 2023-04-25 01:20:10 +0300 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2023-04-25 02:01:28 +0300 |
commit | 52acddf36c8cb3778ab2098a0d95cc2e375a4069 (patch) | |
tree | dbd7ee8cff346ef7cc39f7aba4f23db6211b46b6 /t/t0063-string-list.sh | |
parent | 9857273be005833c71e2d16ba48e193113e12276 (diff) |
string-list: multi-delimiter `string_list_split_in_place()`
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters that contains
none of the characters in the delimiting set.
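
As a rough illustration of the two lookups described above (not the code from this patch), the sketch below compares finding the next delimiter with `strchr()` against finding it with `strcspn()`. The helper names `first_delim_single()` and `first_delim_multi()` are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: offset of the first occurrence of the single
 * delimiter `delim` in `s`, or strlen(s) if it does not occur.
 */
static size_t first_delim_single(const char *s, char delim)
{
	const char *p = strchr(s, delim);

	return p ? (size_t)(p - s) : strlen(s);
}

/*
 * Hypothetical helper: offset of the first character of `s` that is a
 * member of the set `delims`, or strlen(s) if there is none. strcspn()
 * returns the length of the leading run of non-delimiter characters,
 * which is exactly that offset.
 */
static size_t first_delim_multi(const char *s, const char *delims)
{
	return strcspn(s, delims);
}
```

For the input "foo;bar:baz", `first_delim_single(s, ':')` skips over the ';' and reports offset 7, while `first_delim_multi(s, ":;")` stops at the ';' and reports offset 3.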
When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.
This change is one step toward removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:
    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)
would yield a string list of:
    ["foo", "", "", "bar", "", "", "baz"]
Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.
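
The splitting semantics described above can be sketched in standalone C. This is a minimal approximation under stated assumptions, not the implementation in string-list.c: `split_in_place()` and `remove_empty()` are hypothetical stand-ins for `string_list_split_in_place()` and `string_list_remove_empty_items()`, and a caller-supplied pointer array replaces the real `struct string_list`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for string_list_split_in_place(): split `buf` at
 * every character found in `delims`, overwriting each delimiter with NUL
 * and storing pointers into `buf` in `out`. When maxsplit >= 0, at most
 * `maxsplit` splits are made and the remainder is kept as one piece.
 * Returns the number of pieces stored, never more than `cap`.
 */
static size_t split_in_place(char *buf, const char *delims, int maxsplit,
			     char **out, size_t cap)
{
	size_t n = 0;
	char *p = buf;

	while (n < cap) {
		char *end;

		out[n++] = p;
		if (maxsplit >= 0 && n > (size_t)maxsplit)
			break;	/* split budget used up; rest stays whole */
		end = p + strcspn(p, delims);
		if (!*end)
			break;	/* no further delimiter in the input */
		*end = '\0';	/* terminate this piece in place */
		p = end + 1;	/* adjacent delimiters yield empty pieces */
	}
	return n;
}

/*
 * Hypothetical analogue of string_list_remove_empty_items(): compact
 * `items` so that only non-empty strings remain; returns the new count.
 */
static size_t remove_empty(char **items, size_t n)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (*items[i])
			items[kept++] = items[i];
	return kept;
}
```

Because the sketch writes NUL bytes into its input, the buffer has to be writable (for example `char buf[] = "foo:;:bar:;:baz";`), not a string literal. Splitting that buffer on ":;" with a maxsplit of -1 gives the seven entries listed above, and passing the result through `remove_empty()` leaves "foo", "bar" and "baz", which is the strtok-like result.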
To avoid regressions for the new multi-character delimiter cases, update
t0063 in this patch as well.
[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 't/t0063-string-list.sh')
-rwxr-xr-x | t/t0063-string-list.sh | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/t/t0063-string-list.sh b/t/t0063-string-list.sh
index 46d4839194..1fee6d9010 100755
--- a/t/t0063-string-list.sh
+++ b/t/t0063-string-list.sh
@@ -18,6 +18,14 @@ test_split () {
 	"
 }
 
+test_split_in_place() {
+	cat >expected &&
+	test_expect_success "split (in place) $1 at $2, max $3" "
+		test-tool string-list split_in_place '$1' '$2' '$3' >actual &&
+		test_cmp expected actual
+	"
+}
+
 test_split "foo:bar:baz" ":" "-1" <<EOF
 3
 [0]: "foo"
@@ -61,6 +69,49 @@ test_split ":" ":" "-1" <<EOF
 [1]: ""
 EOF
 
+test_split_in_place "foo:;:bar:;:baz:;:" ":;" "-1" <<EOF
+10
+[0]: "foo"
+[1]: ""
+[2]: ""
+[3]: "bar"
+[4]: ""
+[5]: ""
+[6]: "baz"
+[7]: ""
+[8]: ""
+[9]: ""
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "0" <<EOF
+1
+[0]: "foo:;:bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "1" <<EOF
+2
+[0]: "foo"
+[1]: ";:bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:baz" ":;" "2" <<EOF
+3
+[0]: "foo"
+[1]: ""
+[2]: ":bar:;:baz"
+EOF
+
+test_split_in_place "foo:;:bar:;:" ":;" "-1" <<EOF
+7
+[0]: "foo"
+[1]: ""
+[2]: ""
+[3]: "bar"
+[4]: ""
+[5]: ""
+[6]: ""
+EOF
+
 test_expect_success "test filter_string_list" '
 	test "x-" = "x$(test-tool string-list filter - y)" &&
 	test "x-" = "x$(test-tool string-list filter no y)" &&