Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
path: root/dir.h
AgeCommit message (Collapse)Author
2023-10-10dir.[ch]: add 'follow_symlink' arg to 'get_dtype'Victoria Dye
Add a 'follow_symlink' boolean option to 'get_type()'. If 'follow_symlink' is enabled, DT_LNK (in addition to DT_UNKNOWN) d_types triggers the stat-based d_type resolution, using 'stat' instead of 'lstat' to get the type of the followed symlink. Note that symlinks are not followed recursively, so a symlink pointing to another symlink will still resolve to DT_LNK. Update callers in 'diagnose.c' to specify 'follow_symlink = 0' to preserve current behavior. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10dir.[ch]: expose 'get_dtype'Victoria Dye
Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to 'dir.h' so that it is accessible to callers in other files. The function and its documentation are moved verbatim except for a small addition to the description clarifying what the 'path' arg represents. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-06Merge branch 'cw/strbuf-cleanup'Junio C Hamano
Move functions that are not about pure string manipulation out of strbuf.[ch] * cw/strbuf-cleanup: strbuf: remove global variable path: move related function to path object-name: move related functions to object-name credential-store: move related functions to credential-store file abspath: move related functions to abspath strbuf: clarify dependency strbuf: clarify API boundary
2023-06-30Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-21hash-ll, hashmap: move oidhash() to hash-llElijah Newren
oidhash() was used by both hashmap and khash, which makes sense. However, the location of this function in hashmap.[ch] meant that khash.h had to depend upon hashmap.h, making people unfamiliar with khash think that it was built upon hashmap. (Or at least, I personally was confused for a while about this in the past.) Move this function to hash-ll, so that khash.h can stop depending upon hashmap.h. This has another benefit as well: it allows us to remove hashmap.h's dependency on hash-ll.h. While some callers of hashmap.h were making use of oidhash, most were not, so this change provides another way to reduce the number of includes. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12object-name: move related functions to object-nameCalvin Wan
Move object-name-related functions from strbuf.[ch] to object-name.[ch] so that strbuf is focused on string manipulation routines with minimal dependencies. dir.h relied on the forward declration of the repository struct in strbuf.h. Since that is removed in this patch, add the forward declaration to dir.h. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12statinfo.h: move DTYPE defines from dir.hAlejandro R. Sedeño
592fc5b3 (dir.h: move DTYPE defines from cache.h, 2023-04-22) moved DTYPE macros from cache.h to dir.h, but they are still used by cache.h to implement ce_to_dtype(); cache.h cannot include dir.h because that would cause name-hash.c to have two different and conflicting definitions of `struct dir_entry`. (That should be separately fixed.) Both dir.h and cache.h include statinfo.h, and this seems a reasonable place for these definitions. This change fixes a broken build issue on old SunOS. Signed-off-by: Alejandro R. Sedeño <asedeno@mit.edu> Signed-off-by: Alejandro R Sedeño <asedeno@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24dir.h: move DTYPE defines from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-18Merge branch 'en/header-cleanup'Junio C Hamano
Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first
2023-02-27dir: mark output only fields of dir_struct as suchElijah Newren
While at it, also group these fields together for convenience. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27dir: add a usage note to exclude_per_dirElijah Newren
As evidenced by the fix a couple commits ago, places in the code using exclude_per_dir are likely buggy and should be adapted to call setup_standard_excludes() instead. Unfortunately, the usage of exclude_per_dir has been hardcoded into the arguments ls-files accepts, so we cannot actually remove it. Add a note that it is deprecated and no other callers should use it directly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27dir: separate public from internal portion of dir_structElijah Newren
In order to make it clearer to callers what portions of dir_struct are public API, and avoid errors from them setting fields that are meant as internal API, split the fields used for internal implementation reasons into a separate embedded struct. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24dir.h: refactor to no longer need to include cache.hElijah Newren
Moving a few functions around allows us to make dir.h no longer need to include cache.h. This commit is best viewed with: git log -1 -p --color-moved Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"Ævar Arnfjörð Bjarmason
When the "ident" member of the structure was added in 1e8fef609e7 (untracked cache: guard and disable on system changes, 2015-03-08) this function wasn't updated to free it. Let's do so. Let's also free the "exclude_per_dir" memory we've been leaking since[1], while making sure not to free() the constant ".gitignore" string we add by default[2]. As we now have three struct members we're freeing let's change free_untracked_cache() to return early if "uc" isn't defined. We won't hand it to free() now, but that was just for convenience, once we're dealing with >=2 struct members this pattern is more convenient. 1. f9e6c649589 (untracked cache: load from UNTR index extension, 2015-03-08) 2. 039bc64e886 (core.excludesfile clean-up, 2007-11-14) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-08-19match_pathname(): drop unused "flags" parameterJeff King
This field has not been used since the function was introduced in b559263216 (exclude: split pathname matching code into a separate function, 2012-10-15), though there was a brief period where it was erroneously used and then reverted in ed4958477b (dir: fix pattern matching on dirs, 2021-09-24) and 5ceb663e92 (dir: fix directory-matching bug, 2021-11-02). It's possible we'd eventually add a flag that makes it useful here, but there are only a handful of callers. It would be easy to add back if necessary, and in the meantime this makes the function interface less misleading. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-17dir API: add a generalized path_match_flags() functionÆvar Arnfjörð Bjarmason
Add a path_match_flags() function and have the two sets of starts_with_dot_{,dot_}slash() functions added in 63e95beb085 (submodule: port resolve_relative_url from shell to C, 2016-04-15) and a2b26ffb1a8 (fsck: convert gitmodules url to URL passed to curl, 2020-04-18) be thin wrappers for it. As the latter of those notes the fsck version was copied from the initial builtin/submodule--helper.c version. Since the code added in a2b26ffb1a8 was doing really doing the same as win32_is_dir_sep() added in 1cadad6f658 (git clone <url> C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move the latter to git-compat-util.h is a is_xplatform_dir_sep(). We can then call either it or the platform-specific is_dir_sep() from this new function. Let's likewise change code in various other places that was hardcoding checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can be seen in those callers some of them still concern themselves with ':' (Mac OS classic?), but let's leave the question of whether that should be consolidated for some other time. As we expect to make wider use of the "native" case in the future, define and use two starts_with_dot_{,dot_}slash_native() convenience wrappers. This makes the diff in builtin/submodule--helper.c much smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10dir: new flag to remove_dir_recurse() to spare the original_cwdElijah Newren
remove_dir_recurse(), and its non-static wrapper called remove_dir_recursively(), both take flags for modifying its behavior. As with the previous commits, we would generally like to protect the original_cwd, but we want to forced user commands (e.g. 'git rm -rf ...') or other special cases to remove it. Add a flag for this purpose. After reading through every caller of remove_dir_recursively() in the current codebase, there was only one that should be adjusted and that one only in a very unusual circumstance. Add a pair of new testcases to highlight that very specific case involving submodules && --git-dir && --work-tree. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10dir: avoid incidentally removing the original_cwd in remove_path()Elijah Newren
Modern git often tries to avoid leaving empty directories around when removing files. Originally, it did not bother. This behavior started with commit 80e21a9ed809 (merge-recursive::removeFile: remove empty directories, 2005-11-19), stating the reason simply as: When the last file in a directory is removed as the result of a merge, try to rmdir the now-empty directory. This was reimplemented in C and renamed to remove_path() in commit e1b3a2cad7 ("Build-in merge-recursive", 2008-02-07), but was still internal to merge-recursive. This trend towards removing leading empty directories continued with commit d9b814cc97f1 (Add builtin "git rm" command, 2006-05-19), which stated the reasoning as: The other question is what to do with leading directories. The old "git rm" script didn't do anything, which is somewhat inconsistent. This one will actually clean up directories that have become empty as a result of removing the last file, but maybe we want to have a flag to decide the behaviour? remove_path() in dir.c was added in 4a92d1bfb784 (Add remove_path: a function to remove as much as possible of a path, 2008-09-27), because it was noted that we had two separate implementations of the same idea AND both were buggy. It described the purpose of the function as a function to remove as much as possible of a path Why remove as much as possible? Well, at the time we probably would have said something like: * removing leading directories makes things feel tidy * removing leading directories doesn't hurt anything so long as they had no files in them. But I don't believe those reasons hold when the empty directory happens to be the current working directory we inherited from our parent process. Leaving the parent process in a deleted directory can cause user confusion when subsequent processes fail: any git command, for example, will immediately fail with fatal: Unable to read current working directory: No such file or directory Other commands may similarly get confused. Modify remove_path() so that the empty leading directories it also deletes does not include the current working directory we inherited from our parent process. I have looked through every caller of remove_path() in the current codebase to make sure that all should take this change. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-21Merge branch 'ds/sparse-index-ignored-files'Junio C Hamano
In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change. * ds/sparse-index-ignored-files: sparse-checkout: clear tracked sparse dirs sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag attr: be careful about sparse directories sparse-checkout: create helper methods sparse-index: use WRITE_TREE_MISSING_OK sparse-index: silently return when cache tree fails unpack-trees: fix nested sparse-dir search sparse-index: silently return when not using cone-mode patterns t7519: rewrite sparse index test
2021-09-08sparse-checkout: create helper methodsDerrick Stolee
As we integrate the sparse index into more builtins, we occasionally need to check the sparse-checkout patterns to see if a path is within the sparse-checkout cone. Create some helper methods that help initialize the patterns and check for pattern matching to make this easier. The existing callers of commands like get_sparse_checkout_patterns() use a custom 'struct pattern_list' that is not necessarily the one in the 'struct index_state', so there are not many previous uses that could adopt these helpers. There are just two in builtin/add.c and sparse-index.c that can use path_in_sparse_checkout(). We add a path_in_cone_mode_sparse_checkout() as well that will only return false if the path is outside of the sparse-checkout definition _and_ the sparse-checkout patterns are in cone mode. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-10dir: libify and export helper functions from clone.cAtharva Raykar
These functions can be useful to other parts of Git. Let's move them to dir.c, while renaming them to be make their functionality more explicit. Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Shourya Shukla <periperidip@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-28Merge branch 'ew/many-alternate-optim'Junio C Hamano
Optimization for repositories with many alternate object store. * ew/many-alternate-optim: oidtree: a crit-bit tree for odb_loose_cache oidcpy_with_padding: constify `src' arg make object_directory.loose_objects_subdir_seen a bitmap avoid strlen via strbuf_addstr in link_alt_odb_entry speed up alt_odb_usable() with many alternates
2021-07-08speed up alt_odb_usable() with many alternatesEric Wong
With many alternates, the duplicate check in alt_odb_usable() wastes many cycles doing repeated fspathcmp() on every existing alternate. Use a khash to speed up lookups by odb->path. Since the kh_put_* API uses the supplied key without duplicating it, we also take advantage of it to replace both xstrdup() and strbuf_release() in link_alt_odb_entry() with strbuf_detach() to avoid the allocation and copy. In a test repository with 50K alternates and each of those 50K alternates having one alternate each (for a total of 100K total alternates); this speeds up lookup of a non-existent blob from over 16 minutes to roughly 2.7 seconds on my busy workstation. Note: all underlying git object directories were small and unpacked with only loose objects and no packs. Having to load packs increases times significantly. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-01dir.[ch]: replace dir_init() with DIR_INITÆvar Arnfjörð Bjarmason
Remove the dir_init() function and replace it with a DIR_INIT macro. In many cases in the codebase we need to initialize things with a function for good reasons, e.g. needing to call another function on initialization. The "dir_init()" function was not one such case, and could trivially be replaced with a more idiomatic macro initialization pattern. The only place where we made use of its use of memset() was in dir_clear() itself, which resets the contents of an an existing struct pointer. Let's use the new "memcpy() a 'blank' struct on the stack" idiom to do that reset. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20Merge branch 'en/dir-traversal'Junio C Hamano
"git clean" and "git ls-files -i" had confusion around working on or showing ignored paths inside an ignored directory, which has been corrected. * en/dir-traversal: dir: introduce readdir_skip_dot_and_dotdot() helper dir: update stale description of treat_directory() dir: traverse into untracked directories if they may have ignored subfiles dir: avoid unnecessary traversal into ignored directory t3001, t7300: add testcase showcasing missed directory traversal t7300: add testcase showing unnecessary traversal into ignored directory ls-files: error out on -i unless -o or -c are specified dir: report number of visited directories and paths with trace2 dir: convert trace calls to trace2 equivalents
2021-05-13dir: introduce readdir_skip_dot_and_dotdot() helperElijah Newren
Many places in the code were doing while ((d = readdir(dir)) != NULL) { if (is_dot_or_dotdot(d->d_name)) continue; ...process d... } Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner: while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) { ...process d... } This helper particularly simplifies checks for empty directories. Also use this helper in read_cached_dir() so that our statistics are consistent across platforms. (In other words, read_cached_dir() should have been using is_dot_or_dotdot() and skipping such entries, but did not and left it to treat_path() to detect and mark such entries as path_none.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13dir: report number of visited directories and paths with trace2Elijah Newren
Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. Subsequent patches will take advantage of this to ensure we do not unnecessarily traverse into ignored directories. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-14*: remove 'const' qualifier for struct index_stateDerrick Stolee
Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16exclude: add flags parameter to add_patterns()Jeff King
There are a number of callers of add_patterns() and its sibling functions. Let's give them a "flags" parameter for adding new options without having to touch each caller. We'll use this in a future patch to add O_NOFOLLOW support. But for now each caller just passes 0. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24sparse-checkout: load sparse-checkout patternsDerrick Stolee
A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19dir: fix problematic API to avoid memory leaksElijah Newren
The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19dir: make clear_directory() free all relevant memoryElijah Newren
The calling convention for the dir API is supposed to end with a call to clear_directory() to free up no longer needed memory. However, clear_directory() didn't free dir->entries or dir->ignored. I believe this was an oversight, but a number of callers noticed memory leaks and started free'ing these. Unfortunately, they did so somewhat haphazardly (sometimes freeing the entries in the arrays, and sometimes only free'ing the arrays themselves). This suggests the callers weren't trying to make sure any possible memory used might be free'd, but just the memory they noticed their usecase definitely had allocated. Fix this mess by moving all the duplicated free'ing logic into clear_directory(). End by resetting dir to a pristine state so it could be reused if desired. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-25Merge branch 'ds/sparse-cone'Junio C Hamano
Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...
2019-11-22unpack-trees: hash less in cone modeDerrick Stolee
The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skipworktree bit to the correct value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: init and set in cone modeDerrick Stolee
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'. Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'. When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories. When testing 'git sparse-checkout set' in cone mode, check the error stream to ensure we do not see any errors. Specifically, we want to avoid the warning that the patterns do not match the cone-mode patterns. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22sparse-checkout: use hashmaps for cone patternsDerrick Stolee
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/ !/$folder/*/ This resulted in a sparse-checkout file sith 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckoutCone=true: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18dir: move doc to dir.hHeba Waly
Move the documentation from Documentation/technical/api-directory-listing.txt to dir.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-directory-listing.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header files. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-11Merge branch 'en/clean-nested-with-ignored'Junio C Hamano
"git clean" fixes. * en/clean-nested-with-ignored: dir: special case check for the possibility that pathspec is NULL clean: fix theoretical path corruption clean: rewrap overly long line clean: avoid removing untracked files in a nested git repository clean: disambiguate the definition of -d git-clean.txt: do not claim we will delete files with -n/--dry-run dir: add commentary explaining match_pathspec_item's return value dir: if our pathspec might match files under a dir, recurse into it dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case dir: also check directories for matching pathspecs dir: fix off-by-one error in match_pathspec_item dir: fix typo in comment t7300: add testcases showing failure to clean specified pathspecs
2019-09-17clean: avoid removing untracked files in a nested git repositoryElijah Newren
Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit 91479b9c72f1 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-17dir: if our pathspec might match files under a dir, recurse into itElijah Newren
For git clean, if a directory is entirely untracked and the user did not specify -d (corresponding to DIR_SHOW_IGNORED_TOO), then we usually do not want to remove that directory and thus do not recurse into it. However, if the user manually specified specific (or even globbed) paths somewhere under that directory to remove, then we need to recurse into the directory to make sure we remove the relevant paths under that directory as the user requested. Note that this does not mean that the recursed-into directory will be added to dir->entries for later removal; as of a few commits earlier in this series, there is another more strict match check that is run after returning from a recursed-into directory before deciding to add it to the list of entries. Therefore, this will only result in files underneath the given directory which match one of the pathspecs being added to the entries list. Two notes of potential interest to future readers: * If we wanted to only recurse into a directory when it is specifically matched rather than matched-via-glob (e.g. '*.c'), then we could do so via making the final non-zero return in match_pathspec_item be MATCHED_RECURSIVELY instead of MATCHED_RECURSIVELY_LEADING_PATHSPEC. (Note that the relative order of MATCHED_RECURSIVELY_LEADING_PATHSPEC and MATCHED_RECURSIVELY are important for such a change.) I was leaving open that possibility while writing an RFC asking for the behavior we want, but even though we don't want it, that knowledge might help you understand the code flow better. * There is a growing amount of logic in read_directory_recursive() for deciding whether to recurse into a subdirectory. However, there is a comment immediately preceding this logic that says to recurse if instructed by treat_path(). It may be better for the logic in read_directory_recursive() to ultimately be moved to treat_path() (or another function it calls, such as treat_directory()), but I have left that for someone else to tackle in the future. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06unpack-trees: rename 'is_excluded_from_list()'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! Now that this library is renamed to use 'struct pattern_list' and 'struct pattern', we can now rename the method used by the sparse-checkout feature to determine which paths should appear in the working directory. The method is_excluded_from_list() is only used by the sparse-checkout logic in unpack-trees and list-objects-filter. The confusing part is that it returned 1 for "excluded" (i.e. it matches the list of exclusions) but that really manes that the path matched the list of patterns for _inclusion_ in the working directory. Rename the method to be path_matches_pattern_list() and have it return an explicit 'enum pattern_match_result'. Here, the values MATCHED = 1, UNMATCHED = 0, and UNDECIDED = -1 agree with the previous integer values. This shift allows future consumers to better understand what the retur values mean, and provides more type checking for handling those values. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06treewide: rename 'exclude' methods to 'pattern'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames several methods defined in dir.h to make more sense with the renamed 'struct exclude_list' to 'struct pattern_list' and 'struct exclude' to 'struct path_pattern': * last_exclude_matching() -> last_matching_pattern() * parse_exclude() -> parse_path_pattern() In addition, the word 'exclude' was replaced with 'pattern' in the methods below: * add_exclude_list() * add_excludes_from_file_to_list() * add_excludes_from_file() * add_excludes_from_blob_to_list() * add_exclude() * clear_exclude_list() A few methods with the word "exclude" remain. These will be handled seperately. In particular, the method "is_excluded()" is concretely about the .gitignore file relative to a specific directory. This is the important boundary between library and consumer: is_excluded() cares about .gitignore, but is_excluded() calls last_matching_pattern() to make that decision. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit replaces 'EXCL_FLAG_' to 'PATTERN_FLAG_' in the names of the flags used on 'struct path_pattern'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06treewide: rename 'struct exclude_list' to 'struct pattern_list'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a 'struct exclude_list' makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude_list' to 'struct pattern_list' and renames several variables called 'el' to 'pl'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06treewide: rename 'struct exclude' to 'struct path_pattern'Derrick Stolee
The first consumer of pattern-matching filenames was the .gitignore feature. In that context, storing a list of patterns as a list of 'struct exclude' items makes sense. However, the sparse-checkout feature then adopted these structures and methods, but with the opposite meaning: these patterns match the files that should be included! It would be clearer to rename this entire library as a "pattern matching" library, and the callers apply exclusion/inclusion logic accordingly based on their needs. This commit renames 'struct exclude' to 'struct path_pattern' and renames several variable names to match. 'struct pattern' was already taken by attr.c, and this more completely describes that the patterns are specific to file paths. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-13Merge branch 'dl/no-extern-in-func-decl'Junio C Hamano
Mechanically and systematically drop "extern" from function declarlation. * dl/no-extern-in-func-decl: *.[ch]: manually align parameter lists *.[ch]: remove extern from function declarations using sed *.[ch]: remove extern from function declarations using spatch
2019-05-08Merge branch 'nd/sha1-name-c-wo-the-repository'Junio C Hamano
Further code clean-up to allow the lowest level of name-to-object mapping layer to work with a passed-in repository other than the default one. * nd/sha1-name-c-wo-the-repository: (34 commits) sha1-name.c: remove the_repo from get_oid_mb() sha1-name.c: remove the_repo from other get_oid_* sha1-name.c: remove the_repo from maybe_die_on_misspelt_object_name submodule-config.c: use repo_get_oid for reading .gitmodules sha1-name.c: add repo_get_oid() sha1-name.c: remove the_repo from get_oid_with_context_1() sha1-name.c: remove the_repo from resolve_relative_path() sha1-name.c: remove the_repo from diagnose_invalid_index_path() sha1-name.c: remove the_repo from handle_one_ref() sha1-name.c: remove the_repo from get_oid_1() sha1-name.c: remove the_repo from get_oid_basic() sha1-name.c: remove the_repo from get_describe_name() sha1-name.c: remove the_repo from get_oid_oneline() sha1-name.c: add repo_interpret_branch_name() sha1-name.c: remove the_repo from interpret_branch_mark() sha1-name.c: remove the_repo from interpret_nth_prior_checkout() sha1-name.c: remove the_repo from get_short_oid() sha1-name.c: add repo_for_each_abbrev() sha1-name.c: store and use repo in struct disambiguate_state sha1-name.c: add repo_find_unique_abbrev_r() ...
2019-05-05*.[ch]: manually align parameter listsDenton Liu
In previous patches, extern was mechanically removed from function declarations without care to formatting, causing parameter lists to be misaligned. Manually format changed sections such that the parameter lists should be realigned. Viewing this patch with 'git diff -w' should produce no output. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using spatchDenton Liu
There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>