Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.kernel.org/pub/scm/git/git.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-18ident: rename commit_rewrite_person() to apply_mailmap_to_header()Siddharth Asthana
commit_rewrite_person() takes a commit buffer and replaces the idents in the header with their canonical versions using the mailmap mechanism. The name "commit_rewrite_person()" is misleading as it doesn't convey what kind of rewrite are we going to do to the buffer. It also doesn't clearly mention that the function will limit itself to the header part of the buffer. The new name, "apply_mailmap_to_header()", expresses the functionality of the function pretty clearly. We intend to use apply_mailmap_to_header() in git-cat-file to replace idents in the headers of commit and tag object buffers. So, we will be extending this function to take tag objects buffer as well and replace idents on the tagger header using the mailmap mechanism. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18ident: move commit_rewrite_person() to ident.cSiddharth Asthana
commit_rewrite_person() and rewrite_ident_line() are static functions defined in revision.c. Their usages are as follows: - commit_rewrite_person() takes a commit buffer and replaces the author and committer idents with their canonical versions using the mailmap mechanism - rewrite_ident_line() takes author/committer header lines from the commit buffer and replaces the idents with their canonical versions using the mailmap mechanism. This patch moves commit_rewrite_person() and rewrite_ident_line() to ident.c which contains many other functions related to idents like split_ident_line(). By moving commit_rewrite_person() to ident.c, we also intend to use it in git-cat-file to replace committer and author idents from the headers to their canonical versions using the mailmap mechanism. The function is moved as is for now to make it clear that there are no other changes, but it will be renamed in a following commit. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-04Merge branch 'ds/sparse-sparse-checkout'Junio C Hamano
"sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test
2022-06-04Merge branch 'ns/batch-fsync'Junio C Hamano
Introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter. * ns/batch-fsync: core.fsyncmethod: performance tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: tests for batch mode test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsync: use batch mode and sync loose objects by default on Windows unpack-objects: use the bulk-checkin infrastructure update-index: use the bulk-checkin infrastructure builtin/add: add ODB transaction around add_files_to_cache cache-tree: use ODB transaction around writing a tree core.fsyncmethod: batched disk flushes for loose-objects bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' bulk-checkin: rename 'state' variable and separate 'plugged' boolean
2022-05-23sparse-index: introduce partially-sparse indexesDerrick Stolee
A future change will present a temporary, in-memory mode where the index can both contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be a an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_EXPANDED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-04Merge branch 'ds/midx-normalize-pathname-before-comparison'Junio C Hamano
The path taken by "git multi-pack-index" command from the end user was compared with path internally prepared by the tool withut first normalizing, which lead to duplicated paths not being noticed, which has been corrected. * ds/midx-normalize-pathname-before-comparison: cache: use const char * for get_object_directory() multi-pack-index: use --object-dir real path midx: use real paths in lookup_multi_pack_index()
2022-04-25cache: use const char * for get_object_directory()Derrick Stolee
The get_object_directory() method returns the exact string stored at the_repository->objects->odb->path. The return type of "char *" implies that the caller must keep track of the buffer and free() it when complete. This causes significant problems later when the ODB is accessed. Use "const char *" as the return type to avoid this confusion. There are no current callers that care about the non-const definition. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06core.fsync: use batch mode and sync loose objects by default on WindowsNeeraj Singh
Git for Windows has defaulted to core.fsyncObjectFiles=true since September 2017. We turn on syncing of loose object files with batch mode in upstream Git so that we can get broad coverage of the new code upstream. We don't actually do fsyncs in the most of the test suite, since GIT_TEST_FSYNC is set to 0. However, we do exercise all of the surrounding batch mode code since GIT_TEST_FSYNC merely makes the maybe_fsync wrapper always appear to succeed. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06core.fsyncmethod: batched disk flushes for loose-objectsNeeraj Singh
When adding many objects to a repo with `core.fsync=loose-object`, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. This commit introduces a new `core.fsyncMethod=batch` option that batches up hardware flushes. It hooks into the bulk-checkin odb-transaction functionality, takes advantage of tmp-objdir, and uses the writeout-only support code. When the new mode is enabled, we do the following for each new object: 1a. Create the object in a tmp-objdir. 2a. Issue a pagecache writeback request and wait for it to complete. At the end of the entire transaction when unplugging bulk checkin: 1b. Issue an fsync against a dummy file to flush the log and hardware writeback cache, which should by now have seen the tmp-objdir writes. 2b. Rename all of the tmp-objdir files to their final names. 3b. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the default today, but the user now has the option of syncing the index and there is a separate patch series to implement syncing of refs. On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns. This sequence also ensures that no object files appear in the main object store unless they are fsync-durable. Batch mode is only enabled if core.fsync includes loose-objects. If the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does not include loose-objects, we will use file-by-file fsyncing. In step (1a) of the sequence, the tmp-objdir is created lazily to avoid work if no loose objects are ever added to the ODB. We use a tmp-objdir to maintain the invariant that no loose-objects are visible in the main ODB unless they are properly fsync-durable. This is important since future ODB operations that try to create an object with specific contents will silently drop the new data if an object with the target hash exists without checking that the loose-object contents match the hash. Only a full git-fsck would restore the ODB to a functional state where dataloss doesn't occur. In step (1b) of the sequence, we issue a fsync against a dummy file created specifically for the purpose. This method has a little higher cost than using one of the input object files, but makes adding new callers of this mechanism easier, since we don't need to figure out which object file is "last" or risk sharing violations by caching the fd of the last object file. _Performance numbers_: Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. Windows - Same host as Linux, a preview version of Windows 11. Adding 500 files to the repo with 'git add' Times reported in seconds. object file syncing | Linux | Mac | Windows --------------------|-------|-------|-------- disabled | 0.06 | 0.35 | 0.61 fsync | 1.88 | 11.18 | 2.47 batch | 0.15 | 0.41 | 1.53 Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-06Merge branch 'ns/core-fsyncmethod' into ns/batch-fsyncJunio C Hamano
* ns/core-fsyncmethod: configure.ac: fix HAVE_SYNC_FILE_RANGE definition core.fsyncmethod: correctly camel-case warning message core.fsync: fix incorrect expression for default configuration core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped
2022-04-04Merge branch 'jh/builtin-fsmonitor-part2'Junio C Hamano
Built-in fsmonitor (part 2). * jh/builtin-fsmonitor-part2: (30 commits) t7527: test status with untracked-cache and fsmonitor--daemon fsmonitor: force update index after large responses fsmonitor--daemon: use a cookie file to sync with file system fsmonitor--daemon: periodically truncate list of modified files t/perf/p7519: add fsmonitor--daemon test cases t/perf/p7519: speed up test on Windows t/perf/p7519: fix coding style t/helper/test-chmtime: skip directories on Windows t/perf: avoid copying builtin fsmonitor files into test repo t7527: create test for fsmonitor--daemon t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon help: include fsmonitor--daemon feature flag in version info fsmonitor--daemon: implement handle_client callback compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows fsmonitor--daemon: create token-based changed path cache fsmonitor--daemon: define token-ids fsmonitor--daemon: add pathname classification fsmonitor--daemon: implement 'start' command ...
2022-04-04Merge branch 'ns/core-fsyncmethod'Junio C Hamano
A couple of fix-up to a topic that is now in 'master'. * ns/core-fsyncmethod: core.fsyncmethod: correctly camel-case warning message core.fsync: fix incorrect expression for default configuration
2022-03-30core.fsync: fix incorrect expression for default configurationNeeraj Singh
Commit b9f5d035 (core.fsync: documentation and user-friendly aggregate options, 2022-03-15) introduced an incorrect value for FSYNC_COMPONENTS_DEFAULT. We need an AND-NOT rather than OR-NOT. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-26Merge branch 'ps/fsync-refs'Junio C Hamano
Updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so. * ps/fsync-refs: core.fsync: new option to harden references
2022-03-26Merge branch 'ns/core-fsyncmethod'Junio C Hamano
Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-26fsmonitor: config settings are repository-specificJeff Hostetler
Move fsmonitor config settings to a new and opaque `struct fsmonitor_settings` structure. Add a lazily-loaded pointer to this into `struct repo_settings` Create an `enum fsmonitor_mode` type in `struct fsmonitor_settings` to represent the state of fsmonitor. This lets us represent which, if any, fsmonitor provider (hook or IPC) is enabled. Create `fsm_settings__get_*()` getters to lazily look up fsmonitor- related config settings. Get rid of the `core_fsmonitor` global variable. Move the code to lookup the existing `core.fsmonitor` config value into the fsmonitor settings. Create a hook pathname variable in `struct fsmonitor-settings` and only set it when in hook mode. Extend the definition of `core.fsmonitor` to be either a boolean or a hook pathname. When true, the builtin FSMonitor is used. When false or unset, no FSMonitor (neither builtin nor hook) is used. The existing `core_fsmonitor` global variable was used to store the pathname to the fsmonitor hook *and* it was used as a boolean to see if fsmonitor was enabled. This dual usage and global visibility leads to confusion when we add the IPC-based provider. So lets hide the details in fsmonitor-settings.c and let it decide which provider to use in the case of multiple settings. This avoids cluttering up repo-settings.c with these private details. A future commit in builtin-fsmonitor series will add the ability to disqualify worktrees for various reasons, such as being mounted from a remote volume, where fsmonitor should not be started. Having the config settings hidden in fsmonitor-settings.c allows such worktree restrictions to override the config values used. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17Merge branch 'ab/object-file-api-updates'Junio C Hamano
Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables
2022-03-15core.fsync: new option to harden referencesPatrick Steinhardt
When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the reference is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15Merge branch 'ns/core-fsyncmethod' into ps/fsync-refsJunio C Hamano
* ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped
2022-03-15core.fsync: documentation and user-friendly aggregate optionsNeeraj Singh
This commit adds aggregate options for the core.fsync setting that are more user-friendly. These options are specified in terms of 'levels of safety', indicating which Git operations are considered to be sync points for durability. The new documentation is also included here in its entirety for ease of review. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-11core.fsync: new option to harden the indexNeeraj Singh
This commit introduces the new ability for the user to harden the index. In the event of a system crash, the index must be durable for the user to actually find a file that has been added to the repo and then deleted from the working tree. We use the presence of the COMMIT_LOCK flag and absence of the alternate_index_output as a proxy for determining whether we're updating the persistent index of the repo or some temporary index. We don't sync these temporary indexes. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-11core.fsync: introduce granular fsync control infrastructureNeeraj Singh
This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-11core.fsyncmethod: add writeout-only modeNeeraj Singh
This commit introduces the `core.fsyncMethod` configuration knob, which can currently be set to `fsync` or `writeout-only`. The new writeout-only mode attempts to tell the operating system to flush its in-memory page cache to the storage hardware without issuing a CACHE_FLUSH command to the storage controller. Writeout-only fsync is significantly faster than a vanilla fsync on common hardware, since data is written to a disk-side cache rather than all the way to a durable medium. Later changes in this patch series will take advantage of this primitive to implement batching of hardware flushes. When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the caller is expected to do an ordinary fsync as needed. On Apple platforms, the fsync system call does not issue a CACHE_FLUSH directive to the storage controller. This change updates fsync to do fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity with existing behavior on Apple platforms by setting the default value of the new core.fsyncMethod option. Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-10Merge branch 'en/present-despite-skipped'Junio C Hamano
In sparse-checkouts, files mis-marked as missing from the working tree could lead to later problems. Such files were hard to discover, and harder to correct. Automatically detecting and correcting the marking of such files has been added to avoid these problems. * en/present-despite-skipped: repo_read_index: add config to expect files outside sparse patterns Accelerate clear_skip_worktree_from_present_files() by caching Update documentation related to sparsity and the skip-worktree bit repo_read_index: clear SKIP_WORKTREE bit from files present in worktree unpack-trees: fix accidental loss of user changes t1011: add testcase demonstrating accidental loss of user modifications
2022-03-02repo_read_index: add config to expect files outside sparse patternsElijah Newren
Typically with sparse checkouts, we expect files outside the sparsity patterns to be marked as SKIP_WORKTREE and be missing from the working tree. Sometimes this expectation would be violated however; including in cases such as: * users grabbing files from elsewhere and writing them to the worktree (perhaps by editing a cached copy in an editor, copying/renaming, or even untarring) * various git commands having incomplete or no support for the SKIP_WORKTREE bit[1,2] * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. When the SKIP_WORKTREE bit in the index did not reflect the presence of the file in the working tree, it traditionally caused confusion and was difficult to detect and recover from. So, in a sparse checkout, since af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE bit at index read time for entries corresponding to files that are present in the working tree. There is another workflow, however, where it is expected that paths outside the sparsity patterns appear to exist in the working tree and that they do not lose the SKIP_WORKTREE bit, at least until they get modified. A Git-aware virtual file system[4] takes advantage of its position as a file system driver to expose all files in the working tree, fetch them on demand using partial clone on access, and tell Git to pay attention to them on demand by updating the sparse checkout pattern on writes. This means that commands like "git status" only have to examine files that have potentially been modified, whereas commands like "ls" are able to show the entire codebase without requiring manual updates to the sparse checkout pattern. Thus since af6a51875a, Git with such Git-aware virtual file systems unsets the SKIP_WORKTREE bit for all files and commands like "git status" have to fetch and examine them all. Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to allow limiting the tracked set of files to a small set once again. A Git-aware virtual file system or other application that wants to maintain files outside of the sparse checkout can set this in a repository to instruct Git not to check for the presence of SKIP_WORKTREE files. The setting defaults to false, so most users of sparse checkout will still get the benefit of an automatically updating index to recover from the variety of difficult issues detailed in af6a51875a for paths with SKIP_WORKTREE set despite the path being present. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] The three long paragraphs in the middle of https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ [4] such as the vfsd described in https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26object-file API: pass an enum to read_object_with_reference()Ævar Arnfjörð Bjarmason
Change the read_object_with_reference() function to take an "enum object_type". It was not prepared to handle an arbitrary "const char *type", as it was itself calling type_from_string(). Let's change the only caller that passes in user data to use type_from_string(), and convert the rest to use e.g. "OBJ_TREE" instead of "tree_type". The "cat-file" caller is not on the codepath that handles"--allow-unknown", so the type_from_string() there is safe. Its use of type_from_string() doesn't functionally differ from that of the pre-image. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26object-file API: have hash_object_file() take "enum object_type"Ævar Arnfjörð Bjarmason
Change the hash_object_file() function to take an "enum object_type". Since a preceding commit all of its callers are passing either "{commit,tree,blob,tag}_type", or the result of a call to type_name(), the parse_object() caller that would pass NULL is now using stream_object_signature(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26object-file API: split up and simplify check_object_signature()Ævar Arnfjörð Bjarmason
Split up the check_object_signature() function into that non-streaming version (it accepts an already filled "buf"), and a new stream_object_signature() which will retrieve the object from storage, and hash it on-the-fly. All of the callers of check_object_signature() were effectively calling two different functions, if we go by cyclomatic complexity. I.e. they'd either take the early "if (map)" branch and return early, or not. This has been the case since the "if (map)" condition was added in 090ea12671b (parse_object: avoid putting whole blob in core, 2012-03-07). We can then further simplify the resulting check_object_signature() function since only one caller wanted to pass a non-NULL "buf" and a non-NULL "real_oidp". That "read_loose_object()" codepath used by "git fsck" can instead use hash_object_file() followed by oideq(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26object API users + docs: check <0, not !0 with check_object_signature()Ævar Arnfjörð Bjarmason
Change those users of the object API that misused check_object_signature() by assuming it returned any non-zero when the OID didn't match the expected value to check <0 instead. In practice all of this code worked before, but it wasn't consistent with rest of the users of the API. Let's also clarify what the <0 return value means in API docs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26object API docs: move check_object_signature() docs to cache.hÆvar Arnfjörð Bjarmason
Move the API documentation for check_object_signature() to cache.h, where its prototype is declared. This is in preparation for adding a companion function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26Merge branch 'ab/date-mode-release'Junio C Hamano
Plug (some) memory leaks around parse_date_format(). * ab/date-mode-release: date API: add and use a date_mode_release() date API: add basic API docs date API: provide and use a DATE_MODE_INIT date API: create a date.h, split from cache.h cache.h: remove always unused show_date_human() declaration
2022-02-17Merge branch 'ab/auto-detect-zlib-compress2'Junio C Hamano
The build procedure has been taught to notice older version of zlib and enable our replacement uncompress2() automatically. * ab/auto-detect-zlib-compress2: compat: auto-detect if zlib has uncompress2()
2022-02-16date API: create a date.h, split from cache.hÆvar Arnfjörð Bjarmason
Move the declaration of the date.c functions from cache.h, and adjust the relevant users to include the new date.h header. The show_ident_date() function belonged in pretty.h (it's defined in pretty.c), its two users outside of pretty.c didn't strictly need to include pretty.h, as they get it indirectly, but let's add it to them anyway. Similarly, the change to "builtin/{fast-import,show-branch,tag}.c" isn't needed as far as the compiler is concerned, but since they all use the "DATE_MODE()" macro we now define in date.h, let's have them include it. We could simply include this new header in "cache.h", but as this change shows these functions weren't common enough to warrant including in it in the first place. By moving them out of cache.h changes to this API will no longer cause a (mostly) full re-build of the project when "make" is run. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16cache.h: remove always unused show_date_human() declarationÆvar Arnfjörð Bjarmason
There has never been a show_date_human() function on the "master" branch in git.git. This declaration was added in b841d4ff438 (Add `human` format to test-tool, 2019-01-28). A look at the ML history reveals that it was leftover cruft from an earlier version of that commit[1]. 1. https://lore.kernel.org/git/20190118061805.19086-5-ischis2@cox.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-05Merge branch 'ms/update-index-racy'Junio C Hamano
"git update-index --refresh" has been taught to deal better with racy timestamps (just like "git status" already does). * ms/update-index-racy: update-index: refresh should rewrite index in case of racy timestamps t7508: add tests capturing racy timestamp handling t7508: fix bogus mtime verification test-lib: introduce API for verifying file mtime
2022-02-05Merge branch 'ab/cat-file'Junio C Hamano
Assorted updates to "git cat-file", especially "-h". * ab/cat-file: cat-file: s/_/-/ in typo'd usage_msg_optf() message cat-file: don't whitespace-pad "(...)" in SYNOPSIS and usage output cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters) object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY cat-file: correct and improve usage information cat-file: fix remaining usage bugs cat-file: make --batch-all-objects a CMDMODE cat-file: move "usage" variable to cmd_cat_file() cat-file docs: fix SYNOPSIS and "-h" output parse-options API: add a usage_msg_optf() cat-file tests: test messaging on bad objects/paths cat-file tests: test bad usage
2022-01-26compat: auto-detect if zlib has uncompress2()Ævar Arnfjörð Bjarmason
We have a copy of uncompress2() implementation in compat/ so that we can build with an older version of zlib that lack the function, and the build procedure selects if it is used via the NO_UNCOMPRESS2 $(MAKE) variable. This is yet another "annoying" knob the porters need to tweak on platforms that are not common enough to have the default set in the config.mak.uname file. Attempt to instead ask the system header <zlib.h> to decide if we need the compatibility implementation. This is a deviation from the way we have been handling the "compatiblity" features so far, and if it can be done cleanly enough, it could work as a model for features that need compatibility definition we discover in the future. With that goal in mind, avoid expedient but ugly hacks, like shoving the code that is conditionally compiled into an unrelated .c file, which may not work in future cases---instead, take an approach that uses a file that is independently compiled and stands on its own. Compile and link compat/zlib-uncompress2.c file unconditionally, but conditionally hide the implementation behind #if/#endif when zlib version is 1.2.9 or newer, and unconditionally archive the resulting object file in the libgit.a to be picked up by the linker. There are a few things to note in the shape of the code base after this change: - We no longer use NO_UNCOMPRESS2 knob; if the system header <zlib.h> claims a version that is more cent than the library actually is, this would break, but it is easy to add it back when we find such a system. - The object file compat/zlib-uncompress2.o is always compiled and archived in libgit.a, just like a few other compat/ object files already are. - The inclusion of <zlib.h> is done in <git-compat-util.h>; we used to do so from <cache.h> which includes <git-compat-util.h> as the first thing it does, so from the *.c codes, there is no practical change. - Until objects in libgit.a that is already used gains a reference to the function, the reftable code will be the only one that wants it, so libgit.a on the linker command line needs to appear once more at the end to satisify the mutual dependency. - Beat found a trick used by OpenSSL to avoid making the conditionally-compiled object truly empty (apparently because they had to deal with compilers that do not want to see an effectively empty input file). Our compat/zlib-uncompress2.c file borrows the same trick for portabilty. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Beat Bolli <dev+git@drbeat.li> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-13Merge branch 'ma/header-dup-cleanup'Junio C Hamano
Code clean-up. * ma/header-dup-cleanup: cache.h: drop duplicate `ensure_full_index()` declaration
2022-01-10cache.h: drop duplicate `ensure_full_index()` declarationMartin Ågren
There are two identical declarations of `ensure_full_index()` in cache.h. Commit 3964fc2aae ("sparse-index: add guard to ensure full index", 2021-03-30) provided an empty implementation of `ensure_full_index()`, declaring it in a new file sparse-index.h. When commit 4300f8442a ("sparse-index: implement ensure_full_index()", 2021-03-30) fleshed out the implementation, it added an identical declaration to cache.h. Then 118a2e8bde ("cache: move ensure_full_index() to cache.h", 2021-04-01) favored having the declaration in cache.h. Because of the double declaration, at that point we could have just dropped the one in sparse-index.h, but instead it got moved to cache.h. As a result, cache.h contains the exact same function declaration twice. Drop the one under "/* Name hashing */", in favor of the one under "/* Initialize and use the cache information */". Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-07update-index: refresh should rewrite index in case of racy timestampsMarc Strapetz
'git update-index --refresh' and '--really-refresh' should force writing of the index file if racy timestamps have been encountered, as 'git status' already does [1]. Note that calling 'git update-index --refresh' still does not guarantee that there will be no more racy timestamps afterwards (the same holds true for 'git status'): - calling 'git update-index --refresh' immediately after touching and adding a file may still leave racy timestamps if all three operations occur within the racy-tolerance (usually 1 second unless USE_NSEC has been defined) - calling 'git update-index --refresh' for timestamps which are set into the future will leave them racy To guarantee that such racy timestamps will be resolved would require to wait until the system clock has passed beyond these timestamps and only then write the index file. Especially for future timestamps, this does not seem feasible because of possibly long delays/hangs. [1] https://lore.kernel.org/git/d3dd805c-7c1d-30a9-6574-a7bfcb7fc013@syntevo.com/ Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-06Merge branch 'en/keep-cwd'Junio C Hamano
Many git commands that deal with working tree files try to remove a directory that becomes empty (i.e. "git switch" from a branch that has the directory to another branch that does not would attempt remove all files in the directory and the directory itself). This drops users into an unfamiliar situation if the command was run in a subdirectory that becomes subject to removal due to the command. The commands have been taught to keep an empty directory if it is the directory they were started in to avoid surprising users. * en/keep-cwd: t2501: simplify the tests since we can now assume desired behavior dir: new flag to remove_dir_recurse() to spare the original_cwd dir: avoid incidentally removing the original_cwd in remove_path() stash: do not attempt to remove startup_info->original_cwd rebase: do not attempt to remove startup_info->original_cwd clean: do not attempt to remove startup_info->original_cwd symlinks: do not include startup_info->original_cwd in dir removal unpack-trees: add special cwd handling unpack-trees: refuse to remove startup_info->original_cwd setup: introduce startup_info->original_cwd t2501: add various tests for removing the current working directory
2021-12-31cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters)Ævar Arnfjörð Bjarmason
Change the cat_one_file() logic that calls get_oid_with_context() under --textconv and --filters to use the GET_OID_ONLY_TO_DIE flag, thus improving the error messaging emitted when e.g. <path> is missing but <rev> is not. To service the "cat-file" use-case we need to introduce a new "GET_OID_REQUIRE_PATH" flag, otherwise it would exit early as soon as a valid "HEAD" was resolved, but in the "cat-file" case being changed we always need a valid revision and path. This arguably makes the "<bad rev>:<bad path>" and "<bad rev>:<good (in HEAD) path>" use cases worse, as we won't quote the <path> component at the user anymore, but let's just use the existing logic "git log" et al use for now. We can improve the messaging for those cases as a follow-up for all callers. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-15Merge branch 'ew/test-wo-fsync'Junio C Hamano
Allow running our tests while disabling fsync. * ew/test-wo-fsync: tests: disable fsync everywhere
2021-12-11Merge branch 'vd/sparse-reset'Junio C Hamano
Various operating modes of "git reset" have been made to work better with the sparse index. * vd/sparse-reset: unpack-trees: improve performance of next_cache_entry reset: make --mixed sparse-aware reset: make sparse-aware (except --mixed) reset: integrate with sparse index reset: expand test coverage for sparse checkouts sparse-index: update command for expand/collapse test reset: preserve skip-worktree bit in mixed reset reset: rename is_missing to !is_in_reset_tree
2021-12-10setup: introduce startup_info->original_cwdElijah Newren
Removing the current working directory causes all subsequent git commands run from that directory to get confused and fail with a message about being unable to read the current working directory: $ git status fatal: Unable to read current working directory: No such file or directory Non-git commands likely have similar warnings or even errors, e.g. $ bash -c 'echo hello' shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory hello This confuses end users, particularly since the command they get the error from is not the one that caused the problem; the problem came from the side-effect of some previous command. We would like to avoid removing the current working directory of our parent process; towards this end, introduce a new variable, startup_info->original_cwd, that tracks the current working directory that we inherited from our parent process. For convenience of later comparisons, we prefer that this new variable store a path relative to the toplevel working directory (thus much like 'prefix'), except without the trailing slash. Subsequent commits will make use of this new variable. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-29reset: make sparse-aware (except --mixed)Victoria Dye
Remove `ensure_full_index` guard on `prime_cache_tree` and update `prime_cache_tree_rec` to correctly reconstruct sparse directory entries in the cache tree. While processing a tree's entries, `prime_cache_tree_rec` must determine whether a directory entry is sparse or not by searching for it in the index (*without* expanding the index). If a matching sparse directory index entry is found, no subtrees are added to the cache tree entry and the entry count is set to 1 (representing the sparse directory itself). Otherwise, the tree is assumed to not be sparse and its subtrees are recursively added to the cache tree. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-04strbuf_addftime(): handle "%s" manuallyJeff King
The strftime() function has a non-standard "%s" extension, which prints the number of seconds since the epoch. But the "struct tm" we get has already been adjusted for a particular time zone; going back to an epoch time requires knowing that zone offset. Since strftime() doesn't take such an argument, round-tripping to a "struct tm" and back to the "%s" format may produce the wrong value (off by tz_offset seconds). Since we're already passing in the zone offset courtesy of c3fbf81a85 (strbuf: let strbuf_addftime handle %z and %Z itself, 2017-06-15), we can use that same value to adjust our epoch seconds accordingly. Note that the description above makes it sound like strftime()'s "%s" is useless (and really, the issue is shared by mktime(), which is what strftime() would use under the hood). But it gets the two cases for which it's designed correct: - the result of gmtime() will have a zero offset, so no adjustment is necessary - the result of localtime() will be offset by the local zone offset, and mktime() and strftime() are defined to assume this offset when converting back (there's actually some magic here; some implementations record this in the "struct tm", but we can't portably access or manipulate it. But they somehow "know" whether a "struct tm" is from gmtime() or localtime()). This latter point means that "format-local:%s" actually works correctly already, because in that case we rely on the system routines due to 6eced3ec5e (date: use localtime() for "-local" time formats, 2017-06-15). Our problem comes when trying to show times in the author's zone, as the system routines provide no mechanism for converting in non-local zones. So in those cases we have a "struct tm" that came from gmtime(), but has been manipulated according to our offset. The tests cover the broken round-trip by formatting "%s" for a time in a non-system timezone. We use the made-up "+1234" here, which has two advantages. One, we know it won't ever be the real system zone (and so we're actually testing a case that would break). And two, since it has a minute component, we're testing the full decoding of the +HHMM zone into a number of seconds. Likewise, we test the "-1234" variant to make sure there aren't any sign mistakes. There's one final test, which covers "format-local:%s". As noted, this already passes, but it's important to check that we didn't regress this case. In particular, the caller in show_date() is relying on localtime() to have done the zone adjustment, independent of any tz_offset we compute ourselves. These should match up, since our local_tzoffset() is likewise built around localtime(). But it would be easy for a caller to forget to pass in a correct tz_offset to strbuf_addftime(). Fortunately show_date() does this correctly (it has to because of the existing handling of %z), and the test continues to pass. So this one is just future-proofing against a change in our assumptions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-29tests: disable fsync everywhereEric Wong
The "GIT_TEST_FSYNC" environment variable now exists for disabling fsync() even on packfiles and other "critical" data. Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system on an HDD, test runtime drops from ~4 minutes down to ~3 minutes. Using "GIT_TEST_FSYNC=1" re-enables fsync() for comparison purposes. SVN interopability tests are minimally affected since SVN will still use fsync in various places. This will also be useful for 3rd-party tools which create throwaway git repositories of temporary data, but remains undocumented for end users. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-26Merge branch 'ab/fix-commit-error-message-upon-unwritable-object-store'Junio C Hamano
"git commit" gave duplicated error message when the object store was unwritable, which has been corrected. * ab/fix-commit-error-message-upon-unwritable-object-store: commit: fix duplication regression in permission error output unwritable tests: assert exact error output
2021-10-26Merge branch 'ab/fsck-unexpected-type'Junio C Hamano
"git fsck" has been taught to report mismatch between expected and actual types of an object better. * ab/fsck-unexpected-type: fsck: report invalid object type-path combinations fsck: don't hard die on invalid object types object-file.c: stop dying in parse_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: use "enum" return type for unpack_loose_header() object-file.c: simplify unpack_loose_short_header() object-file.c: make parse_loose_header_extended() public object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: don't set "typep" when returning non-zero cat-file tests: test for current --allow-unknown-type behavior cat-file tests: add corrupt loose object test cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: move bogus_* variable declarations earlier fsck tests: test for garbage appended to a loose object fsck tests: test current hash/type mismatch behavior fsck tests: refactor one test to use a sub-repo fsck tests: add test for fsck-ing an unknown type