Age | Commit message (Collapse) | Author |
|
When reading ref records of type "val1", we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.
Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.
Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We're about to convert reftable records to stop storing their object IDs
as allocated hashes. Prepare for this refactoring by constifying some
parts of the interface that will be impacted by this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.
As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:
1. We write all blocks for the corresponding record. Each block that
gets written is added to a list of blocks to index.
2. Once all blocks were written we finish the section. If at least two
blocks have been added to the list of blocks to index then we will
now write the index for those blocks and flush it, as well.
When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.
Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.
Fix this corruption by clearing the index after having written the last
block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 5c086453ff (reftable/stack: perform auto-compaction with
transactional interface, 2023-12-11), we fixed a bug where the
transactional interface to add changes to a reftable stack did not
perform auto-compaction by calling `reftable_stack_auto_compact()` in
`reftable_stack_addition_commit()`. While correct, this change may now
cause us to perform auto-compaction twice in the non-transactional
interface `reftable_stack_add()`:
- It performs auto-compaction by itself.
- It now transitively performs auto-compaction via the transactional
interface.
Remove the first instance so that we only end up doing auto-compaction
once.
Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.
Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To compute the expected on-disk size of packed objects, we sort the
output of show-index by pack offset and then compute the difference
between adjacent entries using awk. This works but has a few readability
problems:
1. Reading the index in pack order means don't find out the size of an
oid's entry until we see the _next_ entry. So we have to save it to
print later.
We can instead iterate in reverse order, so we compute each oid's
size as we see it.
2. Since the awk invocation is inside a text_expect block, we can't
easily use single-quotes to hold the script. So we use
double-quotes, but then have to escape the dollar signs in the awk
script.
We can swap this out for a shell loop instead (which is made much
easier by the first change).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Assorted changes around pseudoref handling.
* ps/pseudo-refs:
bisect: consistently write BISECT_EXPECTED_REV via the refdb
refs: complete list of special refs
refs: propagate errno when reading special refs fails
wt-status: read HEAD and ORIG_HEAD via the refdb
|
|
Doc updates to clarify what an "unborn branch" means.
* jc/orphan-unborn:
orphan/unborn: fix use of 'orphan' in end-user facing messages
orphan/unborn: add to the glossary and use them consistently
|
|
"git status" is taught to show both the branch being bisected and
being rebased when both are in effect at the same time.
* rj/status-bisect-while-rebase:
status: fix branch shown when not only bisecting
|
|
Code clean-up.
* la/trailer-cleanups:
trailer: use offsets for trailer_start/trailer_end
trailer: find the end of the log message
commit: ignore_non_trailer computes number of bytes to ignore
|
|
Code clean-up.
* jc/retire-cas-opt-name-constant:
remote.h: retire CAS_OPT_NAME
|
|
Code clean-up.
* rs/rebase-use-strvec-pushf:
rebase: use strvec_pushf() for format-patch revisions
|
|
Command line completion script (in contrib/) learned to work better
with the reftable backend.
* sh/completion-with-reftable:
completion: support pseudoref existence checks for reftables
completion: refactor existence checks for pseudorefs
|
|
In t9500 we're writing a custom configuration that sets up gitweb. This
requires us to manually ensure that the repository format is configured
as required, including both the repository format version and
extensions. With the introduction of the "extensions.refStorage"
extension we need to update the test to also write this new one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new `--ref-format` value flag for git-clone(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new `--ref-format` value flag for git-init(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new `--show-ref-format` to git-rev-parse(1) that causes it
to print the ref format used by a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new GIT_TEST_DEFAULT_REF_FORMAT environment variable that
lets developers run the test suite with a different default ref format
without impacting the ref format used by non-test Git invocations. This
is modeled after GIT_TEST_DEFAULT_OBJECT_FORMAT, which does the same
thing for the repository's object format.
Adapt the setup of the `REFFILES` test prerequisite to be conditionally
set based on the default ref format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets
users control the default ref format used by both git-init(1) and
git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which
does the same thing for the repository's object format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new "extensions.refStorage" extension that allows us to
specify the ref storage format used by a repository. For now, the only
supported format is the "files" format, but this list will likely soon
be extended to also support the upcoming "reftable" format.
There have been discussions on the Git mailing list in the past around
how exactly this extension should look like. One alternative [1] that
was discussed was whether it would make sense to model the extension in
such a way that backends are arbitrarily stackable. This would allow for
a combined value of e.g. "loose,packed-refs" or "loose,reftable", which
indicates that new refs would be written via "loose" files backend and
compressed into "packed-refs" or "reftable" backends, respectively.
It is arguable though whether this flexibility and the complexity that
it brings with it is really required for now. It is not foreseeable that
there will be a proliferation of backends in the near-term future, and
the current set of existing formats and formats which are on the horizon
can easily be configured with the much simpler proposal where we have a
single value, only.
Furthermore, if we ever see that we indeed want to gain the ability to
arbitrarily stack the ref formats, then we can adapt the current
extension rather easily. Given that Git clients will refuse any unknown
value for the "extensions.refStorage" extension they would also know to
ignore a stacked "loose,packed-refs" in the future.
So let's stick with the easy proposal for the time being and wire up the
extension.
[1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The proper hash algorithm and ref storage format that will be used for a
newly initialized repository will be figured out in `init_db()` via
`validate_hash_algorithm()` and `validate_ref_storage_format()`. Until
now though, we never set up the hash algorithm or ref storage format of
`the_repository` accordingly.
There are only two callsites of `init_db()`, one in git-init(1) and one
in git-clone(1). The former function doesn't care for the formats to be
set up properly because it never access the repository after calling the
function in the first place.
For git-clone(1) it's a different story though, as we call `init_db()`
before listing remote refs. While we do indeed have the wrong hash
function in `the_repository` when `init_db()` sets up a non-default
object format for the repository, it never mattered because we adjust
the hash after learning about the remote's hash function via the listed
refs.
So the current state is correct for the hash algo, but it's not for the
ref storage format because git-clone(1) wouldn't know to set it up
properly. But instead of adjusting only the `ref_storage_format`, set
both the hash algo and the ref storage format so that `the_repository`
is in the correct state when `init_db()` exits. This is fine as we will
adjust the hash later on anyway and makes it easier to reason about the
end state of `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.
- The first path is when we create a repository via `init_db()`. When
we are re-initializing a preexisting repository we need to retain
the previously used ref storage format -- if the user asked for a
different format then this indicates an error and we error out.
Otherwise we either initialize the repository with the format asked
for by the user or the default format, which currently is the
"files" backend.
- The second path is when discovering repositories, where we need to
read the config of that repository. There is not yet any way to
configure something other than the "files" backend, so we can just
blindly set the ref storage format to this backend.
Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
countermand this ommission now though, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.
Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.
This repair logic will load repositories via `get_worktrees()`, which
will enumerate up and initialize all worktrees. Part of initialization
is logic that we resolve their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.
Although not a problem presently with the file-based reference backend,
it will become a problem with the upcoming reftable backend. In the
context of git-init(1) we do not have a fully-initialized repository set
up via `setup_git_directory()` or friends. Consequently, we do not know
about the repository format when `repair_worktrees()` is called, and
properly setting up all parts of the repositroy in `init_db()` before we
try to repair worktrees is not an easy task. With the introduction of
the reftable backend, we would ultimately try to look up the worktree
HEADs before we have figured out the reference format, which does not
work.
We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A limited number of tests require repositories to have the default
repository format or otherwise they would fail to run, e.g. because they
fail to detect the correct hash function. While the hash function is the
only extension right now that creates problems like this, we are about
to add a second extension for the ref format.
Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
whenever we add new format extensions. Next to making any such changes
easier on us, the prerequisite's name should also help to clarify the
intent better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Gives all paths builtin objectmode values based on the paths' modes
(one of 100644, 100755, 120000, 040000, 160000). Users may use
this feature to filter by file types. For example a pathspec such as
':(attr:builtin_objectmode=160000)' could filter for submodules without
needing to have `builtin_objectmode=160000` to be set in .gitattributes
for every submodule path.
These values are also reflected in `git check-attr` results.
If the git_attr_direction is set to GIT_ATTR_INDEX or GIT_ATTR_CHECKIN
and a path is not found in the index, the value will be unspecified.
This patch also reserves the builtin_* attribute namespace for objectmode
and any future builtin attributes. Any user defined attributes using this
reserved namespace will result in a warning. This is a breaking change for
any existing builtin_* attributes.
Pathspecs with some builtin_* attribute name (excluding builtin_objectmode)
will behave like any attribute where there are no user specified values.
Signed-off-by: Joanna Wang <jojwang@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Use DIV_ROUND_UP in mem_pool_alloc() to round the allocation length to
the next multiple of GIT_MAX_ALIGNMENT instead of twiddling bits
explicitly. This is shorter and clearer, to the point that we no longer
need the comment that explains what's being calculated.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Memory pool allocations that require a new block and would fill at
least half of it are handled specially. Before 158dfeff3d (mem-pool:
add life cycle management functions, 2018-07-02) they used to be
allocated outside of the pool. This patch made mem_pool_alloc() create
a bespoke block instead, to allow releasing it when the pool gets
discarded.
Unfortunately mem_pool_alloc() returns a pointer to the start of such a
bespoke block, i.e. to the struct mp_block at its top. When the caller
writes to it, the management information gets corrupted. This affects
mem_pool_discard() and -- if there are no other blocks in the pool --
also mem_pool_alloc().
Return the payload pointer of bespoke blocks, just like for smaller
allocations, to protect the management struct.
Also update next_free to mark the block as full. This is only strictly
necessary for the first allocated block, because subsequent ones are
inserted after the current block and never considered for further
allocations, but it's easier to just do it in all cases.
Add a basic unit test to demonstrate the issue by using
mem_pool_calloc() with a tiny block size, which forces the creation of a
bespoke block.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git documentation does this with the exception of ancient release notes.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
GitHub wraps artifacts generated by workflows in a .zip file.
Internally, workflows can package anything they like in them.
A recently generated failure artifact had the form:
windows-artifacts.zip
Length Date Time Name
--------- ---------- ----- ----
76001695 12-19-2023 01:35 artifacts.tar.gz
11005650 12-19-2023 01:35 tracked.tar.gz
--------- -------
87007345 2 files
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
GitHub has two general forms for its states, sometimes they're a simple
colored object (e.g. green check or red x), and sometimes there's also a
colored container (e.g. green box or red circle) which contains that
object (e.g. check or x).
That's a lot of words to try to describe things, but in general, the key
for a failure is that it's recognized as an `x` and that it's associated
with the color red -- the color of course is problematic for people who
are red-green color-blind, but that's why they are paired with distinct
shapes.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Current statistics show a strong preference to only capitalize the first
letter in a hyphenated tag, but that some guidance would be helpful:
git log |
perl -ne 'next unless /^\s+(?:Signed-[oO]ff|Acked)-[bB]y:/;
s/^\s+//;s/:.*/:/;print'|
sort|uniq -c|sort -n
2 Signed-off-By:
4 Signed-Off-by:
22 Acked-By:
47 Signed-Off-By:
2202 Acked-by:
95315 Signed-off-by:
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add items with at least 100 uses in the past three years:
- Co-authored-by
- Helped-by
- Mentored-by
- Suggested-by
git log --since=3.years|
perl -ne 'next unless /^\s+[A-Z][a-z]+-\S+:/;s/^\s+//;s/:.*/:/;print'|
sort|uniq -c|sort -n|grep '[0-9][0-9] '
14 Based-on-patch-by:
14 Original-patch-by:
17 Tested-by:
100 Suggested-by:
121 Co-authored-by:
163 Mentored-by:
274 Reported-by:
290 Acked-by:
450 Helped-by:
602 Reviewed-by:
14111 Signed-off-by:
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There seems to be consensus amongst the core Git community on a working
set of common trailers, and there are non-trivial costs to people
inventing new trailers (research to discover what they mean/how they
differ from existing trailers) such that inventing new ones is generally
unwarranted and not something to be recommended to new contributors.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"What's in git.git" was last seen in 2010:
https://lore.kernel.org/git/?q=%22what%27s+in+git.git%22
https://lore.kernel.org/git/7vaavikg72.fsf@alter.siamese.dyndns.org/
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
- Match style in Release Notes
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The contents within parenthesis should be omittable without resulting
in broken text.
Eliding the parenthesis left a period to end a run without any content.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git clone" has been prepared to allow cloning a repository with
non-default hash function into a repository that uses the reftable
backend.
* ps/clone-into-reftable-repository:
builtin/clone: create the refdb with the correct object format
builtin/clone: skip reading HEAD when retrieving remote
builtin/clone: set up sparse checkout later
builtin/clone: fix bundle URIs with mismatching object formats
remote-curl: rediscover repository when fetching refs
setup: allow skipping creation of the refdb
setup: extract function to create the refdb
|
|
Test fix.
* rs/t6300-compressed-size-fix:
t6300: avoid hard-coding object sizes
|
|
"git fetch --atomic" issued an unnecessary empty error message,
which has been corrected.
* jx/fetch-atomic-error-message-fix:
fetch: no redundant error message for atomic fetch
t5574: test porcelain output of atomic fetch
|
|
Test balloon to use C99 "bool" type from <stdbool.h>.
* rs/c99-stdbool-test-balloon:
git-compat-util: convert skip_{prefix,suffix}{,_mem} to bool
|
|
Error message fix in the test framework.
* sp/test-i18ngrep:
test-lib-functions.sh: fix test_grep fail message wording
|
|
Doc update.
* jc/doc-misspelt-refs-fix:
doc: format.notes specify a ref under refs/notes/ hierarchy
|
|
Doc updates.
* jc/doc-most-refs-are-not-that-special:
docs: MERGE_AUTOSTASH is not that special
docs: AUTO_MERGE is not that special
refs.h: HEAD is not that special
git-bisect.txt: BISECT_HEAD is not that special
git.txt: HEAD is not that special
|
|
Doc update.
* es/add-doc-list-short-form-of-all-in-synopsis:
git-add.txt: add missing short option -A to synopsis
|
|
The code to parse the From e-mail header has been updated to avoid
recursion.
* jk/mailinfo-iterative-unquote-comment:
mailinfo: avoid recursion when unquoting From headers
t5100: make rfc822 comment test more careful
|
|
Test framework update.
* ps/chainlint-self-check-update:
tests: adjust whitespace in chainlint expectations
|