author | Christopher Haster <chaster@utexas.edu> | 2020-01-29 10:45:19 +0300 |
---|---|---|
committer | Christopher Haster <chaster@utexas.edu> | 2020-02-09 20:54:22 +0300 |
commit | 517d3414c5e04eedb07be2e58107c1f96b8b8684 | |
tree | cecd1e6f01b16430c8e344cde119a402e185b3ba /scripts | |
parent | aab6aa0ed939303d7788e21bb547eb0a386636fb |
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
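A minimal sketch of the pseudo-C condition idea, for illustration only. It is a variation on the approach in test.py, not a copy of it: `eval_condition` is a hypothetical name, define names are matched on word boundaries, the number of expansion passes is bounded, and the ternary handling is left out.

```python
import re

def eval_condition(cond, defines):
    """Evaluate a pseudo-C condition such as 'BLOCK_COUNT > 4 && !SMALL'.

    Hypothetical helper; defines maps names to values or expressions.
    """
    # substitute define names with their parenthesized values, repeating
    # so that defines which reference other defines are expanded as well
    for _ in range(len(defines) + 1):
        expanded = cond
        for name, value in defines.items():
            expanded = re.sub(r'\b%s\b' % re.escape(name),
                lambda m, value=value: '(%s)' % value, expanded)
        if expanded == cond:
            break
        cond = expanded
    # map C logical operators onto Python's, leaving != untouched
    cond = re.sub(r'&&', ' and ', cond)
    cond = re.sub(r'\|\|', ' or ', cond)
    cond = re.sub(r'!(?!=)', ' not ', cond)
    # eval is only tolerable here because test definitions are trusted input
    return bool(eval(cond, {'__builtins__': {}}, {}))
```

For example, `eval_condition('BLOCK_COUNT > 4 && !SMALL', {'BLOCK_COUNT': 16, 'SMALL': 0})` expands to a Python expression using `and` and `not` before it is evaluated.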
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However, it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
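A rough model of the idea in Python, assuming the remedy is to fall back to a fixed starting value (zero here) whenever the read fails. `CorruptError`, `initial_revision`, and `read_block` are hypothetical stand-ins and not littlefs names.

```python
import struct

class CorruptError(Exception):
    """Hypothetical stand-in for a block device returning LFS_ERR_CORRUPT."""

def initial_revision(read_block):
    """Pick the starting revision count for a freshly allocated block.

    read_block(size) is an assumed callback returning `size` raw bytes.
    """
    try:
        data = read_block(4)
    except CorruptError:
        # the read failed (e.g. ECC on a never-written block), so use a
        # fixed value instead of whatever happened to be in the buffer
        return 0
    # otherwise reuse what is on disk as a little-endian 32-bit count
    return struct.unpack('<I', data)[0]
```

With a fixed fallback, two runs that start from the same disk image pick the same revision count, so they wear-level at the same points.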
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This can happen if the previous tag was a valid commit and we lost
power, leaving a partially written tag at the start of a new commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. The exception
is lfs_file_sync, since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
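The cleanup pattern can be sketched as follows, assuming the remedy is to put the mdir back the way it was before reporting the error. The dict-based mdir, `NoSpace`, `compact`, and the `relocate` callback are all hypothetical simplifications of the C structures.

```python
import copy

class NoSpace(Exception):
    """Hypothetical stand-in for ENOSPC from the block allocator."""

def compact(mdir, relocate):
    """Compact an mdir in place, restoring it if relocation fails."""
    saved = copy.deepcopy(mdir)
    try:
        # compaction starts rewriting the mdir from scratch
        mdir['off'] = 0
        mdir['count'] = 0
        # relocation needs fresh blocks and may raise NoSpace
        mdir['pair'] = relocate(mdir['pair'])
    except NoSpace:
        # undo the partial update so a later commit on the same mdir
        # sees the state that is actually on disk
        mdir.clear()
        mdir.update(saved)
        raise
```

A caller that keeps the mdir around after a failure, as lfs_file_sync does through the file struct, then still holds a consistent copy.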
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore; we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However, this causes
lfs_fs_traverse to duplicate all metadata-pairs; I'm not sure what to
do about this yet.
Diffstat (limited to 'scripts')
-rwxr-xr-x | scripts/readtree.py | 23 |
-rwxr-xr-x | scripts/test.py | 93 |
2 files changed, 80 insertions, 36 deletions
diff --git a/scripts/readtree.py b/scripts/readtree.py
index 30e3cfc..ecfdab9 100755
--- a/scripts/readtree.py
+++ b/scripts/readtree.py
@@ -118,9 +118,17 @@ def main(args):
     superblock = None
     gstate = b''
     mdirs = []
+    cycle = False
     tail = (args.block1, args.block2)
     hard = False
     while True:
+        for m in it.chain((m for d in dirs for m in d), mdirs):
+            if set(m.blocks) == set(tail):
+                # cycle detected
+                cycle = m.blocks
+        if cycle:
+            break
+
         # load mdir
         data = []
         blocks = {}
@@ -129,6 +137,7 @@ def main(args):
             data.append(f.read(args.block_size)
                 .ljust(args.block_size, b'\xff'))
             blocks[id(data[-1])] = block
+
         mdir = MetadataPair(data)
         mdir.blocks = tuple(blocks[id(p.data)] for p in mdir.pair)
 
@@ -171,7 +180,7 @@ def main(args):
     # find paths
     dirtable = {}
     for dir in dirs:
-        dirtable[tuple(sorted(dir[0].blocks))] = dir
+        dirtable[frozenset(dir[0].blocks)] = dir
 
     pending = [("/", dirs[0])]
     while pending:
@@ -183,7 +192,7 @@ def main(args):
                     npath = tag.data.decode('utf8')
                     dirstruct = mdir[Tag('dirstruct', tag.id, 0)]
                     nblocks = struct.unpack('<II', dirstruct.data)
-                    nmdir = dirtable[tuple(sorted(nblocks))]
+                    nmdir = dirtable[frozenset(nblocks)]
                     pending.append(((path + '/' + npath), nmdir))
                 except KeyError:
                     pass
@@ -243,7 +252,15 @@ def main(args):
                         '|',
                         line))
 
-    return 0 if all(mdir for dir in dirs for mdir in dir) else 1
+    if cycle:
+        print("*** cycle detected! -> {%#x, %#x} ***" % (cycle[0], cycle[1]))
+
+    if cycle:
+        return 2
+    elif not all(mdir for dir in dirs for mdir in dir):
+        return 1
+    else:
+        return 0;
 
 if __name__ == "__main__":
     import argparse
diff --git a/scripts/test.py b/scripts/test.py
index 02801f8..3c3d692 100755
--- a/scripts/test.py
+++ b/scripts/test.py
@@ -182,7 +182,23 @@ class TestCase:
         elif args.get('no_internal', False) and self.in_ is not None:
             return False
         elif self.if_ is not None:
-            return eval(self.if_, None, self.defines.copy())
+            if_ = self.if_
+            print(if_)
+            while True:
+                for k, v in self.defines.items():
+                    if k in if_:
+                        if_ = if_.replace(k, '(%s)' % v)
+                        print(if_)
+                        break
+                else:
+                    break
+            if_ = (
+                re.sub('(\&\&|\?)', ' and ',
+                re.sub('(\|\||:)', ' or ',
+                re.sub('!(?!=)', ' not ', if_))))
+            print(if_)
+            print('---', eval(if_), '---')
+            return eval(if_)
         else:
             return True
 
@@ -235,33 +251,37 @@ class TestCase:
         mpty = os.fdopen(mpty, 'r', 1)
         stdout = []
         assert_ = None
-        while True:
-            try:
-                line = mpty.readline()
-            except OSError as e:
-                if e.errno == errno.EIO:
-                    break
-                raise
-            stdout.append(line)
-            if args.get('verbose', False):
-                sys.stdout.write(line)
-            # intercept asserts
-            m = re.match(
-                '^{0}([^:]+):(\d+):(?:\d+:)?{0}{1}:{0}(.*)$'
-                .format('(?:\033\[[\d;]*.| )*', 'assert'),
-                line)
-            if m and assert_ is None:
+        try:
+            while True:
                 try:
-                    with open(m.group(1)) as f:
-                        lineno = int(m.group(2))
-                        line = next(it.islice(f, lineno-1, None)).strip('\n')
-                    assert_ = {
-                        'path': m.group(1),
-                        'line': line,
-                        'lineno': lineno,
-                        'message': m.group(3)}
-                except:
-                    pass
+                    line = mpty.readline()
+                except OSError as e:
+                    if e.errno == errno.EIO:
+                        break
+                    raise
+                stdout.append(line)
+                if args.get('verbose', False):
+                    sys.stdout.write(line)
+                # intercept asserts
+                m = re.match(
+                    '^{0}([^:]+):(\d+):(?:\d+:)?{0}{1}:{0}(.*)$'
+                    .format('(?:\033\[[\d;]*.| )*', 'assert'),
+                    line)
+                if m and assert_ is None:
+                    try:
+                        with open(m.group(1)) as f:
+                            lineno = int(m.group(2))
+                            line = (next(it.islice(f, lineno-1, None))
+                                .strip('\n'))
+                        assert_ = {
+                            'path': m.group(1),
+                            'line': line,
+                            'lineno': lineno,
+                            'message': m.group(3)}
+                    except:
+                        pass
+        except KeyboardInterrupt:
+            raise TestFailure(self, 1, stdout, None)
         proc.wait()
 
         # did we pass?
@@ -654,6 +674,10 @@ def main(**args):
     if filtered != sum(len(suite.perms) for suite in suites):
         print('filtered down to %d permutations' % filtered)
 
+    # only requested to build?
+    if args.get('build', False):
+        return 0
+
     print('====== testing ======')
     try:
         for suite in suites:
@@ -678,18 +702,19 @@ def main(**args):
                         perm=perm, path=perm.suite.path, lineno=perm.lineno,
                         returncode=perm.result.returncode or 0))
                 if perm.result.stdout:
-                    for line in (perm.result.stdout
-                            if not perm.result.assert_
-                            else perm.result.stdout[:-1]):
+                    if perm.result.assert_:
+                        stdout = perm.result.stdout[:-1]
+                    else:
+                        stdout = perm.result.stdout
+                    if (not args.get('verbose', False) and len(stdout) > 5):
+                        sys.stdout.write('...\n')
+                    for line in stdout[-5:]:
                         sys.stdout.write(line)
                 if perm.result.assert_:
                     sys.stdout.write(
                         "\033[01m{path}:{lineno}:\033[01;31massert:\033[m "
                         "{message}\n{line}\n".format(
                             **perm.result.assert_))
-                else:
-                    for line in perm.result.stdout:
-                        sys.stdout.write(line)
                 sys.stdout.write('\n')
                 failed += 1
 
@@ -728,6 +753,8 @@ if __name__ == "__main__":
     parser.add_argument('-p', '--persist',
         choices=['erase', 'noerase'], nargs='?', const='erase',
         help="Store disk image in a file.")
+    parser.add_argument('-b', '--build', action='store_true',
+        help="Only build the tests, do not execute.")
     parser.add_argument('-g', '--gdb', choices=['init', 'start', 'assert'],
         nargs='?', const='assert',
         help="Drop into gdb on test failure.")
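The cycle check added to readtree.py can be illustrated in isolation. This is a simplified sketch, not the script's code: `walk_tail_list` and the `next_tail` callback are hypothetical, and the end of the list is modelled as `None`. Like the diff, it compares block pairs as unordered sets, since the two blocks of a metadata pair may appear in either order.

```python
def walk_tail_list(start, next_tail):
    """Follow a linked list of block pairs, stopping at the first repeat.

    next_tail(pair) is an assumed callback returning the next pair, or
    None at the end of the list. Returns (visited_pairs, cycle), where
    cycle is the pair that closed a loop, or None if the list ended.
    """
    seen = set()
    visited = []
    tail = start
    while tail is not None:
        # (2, 3) and (3, 2) name the same metadata pair
        key = frozenset(tail)
        if key in seen:
            return visited, tail
        seen.add(key)
        visited.append(tail)
        tail = next_tail(tail)
    return visited, None
```

Keeping a set of visited pairs makes each membership check cheap, whereas the loop in the diff rescans every mdir fetched so far on each iteration; for a debugging script either is likely fine.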