`.
- [ ] Add a primitive which intelligently copies/moves between views and vectors.
Specifically, when resizing storage of a trivially copyable type, skip the memory copy
by remapping the pages instead.
- [ ] Add an intelligent on demand memory mapper:
- Use one-two-three level page system, so 4Kb/2Mb/1Gb. Files under 2Mb need just
one indirection.
- Page tables need to also live in a potentially mapped file
- Could speculatively map 4Kb chunks lazily and keep an internal map of 4Kb
offsets to map. This allows more optimal handling of growing files.
- WOULD BE NICE: Copy on Write support which collates a list of dirtied 4Kb
pages and could write those out as a delta.
- Perhaps Snappy compression could be useful? It is continuable from a base
if you dump out the dictionary, i.e. having compressed 1Mb of data and then added a 4Kb
delta, you can compress the additional 4Kb very quickly using the dictionary from the 1Mb.
- LATER: Use guard pages to toggle dirty flag per initial COW
- [ ] Store in extended attributes (EA), or in a file called .spookyhashes or .spookyhash,
the 128 bit hash of a file and the time it was calculated. This can save lots of hashing work later.
- [ ] Correct directory hierarchy delete
i.e.:
- Delete files first, from tips to trunk, retrying for some given timeout. If a file fails
to delete immediately, rename it into the base directory under a long random hex name, then
try to delete it again.
- Only after all files have been deleted, delete directories. If new files appear
during directory deletion, loop.
- Options:
- Rename base directory(ies) to something random to atomically prevent lookup.
- Delete all or nothing (i.e. rename all files into another tree checking
permissions etc, if successful only then delete)
- [ ] Correct directory hierarchy copy
- Optional backup semantics (i.e. copy all ACLs, metadata etc)
- Intelligent retry for some given timeout before failing.
- Optional fixup of any symbolic links pointing into copied tree.
- Optional copy directory structure but hard or symbolic linking files.
- Optionally make symbolic links always absolute paths instead of relative.
- Optionally dereference all hard links and/or symbolic links into real files.
- [ ] Correct directory hierarchy move
- [ ] Correct directory hierarchy update (i.e. changes only)
- [ ] Make directory tree C by cloning tree B to tree C, and then updating tree C
with changes from tree A. The idea is for an incremental backup of changes over
time but saving storage where possible.
- [ ] Replace all files in a tree which have duplicate content (including EA) with hardlinks.
- [ ] Figure out all hard linked file entries for some inode.
- [ ] Generate list of all hard linked files in a tree (i.e. refcount>1) and which
are the same inode.
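
The last two items (listing every hard linked file in a tree and working out which entries share an inode) could be sketched roughly as follows. This is only a minimal sketch, assuming a POSIX system where `lstat()` reports `st_dev`, `st_ino` and `st_nlink`; the function name `hard_link_groups` is hypothetical and not part of any existing API, and error handling is reduced to skipping entries which cannot be examined.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <map>
#include <utility>
#include <vector>

#include <sys/stat.h>  // POSIX only: lstat(), S_ISREG

namespace fs = std::filesystem;

// On POSIX, (device, inode) identifies a file independently of its names.
using InodeKey = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper: maps each inode with refcount > 1 to the paths
// inside `root` which refer to it.
std::map<InodeKey, std::vector<fs::path>> hard_link_groups(const fs::path& root) {
  assert(fs::is_directory(root));
  std::map<InodeKey, std::vector<fs::path>> groups;
  // recursive_directory_iterator does not follow directory symlinks by default.
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    struct stat st{};
    if (::lstat(entry.path().c_str(), &st) != 0) continue;  // vanished or unreadable
    if (!S_ISREG(st.st_mode) || st.st_nlink < 2) continue;  // not a hard linked file
    InodeKey key{static_cast<std::uint64_t>(st.st_dev),
                 static_cast<std::uint64_t>(st.st_ino)};
    groups[key].push_back(entry.path());
  }
  return groups;
}
```

A group containing a single path means the other links to that inode live outside the tree being scanned. The same `(device, inode)` key would also be a reasonable starting point for the "replace duplicates with hardlinks" item, as files already sharing an inode need no further work.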
### Eventual transactional key-blob store:
- What's the least complex possible implementation based on files and directories?
- `store/index` is 48 bit counter + r/w mutex + open hash map of 128 bit key to blob
identifier (64 bits). Blob identifier is top 6 bits file id:
- 0 means large file, values 1-15 are reserved for future use (large file deltas).
- Values 16-63 select the smallfile.
1. `store/small/01-3f` for blobs <= 65512 bytes (8 bytes tail, 16 bytes key).
Each blob is padded to a 64 byte multiple and ends with a tail record of 16 bytes of key,
a 6 byte (48 bit) counter + 2 byte size aligned at the end + an optional 16 byte hash.
There are 48 of these, used to maximise write concurrency (use the thread id to select a
smallfile on open, use exclusive lock probing to figure out a small file not in use, hold
a shared lock on the chosen small file until exit).
The remaining 58 bits of the blob identifier are the offset into the smallfile of the end of
the tail record (shift left 6 bits to get the byte offset, as all records in smallfiles are
at 64 byte multiples).
2. `store/large/*` for blobs > 65512 bytes, under the assumption that one day we'll
implement 4Kb dirty delta pages with compression support.
- `store/large/hexkey/48bithexcounter` stores each blob
- Last 64 bytes contains magic, size, optional hash.
The blob identifier has its top 6 bits zero. The next 10 bits give an approximate size as a
4 bit mantissa shifted left by a 6 bit shift (0-63). The remaining 48 bits are the counter.
- `store/config` keeps:
- transactions enabled or not.
- mmap enabled or not (i.e. whether the store can be used over a network drive)
- content hash used e.g. SpookyHash
- compression used e.g. pithy
- dirty flag i.e. do fsck on next first open
- `O_SYNC` was on or not last open (affects severity of any fsck).
- shared lock kept on config so we know last user exit/first user enter
- Start a transaction = atomic increment current 48 bit counter
- Write the changes making up this transaction under this counter
- When ready, lock the open hash map's r/w mutex for exclusive access.
- Check that all the keys we are modifying have not been changed since the
transaction began.
- If all good, update all the keys with their new values and unlock the r/w mutex
QUESTION: Can forcing all map users to lock the mutex on each access be avoided?
e.g. by atomically swapping in a pointer to an updated table? One could COW the 4Kb pages
being changed in the current table, then update the map, then swap pointers
and leave the old table hanging around for a while.
- Garbage collecting in this design is easy enough: write all current small values
into a single file, then update the map in a single shot, then hole punch all the
other small files.
- Live resizing of the open hash map is, I think, impossible unless we use that
atomic swapping design.
- You may need compression: https://github.com/johnezang/pithy looks easily convertible
into header-only C++ and has a snappy-like performance to compression ratio. Make sure
you merge the bug fixes from the forks first.
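
The transaction steps above (atomically increment the counter, write under it, lock the map exclusively, check for conflicting changes, then update) might look something like the following in-memory sketch. Everything here is an assumption made for illustration: the `Index`, `Transaction`, `begin`, `lookup` and `commit` names are hypothetical, the map lives in memory rather than in `store/index`, and the counter is held in 64 bits rather than 48. The notes do not say how "changed since the transaction began" is detected, so this sketch assumes each commit stamps the keys it writes with a fresh counter value, making any stamp larger than a transaction's own counter a conflict.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for `store/index`.
struct Index {
  struct Entry {
    std::uint64_t blob_id;  // where the value lives
    std::uint64_t stamp;    // counter value taken when this key was last committed
  };
  std::atomic<std::uint64_t> counter{0};  // 48 bits in the design above
  std::shared_mutex lock;                 // the r/w mutex guarding the map
  std::unordered_map<std::string, Entry> map;
};

struct Transaction {
  std::uint64_t id = 0;                                    // counter value at begin
  std::unordered_map<std::string, std::uint64_t> writes;  // key -> new blob id
};

// Start a transaction: atomically increment the current counter.
Transaction begin(Index& idx) {
  Transaction tx;
  tx.id = idx.counter.fetch_add(1) + 1;
  return tx;
}

// Readers only need shared access to the map.
std::optional<std::uint64_t> lookup(Index& idx, const std::string& key) {
  std::shared_lock<std::shared_mutex> guard(idx.lock);
  auto it = idx.map.find(key);
  if (it == idx.map.end()) return std::nullopt;
  return it->second.blob_id;
}

// Returns false, leaving the map untouched, if any key being modified was
// committed by someone else after this transaction began.
bool commit(Index& idx, const Transaction& tx) {
  assert(tx.id != 0 && "transaction must come from begin()");
  std::unique_lock<std::shared_mutex> guard(idx.lock);
  for (const auto& write : tx.writes) {
    auto it = idx.map.find(write.first);
    if (it != idx.map.end() && it->second.stamp > tx.id) return false;
  }
  // Assumption: take a fresh counter value so later commits are distinguishable.
  const std::uint64_t stamp = idx.counter.fetch_add(1) + 1;
  for (const auto& write : tx.writes) {
    idx.map[write.first] = Index::Entry{write.second, stamp};
  }
  return true;
}
```

If two transactions stage a change to the same key, whichever commits second sees a stamp newer than its own counter and is refused, which is the all-or-nothing behaviour described above. Every reader still takes the shared lock here, so this does nothing to answer the QUESTION about avoiding the mutex.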
## Commits and tags in this git repository can be verified using:
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2
mDMEVvMacRYJKwYBBAHaRw8BAQdAp+Qn6djfxWQYtAEvDmv4feVmGALEQH/pYpBC
llaXNQe0WE5pYWxsIERvdWdsYXMgKHMgW3VuZGVyc2NvcmVdIHNvdXJjZWZvcmdl
IHthdH0gbmVkcHJvZCBbZG90XSBjb20pIDxzcGFtdHJhcEBuZWRwcm9kLmNvbT6I
eQQTFggAIQUCVvMacQIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRCELDV4
Zvkgx4vwAP9gxeQUsp7ARMFGxfbR0xPf6fRbH+miMUg2e7rYNuHtLQD9EUoR32We
V8SjvX4r/deKniWctvCi5JccgfUwXkVzFAk=
=puFk
-----END PGP PUBLIC KEY BLOCK-----