[/==============================================================================
 Use, modification and distribution is subject to the Boost Software License,
 Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:overview AFIO single page cheat sheet for the very impatient]

''''''

* AFIO models file descriptors/handles as a reference counted `std::shared_ptr<__afio_handle__>` (aliased as `afio::handle_ptr`) where `handle` is an opaque abstract base class. `handle` may, or may not, refer to a currently open file descriptor/handle i.e. it is possible to close it before the reference count reaches zero. You can create your own custom implementations of `handle`.

* AFIO models a filesystem as a reference counted `std::shared_ptr<__afio_dispatcher__>` (aliased as `afio::dispatcher_ptr`) where `dispatcher` is also an opaque abstract base class. When constructed using __afio_make_dispatcher__ you specify a URI such as [*`__fileurl__`] which specifies what namespace the dispatcher will dispatch onto. You can create your own custom `dispatcher` implementations for some URI regex pattern too.

* AFIO provides its own custom future type `afio::__afio_op__` which [*always] represents a future `handle_ptr` plus an optional `T` which is usually `void` (hence defaulting `T` to `void`). This custom future type is implemented using __boost_outcome__'s lightweight monadic future factory toolkit, and therefore provides all the C++ 1z Concurrency TS and __boost_thread__ extensions including continuations and wait composure. Note that lightweight futures can carry an `error_code` as well as an `exception_ptr`, and therefore much of the need for non-throwing `error_code`-taking API overloads goes away (though any function not returning a future still uses that idiom).

* AFIO provides its own custom filesystem path type `afio::__afio_path__` which is a thin wrapper around your STL/Boost Filesystem path. This is done because on Microsoft Windows, AFIO [*always] uses NT kernel paths, never Win32 paths. By skipping the Win32 translation layer one achieves significantly better performance and semantics much closer to POSIX, including case sensitivity, plus there are no issues with path length limits. On Windows `afio::path` (NT kernel paths) will usually auto convert to and from `filesystem::path` (Win32 paths), however do note there is not always a one to one mapping from an NT kernel path to a Win32 path, in which case the Win32 paths you send in will never be the Win32 paths you get out. See the documentation for __afio_path__. [caution Until [@https://github.com/BoostGSoC13/boost.afio/issues/79 issue #79] is resolved, don't do directory enumerations from an x86 binary on an x64 Windows system. This is due to a bug in Microsoft's WOW64 syscall translation layer which Microsoft have decided is wontfix.]

* Error reporting [*always] follows the ["earliest possible reporting] rule. As absolutely everything is asynchronous in AFIO, that means that error and exception handling also occurs asynchronously. So if you try to use a future as an input and that future is errored at the point of use, the errored state is immediately rethrown. As futures may become errored asynchronously, you must always write code which assumes that any API which consumes a future can throw (unless you know for a fact the future is ready and not errored).
[note Gathering error states from many futures at once is made easier using the `when_all_p()` (when all propagating) extended wait composure function from __boost_outcome__ __dash__ this propagates any error in any of the input futures into the composed output future, thus saving you having to check the futures for errors by hand.]

* Handles to directories may ignore requests to close them i.e. `handle_ptr->close()` is a no-op. This is due to a performance feature called ["directory handle caching] which shares read-only directory handles amongst all users unless specifically requested not to do so. You can test for this using the `available_to_directory_cache()` member function of `handle`.

* As a general rule, AFIO spots when you try to delete a currently open file handle where that is illegal on your platform (Microsoft Windows) and will instead rename the file to a cryptographically strong random name of the form `<32 chars of random hex>.afiod` and ask the operating system to delete it later when the last handle is closed. Any enumerations of directories containing these randomly named files will omit those entries by default. This emulates POSIX semantics, albeit imperfectly. One caveat is that it cannot always do this with directories, though it will try again later where it can. A future version of AFIO will add `tmpfs` support which will let us relocate troublesome directories into the temp directory where they can be deleted much later, thus matching POSIX semantics still more closely.

__boost_afio__ version 1.4 provides the following operations:

[table:supported_operations
[[Operation][Asynchronous batch `__afio_dispatcher__` functions][Synchronous `__afio_handle__` functions][Asynchronous free functions][Synchronous free functions][Related types]]
[[[link afio.reference.functions.dir Open/create a directory: `dir()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.rmdir Delete a directory: `rmdir()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.file Open/create a file: `file()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.rmfile Delete a file: `rmfile()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.symlink Open/create a symlink: `symlink()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.rmsymlink Delete a symlink: `rmsymlink()`]] [1] [] [2] [4] [__afio_path_req__]]
[[[link afio.reference.functions.sync Synchronise changes to physical storage: `sync()`]] [1] [] [1] [2] []]
[[[link afio.reference.functions.zero Deallocate/zero physical storage: `zero()`]] [1] [] [1] [2] []]
[[[link afio.reference.functions.close Close a fd/handle: `close()`]] [1] [] [1] [2] []]
[[[link afio.reference.functions.read Scatter read file contents: `read()`]] [1] [] [2] [4] [__afio_io_req__]]
[[[link afio.reference.functions.write Gather write file contents: `write()`]] [1] [] [2] [4] [__afio_io_req__]]
[[[link afio.reference.functions.truncate Set maximum file extent: `truncate()`]] [1] [] [1] [2] []]
[[[link afio.reference.functions.enumerate Enumerate directory contents/fetch file metadata: `enumerate()`]] [1] [] [3] [6] [__afio_enumerate_req__, __afio_directory_entry__, __afio_stat_t__]]
[[[link afio.reference.functions.extents Enumerate file physical storage extents: `extents()`]] [1] [] [1] [2] []]
[[[link afio.reference.functions.statfs Examine mounted storage volume: `statfs()`]] [1] [] [1] [2] [__afio_statfs_t__]]
[[[link afio.reference.classes.handle.path
Get current path of an open fd/handle even if other processes are renaming it (Linux, Windows; FreeBSD directories only; not OS X): `path()`]] [] [2] [] [] [__afio_path__]]
[[[link afio.reference.classes.handle.direntry Get open fd/handle metadata: `direntry()`, `lstat()`]] [] [2] [] [] [__afio_directory_entry__, __afio_stat_t__]]
[[[link afio.reference.classes.handle.target Get target path of a symlink: `target()`]] [] [1] [] [] []]
[[[link afio.reference.classes.handle Map a read-only file into memory: `try_mapfile()`]] [] [1] [] [] []]
[[[link afio.reference.classes.handle.link Hard link an open fd/handle to a new path: `link()`]] [] [1] [] [] [__afio_path_req__]]
[[[link afio.reference.classes.handle.unlink Unlink an open fd/handle from its existing path (caution!): `unlink()`]] [] [1] [] [] [__afio_path_req__]]
[[[link afio.reference.classes.handle.atomic_relink ['Strong guaranteed atomically] relink an open fd/handle from its existing path to a different path: `atomic_relink()`]] [] [1] [] [] [__afio_path_req__]]
]

Planned new operations coming in the next version of AFIO:

* Mutual exclusion and locking which works across NFS and Samba:
  * The ability to tag many `handle`s with what locking to do per operation.
  * Asynchronous exclusive lock files.
  * Asynchronous multi-file byte range locking.
* Individual file change monitoring (needed to implement locking).
* [*`tmp://`] URI backend for a temporary file filesystem.
* Optional inline hashing of file reads and writes with SpookyHash/Blake2b/SHA256.
* Maybe Boost.Fiber support to gain coroutines on all compilers.

''''''

[endsect] [/overview]

[section:design Design Introduction and Rationale]

''''''

__boost_afio__ came about out of the need for a scalable, high performance, portable asynchronous file i/o and filesystem implementation library for a forthcoming filing system based graph store ACID compliant transactional persistence layer called __triplegit__ __dash__ call it a ["SQLite3 but for graphstores][footnote The [@http://unqlite.org/ UnQLite embedded NoSQL database engine] is exactly one of those of course. Unfortunately I intend TripleGit for implementing portable Component Objects for C++ extending C++ Modules, which means I need a database engine suitable for incorporation into a dynamic linker, and UnQLite is unfortunately not quite that.]. The fact that a portable asynchronous file i/o and filesystem library for C++ was needed at all came as a bit of a surprise: one thinks of these things as done and dusted decades ago, but it turns out that the fully featured [@http://nikhilm.github.io/uvbook/filesystem.html libuv], a C library, is good enough for most people needing portable asynchronous file i/o. However as great as libuv is, it isn't very C++-ish, and hooking it in with __boost_asio__ (parts of which are expected to enter the ISO C++ language standard) isn't particularly clean. I therefore resolved to write a native Boost asynchronous file i/o and filesystem implementation, and keep it as simple as possible.

[heading A quick version history]

AFIO started life as a C++ 0x library written for an early Visual Studio 2013 Community Preview back in 2012 as an outside-of-work side project when I was working at BlackBerry. It was ported to Boost during Google Summer of Code 2013 with the help of student Paul Kirth, and VS2012 and VS2010 support was added.
For v1.0, AFIO used a simple dispatch engine which kept the extant ops in a hash table, and the entire dispatch engine was protected by a single giant and recursive mutex. Performance never exceeded about 150k ops/sec on a four core Intel Ivy Bridge CPU. That performance was embarrassing, so for v1.1 the entire engine was rewritten using atomic shared pointers to be completely lock free, and very nearly wait free if it weren't for the thin spin locks around the central ops hash table. Performance can now reach 1.5m ops/sec on a four core Intel Ivy Bridge CPU, or more than half of Boost.ASIO's maximum dispatch rate.

For the v1.2 engine, another large refactor was done, this time to substantially simplify the engine by removing the use of `std::packaged_task<>` completely, replacing it with a new intrusive-capable `enqueued_task<>` which permits the engine to early out in many cases, plus allowing the consolidation of all spinlocked points down to just two: one in dispatch and one in completion, which is now optimal. Performance of the v1.2 engine rose by about 20% over the v1.1 engine, plus AFIO is now fully clean on all race detecting tools.

For the v1.3 engine, yet another large refactor was done, though not for performance but rather to make it much easier to maintain AFIO in the future, especially after acceptance into Boost whereupon one cannot arbitrarily break the API anymore, and one must maintain backwards compatibility. To this end the dependencies between AFIO and Boost were completely abstracted into a substitutable symbol aliasing layer such that any combination of Boost and C++ 11 STL threading/chrono, filesystem and networking can be selected externally using macros. Indeed, any of the eight build combinations can coexist in the same translation unit too __dash__ I have unit test runs which prove it! With the v1.3 engine AFIO can optionally be used without Boost at all, not even for its unit testing. However the cost was dropping support for all Visual Studios before 2013 and all GCCs before 4.7 as they don't have the template aliasing support needed to implement the STL abstraction layer. A very large amount of legacy cruft code, e.g. support for non-variadic templates, was cleaned out for the v1.3 release.

[heading This version 1.4 of AFIO]

During ACCU and C++ Now 2015 I spoke with a number of ISO WG21 committee members about the structural design problems in iostreams and the Filesystem TS (lack of filesystem race freedom, lack of a context dependent filesystem), and what design the committee would prefer to see to fix those problems. As it happens, we were all close to the same page, so from the v1.4 engine onwards I resolved to refactor the AFIO API as follows:

# Since v1.0 AFIO implemented the Concurrency TS continuations atop standard futures using a central hash table to keep the continuations information per future. This had the big advantage that standard STL and Boost futures could be used, but it came with many other problems, mainly related to performance and code complexity. An additional problem was that the Concurrency TS had moved on considerably since 2012, and AFIO's emulation was now significantly out of date.
For v1.4 a new lightweight future-promise factory toolkit library called __boost_outcome__ was written between the end of C++ Now (May) and the beginning of the AFIO peer review (July) which makes it easy to write arbitrarily customisable future-promises for C++ which:

* Implement an almost complete Concurrency TS ([@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4399.html N4399]) future-promise specification with almost all the Boost.Thread future-promise extensions.
* Are very considerably more performant (2x-3x) and much more reliably low latency (no memory allocation).
* Permit arbitrary wait composure of any kind of custom future-promise with one another.
* Are part of a general purpose lightweight C++ monadic programming implementation, so futures are merely asynchronous monads.
* Natively support C++ 1z coroutines ([@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4499.pdf N4499]) which are currently only supported by Visual Studio 2015.

The use of this future factory toolkit makes the AFIO continuations infrastructure redundant, and it will therefore be removed shortly. The monadic programming library also simplifies quite a bit of internal AFIO implementation code: thanks to the monads one can use a noexcept design throughout, and therefore skip dealing directly with exception safety, because the monads remove the possibility of control flow being reversed by an exception throw.

[note This version of AFIO being presented for Boost review does not yet make use of lightweight future-promises, and instead mocks up the eventual API using the existing highly mature and very well tested engine. The API presented is expected to be final, except for the very few items specified as deprecated (see below for a list). This has been done in order to test that the engine rewrite based on lightweight future-promise exactly matches the behaviour of the current engine using an identical unit test suite. I should emphasise that I expect any programs written to match the presented API to continue to work after the engine rewrite __dash__ after all, the internal unit test suite will do so.]

# For race free filesystem programming, you really need to base all path related operations on an open file descriptor or handle, so something like:

``
afio::handle_ptr h;  // some file handle opened earlier

// Create a sibling file to h->path() race free
afio::handle_ptr newfileh=afio::file(h, "../newfile", afio::file_flags::create);

// Asynchronously create a sibling file to h->path() race free.
// afio::future<> has type void because afio::future<> *always* carries a shared pointer to a handle
// (it only gets a type when the operation returns more than a file handle)
afio::future<> newfilefh2=afio::async_file(h, "../newfile", afio::file_flags::create);

// Wait for the asynchronous file creation to complete, rethrowing any error or exception
afio::handle_ptr newfileh2=newfilefh2.get_handle();
``

I'll admit this design isn't quite what the members of WG21 had in mind, especially the notion that `afio::future<>` with type void always carries a shared pointer to a handle and that implicit type slicing from `afio::future<T>` to `afio::future<>` is not just allowed but absolutely essential. However apart from that, the above API is probably quite close to what members of WG21 were thinking.
# One oft-observed design limitation in the Filesystem TS is that it cannot support filesystems in filesystems, the classic example being a ZIP archive on a filesystem, where it would be nice if generic C++ filesystem code did not need to be aware that the filesystem it sees is inside a ZIP archive. The solution could be one or both of these options:
  # Make the Filesystem TS operations hang as virtual member functions off some `filesystem` abstract base class.
  # Have a thread local variable set the current `filesystem` instance to be used by the global Filesystem TS functions on that thread.

AFIO v1.4 and probably v1.5 won't implement this, as it is really a thing for Boost.Filesystem to do. However, AFIO's __afio_make_dispatcher__ already takes a URI and there is a RAII facility for setting the current thread local dispatcher, so AFIO is ready for a Filesystem TS implementation matching the above design to be written on top of it, in that a `dispatcher` instance has a suite of virtual member functions which define what some filesystem is or does. AFIO v1.4 only provides POSIX and NT kernel filesystem backends currently, however it is expected that v1.5 will add a new temporary filesystem backend which lets programs portably work inside `tmpfs` in whichever form that takes across Linux, FreeBSD, Apple OS X and Microsoft Windows. Additional backends implementing, say, a ZIP archive filesystem are similarly easy to add.

As mentioned above, note that due to the above refactoring some parts of this v1.4 release of AFIO are deprecated and are expected to be removed shortly. You can find a list of these shortly-to-be-removed APIs and parts [link afio.release_notes here]. The list is not long, and the removals are obvious.

''''''

[section:acid_write_ordering Write ordering constraints, and how these can be used to achieve some of the Durability in ACID without needing `fsync()`]

''''''

[*ACID] stands for Atomic Consistent Isolated Durable, and it is a key requirement for database implementations which manage data you don't want to lose, which tends to be a common use case in databases. Whilst achieving all four is non-trivial, achieving Durability is both the easiest and the hardest of the four. This is because the easy way of ensuring durability is to always wait for each write to reach non-volatile storage, yet such a naive solution is typically exceptionally slow. Achieving performant data durability is without doubt a wicked hard problem in computer science.

Because a majority of users of __boost_afio__ are going to be people needing some form of data persistence guarantees, this section is a short essay on the current state of data persistence on various popular platforms. Any errors or omissions, both of which are probable, are entirely the fault of this author, Niall Douglas. Note also that the forthcoming information was probably correct around the winter of 2014, and it is highly likely to become less correct over time as filing system implementations evolve.

''''''

[section:background Background on how filing systems work]

''''''

Filing system implementations traditionally offer three methods of ensuring that writes have reached non-volatile storage:

# The family of `fsync()` or its equivalent functions, which flush any cached written data not yet stored onto non-volatile storage. These are usually synchronous operations, in that they do not return until they have finished. A big caveat with these functions is that some filing systems e.g.
ext3 flush ['every] bit of pending write data for the filing system instead of just the pending writes for the file handle specified i.e. they are equivalent to a synchronous `sync()` as described below.
# The family of `O_SYNC` or its equivalent per file handle flags, which simply disable any form of write back caching. These usually make all data write functions not return until written data has reached non-volatile storage. This flag, for all intents and purposes, asks for ["old fashioned] filing system behaviour from before filing systems tried to be clever by not actually writing changes at the moment a program writes them.
# The whole filing system cached written data flush, often performed by a function like `sync()`. Unlike the previous two, this is usually an asynchronous operation and there is no portable way of knowing when it has completed. Nevertheless, it is important because on traditional Unix implementations data persistence is simply a `sync()` issued by a cron job at regular intervals, and while modern Unix implementations usually no longer do this, the end implementation has not fundamentally changed much[footnote The main change is that individual writes get an individual lifetime before they must be written to storage rather than flushing everything according to some external wall clock.].

There is also the matter of the difference between data and ['meta]data: metadata is the stuff a filing system stores such that it knows about your data. For each of the first two of the above three families of functions, most systems provide three variants: flush metadata, flush data, and flush both metadata and data, so for clarity:

[table:data_persistence_types Mechanisms for enforcing data persistence onto physical storage
[[][Flush file metadata][Flush file data][Flush both metadata and data]]
[[Once off][`fsync(parentdir_fd)`][`fdatasync(fd)`][`fsync(fd)`]]
[[Always][Varies[footnote Many filing systems (NTFS, HFS+, ext3/4 with `data=ordered`) keep back a metadata flush until a file handle close causes data to finish reaching physical storage. This ensures that file entries don't appear in directories with zero sizes.]][`fcntl(fd, F_SETFL, O_DSYNC)`][`fcntl(fd, F_SETFL, O_SYNC)`]]
]

In addition to manually flushing data to physical storage, every filing system also implements some form of timer based flush whereby a piece of written data will always be sent to physical storage within some predefined period of time after the write. Most filing systems implement different timeouts for metadata and data, but typically on almost all production filing systems __dash__ unless they are in a power-saving laptop mode __dash__ any data write is guaranteed to be sent to non-volatile storage within one minute. Let me be clear here for later argument's sake: ['the filing system is allowed to reorder writes by up to one minute in time from the order in which they were issued]. Or put another way, most filing systems have a one minute temporal constraint on write order.

Most people think of `fsync()`, `O_SYNC` and `sync()` in terms of flushing caches. An alternative way of thinking about them is that they ['impose an order on writes] to non-volatile storage which acts above and beyond the timeout based write order. There is no doubt that they are a very crude and highly inefficient way of doing so because they are all or nothing, but they do open the option of ['emulating] native filing system support for write ordering constraints when nothing better is available.
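As a concrete illustration of that last point, here is a minimal sketch __dash__ using raw POSIX calls rather than the AFIO API, with a purely illustrative function name and data, and with error checking omitted __dash__ of using `fdatasync()` as a crude write ordering barrier: the first group of writes may reach non-volatile storage in any order, but all of them must land before any of the second group can.

``
#include <fcntl.h>
#include <unistd.h>

// Sketch only: force everything written so far onto non-volatile storage
// before anything written afterwards can reach it.
void write_with_barrier(int fd)
{
    const char a[]="A", b[]="B", c[]="C", d[]="D", e[]="E", f[]="F", g[]="G";

    // First group: the filing system may reorder or coalesce these freely.
    write(fd, a, sizeof a); write(fd, b, sizeof b);
    write(fd, c, sizeof c); write(fd, d, sizeof d);

    // Crude barrier: fdatasync() flushes data only; fsync() would also flush metadata.
    fdatasync(fd);

    // Second group: cannot reach storage before the first group.
    write(fd, e, sizeof e); write(fd, f, sizeof f); write(fd, g, sizeof g);
}
``

In AFIO terms, the `sync()` operation listed in the cheat sheet above plays the same role portably.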
So why is the ability to constrain write ordering important?

''''''

[endsect] [/background]

[section:write_ordering_data Write ordering data and durability: why does it matter?]

''''''

[note You may find the paper presented at C++ Now 2014 [@http://arxiv.org/abs/1405.3323 ["Large Code Base Change Ripple Management in C++: My thoughts on how a new Boost C++ Library could help]] of interest here.]

Implementing performant Durability essentially reduces to answering two questions: (i) how long does it take to restore a consistent state after an unexpected power loss, and (ii) how much of your most recent data are you willing to lose? AFIO has been designed and written as the asynchronous file i/o portability layer for the forthcoming direct filing system graphstore database __triplegit__ which, as with ZFS, implements late Durability i.e. you are guaranteed that writes older than some wall clock distance from now can never be lost. As discussing how TripleGit will use AFIO is probably useful to many others, that is what the remainder of this section does.

__triplegit__ will achieve the Consistent and Isolated parts of being a reliable database by placing abortable, garbage collectable concurrent writes of new data into separate files, and pushing the atomicity enforcement into a very small piece of ordering logic in order to reduce transaction contention by multiple writers as much as possible.

If you wish to never lose most recent data, to implement a transaction one (i) writes one's data to the filing system, (ii) ensures it has reached non-volatile storage, (iii) appends the knowledge that it definitely is on non-volatile storage to the intent log, and then (iv) ensures one's append has also reached non-volatile storage. This is presently the only way to ensure that valuable data definitely is never lost on any filing system that I know of. The obvious problem is that this method involves writing all your data with `O_SYNC` and using `fsync()` on the intent log. This might perform okay with a single writer, but with multiple writers performance is usually awful, especially on storage incapable of high queue depths and with potentially many hundreds of milliseconds of latency (e.g. SD cards). Despite the performance issues, there are many valid use cases for especially precious data, and TripleGit of course will offer such a facility, at both the per-graph and per-update levels.

__triplegit__'s normal persistence strategy is a bit more clever: write all your data, but keep a hash like a SHA of its contents as you write it[footnote TripleGit actually uses a different, much faster 256 bit 3 cycles/byte cryptographic hash called [@https://blake2.net/ Blake2] by default, but one can force use of SHA256/512 on a per-graph basis, or indeed if your CPU has SHA hardware offload instructions these may be used by default.]. When you write your intent log, atomically append all the SHAs of the items you just wrote and skip `O_SYNC` and `fsync()` completely. If power gets removed before all the data is written to non-volatile storage, you can figure out that the database is dirty easily enough, and you simply parse from the end of the intent log backwards, checking each item's contents to ensure their SHAs match up, throwing away any transaction where any file is missing or any file's contents don't match. On a filing system such as ext4 where data is guaranteed to be sent to non-volatile storage within one minute[footnote This is the default, and it may be changed by a system e.g.
I have seen thirty minutes set for laptops. Note that the Linux-specific call `syncfs()` lets one artificially schedule whole filing system flushes.], and of course so long as you don't mind losing up to one minute's worth of data, this solution can have much better performance than the previous solution with lots of simultaneous writers.

The problem though is that while better, performance is still far less than optimal. Firstly, you have to calculate a whole load of hashes all the time, and that isn't trivial especially on lower end platforms like a mobile phone where 25-30 cycles per byte SHA256 performance might be typical. Secondly, dirty database reconstruction is rather like how ext2 had to call `fsck` on boot: a whole load of time and i/o must be expended to fix up damage in the store, and while it's running one generally must wait.

What would be really, really useful is if the filing system exposed its internal write ordering constraint implementation to user mode code, so one could say ["schedule writing A, B, C and D in any order whenever you get round to it, but you ['must] write all of those before you write any of E, F and G]. Such an ability gives maximum scope to the filing system to reorder and coalesce writes as it sees fit, but still allows database implementations to ensure that a transaction's intent log entry can never appear without all the data it refers to. Such an ability would eliminate the need for expensive dirty database checking and reconstruction, or the need for any journalling infrastructure used to skip the manual integrity checking.

Unfortunately I know of no filing system which makes such a facility publicly available. The closest that I know of is ZFS, which internally uses a concept of transaction groups which are, for all intents and purposes, partial whole filing system snapshots issued once every five seconds. Data writes may be reordered into any order within a transaction group, but transaction group commits are strongly tied to the wall clock and are [*always] committed sequentially. Since the addition of the ZFS Write Throttle, the default settings are to accept new writes as fast as RAM can handle, buffering up to thirty wall clock seconds of writes before pacing the acceptance of new write data to match the speed of the non-volatile storage (which may be a ZFS Intent Log (ZIL) device if you're doing synchronous writes). This implies up to thirty seconds of buffered data could be lost, but note that ZFS still guarantees transaction group sequential write order. Therefore, what ZFS is in fact guaranteeing is this: ["we may reorder your write by up to five seconds away from the sequence in which you wrote it and other writes surrounding it. Other than that, we guarantee the order in which you write is the order in which that data reaches physical storage.] [footnote Source: [@http://www.c0t0d0s0.org/archives/5343-Insights-into-ZFS-today-The-nature-of-writing-things.html]]

What this means is this: on ZFS, TripleGit can turn off all write synchronisation and replace it with a five second delay between writing new data and updating the intent log, thereby guaranteeing that the intent log's contents will [*always] refer to data definitely on storage (or rather, close enough that one need not perform a lot of repair work on first use after power loss).
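Purely to illustrate the delayed intent log commit just described, here is a minimal single-threaded sketch using only the standard library __dash__ it is not the AFIO or TripleGit API, the `content_hash()` stand-in and the file and log names are hypothetical, and a real engine would of course schedule the delay asynchronously rather than blocking a thread.

``
#include <chrono>
#include <fstream>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in content hash; TripleGit would use Blake2b or SHA256 here.
static std::string content_hash(const std::string &data)
{
    return std::to_string(std::hash<std::string>()(data));
}

// files is a list of (path, contents) pairs making up one transaction.
// reorder_window is the filing system's maximum write reordering distance,
// e.g. five seconds on ZFS, one minute on a default ext4.
static void commit_transaction(const std::vector<std::pair<std::string, std::string>> &files,
                               std::chrono::seconds reorder_window)
{
    std::vector<std::string> hashes;
    // 1. Write each item of new data into its own file, remembering its hash.
    for(auto &f : files)
    {
        std::ofstream(f.first, std::ios::binary) << f.second;
        hashes.push_back(content_hash(f.second));
    }
    // 2. Instead of fsync(), wait out the write reordering window so the data
    //    above must be on non-volatile storage before the log append below.
    std::this_thread::sleep_for(reorder_window);
    // 3. Append the intent log entry recording the hashes just written.
    std::ofstream log("intent.log", std::ios::app | std::ios::binary);
    for(auto &h : hashes)
        log << h << '\n';
}
``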
One can additionally skip SHA hashing on reads because ZFS guarantees that file data and metadata will always match, and as TripleGit always copies data on write, either a copy's length matches the intent log's or it doesn't (i.e. the file's length as reported by the filing system really is how much true data it contains), plus the file modified timestamp always reflects the actual last modified timestamp of the data.

Note that ext3 and ext4 can also guarantee that file data and metadata will always match using the (IOPS expensive) mount option `data=journal`, which can be detected from `/proc/mounts`. If combined with the Linux-specific call `syncfs()`, one can reasonably emulate ZFS's behaviour, albeit rather inefficiently. Another option is to have an idle thread issue `fsync()` for writes in the order they were issued after some timeout period, thus making sure that writes definitely will reach physical storage within some given timeout and in their order of issue __dash__ this can be used to emulate the ZFS wall clock based write order consistency guarantees.

[note You may find the tutorial, which implements an ACID transactional key-value store using the theory in this section, of interest.]

Sadly, most use of __triplegit__ and __boost_afio__ will be without the luxury of ZFS, so here is a quick table of power loss data safety. Once again, I reiterate that errors and omissions are my fault alone.

[include power_loss_safety_table.qbk]

''''''

[endsect] [/write_ordering_data]

[endsect] [/acid_write_ordering]

[endsect] [/ end of section Design Rationale]