Update the README

author: Aidan MacDonald <amachronic@protonmail.com> 2021-11-05 17:41:58 +0300
committer: Aidan MacDonald <amachronic@protonmail.com> 2021-11-05 18:21:52 +0300
commit: 4868c7d4de5114757cc56de32a577851c0f68377 (patch)
tree: 3f31bfd8bb344adf996df9ada27c1c7410261790
parent: 2dd008245d42504c12c27d9731f2e1180162fe21 (diff)
1 files changed, 224 insertions, 83 deletions
diff --git a/README.md b/README.md
index 1a663d5..8d8a02b 100644
--- a/README.md
+++ b/README.md
@@ -1,122 +1,263 @@
 # microtar
-A lightweight tar library written in ANSI C
 
+A lightweight tar library written in ANSI C.
 
-## Modifications from upstream
+This version is a fork of [rxi's microtar](https://github.com/rxi/microtar)
+with bugfixes and API changes aimed at improving usability, but still keeping
+with the minimal design of the original library.
 
-[Upstream](https://github.com/rxi/microtar) has numerous bugs and gotchas,
-which I fixed in order to improve the overall robustness of the library.
+## License
+
+This library is free software; you can redistribute it and/or modify it under
+the terms of the MIT license. See [LICENSE](LICENSE) for details.
+
+
+## Supported format variants
+
+No effort has been put into handling every tar format variant. Basically
+what is accepted is the "old-style" format, which appears to work well
+enough to access basic archives created by GNU `tar`.
 
-A summary of my changes, in no particular order:
 
-- Fix possible sscanf beyond the bounds of the input buffer
-- Fix possible buffer overruns due to strcpy on untrusted input
-- Fix incorrect octal formatting by sprintf and possible output overrruns
-- Catch read/writes which are too big and handle them gracefully
-- Handle over-long names in `mtar_write_file_header` / `mtar_write_dir_header`
-- Ensure strings in `mtar_header_t` are always null-terminated
-- Save and load group information so we don't lose information
-- Move `mtar_open()` to `microtar-stdio.c` so `microtar.c` can be used in
-  a freestanding environment
-- Allow control of stack usage by moving temporary variables into `mtar_t`,
-  so the caller can decide whether to use the stack or heap
+## Basic usage
 
-An up-to-date copy of this modified version can be found
-[here](https://github.com/amachronic/microtar).
+The library consists of two files, `microtar.c` and `microtar.h`, which only
+depend on a tiny part of the standard C library and can be easily incorporated
+into a host project's build system.
 
+The core library does not include any I/O hooks as these are supposed to be
+provided by the host application. If the C library's `fopen` and friends are
+good enough, you can use `microtar-stdio.c`.
 
-## Basic Usage
-The library consists of `microtar.c` and `microtar.h`. These two files can be
-dropped into an existing project and compiled along with it.
 
+### Initialization
+
+Initialization is very simple. Everything the library needs is contained in
+the `mtar_t` struct; there is no memory allocation and no global state. It is
+enough to zero-initialize an `mtar_t` object to put it into a "closed" state.
+You can use `mtar_is_open()` to query whether the archive is open or not.
+
+An archive can be opened for reading _or_ writing, but not both. You have to
+specify which access mode you're using when you create the archive.
 
-#### Reading
 ```c
 mtar_t tar;
-mtar_header_t h;
-char *p;
+mtar_init(&tar, MTAR_READ, my_io_ops, my_stream);
+```
 
-/* Open archive for reading */
-mtar_open(&tar, "test.tar", "r");
+Or if using `microtar-stdio.c`:
 
-/* Print all file names and sizes */
-while ( (mtar_read_header(&tar, &h)) != MTAR_ENULLRECORD ) {
-  printf("%s (%d bytes)\n", h.name, h.size);
-  mtar_next(&tar);
+```c
+int error = mtar_open(&tar, "file.tar", "rb");
+if(error) {
+    /* do something about it */
 }
+```
 
-/* Load and print contents of file "test.txt" */
-mtar_find(&tar, "test.txt", &h);
-p = calloc(1, h.size + 1);
-mtar_read_data(&tar, p, h.size);
-printf("%s", p);
-free(p);
+Note that `mtar_init()` is called for you in this case and the access mode is
+deduced from the mode flags.
 
-/* Close archive */
-mtar_close(&tar);
-```
 
-#### Writing
+### Iterating and locating files
+
+If you opened an archive for reading, you'll likely want to iterate over
+all the files. Here's the long way of doing it:
+
 ```c
 mtar_t tar;
-const char *str1 = "Hello world";
-const char *str2 = "Goodbye world";
+int err;
+
+/* Go to the start of the archive... Not necessary if you've
+ * just opened the archive and are already at the beginning.
+ * (And of course you normally want to check the return value.) */
+mtar_rewind(&tar);
+
+/* Iterate over the archive members */
+while((err = mtar_next(&tar)) == MTAR_ESUCCESS) {
+    /* Get a pointer to the current file header. It will
+     * remain valid until you move to another record with
+     * mtar_next() or call mtar_rewind() */
+    const mtar_header_t* header = mtar_get_header(&tar);
+
+    printf("%s (%u bytes)\n", header->name, (unsigned)header->size);
+}
+
+if(err != MTAR_ENULLRECORD) {
+    /* ENULLRECORD means we hit end of file; any
+     * other return value is an actual error. */
+}
+```
+
+There's a useful shortcut for this type of iteration which removes the
+loop boilerplate, replacing it with another kind of boilerplate that may
+be more palatable in some cases.
 
-/* Open archive for writing */
-mtar_open(&tar, "test.tar", "w");
+```c
+/* Will be called for each archive member visited by mtar_foreach().
+ * The member's header is passed in as an argument so you don't need
+ * to fetch it manually with mtar_get_header(). You can freely read
+ * data (if present) and seek around. There is no special cleanup
+ * required and it is not necessary to read to the end of the stream.
+ *
+ * The callback should return zero (= MTAR_ESUCCESS) to continue the
+ * iteration or return nonzero to abort. On abort, the value returned
+ * by the callback will be returned from mtar_foreach(). Since it may
+ * also return normal microtar error codes, it is suggested to use a
+ * positive value or pass the result via 'arg'.
+ */
+int foreach_cb(mtar_t* tar, const mtar_header_t* header, void* arg)
+{
+    // ...
+    return 0;
+}
 
-/* Write strings to files `test1.txt` and `test2.txt` */
-mtar_write_file_header(&tar, "test1.txt", strlen(str1));
-mtar_write_data(&tar, str1, strlen(str1));
-mtar_write_file_header(&tar, "test2.txt", strlen(str2));
-mtar_write_data(&tar, str2, strlen(str2));
+int main(void)
+{
+    mtar_t tar;
 
-/* Finalize -- this needs to be the last thing done before closing */
-mtar_finalize(&tar);
+    // ...
 
-/* Close archive */
-mtar_close(&tar);
+    int ret = mtar_foreach(&tar, foreach_cb, NULL);
+    if(ret < 0) {
+        /* Microtar error codes are negative and may be returned if
+         * there is a problem with the iteration. */
+    } else if(ret == MTAR_ESUCCESS) {
+        /* If the iteration reaches the end of the archive without
+         * errors, the return code is MTAR_ESUCCESS. */
+    } else if(ret > 0) {
+        /* Positive values might be returned by the callback to
+         * signal some condition was met; they'll never be returned
+         * by microtar */
+    }
+}
 ```
 
+The other thing you're likely to do is look for a specific file:
+
+```c
+/* Seek to a specific member in the archive */
+int err = mtar_find(&tar, "foo.txt");
+if(err == MTAR_ESUCCESS) {
+    /* File was found -- read the header with mtar_get_header() */
+} else if(err == MTAR_ENOTFOUND) {
+    /* File wasn't in the archive */
+} else {
+    /* Some error occurred */
+}
+```
+
+Note this isn't terribly efficient since it scans the entire archive
+looking for the file.
+
+
+### Reading file data
+
+Once pointed at a file via `mtar_next()` or `mtar_find()` you can read the
+data with a simple POSIX-like API.
+
+- `mtar_read_data(tar, buf, count)` reads up to `count` bytes into `buf`,
+  returning the actual number of bytes read, or a negative error value.
+  If at EOF, this returns zero.
+
+- `mtar_seek_data(tar, offset, whence)` works exactly like `fseek()` with
+  `whence` being one of `SEEK_SET`, `SEEK_CUR`, or `SEEK_END` and `offset`
+  indicating a point relative to the beginning, current position, or end
+  of the file. Returns zero on success, or a negative error code.
+
+- `mtar_eof_data(tar)` returns nonzero if the end of the file has been
+  reached. It is possible to seek backward to clear this condition.
+
+
+### Writing archives
+
+If you have opened an archive for writing, your options are a bit more
+limited than with reading as you need to generate the whole archive in
+a single pass. Seeking around and rewriting previously written data is
+not allowed. Support for this wouldn't be hard to add, but it was not
+included in the interest of simplicity.
+
+The main functions are:
+
+- `mtar_write_header(tar, header)` writes out a new record. The amount
+  of data that follows is dictated by `header->size` and you will have
+  to write it out before moving to the next record.
+
+- `mtar_write_data(tar, buf, count)` will write up to `count` bytes from
+  `buf` into the current record. Returns the number of bytes actually
+  written or a negative error code. If you provide too much data, a short
+  count is returned.
+
+- `mtar_end_record(tar)` will end the current record. It will complain
+  if you did not write the amount of data specified in the header.
+
+- `mtar_finalize(tar)` is called after you have written all records to
+  the archive. It writes out some null records which mark the end of the
+  archive, so you cannot write any more records after calling this.
+
+It isn't necessary to call `mtar_end_record()` explicitly since it will
+be called automatically by `mtar_write_header()` and `mtar_finalize()`.
+Similarly, `mtar_finalize()` is implicitly called by `mtar_close()` if
+you don't do so yourself.
+
+Also note that `mtar_close()` can fail independently if there was a problem
+flushing buffered data to disk, so its return value should always be checked.
+
 
 ## Error handling
-All functions which return an `int` will return `MTAR_ESUCCESS` if the operation
-is successful. If an error occurs an error value less-than-zero will be
-returned; this value can be passed to the function `mtar_strerror()` to get its
-corresponding error string.
 
+Most functions that return `int` return an error code from `enum mtar_error`.
+Zero is success and all other error codes are negative. `mtar_strerror()` can
+return a string describing the error code.
 
-## Wrapping a stream
-If you want to read or write from something other than a file, the `mtar_t`
-struct can be manually initialized with your own callback functions and a
-`stream` pointer.
+A couple of functions use a different return value convention:
 
-All callback functions are passed a pointer to the `mtar_t` struct as their
-first argument. They should return `MTAR_ESUCCESS` if the operation succeeds
-without an error, or an integer below zero if an error occurs.
+- `mtar_foreach()` may return error codes or an arbitrary nonzero value
+  provided by the callback.
+- `mtar_read_data()` and `mtar_write_data()` return the number of bytes read
+  or written, or a negative error code. In particular, zero means that no
+  bytes were read or written.
+- `mtar_get_header()` may return `NULL` if there is no valid header.
+  It is only possible to see a null pointer if misusing the API or after
+  a previous error so checking for this is usually not necessary.
 
-After the `stream` field has been set, all required callbacks have been set and
-all unused fields have been zeroset the `mtar_t` struct can be safely used with
-the microtar functions. `mtar_open` *should not* be called if the `mtar_t`
-struct was initialized manually.
+There is essentially no support for error recovery. After an error you can
+only do two things reliably: close the archive with `mtar_close()` or try
+rewinding to the beginning with `mtar_rewind()`.
 
-#### Reading
-The following callbacks should be set for reading an archive from a stream:
 
-Name    | Arguments                                | Description
---------|------------------------------------------|---------------------------
-`read`  | `mtar_t *tar, void *data, unsigned size` | Read data from the stream
-`seek`  | `mtar_t *tar, unsigned pos`              | Set the position indicator
-`close` | `mtar_t *tar`                            | Close the stream
+## I/O hooks
 
-#### Writing
-The following callbacks should be set for writing an archive to a stream:
+You can provide your own I/O hooks in a `mtar_ops_t` struct. The same ops
+struct can be shared among multiple `mtar_t` objects but each object gets
+its own `void* stream` pointer.
 
-Name    | Arguments                                      | Description
---------|------------------------------------------------|---------------------
-`write` | `mtar_t *tar, const void *data, unsigned size` | Write data to the stream
+Name    | Arguments                                 | Required
+--------|-------------------------------------------|------------
+`read`  | `void* stream, void* data, unsigned size` | If reading
+`write` | `void* stream, void* data, unsigned size` | If writing
+`seek`  | `void* stream, unsigned pos`              | If reading
+`close` | `void* stream`                            | Always
 
+`read` and `write` should transfer the number of bytes indicated
+and return the number of bytes actually read or written, or a negative
+`enum mtar_error` code on error.
 
-## License
-This library is free software; you can redistribute it and/or modify it under
-the terms of the MIT license. See [LICENSE](LICENSE) for details.
+`seek` must have semantics like `lseek(..., pos, SEEK_SET)`; that is,
+the position is an absolute byte offset in the stream. Seeking is not
+optional for read support, but the library only performs backward
+seeks under two circumstances:
+
+- `mtar_rewind()` seeks to position 0.
+- `mtar_seek_data()` may seek backward if the user requests it.
+
+Therefore, you will be able to get away with a limited forward-only
+seek function if you're able to read everything in a single pass and use
+the API carefully. Note that `mtar_find()` and `mtar_foreach()` will call
+`mtar_rewind()`.
+
+`close` is called by `mtar_close()` to clean up the stream. Note the
+library assumes that the stream handle is cleaned up by `close` even
+if an error occurs.
+
+`seek` and `close` should return an `enum mtar_error` code: either
+`MTAR_ESUCCESS`, or a negative value on error.