Diffstat (limited to 'src/3rd_party/pathie-cpp/README.md')
-rw-r--r-- | src/3rd_party/pathie-cpp/README.md | 359 |
1 files changed, 359 insertions, 0 deletions
diff --git a/src/3rd_party/pathie-cpp/README.md b/src/3rd_party/pathie-cpp/README.md
new file mode 100644
index 00000000..80b68770
--- /dev/null
+++ b/src/3rd_party/pathie-cpp/README.md
@@ -0,0 +1,359 @@
+PATHIE.
+=======
+
+This is the Pathie project. It aims to provide a C++ library that covers
+all needs of pathname manipulation and filename fiddling, without
+having to worry about the underlying platform. That is, it is a glue
+library that allows you to create platform-independent filename
+handling code with special regard to Unicode path names.
+
+Supported systems
+-----------------
+
+Currently supported platforms are Linux and Windows, the latter via
+MSYS2 GCC. Any other compiler or system might or might not work. Mac
+OS should work as well, but I cannot test this due to lack of a Mac. I
+gladly accept contributions for any system or compiler.
+
+Pathie's source code itself is written conforming to C++98. On UNIX
+systems, it assumes the system supports POSIX.1-2001. On Windows
+systems, the minimum supported Windows version is Windows Vista.
+
+Installation
+------------
+
+See INSTALL.md.
+
+The library
+-----------
+
+The entire world is using UTF-8 as the primary Unicode encoding. The
+entire world? No, a little company from Redmond resists the temptation
+and instead uses UTF-16LE, causing cross-platform handling of Unicode
+paths to be a nightmare.
+
+One of the main problems the author ran into was compiler-dependent
+code that was not marked as such. Many sites on the Internet claim
+Unicode path handling on Windows is easy, but in fact, it only is if
+you define “development for Windows” as “development with MSVC”,
+Microsoft’s proprietary C/C++ compiler, which provides nonstandard
+interfaces to allow for handling UTF-16LE filenames. The Pathie
+library has been developed with a focus on MinGW and cross-compilation
+from Linux to Windows and thus does not suffer from this problem.
+
+The Pathie library has been developed to release the programmer from
+the burden of handling the different encodings in use for filenames,
+and does so by focusing its API on UTF-8 regardless of the platform in
+use. Thus, if you use UTF-8 as your preferred encoding inside your
+program (take a look at the [UTF8 Everywhere
+website](http://www.utf8everywhere.org) for reasons why you should do
+that), Pathie will be of the most use to you, since it transparently
+converts whatever filesystem encoding is encountered to UTF-8 in its
+public interface. Likewise, any pathname you pass to the library is
+assumed to be UTF-8 and is transcoded transparently to the filesystem
+encoding before invoking the respective OS's filesystem access
+methods. Of course, explicit conversion functions are also provided,
+in case you do need a string in the native encoding or need to
+construct a path from a string in the native encoding.
+
+General Usage
+-------------
+
+The first thing to do is to include the main header:
+
+~~~~~~~~~~~~~~~~~~{.cpp}
+#include <pathie/path.hpp>
+~~~~~~~~~~~~~~~~~~
+
+Now consider the simple task of getting all children of a directory that
+have Unicode filenames. Doing that manually will result in you having
+to convert between UTF-8 and UTF-16 all the time. With Pathie, you can
+just do this:
+
+~~~~~~~~~~~~~~~~~~~{.cpp}
+std::vector<Pathie::Path> children = your_path.children();
+~~~~~~~~~~~~~~~~~~~
+
+Done. Retrieving the parent directory of your directory is pretty easy:
+
+~~~~~~~~~~~~~~~~~~~{.cpp}
+Pathie::Path yourpath("foo/bar/baz");
+Pathie::Path parent = yourpath.parent();
+~~~~~~~~~~~~~~~~~~~
+
+But Pathie is much more than just an abstraction of different filepath
+encodings. It is a utility library for pathname manipulation, i.e. it
+allows you to do things like finding the parent directory, expanding
+relative to absolute paths, decomposing a filename into basename,
+dirname, and extension, and so on.
+See the documentation of the
+central Pathie::Path class on what you can do.
+
+~~~~~~~~~~~~~~~~~~~~~~{.cpp}
+// Assume current directory is /tmp
+Pathie::Path p("foo/bar/../baz");
+p.expand(); // => /tmp/foo/baz
+~~~~~~~~~~~~~~~~~~~~~~
+
+Or my personal favourite:
+
+~~~~~~~~~~~~~~~~~~~{.cpp}
+Pathie::Path p1("/tmp/foo/bar");
+Pathie::Path p2("/tmp/bar/foo");
+Pathie::Path p3 = p1.relative(p2); // => ../../foo/bar
+~~~~~~~~~~~~~~~~~~~
+
+It also provides you with commonly used paths like the user’s
+configuration directory or the path to the running executable.
+
+~~~~~~~~~~~~~~~~~~~~{.cpp}
+Pathie::Path configdir = Pathie::Path::config_dir();
+Pathie::Path exepath = Pathie::Path::exe();
+~~~~~~~~~~~~~~~~~~~~
+
+Pathie assumes that all string arguments passed are in UTF-8 and
+transparently converts to the native filesystem encoding internally.
+
+Still, if you interface directly with the Windows API or other external
+libraries, you might want to retrieve the native representation from a
+Path or construct a Path from the native representation. Pathie
+doesn’t want to be in your way then. The following example constructs
+from and converts to the native representation on Windows, which is
+UTF-16LE:
+
+~~~~~~~~~~~~~~~~~~~~{.cpp}
+// Construct from native
+wchar_t* utf16 = Win32ApiCall();
+Path mypath = Path::from_native(utf16); // also accepts std::wstring
+
+// Retrieve native (Note C++’ish std::wstring rather than
+// raw wchar_t* on Windows)
+std::wstring native_utf16 = mypath.native();
+~~~~~~~~~~~~~~~~~~~~
+
+On UNIX, these methods work with normal strings (std::string instead
+of std::wstring) in the underlying filesystem encoding. In most cases,
+that will be UTF-8, but some legacy systems may still use something
+like ISO-8859-1 in which case that will differ.
+
+### Temporary files and directories
+
+There are two classes `Pathie::Tempdir` and `Pathie::Tempfile` that
+you can use if you need to work with temporary directories or files,
+respectively.
+Constructing instances of these classes creates a
+temporary entry, which is removed (recursively in case of directories)
+when the instance is destroyed again. Use TempEntry::path() to get
+access to the Path instance pointing to the created entry.
+
+~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <pathie/tempdir.hpp>
+
+//...
+
+{
+  srand(time(NULL)); // Needs random number generator
+  Pathie::Tempdir tmpdir("foo"); // Pass a fragment to use as part of filename
+  std::cout << "Temporary dir is: " << tmpdir.path() << std::endl;
+}
+// When `tmpdir' is destroyed, the destructor recursively
+// deletes the directory that was created.
+~~~~~~~~~~~~~~~~~~~~
+
+### Opening a file with a Unicode path name
+
+On Windows with GCC, it is [not possible to open a file with Unicode
+pathname](https://stackoverflow.com/questions/821873) via C++'s usual
+`std::ifstream` and `std::ofstream` mechanism. There's a nonstandard
+extension provided by Microsoft's proprietary compiler that does this,
+but GCC does not have this extension. Consequently, code that is
+intended to compile on GCC (like Pathie) has to avoid it.
+
+There *is* however a function in the Win32 API that allows you to open a
+file with a Unicode pathname *and* that returns a standard C `FILE*`
+handle,
+[_wfopen()](http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx). The
+method Path::fopen() uses this function on Windows and a regular C
+`fopen()` on all other platforms, thus allowing you to just deal with
+your Unicode filename via the regular C I/O interface. If you urgently
+need C++ I/O streams, read on.
+
+### Stream replacements
+
+Pathie mainly provides you with the means to handle paths, compose,
+and decompose them. There is an experimental feature however that
+provides replacements for C++ file streams that work with instances of
+Pathie::Path instead of strings for opening a file.
+These replacements
+are neither elegant nor portable, because they don't nicely honour the
+template concept the STL is based on by directly subclassing the
+standard streams in the manner needed most frequently and additionally
+relying on vendor-specific details. For GCC, an internal (but at least
+documented) interface is used to exchange the file descriptor inside a
+stream, and for MSVC, a nonstandard (but documented) constructor is
+used. Other compilers are not supported by this feature (which most
+notably affects clang, where I have no idea which interfaces I need
+to use for such a trick).
+
+In a word, these replacements are hacky and I consider them
+experimental. If that does not strike you as problematic, you can
+enable this feature by passing `-DPATHIE_BUILD_STREAM_REPLACEMENTS=ON`
+when invoking `cmake` during the build process.
+
+In order to use the replacements, include the respective header
+(either `pathie_ifstream` or `pathie_ofstream`) and use the
+`Pathie::ifstream` and `Pathie::ofstream` classes just like you would
+use `std::ifstream` and `std::ofstream`, with the only difference
+being that you construct them from a Pathie::Path instance instead of
+a string. See the documentation of Pathie::ofstream for more
+information.
+
+~~~~~~~~~~~~~~~~~{.cpp}
+#include <pathie/pathie_ofstream>
+
+// ...
+
+Pathie::Path p("Bärenstark.txt");
+Pathie::ofstream file(p);
+file << "Some content" << std::endl;
+file.close();
+~~~~~~~~~~~~~~~~~
+
+There's also the unofficial
+[boost::nowide](http://cppcms.com/files/nowide/html/), which is
+similar to this feature and maybe more reliable. It has [recently been
+accepted into
+boost](https://lists.boost.org/boost-announce/2017/06/0516.php).
+
+Dependencies and linking
+------------------------
+
+Pathie is standalone, that is, it requires no other libraries except
+for those provided by your operating system.
+Note that there’s a
+caveat with this on Windows, which does provide the `Shlwapi` library
+by default, but MinGW's GCC does not automatically link it in. Be sure
+to link to this library explicitly when compiling for MinGW Windows
+by appending `-lShlwapi` to the end of your linking command line.
+
+It is recommended to link in pathie as a dynamic library, because
+there are some problems with it when linked statically on certain
+operating systems (see _Caveats_ below). If you are sure you aren’t
+affected by those problems, it is possible to link in pathie
+statically.
+
+Caveats
+-------
+
+This library assumes that under all UNIX systems out there (I also
+consider Mac OS X to be a UNIX system) the file system root always is
+`/` and the directory separator also always is `/`. This structure is
+mandatory as per POSIX -- in POSIX.1-2008, it’s specified in section
+10.1. Systems that neither follow the POSIX directory structure nor
+are Windows are unsupported.
+
+On POSIX-compliant systems other than Mac OS X, the filesystem
+encoding [generally is
+unspecified](https://unix.stackexchange.com/questions/2089/what-charset-encoding-is-used-for-filenames-and-paths-on-linux).
+Pathnames are merely byte blobs which do not contain NUL bytes, and
+components are separated by `/`. It’s up to the applications,
+including utilities like a shell or the ls(1) program, to make
+something of those byte streams. Therefore, it is perfectly possible
+that on one system, user A uses ISO-8859-1 filenames and user B uses
+UTF-8 filenames. Even the same user could use differently encoded
+filenames. Programs that have to interpret the byte blobs in pathnames
+on these systems look at the locale environment variables, namely
+`LANG` and `LC_ALL`, see section 7 of POSIX.1-2008. As a consequence,
+it may happen that you want to create filenames with characters not
+supported in the user’s pathname encoding.
+For example, if you want to
+create a file with a Hebrew filename and the user’s pathname encoding
+is ISO-8859-1, there’s a problem, because ISO-8859-1 has no Hebrew
+characters in it, but in UTF-8, which is the encoding you are advised
+to use and which is what Pathie’s API expects from you, they are
+available. There is no sensible solution to this problem that the
+Pathie library could dictate; the `iconv()` function used by pathie
+just replaces characters that are unavailable in the target encoding
+with a system-defined default (probably “?”). Note that on systems
+which have a Unicode pathname encoding, especially modern Linuxes with
+UTF-8, such a situation can’t ever arise, because the Unicode
+encodings (UTF-*) cover all characters you can ever use.
+
+At least on FreeBSD, calling the POSIX `iconv()` function fails with
+the cryptic error message “Service unavailable” if a program is linked
+statically. I’ve reported [a bug on
+this](https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196567). This
+means that you currently can’t link in pathie statically on FreeBSD
+and systems which don’t allow statically linked executables to call
+`iconv()`.
+
+On Linux systems, it is recommended to set your program’s locale to the
+environment’s locale before you call any functions the Pathie library
+provides, because this will allow Pathie to use the correct encoding
+for filenames. This is relevant where the environment’s encoding is
+not UTF-8, e.g. with $LANG set to `de_DE.ISO-8859-1`. You can do this
+as follows (the `""` locale always refers to the locale of the
+environment):
+
+~~~~~~~~~~~~~~~~~~~~~{.cpp}
+#include <locale>
+std::locale::global(std::locale(""));
+~~~~~~~~~~~~~~~~~~~~~
+
+This is not required on Windows nor on Mac OS X, because these
+operating systems always use UTF-16LE (Windows) or UTF-8 (Mac OS X) as
+the filesystem encoding, regardless of the user's locale.
+However, it
+does not hurt to call this either; it simply makes no difference for
+Pathie on these systems. If you urgently need to avoid this call on
+Linux, you need to compile pathie with the special build option
+PATHIE_ASSUME_UTF8_ON_UNIX, which will force Pathie to assume that
+UTF-8 is used as the filesystem encoding under any UNIX-based system.
+
+Links
+-----
+
+* Project page: https://www.guelkerdev.de/projects/pathie/
+* GitHub mirror: https://github.com/Quintus/pathie-cpp
+* Issue tracker: https://github.com/Quintus/pathie-cpp/issues
+
+Contributing
+------------
+
+Feel free to submit any contributions you deem useful. Try to make
+separate branches for your new features, give a description on what
+you changed, etc.
+
+Don’t you duplicate boost::filesystem?
+-------------------------------------
+
+Yes and
+no. [boost::filesystem](http://www.boost.org/doc/libs/1_56_0/libs/filesystem/doc/index.htm)
+provides many methods pathie provides, but has a major problem with
+Unicode path handling if you are not willing to do the UTF-8/UTF-16
+conversion manually. boost::filesystem always uses UTF-8 to store the
+paths on UNIX, and, which is the problem, always uses UTF-16LE to
+store the paths on a Windows system. There is no way to override
+this, although there is a [hidden documentation
+page](http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/default_encoding_under_windows.html)
+that claims to solve the problem. I have wasted a great amount of time
+trying to persuade boost::filesystem to automatically convert all
+`std::string` input it receives into UTF-16LE, but did not
+succeed. Each time I wanted to create a file with a Unicode filename,
+the test failed on Windows by producing garbage filenames. Finally I
+found out that the neat trick shown in the documentation above indeed
+does work -- but only if you use the Microsoft Visual C++ compiler
+(MSVC) to compile your code. I don’t, I generally use g++ via the
+[MinGW](http://www.mingw.org) toolchain.
+boost::filesystem fails with
+g++ via MinGW with regard to Unicode filenames on Windows as of this
+writing (September 2014).
+
+Apart from that, pathie provides some additional methods, especially
+with regard to finding out where the user’s paths are. It is modelled
+after Ruby’s popular
+[Pathname](http://ruby-doc.org/stdlib-2.1.2/libdoc/pathname/rdoc/Pathname.html#method-i-rmtree)
+class, but it doesn’t entirely duplicate its interface (which wouldn’t
+be idiomatic C++).
+
+Also, pathie is a small library. Adding it to your project shouldn’t
+hurt too much, while boost::filesystem is quite a large dependency.
+
+License
+-------
+
+Pathie is BSD-licensed; see the file “LICENSE” for the exact license
+conditions.