cygwin.com/git/newlib-cygwin.git
author Christopher Faylor <me@cgf.cx> 2000-02-17 22:38:33 +0300
committer Christopher Faylor <me@cgf.cx> 2000-02-17 22:38:33 +0300
commit 1fd5e000ace55b323124c7e556a7a864b972a5c4
tree dc4fcf1e5e22a040716ef92c496b8d94959b2baa /winsup/doc/textbinary.sgml
parent 369d8a8fd5e887eca547bf34bccfdf755c9e5397
import winsup-2000-02-17 snapshot
Diffstat (limited to 'winsup/doc/textbinary.sgml')
-rw-r--r-- winsup/doc/textbinary.sgml 181
1 files changed, 181 insertions, 0 deletions
diff --git a/winsup/doc/textbinary.sgml b/winsup/doc/textbinary.sgml
new file mode 100644
index 000000000..cf6fc1b36
--- /dev/null
+++ b/winsup/doc/textbinary.sgml
@@ -0,0 +1,181 @@
+<sect1 id="using-textbinary"><title>Text and Binary modes</title>
+
+<sect2> <title>The Issue</title>
+
+<para>On a UNIX system, when an application reads from a file it gets
+exactly what's in the file on disk and the converse is true for writing.
+The situation is different in the DOS/Windows world where a file can
+be opened in one of two modes, binary or text. In the binary mode the
+system behaves exactly as in UNIX. However, in text mode there are
+major differences:</para>
+<OrderedList Numeration="Loweralpha" Spacing="Compact">
+<listitem>
+<para>
+On writing in text mode, a NL (\n, ^J) is transformed into the
+sequence CR (\r, ^M) NL.</para>
+</listitem>
+<listitem>
+<para>
+On reading in text mode, a CR followed by an NL is deleted and a ^Z
+character signals the end of file.</para>
+</listitem>
+</OrderedList>
+
+<para>This can wreak havoc with the seek/fseek calls since the number
+of bytes actually in the file may differ from that seen by the
+application.</para>
+
+<para>The mode can be specified explicitly as explained in the Programming
+section below. In an ideal DOS/Windows world, all programs using lines as
+records (such as <command>bash</command>, <command>make</command>,
+<command>sed</command> ...) would open files (and change the mode of their
+standard input and output) as text. All other programs (such as
+<command>cat</command>, <command>cmp</command>, <command>tr</command> ...)
+would use binary mode. In practice with Cygwin, programs that deal
+explicitly with object files specify binary mode (this is the case for
+<command>od</command>, which is helpful to diagnose CR problems). Most
+other programs (such as <command>cat</command>, <command>cmp</command>,
+<command>tr</command>) use the default mode.</para>
+
+</sect2>
+
+<sect2><title>The default Cygwin behavior</title>
+
+<para>The Cygwin system gives us some flexibility in deciding how files
+are to be opened when the mode is not specified explicitly.
+The rules are evolving; this section gives the design goals.</para>
+<OrderedList Numeration="Loweralpha">
+<listitem>
+<para>If the file appears to reside on a file system that is mounted
+(i.e. if its pathname starts with a directory displayed by
+<command>mount</command>), then the default is specified by the mount
+flag. If the file is a symbolic link, the mode of the target file system
+applies.</para>
+</listitem>
+<listitem>
+<para>If the file appears to reside on a file system that is not mounted
+(as can happen when the path contains a drive letter), the default is text.
+</para>
+</listitem>
+<listitem>
+<para>Pipes and non-file devices are opened in binary mode,
+except if the <EnVar>CYGWIN</EnVar> environment variable contains
+<literal>nobinmode</literal>.</para>
+<warning><Title>Warning!</Title><para>In b20.1 of 12/98, a file will be opened
+in binary mode if any of the following conditions hold:</para>
+<OrderedList Numeration="arabic" Spacing="Compact">
+<listitem><para>binary mode is specified in the open call</para>
+</listitem>
+<listitem><para><envar>CYGWIN</envar> contains <literal>binmode</literal></para>
+</listitem>
+<listitem><para>the file resides in a binary mounted partition</para>
+</listitem>
+<listitem><para>the file is not a disk file</para>
+</listitem>
+</OrderedList>
+</warning>
+</listitem>
+
+<listitem>
+<para>When a Cygwin program is launched by a shell, its standard input,
+output and error are in binary mode if the <envar>CYGWIN</envar> variable
+contains <literal>tty</literal>, else in text mode, except if they are piped
+or redirected.</para>
+<para> When redirecting, the Cygwin shells use rules (a-c). For
+these shells the relevant value of <envar>CYGWIN</envar> is that at the time
+the shell was launched and not that at the time the program is executed.
+Non-Cygwin shells always pipe and redirect with binary mode. With
+non-Cygwin shells the commands <command> cat filename | program </command>
+and <command> program &lt; filename </command> are not equivalent when
+<filename>filename</filename> is on a text-mounted partition. </para>
+</listitem>
+</OrderedList>
+</sect2>
+
+<sect2><title>Example</title>
+<para>To illustrate the various rules, we provide scripts to delete CRs
+from files by using the <command>tr</command> program, which can only write
+to standard output.
+The script</para>
+<screen>
+#!/bin/sh
+# Remove \r from the file given as argument
+tr -d '\r' < "$1" > "$1".nocr
+</screen>
+<para> will not work on a text-mounted system because the \r will be
+reintroduced on writing. However, scripts such as </para>
+<screen>
+#!/bin/sh
+# Remove \r from the file given as argument
+tr -d '\r' < "$1" | gzip | gunzip > "$1".nocr
+</screen>
+<para>and the .bat file</para>
+<screen>
+REM Remove \r from the file given as argument
+@echo off
+tr -d \r < %1 > %1.nocr
+</screen>
+<para> work fine. In the first case (assuming the pipes are binary)
+we rely on <command>gunzip</command> to set its output to binary mode,
+possibly overriding the mode used by the shell.
+In the second case we rely on the DOS shell to redirect in binary mode.
+</para>
+</sect2>
+
+<sect2><title>Binary or text?</title>
+
+<para>UNIX programs that have been written for maximum portability
+will know the difference between text and binary files and act
+appropriately under Cygwin. For those programs, the text mode default
+is a good choice. Programs included in official Cygnus distributions
+should work well in the default mode. </para>
+
+<para>Text mode makes it much easier to mix files between Cygwin and
+Windows programs, since Windows programs will usually use the CRLF
+format. Unfortunately, you may still have some problems with text
+mode. First, some of the utilities included with Cygwin do not yet
+specify binary mode when they should, e.g. <command>cat</command> will
+not work with binary files (input will stop at ^Z, CRs will be
+introduced in the output). Second, you will introduce CRs in text
+files you write, which can cause problems when moving them back to a
+UNIX system. </para>
+
+<para>If you are mounting a remote file system from a UNIX machine,
+or moving files back and forth to a UNIX machine, you may want to
+access the files in binary mode. The text files found there will normally
+be in UNIX NL format, and you would want any files put there by Cygwin
+programs to be stored in a format understood by UNIX.
+Be sure to remove CRs from all Makefiles and
+shell scripts and make sure that you only edit the files with
+DOS/Windows editors that can cope with and preserve NL terminated lines.
+</para>
+
+<para>Note that you can decide this on a disk-by-disk basis (for
+example, mounting local disks in text mode and network disks in binary
+mode). You can also partition a disk, for example by mounting
+<filename>c:</filename> in text mode, and <filename>c:\home</filename>
+in binary mode.</para>
+
+</sect2>
+
+<sect2><title>Programming</title>
+
+<para>In the <function>open()</function> function call, binary mode can be
+specified with the flag <literal>O_BINARY</literal> and text mode with
+<literal>O_TEXT</literal>. These symbols are defined in
+<filename>fcntl.h</filename>.</para>
+
+<para>In the <function>fopen()</function> function call, binary mode can be
+specified by adding a <literal>b</literal> to the mode string. There is no
+direct way to specify text mode.</para>
+
+<para>The mode of a file can be changed by the call
+<function>setmode(fd,mode)</function> where <literal>fd</literal> is a file
+descriptor (an integer) and <literal>mode</literal> is
+<literal>O_BINARY</literal> or <literal>O_TEXT</literal>. The function
+returns <literal>O_BINARY</literal> or <literal>O_TEXT</literal> depending
+on the mode before the call, and <literal>EOF</literal> on error.</para>
+
+</sect2>
+
+</sect1>
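
The Programming section of the patch above names three calls: open() with O_BINARY, fopen() with a "b" in the mode string, and setmode(fd,mode). The following is a minimal C sketch of how they might be used together; it is not part of the commit. The helper names (write_exact, read_exact, force_binary) are hypothetical. The fallback definition of O_BINARY for systems that lack it and the use of <io.h> for the setmode() declaration on Cygwin are assumptions rather than statements from the patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef __CYGWIN__
#include <io.h>      /* assumption: declares setmode() on Cygwin */
#endif

/* Assumption: systems with no text/binary distinction may not define
   O_BINARY, so treat it as a no-op flag there. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: write len bytes to path through open() with
   O_BINARY, so no NL -> CR NL translation is applied on the way out.
   Returns 0 on success, -1 on error. */
int write_exact(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) {            /* error, or no progress */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}

/* Hypothetical helper: read up to cap bytes from path through fopen()
   with "b" in the mode string, so CRs and ^Z reach the caller unchanged.
   Returns the number of bytes read, or -1 on error. */
long read_exact(const char *path, void *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(buf, 1, cap, fp);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : (long)n;
}

/* Hypothetical helper: switch an already open descriptor to binary mode.
   On Cygwin this calls setmode() and returns the previous mode; elsewhere
   there is nothing to switch, so it simply reports O_BINARY. */
int force_binary(int fd)
{
#ifdef __CYGWIN__
    return setmode(fd, O_BINARY);
#else
    (void)fd;
    return O_BINARY;
#endif
}
```

A filter such as the tr example could call force_binary(fileno(stdout)) before writing, so that its output does not depend on the mode chosen by the shell or the mount. A round trip through write_exact and read_exact should return the same bytes, CRs and ^Z included, on both text and binary mounts.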