Welcome to mirror list, hosted at ThFree Co, Russian Federation.

cygwin.com/git/newlib-cygwin.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorChristopher Faylor <me@cgf.cx>2000-02-17 22:38:33 +0300
committerChristopher Faylor <me@cgf.cx>2000-02-17 22:38:33 +0300
commit1fd5e000ace55b323124c7e556a7a864b972a5c4 (patch)
treedc4fcf1e5e22a040716ef92c496b8d94959b2baa /winsup/doc/overview2.sgml
parent369d8a8fd5e887eca547bf34bccfdf755c9e5397 (diff)
import winsup-2000-02-17 snapshot
Diffstat (limited to 'winsup/doc/overview2.sgml')
-rw-r--r--winsup/doc/overview2.sgml307
1 files changed, 307 insertions, 0 deletions
diff --git a/winsup/doc/overview2.sgml b/winsup/doc/overview2.sgml
new file mode 100644
index 000000000..9fad7cebe
--- /dev/null
+++ b/winsup/doc/overview2.sgml
@@ -0,0 +1,307 @@
+<sect1 id="ov-ex-unix"><title>Expectations for UNIX Programmers</title>
+
+<para>Developers coming from a UNIX background will find a set of utilities
+they are already comfortable using, including a working UNIX shell. The
+compiler tools are the standard GNU compilers most people will have previously
+used under UNIX, only ported to the Windows host. Programmers wishing to port
+UNIX software to Windows NT or 9x will find that the Cygwin library provides
+an easy way to port many UNIX packages, with only minimal source code
+changes.</para>
+</sect1>
+
+<sect1 id="ov-ex-win"><title>Expectations for Windows Programmers</title>
+<para>Developers coming from a Windows background will find a set of tools capable
+of writing console or GUI executables that rely on the Microsoft Win32 API.
+The linker and dlltool utility may be used to write Windows Dynamically Linked
+Libraries (DLLs). The resource compiler "windres" is also provided with the
+native Windows GNUPro tools. All tools may be used from the Microsoft command
+line prompt, with full support for normal Windows pathnames.</para>
+</sect1>
+
+<sect2 id="ov-hi-intro"><title>Introduction</title> <para>When a binary linked
+against the library is executed, the Cygwin DLL is loaded into the
+application's text segment. Because we are trying to emulate a UNIX kernel
+which needs access to all processes running under it, the first Cygwin DLL to
+run creates shared memory areas that other processes using separate instances
+of the DLL can access. This is used to keep track of open file descriptors and
+assist fork and exec, among other purposes. In addition to the shared memory
+regions, every process also has a per_process structure that contains
+information such as process id, user id, signal masks, and other similar
+process-specific information.</para>
+
+<para>The DLL is implemented using the Win32 API, which allows it to run on all
+Win32 hosts. Because processes run under the standard Win32 subsystem, they
+can access both the UNIX compatibility calls provided by Cygwin as well as
+any of the Win32 API calls. This gives the programmer complete flexibility in
+designing the structure of their program in terms of the APIs used. For
+example, they could write a Win32-specific GUI using Win32 API calls on top of
+a UNIX back-end that uses Cygwin.</para>
+
+<para>Early on in the development process, we made the important design
+decision that it would not be necessary to strictly adhere to existing UNIX
+standards like POSIX.1 if it was not possible or if it would significantly
+diminish the usability of the tools on the Win32 platform. In many cases, an
+environment variable can be set to override the default behavior and force
+standards compliance.</para>
+</sect2>
+
+<sect2 id="ov-hi-win9xnt"><title>Supporting both Windows NT and 9x</title>
+<para>While Windows 95 and Windows 98 are similar enough to each other that we
+can safely ignore the distinction when implementing Cygwin, Windows NT is an
+extremely different operating system. For this reason, whenever the DLL is
+loaded, the library checks which operating system is active so that it can act
+accordingly.</para>
+
+<para>In some cases, the Win32 API is only different for
+historical reasons. In this situation, the same basic functionality is
+available under Windows 9x and NT but the method used to gain this
+functionality differs. A trivial example: in our implementation of
+uname, the library examines the sysinfo.dwProcessorType structure
+member to figure out the processor type under Windows 9x. This field
+is not supported in NT, which has its own operating system-specific
+structure member called sysinfo.wProcessorLevel.</para>
+
+<para>Other differences between NT and 9x are much more fundamental in
+nature. The best example is that only NT provides a security model.</para>
+</sect2>
+
+<sect2 id="ov-hi-perm"><title>Permissions and Security</title>
+<para>Windows NT includes a sophisticated security model based on Access
+Control Lists (ACLs). Although some modern UNIX operating systems include
+support for ACLs, Cygwin maps Win32 file ownership and permissions to the
+more standard, older UNIX model. The chmod call maps UNIX-style permissions
+back to the Win32 equivalents. Because many programs expect to be able to find
+the /etc/passwd and /etc/group files, we provide utilities that can be used to
+construct them from the user and group information provided by the operating
+system.</para>
+
+<para>Under Windows NT, the administrator is permitted to chown files. There
+is currently no mechanism to support the setuid concept or API call. Although
+we hope to support this functionality at some point in the future, in practice,
+the programs we have ported have not needed it.</para>
+
+<para>Under Windows 9x, the situation is considerably different. Since a
+security model is not provided, Cygwin fakes file ownership by making all
+files look like they are owned by a default user and group id. As under NT,
+file permissions can still be determined by examining their read/write/execute
+status. Rather than return an unimplemented error, under Windows 9x, the
+chown call succeeds immediately without actually performing any action
+whatsoever. This is appropriate since essentially all users jointly own the
+files when no concept of file ownership exists.</para>
+
+<para>It is important that we discuss the implications of our "kernel" using
+shared memory areas to store information about Cygwin processes. Because
+these areas are not yet protected in any way, in principle a malicious user
+could modify them to cause unexpected behavior in Cygwin processes. While
+this is not a new problem under Windows 9x (because of the lack of operating
+system security), it does constitute a security hole under Windows NT.
+This is because one user could affect the Cygwin programs run by
+another user by changing the shared memory information in ways that
+they could not in a more typical WinNT program. For this reason, it
+is not appropriate to use Cygwin in high-security applications. In
+practice, this will not be a major problem for most uses of the
+library.</para>
+</sect2>
+
+<sect2 id="ov-hi-files"><title>File Access</title> <para>Cygwin supports
+both Win32- and POSIX-style paths, using either forward or back slashes as the
+directory delimiter. Paths coming into the DLL are translated from Win32 to
+POSIX as needed. As a result, the library believes that the file system is a
+POSIX-compliant one, translating paths back to Win32 paths whenever it calls a
+Win32 API function. UNC pathnames (starting with two slashes) are
+supported.</para>
+
+<para>The layout of this POSIX view of the Windows file system space is stored
+in the Windows registry. While the slash ('/') directory points to the system
+partition by default, this is easy to change with the Cygwin mount utility.
+In addition to selecting the slash partition, it allows mounting arbitrary
+Win32 paths into the POSIX file system space. Many people use the utility to
+mount each drive letter under the slash partition (e.g. C:\ to /c, D:\ to /d,
+etc...).</para>
+
+<para>The library exports several Cygwin-specific functions that can be used
+by external programs to convert a path or path list from Win32 to POSIX or vice
+versa. Shell scripts and Makefiles cannot call these functions directly.
+Instead, they can do the same path translations by executing the cygpath
+utility program that we provide with Cygwin.</para>
+
+<para>Win32 file systems are case preserving but case insensitive. Cygwin
+does not currently support case distinction because, in practice, few UNIX
+programs actually rely on it. While we could mangle file names to support case
+distinction, this would add unnecessary overhead to the library and make it
+more difficult for non-Cygwin applications to access those files.</para>
+
+<para>Symbolic links are emulated by files containing a magic cookie followed
+by the path to which the link points. They are marked with the System
+attribute so that only files with that attribute have to be read to determine
+whether or not the file is a symbolic link. Hard links are fully supported
+under Windows NT on NTFS file systems. On a FAT file system, the call falls
+back to simply copying the file, a strategy that works in many cases.</para>
+
+<para>The inode number for a file is calculated by hashing its full Win32 path.
+The inode number generated by the stat call always matches the one returned in
+d_ino of the dirent structure. It is worth noting that the number produced by
+this method is not guaranteed to be unique. However, we have not found this to
+be a significant problem because of the low probability of generating a
+duplicate inode number.</para>
+</sect2>
+
+<sect2 id="ov-hi-textvsbinary"><title>Text Mode vs. Binary Mode</title>
+<para>Interoperability with other Win32 programs such as text editors was
+critical to the success of the port of the development tools. Most Cygnus
+customers upgrading from the older DOS-hosted toolchains expected the new
+Win32-hosted ones to continue to work with their old development
+sources.</para>
+
+<para>Unfortunately, UNIX and Win32 use different end-of-line terminators in
+text files. Consequently, carriage-return newlines have to be translated on
+the fly by Cygwin into a single newline when reading in text mode. The
+control-z character is interpreted as a valid end-of-file character for a
+similar reason.</para>
+
+<para>This solution addresses the compatibility requirement at the expense of
+violating the POSIX standard that states that text and binary mode will be
+identical. Consequently, processes that attempt to lseek through text files can
+no longer rely on the number of bytes read as an accurate indicator of position
+in the file. For this reason, the CYGWIN environment variable can be
+set to override this behavior.</para>
+</sect2>
+
+<sect2 id="ov-hi-ansiclib"><title>ANSI C Library</title>
+<para>We chose to include
+Cygnus' own existing ANSI C library
+"newlib" as part of the library, rather than write all of the lib C
+and math calls from scratch. Newlib is a BSD-derived ANSI C library,
+previously only used by cross-compilers for embedded systems
+development.</para>
+
+<para>The reuse of existing free implementations of such things
+as the glob, regexp, and getopt libraries saved us considerable
+effort. In addition, Cygwin uses Doug Lea's free malloc
+implementation that successfully balances speed and compactness. The
+library accesses the malloc calls via an exported function pointer.
+This makes it possible for a Cygwin process to provide its own
+malloc if it so desires.</para>
+</sect2>
+
+<sect2 id="ov-hi-process"><title>Process Creation</title>
+<para>The fork call in Cygwin is particularly interesting because it
+does not map well on top of the Win32 API. This makes it very
+difficult to implement correctly. Currently, the Cygwin fork is a
+non-copy-on-write implementation similar to what was present in early
+flavors of UNIX.</para>
+
+<para>The first thing that happens when a parent process
+forks a child process is that the parent initializes a space in the
+Cygwin process table for the child. It then creates a suspended
+child process using the Win32 CreateProcess call. Next, the parent
+process calls setjmp to save its own context and sets a pointer to
+this in a Cygwin shared memory area (shared among all Cygwin
+tasks). It then fills in the child's .data and .bss sections by
+copying from its own address space into the suspended child's address
+space. After the child's address space is initialized, the child is
+run while the parent waits on a mutex. The child discovers it has
+been forked and longjumps using the saved jump buffer. The child then
+sets the mutex the parent is waiting on and blocks on another mutex.
+This is the signal for the parent to copy its stack and heap into the
+child, after which it releases the mutex the child is waiting on and
+returns from the fork call. Finally, the child wakes from blocking on
+the last mutex, recreates any memory-mapped areas passed to it via the
+shared area, and returns from fork itself.</para>
+
+<para>While we have some
+ideas as to how to speed up our fork implementation by reducing the
+number of context switches between the parent and child process, fork
+will almost certainly always be inefficient under Win32. Fortunately,
+in most circumstances the spawn family of calls provided by Cygwin
+can be substituted for a fork/exec pair with only a little effort.
+These calls map cleanly on top of the Win32 API. As a result, they
+are much more efficient. Changing the compiler's driver program to
+call spawn instead of fork was a trivial change and increased
+compilation speeds by twenty to thirty percent in our
+tests.</para>
+
+<para>However, spawn and exec present their own set of
+difficulties. Because there is no way to do an actual exec under
+Win32, Cygwin has to invent its own Process IDs (PIDs). As a
+result, when a process performs multiple exec calls, there will be
+multiple Windows PIDs associated with a single Cygwin PID. In some
+cases, stubs of each of these Win32 processes may linger, waiting for
+their exec'd Cygwin process to exit.</para>
+</sect2>
+
+<sect2 id="ov-hi-signals"><title>Signals</title>
+<para>When
+a Cygwin process starts, the library starts a secondary thread for
+use in signal handling. This thread waits for Windows events used to
+pass signals to the process. When a process notices it has a signal,
+it scans its signal bitmask and handles the signal in the appropriate
+fashion.</para>
+
+<para>Several complications in the implementation arise from the
+fact that the signal handler operates in the same address space as the
+executing program. The immediate consequence is that Cygwin system
+functions are interruptible unless special care is taken to avoid
+this. We go to some lengths to prevent the sig_send function that
+sends signals from being interrupted. In the case of a process
+sending a signal to another process, we place a mutex around sig_send
+such that sig_send will not be interrupted until it has completely
+finished sending the signal.</para>
+
+<para>In the case of a process sending
+itself a signal, we use a separate semaphore/event pair instead of the
+mutex. sig_send starts by resetting the event and incrementing the
+semaphore that flags the signal handler to process the signal. After
+the signal is processed, the signal handler signals the event that it
+is done. This process keeps intraprocess signals synchronous, as
+required by POSIX.</para>
+
+<para>Most standard UNIX signals are provided. Job
+control works as expected in shells that support
+it.</para>
+</sect2>
+
+<sect2 id="ov-hi-sockets"><title>Sockets</title>
+<para>Socket-related calls in Cygwin simply
+call the functions by the same name in Winsock, Microsoft's
+implementation of Berkeley sockets. Only a few changes were needed to
+match the expected UNIX semantics - one of the most troublesome
+differences was that Winsock must be initialized before the first
+socket function is called. As a result, Cygwin has to perform this
+initialization when appropriate. In order to support sockets across
+fork calls, child processes initialize Winsock if any inherited file
+descriptor is a socket.</para>
+
+<para>Unfortunately, implicitly loading DLLs
+at process startup is usually a slow affair. Because many processes
+do not use sockets, Cygwin explicitly loads the Winsock DLL the
+first time it calls the Winsock initialization routine. This single
+change sped up GNU configure times by thirty
+percent.</para>
+</sect2>
+
+<sect2 id="ov-hi-select"><title>Select</title>
+<para>The UNIX select function is another
+call that does not map cleanly on top of the Win32 API. Much to our
+dismay, we discovered that the Win32 select in Winsock only worked on
+socket handles. Our implementation allows select to function normally
+when given different types of file descriptors (sockets, pipes,
+handles, and a custom /dev/windows Windows messages
+pseudo-device).</para>
+
+<para>Upon entry into the select function, the first
+operation is to sort the file descriptors into the different types.
+There are then two cases to consider. The simple case is when at
+least one file descriptor is a type that is always known to be ready
+(such as a disk file). In that case, select returns immediately as
+soon as it has polled each of the other types to see if they are
+ready. The more complex case involves waiting for socket or pipe file
+descriptors to be ready. This is accomplished by the main thread
+suspending itself, after starting one thread for each type of file
+descriptor present. Each thread polls the file descriptors of its
+respective type with the appropriate Win32 API call. As soon as a
+thread identifies a ready descriptor, that thread signals the main
+thread to wake up. This case is now the same as the first one since
+we know at least one descriptor is ready. So select returns, after
+polling all of the file descriptors one last time.</para>
+</sect2>