From 1fd5e000ace55b323124c7e556a7a864b972a5c4 Mon Sep 17 00:00:00 2001
From: Christopher Faylor
Date: Thu, 17 Feb 2000 19:38:33 +0000
Subject: import winsup-2000-02-17 snapshot

---
 winsup/doc/overview2.sgml | 307 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 307 insertions(+)
 create mode 100644 winsup/doc/overview2.sgml

(limited to 'winsup/doc/overview2.sgml')

diff --git a/winsup/doc/overview2.sgml b/winsup/doc/overview2.sgml
new file mode 100644
index 000000000..9fad7cebe
--- /dev/null
+++ b/winsup/doc/overview2.sgml
@@ -0,0 +1,307 @@
+Expectations for UNIX Programmers
+
+Developers coming from a UNIX background will find a set of utilities
+they are already comfortable using, including a working UNIX shell. The
+compiler tools are the standard GNU compilers most people will have previously
+used under UNIX, only ported to the Windows host. Programmers wishing to port
+UNIX software to Windows NT or 9x will find that the Cygwin library provides
+an easy way to port many UNIX packages, with only minimal source code
+changes.
+
+
+Expectations for Windows Programmers
+Developers coming from a Windows background will find a set of tools capable
+of writing console or GUI executables that rely on the Microsoft Win32 API.
+The linker and dlltool utility may be used to write Windows Dynamically Linked
+Libraries (DLLs). The resource compiler "windres" is also provided with the
+native Windows GNUPro tools. All tools may be used from the Microsoft command
+line prompt, with full support for normal Windows pathnames.
+
+
+Introduction When a binary linked
+against the library is executed, the Cygwin DLL is loaded into the
+application's text segment. Because we are trying to emulate a UNIX kernel
+which needs access to all processes running under it, the first Cygwin DLL to
+run creates shared memory areas that other processes using separate instances
+of the DLL can access. This is used to keep track of open file descriptors and
+assist fork and exec, among other purposes. In addition to the shared memory
+regions, every process also has a per_process structure that contains
+information such as process id, user id, signal masks, and other similar
+process-specific information.
+
+The DLL is implemented using the Win32 API, which allows it to run on all
+Win32 hosts. Because processes run under the standard Win32 subsystem, they
+can access both the UNIX compatibility calls provided by Cygwin as well as
+any of the Win32 API calls. This gives the programmer complete flexibility in
+designing the structure of their program in terms of the APIs used. For
+example, they could write a Win32-specific GUI using Win32 API calls on top of
+a UNIX back-end that uses Cygwin.
+
+Early on in the development process, we made the important design
+decision that it would not be necessary to strictly adhere to existing UNIX
+standards like POSIX.1 if it was not possible or if it would significantly
+diminish the usability of the tools on the Win32 platform. In many cases, an
+environment variable can be set to override the default behavior and force
+standards compliance.
+
+
+Supporting both Windows NT and 9x
+While Windows 95 and Windows 98 are similar enough to each other that we
+can safely ignore the distinction when implementing Cygwin, Windows NT is an
+extremely different operating system. For this reason, whenever the DLL is
+loaded, the library checks which operating system is active so that it can act
+accordingly.
+
+In some cases, the Win32 API is only different for
+historical reasons. In this situation, the same basic functionality is
+available under Windows 9x and NT but the method used to gain this
+functionality differs. A trivial example: in our implementation of
+uname, the library examines the sysinfo.dwProcessorType structure
+member to figure out the processor type under Windows 9x. This field
+is not supported in NT, which has its own operating system-specific
+structure member called sysinfo.wProcessorLevel.
+
+Other differences between NT and 9x are much more fundamental in
+nature. The best example is that only NT provides a security model.
+
+
+Permissions and Security
+Windows NT includes a sophisticated security model based on Access
+Control Lists (ACLs). Although some modern UNIX operating systems include
+support for ACLs, Cygwin maps Win32 file ownership and permissions to the
+more standard, older UNIX model. The chmod call maps UNIX-style permissions
+back to the Win32 equivalents. Because many programs expect to be able to find
+the /etc/passwd and /etc/group files, we provide utilities that can be used to
+construct them from the user and group information provided by the operating
+system.
+
+Under Windows NT, the administrator is permitted to chown files. There
+is currently no mechanism to support the setuid concept or API call. Although
+we hope to support this functionality at some point in the future, in practice,
+the programs we have ported have not needed it.
+
+Under Windows 9x, the situation is considerably different. Since a
+security model is not provided, Cygwin fakes file ownership by making all
+files look like they are owned by a default user and group id. As under NT,
+file permissions can still be determined by examining their read/write/execute
+status. Rather than return an unimplemented error, under Windows 9x, the
+chown call succeeds immediately without actually performing any action
+whatsoever. This is appropriate since essentially all users jointly own the
+files when no concept of file ownership exists.
+
+It is important that we discuss the implications of our "kernel" using
+shared memory areas to store information about Cygwin processes. Because
+these areas are not yet protected in any way, in principle a malicious user
+could modify them to cause unexpected behavior in Cygwin processes. While
+this is not a new problem under Windows 9x (because of the lack of operating
+system security), it does constitute a security hole under Windows NT.
+This is because one user could affect the Cygwin programs run by
+another user by changing the shared memory information in ways that
+they could not in a more typical WinNT program. For this reason, it
+is not appropriate to use Cygwin in high-security applications. In
+practice, this will not be a major problem for most uses of the
+library.
+
+
+File Access Cygwin supports
+both Win32- and POSIX-style paths, using either forward or back slashes as the
+directory delimiter. Paths coming into the DLL are translated from Win32 to
+POSIX as needed. As a result, the library believes that the file system is a
+POSIX-compliant one, translating paths back to Win32 paths whenever it calls a
+Win32 API function. UNC pathnames (starting with two slashes) are
+supported.
+
+The layout of this POSIX view of the Windows file system space is stored
+in the Windows registry. While the slash ('/') directory points to the system
+partition by default, this is easy to change with the Cygwin mount utility.
+In addition to selecting the slash partition, it allows mounting arbitrary
+Win32 paths into the POSIX file system space. Many people use the utility to
+mount each drive letter under the slash partition (e.g. C:\ to /c, D:\ to /d,
+etc...).
+
+The library exports several Cygwin-specific functions that can be used
+by external programs to convert a path or path list from Win32 to POSIX or vice
+versa. Shell scripts and Makefiles cannot call these functions directly.
+Instead, they can do the same path translations by executing the cygpath
+utility program that we provide with Cygwin.
+
+Win32 file systems are case preserving but case insensitive. Cygwin
+does not currently support case distinction because, in practice, few UNIX
+programs actually rely on it. While we could mangle file names to support case
+distinction, this would add unnecessary overhead to the library and make it
+more difficult for non-Cygwin applications to access those files.
+
+Symbolic links are emulated by files containing a magic cookie followed
+by the path to which the link points. They are marked with the System
+attribute so that only files with that attribute have to be read to determine
+whether or not the file is a symbolic link. Hard links are fully supported
+under Windows NT on NTFS file systems. On a FAT file system, the call falls
+back to simply copying the file, a strategy that works in many cases.
+
+The inode number for a file is calculated by hashing its full Win32 path.
+The inode number generated by the stat call always matches the one returned in
+d_ino of the dirent structure. It is worth noting that the number produced by
+this method is not guaranteed to be unique. However, we have not found this to
+be a significant problem because of the low probability of generating a
+duplicate inode number.
+
+
+Text Mode vs. Binary Mode
+Interoperability with other Win32 programs such as text editors was
+critical to the success of the port of the development tools. Most Cygnus
+customers upgrading from the older DOS-hosted toolchains expected the new
+Win32-hosted ones to continue to work with their old development
+sources.
+
+Unfortunately, UNIX and Win32 use different end-of-line terminators in
+text files. Consequently, carriage-return newlines have to be translated on
+the fly by Cygwin into a single newline when reading in text mode. The
+control-z character is interpreted as a valid end-of-file character for a
+similar reason.
+
+This solution addresses the compatibility requirement at the expense of
+violating the POSIX standard that states that text and binary mode will be
+identical. Consequently, processes that attempt to lseek through text files can
+no longer rely on the number of bytes read as an accurate indicator of position
+in the file. For this reason, the CYGWIN environment variable can be
+set to override this behavior.
+
+
+ANSI C Library
+We chose to include
+Cygnus' own existing ANSI C library
+"newlib" as part of the library, rather than write all of the lib C
+and math calls from scratch. Newlib is a BSD-derived ANSI C library,
+previously only used by cross-compilers for embedded systems
+development.
+
+The reuse of existing free implementations of such things
+as the glob, regexp, and getopt libraries saved us considerable
+effort. In addition, Cygwin uses Doug Lea's free malloc
+implementation that successfully balances speed and compactness. The
+library accesses the malloc calls via an exported function pointer.
+This makes it possible for a Cygwin process to provide its own
+malloc if it so desires.
+
+
+Process Creation
+The fork call in Cygwin is particularly interesting because it
+does not map well on top of the Win32 API. This makes it very
+difficult to implement correctly. Currently, the Cygwin fork is a
+non-copy-on-write implementation similar to what was present in early
+flavors of UNIX.
+
+The first thing that happens when a parent process
+forks a child process is that the parent initializes a space in the
+Cygwin process table for the child. It then creates a suspended
+child process using the Win32 CreateProcess call. Next, the parent
+process calls setjmp to save its own context and sets a pointer to
+this in a Cygwin shared memory area (shared among all Cygwin
+tasks). It then fills in the child's .data and .bss sections by
+copying from its own address space into the suspended child's address
+space. After the child's address space is initialized, the child is
+run while the parent waits on a mutex. The child discovers it has
+been forked and longjumps using the saved jump buffer. The child then
+sets the mutex the parent is waiting on and blocks on another mutex.
+This is the signal for the parent to copy its stack and heap into the
+child, after which it releases the mutex the child is waiting on and
+returns from the fork call. Finally, the child wakes from blocking on
+the last mutex, recreates any memory-mapped areas passed to it via the
+shared area, and returns from fork itself.
+
+While we have some
+ideas as to how to speed up our fork implementation by reducing the
+number of context switches between the parent and child process, fork
+will almost certainly always be inefficient under Win32. Fortunately,
+in most circumstances the spawn family of calls provided by Cygwin
+can be substituted for a fork/exec pair with only a little effort.
+These calls map cleanly on top of the Win32 API. As a result, they
+are much more efficient. Changing the compiler's driver program to
+call spawn instead of fork was a trivial change and increased
+compilation speeds by twenty to thirty percent in our
+tests.
+
+However, spawn and exec present their own set of
+difficulties. Because there is no way to do an actual exec under
+Win32, Cygwin has to invent its own Process IDs (PIDs). As a
+result, when a process performs multiple exec calls, there will be
+multiple Windows PIDs associated with a single Cygwin PID. In some
+cases, stubs of each of these Win32 processes may linger, waiting for
+their exec'd Cygwin process to exit.
+
+
+Signals
+When
+a Cygwin process starts, the library starts a secondary thread for
+use in signal handling. This thread waits for Windows events used to
+pass signals to the process. When a process notices it has a signal,
+it scans its signal bitmask and handles the signal in the appropriate
+fashion.
+
+Several complications in the implementation arise from the
+fact that the signal handler operates in the same address space as the
+executing program. The immediate consequence is that Cygwin system
+functions are interruptible unless special care is taken to avoid
+this. We go to some lengths to prevent the sig_send function that
+sends signals from being interrupted. In the case of a process
+sending a signal to another process, we place a mutex around sig_send
+such that sig_send will not be interrupted until it has completely
+finished sending the signal.
+
+In the case of a process sending
+itself a signal, we use a separate semaphore/event pair instead of the
+mutex. sig_send starts by resetting the event and incrementing the
+semaphore that flags the signal handler to process the signal. After
+the signal is processed, the signal handler signals the event that it
+is done. This process keeps intraprocess signals synchronous, as
+required by POSIX.
+
+Most standard UNIX signals are provided. Job
+control works as expected in shells that support
+it.
+
+
+Sockets
+Socket-related calls in Cygwin simply
+call the functions by the same name in Winsock, Microsoft's
+implementation of Berkeley sockets. Only a few changes were needed to
+match the expected UNIX semantics - one of the most troublesome
+differences was that Winsock must be initialized before the first
+socket function is called. As a result, Cygwin has to perform this
+initialization when appropriate. In order to support sockets across
+fork calls, child processes initialize Winsock if any inherited file
+descriptor is a socket.
+
+Unfortunately, implicitly loading DLLs
+at process startup is usually a slow affair. Because many processes
+do not use sockets, Cygwin explicitly loads the Winsock DLL the
+first time it calls the Winsock initialization routine. This single
+change sped up GNU configure times by thirty
+percent.
+
+
+Select
+The UNIX select function is another
+call that does not map cleanly on top of the Win32 API. Much to our
+dismay, we discovered that the Win32 select in Winsock only worked on
+socket handles. Our implementation allows select to function normally
+when given different types of file descriptors (sockets, pipes,
+handles, and a custom /dev/windows Windows messages
+pseudo-device).
+
+Upon entry into the select function, the first
+operation is to sort the file descriptors into the different types.
+There are then two cases to consider. The simple case is when at
+least one file descriptor is a type that is always known to be ready
+(such as a disk file). In that case, select returns immediately as
+soon as it has polled each of the other types to see if they are
+ready. The more complex case involves waiting for socket or pipe file
+descriptors to be ready. This is accomplished by the main thread
+suspending itself, after starting one thread for each type of file
+descriptor present. Each thread polls the file descriptors of its
+respective type with the appropriate Win32 API call. As soon as a
+thread identifies a ready descriptor, that thread signals the main
+thread to wake up. This case is now the same as the first one since
+we know at least one descriptor is ready. So select returns, after
+polling all of the file descriptors one last time.
+
-- 
cgit v1.2.3
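The select strategy described in the Select section (sort the descriptors by type, return at once if any type is always ready, otherwise start one polling thread per type and sleep until one of them reports) can be illustrated with a small language-neutral sketch. This is not the Cygwin source: the names `emulated_select`, `ALWAYS_READY`, and the `poll(fd, kind)` callback are hypothetical, the callback merely stands in for whatever per-type Win32 call the real library uses, and timeouts and write/exception sets are left out.

```python
import threading
from collections import defaultdict

# Assumption: descriptor kinds that never block, such as disk files.
ALWAYS_READY = {"file"}


def emulated_select(descriptors, poll, interval=0.01):
    """Return the ready descriptors among (fd, kind) pairs.

    poll(fd, kind) is a hypothetical non-blocking readiness check that
    stands in for the per-type Win32 call.
    """
    if not descriptors:
        return []

    # Step 1: sort the descriptors into their types.
    by_kind = defaultdict(list)
    for fd, kind in descriptors:
        by_kind[kind].append(fd)

    def poll_everything():
        return [fd for fd, kind in descriptors
                if kind in ALWAYS_READY or poll(fd, kind)]

    # Simple case: some descriptor is always ready, so poll once and return.
    if any(kind in ALWAYS_READY for kind in by_kind):
        return poll_everything()

    # Complex case: one watcher thread per type; the main thread sleeps
    # until any watcher reports a ready descriptor.
    wake = threading.Event()

    def watch(kind, fds):
        while not wake.is_set():
            if any(poll(fd, kind) for fd in fds):
                wake.set()
                return
            wake.wait(interval)

    threads = [threading.Thread(target=watch, args=item, daemon=True)
               for item in by_kind.items()]
    for t in threads:
        t.start()
    wake.wait()
    for t in threads:
        t.join()

    # Now equivalent to the simple case: poll everything one last time.
    return poll_everything()
```

Calling `emulated_select([(3, "socket"), (4, "pipe")], poll)` blocks until `poll` reports one of the two descriptors as ready, then returns every descriptor that is ready at that moment. The final full poll mirrors the text: once a watcher has woken the main thread, the complex case reduces to the simple one.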