Diffstat (limited to 'winsup/cygwin/DevNotes')
-rw-r--r-- | winsup/cygwin/DevNotes | 385 |
1 files changed, 0 insertions, 385 deletions
diff --git a/winsup/cygwin/DevNotes b/winsup/cygwin/DevNotes
deleted file mode 100644
index 68d8320cd..000000000
--- a/winsup/cygwin/DevNotes
+++ /dev/null
@@ -1,385 +0,0 @@

2012-08-09 cgf-000014

So, apparently I got it somewhat right before wrt signal handling.
Checking on Linux, it appears that a signal will be sent to a thread
which can accept it. So, resurrecting and extending the "find_tls"
function is in order. This function will return the tls of any thread
which 1) is waiting for the signal with sigwait*() or 2) has the signal
unmasked.

In redoing this, it became obvious that I had the class designation
wrong for the threadlist handling, so I moved the manipulation of the
global threadlist into the cygheap, where it logically belongs.

2012-07-21 cgf-000013

These changes reflect a revamp of the "wait for signal" functionality,
which has existed in Cygwin through several signal massages.

We now create a signal event only when a thread is waiting for a signal
and arm it only for that thread. The "set_signal_arrived" function is
used to create the event and store it in a location that the caller
can reference.

I still do not handle all of the race conditions. What happens when a
signal comes in just after a WF?O (WaitForSingleObject or
WaitForMultipleObjects) call succeeds for some other purpose? I suspect
that it will arm the next WF?O call, and the subsequent call to
call_signal_handler could cause a function to fail with EINTR when it
possibly shouldn't have.

I haven't yet checked all of the test cases from the URL listed in the
previous entry.

Baby steps.

2012-06-12 cgf-000012

These changes lay the groundwork for redoing the way threads wait for
signals. The problems are shown by the test case mentioned here:

http://cygwin.com/ml/cygwin/2012-05/msg00434.html

I've known that the signal handling in threads wasn't quite right for
some time.
I lost all of my thread signal tests in the great "rm -r"
debacle of a few years ago and have been less than enthusiastic about
redoing everything (I had PCTS tests and everything). But it really is
time to redo this signal handling so that it behaves the way it is
supposed to.

This change should not introduce any new behavior; things should
continue to behave as before. The major differences are that the
arguments to cancelable_wait have changed and that cygwait now uses
cancelable_wait, so the return values from cygwait now mirror those of
cancelable_wait.

The next change will consolidate cygwait and cancelable_wait into one
cygwait function.

2012-06-02 cgf-000011

The refcnt handling was tricky to get right, but I had convinced myself
that the refcnts were always incremented/decremented under a lock.
Corinna's 2012-05-23 change to refcnt exposed a potential problem with
dup handling where the fdtab could be updated while not locked.

That should be fixed by this change but, on closer examination, there
seem to be many places where the refcnt can be updated while the fdtab
is not locked, since the default for cygheap_fdget is to not lock the
fdtab (and that should be the default: you can't have read() holding a
lock).

Since refcnt was only ever called with 1 or -1, I broke it up into two
functions but kept the Interlocked* operations. Incrementing a variable
should not be as racy as adding an arbitrary number to it, but we have
InterlockedIncrement/InterlockedDecrement for a reason, so I kept the
Interlocked operations here.

In the meantime, I'll be mulling over whether the refcnt operations are
actually safe as they are. Maybe just ensuring that they are updated
atomically is enough, since they control the destruction of an fh. If I
got the ordering of the increments and decrements right, then that
should be adequate.
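The split into separate increment and decrement helpers can be sketched in C11. Here <stdatomic.h> serves as a portable stand-in for the Win32 InterlockedIncrement/InterlockedDecrement calls the entry mentions. The fh_sketch struct and the fh_new, fh_inc_refcnt, fh_dec_refcnt and fh_release functions are hypothetical illustrations, not Cygwin's fhandler code. The sketch shows the property the entry relies on: every update is atomic, and only the caller whose decrement brings the count to zero destroys the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fhandler; only the reference count is modeled. */
struct fh_sketch {
    atomic_long usecount;  /* number of fd-table slots referring to this fh */
};

/* Create an fh holding one reference, as when it is first put in the fd table. */
static struct fh_sketch *fh_new(void)
{
    struct fh_sketch *fh = malloc(sizeof *fh);
    if (fh != NULL)
        atomic_init(&fh->usecount, 1);
    return fh;
}

/* Counterpart of InterlockedIncrement: atomically add 1, return the new count. */
static long fh_inc_refcnt(struct fh_sketch *fh)
{
    return atomic_fetch_add(&fh->usecount, 1) + 1;
}

/* Counterpart of InterlockedDecrement: atomically subtract 1, return the new count. */
static long fh_dec_refcnt(struct fh_sketch *fh)
{
    return atomic_fetch_sub(&fh->usecount, 1) - 1;
}

/* Drop one reference. Exactly one caller observes the count reaching zero,
   so only that caller frees the object. Returns true if it was freed. */
static bool fh_release(struct fh_sketch *fh)
{
    long n = fh_dec_refcnt(fh);
    assert(n >= 0);  /* releasing more references than were taken is a bug */
    if (n == 0) {
        free(fh);
        return true;
    }
    return false;
}
```

Atomic updates keep the count itself consistent. They do not, however, stop another thread from fetching the fh from an unlocked fdtab just as the last reference is dropped. That ordering question is the one this entry leaves open.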

2012-06-02 cgf-000010

<1.7.16>
- Fix emacs problem which exposed an issue with Cygwin's select() function.
  If a signal arrives while select is blocking and the program longjmps
  out of the signal handler, then threads and memory may be left hanging.
  Fixes: http://cygwin.com/ml/cygwin/2012-05/threads.html#00275
</1.7.16>

This was try #4 or #5 to get select() signal handling working right.
It's still not there, but it should now at least not leak memory or
threads.

I mucked with the interface between cygwin_select and select_stuff::wait
so that the "new" loop in select_stuff::wait() was essentially moved
into the caller. cygwin_select now uses various enum states to decide
what to do. It builds the select linked list at the beginning of the
loop, allowing wait() to tear everything down and restart. This is
necessary before calling a signal handler because the signal handler may
longjmp away.

I initially had this all coded up to use a special signal_cleanup
callback which could be called when longjmp is called in a signal
handler, and cygwin_select() set up and tore down this callback. Once I
got everything compiling, it, of course, dawned on me that just because
you call longjmp in a signal handler doesn't mean that you are jumping
*out* of the signal handler. So, if the signal handler invokes the
callback and then returns normally, select() is left in a very bad
state. Hence this slower but hopefully more correct implementation.

(I still wonder if some sort of signal cleanup callback might be useful
in the future.)

TODO: I need to do an audit of other places where this problem could be
occurring.

As alluded to above, select's signal handling is still not right. It
still acts as if it could call a signal handler from something other
than the main thread but, AFAICT from my STC (simple test case), this
doesn't seem to be the case.
It might be worthwhile to extend cygwait to just magically
figure this out and not even bother using w4[0] for scenarios like this.

2012-05-16 cgf-000009

<1.7.16>
- Fix broken console mouse handling. Reported here:
  http://cygwin.com/ml/cygwin/2012-05/msg00360.html
</1.7.16>

I did a cvs annotate on smallprint.cc and saw that the code to translate
%characters > 127 to 0x notation was in the 1.1 revision. Then I
checked the smallprint.c predecessor. It was in the 1.1 version of that
file too, which means that this odd code has probably been around
since 2000 or earlier.

Since __small_sprintf is supposed to emulate sprintf, I got rid of the
special-case handling. This may affect fhandler_socket::bind. If so, we
should work around the problem there rather than keeping this strange
hack in __small_sprintf.

2012-05-14 cgf-000008

<1.7.16>
- Fix hang when zero bytes are written to a pty using
  Windows WriteFile or equivalent. Fixes:
  http://cygwin.com/ml/cygwin/2012-05/msg00323.html
</1.7.16>

cgf-000002, as usual, fixed one thing while breaking another. See
Larry's predicament in: http://goo.gl/oGEr2 .

The problem is that zero-byte writes to the pty pipe caused the dreaded
end-of-the-world-as-we-know-it problem reported on the mailing list,
where ReadFile reads zero bytes even though there is still more to read
on the pipe. This is because that change caused a 'record' to be read,
and a record can be zero bytes.

I was never really keen on using a throwaway buffer just to get a
count of the number of characters available to be read in the pty pipe.
On closer reading of the documentation for PeekNamedPipe, it seemed like
its sixth argument (lpBytesLeftThisMessage) should return what I needed
without using a buffer. And, amazingly, it did, except that the problem
remained: a zero-byte message still screwed things up.
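The two counts involved can be modeled without Windows. In this plain C sketch, a message-mode pipe is represented as a queue of message lengths. total_avail stands in for PeekNamedPipe's fifth out-parameter (lpTotalBytesAvail), and left_this_message stands in for its sixth (lpBytesLeftThisMessage). The model assumes the latter reports what remains of the message at the head of the queue. The struct and function names are hypothetical, and bytes_for_next_read applies the fallback this entry settles on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two counts PeekNamedPipe can report:
   total_avail ~ lpTotalBytesAvail, left_this_message ~ lpBytesLeftThisMessage. */
struct pipe_peek {
    unsigned long total_avail;
    unsigned long left_this_message;
};

/* Model a message-mode pipe holding n queued messages, msgs[0] being the
   oldest. The total counts every queued byte; the per-message count covers
   only the message at the head of the queue. */
static struct pipe_peek model_peek(const unsigned long *msgs, size_t n)
{
    struct pipe_peek p = {0, 0};
    assert(n == 0 || msgs != NULL);
    for (size_t i = 0; i < n; i++)
        p.total_avail += msgs[i];
    if (n > 0)
        p.left_this_message = msgs[0];
    return p;
}

/* Bytes the next read should expect. A zero-length message at the head
   reports 0 even though later messages hold data; in that case fall back
   to the total number of bytes in the pipe. */
static unsigned long bytes_for_next_read(struct pipe_peek p)
{
    if (p.left_this_message == 0 && p.total_avail > 0)
        return p.total_avail;
    return p.left_this_message;
}
```

Consider a queue holding a zero-length message followed by a 5-byte one. The per-message count is 0 while the total is 5, so the fallback reports 5. When the head message is not zero-length, the per-message count is used unchanged.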

So, we now detect the case where there are zero bytes available in the
current message but there are bytes available in the pipe. In that
scenario, we return the bytes available in the pipe rather than the
message length of zero. This could conceivably cause problems with pty
pipe handling, but since the only way this scenario can happen is when
someone writes zero bytes to a pty pipe using WriteFile, I'm OK with
that.

2012-05-14 cgf-000007

<1.7.16>
- Fix invocation of strace from a Cygwin process. Fixes:
  http://cygwin.com/ml/cygwin/2012-05/msg00292.html
</1.7.16>

The change in cgf-000004 introduced a problem for processes which load
cygwin1.dll dynamically. strace.exe is the most prominent example of
this.

Since the parent handle is now closed for "non-Cygwin" processes, when
strace.exe tried to dynamically load cygwin1.dll, the handle was invalid
and child_info_spawn::handle_spawn couldn't retrieve information from
the parent. This eventually led to a strace_printf error due to an
attempt to dereference an unavailable cygheap. I'll probably have to fix
this someday: you shouldn't use the cygheap while attempting to print an
error about the unavailability of said cygheap.

This was fixed by saving the parent pid in child_info_spawn, calling
OpenProcess on the parent pid, and using that handle iff cygwin1.dll was
loaded dynamically.

2012-05-12 cgf-000006

<1.7.16>
- Fix hang when calling pthread_testcancel in a canceled thread.
  Fixes some of: http://cygwin.com/ml/cygwin/2012-05/msg00186.html
</1.7.16>

This should fix the first part of the problem reported in the above
message. The cancel actually seemed to be working, but the fprintf
eventually ended up calling pthread_testcancel. Since we'd gotten there
via a cancel, it tried to call the cancel handler recursively, causing
an endless loop.
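The entry does not say how the recursion was broken. One common guard is a per-thread flag recording that cancellation is already under way. A nested testcancel call (for example, from stdio code run during cancellation) then returns instead of re-entering the handler. The code below shows this guard purely as a hypothetical C11 sketch, not as Cygwin's actual fix. All names are invented for illustration. A real cancellation path would also run cleanup handlers and terminate the thread, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread cancellation state (C11 _Thread_local);
   not Cygwin's pthread implementation. */
static _Thread_local bool cancel_pending;  /* a cancel has been requested */
static _Thread_local bool canceling;       /* cancellation is already running */
static _Thread_local int handler_runs;     /* times the handler was entered */

static void testcancel_sketch(void);

/* Stand-in for the cancellation path. Its work calls back into
   testcancel, the way fprintf did in the reported hang. */
static void run_cancel_handler(void)
{
    assert(!canceling);   /* must never be entered twice */
    canceling = true;
    handler_runs++;
    testcancel_sketch();  /* nested call returns at once thanks to the guard */
}

/* Act on a pending cancel only if cancellation is not already under way;
   without the `canceling` check the two functions would recurse forever. */
static void testcancel_sketch(void)
{
    if (cancel_pending && !canceling)
        run_cancel_handler();
}
```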

2012-05-12 cgf-000005

<1.7.16>
- Fix pipe creation problem which manifested as a problem creating a
  fifo. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00253.html
</1.7.16>

My change on 2012-04-28 introduced a problem with fifos: the passed-in
name was overwritten. This was because I wasn't properly keeping track
of the length of the generated pipe name when a name was passed in to
fhandler_pipe::create.

There was also another problem in fhandler_pipe::create. Fifos use
PIPE_ACCESS_DUPLEX, and PIPE_ACCESS_DUPLEX is the OR of
PIPE_ACCESS_INBOUND and PIPE_ACCESS_OUTBOUND, so using
PIPE_ACCESS_OUTBOUND as a "never-used" option for PIPE_ADD_PID in
fhandler.h was wrong. As a result, fifo creation attempted to add the
pid to the pipe name, which is wrong for fifos.

2012-05-08 cgf-000004

The change for cgf-000003 introduced a new problem:
http://cygwin.com/ml/cygwin/2012-05/msg00154.html
http://cygwin.com/ml/cygwin/2012-05/msg00157.html

Since a handle associated with the parent is no longer being duplicated
into a non-Cygwin "execed child", Windows is free to reuse the pid of
the parent when the parent exits. However, since we *did* duplicate a
handle pointing to the pid's shared memory area into the "execed child",
the shared memory for the pid was still active.

Since the shared memory was still available, if a new process reused the
previous pid, Cygwin would detect that the shared memory already existed
and had a "PID_REAPED" flag. That was considered an error, so it would
set procinfo to NULL, and pinfo::thisproc would die since this situation
is not supposed to occur.

I fixed this in two ways:

1) If a shared memory region has a PID_REAPED flag, then zero it and
reuse it. This should be safe since you are not really supposed to be
querying the shared memory region for anything after PID_REAPED has been
set.

2) Forgo duping a copy of myself_pinfo if we're starting a non-Cygwin
child for exec.
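Both fixes can be sketched in C against a heavily reduced, hypothetical model of the per-pid shared region. The following are assumptions for illustration, not Cygwin's pinfo code: the pinfo_region struct, the PID_REAPED value, and the claim_region and dup_pinfo_handle_for_child helpers. In particular, treating any pre-existing region without PID_REAPED as still in use is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag value and a heavily reduced per-pid shared region;
   Cygwin's real pinfo layout is far larger. */
#define PID_REAPED 0x1u

struct pinfo_region {
    unsigned flags;
    int pid;
    int ppid;
};

/* Fix 1: a pre-existing region with PID_REAPED set no longer describes a
   live process, so clear it and reuse it for the process now owning `pid`.
   Any other pre-existing region is treated as in use (returns false). */
static bool claim_region(struct pinfo_region *r, int pid)
{
    assert(r != NULL);
    if (!(r->flags & PID_REAPED))
        return false;
    memset(r, 0, sizeof *r);
    r->pid = pid;
    return true;
}

/* Fix 2: skip duplicating our own pinfo handle when exec'ing a non-Cygwin
   program, since that handle would only keep the old pid's region alive. */
static bool dup_pinfo_handle_for_child(bool execing, bool child_is_cygwin)
{
    return !(execing && !child_is_cygwin);
}
```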

It seems like 2) is a common theme, and an audit of all of the handles
that are being passed to non-Cygwin children is in order for 1.7.16.

The other minor modification made in this change was to add the pid of
the failing process to fork error output. This helps slightly when
looking at strace output, even though in this case it was easy to find
what was failing by looking for '^---' when running the "stv" strace
dumper. That found the offending exception quickly.

2012-05-07 cgf-000003

<1.7.15>
Don't make Cygwin wait for all children of a non-Cygwin child program.
Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00063.html,
       http://cygwin.com/ml/cygwin/2012-05/msg00075.html
</1.7.15>

This problem is due to a recent change which added some robustness and
speed to Cygwin's exec/spawn handling by not trying to force inheritance
every time a process is started. See the ChangeLog entries starting on
2012-03-20, including several on 2012-03-21.

Making the handle inheritable meant that, as usual, there were problems
with non-Cygwin processes. When Cygwin "execs" a non-Cygwin process N,
all of its N + 1, N + 2, ... children will also inherit the handle.
That means that Cygwin will wait until all subprocesses have exited
before it returns.

I was willing to make this a restriction of starting non-Cygwin
processes, but the problem with allowing it is that it can cause the
creation of a "limbo" pid when N exits while N + 1 and friends are still
around. In this scenario, Cygwin dutifully notices that process N has
died and sets the exit code accordingly, but N's parent will wait on
rd_proc_pipe and will only return when every N + ... Windows process
has exited.

The removal of cygheap::pid_handle was not related to the initial
problem that I set out to fix. The change came from the realization
that we were duping the current process handle into the child twice and
only needed to do it once.
The current process handle is used by exec
to keep the Windows pid "alive" so that it will not be reused. So, now
we just close the parent handle in child_info_spawn::handle_spawn iff
we're not execing.

While debugging this, it bothered me that 'ps' identified a nonactive
pid as active. Part of the reason for this was that the 'parent' handle
in child_info was open in non-Cygwin processes, keeping the pid alive.
That has been kludged around (more changes after 1.7.15), but that
didn't fix the problem. On further investigation, this seems to be
caused by the fact that the shared memory region pid handles were still
being passed to non-Cygwin children, keeping the pid alive in a
limbo-like fashion. This was easily fixed by having pinfo::init()
consider a memory region with PID_REAPED as not available. A more
robust fix, where these handles are not passed to non-Cygwin processes
at all, should be considered for 1.7.15+.

This fixed the problem where a pid showed up in the list after a user
did something like "bash$ cmd /c start notepad" but, for some reason,
it does not fix the case of "bash$ setsid cmd /c start notepad".
That bears investigation after 1.7.15 is released, but it is not a
regression and so is not a blocker for the release.

2012-05-03 cgf-000002

<1.7.15>
Fix problem where Cygwin attempted to read too much input from a pty
slave. Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00049.html
</1.7.15>

My change on 2012/04/05 reintroduced the problem first described by:
http://cygwin.com/ml/cygwin/2011-10/threads.html#00445

The problem then was, IIRC, due to the fact that bytes sent to the pty
pipe were not written as records. Changing the pipe to PIPE_TYPE_MESSAGE
in pipe.cc fixed the problem, since writing lines to one side of the
pipe caused exactly that number of characters to be read on the other,
even if there were more characters in the pipe.

To debug this, I first replaced fhandler_tty.cc with the 1.258,
2012/04/05 version.
The test case started working when I did that.

So, then, I replaced individual functions in fhandler_tty.cc, one at a
time, with their previous versions. I'd expected this to be a problem
with fhandler_pty_master::process_slave_output since that had seen the
most changes, but I was surprised to see that the culprit was
fhandler_pty_slave::read().

The reason was that I really needed the bytes_available() function to
return the number of bytes which would be read in the next operation
rather than the number of bytes available in the pipe. That's because
there may be a number of lines available to be read, but the number of
bytes which will be read by ReadFile should reflect the mode of the pty
and, if there is a line to read, only the number of bytes in that line
should be seen as available for the next read.

Having bytes_available() return the number of bytes which would be read
seemed to fix the problem, but it could subtly change the behavior of
other callers of this function. However, I actually think this is
probably a good thing, since they probably should have been seeing the
line behavior.

2012-05-02 cgf-000001

<1.7.15>
Fix problem where the parent pid was set to 1 when a process with
children execs itself.
Fixes: http://cygwin.com/ml/cygwin/2012-05/msg00009.html
</1.7.15>

Investigating this problem with strace showed that ssh-agent was
checking the parent pid and getting a 1 when it shouldn't have. Nothing
else looked wrong, so I chose to treat this as the smoking gun.

Going back to the version that the OP said did not have the problem, I
worked forward until I found where the problem first occurred: somewhere
around 2012-03-19. And, indeed, the getppid call returned the correct
value in the working version. That means that this stopped working when
I redid the way the process pipe was inherited around that time.

It isn't clear why this hasn't always been a problem (and I suspect I
may have to debug this further at some point), but I made the obvious
fix: we shouldn't have been setting ppid = 1 when we're about to hand
off to an execed process.

As I was writing this, I realized that additional checks were needed.
Just checking for "have_execed" isn't enough. If we've execed a
non-Cygwin process, then it won't know how to deal with any inherited
children. So, always set ppid = 1 if we've execed a non-Cygwin process.
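The resulting rule can be written as a small decision function. This is a hypothetical C sketch, not Cygwin's code. It assumes the child knows three things: whether the Windows process it recorded as its parent is gone, whether that parent exec'd, and whether the exec target is a Cygwin program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the ppid a child should report, given what it knows
   about the Windows process that was its parent. */
static int effective_ppid(int ppid, bool parent_process_gone,
                          bool have_execed, bool exec_target_is_cygwin)
{
    assert(ppid > 0);
    if (!parent_process_gone)
        return ppid;  /* parent still running: nothing changes */
    if (have_execed && exec_target_is_cygwin)
        return ppid;  /* the Cygwin pid lives on in the new program image */
    return 1;         /* a real exit, or an exec of a non-Cygwin program */
}
```

Testing have_execed alone would keep the old ppid even after an exec of a non-Cygwin program. That is precisely the case this entry says must still reparent to 1.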