Diffstat (limited to 'mcs/class/System/System.Text.RegularExpressions/notes.txt')
-rw-r--r-- | mcs/class/System/System.Text.RegularExpressions/notes.txt | 45 |
1 files changed, 0 insertions, 45 deletions
diff --git a/mcs/class/System/System.Text.RegularExpressions/notes.txt b/mcs/class/System/System.Text.RegularExpressions/notes.txt
deleted file mode 100644
index 56b047ec76e..00000000000
--- a/mcs/class/System/System.Text.RegularExpressions/notes.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-TODO:
-
-* Need to go through everything and square it with RightToLeft matching.
-  The support for this was built into an early version, and lots of things built
-  afterwards are not savvy about bi-directional matching. Things that spring to
-  mind: Regex match methods should start at 0 or text.Length depending on
-  direction. Do split and replace need changes? Match should be aware of its
-  direction (already applied some of this to NextMatch logic). The interpreter
-  needs to check left and right bounds. Anchoring and substring discovery need
-  to be reworked. RTL matches are going to have anchors on the right - ie $, \Z
-  and \z. This should be added to the anchor logic. QuickSearch needs to work in
-  reverse. There may be other stuff.... work through the code.
-
-* Add ECMAScript support to the parser. For example, [.\w\s\d] map to ECMA
-  categories instead of canonical ones [DONE]. There's different behaviour on
-  backreference/octal disambiguation. Find out what the runtime behavioural
-  difference is for cyclic backreferences eg (?(1)abc\1) - this is only briefly
-  mentioned in the spec. I couldn't find much on this in the ECMAScript
-  specification either.
-
-* Octal/backreference parsing needs a big fix. The rules are ridiculously complex.
-
-* Add a check in QuickSearch for single character substrings. This is likely to
-  be a common case. There's no need to go through a shift table. Also, have a
-  look at just computing a relevant subset of the shift table and using an
-  (offset, size) pair to help test inclusion. Characters not in the table get
-  the default len + 1 shift.
-
-* Improve the perl test suite. Run under MS runtime to generate checksums for
-  each trial. Checksums should incorporate: all captures (index, length) for all
-  groups; names of explicit capturing groups, and the numbers they map to. Any
-  other state? RegexTrial.Execute() will then compare result and checksum.
-
-* The pattern (?(1?)a|b). It should fail: Perl fails, the MS implementation
-  fails, but I pass. The documentation says that the construct (?(X)...) can be
-  processed in two ways. If X is a valid group number, or a valid group name,
-  then the expression becomes a capture conditional - the (...) part is
-  executed only if X has been captured. If X is not a group number or name, then
-  it is treated as a positive lookahead., and (...) is only executed if the
-  lookahead succeeds. My code does the latter, but on further investigation it
-  appears that both Perl and MS fail to recognize an expression assertion if the
-  first character of the assertion is a number - which instead suggests a
-  capture conditional. The exception raised is something like "invalid group
-  number". I get the feeling the my behaviour seems more correct, but it's not
-  consistent with the other implementations, so it should probably be changed.
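
Illustration for the QuickSearch item in the removed notes: a minimal Python
sketch of what that item seems to describe, assuming QuickSearch is a
Sunday-style "quick search" over a literal substring. It is not taken from the
Mono sources. The names build_shift_table and quick_search are hypothetical,
and the sketch searches left to right only, so the RightToLeft case is not
covered. The shift table is stored only for the code-point range the pattern
actually uses, described by an (offset, size) pair; any character outside that
range, or inside it but absent from the pattern, gets the default len + 1
shift. Single-character patterns skip the table entirely.

```python
def build_shift_table(pattern):
    """Return (offset, size, table) covering only the pattern's code points.

    Hypothetical helper: table[code - offset] holds the shift for a character,
    pre-filled with the default len + 1 for characters not in the pattern.
    """
    m = len(pattern)
    codes = [ord(ch) for ch in pattern]
    offset = min(codes)
    size = max(codes) - offset + 1
    table = [m + 1] * size
    # Later positions overwrite earlier ones, so each character ends up with
    # the shift for its rightmost occurrence in the pattern.
    for i, code in enumerate(codes):
        table[code - offset] = m - i
    return offset, size, table


def quick_search(text, pattern, start=0):
    """Return the index of the first match at or after start, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return start if start <= n else -1
    if m == 1:
        # Single-character fast path: no shift table is needed.
        return text.find(pattern, start)
    offset, size, table = build_shift_table(pattern)
    pos = start
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        if pos + m >= n:
            break  # no character after the window to base a shift on
        idx = ord(text[pos + m]) - offset
        # The (offset, size) pair gives a cheap inclusion test.
        pos += table[idx] if 0 <= idx < size else m + 1
    return -1
```

For example, quick_search("hello world", "world") compares once at position 0,
sees that the space after the window is outside the table, shifts by
len + 1 = 6, and matches at position 6.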