Diffstat (limited to 'web/xml-classes')
-rwxr-xr-x | web/xml-classes | 464 |
1 files changed, 464 insertions, 0 deletions
diff --git a/web/xml-classes b/web/xml-classes
new file mode 100755
index 00000000000..143a3af60f5
--- /dev/null
+++ b/web/xml-classes
@@ -0,0 +1,464 @@
+* XML Classes Status and Tasks
+
+** Abstract
+
+ The XML library is used by several areas of Mono such as ADO.NET and XML
+ Digital Signature (xmldsig). Here I write about System.Xml.dll and
+ related tools. This page won't include any classes which are in other
+ assemblies such as XmlDataDocument.
+
+ Note that current corlib has its own XML parser class (Mono.Xml.MiniParser).
+
+ Basically the System.XML.dll feature set is almost finished, so I write this
+ document mainly for bugs and improvement hints.
+
+** Status
+
+*** System.Xml namespace
+
+**** Document Object Model (Core)
+
+ The DOM implementation is finished, and it scores better
+ than MS.NET in the NIST DOM test results (the tests were ported by Mainsoft
+ hackers and are in our unit tests).
+
+**** Xml Writer
+
+ Here XmlWriter is almost equal to XmlTextWriter. If you want to see
+ another implementation, check XmlNodeWriter.cs and DTMXPathDocumentWriter.cs
+ in System.XML sources.
+
+ XmlTextWriter is complete, though it looks a bit slower than MS.NET (I
+ tried 1.1).
+
+**** XmlResolver
+
+ XmlUrlResolver is implemented.
+
+ XmlSecureResolver, which was introduced in MS .NET Framework 1.1, is basically
+ implemented, but it requires the CAS (code access security) feature. We need to
+ fix up this class once the ongoing CAS effort works.
+
+ You might also be interested in an improved <a href="http://codeblogs.ximian.com/blogs/benm/archives/000039.html">XmlCachingResolver</a> by Ben Maurer.
+ If even a one-time download is not acceptable, you can use <a href="http://primates.ximian.com/~atsushi/XmlStoredResolver.cs">this one</a>.
+
+ [2.0] XmlDataSourceResolver is not implemented as yet.
+
+**** XmlNameTable
+
+ NameTable is implemented, but also needs performance improvement.
+ It greatly affects the performance of XML processing as a whole.
+ Optimization hacks are welcome. There is also a <a
+ href="http://bugzilla.ximian.com/show_bug.cgi?id=59537">bugzilla entry</a>
+ for this matter.
+
+**** XML Reader
+
+ XmlTextReader, XmlNodeReader and XmlValidatingReader are almost finished.
+
+ <ul>
+ * All OASIS conformance tests pass, as Microsoft's do. Some
+ W3C tests fail, but it looks better.
+ * Entity expansion and its well-formedness check are incomplete.
+ It incorrectly allows divided content models. It incorrectly
+ treats its Base URI, so parsing of some DTDs fails.
+ * I won't add any XDR support to XmlValidatingReader. (I haven't
+ ever seen XDR used other than in Microsoft's BizTalk Server 2000,
+ and now they have 2002 with XML Schema support.) If anyone
+ contributes an implementation, it would still be nice.
+ </ul>
+
+ XmlTextReader and XmlValidatingReader should be faster than now. Currently
+ XmlTextReader looks nearly twice as slow as MS.NET, and XmlValidatingReader
+ (which uses this slow XmlTextReader) looks nearly three times slower. (Note
+ that XmlValidatingReader wouldn't be so slow by itself. It uses schema
+ validating reader and dtd validating reader.)
+
+
+**** Some Advantages
+
+ The design of Mono's XmlValidatingReader is radically different from
+ that of Microsoft's implementation. Under MS.NET, the DTD content validation
+ engine is in fact a simple replacement of the XML Schema validation engine.
+ Mono's DTD validation is designed to be fully separate and does validation
+ as a normal XML parser does. For example, Mono allows non-deterministic DTDs.
+
+ Another advantage of this XmlValidatingReader is support for *any* XmlReader.
+ Microsoft supports only XmlTextReader (this bug is fixed in .NET 2.0 beta,
+ taking the shape of XmlReader.Create()).
+
+ <del>I added an extra support interface named "IHasXmlParserContext", which is
+ considered in XmlValidatingReader.ResolveEntity(). </del><ins>This is now
+ an internal interface.</ins> Microsoft's XmlReader design is flawed in that
+ an XmlReader cannot be subtree-pluggable (i.e. used to wrap another
+ XmlReader), since XmlParserContext should be supplied for DTD information
+ support (e.g. otherwise entity references cannot be expanded) and the namespace manager.
+ (In .NET 2.0, Microsoft also added an interface similar to IHasXmlParserContext,
+ named IXmlNamespaceResolver, but it still does not provide DTD information.)
+
+ We also have a RELAX NG validating reader (described later).
+
+
+*** System.Xml.Schema
+
+**** Summary
+
+ Basically it is completed. You can test how complete (or incomplete) the current
+ schema validation engine is by using the standalone test module (see
+ mcs/class/System.XML/Test/System.Xml.Schema/standalone_tests).
+ At least on my box, msxsdtest fails only 30 cases with the bugfixed catalog -
+ this score is better than that of the Microsoft implementation. On the other hand,
+ we need a performance boost. There should be many points where
+ schema compilation and validation can be improved.
+
+**** Schema Object Model
+
+ Completed, except for some things to be fixed:
+
+ <ul>
+ * Complete facet support. Currently some facets are missing.
+ Recently David Sheldon has been doing several fixes on them.
+ * ContentTypeParticle for pointless xs:choice is incomplete
+ (fixing this gave rise to other bugs in compilation.
+ Interestingly, MS.NET also fails around here, so it might
+ be the nature of the ContentTypeParticle design)
+ * Some derivation by restriction (DBR) handling is incorrect.
+ </ul>
+
+**** Validating Reader
+
+ Basically this is implemented and actually its feature set is complete,
+ but I have only tested the validation feature. So we have to write more
+ tests on properties, methods, and events (validation errors).
+
+
+*** System.Xml.Serialization
+
+ Lluis rules ;-)
+
+ Well, in fact XmlSerializer is almost finished and is in the bugfix phase.
+
+ However, we appreciate more tests.
+ Please try
+
+ <ul>
+ * System.Web.Services to invoke SOAP services.
+ * xsd.exe and wsdl.exe to create classes.
+ </ul>
+
+ And if you find any problems, please file them in bugzilla.
+
+ Lluis also built an interesting standalone test system placed under
+ mcs/class/System.Web.Services/Test/standalone.
+
+ You might also be interested in "genxs", which enables you to create a custom
+ XML serializer. This is not included in Microsoft.NET.
+ See <a
+ href="http://primates.ximian.com/~lluis/blog/archives/000120.html">here</a>
+ and manpages for details. Code files are in mcs/tools/genxs.
+
+ Lluis also created "sgen", which is based on XmlSerializer.GenerateSerializer().
+ Code files are in mcs/tools/sgen.
+
+*** System.Xml.XPath and System.Xml.Xsl
+
+ There are two XSLT implementations. One, the historical implementation, is
+ based on libxslt (aka Unmanaged XSLT). Now we use the fully implemented,
+ managed XSLT by default. To use Unmanaged XSLT, set the MONO_UNMANAGED_XSLT
+ environment variable (any value is acceptable).
+
+ As for Managed XSLT, we support msxsl:script.
+
+ It would be nice if we could support <a href="http://www.exslt.org/">EXSLT</a>.
+ <a href="http://msdn.microsoft.com/WebServices/default.aspx?pull=/library/en-us/dnexxml/html/xml05192003.asp">Microsoft has tried to do some of them</a>,
+ but it was not successful because of System.Xml.Xsl design problems:
+
+ <ul>
+ * In general, .NET's "extension objects" (including
+ msxsl:script) are not useful for returning node-sets (the MS XSLT
+ implementation rejects a plainly overridden XPathNodeIterator
+ and accepts only its own hidden classes. The same is true
+ in Mono, though the classes are different)
+
+ * In .NET's extension object design, an extension function name
+ must be a valid method name, which cannot contain some characters
+ such as '-'. That is, implementing EXSLT in C# is impossible.
+ </ul>
+
+ So if we support EXSLT, it has to be done inside our System.XML.dll.
+ Microsoft developers are also aware of this problem and some of them wish
+ to have EXSLT support in WinFX (not whidbey). If anyone is interested
+ in it, it would be nice.
+
+ Our managed XSLT implementation is slower than MS XSLT for some kinds of
+ stylesheets, and faster for others.
+
+
+*** RELAX NG
+
+ I implemented an experimental RelaxngValidatingReader. It is still not
+ complete; for example, it lacks some simplification steps (see RELAX NG spec
+ chapter 4; especially 4.17-19) and some constraints (especially 7.3).
+ See mcs/class/Commons.Xml.Relaxng/README for details.
+
+ Currently we have
+
+ <ul>
+ * Custom datatype support. Right now, you can use XML schema
+ datatypes ( http://www.w3.org/2001/XMLSchema-datatypes )
+ as well as RELAX NG default datatypes (as used in relaxng.rng).
+
+ * RELAX NG Compact Syntax support, though not yet stable.
+ See Commons.Xml.Relaxng.Rnc.RncParser class.
+</ul>
+
+
+** System.Xml v2.0
+
+ Microsoft released the first public beta version of .NET Framework 2.0,
+ available from <a href="http://www.microsoft.com/downloads/details.aspx?familyid=916EC067-8BDC-4737-9430-6CEC9667655C&displaylang=en">MSDN</a>.
+ It contains several new classes.
+
+ There are two assemblies related to System.Xml v2.0: System.Xml.dll and
+ System.Data.SqlXml.dll. System.Data.SqlXml.dll is of little importance now.
+ It contains only the XQueryCommand class inside the System.Xml.* namespace.
+ Most of the important parts are in System.Xml.dll.
+
+ Note that this .NET Framework is a pre-release version, so these classes are subject
+ to change. Actually many of the pre-released classes have vanished.
+
+ System.Xml 2.0 contains several features such as:
+
+ <ul>
+ * new XPathNavigator <del>and XPathDocument</del><ins>XPathDocument is <a href="http://blogs.msdn.com/dareobasanjo/archive/2004/08/25/220251.aspx">being reverted</a></ins>
+ * XmlReaderSettings, XmlWriterSettings and factory methods
+ * Strongly typed XmlReader and XmlWriter.
+ * XML Schema design changes
+ * XSD Inference
+ * Well-documented and improved XmlSerializer.
+ * XQuery execution engine
+ * XQuery and XSLT per-stylesheet assembly generator
+ </ul>
+
+*** System.Xml 2.0
+
+**** XmlReader/XmlWriter Factory methods
+
+ In .NET 2.0, XmlTextReader, XmlNodeReader, XmlValidatingReader are
+ obsolete and XmlReader.Create() is recommended (there is however no
+ alternative way to create XmlNodeReader). Similarly, there are
+ XmlWriter.Create() overloads.
+
+ Currently, Microsoft's XmlWriter.Create() is unreliable and there may
+ be changes. So basically XmlWriter.Create() is supposed to be done
+ after the next beta version of .NET 2.0.
+
+ Some of the XmlReader.Create() overloads are implemented, with limited
+ XmlReaderSettings support.
+
+
+**** Typed XmlReader/XmlWriter
+
+ In .NET 2.0, XmlReader is supposed to support strongly-typed data reading.
+ The typed members are based on the W3C "XML Schema Datatypes" Recommendation and the "XQuery 1.0
+ and XPath 2.0 Data Model" Working Draft.
+
+ Some of the XmlReader.ReadValueAsXxx() and XmlWriter.WriteValue() overloads are
+ implemented, though incompletely. They are based on the internal XQueryConvert.
+
+
+**** Sub-tree handling in XmlReader/XmlWriter/XPathNavigator
+
+ Currently XmlReader.ReadSubtree(), XmlWriter.WriteSubtree() and
+ XPathNavigator.ReadSubtree() are implemented, though not well-tested.
+ They are based on Mono.Xml.SubtreeXmlReader and
+ Mono.Xml.XPath.XPathNavigatorReader classes.
+
+
+*** System.Xml.Schema 2.0
+
+ Since .NET 1.x is not very compliant with the W3C XML Schema specification,
+ Microsoft had to redesign the System.Xml.Schema classes. We also have to
+ change many things.
+
+ 1) It does not expose XmlSchemaDatatype anymore (except for obsolete
+ members). Primitive types are represented as XmlSchemaSimpleType
+ instances (thus there are ElementSchemaType, AttributeSchemaType,
+ BaseXmlSchemaType that replace some existing properties).
+
+ 2) "XQuery 1.0 and XPath 2.0 Data Model" datatypes (such as
+ xdt:dayTimeDuration) are newly supported. They are not fully implemented
+ yet. This task is partly done.
+
+ 3) Schema structures are now bound in a parent-child relationship. It is
+ not yet implemented. Related to this, there seem to be a bunch of schema
+ compilation bugfixes.
+
+ 4) XmlSchemaCollection is not used anymore to represent the effective set of
+ schemas. Instead, the new XmlSchemaSet class is used. It should affect the
+ schema compilation design. In fact, I've implemented XmlSchemaCollection
+ to be more conformant to the W3C specification, but there are still many changes
+ required. This task is partly done.
+
+
+**** XSD Inference
+
+ In .NET 2.0, there is an XML Schema inference implementation. Now that
+ XmlSchemaSet is basically implemented, it can be done separately by anyone.
+ Volunteer efforts are welcome here.
+
+
+*** System.Xml.XPath 2.0
+
+**** Editable XPathDocument
+
+ <del>
+ In .NET 2.0 XPathDocument is supposed to be editable. Currently we provide
+ a fast document table model based implementation (DTMXPathNavigator), but
+ by that design change, we (and they) cannot provide a fast read-only
+ XPathNavigator from XPathDocument anymore.
+ </del><ins>
+ It is being reverted to the original (.NET 1.x) XPathDocument. We still have
+ the editable members, but we'll revert them too in the future. So our XPathDocument will still be the faster one.
+ </ins>
+
+ Currently, a new XPathDocument implementation is provided. The actual
+ implementation is Mono.Xml.XPath.XPathDocument2, which is a simple dom-like
+ tree model. XPathDocument2 implements the same interfaces as XPathDocument
+ does. And XPathDocument delegates most of the methods to that class (for
+ example, XPathDocument.CreateEditor() calls XPathDocument2.CreateEditor()).
+
+ Currently Mono.Xml.XPath.XPathDocument2 is unstable (it does not pass
+ the standalone XSLT tests, unlike the existing DTMXPathDocument). So
+ it did not replace the existing XPathDocument implementation, but you can use
+ the new implementation by explicitly setting the environment variable
+ USE_XPATH_DOCUMENT_2 = yes. Currently it supports (well, is supposed
+ to support) basic editor features such as AppendChild(). Other members
+ are untested (such as RejectChanges()).
+
+**** extra stuff - XPathEditableDocument
+
+ Currently we provide another IXPathEditable: XPathEditableDocument. It is
+ based on the idea of handling XmlDocument as the editor target. It is
+ implemented as Mono.Xml.XPath.XPathEditableDocument. We might provide this
+ class as an extra set (possibly in a different mono-specific XML assembly).
+
+
+**** System.Xml.XQuery
+
+ In this namespace, there are two significant classes: XsltCommand and
+ XQueryCommand.
+
+ XsltCommand implements XSLT transformation. It is almost the same as
+ System.Xml.Xsl.XslTransform, but this class transforms documents two
+ to four times as fast as XslTransform. In exchange, stylesheet compilation
+ is much slower, because it generates a compiled stylesheet assembly.
+
+ XQueryCommand implements XQuery. XQuery is a newcomer among XML document
+ manipulation languages (at least new in the .NET world). It is similar
+ to XSLT, but extended to support XML Schema based datatypes (and it is
+ not an XML based language). It is similar to XPath, but it can construct
+ XML nodes. It has no complicated template resolution, but works like
+ functional languages.
+
+ Under MS.NET, the XQuery implementation is mainly in the System.Xml.Query and
+ MS.Internal.Xml.* namespaces. The implementation is mostly
+ in System.Xml.dll. The same is true of our System.Xml.dll. Our XQueryCommand
+ in System.Data.SqlXml.dll just invokes the actual XQuery processor
+ (Mono.Xml.XPath2.XQueryCommandImpl) which resides in System.Xml.dll via
+ reflection.
+
+ Currently we are not implementing MS.Internal.Xml.* classes. The MS
+ implementation is based on an old version of the W3C specification, and
+ our implementation is currently based on the
+ <a href="http://www.w3.org/TR/2004/WD-xquery-20040723/">23 July 2004
+ version</a> (latest as of now) of the working draft.
+
+ XQuery implementation tasks are:
+
+ <ul>
+ * XQuery syntax parser that parses an xquery string to AST
+ (abstract syntax tree). -> partly not done.
+
+ * XQuery AST compiler into static context -> partly not done.
+
+ * XQuery (dynamic context) runtime = XQuery expression evaluator
+ + sequence iterator. -> partly not done.
+
+ * XPathItem data model and (mainly) conversion support.
+ -> partly done.
+
+ * Applied expression classes for XQuery/XPath 2.0 functions and
+ operators. -> partly done.
+
+ * Optimization, and design per-query assembly code generator (later)
+ </ul>
+
+ It already handles some queries, while major parts of the implementation are missing
+ or buggy (like FLWOR, expressions for sequence type handling, built-in
+ function support etc.).
+
+
+*** Relax NG and DSDL in Mono 1.2
+
+ Currently we support only RELAX NG as one part of the ISO DSDL effort. There
+ is an existing Schematron implementation (NMatrix Project: <a
+ href="http://sourceforge.net/projects/dotnetopensrc/">
+ http://sourceforge.net/projects/dotnetopensrc/</a>). With a few changes,
+ it can be used with mono.
+
+ We also don't have multi-language based validation support, namely
+ Namespace-based Validation Dispatch Language (NVDL). To support unwrapping,
+ one special XmlReader implementation is required (other schema validation
+ support can be done by ReadSubtree()). Note that we have seen RELAX
+ Namespace, Modular Namespace (MNS) and Namespace Routing Language (NRL)
+ - that is, the standardization effort is still ongoing (though NVDL looks
+ mostly the same as NRL).
+
+ In Mono 1.2, there might be improvements on Commons.Xml.Relaxng.
+
+ <ul>
+ * Currently RelaxngPattern.Compile() provides poor compilation
+ error information. At least it can provide error location.
+ Also, the type of error should be a kind of
+ RelaxngGrammarException.
+
+ * Right now there is no ambiguity detection implementation that
+ would be useful for RelaxngPattern based xml serialization (if
+ there is need).
+
+ * Because of the lack of ambiguity detection, there is no way to
+ provide XmlMapping (XmlTypeMapping/XmlMemberMapping). But
+ if anyone is interested in such an effort, integration with
+ XmlSerializer would be an interesting task.
+ </ul>
+
+
+** Tools
+
+*** xsd.exe
+
+ See <a href="ado-net.html">ADO.NET page</a>.
+
+
+** Miscellaneous
+
+*** Mutual assembly dependency
+
+ Sometimes I hear complaints about the System.dll and System.Xml.dll mutual
+ dependency: System.dll references System.Xml.dll (e.g.
+ System.Configuration.ConfigXmlDocument extends XmlDocument), while
+ System.Xml.dll does vice versa (e.g. XmlUrlResolver.ResolveUri takes System.Uri).
+ Since they are in public method signatures, we cannot get rid
+ of these mutual references.
+
+ Nowadays System.Xml.dll is built using an incomplete System.dll (lacking
+ System.Xml dependent classes such as ConfigXmlDocument). The full System.dll
+ is built after System.Xml.dll is done.
+
+ Note that you still need System.dll to run mcs.
+
+
+ Atsushi Eno <asushi@ximian.com>
+ last updated 09/02/2004
+