Diffstat (limited to 'mcs/docs/compiler')
-rwxr-xr-x | mcs/docs/compiler | 374 |
1 files changed, 0 insertions, 374 deletions
diff --git a/mcs/docs/compiler b/mcs/docs/compiler
deleted file mode 100755
index 91ac4980107..00000000000
--- a/mcs/docs/compiler
+++ /dev/null
@@ -1,374 +0,0 @@
-                       The Internals of the Mono C# Compiler
-
-                                Miguel de Icaza
-                              (miguel@ximian.com)
-                                      2002
-
-* Abstract
-
-        The Mono C# compiler is a C# compiler written in C# itself.
-        Its goal is to provide a free, alternative implementation
-        of the C# language.  The Mono C# compiler generates ECMA CIL
-        images through the use of the System.Reflection.Emit API, which
-        enables the compiler to be platform independent.
-
-* Overview: How the compiler fits together
-
-        The compilation process is managed by the compiler driver (it
-        lives in driver.cs).
-
-        The compiler reads a set of C# source code files, and parses
-        them.  Any assemblies or modules that the user might want to
-        use with the project are loaded after parsing is done.
-
-        Once all the files have been parsed, the type hierarchy is
-        resolved.  First interfaces are resolved, then types and
-        enumerations.
-
-        Once the type hierarchy is resolved, every type is populated:
-        fields, methods, indexers, properties, events and delegates
-        are entered into the type system.
-
-        At this point the program skeleton has been completed.  The
-        next step is to actually emit the code for each of the
-        executable methods.  The compiler drives this from
-        RootContext.EmitCode.
-
-        Each type then has to populate its methods: populating a
-        method requires creating a structure that is used as the state
-        of the block being emitted (this is the EmitContext class) and
-        then generating code for the topmost statement (the Block).
-
-        Code generation has two steps: the first step is the semantic
-        analysis (Resolve method) that resolves any pending tasks, and
-        guarantees that the code is correct.  The second step is the
-        actual code emission.  All errors are flagged during the
-        "Resolution" process.
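The two-step scheme described above (resolve everything first, emit only when resolution flagged no errors) can be sketched roughly as follows. This is a minimal illustration, not code from the compiler: `Node`, `LoadLocal` and `TwoPhase` are hypothetical names, and the "emitted code" is just a list of strings standing in for IL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the two-step scheme; none of these types
// are the compiler's real classes.
abstract class Node {
    // Step 1: semantic analysis.  Returns false after recording an error.
    public abstract bool Resolve (List<string> errors);

    // Step 2: code emission.  Only called once every node has resolved.
    public abstract void Emit (List<string> output);
}

class LoadLocal : Node {
    readonly string name;
    readonly HashSet<string> declared;

    public LoadLocal (string name, HashSet<string> declared)
    {
        this.name = name;
        this.declared = declared;
    }

    public override bool Resolve (List<string> errors)
    {
        if (declared.Contains (name))
            return true;
        errors.Add ("undeclared name `" + name + "'");
        return false;
    }

    public override void Emit (List<string> output)
    {
        // A string stands in for a real IL instruction here.
        output.Add ("ldloc " + name);
    }
}

static class TwoPhase {
    // Resolves every node first, so that all errors are collected,
    // and emits only if none were flagged.
    public static bool Compile (List<Node> body, List<string> errors,
                                List<string> output)
    {
        bool ok = true;
        foreach (Node n in body)
            ok &= n.Resolve (errors);   // &= does not short-circuit
        if (!ok)
            return false;
        foreach (Node n in body)
            n.Emit (output);
        return true;
    }
}
```

Because emission never runs on an unresolved tree, the emit step in this sketch does not need any error handling of its own.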
-
-        After all code has been emitted, the compiler closes all the
-        types (this basically tells the Reflection.Emit library to
-        finish up the types).  Resources and the definition of the
-        entry point are handled at this point, and the output is
-        saved to disk.
-
-* The parsing process
-
-        All the input files that make up a program need to be read in
-        advance, because C# allows declarations to happen after an
-        entity is used.  For example, the following is a valid program:
-
-                class X : Y {
-                        static void Main ()
-                        {
-                                a = "hello"; b = "world";
-                        }
-                        static string a;
-                }
-
-                class Y {
-                        public static string b;
-                }
-
-        At the time the assignment expression `a = "hello"' is parsed,
-        it is not known whether a is a class field from this class, or
-        its parents, or whether it is a property access or a variable
-        reference.  The actual meaning of `a' will not be discovered
-        until the semantic analysis phase.
-
-** The Tokenizer and the pre-processor
-
-        The tokenizer is contained in the file `cs-tokenizer.cs', and
-        the main entry point is the `token ()' method.  The tokenizer
-        implements the `yyParser.yyInput' interface, which is what the
-        Yacc/Jay parser will use when fetching tokens.
-
-        Token definitions are generated by jay during the compilation
-        process, and those can be referenced from the tokenizer class
-        with the `Token.' prefix.
-
-        Each time a token is returned, the location for the token is
-        recorded into the `Location' property, which can be accessed by
-        the parser.  The parser retrieves the Location properties as
-        it builds its internal representation to allow the semantic
-        analysis phase to produce error messages that can pinpoint
-        the location of the problem.
-
-        Some tokens have values associated with them; for example, when
-        the tokenizer encounters a string, it will return a
-        LITERAL_STRING token, and the actual string parsed will be
-        available in the `Value' property of the tokenizer.
-        The same mechanism is used to return integers and floating
-        point numbers.
-
-        C# has a limited pre-processor that allows conditional
-        compilation, but it is not as fully featured as the C
-        pre-processor, and most notably, macros are missing.  This
-        makes it simple to implement in very few lines and to mesh it
-        with the tokenizer.
-
-        The `handle_preprocessing_directive' method in the tokenizer
-        handles all the pre-processing, and it is invoked when the '#'
-        symbol is found as the first token in a line.
-
-        The state of the pre-processor is contained in a Stack called
-        `ifstack'; this state is used to track the if/elif/else/endif
-        nesting and the current state.  The state is encoded in the
-        top of the stack as a number of values: `TAKING',
-        `TAKEN_BEFORE', `ELSE_SEEN', `PARENT_TAKING'.
-
-** Locations
-
-        Locations are encoded as a 32-bit number (the Location
-        struct) that maps each input source line to a linear number.
-        As new files are parsed, the Location manager is informed of
-        the new file, to allow it to map back from an int constant to
-        a file + line number.
-
-        The tokenizer also tracks the column number for a token, but
-        this is currently not being used or encoded.  It could
-        probably be encoded in the low 9 bits, allowing for columns
-        from 1 to 512 to be encoded.
-
-* The Parser
-
-        The parser is written using Jay, which is a port of Berkeley
-        Yacc to Java that I later ported to C#.
-
-        Many people ask why the grammar of the parser does not exactly
-        match the definition in the C# specification.  The reason is
-        simple: the grammar in the C# specification is designed to be
-        consumed by humans, and not by a computer program.  Before
-        you can feed this grammar to a tool, it needs to be simplified
-        to allow the tool to generate a correct parser for it.
-
-        In the Mono C# compiler, we use a class for each of the
-        statements and expressions in the C# language.
-        For example, there is a `While' class for the `while' statement,
-        a `Cast' class to represent a cast expression, and so on.
-
-        There is a Statement class, and an Expression class which are
-        the base classes for statements and expressions.
-
-** Namespaces
-
-        Using list.
-
-* Internal Representation
-
-** Expressions
-
-*** The Expression Class
-
-        The utility functions that can be called by all children of
-        Expression.
-
-** Constants
-
-        Constants in the Mono C# compiler are represented by the
-        abstract class `Constant'.  Constant is in turn derived from
-        Expression.  The base constructor for `Constant' just sets the
-        expression class to be an `ExprClass.Value'.  Constants are
-        born in a fully resolved state, so the `DoResolve' method
-        only returns a reference to itself.
-
-        Each Constant should implement the `GetValue' method which
-        returns an object with the actual contents of this constant.  A
-        utility virtual method called `AsString' is used to render a
-        diagnostic message.  The output of AsString is shown to the
-        developer when an error or a warning is triggered.
-
-        Constant classes also participate in the constant folding
-        process.  Constant folding is performed by those expressions
-        that can be constant folded, which invoke the functionality
-        provided by the ConstantFold class (cfold.cs).
-
-        Each Constant has to implement a number of methods to convert
-        itself into a Constant of a different type.  These methods are
-        called `ConvertToXXXX' and they are invoked by the wrapper
-        functions `ToXXXX'.  These methods only perform implicit
-        numeric conversions.  Explicit conversions are handled by the
-        `Cast' expression class.
-
-        The `ToXXXX' methods are the entry point, and provide error
-        reporting in case a conversion cannot be performed.
-
-** Constant Folding
-
-        The C# language requires constant folding to be implemented.
-        Constant folding is hooked up in the Binary.Resolve method.
-        If both sides of a binary expression are constants, then the
-        ConstantFold.BinaryFold routine is invoked.
-
-        This routine implements all the binary operator rules; it
-        mirrors the code that generates code for binary operators,
-        except that the generated code is evaluated at runtime.
-
-        If the constants can be folded, then a new constant expression
-        is returned; if not, the null value is returned (for
-        example, the concatenation of a string constant and a numeric
-        constant is deferred to the runtime).
-
-** Side effects
-
-        a [i++]++
-        a [i++] += 5;
-
-** Statements
-
-* The semantic analysis
-
-        The compiler driver first has to parse all the input files.
-        Once all the input files have been parsed, and an internal
-        representation of the input program exists, the following
-        steps are taken:
-
-                * The interface hierarchy is resolved first.
-                  As the interface hierarchy is constructed,
-                  TypeBuilder objects are created for each one of
-                  them.
-
-                * The class and structure hierarchy is resolved next;
-                  TypeBuilder objects are created for them.
-
-                * Constants and enumerations are resolved.
-
-                * Method, indexer, property, delegate and event
-                  definitions are now entered into the TypeBuilders.
-
-                * Elements that contain code are now invoked to
-                  perform semantic analysis and code generation.
-
-* Output Generation
-
-** Code Generation
-
-        An EmitContext is created any time that IL code is to
-        be generated (methods, properties, indexers and attributes all
-        create EmitContexts).
-
-        The EmitContext keeps track of the current namespace and type
-        container.  This is used during name resolution.
-
-        An EmitContext is used by the underlying code generation
-        facilities to track the state of code generation:
-
-                * The ILGenerator used to generate code for this
-                  method.
-
-                * The TypeContainer where the code lives; this is used
-                  to access the TypeBuilder.
-
-                * The DeclSpace; this is used to resolve names through
-                  RootContext.LookupType in the various statements and
-                  expressions.
-
-        Code generation state is also tracked here:
-
-                * CheckState
-
-                  This variable tracks the `checked' state of the
-                  compilation; it controls whether we should generate
-                  code that does overflow checking, or if we generate
-                  code that ignores overflows.
-
-                  The default setting comes from the command line
-                  option to generate checked or unchecked code plus
-                  any source code changes using the checked/unchecked
-                  statements or expressions.  Contrast this with the
-                  ConstantCheckState flag.
-
-                * ConstantCheckState
-
-                  The constant check state is always set to `true' and
-                  cannot be changed from the command line.  The source
-                  code can change this setting with the `checked' and
-                  `unchecked' statements and expressions.
-
-                * IsStatic
-
-                  Whether we are emitting code inside a static or
-                  instance method.
-
-                * ReturnType
-
-                  The type that is allowed to be returned, or null if
-                  there is no return type.
-
-
-                * ContainerType
-
-                  Points to the Type (extracted from the
-                  TypeContainer) that declares this body of code.
-
-
-                * IsConstructor
-
-                  Whether this is generating code for a constructor.
-
-                * CurrentBlock
-
-                  Tracks the current block being generated.
-
-                * ReturnLabel
-
-                  The location where return has to jump to return the
-                  value.
-
-        A few variables are used to track the state for checking in
-        for loops, or in try/catch statements:
-
-                * InFinally
-
-                  Whether we are in a Finally block.
-
-                * InTry
-
-                  Whether we are in a Try block.
-
-                * InCatch
-
-                  Whether we are in a Catch block.
-
-                * InUnsafe
-                  Whether we are inside an unsafe block.
-
-* Miscellaneous
-
-** Error Processing
-
-        Errors are reported during the various stages of the
-        compilation process.  The compiler stops its processing if
-        there are errors between the various phases.
-        This simplifies the code, because it is always safe to
-        assume that the data structures the compiler is operating on
-        are consistent.
-
-        The error codes in the Mono C# compiler are the same as those
-        found in the Microsoft C# compiler, with a few exceptions
-        (where we report a few more errors; those are documented in
-        mcs/errors/errors.txt).  The goal is to reduce confusion for
-        users, and also to help us track the progress of the
-        compiler in terms of the errors we report.
-
-        The Report class provides error and warning display functions,
-        and also keeps an error count which is used to stop the
-        compiler between the phases.
-
-        A couple of debugging tools are available here, and are useful
-        when extending or fixing bugs in the compiler.  If the
-        `--fatal' flag is passed to the compiler, the Report.Error
-        routine will throw an exception.  This can be used to pinpoint
-        the location of the bug and examine the variables around the
-        error location.
-
-        Warnings can be turned into errors by using the `--werror'
-        flag to the compiler.
-
-        The Report class also ignores warnings that have been
-        specified on the command line with the `--nowarn' flag.
-
-        Finally, code in the compiler uses the global variable
-        RootContext.WarningLevel in a few places to decide whether a
-        warning is worth reporting to the user or not.
-
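The error processing behaviour described in the deleted document (an error count checked between phases, `--fatal', `--werror' and `--nowarn') can be sketched roughly as follows. This is only an illustration under stated assumptions: `MiniReport` and all of its members are hypothetical names, and this is not the actual Report class from the compiler sources.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the behaviour described above; this is not
// the actual Report class from the compiler sources.
class MiniReport {
    public int Errors;               // the driver checks this between phases
    public int Warnings;
    public bool Fatal;               // models the `--fatal' flag
    public bool WarningsAreErrors;   // models the `--werror' flag

    // Models the warning codes given with `--nowarn'.
    readonly HashSet<int> ignored = new HashSet<int> ();

    public void SetIgnoreWarning (int code)
    {
        ignored.Add (code);
    }

    public void Error (int code, string text)
    {
        Console.WriteLine ("error CS{0:0000}: {1}", code, text);
        Errors++;
        // Throwing here lets a debugger stop at the point of the error.
        if (Fatal)
            throw new Exception (text);
    }

    public void Warning (int code, string text)
    {
        if (ignored.Contains (code))
            return;
        if (WarningsAreErrors) {
            Error (code, text);
            return;
        }
        Console.WriteLine ("warning CS{0:0000}: {1}", code, text);
        Warnings++;
    }
}
```

A driver built on this sketch would check `Errors` after each phase and stop when it is non-zero, which is what allows later phases to assume consistent data structures.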