Changed the interface to the execute_module function used by pcl-run.py, and added more documentation.

author: Ian Johnson <ian.johnson@appliedlanguage.com> 2013-07-02 20:15:41 +0400
committer: Ian Johnson <ian.johnson@appliedlanguage.com> 2013-07-02 20:15:41 +0400
commit: 1957008df973856481a061df875096974064b56e (patch)
tree: bdef6f5c47957e6745c765f9e1de47cc1149ac6f
parent: 4aecfcffe8a236c36bbe2d8451e992015f3f7250 (diff)
7 files changed, 161 insertions, 32 deletions
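
Note for callers: with this change the caller creates and shuts down the thread pool, and passes it as the first argument instead of passing a worker count. The following is a minimal sketch of that calling pattern, not code from this commit. The names run_with_pool and stand_in_execute are hypothetical, and stand_in_execute is only a placeholder with the new argument order so that the sketch runs without a compiled PCL module; substitute the real execute_module from runner.runner.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_pool(execute_fn, no_workers, *args):
    """Create a thread pool, pass it as the first argument to execute_fn,
    and always shut the pool down afterwards (hypothetical helper)."""
    executor = ThreadPoolExecutor(max_workers=no_workers)
    try:
        return execute_fn(executor, *args)
    finally:
        # Wait for outstanding work to finish, as pcl-run.py now does.
        executor.shutdown(True)


def stand_in_execute(executor, pcl_import_path, pcl_module,
                     get_configuration_fn, get_inputs_fn):
    """Hypothetical placeholder with the new argument order.
    It is NOT the real execute_module; it only exercises the pool."""
    configuration = get_configuration_fn(["sleep_command"])
    inputs = get_inputs_fn(["sleep_time"])
    # Run a trivial task on the caller-supplied pool.
    future = executor.submit(
        lambda: {"complete": bool(configuration and inputs)})
    # Same shape as the documented result: (expected outputs, outputs).
    return (["complete"], future.result())


if __name__ == "__main__":
    result = run_with_pool(
        stand_in_execute, 5, ".", "parallel_sleep",
        lambda keys: dict((key, "/bin/sleep") for key in keys),
        lambda keys: dict((key, 5) for key in keys))
    print(result)
```

Keeping pool ownership with the caller means one pool can be reused across several pipeline runs; whether that is the intended use is an assumption, since the commit itself only moves the construction into pcl-run.py.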
diff --git a/documentation/chapters/adapter/adapter.tex b/documentation/chapters/adapter/adapter.tex
new file mode 100644
index 0000000..fff6c63
--- /dev/null
+++ b/documentation/chapters/adapter/adapter.tex
@@ -0,0 +1,4 @@
+\chapter{Adapting to PCL}
+Your existing scripts and programs can be used with PCL. A simple Python file is required which contains six functions that inform PCLc about the nature of the component.
+
+Care must be taken when adapting your existing work to PCL pipelines. Threading issues and batch or on-line processing must be considered as the dynamics of your final pipeline may depend on it. Also, any state that may need to accumulate over the lifetime of a PCL component must be handled by the adapter for your programs.
+\ No newline at end of file
diff --git a/documentation/chapters/compiler/compiler.tex b/documentation/chapters/compiler/compiler.tex
index 4b222de..984ea5b 100644
--- a/documentation/chapters/compiler/compiler.tex
+++ b/documentation/chapters/compiler/compiler.tex
@@ -419,7 +419,7 @@ component parallel_sleep
 \end{figure}
 This component constructs two sleep components using a static sleep command to execute. The two sleep components are composed using the fanout operation, such that they run in parallel. Other example PCL files can be found in the \texttt{examples} directory of your Git clone.
 
-\section{Usage}
+\section{Usage}\label{sec:pclc-usage}
 Ensure you have \texttt{src/pclc/pclc.py} in your platform path. Running \texttt{pclc.py -h} yields:
 \begin{verbatim}
 Usage: pclc.py [options] [PCL file]
@@ -438,7 +438,7 @@ The command-line options are:
 \begin{itemize}
 \item \texttt{-h}, \texttt{--help}: Display the help message,
 \item \texttt{-l}, \texttt{--loglevel}: The logging level for the \texttt{pclc.log} file that is created during compilation. This file, depending on log level, shall show information about the parsing internals of PCLc,
-\item \texttt{-i}, \texttt{--instrument}: Specifying this flag shall add code to the generated component which shall log to standard error when component starts and finishes. The log messages are time stamped so can be used as a rudimentary profiling tool, and
+\item \texttt{-i}, \texttt{--instrument}: Specifying this flag shall add code to the generated component which shall log to standard error when the component's constructed and used components start and finish. The log messages are time stamped so can be used as a rudimentary profiling tool, and
 \item \texttt{-v}, \texttt{--version}: Show the version of PCLc.
 \end{itemize}
 
@@ -446,6 +446,11 @@ For example, change directory to \texttt{src/examples/parallel\_sleep} and issue
 \begin{verbatim}
 pclc.py -i parallel_sleep.pcl
 \end{verbatim}
+The \texttt{pcl} extension is not required so running the compiler with:
+\begin{verbatim}
+pclc.py -i parallel_sleep
+\end{verbatim}
+has the same effect.
 The compilation process will generate three new files:
 \begin{itemize}
 \item \texttt{parallel\_sleep.py}: The object code from the compilation. PCL compiles to Python and this file shall be used by the runtime to build the final pipeline,
diff --git a/documentation/chapters/run-time/run-time.tex b/documentation/chapters/run-time/run-time.tex
index d401b58..b8dddb6 100644
--- a/documentation/chapters/run-time/run-time.tex
+++ b/documentation/chapters/run-time/run-time.tex
@@ -1,8 +1,128 @@
 \chapter{PCL Runtime}
+The PCL runtime is an optional method of running a pipeline. It can be found in the \texttt{src/pcl-run} directory of the Git clone. Ensure this directory is in your platform path and issue:
+\begin{verbatim}
+pcl-run.py -h
+\end{verbatim}
+This yields:
+\begin{verbatim}
+Usage: pcl-run.py [options] [PCL configuration]
+
+Options:
+  -h, --help            show this help message and exit
+  -v, --version         show version and exit
+  -n NO_WORKERS, --noworkers=NO_WORKERS
+                        number of pipeline evaluation
+                        workers [default: 5]
+\end{verbatim}
+
+The command-line options are:
+\begin{itemize}
+\item \texttt{-h}, \texttt{--help}: Display the help message,
+\item \texttt{-v}, \texttt{--version}: Show the version of the PCL runtime, and
+\item \texttt{-n}, \texttt{--noworkers}: The components are executed in a thread pool. This option determines the maximum size of this thread pool. If you find that components that are expected to execute in parallel are running sequentially, then increasing the number of threads in the pool may help.
+\end{itemize}
+
+\section{Pipeline Configuration}
+The pipeline configuration file contains the static configuration used by components to construct other components, and the pipeline's inputs. The filename must be the same as the component you wish to run with a \texttt{.cfg} extension, e.g., the \texttt{parallel\_sleep} configuration file is called \texttt{parallel\_sleep.cfg}. The configuration file contains two sections \texttt{[Configuration]}, for configuration values, and \texttt{[Inputs]}, for pipeline inputs. Each section contains key value pairs, e.g., the \texttt{parallel\_sleep} configuration file looks like this:
+\begin{verbatim}
+[Configuration]
+sleep_command = /bin/sleep
+[Inputs]
+sleep_time = 5
+\end{verbatim}
+Environment variables can be used in configuration files with \texttt{\$(VAR\_NAME)}. The environment variable, if it exists, shall be substituted and used in the pipeline.
+
+\section{Running a Pipeline}
+At the end of the last chapter you compiled the \texttt{parallel\_sleep} component from the \texttt{examples} directory. To run this pipeline, return to the \texttt{examples/parallel\_sleep} directory and run:
+\begin{verbatim}
+pcl-run.py parallel_sleep.cfg
+\end{verbatim}
+or
+\begin{verbatim}
+pcl-run.py parallel_sleep
+\end{verbatim}
+After 5 seconds the runtime should display on \texttt{stdout}:
+\begin{verbatim}
+({'complete': True}, {'complete': True})
+\end{verbatim}
+
+If, on the other hand, you have compiled this pipeline with instrumentation enabled (see Section \ref{sec:pclc-usage}) you should see something like this:
+\begin{verbatim}
+07/02/13 15:45:40.851373: MainThread: Component parallel_sleep
+is constructing bottom_sleep (id = 38338448) with
+configuration {'sleep_command': '/bin/sleep'}
+(sleep instance declared at line 27)
+07/02/13 15:45:40.851504: MainThread: Component parallel_sleep
+is constructing top_sleep (id = 38392400) with
+configuration {'sleep_command': '/bin/sleep'}
+(sleep instance declared at line 26)
+07/02/13 15:45:40.852697: Thread-2: Component parallel_sleep
+is starting top_sleep (id = 38392400) with input
+{'sleep_time': 5} and state {'sleep_command': '/bin/sleep'}
+07/02/13 15:45:40.856738: Thread-2: Component parallel_sleep
+is starting bottom_sleep (id = 38338448) with input
+{'sleep_time': 5} and state {'sleep_command': '/bin/sleep'}
+07/02/13 15:45:45.857495: Thread-1: Component parallel_sleep
+is finishing top_sleep (id = 38392400) with input
+{'complete': True} and state {'sleep_command': '/bin/sleep'}
+07/02/13 15:45:45.859939: Thread-5: Component parallel_sleep
+is finishing bottom_sleep (id = 38338448) with input
+{'complete': True} and state {'sleep_command': '/bin/sleep'}
+({'complete': True}, {'complete': True})
+\end{verbatim}
+The timestamped lines appear on \texttt{stderr}.
 
+\section{Gotchas}
+PCL allows for components to be defined in a hierarchical namespace. All directories in your PCL component hierarchical namespace that do not contain PCL files must contain \texttt{\_\_init\_\_.py} in order for the Python runtime to ``see'' these directories as Python packages. Failure to do so will yield an error in the form:
+\begin{verbatim}
+ERROR: Failed to import PCL module parallel_sleep: No module
+named parallel_sleep
+\end{verbatim}
 
-\subsection{Gotchas}
-PCL allows for components to be defined in hierarchical namespace. All directories that do not contain PCL files must contain \texttt{\_\_init\_\_.py} in order for the Python runtime to ``see'' these directories as Python packages. Failure to do so will yield an error in the form:
+\section{Using PCL in your own Python programs}
+If you wish to run PCL pipelines in your own programs, a function exists in \texttt{src/pcl-run/runner/runner.py} called \texttt{execute\_module(executor,\ pcl\_import\_path,\ pcl\_module,\ get\_configuration\_fn,\ get\_inputs\_fn)}. This function returns a 2-tuple whose first element is the expected outputs of the pipeline, and whose second element is the output of the executed pipeline.
+
+For example, the \texttt{parallel\_sleep} pipeline would output:
 \begin{verbatim}
-ERROR: Failed to import PCL module parallel_sleep: No module named parallel_sleep
+((['complete'], ['complete']),
+ ({'complete': True}, {'complete': True}))
+\end{verbatim}
+
+The inputs are:
+\begin{itemize}
+\item \texttt{executor}: A \texttt{concurrent.futures.ThreadPoolExecutor} object,
+\item \texttt{pcl\_import\_path}: A colon separated string of directories in which to search for PCL components,\\e.g., \texttt{com.mammon.wizz.components.pre\_processing:\\com.mammon.wizz.components.workers},
+\item \texttt{pcl\_module}: A dot separated string representing the path to a compiled PCL module, e.g., \texttt{trail.pipelines.gonzo},
+\item \texttt{get\_configuration\_fn}: A function which shall receive an iterable which contains the expected configuration for the component. This function shall return a dictionary whose keys are the expected configuration along with their values, e.g.,
+\begin{verbatim}
+def get_configuration(expected_configuration):
+  configuration = dict()
+  for config_key in expected_configuration:
+    # You need to implement get_configuration_from_provider
+    configuration[config_key] = \
+      get_configuration_from_provider(config_key)
+
+  return configuration
+\end{verbatim}
+\item \texttt{get\_inputs\_fn}: A function that shall receive the input port specification for the component. A tuple indicates a two port input and shall contain two iterable collections containing the signal names for the two input ports; otherwise it is an iterable collection of signal names for the single input port. The function shall return a 2-tuple of dictionaries, whose keys are the expected input signal names and whose values are the signal values, when the component has two input ports, or a single dictionary whose keys are the signals of a single input port, e.g.,
+\begin{verbatim}
+def get_inputs(expected_inputs):
+  def build_inputs_fn(inputs):
+    input_dict = dict()
+    for an_input in inputs:
+      # You need to implement get_input_from_provider
+      input_dict[an_input] = \
+        get_input_from_provider(an_input)
+    return input_dict
+
+  if isinstance(expected_inputs, tuple):
+    inputs = list()
+    for set_inputs in expected_inputs:
+      inputs.append(build_inputs_fn(set_inputs))
+    inputs = tuple(inputs)
+  else:
+    inputs = build_inputs_fn(expected_inputs)
+
+  return inputs
 \end{verbatim}
+\end{itemize}
+\ No newline at end of file
diff --git a/documentation/pcl-manual.tex b/documentation/pcl-manual.tex
index e2cbd06..af66dae 100644
--- a/documentation/pcl-manual.tex
+++ b/documentation/pcl-manual.tex
@@ -7,7 +7,8 @@
 \includeonly{%
   chapters/introduction/introduction,
   chapters/compiler/compiler,
-  chapters/run-time/run-time
+  chapters/run-time/run-time,
+  chapters/adapter/adapter
 }
 
 
@@ -50,11 +51,11 @@
 
 
 %%% Macro definitions for Commonly used symbols
-\newcommand{\ReleaseVersion}{1.0.0}
+\newcommand{\ReleaseVersion}{1.0.0-beta}
 
 \begin{document}
 \title{\huge{Pipeline Creation Language (PCL)}\\
-\LARGE{Users' Manual}\\
+\LARGE{User Manual}\\
 \normalsize{Version: \ReleaseVersion}}
 \author{Ian Johnson}
 \date{\today}
@@ -65,6 +66,8 @@
 \onehalfspacing
 
 \begin{abstract}
+Pipeline Creation Language (PCL) is a general purpose language for creating non-recurrent software pipelines. This manual describes the syntax of PCL, how to compile and run it, and how to adapt your existing scripts or programs for use with PCL.
+
 PCL was developed as part of the MosesCore project sponsored by the European Commission's Seveth Framework Programme (Grant Number 288487) \url{http://www.statmt.org/mosescore/}. For more information on the Seventh Framework Programme please see \url{http://cordis.europa.eu/fp7/home_en.html}.
 \end{abstract}
 
@@ -72,7 +75,6 @@ PCL was developed as part of the MosesCore project sponsored by the European Com
 
 \tableofcontents
 \listoffigures
-\listoftables
 
 \cleardoublepage
 \setcounter{page}{1}
@@ -81,6 +83,7 @@ PCL was developed as part of the MosesCore project sponsored by the European Com
 \include{chapters/introduction/introduction}
 \include{chapters/compiler/compiler}
 \include{chapters/run-time/run-time}
+\include{chapters/adapter/adapter}
 
 \bibliographystyle{plain}{}
 \bibliography{pcl-manual}
diff --git a/src/pcl-run/pcl-run.py b/src/pcl-run/pcl-run.py
index 674e9d6..2fd0ddd 100755
--- a/src/pcl-run/pcl-run.py
+++ b/src/pcl-run/pcl-run.py
@@ -22,11 +22,12 @@ import os
 import re
 import sys
 
+from concurrent.futures import ThreadPoolExecutor
 from optparse import OptionParser
 from runner.runner import PCLImportError, execute_module
 
 
-__VERSION = "1.2.1"
+__VERSION = "1.3.0"
 
 
 if __name__ == '__main__':
@@ -153,10 +154,12 @@ if __name__ == '__main__':
 
         return pipeline_inputs
 
+    # The execution environment
+    executor = ThreadPoolExecutor(max_workers = options.no_workers)
     try:
-        print >> sys.stderr, execute_module(pcl_import_path,
+        print >> sys.stderr, execute_module(executor,
+                                            pcl_import_path,
                                             pcl_module,
-                                            options.no_workers,
                                             get_configuration_values,
                                             get_input_values)[1]
     except PCLImportError as ex:
@@ -165,3 +168,5 @@ if __name__ == '__main__':
     except AttributeError as ex:
         print >> sys.stderr, "ERROR: PCL module %s does not have required functions: %s" % (pcl_module, ex)
         sys.exit(1)
+    finally:
+        executor.shutdown(True)
diff --git a/src/pcl-run/runner/runner.py b/src/pcl-run/runner/runner.py
index 82d1cc4..030d53b 100644
--- a/src/pcl-run/runner/runner.py
+++ b/src/pcl-run/runner/runner.py
@@ -19,7 +19,6 @@
 #
 import sys
 
-from concurrent.futures import ThreadPoolExecutor
 from pypeline.core.arrows.kleisli_arrow import KleisliArrow
 from pypeline.helpers.parallel_helpers import eval_pipeline, cons_function_component
 
@@ -35,8 +34,8 @@ class PCLImportError(Exception):
         return "PCLImportError(cause = %s)" % self.__cause.__repr__()
 
 
-def execute_module(pcl_import_path, pcl_module, no_workers, get_configuration_fn, get_inputs_fn):
-    """Executes a PCL component in a concurrent environment. Provide a colon separated PCL import path, the fully qualified PCL module name, the number of workers in the concurrent execution environment, and getter two functions. The configuration function receives the expected configuration keys and should return a dictionary, whose keys are the expected configuration, with appropriate values. The input function received the expected inputs and should return a dictionary, whose keys are the expected inputs, with appropriate values."""
+def execute_module(executor, pcl_import_path, pcl_module, get_configuration_fn, get_inputs_fn):
+    """Executes a PCL component in a concurrent environment. Provide the concurrent execution environment, a colon separated PCL import path, the fully qualified PCL module name, and two getter functions. The configuration function receives the expected configuration keys and should return a dictionary, whose keys are the expected configuration, with appropriate values. The input function receives the expected inputs and should return a dictionary, whose keys are the expected inputs, with appropriate values."""
     # Set up Python path to import compiled PCL modules
     pcl_import_path_bits = pcl_import_path.split(":")
     for pcl_import_path_bit in pcl_import_path_bits:
@@ -46,6 +45,7 @@ def execute_module(pcl_import_path, pcl_module, no_workers, get_configuration_fn
     # Import PCL
     try:
         pcl = __import__(pcl_module, fromlist = ['get_inputs',
+                                                 'get_outputs',
                                                  'get_configuration',
                                                  'configure',
                                                  'initialise'])
@@ -74,13 +74,5 @@ def execute_module(pcl_import_path, pcl_module, no_workers, get_configuration_fn
         pipeline = cons_function_component(pipeline)
 
     return (expected_outputs,
-            execute_component(pipeline, pipeline_inputs, pipeline_configuration, no_workers))
-
-
-def execute_component(component, inputs, configuration, no_workers):
-    """Execute a pre-configured and initialised component."""
-    executor = ThreadPoolExecutor(max_workers = no_workers)
-    try:
-        return eval_pipeline(executor, component, inputs, configuration)
-    finally:
-        executor.shutdown(True)
+            eval_pipeline(executor, pipeline, pipeline_inputs, pipeline_configuration))
+    
diff --git a/src/pclc/pclc.py b/src/pclc/pclc.py
index 0a45c07..ed047fd 100755
--- a/src/pclc/pclc.py
+++ b/src/pclc/pclc.py
@@ -27,7 +27,7 @@ from parser.resolver import Resolver
 from visitors.executor_visitor import ExecutorVisitor
 
 
-__VERSION = "1.1.6"
+__VERSION = "1.1.7"
 
 
 if __name__ == '__main__':
@@ -59,7 +59,7 @@ if __name__ == '__main__':
 
     # Check we've got at least one command line argument
     if len(args) < 1:
-        print "ERROR: no input file"
+        print >> sys.stderr, "ERROR: no input file"
         sys.exit(2)
 
     # Add the PCL extension is one is missing
@@ -72,7 +72,7 @@ if __name__ == '__main__':
     # PCL file name
     pcl_filename = os.path.join(os.path.dirname(args[0]), ".".join(basename_bits))
     if os.path.isfile(pcl_filename) is False:
-        print "ERROR: Cannot find file %s" % pcl_filename
+        print >> sys.stderr, "ERROR: Cannot find file %s" % pcl_filename
         sys.exit(1)
 
     # Parse...
@@ -84,10 +84,10 @@ if __name__ == '__main__':
     resolver = Resolver(os.getenv("PCL_IMPORT_PATH", "."))
     resolver.resolve(ast)
     for warning in resolver.get_warnings():
-        print warning
+        print >> sys.stderr, warning
     if resolver.has_errors():
         for error in resolver.get_errors():
-            print error
+            print >> sys.stderr, error
         sys.exit(1)
 
     # Execute.
@@ -95,8 +95,8 @@ if __name__ == '__main__':
     try:
         ast.accept(executor)
     except Exception as ex:
-        print traceback.format_exc()
-        print "ERROR: Code generation failed: %s" % ex
+        print >> sys.stderr, traceback.format_exc()
+        print >> sys.stderr, "ERROR: Code generation failed: %s" % ex
         sys.exit(1)
 
     sys.exit(0)
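
The run-time chapter added in this commit says that environment variables can be written in configuration files as $(VAR_NAME) and are substituted if they exist. The hunks shown here do not include the code that does this, so the following is only a sketch of one way such a substitution could work. The function name substitute_environment, the regular expression, and the choice to leave an unknown variable untouched are all assumptions and are not taken from pcl-run.py.

```python
import os
import re

# Matches $(VAR_NAME); the pattern itself is an assumption.
_ENV_VAR = re.compile(r"\$\((\w+)\)")


def substitute_environment(value, environ=None):
    """Replace each $(VAR_NAME) in value with the variable's value.
    Variables that are not set are left as written (assumed behaviour)."""
    if environ is None:
        environ = os.environ

    def replace(match):
        # Fall back to the original text when the variable is missing.
        return environ.get(match.group(1), match.group(0))

    return _ENV_VAR.sub(replace, value)


if __name__ == "__main__":
    # Hypothetical configuration value in the style of parallel_sleep.cfg.
    print(substitute_environment("$(TOOLS_DIR)/sleep",
                                 {"TOOLS_DIR": "/bin"}))
```

Passing the environment as an optional argument keeps the sketch easy to test without changing the process environment.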