
\chapter{Adapting to PCL}
Pre-existing executables can be adapted to PCL in two ways:
\begin{itemize}
\item \textbf{Imperative PCL}: The PCL language supports an imperative-style grammar that allows component authors to use runtime libraries to run external executables. This portion of the PCL grammar is not \emph{Turing complete} since it has no looping constructs. It exists to quickly initialise the prerequisites of an external executable, run the executable, post-process the result, and return it. If your component needs more complex pre- and post-processing, or cannot be, or need not be, an external executable, use the second approach.
\item \textbf{Python Wrapper}: A Python file containing six functions can be written that informs PCLc about the nature of the component. The properties defined are the component's name, its input and output port specifications, its configuration and configuration pre-processing, and its computation. Since the computation is described in Python, an arbitrarily complex component can be written.
\end{itemize}

Care must be taken when adapting your existing work to PCL pipelines. Threading issues, and whether processing is batch or on-line, must be considered as the dynamics of your final pipeline may depend on them. Also, any state that needs to accumulate over the lifetime of a PCL component must be handled by the adapter for your programs.

\section{Imperative PCL}
The imperative PCL language is free-form: arbitrary white-space can be used to format component definitions. Comments are single-line, start with the \texttt{\#} character, and can appear at any point in a PCL file.
\begin{figure}[h!]
  \centering
    \includegraphics[scale=\DiagramScale,angle=90]{chapters/adapter/diagrams/component}
  \caption{Imperative PCL file syntax.}
  \label{fig:imperative-pcl-top-level}
\end{figure}
The top level syntax of a PCL file is shown in Figure \ref{fig:imperative-pcl-top-level}, and consists of the following sections:
\begin{itemize}
\item \textbf{Imports}: Imports can be optionally specified. The imports here are PCL runtime libraries, which are Python modules containing functions. Each import must specify an alias, which is used to call the functions, e.g., \texttt{list.insert(...)}.
\item \textbf{Component}: This starts the component definition and provides the name. The component's name must be the same as the filename. E.g., a component in \texttt{fred.pcl} must be called \texttt{fred}.
\item \textbf{Inputs}: Defines the inputs of the component. This information is used to verify that the outputs of a previous component are compatible with the inputs of this one. Only \emph{one} input port can be defined.
\item \textbf{Outputs}: Defines the outputs of the component. This information is used to verify that the inputs of a subsequent component are compatible with the outputs of this one. Only \emph{one} output port can be defined.
\item \textbf{Configuration}: Optional configuration for the component. This is static data that shall be used to construct imported components used in this component. 
\item \textbf{Commands}: This portion is the component definition. It is a list of commands which are executed in order from top to bottom.
\end{itemize}

An example imperative PCL file can be seen in Figure \ref{fig:imperative-pcl-example}. This example can be found in the \texttt{parallel\_sleep} example in the PCL Git repository.
\begin{figure}[h!]
  \begin{verbatim}
import pcl.system.process as process
import pcl.util.list as list

component sleep
  input sleep_time
  output complete
  configuration sleep_command
  do
    cmd <- list.cons(@sleep_command, sleep_time)
    process.callAndCheck(cmd)

    return complete <- True
  \end{verbatim}
  \caption{Example imperative PCL file.}
  \label{fig:imperative-pcl-example}
\end{figure}

\subsection{Imports}
\begin{figure}[h!]
  \centering
    \includegraphics[scale=\DiagramScale]{chapters/adapter/diagrams/imports}
  \caption{\texttt{imports} : Importing PCL runtime functions.}
  \label{fig:imperative-pcl-imports}
\end{figure}
Imperative PCL files can use runtime library functions to import functionality. These runtime libraries are Python files which contain functions. A function is called in PCL by the name it has in Python. Figure \ref{fig:imperative-pcl-imports} shows the syntax for importing. The environment variable \texttt{PCL\_IMPORT\_PATH} is a colon-separated list of directories in which the PCL runtime libraries are searched for. If this environment variable is not set then the current working directory is used as the starting point for the search.
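
As a rough illustration of the lookup described above, the following Python sketch resolves a dotted module name against a colon-separated search path. It is not PCLc's implementation: the function name \texttt{find\_runtime\_library} and the mapping of \texttt{pcl.util.list} to \texttt{pcl/util/list.py} are assumptions made for this example.
\begin{verbatim}
import os

def find_runtime_library(module_name, env=None):
    """Return the path of the first matching .py file, or None."""
    env = os.environ if env is None else env
    # Fall back to the current working directory if the variable is unset.
    search_path = env.get('PCL_IMPORT_PATH', os.getcwd())
    # Assumed mapping: 'pcl.util.list' -> 'pcl/util/list.py'
    relative = os.path.join(*module_name.split('.')) + '.py'
    for directory in search_path.split(':'):
        candidate = os.path.join(directory, relative)
        if os.path.isfile(candidate):
            return candidate
    return None
\end{verbatim}
Passing the environment as a parameter keeps the sketch easy to test without modifying the real process environment.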

Each import must specify an alias. This is the name by which the imported library is referred to in this PCL file. E.g., \texttt{import pcl.util.list as list} imports a PCL runtime library called \texttt{list} from the package \texttt{pcl.util} and gives it the alias \texttt{list}.
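
Since runtime libraries are Python modules containing functions, a project-specific library could be as small as the sketch below. The module name \texttt{text.py}, its functions, and the assumption that PCL passes arguments positionally are all hypothetical; none of this is part of the libraries shipped with PCL.
\begin{verbatim}
# text.py: a hypothetical PCL runtime library (not shipped with PCL).

def join(separator, items):
    """Join the string forms of items using separator."""
    return separator.join(str(item) for item in items)

def upper(value):
    """Return the upper-cased string form of value."""
    return str(value).upper()
\end{verbatim}
Placed in a directory \texttt{my/util} on the search path, such a module might then be imported with \texttt{import my.util.text as text} and its functions called through the alias, e.g., \texttt{text.upper(...)}.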

Figure \ref{fig:imperative-pcl-import-non-terminals} shows how the non-terminals expand in the import syntax.
\begin{figure}[h!]
  \centering
  \begin{subfigure}[b]{0.4\textwidth}
    \includegraphics[scale=\DiagramScale]{chapters/adapter/diagrams/pcl_module}
    \caption{\texttt{pcl\_module} expansion}
  \end{subfigure}
  ~
  \begin{subfigure}[b]{0.4\textwidth}
    \includegraphics[scale=\DiagramScale]{chapters/adapter/diagrams/pcl_module_alias}
    \caption{\texttt{pcl\_module\_alias} expansion}
  \end{subfigure}
  \caption{\texttt{pcl\_module} \& \texttt{pcl\_module\_alias} : Imperative import PCL non-terminals.}
  \label{fig:imperative-pcl-import-non-terminals}
\end{figure}

\subsection{Port Definition}
\begin{figure}[h!]
  \centering
    \includegraphics[scale=\DiagramScale]{chapters/adapter/diagrams/port_definition}
  \caption{\texttt{port\_definition} : Imperative PCL port definition.}
  \label{fig:imperative-pcl-port-def}
\end{figure}
A port definition informs the PCL compiler about the nature of a component's input or output. Components defined with imperative PCL can only have one input port and one output port. Figure \ref{fig:imperative-pcl-port-def} shows the syntax for this grammatical construct.

Again, ports carry one or more \emph{signals}. A signal is a piece of data that flows through a port. Its name is unique to that port and can be fully qualified. The signal names for a port are declared in its port definition. Signal names are read-only.
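
One way to picture the compatibility check between adjacent components is as a comparison of signal names. The Python sketch below assumes that a connection is valid when every signal the downstream port expects is produced upstream; the rule actually applied by the PCL compiler may differ, and \texttt{missing\_signals} is a name invented for this example.
\begin{verbatim}
def missing_signals(upstream_outputs, downstream_inputs):
    """Return downstream input signals that upstream does not produce."""
    produced = set(upstream_outputs)
    return [name for name in downstream_inputs if name not in produced]

# An empty list means the two ports line up under this assumption.
print(missing_signals(['complete', 'elapsed'], ['complete']))  # []
print(missing_signals(['complete'], ['complete', 'elapsed']))  # ['elapsed']
\end{verbatim}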

\subsection{Configuration}
\begin{figure}[h!]
  \centering
    \includegraphics[scale=\DiagramScale]{chapters/adapter/diagrams/configuration}
  \caption{\texttt{configuration} : Imperative PCL configuration.}
  \label{fig:imperative-pcl-config}
\end{figure}
A component's configuration is static, read-only data. Configuration data is named using identifiers, which can be fully qualified. Figure \ref{fig:imperative-pcl-config} shows the configuration syntax. Configuration identifiers may be used at any point where a variable or input signal name can be used, e.g., an \emph{if} command or a function call. In imperative PCL zero or more configuration identifiers can be declared.

\subsection{Commands}
The command syntax is shown in Figure \ref{fig:imperative-pcl-command}. Each command yields a value which can, optionally, be assigned to a write-once ``variable''.
\begin{figure}[h!]
  \centering
    \includegraphics[scale=0.45,angle=90]{chapters/adapter/diagrams/command}
  \caption{\texttt{command} : Imperative PCL commands.}
  \label{fig:imperative-pcl-command}
\end{figure}

\subsubsection{Function Calls}


\subsubsection{Let Bindings}

\subsubsection{If Commands}

\section{Python Wrapper}
The Python wrappers for your programs can inhabit the same hierarchical package structure as your PCL hierarchy. This is because the PCL hierarchy mirrors the Python one\footnote{This is the reason why \texttt{\_\_init\_\_.py} files must be manually placed in directories in your PCL hierarchy which have no PCL files.}.

Six functions are required in your Python wrapper:
\begin{itemize}
\item \texttt{get\_name()}: Returns an object representing the name of the component. The \texttt{\_\_str\_\_()} function should be implemented to return a meaningful name. E.g.,
\begin{verbatim}
def get_name():
  return 'tokenisation'
\end{verbatim}
\item \texttt{get\_inputs()}: Returns the inputs of the component. Components should only be defined with one input port, which is defined by returning a single list of input port signal names. E.g.,
\begin{verbatim}
def get_inputs():
  return ['port.in.a', 'port.in.b']
\end{verbatim}
\item \texttt{get\_outputs()}: Returns the outputs of the component. Components should only be defined with one output port, which is defined by returning a single list of output port signal names. E.g.,
\begin{verbatim}
def get_outputs():
  return ['port.out.a', 'port.out.b', 'port.out.c']
\end{verbatim}
\item \texttt{get\_configuration()}: Returns a list of names that represent the static data that shall be used to construct the component. E.g.,
\begin{verbatim}
def get_configuration():
  return ['buffer.file', 'buffer.size']
\end{verbatim}
\item \texttt{configure(args)}: This function is the component designer's chance to preprocess configuration injected at runtime. The \texttt{args} parameter is a dictionary that contains all the configuration provided to the entire pipeline. The function should filter out, and optionally preprocess, the configuration used by this component, and shall return an object containing the configuration necessary to construct this component. E.g., the following returns a dictionary of configuration:
\begin{verbatim}
import os
def configure(args):
  buffer_file = os.path.abspath(args['buffer.file'])
  return {'buffer.dir' : os.path.dirname(buffer_file),
          'buffer.file' : os.path.basename(buffer_file),
          'buffer.size' : args['buffer.size']}
\end{verbatim}
\item \texttt{initialise(config)}: This function is where the component designer defines the component's computation. The function receives the object returned from the \texttt{configure()} function and must return a function that takes two parameters, an input object, and a state object. The input object, \texttt{a} in the example below, is a dictionary that is received from the previous component in the pipeline. The keys of this dictionary are the signal names from the previous component's output port. The state object, \texttt{s} in the example below, is a dictionary containing the configuration for the component. The keys of the configuration dictionary are defined by the \texttt{get\_configuration()} function. The returned function should be used to define the component's computation. E.g.,
\begin{verbatim}
import subprocess
def initialise(config):
  def sleep_function(a, s):
    proc = subprocess.Popen([config['sleep_command'],
                             str(a['sleep_time'])])
    proc.communicate()
    return {'complete' : True}

  return sleep_function
\end{verbatim}
The function returned by \texttt{initialise()} is executed in the thread pool used by the runtime (see Chapter \ref{chap:runtime}). It is implementation defined as to whether this function blocks, waiting for a computation to complete, or not.
\end{itemize}
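
State that must accumulate over a component's lifetime can be kept in the closure created by \texttt{initialise()}. The sketch below counts invocations and guards the counter with a lock, on the assumption that the runtime's thread pool may run the returned function concurrently; the \texttt{calls} output signal is hypothetical.
\begin{verbatim}
import threading

def initialise(config):
    lock = threading.Lock()
    state = {'calls': 0}  # lives as long as the returned function

    def counting_function(a, s):
        # Assumption: several pool threads may call this at once.
        with lock:
            state['calls'] += 1
            calls = state['calls']
        return {'complete': True, 'calls': calls}

    return counting_function
\end{verbatim}
Copying the counter into a local variable inside the lock ensures that the returned value is the one this invocation produced, even if another thread increments the counter immediately afterwards.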

An example of a complete Python wrapper file is shown in Figure \ref{fig:python-wrapper}.
\begin{figure}[h!]
\begin{verbatim}
import subprocess

def get_name():
  return "sleep"

def get_inputs():
  return ['sleep_time']

def get_outputs():
  return ['complete']

def get_configuration():
  return ['sleep_command']

def configure(args):
  return {'sleep_command' : args['sleep_command']}

def initialise(config):
  def sleep_function(a, s):
    proc = subprocess.Popen([config['sleep_command'],
                             str(a['sleep_time'])])
    proc.communicate()
    return {'complete' : True}

  return sleep_function
\end{verbatim}
\caption{\texttt{sleep.py}: An example Python wrapper for PCL.}
\label{fig:python-wrapper}
\end{figure}
This wrapper is the Python implementation of the imperative PCL \texttt{sleep} component shown in Figure \ref{fig:imperative-pcl-example}.
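
A wrapper can be exercised outside of PCL with a small harness that calls its functions in the order described in this section. This is only a sketch for local testing, not how the PCL runtime drives components: \texttt{run\_once} and the stand-in \texttt{demo} wrapper are invented here, and passing the configuration as the state object is an assumption based on the description of \texttt{initialise()} above.
\begin{verbatim}
import types

def run_once(wrapper, pipeline_args, inputs):
    """Call a wrapper's functions once, in order, outside of PCL."""
    missing = [n for n in wrapper.get_inputs() if n not in inputs]
    if missing:
        raise KeyError('missing input signals: %s' % missing)
    # configure() filters the pipeline-wide configuration.
    config = wrapper.configure(pipeline_args)
    compute = wrapper.initialise(config)
    # Assumption: the state object is the component's configuration.
    result = compute(inputs, config)
    absent = [n for n in wrapper.get_outputs() if n not in result]
    if absent:
        raise KeyError('missing output signals: %s' % absent)
    return result

# Stand-in wrapper; a real one would be a module such as sleep.py.
demo = types.SimpleNamespace(
    get_name=lambda: 'demo',
    get_inputs=lambda: ['sleep_time'],
    get_outputs=lambda: ['complete'],
    get_configuration=lambda: ['sleep_command'],
    configure=lambda args: {'sleep_command': args['sleep_command']},
    initialise=lambda config: (lambda a, s: {'complete': True}),
)

print(run_once(demo, {'sleep_command': 'sleep'}, {'sleep_time': 0}))
\end{verbatim}
Because the harness only relies on attribute access, an imported wrapper module can be passed in place of \texttt{demo}.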