For layers where the first weight dimension does not correspond to the output
dimension, self.viewOut does *not* match self.weight:size(). This is
fixed by adding a member variable that stores the original size of the weight
matrix.
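
To illustrate why two sizes are needed, here is a minimal NumPy sketch (not the Torch code from this change). The names `weight_size`, `view_out` and `output_dim`, the 0-based axis index, and the example shape are assumptions made for the illustration only.

```python
import numpy as np

# Hypothetical layer whose weight has the output dimension second:
# shape (in_features=3, out_features=5).
weight = np.arange(15, dtype=np.float64).reshape(3, 5)
output_dim = 1  # 0-based index of the output dimension (assumption)

# Original size of the weight, kept in its own variable.
weight_size = weight.shape

# Bring the output dimension to the front for the per-output norm.
out_first = np.moveaxis(weight, output_dim, 0)
view_out = out_first.shape  # (5, 3): a permutation of weight_size, not equal to it

# Normalize each output slice to unit L2 norm, then scale back by the norm.
flat = out_first.reshape(view_out[0], -1)
norms = np.linalg.norm(flat, axis=1, keepdims=True)
v = flat / norms

# Undo the view: reshape with view_out, then move the output axis back.
restored = np.moveaxis((v * norms).reshape(view_out), 0, output_dim)
```

When the output dimension already comes first, `view_out` and `weight_size` coincide, which is why a single size is enough in that case.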

In evaluation mode, it's not necessary to re-compute the child module's weights
at every forward pass. Instead, they are computed once when switching from
training to evaluation mode, which speeds up inference.
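
The caching idea can be sketched roughly as follows, again in NumPy rather than Torch. The class `WeightNormSketch`, its method names, and the `recomputes` counter are hypothetical and exist only to make the behaviour visible; the sketch assumes the output dimension comes first.

```python
import numpy as np

class WeightNormSketch:
    """Hypothetical wrapper: w = g * v / ||v||, one norm per output row."""

    def __init__(self, v, g):
        self.v = v            # direction parameters, shape (out, in)
        self.g = g            # per-output scale, shape (out, 1)
        self.training = True
        self.weight = None    # weights handed to the wrapped layer
        self.recomputes = 0   # counter, only here to make the effect visible

    def _update_weight(self):
        norms = np.linalg.norm(self.v, axis=1, keepdims=True)
        self.weight = self.g * self.v / norms
        self.recomputes += 1

    def train(self):
        self.training = True

    def evaluate(self):
        # Compute the weights once on the switch to evaluation mode.
        self._update_weight()
        self.training = False

    def forward(self, x):
        if self.training:
            # v and g may have changed since the last optimizer step.
            self._update_weight()
        return x @ self.weight.T

layer = WeightNormSketch(np.array([[3.0, 4.0], [0.0, 2.0]]),
                         np.array([[1.0], [2.0]]))
x = np.ones((1, 2))
layer.forward(x)
layer.forward(x)      # training mode: weights recomputed on each call
layer.evaluate()      # one recompute on the mode switch
layer.forward(x)
out = layer.forward(x)  # evaluation mode: cached weights reused
```

In this sketch the parameters must not change while in evaluation mode, since the cached weights would then be stale until the next call to `evaluate()`.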