
github.com/moses-smt/mosesdecoder.git
author     Marcin Junczys-Dowmunt <junczys@amu.edu.pl>  2012-09-11 18:13:03 +0400
committer  Marcin Junczys-Dowmunt <junczys@amu.edu.pl>  2012-09-11 18:13:03 +0400
commit     3f5dcf4d06f662da9ad33d1b0732782b9819ecd9 (patch)
tree       746ad1233baf53821458a2ffd4492bb03c4643f3 /contrib
parent     bac859c4513a69c8497e115776564c4cb3d01dd3 (diff)
parent     0f3de7493491bb5a3c5e7311a4d4ebcc9ac4d2e7 (diff)

Merge branch 'master' of github.com:moses-smt/mosesdecoder
Diffstat (limited to 'contrib')
-rw-r--r--  contrib/other-builds/OnDiskPt/.cproject                      |   7
-rw-r--r--  contrib/other-builds/lm/.cproject                            |  10
-rw-r--r--  contrib/other-builds/lm/.project                             |  15
-rw-r--r--  contrib/other-builds/moses-cmd/.cproject                     |  32
-rw-r--r--  contrib/other-builds/moses/.cproject                         | 173
-rw-r--r--  contrib/other-builds/moses/.project                          | 732
-rw-r--r--  contrib/other-builds/util/.cproject                          |   6
-rw-r--r--  contrib/relent-filter/AUTHORS                                |   1
-rw-r--r--  contrib/relent-filter/README.txt                             |  91
-rw-r--r--  contrib/relent-filter/scripts/calcEmpiricalDistribution.pl   |  53
-rwxr-xr-x  contrib/relent-filter/scripts/calcPruningScores.pl           | 351
-rw-r--r--  contrib/relent-filter/scripts/interpolateScores.pl           |  94
-rwxr-xr-x  contrib/relent-filter/scripts/prunePT.pl                     | 114
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/Makefile                |  10
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/README.txt              |  42
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/WIN32_functions.cpp     | 231
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/WIN32_functions.h       |  24
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/check-install           |   5
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/filter-pt.cpp           | 377
-rwxr-xr-x  contrib/relent-filter/sigtest-filter/sigtest-filter.sln      |  20
-rwxr-xr-x  contrib/relent-filter/src/IOWrapper.cpp                      | 580
-rwxr-xr-x  contrib/relent-filter/src/IOWrapper.h                        | 142
-rwxr-xr-x  contrib/relent-filter/src/Jamfile                            |   6
-rwxr-xr-x  contrib/relent-filter/src/LatticeMBR.cpp                     | 669
-rwxr-xr-x  contrib/relent-filter/src/LatticeMBR.h                       | 153
-rwxr-xr-x  contrib/relent-filter/src/LatticeMBRGrid.cpp                 | 213
-rwxr-xr-x  contrib/relent-filter/src/Main.cpp                           | 282
-rwxr-xr-x  contrib/relent-filter/src/Main.h                             |  39
-rwxr-xr-x  contrib/relent-filter/src/RelativeEntropyCalc.cpp            |  83
-rwxr-xr-x  contrib/relent-filter/src/RelativeEntropyCalc.h              |  51
-rwxr-xr-x  contrib/relent-filter/src/TranslationAnalysis.cpp            | 126
-rwxr-xr-x  contrib/relent-filter/src/TranslationAnalysis.h              |  25
-rwxr-xr-x  contrib/relent-filter/src/mbr.cpp                            | 178
-rwxr-xr-x  contrib/relent-filter/src/mbr.h                              |  28
-rw-r--r--  contrib/reranking/data/README                                |   5
-rw-r--r--  contrib/reranking/data/nbest.small                           |   7
-rw-r--r--  contrib/reranking/data/weights                               |  11
-rw-r--r--  contrib/reranking/src/Hypo.cpp                               |  59
-rw-r--r--  contrib/reranking/src/Hypo.h                                 |  44
-rw-r--r--  contrib/reranking/src/Main.cpp                               |  98
-rw-r--r--  contrib/reranking/src/Makefile                               |  18
-rw-r--r--  contrib/reranking/src/NBest.cpp                              | 131
-rw-r--r--  contrib/reranking/src/NBest.h                                |  44
-rw-r--r--  contrib/reranking/src/ParameterNBest.cpp                     | 337
-rw-r--r--  contrib/reranking/src/ParameterNBest.h                       |  76
-rw-r--r--  contrib/reranking/src/Tools.cpp                              |  29
-rw-r--r--  contrib/reranking/src/Tools.h                                |  73
-rw-r--r--  contrib/sigtest-filter/Makefile                              |   2
-rw-r--r--  contrib/sigtest-filter/filter-pt.cpp                         |  85
-rw-r--r--  contrib/tmcombine/README.md                                  |   2
-rwxr-xr-x  contrib/tmcombine/tmcombine.py                               |  14
-rw-r--r--  contrib/tmcombine/train_model.patch                          |  24
52 files changed, 4436 insertions(+), 1586 deletions(-)
diff --git a/contrib/other-builds/OnDiskPt/.cproject b/contrib/other-builds/OnDiskPt/.cproject
index 41f2a5141..472888f48 100644
--- a/contrib/other-builds/OnDiskPt/.cproject
+++ b/contrib/other-builds/OnDiskPt/.cproject
@@ -41,9 +41,13 @@
<option id="gnu.cpp.compilermacosx.exe.debug.option.optimization.level.676959181" name="Optimization Level" superClass="gnu.cpp.compilermacosx.exe.debug.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
<option id="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level.1484480101" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
<option id="gnu.cpp.compiler.option.include.paths.1556683035" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../moses/src"/>
<listOptionValue builtIn="false" value="/opt/local/include"/>
</option>
+ <option id="gnu.cpp.compiler.option.preprocessor.def.1052680347" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
+ <listOptionValue builtIn="false" value="TRACE_ENABLE"/>
+ </option>
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1930757481" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
</tool>
<tool id="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug.1161943634" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug">
@@ -128,4 +132,5 @@
<storageModule moduleId="refreshScope" versionNumber="1">
<resource resourceType="PROJECT" workspacePath="/OnDiskPt"/>
</storageModule>
+ <storageModule moduleId="org.eclipse.cdt.make.core.buildtargets"/>
</cproject>
diff --git a/contrib/other-builds/lm/.cproject b/contrib/other-builds/lm/.cproject
index f89e80f49..8ecb60e02 100644
--- a/contrib/other-builds/lm/.cproject
+++ b/contrib/other-builds/lm/.cproject
@@ -42,7 +42,11 @@
<option id="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level.7139692" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
<option id="gnu.cpp.compiler.option.include.paths.1988092227" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
<listOptionValue builtIn="false" value="/opt/local/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
+ <listOptionValue builtIn="false" value="&quot;${workspace_loc}/../../&quot;"/>
+ </option>
+ <option id="gnu.cpp.compiler.option.preprocessor.def.1980966336" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
+ <listOptionValue builtIn="false" value="KENLM_MAX_ORDER=7"/>
+ <listOptionValue builtIn="false" value="TRACE_ENABLE"/>
</option>
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.20502600" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
</tool>
@@ -53,6 +57,9 @@
</tool>
</toolChain>
</folderInfo>
+ <sourceEntries>
+ <entry excluding="left_test.cc|model_test.cc" flags="VALUE_WORKSPACE_PATH|RESOLVED" kind="sourcePath" name=""/>
+ </sourceEntries>
</configuration>
</storageModule>
<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
@@ -122,4 +129,5 @@
</scannerConfigBuildInfo>
</storageModule>
<storageModule moduleId="refreshScope"/>
+ <storageModule moduleId="org.eclipse.cdt.make.core.buildtargets"/>
</cproject>
diff --git a/contrib/other-builds/lm/.project b/contrib/other-builds/lm/.project
index 0d30e24cb..204771764 100644
--- a/contrib/other-builds/lm/.project
+++ b/contrib/other-builds/lm/.project
@@ -327,6 +327,21 @@
<locationURI>PARENT-3-PROJECT_LOC/lm/trie_sort.hh</locationURI>
</link>
<link>
+ <name>value.hh</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/lm/value.hh</locationURI>
+ </link>
+ <link>
+ <name>value_build.cc</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/lm/value_build.cc</locationURI>
+ </link>
+ <link>
+ <name>value_build.hh</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/lm/value_build.hh</locationURI>
+ </link>
+ <link>
<name>virtual_interface.cc</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/lm/virtual_interface.cc</locationURI>
diff --git a/contrib/other-builds/moses-cmd/.cproject b/contrib/other-builds/moses-cmd/.cproject
index 53c112cb8..cdad4ad64 100644
--- a/contrib/other-builds/moses-cmd/.cproject
+++ b/contrib/other-builds/moses-cmd/.cproject
@@ -25,17 +25,27 @@
<tool id="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.debug.84059290" name="MacOS X C++ Linker" superClass="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.debug">
<option id="macosx.cpp.link.option.libs.1641794848" name="Libraries (-l)" superClass="macosx.cpp.link.option.libs" valueType="libs">
<listOptionValue builtIn="false" value="moses"/>
+ <listOptionValue builtIn="false" value="rt"/>
+ <listOptionValue builtIn="false" value="misc"/>
+ <listOptionValue builtIn="false" value="dstruct"/>
+ <listOptionValue builtIn="false" value="oolm"/>
+ <listOptionValue builtIn="false" value="flm"/>
+ <listOptionValue builtIn="false" value="lattice"/>
<listOptionValue builtIn="false" value="OnDiskPt"/>
<listOptionValue builtIn="false" value="lm"/>
<listOptionValue builtIn="false" value="util"/>
<listOptionValue builtIn="false" value="irstlm"/>
+ <listOptionValue builtIn="false" value="z"/>
+ <listOptionValue builtIn="false" value="boost_system"/>
+ <listOptionValue builtIn="false" value="boost_filesystem"/>
</option>
<option id="macosx.cpp.link.option.paths.1615268628" name="Library search path (-L)" superClass="macosx.cpp.link.option.paths" valueType="libPaths">
- <listOptionValue builtIn="false" value="/Users/hieuhoang/workspace/github/moses-smt/contrib/other-builds/moses/Debug"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/workspace/github/moses-smt/contrib/other-builds/OnDiskPt/Debug"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/workspace/github/moses-smt/contrib/other-builds/lm/Debug"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/workspace/github/moses-smt/contrib/other-builds/util/Debug"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/workspace/github/moses-smt/irstlm/lib"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:/moses}/Debug"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:}/../../srilm/lib/i686-m64"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:/OnDiskPt}/Debug"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:/lm}/Debug"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:/util}/Debug"/>
+ <listOptionValue builtIn="false" value="${workspace_loc:}/../../irstlm/lib"/>
</option>
<inputType id="cdt.managedbuild.tool.macosx.cpp.linker.input.412058804" superClass="cdt.managedbuild.tool.macosx.cpp.linker.input">
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
@@ -51,8 +61,11 @@
<option id="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level.1176009559" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
<option id="gnu.cpp.compiler.option.include.paths.1024398579" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
<listOptionValue builtIn="false" value="/opt/local/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/moses/src"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../moses/src"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../"/>
+ </option>
+ <option id="gnu.cpp.compiler.option.preprocessor.def.491464216" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
+ <listOptionValue builtIn="false" value="TRACE_ENABLE"/>
</option>
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.240921565" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
</tool>
@@ -122,12 +135,13 @@
<storageModule moduleId="refreshScope" versionNumber="1">
<resource resourceType="PROJECT" workspacePath="/moses-cmd"/>
</storageModule>
+ <storageModule moduleId="org.eclipse.cdt.make.core.buildtargets"/>
<storageModule moduleId="scannerConfiguration">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId=""/>
- <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.macosx.exe.debug.341255150;cdt.managedbuild.config.gnu.macosx.exe.debug.341255150.;cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug.1201400609;cdt.managedbuild.tool.gnu.c.compiler.input.2031799877">
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.macosx.exe.release.1916112479;cdt.managedbuild.config.macosx.exe.release.1916112479.;cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.release.759110223;cdt.managedbuild.tool.gnu.c.compiler.input.1452105399">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
</scannerConfigBuildInfo>
- <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.macosx.exe.release.1916112479;cdt.managedbuild.config.macosx.exe.release.1916112479.;cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.release.759110223;cdt.managedbuild.tool.gnu.c.compiler.input.1452105399">
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.macosx.exe.debug.341255150;cdt.managedbuild.config.gnu.macosx.exe.debug.341255150.;cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug.1201400609;cdt.managedbuild.tool.gnu.c.compiler.input.2031799877">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
</scannerConfigBuildInfo>
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.macosx.exe.release.1916112479;cdt.managedbuild.config.macosx.exe.release.1916112479.;cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.release.1219375865;cdt.managedbuild.tool.gnu.cpp.compiler.input.604224475">
diff --git a/contrib/other-builds/moses/.cproject b/contrib/other-builds/moses/.cproject
index 2995d5eae..0148cc6f2 100644
--- a/contrib/other-builds/moses/.cproject
+++ b/contrib/other-builds/moses/.cproject
@@ -3,8 +3,8 @@
<cproject storage_type_id="org.eclipse.cdt.core.XmlProjectDescriptionStorage">
<storageModule moduleId="org.eclipse.cdt.core.settings">
- <cconfiguration id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426">
- <storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426" moduleId="org.eclipse.cdt.core.settings" name="Debug">
+ <cconfiguration id="cdt.managedbuild.config.gnu.exe.debug.656913512">
+ <storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.exe.debug.656913512" moduleId="org.eclipse.cdt.core.settings" name="Debug">
<externalSettings>
<externalSetting>
<entry flags="VALUE_WORKSPACE_PATH" kind="includePath" name="/moses"/>
@@ -13,7 +13,7 @@
</externalSetting>
</externalSettings>
<extensions>
- <extension id="org.eclipse.cdt.core.MachO64" point="org.eclipse.cdt.core.BinaryParser"/>
+ <extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
@@ -21,65 +21,70 @@
</extensions>
</storageModule>
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
- <configuration artifactExtension="a" artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.staticLib" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.debug,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.staticLib" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426" name="Debug" parent="cdt.managedbuild.config.gnu.macosx.exe.debug">
- <folderInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426." name="/" resourcePath="">
- <toolChain id="cdt.managedbuild.toolchain.gnu.macosx.exe.debug.497902212" name="MacOSX GCC" superClass="cdt.managedbuild.toolchain.gnu.macosx.exe.debug">
- <targetPlatform id="cdt.managedbuild.target.gnu.platform.macosx.exe.debug.1820609450" name="Debug Platform" superClass="cdt.managedbuild.target.gnu.platform.macosx.exe.debug"/>
- <builder buildPath="${workspace_loc:/moses/Debug}" id="cdt.managedbuild.target.gnu.builder.macosx.exe.debug.1998579330" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.target.gnu.builder.macosx.exe.debug"/>
- <tool id="cdt.managedbuild.tool.macosx.c.linker.macosx.exe.debug.1330311562" name="MacOS X C Linker" superClass="cdt.managedbuild.tool.macosx.c.linker.macosx.exe.debug"/>
- <tool id="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.debug.1226580551" name="MacOS X C++ Linker" superClass="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.debug">
- <inputType id="cdt.managedbuild.tool.macosx.cpp.linker.input.102127808" superClass="cdt.managedbuild.tool.macosx.cpp.linker.input">
- <additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
- <additionalInput kind="additionalinput" paths="$(LIBS)"/>
- </inputType>
- </tool>
- <tool command="as" commandLinePattern="${COMMAND} ${FLAGS} ${OUTPUT_FLAG} ${OUTPUT_PREFIX}${OUTPUT} ${INPUTS}" id="cdt.managedbuild.tool.gnu.assembler.macosx.exe.debug.1556759720" name="GCC Assembler" superClass="cdt.managedbuild.tool.gnu.assembler.macosx.exe.debug">
- <inputType id="cdt.managedbuild.tool.gnu.assembler.input.897776351" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
- </tool>
- <tool id="cdt.managedbuild.tool.gnu.archiver.macosx.base.1820797229" name="GCC Archiver" superClass="cdt.managedbuild.tool.gnu.archiver.macosx.base"/>
- <tool id="cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.debug.1867588805" name="GCC C++ Compiler" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.debug">
- <option id="gnu.cpp.compilermacosx.exe.debug.option.optimization.level.1898625650" name="Optimization Level" superClass="gnu.cpp.compilermacosx.exe.debug.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
- <option id="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level.806998992" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
- <option id="gnu.cpp.compiler.option.include.paths.1819917957" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
- <listOptionValue builtIn="false" value="/opt/local/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/moses/src"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/srilm/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/irstlm/include"/>
+ <configuration artifactExtension="a" artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.staticLib" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.debug,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.staticLib" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.exe.debug.656913512" name="Debug" parent="cdt.managedbuild.config.gnu.exe.debug">
+ <folderInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512." name="/" resourcePath="">
+ <toolChain id="cdt.managedbuild.toolchain.gnu.exe.debug.1793369992" name="Linux GCC" superClass="cdt.managedbuild.toolchain.gnu.exe.debug">
+ <targetPlatform id="cdt.managedbuild.target.gnu.platform.exe.debug.1051650049" name="Debug Platform" superClass="cdt.managedbuild.target.gnu.platform.exe.debug"/>
+ <builder buildPath="${workspace_loc:/moses/Debug}" id="cdt.managedbuild.target.gnu.builder.exe.debug.505583888" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.target.gnu.builder.exe.debug"/>
+ <tool id="cdt.managedbuild.tool.gnu.archiver.base.1976472988" name="GCC Archiver" superClass="cdt.managedbuild.tool.gnu.archiver.base"/>
+ <tool id="cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug.1774992327" name="GCC C++ Compiler" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug">
+ <option id="gnu.cpp.compiler.exe.debug.option.optimization.level.1759650532" name="Optimization Level" superClass="gnu.cpp.compiler.exe.debug.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
+ <option id="gnu.cpp.compiler.exe.debug.option.debugging.level.2123672332" name="Debug Level" superClass="gnu.cpp.compiler.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
+ <option id="gnu.cpp.compiler.option.include.paths.57896781" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
+ <listOptionValue builtIn="false" value="/opt/local/include/"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../irstlm/include"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../srilm/include"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../moses/src"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../"/>
</option>
- <option id="gnu.cpp.compiler.option.preprocessor.def.1569452418" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
- <listOptionValue builtIn="false" value="LM_SRI"/>
- <listOptionValue builtIn="false" value="LM_IRST"/>
+ <option id="gnu.cpp.compiler.option.preprocessor.def.752586397" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
+ <listOptionValue builtIn="false" value="KENLM_MAX_ORDER=7"/>
<listOptionValue builtIn="false" value="TRACE_ENABLE"/>
+ <listOptionValue builtIn="false" value="LM_IRST"/>
+ <listOptionValue builtIn="false" value="_FILE_OFFSET_BIT=64"/>
+ <listOptionValue builtIn="false" value="_LARGE_FILES"/>
</option>
- <inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1110302565" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
+ <inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1905116220" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
</tool>
- <tool id="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug.401409202" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug">
- <option defaultValue="gnu.c.optimization.level.none" id="gnu.c.compiler.macosx.exe.debug.option.optimization.level.753046525" name="Optimization Level" superClass="gnu.c.compiler.macosx.exe.debug.option.optimization.level" valueType="enumerated"/>
- <option id="gnu.c.compiler.macosx.exe.debug.option.debugging.level.1396911098" name="Debug Level" superClass="gnu.c.compiler.macosx.exe.debug.option.debugging.level" value="gnu.c.debugging.level.max" valueType="enumerated"/>
- <inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1919272901" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+ <tool id="cdt.managedbuild.tool.gnu.c.compiler.exe.debug.2126314903" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.exe.debug">
+ <option defaultValue="gnu.c.optimization.level.none" id="gnu.c.compiler.exe.debug.option.optimization.level.1524900118" name="Optimization Level" superClass="gnu.c.compiler.exe.debug.option.optimization.level" valueType="enumerated"/>
+ <option id="gnu.c.compiler.exe.debug.option.debugging.level.581728958" name="Debug Level" superClass="gnu.c.compiler.exe.debug.option.debugging.level" value="gnu.c.debugging.level.max" valueType="enumerated"/>
+ <inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.877210753" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+ </tool>
+ <tool id="cdt.managedbuild.tool.gnu.c.linker.exe.debug.1168585173" name="GCC C Linker" superClass="cdt.managedbuild.tool.gnu.c.linker.exe.debug"/>
+ <tool id="cdt.managedbuild.tool.gnu.cpp.linker.exe.debug.2074660557" name="GCC C++ Linker" superClass="cdt.managedbuild.tool.gnu.cpp.linker.exe.debug">
+ <inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.340054018" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
+ <additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
+ <additionalInput kind="additionalinput" paths="$(LIBS)"/>
+ </inputType>
+ </tool>
+ <tool id="cdt.managedbuild.tool.gnu.assembler.exe.debug.933467113" name="GCC Assembler" superClass="cdt.managedbuild.tool.gnu.assembler.exe.debug">
+ <inputType id="cdt.managedbuild.tool.gnu.assembler.input.99047750" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
</tool>
</toolChain>
</folderInfo>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.1722029461" name="SyntacticLanguageModelState.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModelState.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.1432960145" name="SyntacticLanguageModelFiles.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModelFiles.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.1906856645" name="SyntacticLanguageModel.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModel.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.460380900" name="Rand.h" rcbsApplicability="disable" resourcePath="LM/Rand.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.1692203139" name="ORLM.h" rcbsApplicability="disable" resourcePath="LM/ORLM.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.538301588" name="Remote.h" rcbsApplicability="disable" resourcePath="LM/Remote.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.854427429" name="LDHT.h" rcbsApplicability="disable" resourcePath="LM/LDHT.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.558758254" name="SyntacticLanguageModelState.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModelState.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.1930327037" name="SyntacticLanguageModelFiles.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModelFiles.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.1751563578" name="PhraseTableCreator.cpp" rcbsApplicability="disable" resourcePath="CompactPT/PhraseTableCreator.cpp" toolsToInvoke="cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug.1774992327.1652631861">
+ <tool id="cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug.1774992327.1652631861" name="GCC C++ Compiler" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug.1774992327"/>
+ </fileInfo>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.1174630266" name="Rand.h" rcbsApplicability="disable" resourcePath="LM/Rand.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.707830535" name="SRI.h" rcbsApplicability="disable" resourcePath="LM/SRI.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.160366559" name="LDHT.h" rcbsApplicability="disable" resourcePath="LM/LDHT.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.622077510" name="ParallelBackoff.h" rcbsApplicability="disable" resourcePath="LM/ParallelBackoff.h" toolsToInvoke=""/>
+ <fileInfo id="cdt.managedbuild.config.gnu.exe.debug.656913512.1084194539" name="SyntacticLanguageModel.h" rcbsApplicability="disable" resourcePath="SyntacticLanguageModel.h" toolsToInvoke=""/>
<sourceEntries>
- <entry excluding="SyntacticLanguageModelState.h|SyntacticLanguageModelFiles.h|SyntacticLanguageModel.h|SyntacticLanguageModel.cpp|LM/LDHT.cpp|LM/LDHT.h|LM/Remote.h|LM/Remote.cpp|LM/Rand.h|LM/Rand.cpp|LM/ORLM.h|LM/ORLM.cpp" flags="VALUE_WORKSPACE_PATH|RESOLVED" kind="sourcePath" name=""/>
+ <entry excluding="CompactPT/PhraseTableCreator.cpp|CompactPT/LexicalReorderingTableCreator.cpp|LM/SRI.h|LM/SRI.cpp|SyntacticLanguageModelState.h|SyntacticLanguageModelFiles.h|SyntacticLanguageModel.h|SyntacticLanguageModel.cpp|LM/ParallelBackoff.h|LM/ParallelBackoff.cpp|LM/Rand.h|LM/Rand.cpp|LM/LDHT.h|LM/LDHT.cpp" flags="VALUE_WORKSPACE_PATH|RESOLVED" kind="sourcePath" name=""/>
</sourceEntries>
</configuration>
</storageModule>
<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
</cconfiguration>
- <cconfiguration id="cdt.managedbuild.config.macosx.exe.release.722580523">
- <storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.macosx.exe.release.722580523" moduleId="org.eclipse.cdt.core.settings" name="Release">
+ <cconfiguration id="cdt.managedbuild.config.gnu.exe.release.401150096">
+ <storageModule buildSystemId="org.eclipse.cdt.managedbuilder.core.configurationDataProvider" id="cdt.managedbuild.config.gnu.exe.release.401150096" moduleId="org.eclipse.cdt.core.settings" name="Release">
<externalSettings/>
<extensions>
- <extension id="org.eclipse.cdt.core.MachO64" point="org.eclipse.cdt.core.BinaryParser"/>
+ <extension id="org.eclipse.cdt.core.ELF" point="org.eclipse.cdt.core.BinaryParser"/>
<extension id="org.eclipse.cdt.core.GmakeErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
<extension id="org.eclipse.cdt.core.CWDLocator" point="org.eclipse.cdt.core.ErrorParser"/>
<extension id="org.eclipse.cdt.core.GCCErrorParser" point="org.eclipse.cdt.core.ErrorParser"/>
@@ -88,59 +93,41 @@
</extensions>
</storageModule>
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
- <configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.release,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.macosx.exe.release.722580523" name="Release" parent="cdt.managedbuild.config.macosx.exe.release">
- <folderInfo id="cdt.managedbuild.config.macosx.exe.release.722580523." name="/" resourcePath="">
- <toolChain id="cdt.managedbuild.toolchain.gnu.macosx.exe.release.2070671582" name="MacOSX GCC" superClass="cdt.managedbuild.toolchain.gnu.macosx.exe.release">
- <targetPlatform id="cdt.managedbuild.target.gnu.platform.macosx.exe.release.503591386" name="Debug Platform" superClass="cdt.managedbuild.target.gnu.platform.macosx.exe.release"/>
- <builder buildPath="${workspace_loc:/moses/Release}" id="cdt.managedbuild.target.gnu.builder.macosx.exe.release.108117223" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.target.gnu.builder.macosx.exe.release"/>
- <tool id="cdt.managedbuild.tool.macosx.c.linker.macosx.exe.release.1203406445" name="MacOS X C Linker" superClass="cdt.managedbuild.tool.macosx.c.linker.macosx.exe.release"/>
- <tool id="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.release.1539915639" name="MacOS X C++ Linker" superClass="cdt.managedbuild.tool.macosx.cpp.linker.macosx.exe.release">
- <inputType id="cdt.managedbuild.tool.macosx.cpp.linker.input.1333560300" superClass="cdt.managedbuild.tool.macosx.cpp.linker.input">
+ <configuration artifactName="${ProjName}" buildArtefactType="org.eclipse.cdt.build.core.buildArtefactType.exe" buildProperties="org.eclipse.cdt.build.core.buildType=org.eclipse.cdt.build.core.buildType.release,org.eclipse.cdt.build.core.buildArtefactType=org.eclipse.cdt.build.core.buildArtefactType.exe" cleanCommand="rm -rf" description="" id="cdt.managedbuild.config.gnu.exe.release.401150096" name="Release" parent="cdt.managedbuild.config.gnu.exe.release">
+ <folderInfo id="cdt.managedbuild.config.gnu.exe.release.401150096." name="/" resourcePath="">
+ <toolChain id="cdt.managedbuild.toolchain.gnu.exe.release.36295137" name="Linux GCC" superClass="cdt.managedbuild.toolchain.gnu.exe.release">
+ <targetPlatform id="cdt.managedbuild.target.gnu.platform.exe.release.538725710" name="Debug Platform" superClass="cdt.managedbuild.target.gnu.platform.exe.release"/>
+ <builder buildPath="${workspace_loc:/moses/Release}" id="cdt.managedbuild.target.gnu.builder.exe.release.1875953334" keepEnvironmentInBuildfile="false" managedBuildOn="true" name="Gnu Make Builder" superClass="cdt.managedbuild.target.gnu.builder.exe.release"/>
+ <tool id="cdt.managedbuild.tool.gnu.archiver.base.1633496039" name="GCC Archiver" superClass="cdt.managedbuild.tool.gnu.archiver.base"/>
+ <tool id="cdt.managedbuild.tool.gnu.cpp.compiler.exe.release.2060881562" name="GCC C++ Compiler" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.exe.release">
+ <option id="gnu.cpp.compiler.exe.release.option.optimization.level.1375372870" name="Optimization Level" superClass="gnu.cpp.compiler.exe.release.option.optimization.level" value="gnu.cpp.compiler.optimization.level.most" valueType="enumerated"/>
+ <option id="gnu.cpp.compiler.exe.release.option.debugging.level.815283803" name="Debug Level" superClass="gnu.cpp.compiler.exe.release.option.debugging.level" value="gnu.cpp.compiler.debugging.level.none" valueType="enumerated"/>
+ <inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1020483420" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
+ </tool>
+ <tool id="cdt.managedbuild.tool.gnu.c.compiler.exe.release.85324871" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.exe.release">
+ <option defaultValue="gnu.c.optimization.level.most" id="gnu.c.compiler.exe.release.option.optimization.level.1137534635" name="Optimization Level" superClass="gnu.c.compiler.exe.release.option.optimization.level" valueType="enumerated"/>
+ <option id="gnu.c.compiler.exe.release.option.debugging.level.143589037" name="Debug Level" superClass="gnu.c.compiler.exe.release.option.debugging.level" value="gnu.c.debugging.level.none" valueType="enumerated"/>
+ <inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.304912704" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+ </tool>
+ <tool id="cdt.managedbuild.tool.gnu.c.linker.exe.release.283583965" name="GCC C Linker" superClass="cdt.managedbuild.tool.gnu.c.linker.exe.release"/>
+ <tool id="cdt.managedbuild.tool.gnu.cpp.linker.exe.release.2059280959" name="GCC C++ Linker" superClass="cdt.managedbuild.tool.gnu.cpp.linker.exe.release">
+ <inputType id="cdt.managedbuild.tool.gnu.cpp.linker.input.2020956494" superClass="cdt.managedbuild.tool.gnu.cpp.linker.input">
<additionalInput kind="additionalinputdependency" paths="$(USER_OBJS)"/>
<additionalInput kind="additionalinput" paths="$(LIBS)"/>
</inputType>
</tool>
- <tool id="cdt.managedbuild.tool.gnu.assembler.macosx.exe.release.1693865756" name="GCC Assembler" superClass="cdt.managedbuild.tool.gnu.assembler.macosx.exe.release">
- <inputType id="cdt.managedbuild.tool.gnu.assembler.input.2000339940" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
- </tool>
- <tool id="cdt.managedbuild.tool.gnu.archiver.macosx.base.505919286" name="GCC Archiver" superClass="cdt.managedbuild.tool.gnu.archiver.macosx.base"/>
- <tool id="cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.release.1662892925" name="GCC C++ Compiler" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.release">
- <option id="gnu.cpp.compiler.macosx.exe.release.option.optimization.level.1036481202" name="Optimization Level" superClass="gnu.cpp.compiler.macosx.exe.release.option.optimization.level" value="gnu.cpp.compiler.optimization.level.most" valueType="enumerated"/>
- <option id="gnu.cpp.compiler.macosx.exe.release.option.debugging.level.484015287" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.release.option.debugging.level" value="gnu.cpp.compiler.debugging.level.none" valueType="enumerated"/>
- <option id="gnu.cpp.compiler.option.preprocessor.def.1089615214" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
- <listOptionValue builtIn="false" value="LM_SRI"/>
- <listOptionValue builtIn="false" value="LM_IRST"/>
- <listOptionValue builtIn="false" value="TRACE_ENABLE"/>
- </option>
- <option id="gnu.cpp.compiler.option.include.paths.1722702487" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
- <listOptionValue builtIn="false" value="/opt/local/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/moses/src"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/srilm/include"/>
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt/irstlm/include"/>
- </option>
- <inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.936283391" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
- </tool>
- <tool id="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.release.1404156839" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.release">
- <option defaultValue="gnu.c.optimization.level.most" id="gnu.c.compiler.macosx.exe.release.option.optimization.level.1487222992" name="Optimization Level" superClass="gnu.c.compiler.macosx.exe.release.option.optimization.level" valueType="enumerated"/>
- <option id="gnu.c.compiler.macosx.exe.release.option.debugging.level.1171203697" name="Debug Level" superClass="gnu.c.compiler.macosx.exe.release.option.debugging.level" value="gnu.c.debugging.level.none" valueType="enumerated"/>
- <inputType id="cdt.managedbuild.tool.gnu.c.compiler.input.1172147378" superClass="cdt.managedbuild.tool.gnu.c.compiler.input"/>
+ <tool id="cdt.managedbuild.tool.gnu.assembler.exe.release.782286837" name="GCC Assembler" superClass="cdt.managedbuild.tool.gnu.assembler.exe.release">
+ <inputType id="cdt.managedbuild.tool.gnu.assembler.input.1766138143" superClass="cdt.managedbuild.tool.gnu.assembler.input"/>
</tool>
</toolChain>
</folderInfo>
- <fileInfo id="cdt.managedbuild.config.macosx.exe.release.722580523.1831545277" name="Rand.h" rcbsApplicability="disable" resourcePath="LM/Rand.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.macosx.exe.release.722580523.1743378025" name="ORLM.h" rcbsApplicability="disable" resourcePath="LM/ORLM.h" toolsToInvoke=""/>
- <fileInfo id="cdt.managedbuild.config.macosx.exe.release.722580523.1490362543" name="Remote.h" rcbsApplicability="disable" resourcePath="LM/Remote.h" toolsToInvoke=""/>
- <sourceEntries>
- <entry excluding="LM/LDHT.cpp|LM/Rand.h|LM/Rand.cpp|LM/ORLM.h|LM/ORLM.cpp" flags="VALUE_WORKSPACE_PATH|RESOLVED" kind="sourcePath" name=""/>
- </sourceEntries>
</configuration>
</storageModule>
<storageModule moduleId="org.eclipse.cdt.core.externalSettings"/>
</cconfiguration>
</storageModule>
<storageModule moduleId="cdtBuildSystem" version="4.0.0">
- <project id="moses.cdt.managedbuild.target.macosx.exe.1209017164" name="Executable" projectType="cdt.managedbuild.target.macosx.exe"/>
+ <project id="moses.cdt.managedbuild.target.gnu.exe.1375079569" name="Executable" projectType="cdt.managedbuild.target.gnu.exe"/>
</storageModule>
<storageModule moduleId="scannerConfiguration">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId=""/>
@@ -150,12 +137,24 @@
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.macosx.exe.release.722580523;cdt.managedbuild.config.macosx.exe.release.722580523.;cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.release.1404156839;cdt.managedbuild.tool.gnu.c.compiler.input.1172147378">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
</scannerConfigBuildInfo>
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.exe.release.401150096;cdt.managedbuild.config.gnu.exe.release.401150096.;cdt.managedbuild.tool.gnu.c.compiler.exe.release.85324871;cdt.managedbuild.tool.gnu.c.compiler.input.304912704">
+ <autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
+ </scannerConfigBuildInfo>
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.exe.debug.656913512;cdt.managedbuild.config.gnu.exe.debug.656913512.;cdt.managedbuild.tool.gnu.cpp.compiler.exe.debug.1774992327;cdt.managedbuild.tool.gnu.cpp.compiler.input.1905116220">
+ <autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
+ </scannerConfigBuildInfo>
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426;cdt.managedbuild.config.gnu.macosx.exe.debug.1895695426.;cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.debug.1867588805;cdt.managedbuild.tool.gnu.cpp.compiler.input.1110302565">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
</scannerConfigBuildInfo>
<scannerConfigBuildInfo instanceId="cdt.managedbuild.config.macosx.exe.release.722580523;cdt.managedbuild.config.macosx.exe.release.722580523.;cdt.managedbuild.tool.gnu.cpp.compiler.macosx.exe.release.1662892925;cdt.managedbuild.tool.gnu.cpp.compiler.input.936283391">
<autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
</scannerConfigBuildInfo>
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.exe.debug.656913512;cdt.managedbuild.config.gnu.exe.debug.656913512.;cdt.managedbuild.tool.gnu.c.compiler.exe.debug.2126314903;cdt.managedbuild.tool.gnu.c.compiler.input.877210753">
+ <autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileC"/>
+ </scannerConfigBuildInfo>
+ <scannerConfigBuildInfo instanceId="cdt.managedbuild.config.gnu.exe.release.401150096;cdt.managedbuild.config.gnu.exe.release.401150096.;cdt.managedbuild.tool.gnu.cpp.compiler.exe.release.2060881562;cdt.managedbuild.tool.gnu.cpp.compiler.input.1020483420">
+ <autodiscovery enabled="true" problemReportingEnabled="true" selectedProfileId="org.eclipse.cdt.managedbuilder.core.GCCManagedMakePerProjectProfileCPP"/>
+ </scannerConfigBuildInfo>
</storageModule>
<storageModule moduleId="refreshScope" versionNumber="1">
<resource resourceType="PROJECT" workspacePath="/moses"/>
diff --git a/contrib/other-builds/moses/.project b/contrib/other-builds/moses/.project
index 8d534dbd4..31c11819a 100644
--- a/contrib/other-builds/moses/.project
+++ b/contrib/other-builds/moses/.project
@@ -102,6 +102,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/AlignmentInfoCollection.h</locationURI>
</link>
<link>
+ <name>ApplicableRuleTrie.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/ApplicableRuleTrie.cpp</locationURI>
+ </link>
+ <link>
+ <name>ApplicableRuleTrie.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/ApplicableRuleTrie.h</locationURI>
+ </link>
+ <link>
<name>BilingualDynSuffixArray.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/BilingualDynSuffixArray.cpp</locationURI>
@@ -272,6 +282,11 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/ChartTrellisPathList.h</locationURI>
</link>
<link>
+ <name>CompactPT</name>
+ <type>2</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CompactPT</locationURI>
+ </link>
+ <link>
<name>ConfusionNet.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/ConfusionNet.cpp</locationURI>
@@ -442,6 +457,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/FloydWarshall.h</locationURI>
</link>
<link>
+ <name>FuzzyMatchWrapper.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/FuzzyMatchWrapper.cpp</locationURI>
+ </link>
+ <link>
+ <name>FuzzyMatchWrapper.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/FuzzyMatchWrapper.h</locationURI>
+ </link>
+ <link>
<name>GenerationDictionary.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/GenerationDictionary.cpp</locationURI>
@@ -537,6 +562,11 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/InputType.h</locationURI>
</link>
<link>
+ <name>IntermediateVarSpanNode.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/IntermediateVarSpanNode.h</locationURI>
+ </link>
+ <link>
<name>Jamfile</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Jamfile</locationURI>
@@ -607,6 +637,11 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Manager.h</locationURI>
</link>
<link>
+ <name>Match.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/Match.h</locationURI>
+ </link>
+ <link>
<name>NonTerminal.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/NonTerminal.cpp</locationURI>
@@ -662,6 +697,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Parameter.h</locationURI>
</link>
<link>
+ <name>Parser.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/Parser.cpp</locationURI>
+ </link>
+ <link>
+ <name>Parser.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/Parser.h</locationURI>
+ </link>
+ <link>
<name>PartialTranslOptColl.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/PartialTranslOptColl.cpp</locationURI>
@@ -809,7 +854,7 @@
<link>
<name>RuleTable</name>
<type>2</type>
- <locationURI>virtual:/virtual</locationURI>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable</locationURI>
</link>
<link>
<name>SRI.lo</name>
@@ -822,11 +867,6 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/SRI.o</locationURI>
</link>
<link>
- <name>Scope3Parser</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
<name>ScoreComponentCollection.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/ScoreComponentCollection.cpp</locationURI>
@@ -887,6 +927,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/SearchNormal.h</locationURI>
</link>
<link>
+ <name>SearchNormalBatch.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-1-ECLIPSE_HOME/workspace/github/hieuhoang/moses/src/SearchNormalBatch.cpp</locationURI>
+ </link>
+ <link>
+ <name>SearchNormalBatch.h</name>
+ <type>1</type>
+ <locationURI>PARENT-1-ECLIPSE_HOME/workspace/github/hieuhoang/moses/src/SearchNormalBatch.h</locationURI>
+ </link>
+ <link>
<name>Sentence.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Sentence.cpp</locationURI>
@@ -897,6 +947,21 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Sentence.h</locationURI>
</link>
<link>
+ <name>SentenceAlignment.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/SentenceAlignment.cpp</locationURI>
+ </link>
+ <link>
+ <name>SentenceAlignment.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/SentenceAlignment.h</locationURI>
+ </link>
+ <link>
+ <name>SentenceMap.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/SentenceMap.h</locationURI>
+ </link>
+ <link>
<name>SentenceStats.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/SentenceStats.cpp</locationURI>
@@ -917,6 +982,26 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/SquareMatrix.h</locationURI>
</link>
<link>
+ <name>StackLattice.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLattice.h</locationURI>
+ </link>
+ <link>
+ <name>StackLatticeBuilder.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeBuilder.cpp</locationURI>
+ </link>
+ <link>
+ <name>StackLatticeBuilder.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeBuilder.h</locationURI>
+ </link>
+ <link>
+ <name>StackLatticeSearcher.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeSearcher.h</locationURI>
+ </link>
+ <link>
<name>StackVec.h</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/StackVec.h</locationURI>
@@ -942,6 +1027,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/StaticData.o</locationURI>
</link>
<link>
+ <name>SuffixArray.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/SuffixArray.cpp</locationURI>
+ </link>
+ <link>
+ <name>SuffixArray.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/SuffixArray.h</locationURI>
+ </link>
+ <link>
<name>SyntacticLanguageModel.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/SyntacticLanguageModel.cpp</locationURI>
@@ -1182,6 +1277,31 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Util.o</locationURI>
</link>
<link>
+ <name>VarSpanNode.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanNode.h</locationURI>
+ </link>
+ <link>
+ <name>VarSpanTrieBuilder.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanTrieBuilder.cpp</locationURI>
+ </link>
+ <link>
+ <name>VarSpanTrieBuilder.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanTrieBuilder.h</locationURI>
+ </link>
+ <link>
+ <name>Vocabulary.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/Vocabulary.cpp</locationURI>
+ </link>
+ <link>
+ <name>Vocabulary.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/Vocabulary.h</locationURI>
+ </link>
+ <link>
<name>Word.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/Word.cpp</locationURI>
@@ -1337,6 +1457,16 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/ChartRuleLookupManagerMemory.h</locationURI>
</link>
<link>
+ <name>CYKPlusParser/ChartRuleLookupManagerMemoryPerSentence.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/ChartRuleLookupManagerMemoryPerSentence.cpp</locationURI>
+ </link>
+ <link>
+ <name>CYKPlusParser/ChartRuleLookupManagerMemoryPerSentence.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/ChartRuleLookupManagerMemoryPerSentence.h</locationURI>
+ </link>
+ <link>
<name>CYKPlusParser/ChartRuleLookupManagerOnDisk.cpp</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/ChartRuleLookupManagerOnDisk.cpp</locationURI>
@@ -1382,6 +1512,16 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
+ <name>DynSAInclude/FileHandler.cpp</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/FileHandler.cpp</locationURI>
+ </link>
+ <link>
+ <name>DynSAInclude/FileHandler.h</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/FileHandler.h</locationURI>
+ </link>
+ <link>
<name>DynSAInclude/Jamfile</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/Jamfile</locationURI>
@@ -1397,26 +1537,11 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/RandLMFilter.h</locationURI>
</link>
<link>
- <name>DynSAInclude/bin</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
<name>DynSAInclude/fdstream.h</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/fdstream.h</locationURI>
</link>
<link>
- <name>DynSAInclude/file.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/file.cpp</locationURI>
- </link>
- <link>
- <name>DynSAInclude/file.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/file.h</locationURI>
- </link>
- <link>
<name>DynSAInclude/hash.h</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/hash.h</locationURI>
@@ -1617,207 +1742,12 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>RuleTable/Jamfile</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/Jamfile</locationURI>
- </link>
- <link>
- <name>RuleTable/Loader.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/Loader.h</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderCompact.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderCompact.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderCompact.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderCompact.h</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderFactory.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderFactory.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderFactory.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderFactory.h</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderHiero.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderHiero.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderHiero.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderHiero.h</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderStandard.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderStandard.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/LoaderStandard.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/LoaderStandard.h</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryALSuffixArray.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryALSuffixArray.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryALSuffixArray.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryALSuffixArray.h</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryNodeSCFG.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryNodeSCFG.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryNodeSCFG.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryNodeSCFG.h</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryOnDisk.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryOnDisk.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionaryOnDisk.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionaryOnDisk.h</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionarySCFG.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionarySCFG.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/PhraseDictionarySCFG.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/PhraseDictionarySCFG.h</locationURI>
- </link>
- <link>
- <name>RuleTable/Trie.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/Trie.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/Trie.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/Trie.h</locationURI>
- </link>
- <link>
- <name>RuleTable/UTrie.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/UTrie.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/UTrie.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/UTrie.h</locationURI>
- </link>
- <link>
- <name>RuleTable/UTrieNode.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/UTrieNode.cpp</locationURI>
- </link>
- <link>
- <name>RuleTable/UTrieNode.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/UTrieNode.h</locationURI>
- </link>
- <link>
- <name>RuleTable/bin</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/ApplicableRuleTrie.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/ApplicableRuleTrie.cpp</locationURI>
- </link>
- <link>
- <name>Scope3Parser/ApplicableRuleTrie.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/ApplicableRuleTrie.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/IntermediateVarSpanNode.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/IntermediateVarSpanNode.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/Jamfile</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/Jamfile</locationURI>
- </link>
- <link>
- <name>Scope3Parser/Parser.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/Parser.cpp</locationURI>
- </link>
- <link>
- <name>Scope3Parser/Parser.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/Parser.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/SentenceMap.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/SentenceMap.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/StackLattice.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLattice.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/StackLatticeBuilder.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeBuilder.cpp</locationURI>
- </link>
- <link>
- <name>Scope3Parser/StackLatticeBuilder.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeBuilder.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/StackLatticeSearcher.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/StackLatticeSearcher.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/VarSpanNode.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanNode.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/VarSpanTrieBuilder.cpp</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanTrieBuilder.cpp</locationURI>
- </link>
- <link>
- <name>Scope3Parser/VarSpanTrieBuilder.h</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/VarSpanTrieBuilder.h</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin</name>
+ <name>bin/darwin-4.2.1</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>bin/darwin-4.2.1</name>
+ <name>bin/gcc-4.6</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1832,12 +1762,7 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1</name>
+ <name>CYKPlusParser/bin/gcc-4.6</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1857,17 +1782,12 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/LM/bin/lm.log</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1</name>
+ <name>bin/darwin-4.2.1/release</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>Scope3Parser/bin/darwin-4.2.1</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>bin/darwin-4.2.1/release</name>
+ <name>bin/gcc-4.6/release</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1882,12 +1802,7 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1902,17 +1817,12 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1/release</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release</name>
+ <name>bin/darwin-4.2.1/release/debug-symbols-on</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>bin/darwin-4.2.1/release/debug-symbols-on</name>
+ <name>bin/gcc-4.6/release/debug-symbols-on</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1927,12 +1837,7 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1952,17 +1857,12 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on</name>
+ <name>bin/darwin-4.2.1/release/debug-symbols-on/link-static</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>bin/darwin-4.2.1/release/debug-symbols-on/link-static</name>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -1982,12 +1882,7 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -2012,27 +1907,12 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi</name>
+ <name>bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -2072,12 +1952,7 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/DotChartOnDisk.o</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi</name>
<type>2</type>
<locationURI>virtual:/virtual</locationURI>
</link>
@@ -2192,91 +2067,6 @@
<locationURI>virtual:/virtual</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderCompact.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderCompact.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderFactory.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderFactory.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderHiero.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderHiero.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderStandard.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/LoaderStandard.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryALSuffixArray.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryALSuffixArray.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryNodeSCFG.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryNodeSCFG.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryOnDisk.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionaryOnDisk.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionarySCFG.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/PhraseDictionarySCFG.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/Trie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/Trie.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/UTrie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/UTrie.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/UTrieNode.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/UTrieNode.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/ApplicableRuleTrie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/ApplicableRuleTrie.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/Parser.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/Parser.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/StackLatticeBuilder.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/StackLatticeBuilder.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/VarSpanTrieBuilder.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/threading-multi/VarSpanTrieBuilder.o</locationURI>
- </link>
- <link>
<name>bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/AlignmentInfo.o</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/AlignmentInfo.o</locationURI>
@@ -2752,6 +2542,56 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libmoses_internal.a</locationURI>
</link>
<link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ApplicableRuleTrie.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ApplicableRuleTrie.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/FuzzyMatchWrapper.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/FuzzyMatchWrapper.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/Parser.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/Parser.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/SentenceAlignment.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/SentenceAlignment.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/StackLatticeBuilder.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/StackLatticeBuilder.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/SuffixArray.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/SuffixArray.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/VarSpanTrieBuilder.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/VarSpanTrieBuilder.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/Vocabulary.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/Vocabulary.o</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libScope3Parser.a</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libScope3Parser.a</locationURI>
+ </link>
+ <link>
+ <name>bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libfuzzy-match.a</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/fuzzy-match/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libfuzzy-match.a</locationURI>
+ </link>
+ <link>
<name>CYKPlusParser/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DotChartOnDisk.o</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DotChartOnDisk.o</locationURI>
@@ -2787,24 +2627,39 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libCYKPlusParser.a</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerCYKPlus.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerCYKPlus.o</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libdynsa.a</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerMemory.o</name>
<type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libdynsa.a</locationURI>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerMemory.o</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude</name>
- <type>2</type>
- <locationURI>virtual:/virtual</locationURI>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerMemoryPerSentence.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerMemoryPerSentence.o</locationURI>
+ </link>
+ <link>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerOnDisk.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/ChartRuleLookupManagerOnDisk.o</locationURI>
+ </link>
+ <link>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/DotChartInMemory.o</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/DotChartInMemory.o</locationURI>
</link>
<link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libdynsa.a</name>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/DotChartOnDisk.o</name>
<type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libdynsa.a</locationURI>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/DotChartOnDisk.o</locationURI>
+ </link>
+ <link>
+ <name>CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libCYKPlusParser.a</name>
+ <type>1</type>
+ <locationURI>PARENT-3-PROJECT_LOC/moses/src/CYKPlusParser/bin/gcc-4.6/release/debug-symbols-on/link-static/threading-multi/libCYKPlusParser.a</locationURI>
</link>
<link>
<name>LM/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/Base.o</name>
@@ -2922,91 +2777,6 @@
<locationURI>PARENT-3-PROJECT_LOC/moses/src/LM/bin/gcc-4.2.1/release/debug-symbols-on/link-static/threading-multi/libLM.a</locationURI>
</link>
<link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderCompact.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderCompact.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderFactory.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderFactory.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderHiero.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderHiero.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderStandard.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/LoaderStandard.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryALSuffixArray.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryALSuffixArray.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryNodeSCFG.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryNodeSCFG.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryOnDisk.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionaryOnDisk.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionarySCFG.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/PhraseDictionarySCFG.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/Trie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/Trie.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/UTrie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/UTrie.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/UTrieNode.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/UTrieNode.o</locationURI>
- </link>
- <link>
- <name>RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libRuleTable.a</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/RuleTable/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libRuleTable.a</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/ApplicableRuleTrie.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/ApplicableRuleTrie.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/Parser.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/Parser.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/StackLatticeBuilder.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/StackLatticeBuilder.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/VarSpanTrieBuilder.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/VarSpanTrieBuilder.o</locationURI>
- </link>
- <link>
- <name>Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libScope3Parser.a</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/Scope3Parser/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/libScope3Parser.a</locationURI>
- </link>
- <link>
<name>bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</name>
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</locationURI>
@@ -3021,35 +2791,5 @@
<type>1</type>
<locationURI>PARENT-3-PROJECT_LOC/moses/src/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/vocab.o</locationURI>
</link>
- <link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/params.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/params.o</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/vocab.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/clang-darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/vocab.o</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/file.o</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/params.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/params.o</locationURI>
- </link>
- <link>
- <name>DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/vocab.o</name>
- <type>1</type>
- <locationURI>PARENT-3-PROJECT_LOC/moses/src/DynSAInclude/bin/darwin-4.2.1/release/debug-symbols-on/link-static/threading-multi/DynSAInclude/vocab.o</locationURI>
- </link>
</linkedResources>
</projectDescription>
diff --git a/contrib/other-builds/util/.cproject b/contrib/other-builds/util/.cproject
index 46e9a02b6..8ea5ab73b 100644
--- a/contrib/other-builds/util/.cproject
+++ b/contrib/other-builds/util/.cproject
@@ -41,9 +41,12 @@
<option id="gnu.cpp.compilermacosx.exe.debug.option.optimization.level.623959371" name="Optimization Level" superClass="gnu.cpp.compilermacosx.exe.debug.option.optimization.level" value="gnu.cpp.compiler.optimization.level.none" valueType="enumerated"/>
<option id="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level.892917290" name="Debug Level" superClass="gnu.cpp.compiler.macosx.exe.debug.option.debugging.level" value="gnu.cpp.compiler.debugging.level.max" valueType="enumerated"/>
<option id="gnu.cpp.compiler.option.include.paths.1401298824" name="Include paths (-I)" superClass="gnu.cpp.compiler.option.include.paths" valueType="includePath">
- <listOptionValue builtIn="false" value="/Users/hieuhoang/unison/workspace/github/moses-smt"/>
+ <listOptionValue builtIn="false" value="${workspace_loc}/../../"/>
<listOptionValue builtIn="false" value="/opt/local/include"/>
</option>
+ <option id="gnu.cpp.compiler.option.preprocessor.def.1952961175" name="Defined symbols (-D)" superClass="gnu.cpp.compiler.option.preprocessor.def" valueType="definedSymbols">
+ <listOptionValue builtIn="false" value="TRACE_ENABLE"/>
+ </option>
<inputType id="cdt.managedbuild.tool.gnu.cpp.compiler.input.1420621104" superClass="cdt.managedbuild.tool.gnu.cpp.compiler.input"/>
</tool>
<tool id="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug.1724141901" name="GCC C Compiler" superClass="cdt.managedbuild.tool.gnu.c.compiler.macosx.exe.debug">
@@ -130,4 +133,5 @@
<storageModule moduleId="refreshScope" versionNumber="1">
<resource resourceType="PROJECT" workspacePath="/util"/>
</storageModule>
+ <storageModule moduleId="org.eclipse.cdt.make.core.buildtargets"/>
</cproject>
diff --git a/contrib/relent-filter/AUTHORS b/contrib/relent-filter/AUTHORS
new file mode 100644
index 000000000..184a6dddd
--- /dev/null
+++ b/contrib/relent-filter/AUTHORS
@@ -0,0 +1 @@
+Wang Ling - lingwang at cs dot cmu dot edu
diff --git a/contrib/relent-filter/README.txt b/contrib/relent-filter/README.txt
new file mode 100644
index 000000000..e791d1f8a
--- /dev/null
+++ b/contrib/relent-filter/README.txt
@@ -0,0 +1,91 @@
+Implementation of the Relative Entropy-based Phrase table filtering algorithm by Wang Ling (Ling et al, 2012).
+
+This implementation also calculates significance scores for the phrase tables, based on Fisher's exact test (Johnson et al, 2007). It uses a slightly modified version of the "sigtest-filter" by Chris Dyer.
+
+-------BUILD INSTRUCTIONS-------
+
+1 - Build the sigtest-filter binary
+
+1.1 - Download and build SALM, available at http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
+
+1.2 - Run "make SALMDIR=<path_to_salm>" in "<path_to_moses>/contrib/relent-filter/sigtest-filter" to create the executable filter-pt
+
+2 - Build the moses project by running "./bjam <options>"; this will create the executables for relent filtering
+
+-------USAGE INSTRUCTIONS-------
+
+Required files:
+s_train - source training file
+t_train - target training file
+moses_ini - path to the moses configuration file ( after tuning )
+pruning_binaries - path to the relent pruning binaries ( should be "<path_to_moses>/bin" )
+pruning_scripts - path to the relent pruning scripts ( should be "<path_to_moses>/contrib/relent-filter/scripts" )
+sigbin - path to the sigtest filter binaries ( should be "<path_to_moses>/contrib/relent-filter/sigtest-filter" )
+output_dir - path to write the output
+
+1 - build suffix arrays for the source and target parallel training data
+
+1.1 - run "<path to salm>/Bin/Linux/Index/IndexSA.O32 <s_train>" (or IndexSA.O64)
+
+1.2 - run "<path to salm>/Bin/Linux/Index/IndexSA.O32 <t_train>" (or IndexSA.O64)
+
+2 - calculate phrase pair scores by running:
+
+perl <pruning_scripts>/calcPruningScores.pl -moses_ini <moses_ini> -training_s <s_train> -training_t <t_train> -prune_bin <pruning_binaries> -prune_scripts <pruning_scripts> -moses_scripts <path_to_moses>/scripts/training/ -workdir <output_dir> -dec_size 10000
+
+this will create the following files in the <output_dir>/scores/ dir:
+
+count.txt - counts of the phrase pairs: N(s,t), N(s,*) and N(*,t)
+divergence.txt - negative log of the divergence of the phrase pair
+empirical.txt - empirical distribution of the phrase pairs N(s,t)/N(*,*)
+rel_ent.txt - relative entropy of the phrase pairs
+significance.txt - significance of the phrase pairs
+
+You can use any one of these files for pruning and also combine these scores using <pruning_scripts>/interpolateScores.pl
+
+3 - To actually prune a phrase table you should run <pruning_scripts>/prunePT.pl
+
+For instance, to prune 30% of the phrase table using rel_ent run:
+perl <pruning_scripts>/prunePT.pl -table <phrase_table_file> -scores <output_dir>/scores/rel_ent.txt -percentage 70 > <pruned_phrase_table_file>
+
+You can also prune by threshold:
+perl <pruning_scripts>/prunePT.pl -table <phrase_table_file> -scores <output_dir>/scores/rel_ent.txt -threshold 0.1 > <pruned_phrase_table_file>
+
+The same must be done for the reordering table by replacing <phrase_table_file> with <reord_table_file>:
+
+perl <pruning_scripts>/prunePT.pl -table <reord_table_file> -scores <output_dir>/scores/rel_ent.txt -percentage 70 > <pruned_reord_table_file>
+
+-------RUNNING STEP 2 IN PARALLEL-------
+
+Step 2 requires the forced decoding of the whole set of phrase pairs in the table, so unless you test it on a small corpus, it usually requires a large amount of time to process.
+Thus, we recommend running multiple instances of "<pruning_scripts>/calcPruningScores.pl" in parallel to process different parts of the phrase table.
+
+To do this, run:
+
+perl <pruning_scripts>/calcPruningScores.pl -moses_ini <moses_ini> -training_s <s_train> -training_t <t_train> -prune_bin <pruning_binaries> -prune_scripts <pruning_scripts> -moses_scripts <path_to_moses>/scripts/training/ -workdir <output_dir> -dec_size 10000 -start 0 -end 100000
+
+The -start and -end options tell the script to only calculate the results for phrase pairs between 0 and 99999.
+
+Thus, an example of a shell script to run for the whole phrase table would be:
+
+size=`wc <phrase_table_file> | gawk '{print $1}'`
+phrases_per_process=100000
+
+for i in $(seq 0 $phrases_per_process $size)
+do
+ end=`expr $i + $phrases_per_process`
+ perl <pruning_scripts>/calcPruningScores.pl -moses_ini <moses_ini> -training_s <s_train> -training_t <t_train> -prune_bin <pruning_binaries> -prune_scripts <pruning_scripts> -moses_scripts <path_to_moses>/scripts/training/ -workdir <output_dir>.$i-$end -dec_size 10000 -start $i -end $end
+done
+
+After all processes finish, simply join the partial score files together in the same order.
+
+-------REFERENCES-------
+Ling, W., Graça, J., Trancoso, I., and Black, A. (2012). Entropy-based pruning for phrase-based
+machine translation. In Proceedings of the 2012
+Joint Conference on Empirical Methods in Natural Language Processing and
+Computational Natural Language Learning (EMNLP-CoNLL), pp. 962-971.
+
+H. Johnson, J. Martin, G. Foster and R. Kuhn. (2007) Improving Translation
+Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007
+Joint Conference on Empirical Methods in Natural Language Processing and
+Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.
diff --git a/contrib/relent-filter/scripts/calcEmpiricalDistribution.pl b/contrib/relent-filter/scripts/calcEmpiricalDistribution.pl
new file mode 100644
index 000000000..462ec5339
--- /dev/null
+++ b/contrib/relent-filter/scripts/calcEmpiricalDistribution.pl
@@ -0,0 +1,53 @@
+#!/usr/bin/perl -w
+
+# read arguments
+my $countFile = $ARGV[0];
+
+my $ZCAT = "gzip -cd";
+my $BZCAT = "bzcat";
+
+&process_count_file($countFile);
+
+sub process_count_file {
+ $file = $_[0];
+ open(COUNT_READER, &open_compressed($file)) or die "ERROR: Can't read $file";
+
+ print STDERR "reading file to calculate normalizer\n";
+ $normalizer=0;
+ while(<COUNT_READER>) {
+ my $line = $_;
+ chomp($line);
+ my @line_array = split(/\s+/, $line);
+ my $count = $line_array[0];
+ $normalizer+=$count;
+ }
+
+ close(COUNT_READER);
+
+ print STDERR "reading file again to print the counts\n";
+ open(COUNT_READER, &open_compressed($file)) or die "ERROR: Can't read $file";
+
+ while(<COUNT_READER>) {
+ my $line = $_;
+ chomp($line);
+ my @line_array = split(/\s+/, $line);
+ my $score = $line_array[0]/$normalizer;
+ print $score."\n";
+ }
+
+ close(COUNT_READER);
+}
+
+sub open_compressed {
+ my ($file) = @_;
+ print STDERR "FILE: $file\n";
+
+ # add extensions, if necessary
+ $file = $file.".bz2" if ! -e $file && -e $file.".bz2";
+ $file = $file.".gz" if ! -e $file && -e $file.".gz";
+
+ # pipe zipped, if necessary
+ return "$BZCAT $file|" if $file =~ /\.bz2$/;
+ return "$ZCAT $file|" if $file =~ /\.gz$/;
+ return $file;
+}
diff --git a/contrib/relent-filter/scripts/calcPruningScores.pl b/contrib/relent-filter/scripts/calcPruningScores.pl
new file mode 100755
index 000000000..cbfabac55
--- /dev/null
+++ b/contrib/relent-filter/scripts/calcPruningScores.pl
@@ -0,0 +1,351 @@
+#!/usr/bin/perl -w
+use Getopt::Long;
+use File::Basename;
+use POSIX;
+
+# read arguments
+my $line_start = 0;
+my $line_end = LONG_MAX;
+my $tmp_dir = "";
+my $dec_size = LONG_MAX;
+$_HELP = 1 if (@ARGV < 1 or !GetOptions ("moses_ini=s" => \$moses_ini, #moses conf file
+"start:i" => \$line_start, #first phrase pair to process
+"end:i" => \$line_end, #last phrase pair to process (exclusive)
+"training_s=s" => \$training_s, #source training file
+"training_t=s" => \$training_t, #target training file
+"prune_bin=s" => \$prune_bin, #binary files in the pruning toolkit
+"prune_scripts=s" => \$prune_scripts, #scripts in the pruning toolkit
+"sig_bin=s" => \$sig_bin, #binary files to calculate significance
+"moses_scripts=s" => \$moses_scripts, #dir with the moses scripts
+"tmp_dir:s" => \$tmp_dir, #dir for temporary files
+"dec_size:i" => \$dec_size, #number of phrase pairs to decode at a time
+"workdir=s" => \$workdir)); #directory to put all the output files
+
+# help message if arguments are not correct
+if ($_HELP) {
+ print "
+Usage: perl calcPruningScores.pl [PARAMS]
+Function: Calculates relative entropy for each phrase pair in a translation model.
+Authors: Wang Ling ( lingwang at cs dot cmu dot edu )
+PARAMS:
+ -moses_ini : moses configuration file with the model to prune (phrase table, reordering table, weights, etc.)
+ -training_s : source training file, please run salm first
+ -training_t : target training file, please run salm first
+ -prune_bin : path to the binaries for pruning (probably <PATH_TO_MOSES>/bin)
+ -prune_scripts : path to the scripts for pruning (probably the directory where this script is)
+ -sig_bin : path to the binary for significance testing included in this toolkit
+ -moses_scripts : path to the moses training scripts (where filter-model-given-input.pl is)
+ -workdir : directory to produce the output
+ -tmp_dir : directory to store temporary files (improves performance if stored on a local disk), omit to store in workdir
+ -dec_size : number of phrase pairs to be decoded at a time, omit to decode all selected phrase pairs at once
+ -start and -end : starting and ending phrase pairs to process, to be used if you want to launch multiple processes in parallel for different parts of the phrase table. If specified, the script will process the phrase pairs from <start> to <end>-1
+
+For any questions contact lingwang at cs dot cmu dot edu
+";
+ exit(1);
+}
+
+# setting up working dirs
+my $TMP_DIR = $tmp_dir;
+if ($tmp_dir eq ""){
+ $TMP_DIR = "$workdir/tmp";
+}
+my $SCORE_DIR = "$workdir/scores";
+my $FILTER_DIR = "$TMP_DIR/filter";
+
+# files for divergence module
+my $SOURCE_FILE = "$TMP_DIR/source.txt";
+my $CONSTRAINT_FILE = "$TMP_DIR/constraint.txt";
+my $DIVERGENCE_FILE = "$SCORE_DIR/divergence.txt";
+
+# files for significance module
+my $SIG_TABLE_FILE = "$TMP_DIR/source_target.txt";
+my $SIG_MOD_OUTPUT = "$TMP_DIR/sig_mod.out";
+my $SIG_FILE = "$SCORE_DIR/significance.txt";
+my $COUNT_FILE = "$SCORE_DIR/count.txt";
+my $EMP_DIST_FILE= "$SCORE_DIR/empirical.txt";
+my $REL_ENT_FILE= "$SCORE_DIR/rel_ent.txt";
+
+# setting up executables
+my $ZCAT = "gzip -cd";
+my $BZCAT = "bzcat";
+my $CP = "cp";
+my $SED = "sed";
+my $RM = "rm";
+my $SORT_EXEC = "sort";
+my $PRUNE_EXEC = "$prune_bin/calcDivergence";
+my $SIG_EXEC = "$sig_bin/filter-pt";
+my $FILTER_EXEC = "perl $moses_scripts/filter-model-given-input.pl";
+my $CALC_EMP_EXEC ="perl $prune_scripts/calcEmpiricalDistribution.pl";
+my $INT_TABLE_EXEC = "perl $prune_scripts/interpolateScores.pl";
+
+# moses ini variables
+my ($TRANSLATION_TABLE_FILE, $REORDERING_TABLE_FILE);
+
+# phrase table variables
+my ($N_PHRASES, $N_PHRASES_TO_PROCESS);
+
+# main functions
+&prepare();
+&calc_sig_and_counts();
+&calc_div();
+&clear_up();
+
+# (1) preparing data
+sub prepare {
+ print STDERR "(1) preparing data @ ".`date`;
+ safesystem("mkdir -p $workdir") or die("ERROR: could not create work dir $workdir");
+ safesystem("mkdir -p $TMP_DIR") or die("ERROR: could not create work dir $TMP_DIR");
+ safesystem("mkdir -p $SCORE_DIR") or die("ERROR: could not create work dir $SCORE_DIR");
+ &get_moses_ini_params();
+ &copy_tables_to_tmp_dir();
+ &write_data_files();
+
+ $N_PHRASES = &get_number_of_phrases();
+ $line_end = ($line_end > $N_PHRASES) ? $N_PHRASES : $line_end;
+ $N_PHRASES_TO_PROCESS = $line_end - $line_start;
+}
+
+sub write_data_files {
+ open(SOURCE_WRITER,">".$SOURCE_FILE) or die "ERROR: Can't write $SOURCE_FILE";
+ open(CONSTRAINT_WRITER,">".$CONSTRAINT_FILE) or die "ERROR: Can't write $CONSTRAINT_FILE";
+ open(TABLE_WRITER,">".$SIG_TABLE_FILE) or die "ERROR: Can't write $SIG_TABLE_FILE";
+ open(TTABLE_READER, &open_compressed($TRANSLATION_TABLE_FILE)) or die "ERROR: Can't read $TRANSLATION_TABLE_FILE";
+
+ $line_number = 0;
+ while($line_number < $line_start && !eof(TTABLE_READER)){
+ <TTABLE_READER>;
+ $line_number++;
+ }
+ while($line_number < $line_end && !eof(TTABLE_READER)) {
+ my $line = <TTABLE_READER>;
+ chomp($line);
+ my @line_array = split(/\s+\|\|\|\s+/, $line);
+ my $source = $line_array[0];
+ my $target = $line_array[1];
+ my $scores = $line_array[2];
+ print TABLE_WRITER $source." ||| ".$target." ||| ".$scores."\n";
+ print SOURCE_WRITER $source."\n";
+ print CONSTRAINT_WRITER $target."\n";
+ $line_number++;
+ }
+
+ close(SOURCE_WRITER);
+ close(CONSTRAINT_WRITER);
+ close(TABLE_WRITER);
+ close(TTABLE_READER);
+}
+
+sub copy_tables_to_tmp_dir {
+ $tmp_t_table = "$TMP_DIR/".basename($TRANSLATION_TABLE_FILE);
+ $tmp_r_table = "$TMP_DIR/".basename($REORDERING_TABLE_FILE);
+ $tmp_moses_ini = "$TMP_DIR/moses.ini";
+ $cp_t_cmd = "$CP $TRANSLATION_TABLE_FILE $TMP_DIR";
+ $cp_r_cmd = "$CP $REORDERING_TABLE_FILE $TMP_DIR";
+ safesystem("$cp_t_cmd") or die("ERROR: could not run:\n $cp_t_cmd");
+ safesystem("$cp_r_cmd") or die("ERROR: could not run:\n $cp_r_cmd");
+
+ $sed_cmd = "$SED s#$TRANSLATION_TABLE_FILE#$tmp_t_table#g $moses_ini | $SED s#$REORDERING_TABLE_FILE#$tmp_r_table#g > $tmp_moses_ini";
+ safesystem("$sed_cmd") or die("ERROR: could not run:\n $sed_cmd");
+
+ $TRANSLATION_TABLE_FILE = $tmp_t_table;
+ $REORDERING_TABLE_FILE = $tmp_r_table;
+ $moses_ini = $tmp_moses_ini;
+}
+
+# (2) calculating sig and counts
+sub calc_sig_and_counts {
+ print STDERR "(2) calculating counts and significance".`date`;
+ print STDERR "(2.1) running significance module".`date`;
+ &run_significance_module();
+ print STDERR "(2.2) writing counts and significance tables".`date`;
+ &write_counts_and_significance_table();
+ print STDERR "(2.3) calculating empirical distribution".`date`;
+}
+
+sub write_counts_and_significance_table {
+ open(COUNT_WRITER,">".$COUNT_FILE) or die "ERROR: Can't write $COUNT_FILE";
+ open(SIG_WRITER,">".$SIG_FILE) or die "ERROR: Can't write $SIG_FILE";
+ open(SIG_MOD_READER, &open_compressed($SIG_MOD_OUTPUT)) or die "ERROR: Can't read $SIG_MOD_OUTPUT";
+
+ while(<SIG_MOD_READER>) {
+ my($line) = $_;
+ chomp($line);
+ my @line_array = split(/\s+\|\|\|\s+/, $line);
+ my $count = $line_array[0];
+ my $sig = $line_array[1];
+ print COUNT_WRITER $count."\n";
+ print SIG_WRITER $sig."\n";
+ }
+
+ close(SIG_MOD_READER);
+ close(COUNT_WRITER);
+ close(SIG_WRITER);
+}
+
+sub run_significance_module {
+ my $sig_cmd = "cat $SIG_TABLE_FILE | $SIG_EXEC -e $training_t -f $training_s -l -10000 -p -c > $SIG_MOD_OUTPUT";
+ safesystem("$sig_cmd") or die("ERROR: could not run:\n $sig_cmd");
+}
+
+# (3) calculating divergence
+sub calc_div {
+ print STDERR "(3) calculating relative entropy".`date`;
+ print STDERR "(3.1) calculating empirical distribution".`date`;
+ &calculate_empirical_distribution();
+ print STDERR "(3.2) calculating divergence (this might take a while)".`date`;
+ if($N_PHRASES_TO_PROCESS > $dec_size) {
+ &calculate_divergence_shared("$FILTER_DIR");
+ }
+ else{
+ &calculate_divergence($moses_ini);
+ }
+ print STDERR "(3.3) calculating relative entropy from empirical and divergence distributions".`date`;
+ &calculate_relative_entropy();
+}
+
+sub calculate_empirical_distribution {
+ my $emp_cmd = "$CALC_EMP_EXEC $COUNT_FILE > $EMP_DIST_FILE";
+ safesystem("$emp_cmd") or die("ERROR: could not run:\n $emp_cmd");
+}
+
+sub get_fragmented_file_name {
+ my ($name, $frag, $interval) = @_;
+ return "$name-$frag-".($frag+$interval);
+}
+
+sub calculate_divergence {
+ my $moses_ini_file = $_[0];
+ print STDERR "force decoding phrase pairs\n";
+ my $prune_cmd = "cat $SOURCE_FILE | $PRUNE_EXEC -f $moses_ini_file -constraint $CONSTRAINT_FILE -early-discarding-threshold 0 -s 100000 -ttable-limit 0 > $DIVERGENCE_FILE 2> /dev/null";
+ safesystem("$prune_cmd") or die("ERROR: could not run:\n $prune_cmd");
+}
+
+sub calculate_divergence_shared {
+ my $filter_dir = $_[0];
+
+ &split_file_into_chunks($SOURCE_FILE, $dec_size, $N_PHRASES_TO_PROCESS);
+ &split_file_into_chunks($CONSTRAINT_FILE, $dec_size, $N_PHRASES_TO_PROCESS);
+
+ for(my $i = 0; $i < $N_PHRASES_TO_PROCESS; $i = $i + $dec_size) {
+ my $filter_cmd = "$FILTER_EXEC ".&get_fragmented_file_name($FILTER_DIR, $i, $dec_size)." $moses_ini ".&get_fragmented_file_name($SOURCE_FILE, $i, $dec_size);
+ safesystem("$filter_cmd") or die("ERROR: could not run:\n $filter_cmd");
+
+ my $moses_ini_file = &get_fragmented_file_name($filter_dir, $i, $dec_size)."/moses.ini";
+ my $source_file = &get_fragmented_file_name($SOURCE_FILE, $i, $dec_size);
+ my $constraint_file = &get_fragmented_file_name($CONSTRAINT_FILE, $i, $dec_size);
+ my $prune_cmd;
+ print STDERR "force decoding phrase pairs $i to ".($i + $dec_size)."\n";
+ if($i == 0){
+ $prune_cmd = "cat $source_file | $PRUNE_EXEC -f $moses_ini_file -constraint $constraint_file -early-discarding-threshold 0 -s 100000 -ttable-limit 0 > $DIVERGENCE_FILE 2> /dev/null";
+ }
+ else{
+ $prune_cmd = "cat $source_file | $PRUNE_EXEC -f $moses_ini_file -constraint $constraint_file -early-discarding-threshold 0 -s 100000 -ttable-limit 0 >> $DIVERGENCE_FILE 2> /dev/null";
+ }
+ safesystem("$prune_cmd") or die("ERROR: could not run:\n $prune_cmd");
+
+ my $rm_cmd = "$RM -r ".&get_fragmented_file_name($FILTER_DIR, $i, $dec_size);
+ safesystem("$rm_cmd") or die("ERROR: could not run:\n $rm_cmd");
+
+ }
+}
+
+sub calculate_relative_entropy {
+ my $int_cmd = "$INT_TABLE_EXEC -files \"$EMP_DIST_FILE $DIVERGENCE_FILE\" -weights \"1 1\" -operation \"*\" > $REL_ENT_FILE";
+ safesystem("$int_cmd") or die("ERROR: could not run:\n $int_cmd");
+
+}
+
+# (4) clear up stuff that is not needed
+sub clear_up {
+ print STDERR "(4) removing tmp dir".`date`;
+ $rm_cmd = "$RM -r $TMP_DIR";
+ safesystem("$rm_cmd") or die("ERROR: could not run:\n $rm_cmd");
+}
+
+# utility functions
+
+sub safesystem {
+ print STDERR "Executing: @_\n";
+ system(@_);
+ if ($? == -1) {
+ print STDERR "ERROR: Failed to execute: @_\n $!\n";
+ exit(1);
+ }
+ elsif ($? & 127) {
+ printf STDERR "ERROR: Execution of: @_\n died with signal %d, %s coredump\n",
+ ($? & 127), ($? & 128) ? 'with' : 'without';
+ exit(1);
+ }
+ else {
+ my $exitcode = $? >> 8;
+ print STDERR "Exit code: $exitcode\n" if $exitcode;
+ return ! $exitcode;
+ }
+}
+
+sub open_compressed {
+ my ($file) = @_;
+ print STDERR "FILE: $file\n";
+
+ # add extensions, if necessary
+ $file = $file.".bz2" if ! -e $file && -e $file.".bz2";
+ $file = $file.".gz" if ! -e $file && -e $file.".gz";
+
+ # pipe zipped, if necessary
+ return "$BZCAT $file|" if $file =~ /\.bz2$/;
+ return "$ZCAT $file|" if $file =~ /\.gz$/;
+ return $file;
+}
+
+sub get_moses_ini_params {
+
+ open(MOSES_READER, $moses_ini);
+ while(<MOSES_READER>) {
+ my($line) = $_;
+ chomp($line);
+
+ if($line eq "[ttable-file]"){
+ $tableLine = <MOSES_READER>;
+ chomp($tableLine);
+ ($_,$_,$_,$_,$TRANSLATION_TABLE_FILE) = split(" ",$tableLine); # put the other parameters there if needed
+ }
+ if($line eq "[distortion-file]"){
+ $tableLine = <MOSES_READER>;
+ chomp($tableLine);
+ ($_,$_,$_,$REORDERING_TABLE_FILE) = split(" ",$tableLine); # put the other parameters there if needed
+ }
+ }
+ close(MOSES_READER);
+}
+
+sub get_number_of_phrases {
+ my $ret = 0;
+ open(TABLE_READER, &open_compressed($TRANSLATION_TABLE_FILE)) or die "ERROR: Can't read $TRANSLATION_TABLE_FILE";
+
+ while(<TABLE_READER>) {
+ $ret++;
+ }
+
+ close (TABLE_READER);
+ return $ret;
+}
+
+sub split_file_into_chunks {
+ my ($file_to_split, $chunk_size, $number_of_phrases_to_process) = @_;
+ open(SOURCE_READER, &open_compressed($file_to_split)) or die "ERROR: Can't read $file_to_split";
+ my $FRAG_SOURCE_WRITER;
+ for(my $i = 0; $i < $number_of_phrases_to_process && !eof(SOURCE_READER); $i++) {
+ if(($i % $chunk_size) == 0){ # open fragmented file to write
+ my $frag_file = &get_fragmented_file_name($file_to_split, $i, $chunk_size);
+ open(FRAG_SOURCE_WRITER, ">".$frag_file) or die "ERROR: Can't write $frag_file";
+ }
+ my $line = <SOURCE_READER>;
+ print FRAG_SOURCE_WRITER $line;
+ if(($i % $chunk_size) == $chunk_size - 1 || $i == $number_of_phrases_to_process - 1){ # close fragmented file at a chunk boundary or at the last line
+ close(FRAG_SOURCE_WRITER);
+ }
+ }
+}
+
+
diff --git a/contrib/relent-filter/scripts/interpolateScores.pl b/contrib/relent-filter/scripts/interpolateScores.pl
new file mode 100644
index 000000000..b204e951a
--- /dev/null
+++ b/contrib/relent-filter/scripts/interpolateScores.pl
@@ -0,0 +1,94 @@
+#!/usr/bin/perl -w
+use Getopt::Long;
+use File::Basename;
+use POSIX;
+
+$operation="+";
+
+# read arguments
+$_HELP = 1 if (@ARGV < 1 or !GetOptions ("files=s" => \$files, #score files to interpolate
+"weights=s" => \$weights, #interpolation weights
+"operation=s" => \$operation)); #operation used to combine the scores
+
+
+# help message if arguments are not correct
+if ($_HELP) {
+ print "Relative Entropy Pruning
+Usage: perl interpolateScores.pl [PARAMS]
+Function: interpolates any number of score files according to their weights
+Authors: Wang Ling ( lingwang at cs dot cmu dot edu )
+PARAMS:
+ -files=s : table files to interpolate separated by a space (Ex \"file1 file2 file3\")
+ -weights : interpolation weights separated by a space (Ex \"0.3 0.3 0.4\")
+ -operation : +,* or min depending on the operation to perform to combine scores
+For any questions contact lingwang at cs dot cmu dot edu
+";
+ exit(1);
+}
+
+@FILES = split(/\s+/, $files);
+@WEIGHTS = split(/\s+/, $weights);
+
+my $ZCAT = "gzip -cd";
+my $BZCAT = "bzcat";
+
+&interpolate();
+
+sub interpolate {
+ my @READERS;
+ for($i = 0; $i < @FILES; $i++){
+ local *FILE;
+ open(FILE, &open_compressed($FILES[$i])) or die "ERROR: Can't read $FILES[$i]";
+ push(@READERS, *FILE);
+ }
+ $FIRST = $READERS[0];
+ while(!eof($FIRST)) {
+ if($operation eq "+"){
+ my $score = 0;
+ for($i = 0; $i < @FILES; $i++){
+ my $READER = $READERS[$i];
+ my $line = <$READER>;
+ chomp($line);
+ $score += $line*$WEIGHTS[$i];
+ }
+ print "$score\n";
+ }
+ if($operation eq "*"){
+ my $score = 1;
+ for($i = 0; $i < @FILES; $i++){
+ my $READER = $READERS[$i];
+ my $line = <$READER>;
+ chomp($line);
+ $score *= $line ** $WEIGHTS[$i];
+ }
+ print "$score\n"
+ }
+ if($operation eq "min"){
+ my $score = 99999;
+ for($i = 0; $i < @FILES; $i++){
+ my $READER = $READERS[$i];
+ my $line = <$READER>;
+ chomp($line);
+ if ($score > $line*$WEIGHTS[$i]){
+ $score = $line*$WEIGHTS[$i];
+ }
+ }
+ print "$score\n"
+
+ }
+ }
+}
+
+sub open_compressed {
+ my ($file) = @_;
+ print STDERR "FILE: $file\n";
+
+ # add extensions, if necessary
+ $file = $file.".bz2" if ! -e $file && -e $file.".bz2";
+ $file = $file.".gz" if ! -e $file && -e $file.".gz";
+
+ # pipe zipped, if necessary
+ return "$BZCAT $file|" if $file =~ /\.bz2$/;
+ return "$ZCAT $file|" if $file =~ /\.gz$/;
+ return $file;
+}
diff --git a/contrib/relent-filter/scripts/prunePT.pl b/contrib/relent-filter/scripts/prunePT.pl
new file mode 100755
index 000000000..37dc30bad
--- /dev/null
+++ b/contrib/relent-filter/scripts/prunePT.pl
@@ -0,0 +1,114 @@
+#!/usr/bin/perl -w
+
+# read arguments
+my $percentage = -1;
+my $threshold = -1;
+use Getopt::Long;
+$_HELP = 1 if (@ARGV < 1 or !GetOptions ("table=s" => \$table, #table to filter
+"scores=s" => \$scores_file, #scores of each phrase pair, should have same size as the table to filter
+"percentage=i" => \$percentage, # percentage of phrase table to remain
+"threshold=i" => \$threshold)); # threshold (score < threshold equals prune entry)
+
+# help message if arguments are not correct
+if ($_HELP) {
+ print "Relative Entropy Pruning
+Usage: perl prunePT.pl [PARAMS]
+Function: prunes a phrase table given a score file
+Authors: Wang Ling ( lingwang at cs dot cmu dot edu )
+PARAMS:
+ -table : table to prune
+ -scores : file with one score per phrase pair, same size and order as the table
+ -percentage : percentage of the phrase table to retain (if ties at the threshold score make the exact percentage unattainable, the script retains more than the given percentage)
+ -threshold : pruning threshold (entries with score < threshold are pruned), do not use together with -percentage
+For any questions contact lingwang at cs dot cmu dot edu
+";
+ exit(1);
+}
+
+
+my $THRESHOLD = $threshold;
+if ($percentage != -1){
+ $THRESHOLD = &get_threshold_by_percentage($percentage);
+}
+
+my $ZCAT = "gzip -cd";
+my $BZCAT = "bzcat";
+
+&prune_by_threshold($THRESHOLD);
+
+sub prune_by_threshold {
+ my $th = $_[0];
+ print STDERR "pruning using threshold $th \n";
+ open (SCORE_READER, &open_compressed($scores_file));
+ open (TABLE_READER, &open_compressed($table));
+ $number_of_phrases=0;
+ $number_of_unpruned_phrases=0;
+ while(!eof(SCORE_READER) && !eof(TABLE_READER)){
+ $score_line = <SCORE_READER>;
+ $table_line = <TABLE_READER>;
+ chomp($score_line);
+ if($score_line >= $th){
+ print $table_line;
+ $number_of_unpruned_phrases++;
+ }
+ $number_of_phrases++;
+ }
+ print STDERR "pruned ".($number_of_phrases - $number_of_unpruned_phrases)." phrase pairs out of $number_of_phrases\n";
+}
+
+sub get_threshold_by_percentage {
+ my $percentage = $_[0];
+ $ret = 0;
+
+ $number_of_phrases = &get_number_of_phrases();
+ $stop_phrase = ($percentage * $number_of_phrases) / 100;
+ $phrase_number = 0;
+
+ open (SCORE_READER, "cat $scores_file | LC_ALL=c sort -g |");
+ while(<SCORE_READER>) {
+ my $line = $_;
+ if($phrase_number >= $stop_phrase){
+ chomp($line);
+ $ret = $line;
+ last;
+ }
+ $phrase_number++;
+ }
+
+ close (SCORE_READER);
+ return $ret;
+}
+
+sub get_number_of_phrases {
+ $ret = 0;
+ open (SCORE_READER, $scores_file);
+
+ while(<SCORE_READER>) {
+ $ret++;
+ }
+
+ close (SCORE_READER);
+ return $ret;
+}
+
+sub open_compressed {
+ my ($file) = @_;
+ print STDERR "FILE: $file\n";
+
+ # add extensions, if necessary
+ $file = $file.".bz2" if ! -e $file && -e $file.".bz2";
+ $file = $file.".gz" if ! -e $file && -e $file.".gz";
+
+ # pipe zipped, if necessary
+ return "$BZCAT $file|" if $file =~ /\.bz2$/;
+ return "$ZCAT $file|" if $file =~ /\.gz$/;
+ return $file;
+}
diff --git a/contrib/relent-filter/sigtest-filter/Makefile b/contrib/relent-filter/sigtest-filter/Makefile
new file mode 100755
index 000000000..71de9c45f
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/Makefile
@@ -0,0 +1,10 @@
+SALMDIR=/Users/hieuhoang/workspace/salm
+FLAVOR?=o64
+INC=-I$(SALMDIR)/Src/Shared -I$(SALMDIR)/Src/SuffixArrayApplications -I$(SALMDIR)/Src/SuffixArrayApplications/SuffixArraySearch
+OBJS=$(SALMDIR)/Distribution/Linux/Objs/Search/_SuffixArrayApplicationBase.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Search/_SuffixArraySearchApplicationBase.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Shared/_String.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Shared/_IDVocabulary.$(FLAVOR)
+
+all: filter-pt
+
+filter-pt: filter-pt.cpp
+ ./check-install $(SALMDIR)
+ $(CXX) -O6 $(INC) $(OBJS) -o filter-pt filter-pt.cpp
diff --git a/contrib/relent-filter/sigtest-filter/README.txt b/contrib/relent-filter/sigtest-filter/README.txt
new file mode 100755
index 000000000..b21129b89
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/README.txt
@@ -0,0 +1,42 @@
+Re-implementation of Johnson et al. (2007)'s phrasetable filtering strategy.
+
+This implementation relies on Joy Zhang's SALM Suffix Array toolkit. It is
+available here:
+
+ http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
+
+--Chris Dyer <redpony@umd.edu>
+
+BUILD INSTRUCTIONS
+---------------------------------
+
+1. Download and build SALM.
+
+2. make SALMDIR=/path/to/SALM
+
+
+USAGE INSTRUCTIONS
+---------------------------------
+
+1. Using the SALM/Bin/Linux/Index/IndexSA.O32, create a suffix array index
+ of the source and target sides of your training bitext.
+
+2. cat phrase-table.txt | ./filter-pt -e TARG.suffix -f SOURCE.suffix \
+ -l <FILTER-VALUE>
+
+ FILTER-VALUE is the -log prob threshold described in Johnson et al.
+ (2007)'s paper. It may be either 'a+e', 'a-e', or a positive real
+ value. 'a+e' is a good setting: it filters out <1,1,1> phrase pairs.
+ I also recommend using -n 30, which filters out all but the top
+ 30 phrase pairs, sorted by P(e|f). This was used in the paper.
+
+3. Run with no options to see more use-cases.
+
+
+REFERENCES
+---------------------------------
+
+H. Johnson, J. Martin, G. Foster and R. Kuhn. (2007) Improving Translation
+ Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007
+ Joint Conference on Empirical Methods in Natural Language Processing and
+ Computational Natural Language Learning (EMNLP-CoNLL), pp. 967-975.
diff --git a/contrib/relent-filter/sigtest-filter/WIN32_functions.cpp b/contrib/relent-filter/sigtest-filter/WIN32_functions.cpp
new file mode 100755
index 000000000..60ddd340c
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/WIN32_functions.cpp
@@ -0,0 +1,231 @@
+// XGetopt.cpp Version 1.2
+//
+// Author: Hans Dietrich
+// hdietrich2@hotmail.com
+//
+// Description:
+// XGetopt.cpp implements getopt(), a function to parse command lines.
+//
+// History
+// Version 1.2 - 2003 May 17
+// - Added Unicode support
+//
+// Version 1.1 - 2002 March 10
+// - Added example to XGetopt.cpp module header
+//
+// This software is released into the public domain.
+// You are free to use it in any way you like.
+//
+// This software is provided "as is" with no expressed
+// or implied warranty. I accept no liability for any
+// damage or loss of business that this software may cause.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+
+///////////////////////////////////////////////////////////////////////////////
+// if you are using precompiled headers then include this line:
+///////////////////////////////////////////////////////////////////////////////
+
+
+///////////////////////////////////////////////////////////////////////////////
+// if you are not using precompiled headers then include these lines:
+//#include <windows.h>
+//#include <stdio.h>
+//#include <tchar.h>
+///////////////////////////////////////////////////////////////////////////////
+
+
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+#include "WIN32_functions.h"
+
+
+///////////////////////////////////////////////////////////////////////////////
+//
+// X G e t o p t . c p p
+//
+//
+// NAME
+// getopt -- parse command line options
+//
+// SYNOPSIS
+// int getopt(int argc, char *argv[], char *optstring)
+//
+// extern char *optarg;
+// extern int optind;
+//
+// DESCRIPTION
+// The getopt() function parses the command line arguments. Its
+// arguments argc and argv are the argument count and array as
+// passed into the application on program invocation. In the case
+// of Visual C++ programs, argc and argv are available via the
+// variables __argc and __argv (double underscores), respectively.
+// getopt returns the next option letter in argv that matches a
+// letter in optstring. (Note: Unicode programs should use
+// __targv instead of __argv. Also, all character and string
+// literals should be enclosed in ( ) ).
+//
+// optstring is a string of recognized option letters; if a letter
+// is followed by a colon, the option is expected to have an argument
+// that may or may not be separated from it by white space. optarg
+// is set to point to the start of the option argument on return from
+// getopt.
+//
+// Option letters may be combined, e.g., "-ab" is equivalent to
+// "-a -b". Option letters are case sensitive.
+//
+// getopt places in the external variable optind the argv index
+// of the next argument to be processed. optind is initialized
+// to 0 before the first call to getopt.
+//
+// When all options have been processed (i.e., up to the first
+// non-option argument), getopt returns EOF, optarg will point
+// to the argument, and optind will be set to the argv index of
+// the argument. If there are no non-option arguments, optarg
+// will be set to NULL.
+//
+// The special option "--" may be used to delimit the end of the
+// options; EOF will be returned, and "--" (and everything after it)
+// will be skipped.
+//
+// RETURN VALUE
+// For option letters contained in the string optstring, getopt
+// will return the option letter. getopt returns a question mark (?)
+// when it encounters an option letter not included in optstring.
+// EOF is returned when processing is finished.
+//
+// BUGS
+// 1) Long options are not supported.
+// 2) The GNU double-colon extension is not supported.
+// 3) The environment variable POSIXLY_CORRECT is not supported.
+// 4) The + syntax is not supported.
+// 5) The automatic permutation of arguments is not supported.
+// 6) This implementation of getopt() returns EOF if an error is
+// encountered, instead of -1 as the latest standard requires.
+//
+// EXAMPLE
+// BOOL CMyApp::ProcessCommandLine(int argc, char *argv[])
+// {
+// int c;
+//
+// while ((c = getopt(argc, argv, ("aBn:"))) != EOF)
+// {
+// switch (c)
+// {
+// case ('a'):
+// TRACE(("option a\n"));
+// //
+// // set some flag here
+// //
+// break;
+//
+// case ('B'):
+// TRACE( ("option B\n"));
+// //
+// // set some other flag here
+// //
+// break;
+//
+// case ('n'):
+// TRACE(("option n: value=%d\n"), atoi(optarg));
+// //
+// // do something with value here
+// //
+// break;
+//
+// case ('?'):
+// TRACE(("ERROR: illegal option %s\n"), argv[optind-1]);
+// return FALSE;
+// break;
+//
+// default:
+// TRACE(("WARNING: no handler for option %c\n"), c);
+// return FALSE;
+// break;
+// }
+// }
+// //
+// // check for non-option args here
+// //
+// return TRUE;
+// }
+//
+///////////////////////////////////////////////////////////////////////////////
+
+char *optarg; // global argument pointer
+int optind = 0; // global argv index
+
+int getopt(int argc, char *argv[], char *optstring)
+{
+ static char *next = NULL;
+ if (optind == 0)
+ next = NULL;
+
+ optarg = NULL;
+
+ if (next == NULL || *next =='\0') {
+ if (optind == 0)
+ optind++;
+
+ if (optind >= argc || argv[optind][0] != ('-') || argv[optind][1] == ('\0')) {
+ optarg = NULL;
+ if (optind < argc)
+ optarg = argv[optind];
+ return EOF;
+ }
+
+ if (strcmp(argv[optind], "--") == 0) {
+ optind++;
+ optarg = NULL;
+ if (optind < argc)
+ optarg = argv[optind];
+ return EOF;
+ }
+
+ next = argv[optind];
+ next++; // skip past -
+ optind++;
+ }
+
+ char c = *next++;
+ char *cp = strchr(optstring, c);
+
+ if (cp == NULL || c == (':'))
+ return ('?');
+
+ cp++;
+ if (*cp == (':')) {
+ if (*next != ('\0')) {
+ optarg = next;
+ next = NULL;
+ } else if (optind < argc) {
+ optarg = argv[optind];
+ optind++;
+ } else {
+ return ('?');
+ }
+ }
+
+ return c;
+}
+
+// for an overview, see
+// W. Press, S. Teukolsky and W. Vetterling. (1992) Numerical Recipes in C. Chapter 6.1.
+double lgamma(int x)
+{
+ // size_t xx=(size_t)x; xx--; size_t sum=1; while (xx) { sum *= xx--; } return log((double)(sum));
+ if (x <= 2) {
+ return 0.0;
+ }
+ static double coefs[6] = {76.18009172947146, -86.50532032941677, 24.01409824083091, -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
+ double tmp=(double)x+5.5;
+ tmp -= (((double)x)+0.5)*log(tmp);
+ double y=(double)x;
+ double sum = 1.000000000190015;
+ for (size_t j=0; j<6; ++j) {
+ sum += coefs[j]/++y;
+ }
+ return -tmp+log(2.5066282746310005*sum/(double)x);
+} \ No newline at end of file
diff --git a/contrib/relent-filter/sigtest-filter/WIN32_functions.h b/contrib/relent-filter/sigtest-filter/WIN32_functions.h
new file mode 100755
index 000000000..6a719392e
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/WIN32_functions.h
@@ -0,0 +1,24 @@
+// XGetopt.h Version 1.2
+//
+// Author: Hans Dietrich
+// hdietrich2@hotmail.com
+//
+// This software is released into the public domain.
+// You are free to use it in any way you like.
+//
+// This software is provided "as is" with no expressed
+// or implied warranty. I accept no liability for any
+// damage or loss of business that this software may cause.
+//
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef XGETOPT_H
+#define XGETOPT_H
+
+extern int optind, opterr;
+extern char *optarg;
+
+int getopt(int argc, char *argv[], char *optstring);
+double lgamma(int x);
+
+#endif //XGETOPT_H
diff --git a/contrib/relent-filter/sigtest-filter/check-install b/contrib/relent-filter/sigtest-filter/check-install
new file mode 100755
index 000000000..ba4f431e0
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/check-install
@@ -0,0 +1,5 @@
+#!/usr/bin/perl -w
+use strict;
+my $path = shift @ARGV;
+die "Can't find SALM installation path: $path\nPlease use:\n\n make SALMDIR=/path/to/SALM\n\n" unless (-d $path);
+exit 0;
diff --git a/contrib/relent-filter/sigtest-filter/filter-pt.cpp b/contrib/relent-filter/sigtest-filter/filter-pt.cpp
new file mode 100755
index 000000000..4a51953ea
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/filter-pt.cpp
@@ -0,0 +1,377 @@
+
+#include <cstring>
+#include <cassert>
+#include <cstdio>
+#include <cstdlib>
+#include <algorithm>
+
+#include "_SuffixArraySearchApplicationBase.h"
+
+#include <vector>
+#include <iostream>
+#include <set>
+#include <map>
+
+#ifdef WIN32
+#include "WIN32_functions.h"
+#else
+#include <unistd.h>
+#endif
+
+typedef std::set<TextLenType> SentIdSet;
+typedef std::map<std::string, SentIdSet> PhraseSetMap;
+
+#undef min
+
+// constants
+const size_t MINIMUM_SIZE_TO_KEEP = 10000; // reduce this to improve memory usage,
+// increase for speed
+const std::string SEPARATOR = " ||| ";
+
+const double ALPHA_PLUS_EPS = -1000.0; // dummy value
+const double ALPHA_MINUS_EPS = -2000.0; // dummy value
+
+// configuration params
+int pfe_filter_limit = 0; // 0 = don't filter anything based on P(f|e)
+bool print_cooc_counts = false; // add cooc counts to phrase table?
+bool print_neglog_significance = false; // add -log(p) to phrase table?
+double sig_filter_limit = 0; // keep phrase pairs with -log(sig) > sig_filter_limit
+// higher = more filtering
+bool pef_filter_only = false; // only filter based on pef
+
+// globals
+PhraseSetMap esets;
+double p_111 = 0.0; // alpha
+size_t nremoved_sigfilter = 0;
+size_t nremoved_pfefilter = 0;
+
+C_SuffixArraySearchApplicationBase e_sa;
+C_SuffixArraySearchApplicationBase f_sa;
+int num_lines;
+
+void usage()
+{
+ std::cerr << "\nFilter phrase table using significance testing as described\n"
+ << "in H. Johnson, et al. (2007) Improving Translation Quality\n"
+ << "by Discarding Most of the Phrasetable. EMNLP 2007.\n"
+ << "\nUsage:\n"
+ << "\n filter-pt -e english.suf-arr -f french.suf-arr\n"
+ << " [-c] [-p] [-l threshold] [-n num] < PHRASE-TABLE > FILTERED-PHRASE-TABLE\n\n"
+ << " [-l threshold] >0.0, a+e, or a-e: keep values that have a -log significance > this\n"
+ << " [-n num ] 0, 1...: 0=no filtering, >0 sort by P(e|f) and keep the top num elements\n"
+ << " [-c ] add the cooccurence counts to the phrase table\n"
+ << " [-p ] add -log(significance) to the phrasetable\n\n";
+ exit(1);
+}
+
+struct PTEntry {
+ PTEntry(const std::string& str, int index);
+ std::string f_phrase;
+ std::string e_phrase;
+ std::string extra;
+ std::string scores;
+ float pfe;
+ int cf;
+ int ce;
+ int cfe;
+ float nlog_pte;
+ void set_cooc_stats(int _cef, int _cf, int _ce, float nlp) {
+ cfe = _cef;
+ cf = _cf;
+ ce = _ce;
+ nlog_pte = nlp;
+ }
+
+};
+
+PTEntry::PTEntry(const std::string& str, int index) :
+ cf(0), ce(0), cfe(0), nlog_pte(0.0)
+{
+ size_t pos = 0;
+ std::string::size_type nextPos = str.find(SEPARATOR, pos);
+ this->f_phrase = str.substr(pos,nextPos);
+
+ pos = nextPos + SEPARATOR.size();
+ nextPos = str.find(SEPARATOR, pos);
+ this->e_phrase = str.substr(pos,nextPos-pos);
+
+ pos = nextPos + SEPARATOR.size();
+ nextPos = str.find(SEPARATOR, pos);
+ this->scores = str.substr(pos,nextPos-pos);
+
+ pos = nextPos + SEPARATOR.size();
+ this->extra = str.substr(pos);
+
+ int c = 0;
+ std::string::iterator i=scores.begin();
+ if (index > 0) {
+ for (; i != scores.end(); ++i) {
+ if ((*i) == ' ') {
+ c++;
+ if (c == index) break;
+ }
+ }
+ }
+ if (i != scores.end()) {
+ ++i;
+ }
+ char f[24];
+ char *fp=f;
+ while (i != scores.end() && *i != ' ') {
+ *fp++=*i++;
+ }
+ *fp++=0;
+
+ this->pfe = atof(f);
+
+ // std::cerr << "L: " << f_phrase << " ::: " << e_phrase << " ::: " << scores << " ::: " << pfe << std::endl;
+ // std::cerr << "X: " << extra << "\n";
+}
+
+struct PfeComparer {
+ bool operator()(const PTEntry* a, const PTEntry* b) const {
+ return a->pfe > b->pfe;
+ }
+};
+
+struct NlogSigThresholder {
+ NlogSigThresholder(float threshold) : t(threshold) {}
+ float t;
+ bool operator()(const PTEntry* a) const {
+ if (a->nlog_pte < t) {
+ delete a;
+ return true;
+ } else return false;
+ }
+};
+
+std::ostream& operator << (std::ostream& os, const PTEntry& pp)
+{
+ //os << pp.f_phrase << " ||| " << pp.e_phrase;
+ //os << " ||| " << pp.scores;
+ //if (pp.extra.size()>0) os << " ||| " << pp.extra;
+ if (print_cooc_counts) os << pp.cfe << " " << pp.cf << " " << pp.ce;
+ if (print_neglog_significance) os << " ||| " << pp.nlog_pte;
+ return os;
+}
+
+void print(int a, int b, int c, int d, float p)
+{
+ std::cerr << a << "\t" << b << "\t P=" << p << "\n"
+ << c << "\t" << d << "\t xf=" << (double)(b)*(double)(c)/(double)(a+1)/(double)(d+1) << "\n\n";
+}
+
+// 2x2 (one-sided) Fisher's exact test
+// see B. Moore. (2004) On Log Likelihood and the Significance of Rare Events
+double fisher_exact(int cfe, int ce, int cf)
+{
+ assert(cfe <= ce);
+ assert(cfe <= cf);
+
+ int a = cfe;
+ int b = (cf - cfe);
+ int c = (ce - cfe);
+ int d = (num_lines - ce - cf + cfe);
+ int n = a + b + c + d;
+
+ double cp = exp(lgamma(1+a+c) + lgamma(1+b+d) + lgamma(1+a+b) + lgamma(1+c+d) - lgamma(1+n) - lgamma(1+a) - lgamma(1+b) - lgamma(1+c) - lgamma(1+d));
+ double total_p = 0.0;
+ int tc = std::min(b,c);
+ for (int i=0; i<=tc; i++) {
+ total_p += cp;
+// double lg = lgamma(1+a+c) + lgamma(1+b+d) + lgamma(1+a+b) + lgamma(1+c+d) - lgamma(1+n) - lgamma(1+a) - lgamma(1+b) - lgamma(1+c) - lgamma(1+d); double cp = exp(lg);
+// print(a,b,c,d,cp);
+ double coef = (double)(b)*(double)(c)/(double)(a+1)/(double)(d+1);
+ cp *= coef;
+ ++a;
+ --c;
+ ++d;
+ --b;
+ }
+ return total_p;
+}
+
+// input: unordered list of translation options for a single source phrase
+void compute_cooc_stats_and_filter(std::vector<PTEntry*>& options)
+{
+ if (pfe_filter_limit>0 && options.size() > pfe_filter_limit) {
+ nremoved_pfefilter += (options.size() - pfe_filter_limit);
+ std::nth_element(options.begin(), options.begin()+pfe_filter_limit, options.end(), PfeComparer());
+ for (std::vector<PTEntry*>::iterator i=options.begin()+pfe_filter_limit; i != options.end(); ++i)
+ delete *i;
+ options.erase(options.begin()+pfe_filter_limit,options.end());
+ }
+ if (pef_filter_only) return;
+
+ SentIdSet fset;
+ vector<S_SimplePhraseLocationElement> locations;
+ //std::cerr << "Looking up f-phrase: " << options.front()->f_phrase << "\n";
+
+ locations = f_sa.locateExactPhraseInCorpus(options.front()->f_phrase.c_str());
+ if(locations.size()==0) {
+ cerr << "No occurrences of source phrase found in corpus!\n";
+ }
+ for (vector<S_SimplePhraseLocationElement>::iterator i=locations.begin();
+ i != locations.end();
+ ++i) {
+ fset.insert(i->sentIdInCorpus);
+ }
+ size_t cf = fset.size();
+ for (std::vector<PTEntry*>::iterator i=options.begin(); i != options.end(); ++i) {
+ const std::string& e_phrase = (*i)->e_phrase;
+ size_t cef=0;
+ SentIdSet& eset = esets[(*i)->e_phrase];
+ if (eset.empty()) {
+ //std::cerr << "Looking up e-phrase: " << e_phrase << "\n";
+ vector<S_SimplePhraseLocationElement> locations = e_sa.locateExactPhraseInCorpus(e_phrase.c_str());
+ for (vector<S_SimplePhraseLocationElement>::iterator j=locations.begin(); j != locations.end(); ++j) {
+ TextLenType curSentId = j->sentIdInCorpus;
+ eset.insert(curSentId);
+ }
+ }
+ size_t ce=eset.size();
+ if (ce < cf) {
+ for (SentIdSet::iterator j=eset.begin(); j != eset.end(); ++j) {
+ if (fset.find(*j) != fset.end()) cef++;
+ }
+ } else {
+ for (SentIdSet::iterator j=fset.begin(); j != fset.end(); ++j) {
+ if (eset.find(*j) != eset.end()) cef++;
+ }
+ }
+ double nlp = -log(fisher_exact(cef, cf, ce));
+ (*i)->set_cooc_stats(cef, cf, ce, nlp);
+ if (ce < MINIMUM_SIZE_TO_KEEP) {
+ esets.erase(e_phrase);
+ }
+ }
+ std::vector<PTEntry*>::iterator new_end =
+ std::remove_if(options.begin(), options.end(), NlogSigThresholder(sig_filter_limit));
+ nremoved_sigfilter += (options.end() - new_end);
+ options.erase(new_end,options.end());
+}
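The function above combines two pruning stages: an optional top-n cut on p(f|e) via `std::nth_element`, followed by a significance cut via `std::remove_if`. A minimal standalone sketch of the same two-stage scheme is shown below; `SimpleEntry` and `prune` are illustrative names (not from the source), and entries are held by value rather than via owned pointers as in the code above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical simplified entry: only the two scores the filter consults.
struct SimpleEntry {
  double pfe;      // p(f|e) read from the phrase table
  double nlog_pte; // negative log p-value of the association test
};

// Keep at most `limit` entries by highest p(f|e) (limit == 0 keeps all),
// then drop entries whose significance falls below `threshold` -- the same
// two-stage scheme as above, minus the suffix-array lookups.
inline void prune(std::vector<SimpleEntry>& options,
                  std::size_t limit, double threshold) {
  if (limit > 0 && options.size() > limit) {
    // Partition so the `limit` best entries by p(f|e) come first.
    std::nth_element(options.begin(), options.begin() + limit, options.end(),
                     [](const SimpleEntry& a, const SimpleEntry& b) {
                       return a.pfe > b.pfe;
                     });
    options.erase(options.begin() + limit, options.end());
  }
  // Remove entries that fail the significance threshold.
  options.erase(std::remove_if(options.begin(), options.end(),
                               [threshold](const SimpleEntry& e) {
                                 return e.nlog_pte < threshold;
                               }),
                options.end());
}
```

Note that `std::nth_element` runs in linear average time, which is why the code prefers it over a full sort when only the top-n cut matters.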
+
+int main(int argc, char * argv[])
+{
+ int c;
+ const char* efile=0;
+ const char* ffile=0;
+ int pfe_index = 2;
+ while ((c = getopt(argc, argv, "cpf:e:i:n:l:")) != -1) {
+ switch (c) {
+ case 'e':
+ efile = optarg;
+ break;
+ case 'f':
+ ffile = optarg;
+ break;
+ case 'i': // index of pfe in phrase table
+ pfe_index = atoi(optarg);
+ break;
+ case 'n': // keep only the top n entries in phrase table sorted by p(f|e) (0=all)
+ pfe_filter_limit = atoi(optarg);
+ std::cerr << "P(f|e) filter limit: " << pfe_filter_limit << std::endl;
+ break;
+ case 'c':
+ print_cooc_counts = true;
+ break;
+ case 'p':
+ print_neglog_significance = true;
+ break;
+ case 'l':
+ std::cerr << "-l = " << optarg << "\n";
+ if (strcmp(optarg,"a+e") == 0) {
+ sig_filter_limit = ALPHA_PLUS_EPS;
+ } else if (strcmp(optarg,"a-e") == 0) {
+ sig_filter_limit = ALPHA_MINUS_EPS;
+ } else {
+ char *x;
+ sig_filter_limit = strtod(optarg, &x);
+ }
+ break;
+ default:
+ usage();
+ }
+ }
+ //-----------------------------------------------------------------------------
+ if (optind != argc || ((!efile || !ffile) && !pef_filter_only)) {
+ usage();
+ }
+
+ // load the indexed corpus with vocabulary (noVoc=false) and offsets (noOffset=false)
+ if (!pef_filter_only) {
+ e_sa.loadData_forSearch(efile, false, false);
+ f_sa.loadData_forSearch(ffile, false, false);
+ size_t elines = e_sa.returnTotalSentNumber();
+ size_t flines = f_sa.returnTotalSentNumber();
+ if (elines != flines) {
+ std::cerr << "Number of lines in e-corpus != number of lines in f-corpus!\n";
+ usage();
+ } else {
+ std::cerr << "Training corpus: " << elines << " lines\n";
+ num_lines = elines;
+ }
+ p_111 = -log(fisher_exact(1,1,1));
+ std::cerr << "\\alpha = " << p_111 << "\n";
+ if (sig_filter_limit == ALPHA_MINUS_EPS) {
+ sig_filter_limit = p_111 - 0.001;
+ } else if (sig_filter_limit == ALPHA_PLUS_EPS) {
+ sig_filter_limit = p_111 + 0.001;
+ }
+ std::cerr << "Sig filter threshold is = " << sig_filter_limit << "\n";
+ } else {
+ std::cerr << "Filtering using P(e|f) only. n=" << pfe_filter_limit << std::endl;
+ }
+
+ char tmpString[10000];
+ std::string prev = "";
+ std::vector<PTEntry*> options;
+ size_t pt_lines = 0;
+ while(cin.getline(tmpString,10000,'\n')) {
+ // testing getline in the loop condition avoids processing a stale buffer at EOF
+ if(++pt_lines%10000==0) {
+ std::cerr << ".";
+ if(pt_lines%500000==0) std::cerr << "[n:"<<pt_lines<<"]\n";
+ }
+
+ if(strlen(tmpString)>0) {
+ PTEntry* pp = new PTEntry(tmpString, pfe_index);
+ if (prev != pp->f_phrase) {
+ prev = pp->f_phrase;
+
+ if (!options.empty()) { // always true after first line
+ compute_cooc_stats_and_filter(options);
+ }
+ for (std::vector<PTEntry*>::iterator i=options.begin(); i != options.end(); ++i) {
+ std::cout << **i << std::endl;
+ delete *i;
+ }
+ options.clear();
+ options.push_back(pp);
+
+ } else {
+ options.push_back(pp);
+ }
+ }
+ }
+ compute_cooc_stats_and_filter(options);
+ for (std::vector<PTEntry*>::iterator i=options.begin(); i != options.end(); ++i) {
+ std::cout << **i << std::endl;
+ delete *i;
+ }
+ float pfefper = (100.0*(float)nremoved_pfefilter)/(float)pt_lines;
+ float sigfper = (100.0*(float)nremoved_sigfilter)/(float)pt_lines;
+ std::cerr << "\n\n------------------------------------------------------\n"
+ << " unfiltered phrase pairs: " << pt_lines << "\n"
+ << "\n"
+ << " P(f|e) filter [first]: " << nremoved_pfefilter << " (" << pfefper << "%)\n"
+ << " significance filter: " << nremoved_sigfilter << " (" << sigfper << "%)\n"
+ << " TOTAL FILTERED: " << (nremoved_pfefilter + nremoved_sigfilter) << " (" << (sigfper + pfefper) << "%)\n"
+ << "\n"
+ << " REMAINING phrase pairs: " << (pt_lines - nremoved_pfefilter - nremoved_sigfilter) << " (" << (100.0-sigfper - pfefper) << "%)\n"
+ << "------------------------------------------------------\n";
+
+ return 0;
+}
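The 2x2 one-sided Fisher's exact test used by `filter-pt.cpp` above can be exercised in isolation. Below is a minimal sketch of the same log-gamma recurrence; the name `fisher_exact_2x2` and the direct (a, b, c, d) interface are ours, whereas the original derives the table cells from the co-occurrence counts and the corpus size.

```cpp
#include <cmath>

// One-sided Fisher's exact test on a 2x2 table (a, b / c, d): sum the
// hypergeometric probability of the observed table and of every table that
// is more extreme in the a-cell, using the same term-ratio recurrence as
// filter-pt.cpp.
inline double fisher_exact_2x2(int a, int b, int c, int d) {
  int n = a + b + c + d;
  // Probability of the observed table, computed in log space via lgamma
  // to avoid factorial overflow.
  double cp = std::exp(std::lgamma(1.0 + a + c) + std::lgamma(1.0 + b + d)
                     + std::lgamma(1.0 + a + b) + std::lgamma(1.0 + c + d)
                     - std::lgamma(1.0 + n) - std::lgamma(1.0 + a)
                     - std::lgamma(1.0 + b) - std::lgamma(1.0 + c)
                     - std::lgamma(1.0 + d));
  double total = 0.0;
  int steps = (b < c ? b : c); // how far a can grow with fixed margins
  for (int i = 0; i <= steps; ++i) {
    total += cp;
    // Ratio of consecutive hypergeometric terms when one count moves from
    // the off-diagonal into a and d: b*c / ((a+1)*(d+1)).
    cp *= double(b) * double(c) / (double(a + 1) * double(d + 1));
    ++a; --b; --c; ++d;
  }
  return total;
}
```

In the tool above, the p-value of the (1, 1, 1) table over the corpus supplies the alpha threshold that the `-l a+e` / `-l a-e` options perturb by epsilon.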
diff --git a/contrib/relent-filter/sigtest-filter/sigtest-filter.sln b/contrib/relent-filter/sigtest-filter/sigtest-filter.sln
new file mode 100755
index 000000000..517b06238
--- /dev/null
+++ b/contrib/relent-filter/sigtest-filter/sigtest-filter.sln
@@ -0,0 +1,20 @@
+
+Microsoft Visual Studio Solution File, Format Version 9.00
+# Visual Studio 2005
+Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "sigtest-filter", "sigtest-filter.vcproj", "{FA2910DF-FD9D-4E6D-A393-9F9F9E309E78}"
+EndProject
+Global
+ GlobalSection(SolutionConfigurationPlatforms) = preSolution
+ Debug|Win32 = Debug|Win32
+ Release|Win32 = Release|Win32
+ EndGlobalSection
+ GlobalSection(ProjectConfigurationPlatforms) = postSolution
+ {FA2910DF-FD9D-4E6D-A393-9F9F9E309E78}.Debug|Win32.ActiveCfg = Debug|Win32
+ {FA2910DF-FD9D-4E6D-A393-9F9F9E309E78}.Debug|Win32.Build.0 = Debug|Win32
+ {FA2910DF-FD9D-4E6D-A393-9F9F9E309E78}.Release|Win32.ActiveCfg = Release|Win32
+ {FA2910DF-FD9D-4E6D-A393-9F9F9E309E78}.Release|Win32.Build.0 = Release|Win32
+ EndGlobalSection
+ GlobalSection(SolutionProperties) = preSolution
+ HideSolutionNode = FALSE
+ EndGlobalSection
+EndGlobal
diff --git a/contrib/relent-filter/src/IOWrapper.cpp b/contrib/relent-filter/src/IOWrapper.cpp
new file mode 100755
index 000000000..053735c96
--- /dev/null
+++ b/contrib/relent-filter/src/IOWrapper.cpp
@@ -0,0 +1,580 @@
+// $Id$
+
+/***********************************************************************
+Moses - factored phrase-based language decoder
+Copyright (c) 2006 University of Edinburgh
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of the University of Edinburgh nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+***********************************************************************/
+
+// example file on how to use moses library
+
+#include <iostream>
+#include <stack>
+#include "TypeDef.h"
+#include "Util.h"
+#include "IOWrapper.h"
+#include "Hypothesis.h"
+#include "WordsRange.h"
+#include "TrellisPathList.h"
+#include "StaticData.h"
+#include "DummyScoreProducers.h"
+#include "InputFileStream.h"
+
+using namespace std;
+using namespace Moses;
+
+namespace MosesCmd
+{
+
+IOWrapper::IOWrapper(
+ const vector<FactorType> &inputFactorOrder
+ , const vector<FactorType> &outputFactorOrder
+ , const FactorMask &inputFactorUsed
+ , size_t nBestSize
+ , const string &nBestFilePath)
+ :m_inputFactorOrder(inputFactorOrder)
+ ,m_outputFactorOrder(outputFactorOrder)
+ ,m_inputFactorUsed(inputFactorUsed)
+ ,m_inputFile(NULL)
+ ,m_inputStream(&std::cin)
+ ,m_nBestStream(NULL)
+ ,m_outputWordGraphStream(NULL)
+ ,m_outputSearchGraphStream(NULL)
+ ,m_detailedTranslationReportingStream(NULL)
+ ,m_alignmentOutputStream(NULL)
+{
+ Initialization(inputFactorOrder, outputFactorOrder
+ , inputFactorUsed
+ , nBestSize, nBestFilePath);
+}
+
+IOWrapper::IOWrapper(const std::vector<FactorType> &inputFactorOrder
+ , const std::vector<FactorType> &outputFactorOrder
+ , const FactorMask &inputFactorUsed
+ , size_t nBestSize
+ , const std::string &nBestFilePath
+ , const std::string &inputFilePath)
+ :m_inputFactorOrder(inputFactorOrder)
+ ,m_outputFactorOrder(outputFactorOrder)
+ ,m_inputFactorUsed(inputFactorUsed)
+ ,m_inputFilePath(inputFilePath)
+ ,m_inputFile(new InputFileStream(inputFilePath))
+ ,m_nBestStream(NULL)
+ ,m_outputWordGraphStream(NULL)
+ ,m_outputSearchGraphStream(NULL)
+ ,m_detailedTranslationReportingStream(NULL)
+ ,m_alignmentOutputStream(NULL)
+{
+ Initialization(inputFactorOrder, outputFactorOrder
+ , inputFactorUsed
+ , nBestSize, nBestFilePath);
+
+ m_inputStream = m_inputFile;
+}
+
+IOWrapper::~IOWrapper()
+{
+ if (m_inputFile != NULL)
+ delete m_inputFile;
+ if (m_nBestStream != NULL && !m_surpressSingleBestOutput) {
+ // outputting n-best to file, rather than stdout. need to close file and delete obj
+ delete m_nBestStream;
+ }
+ if (m_outputWordGraphStream != NULL) {
+ delete m_outputWordGraphStream;
+ }
+ if (m_outputSearchGraphStream != NULL) {
+ delete m_outputSearchGraphStream;
+ }
+ delete m_detailedTranslationReportingStream;
+ delete m_alignmentOutputStream;
+}
+
+void IOWrapper::Initialization(const std::vector<FactorType> &/*inputFactorOrder*/
+ , const std::vector<FactorType> &/*outputFactorOrder*/
+ , const FactorMask &/*inputFactorUsed*/
+ , size_t nBestSize
+ , const std::string &nBestFilePath)
+{
+ const StaticData &staticData = StaticData::Instance();
+
+ // n-best
+ m_surpressSingleBestOutput = false;
+
+ if (nBestSize > 0) {
+ if (nBestFilePath == "-" || nBestFilePath == "/dev/stdout") {
+ m_nBestStream = &std::cout;
+ m_surpressSingleBestOutput = true;
+ } else {
+ std::ofstream *file = new std::ofstream;
+ m_nBestStream = file;
+ file->open(nBestFilePath.c_str());
+ }
+ }
+
+ // wordgraph output
+ if (staticData.GetOutputWordGraph()) {
+ string fileName = staticData.GetParam("output-word-graph")[0];
+ std::ofstream *file = new std::ofstream;
+ m_outputWordGraphStream = file;
+ file->open(fileName.c_str());
+ }
+
+
+ // search graph output
+ if (staticData.GetOutputSearchGraph()) {
+ string fileName;
+ if (staticData.GetOutputSearchGraphExtended())
+ fileName = staticData.GetParam("output-search-graph-extended")[0];
+ else
+ fileName = staticData.GetParam("output-search-graph")[0];
+ std::ofstream *file = new std::ofstream;
+ m_outputSearchGraphStream = file;
+ file->open(fileName.c_str());
+ }
+
+ // detailed translation reporting
+ if (staticData.IsDetailedTranslationReportingEnabled()) {
+ const std::string &path = staticData.GetDetailedTranslationReportingFilePath();
+ m_detailedTranslationReportingStream = new std::ofstream(path.c_str());
+ CHECK(m_detailedTranslationReportingStream->good());
+ }
+
+ // sentence alignment output
+ if (! staticData.GetAlignmentOutputFile().empty()) {
+ m_alignmentOutputStream = new ofstream(staticData.GetAlignmentOutputFile().c_str());
+ CHECK(m_alignmentOutputStream->good());
+ }
+
+}
+
+InputType* IOWrapper::GetInput(InputType* inputType)
+{
+ if(inputType->Read(*m_inputStream, m_inputFactorOrder)) {
+ if (long x = inputType->GetTranslationId()) {
+ if (x>=m_translationId) m_translationId = x+1;
+ } else inputType->SetTranslationId(m_translationId++);
+
+ return inputType;
+ } else {
+ delete inputType;
+ return NULL;
+ }
+}
+
+/***
+ * print surface factor only for the given phrase
+ */
+void OutputSurface(std::ostream &out, const Hypothesis &edge, const std::vector<FactorType> &outputFactorOrder,
+ bool reportSegmentation, bool reportAllFactors)
+{
+ CHECK(outputFactorOrder.size() > 0);
+ const Phrase& phrase = edge.GetCurrTargetPhrase();
+ if (reportAllFactors == true) {
+ out << phrase;
+ } else {
+ size_t size = phrase.GetSize();
+ for (size_t pos = 0 ; pos < size ; pos++) {
+ const Factor *factor = phrase.GetFactor(pos, outputFactorOrder[0]);
+ CHECK(factor);
+ out << *factor;
+
+ for (size_t i = 1 ; i < outputFactorOrder.size() ; i++) {
+ const Factor *factor = phrase.GetFactor(pos, outputFactorOrder[i]);
+ CHECK(factor);
+
+ out << "|" << *factor;
+ }
+ out << " ";
+ }
+ }
+
+ // trace option "-t"
+ if (reportSegmentation == true && phrase.GetSize() > 0) {
+ out << "|" << edge.GetCurrSourceWordsRange().GetStartPos()
+ << "-" << edge.GetCurrSourceWordsRange().GetEndPos() << "| ";
+ }
+}
+
+void OutputBestSurface(std::ostream &out, const Hypothesis *hypo, const std::vector<FactorType> &outputFactorOrder,
+ bool reportSegmentation, bool reportAllFactors)
+{
+ if (hypo != NULL) {
+ // recursively retrace this best path through the lattice, starting from the end of the hypothesis sentence
+ OutputBestSurface(out, hypo->GetPrevHypo(), outputFactorOrder, reportSegmentation, reportAllFactors);
+ OutputSurface(out, *hypo, outputFactorOrder, reportSegmentation, reportAllFactors);
+ }
+}
+
+void OutputAlignment(ostream &out, const AlignmentInfo &ai, size_t sourceOffset, size_t targetOffset)
+{
+ typedef std::vector< const std::pair<size_t,size_t>* > AlignVec;
+ AlignVec alignments = ai.GetSortedAlignments();
+
+ AlignVec::const_iterator it;
+ for (it = alignments.begin(); it != alignments.end(); ++it) {
+ const std::pair<size_t,size_t> &alignment = **it;
+ out << alignment.first + sourceOffset << "-" << alignment.second + targetOffset << " ";
+ }
+
+}
+
+void OutputAlignment(ostream &out, const vector<const Hypothesis *> &edges)
+{
+ size_t targetOffset = 0;
+
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const TargetPhrase &tp = edge.GetCurrTargetPhrase();
+ size_t sourceOffset = edge.GetCurrSourceWordsRange().GetStartPos();
+
+ OutputAlignment(out, tp.GetAlignmentInfo(), sourceOffset, targetOffset);
+
+ targetOffset += tp.GetSize();
+ }
+ out << std::endl;
+}
+
+void OutputAlignment(OutputCollector* collector, size_t lineNo , const vector<const Hypothesis *> &edges)
+{
+ ostringstream out;
+ OutputAlignment(out, edges);
+
+ collector->Write(lineNo,out.str());
+}
+
+void OutputAlignment(OutputCollector* collector, size_t lineNo , const Hypothesis *hypo)
+{
+ if (collector) {
+ std::vector<const Hypothesis *> edges;
+ const Hypothesis *currentHypo = hypo;
+ while (currentHypo) {
+ edges.push_back(currentHypo);
+ currentHypo = currentHypo->GetPrevHypo();
+ }
+
+ OutputAlignment(collector,lineNo, edges);
+ }
+}
+
+void OutputAlignment(OutputCollector* collector, size_t lineNo , const TrellisPath &path)
+{
+ if (collector) {
+ OutputAlignment(collector,lineNo, path.GetEdges());
+ }
+}
+
+void OutputBestHypo(const Moses::TrellisPath &path, long /*translationId*/, bool reportSegmentation, bool reportAllFactors, std::ostream &out)
+{
+ const std::vector<const Hypothesis *> &edges = path.GetEdges();
+
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ OutputSurface(out, edge, StaticData::Instance().GetOutputFactorOrder(), reportSegmentation, reportAllFactors);
+ }
+ out << endl;
+}
+
+void IOWrapper::Backtrack(const Hypothesis *hypo)
+{
+
+ if (hypo->GetPrevHypo() != NULL) {
+ VERBOSE(3,hypo->GetId() << " <= ");
+ Backtrack(hypo->GetPrevHypo());
+ }
+}
+
+void OutputBestHypo(const std::vector<Word>& mbrBestHypo, long /*translationId*/, bool /*reportSegmentation*/, bool /*reportAllFactors*/, ostream& out)
+{
+
+ for (size_t i = 0 ; i < mbrBestHypo.size() ; i++) {
+ const Factor *factor = mbrBestHypo[i].GetFactor(StaticData::Instance().GetOutputFactorOrder()[0]);
+ CHECK(factor);
+ if (i>0) out << " " << *factor;
+ else out << *factor;
+ }
+ out << endl;
+}
+
+
+void OutputInput(std::vector<const Phrase*>& map, const Hypothesis* hypo)
+{
+ if (hypo->GetPrevHypo()) {
+ OutputInput(map, hypo->GetPrevHypo());
+ map[hypo->GetCurrSourceWordsRange().GetStartPos()] = hypo->GetSourcePhrase();
+ }
+}
+
+void OutputInput(std::ostream& os, const Hypothesis* hypo)
+{
+ size_t len = hypo->GetInput().GetSize();
+ std::vector<const Phrase*> inp_phrases(len, 0);
+ OutputInput(inp_phrases, hypo);
+ for (size_t i=0; i<len; ++i)
+ if (inp_phrases[i]) os << *inp_phrases[i];
+}
+
+void IOWrapper::OutputBestHypo(const Hypothesis *hypo, long /*translationId*/, bool reportSegmentation, bool reportAllFactors)
+{
+ if (hypo != NULL) {
+ VERBOSE(1,"BEST TRANSLATION: " << *hypo << endl);
+ VERBOSE(3,"Best path: ");
+ Backtrack(hypo);
+ VERBOSE(3,"0" << std::endl);
+ if (!m_surpressSingleBestOutput) {
+ if (StaticData::Instance().IsPathRecoveryEnabled()) {
+ OutputInput(cout, hypo);
+ cout << "||| ";
+ }
+ OutputBestSurface(cout, hypo, m_outputFactorOrder, reportSegmentation, reportAllFactors);
+ cout << endl;
+ }
+ } else {
+ VERBOSE(1, "NO BEST TRANSLATION" << endl);
+ if (!m_surpressSingleBestOutput) {
+ cout << endl;
+ }
+ }
+}
+
+void OutputNBest(std::ostream& out, const Moses::TrellisPathList &nBestList, const std::vector<Moses::FactorType>& outputFactorOrder, const TranslationSystem* system, long translationId, bool reportSegmentation)
+{
+ const StaticData &staticData = StaticData::Instance();
+ bool labeledOutput = staticData.IsLabeledNBestList();
+ bool reportAllFactors = staticData.GetReportAllFactorsNBest();
+ bool includeAlignment = staticData.NBestIncludesAlignment();
+ bool includeWordAlignment = staticData.PrintAlignmentInfoInNbest();
+
+ TrellisPathList::const_iterator iter;
+ for (iter = nBestList.begin() ; iter != nBestList.end() ; ++iter) {
+ const TrellisPath &path = **iter;
+ const std::vector<const Hypothesis *> &edges = path.GetEdges();
+
+ // print the surface factor of the translation
+ out << translationId << " ||| ";
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ OutputSurface(out, edge, outputFactorOrder, reportSegmentation, reportAllFactors);
+ }
+ out << " |||";
+
+ std::string lastName = "";
+ const vector<const StatefulFeatureFunction*>& sff = system->GetStatefulFeatureFunctions();
+ for( size_t i=0; i<sff.size(); i++ ) {
+ if( labeledOutput && lastName != sff[i]->GetScoreProducerWeightShortName() ) {
+ lastName = sff[i]->GetScoreProducerWeightShortName();
+ out << " " << lastName << ":";
+ }
+ vector<float> scores = path.GetScoreBreakdown().GetScoresForProducer( sff[i] );
+ for (size_t j = 0; j<scores.size(); ++j) {
+ out << " " << scores[j];
+ }
+ }
+
+ const vector<const StatelessFeatureFunction*>& slf = system->GetStatelessFeatureFunctions();
+ for( size_t i=0; i<slf.size(); i++ ) {
+ if( labeledOutput && lastName != slf[i]->GetScoreProducerWeightShortName() ) {
+ lastName = slf[i]->GetScoreProducerWeightShortName();
+ out << " " << lastName << ":";
+ }
+ vector<float> scores = path.GetScoreBreakdown().GetScoresForProducer( slf[i] );
+ for (size_t j = 0; j<scores.size(); ++j) {
+ out << " " << scores[j];
+ }
+ }
+
+ // translation components
+ const vector<PhraseDictionaryFeature*>& pds = system->GetPhraseDictionaries();
+ if (pds.size() > 0) {
+
+ for( size_t i=0; i<pds.size(); i++ ) {
+ size_t pd_numinputscore = pds[i]->GetNumInputScores();
+ vector<float> scores = path.GetScoreBreakdown().GetScoresForProducer( pds[i] );
+ for (size_t j = 0; j<scores.size(); ++j){
+
+ if (labeledOutput && (i == 0) ){
+ if ((j == 0) || (j == pd_numinputscore)){
+ lastName = pds[i]->GetScoreProducerWeightShortName(j);
+ out << " " << lastName << ":";
+ }
+ }
+ out << " " << scores[j];
+ }
+ }
+ }
+
+ // generation
+ const vector<GenerationDictionary*>& gds = system->GetGenerationDictionaries();
+ if (gds.size() > 0) {
+
+ for( size_t i=0; i<gds.size(); i++ ) {
+ size_t pd_numinputscore = gds[i]->GetNumInputScores();
+ vector<float> scores = path.GetScoreBreakdown().GetScoresForProducer( gds[i] );
+ for (size_t j = 0; j<scores.size(); ++j){
+
+ if (labeledOutput && (i == 0) ){
+ if ((j == 0) || (j == pd_numinputscore)){
+ lastName = gds[i]->GetScoreProducerWeightShortName(j);
+ out << " " << lastName << ":";
+ }
+ }
+ out << " " << scores[j];
+ }
+ }
+ }
+
+ // total
+ out << " ||| " << path.GetTotalScore();
+
+ //phrase-to-phrase alignment
+ if (includeAlignment) {
+ out << " |||";
+ for (int currEdge = (int)edges.size() - 2 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const WordsRange &sourceRange = edge.GetCurrSourceWordsRange();
+ WordsRange targetRange = path.GetTargetWordsRange(edge);
+ out << " " << sourceRange.GetStartPos();
+ if (sourceRange.GetStartPos() < sourceRange.GetEndPos()) {
+ out << "-" << sourceRange.GetEndPos();
+ }
+ out<< "=" << targetRange.GetStartPos();
+ if (targetRange.GetStartPos() < targetRange.GetEndPos()) {
+ out<< "-" << targetRange.GetEndPos();
+ }
+ }
+ }
+
+ if (includeWordAlignment) {
+ out << " ||| ";
+ for (int currEdge = (int)edges.size() - 2 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const WordsRange &sourceRange = edge.GetCurrSourceWordsRange();
+ WordsRange targetRange = path.GetTargetWordsRange(edge);
+ const int sourceOffset = sourceRange.GetStartPos();
+ const int targetOffset = targetRange.GetStartPos();
+ const AlignmentInfo &ai = edge.GetCurrTargetPhrase().GetAlignmentInfo();
+
+ OutputAlignment(out, ai, sourceOffset, targetOffset);
+
+ }
+ }
+
+ if (StaticData::Instance().IsPathRecoveryEnabled()) {
+ out << "|||";
+ OutputInput(out, edges[0]);
+ }
+
+ out << endl;
+ }
+
+
+ out <<std::flush;
+}
+
+void OutputLatticeMBRNBest(std::ostream& out, const vector<LatticeMBRSolution>& solutions,long translationId)
+{
+ for (vector<LatticeMBRSolution>::const_iterator si = solutions.begin(); si != solutions.end(); ++si) {
+ out << translationId;
+ out << " |||";
+ const vector<Word> mbrHypo = si->GetWords();
+ for (size_t i = 0 ; i < mbrHypo.size() ; i++) {
+ const Factor *factor = mbrHypo[i].GetFactor(StaticData::Instance().GetOutputFactorOrder()[0]);
+ if (i>0) out << " " << *factor;
+ else out << *factor;
+ }
+ out << " |||";
+ out << " map: " << si->GetMapScore();
+ out << " w: " << mbrHypo.size();
+ const vector<float>& ngramScores = si->GetNgramScores();
+ for (size_t i = 0; i < ngramScores.size(); ++i) {
+ out << " " << ngramScores[i];
+ }
+ out << " ||| " << si->GetScore();
+
+ out << endl;
+ }
+}
+
+
+void IOWrapper::OutputLatticeMBRNBestList(const vector<LatticeMBRSolution>& solutions,long translationId)
+{
+ OutputLatticeMBRNBest(*m_nBestStream, solutions,translationId);
+}
+
+bool ReadInput(IOWrapper &ioWrapper, InputTypeEnum inputType, InputType*& source)
+{
+ delete source;
+ switch(inputType) {
+ case SentenceInput:
+ source = ioWrapper.GetInput(new Sentence);
+ break;
+ case ConfusionNetworkInput:
+ source = ioWrapper.GetInput(new ConfusionNet);
+ break;
+ case WordLatticeInput:
+ source = ioWrapper.GetInput(new WordLattice);
+ break;
+ default:
+ TRACE_ERR("Unknown input type: " << inputType << "\n");
+ }
+ return (source ? true : false);
+}
+
+
+
+IOWrapper *GetIOWrapper(const StaticData &staticData)
+{
+ IOWrapper *ioWrapper;
+ const std::vector<FactorType> &inputFactorOrder = staticData.GetInputFactorOrder()
+ ,&outputFactorOrder = staticData.GetOutputFactorOrder();
+ FactorMask inputFactorUsed(inputFactorOrder);
+
+ // io
+ if (staticData.GetParam("input-file").size() == 1) {
+ VERBOSE(2,"IO from File" << endl);
+ string filePath = staticData.GetParam("input-file")[0];
+
+ ioWrapper = new IOWrapper(inputFactorOrder, outputFactorOrder, inputFactorUsed
+ , staticData.GetNBestSize()
+ , staticData.GetNBestFilePath()
+ , filePath);
+ } else {
+ VERBOSE(1,"IO from STDOUT/STDIN" << endl);
+ ioWrapper = new IOWrapper(inputFactorOrder, outputFactorOrder, inputFactorUsed
+ , staticData.GetNBestSize()
+ , staticData.GetNBestFilePath());
+ }
+ ioWrapper->ResetTranslationId();
+
+ IFVERBOSE(1)
+ PrintUserTime("Created input-output object");
+
+ return ioWrapper;
+}
+
+}
+
diff --git a/contrib/relent-filter/src/IOWrapper.h b/contrib/relent-filter/src/IOWrapper.h
new file mode 100755
index 000000000..e44208002
--- /dev/null
+++ b/contrib/relent-filter/src/IOWrapper.h
@@ -0,0 +1,142 @@
+// $Id$
+
+/***********************************************************************
+Moses - factored phrase-based language decoder
+Copyright (c) 2006 University of Edinburgh
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of the University of Edinburgh nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+***********************************************************************/
+
+// example file on how to use moses library
+
+#ifndef moses_cmd_IOWrapper_h
+#define moses_cmd_IOWrapper_h
+
+#include <cassert>
+#include <fstream>
+#include <ostream>
+#include <vector>
+#include "util/check.hh"
+
+#include "TypeDef.h"
+#include "Sentence.h"
+#include "FactorTypeSet.h"
+#include "FactorCollection.h"
+#include "Hypothesis.h"
+#include "OutputCollector.h"
+#include "TrellisPathList.h"
+#include "InputFileStream.h"
+#include "InputType.h"
+#include "WordLattice.h"
+#include "LatticeMBR.h"
+
+namespace MosesCmd
+{
+
+/** Helper class that manages the input stream and the various output streams used to read decoder input and write results.
+ */
+class IOWrapper
+{
+protected:
+ long m_translationId;
+
+ const std::vector<Moses::FactorType> &m_inputFactorOrder;
+ const std::vector<Moses::FactorType> &m_outputFactorOrder;
+ const Moses::FactorMask &m_inputFactorUsed;
+ std::string m_inputFilePath;
+ Moses::InputFileStream *m_inputFile;
+ std::istream *m_inputStream;
+ std::ostream *m_nBestStream
+ ,*m_outputWordGraphStream,*m_outputSearchGraphStream;
+ std::ostream *m_detailedTranslationReportingStream;
+ std::ofstream *m_alignmentOutputStream;
+ bool m_surpressSingleBestOutput;
+
+ void Initialization(const std::vector<Moses::FactorType> &inputFactorOrder
+ , const std::vector<Moses::FactorType> &outputFactorOrder
+ , const Moses::FactorMask &inputFactorUsed
+ , size_t nBestSize
+ , const std::string &nBestFilePath);
+
+public:
+ IOWrapper(const std::vector<Moses::FactorType> &inputFactorOrder
+ , const std::vector<Moses::FactorType> &outputFactorOrder
+ , const Moses::FactorMask &inputFactorUsed
+ , size_t nBestSize
+ , const std::string &nBestFilePath);
+
+ IOWrapper(const std::vector<Moses::FactorType> &inputFactorOrder
+ , const std::vector<Moses::FactorType> &outputFactorOrder
+ , const Moses::FactorMask &inputFactorUsed
+ , size_t nBestSize
+ , const std::string &nBestFilePath
+ , const std::string &infilePath);
+ ~IOWrapper();
+
+ Moses::InputType* GetInput(Moses::InputType *inputType);
+
+ void OutputBestHypo(const Moses::Hypothesis *hypo, long translationId, bool reportSegmentation, bool reportAllFactors);
+ void OutputLatticeMBRNBestList(const std::vector<LatticeMBRSolution>& solutions,long translationId);
+ void Backtrack(const Moses::Hypothesis *hypo);
+
+ void ResetTranslationId() {
+ m_translationId = 0;
+ }
+
+ std::ofstream *GetAlignmentOutputStream() {
+ return m_alignmentOutputStream;
+ }
+
+ std::ostream &GetOutputWordGraphStream() {
+ return *m_outputWordGraphStream;
+ }
+ std::ostream &GetOutputSearchGraphStream() {
+ return *m_outputSearchGraphStream;
+ }
+
+ std::ostream &GetDetailedTranslationReportingStream() {
+ assert (m_detailedTranslationReportingStream);
+ return *m_detailedTranslationReportingStream;
+ }
+};
+
+IOWrapper *GetIOWrapper(const Moses::StaticData &staticData);
+bool ReadInput(IOWrapper &ioWrapper, Moses::InputTypeEnum inputType, Moses::InputType*& source);
+void OutputBestSurface(std::ostream &out, const Moses::Hypothesis *hypo, const std::vector<Moses::FactorType> &outputFactorOrder, bool reportSegmentation, bool reportAllFactors);
+void OutputNBest(std::ostream& out, const Moses::TrellisPathList &nBestList, const std::vector<Moses::FactorType>&,
+ const Moses::TranslationSystem* system, long translationId, bool reportSegmentation);
+void OutputLatticeMBRNBest(std::ostream& out, const std::vector<LatticeMBRSolution>& solutions,long translationId);
+void OutputBestHypo(const std::vector<Moses::Word>& mbrBestHypo, long /*translationId*/,
+ bool reportSegmentation, bool reportAllFactors, std::ostream& out);
+void OutputBestHypo(const Moses::TrellisPath &path, long /*translationId*/,bool reportSegmentation, bool reportAllFactors, std::ostream &out);
+void OutputInput(std::ostream& os, const Moses::Hypothesis* hypo);
+void OutputAlignment(Moses::OutputCollector* collector, size_t lineNo, const Moses::Hypothesis *hypo);
+void OutputAlignment(Moses::OutputCollector* collector, size_t lineNo, const Moses::TrellisPath &path);
+
+
+}
+
+#endif
diff --git a/contrib/relent-filter/src/Jamfile b/contrib/relent-filter/src/Jamfile
new file mode 100755
index 000000000..c0aa6160d
--- /dev/null
+++ b/contrib/relent-filter/src/Jamfile
@@ -0,0 +1,6 @@
+alias deps : ../../../moses/src//moses ;
+
+exe calcDivergence : Main.cpp mbr.cpp IOWrapper.cpp TranslationAnalysis.cpp LatticeMBR.cpp RelativeEntropyCalc.cpp deps ;
+
+alias programs : calcDivergence ;
+
diff --git a/contrib/relent-filter/src/LatticeMBR.cpp b/contrib/relent-filter/src/LatticeMBR.cpp
new file mode 100755
index 000000000..2bd62747e
--- /dev/null
+++ b/contrib/relent-filter/src/LatticeMBR.cpp
@@ -0,0 +1,669 @@
+/*
+ * LatticeMBR.cpp
+ * moses-cmd
+ *
+ * Created by Abhishek Arun on 26/01/2010.
+ * Copyright 2010 __MyCompanyName__. All rights reserved.
+ *
+ */
+
+#include "LatticeMBR.h"
+#include "StaticData.h"
+#include <algorithm>
+#include <set>
+
+using namespace std;
+using namespace Moses;
+
+namespace MosesCmd
+{
+
+size_t bleu_order = 4;
+float UNKNGRAMLOGPROB = -20;
+void GetOutputWords(const TrellisPath &path, vector <Word> &translation)
+{
+ const std::vector<const Hypothesis *> &edges = path.GetEdges();
+
+  // collect the surface words of the translation (edges are stored in reverse order)
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const Phrase &phrase = edge.GetCurrTargetPhrase();
+ size_t size = phrase.GetSize();
+ for (size_t pos = 0 ; pos < size ; pos++) {
+ translation.push_back(phrase.GetWord(pos));
+ }
+ }
+}
+
+
+void extract_ngrams(const vector<Word >& sentence, map < Phrase, int > & allngrams)
+{
+ for (int k = 0; k < (int)bleu_order; k++) {
+ for(int i =0; i < max((int)sentence.size()-k,0); i++) {
+ Phrase ngram( k+1);
+ for ( int j = i; j<= i+k; j++) {
+ ngram.AddWord(sentence[j]);
+ }
+ ++allngrams[ngram];
+ }
+ }
+}
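`extract_ngrams` above enumerates every n-gram of length 1 to `bleu_order` and counts its occurrences. The same counting logic, shown over plain strings as an illustrative standalone sketch (the function name is mine, not Moses's):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Count all n-grams of length 1..order in a token sequence,
// mirroring extract_ngrams above but over plain strings.
std::map<std::vector<std::string>, int>
countNgrams(const std::vector<std::string>& sent, int order = 4) {
  std::map<std::vector<std::string>, int> counts;
  for (int k = 0; k < order; ++k)
    for (int i = 0; i + k < (int)sent.size(); ++i)
      ++counts[std::vector<std::string>(sent.begin() + i,
                                        sent.begin() + i + k + 1)];
  return counts;
}
```

For the sentence "a b a" this yields five distinct n-grams (a, b, ab, ba, aba), with the unigram "a" counted twice.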
+
+
+
+void NgramScores::addScore(const Hypothesis* node, const Phrase& ngram, float score)
+{
+ set<Phrase>::const_iterator ngramIter = m_ngrams.find(ngram);
+ if (ngramIter == m_ngrams.end()) {
+ ngramIter = m_ngrams.insert(ngram).first;
+ }
+ map<const Phrase*,float>& ngramScores = m_scores[node];
+ map<const Phrase*,float>::iterator scoreIter = ngramScores.find(&(*ngramIter));
+ if (scoreIter == ngramScores.end()) {
+ ngramScores[&(*ngramIter)] = score;
+ } else {
+ ngramScores[&(*ngramIter)] = log_sum(score,scoreIter->second);
+ }
+}
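`addScore` folds a new score into any existing one with `log_sum`, Moses's log-space addition. A minimal standalone sketch of that operation (the helper name here is illustrative, not the Moses implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Log-space addition: log(exp(a) + exp(b)), computed stably by
// factoring out the larger exponent so neither exp() overflows.
float log_sum_sketch(float a, float b) {
  const float hi = std::max(a, b);
  const float lo = std::min(a, b);
  return hi + std::log(1.0f + std::exp(lo - hi));
}
```

Because only `exp(lo - hi) <= 1` is ever evaluated, the result stays finite even when both inputs are very negative log-probabilities.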
+
+NgramScores::NodeScoreIterator NgramScores::nodeBegin(const Hypothesis* node)
+{
+ return m_scores[node].begin();
+}
+
+
+NgramScores::NodeScoreIterator NgramScores::nodeEnd(const Hypothesis* node)
+{
+ return m_scores[node].end();
+}
+
+LatticeMBRSolution::LatticeMBRSolution(const TrellisPath& path, bool isMap) :
+ m_score(0.0f)
+{
+ const std::vector<const Hypothesis *> &edges = path.GetEdges();
+
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const Phrase &phrase = edge.GetCurrTargetPhrase();
+ size_t size = phrase.GetSize();
+ for (size_t pos = 0 ; pos < size ; pos++) {
+ m_words.push_back(phrase.GetWord(pos));
+ }
+ }
+ if (isMap) {
+ m_mapScore = path.GetTotalScore();
+ } else {
+ m_mapScore = 0;
+ }
+}
+
+
+void LatticeMBRSolution::CalcScore(map<Phrase, float>& finalNgramScores, const vector<float>& thetas, float mapWeight)
+{
+ m_ngramScores.assign(thetas.size()-1, -10000);
+
+ map < Phrase, int > counts;
+ extract_ngrams(m_words,counts);
+
+ //Now score this translation
+ m_score = thetas[0] * m_words.size();
+
+ //Calculate the ngramScores, working in log space at first
+ for (map < Phrase, int >::iterator ngrams = counts.begin(); ngrams != counts.end(); ++ngrams) {
+ float ngramPosterior = UNKNGRAMLOGPROB;
+ map<Phrase,float>::const_iterator ngramPosteriorIt = finalNgramScores.find(ngrams->first);
+ if (ngramPosteriorIt != finalNgramScores.end()) {
+ ngramPosterior = ngramPosteriorIt->second;
+ }
+ size_t ngramSize = ngrams->first.GetSize();
+ m_ngramScores[ngramSize-1] = log_sum(log((float)ngrams->second) + ngramPosterior,m_ngramScores[ngramSize-1]);
+ }
+
+ //convert from log to probability and create weighted sum
+ for (size_t i = 0; i < m_ngramScores.size(); ++i) {
+ m_ngramScores[i] = exp(m_ngramScores[i]);
+ m_score += thetas[i+1] * m_ngramScores[i];
+ }
+
+
+ //The map score
+ m_score += m_mapScore*mapWeight;
+}
+
+
+void pruneLatticeFB(Lattice & connectedHyp, map < const Hypothesis*, set <const Hypothesis* > > & outgoingHyps, map<const Hypothesis*, vector<Edge> >& incomingEdges,
+ const vector< float> & estimatedScores, const Hypothesis* bestHypo, size_t edgeDensity, float scale)
+{
+
+ //Need hyp 0 in connectedHyp - Find empty hypothesis
+ VERBOSE(2,"Pruning lattice to edge density " << edgeDensity << endl);
+ const Hypothesis* emptyHyp = connectedHyp.at(0);
+ while (emptyHyp->GetId() != 0) {
+ emptyHyp = emptyHyp->GetPrevHypo();
+ }
+ connectedHyp.push_back(emptyHyp); //Add it to list of hyps
+
+ //Need hyp 0's outgoing Hyps
+ for (size_t i = 0; i < connectedHyp.size(); ++i) {
+ if (connectedHyp[i]->GetId() > 0 && connectedHyp[i]->GetPrevHypo()->GetId() == 0)
+ outgoingHyps[emptyHyp].insert(connectedHyp[i]);
+ }
+
+ //sort hyps based on estimated scores - do so by copying to multimap
+ multimap<float, const Hypothesis*> sortHypsByVal;
+ for (size_t i =0; i < estimatedScores.size(); ++i) {
+ sortHypsByVal.insert(make_pair(estimatedScores[i], connectedHyp[i]));
+ }
+
+ multimap<float, const Hypothesis*>::const_iterator it = --sortHypsByVal.end();
+ float bestScore = it->first;
+ //store best score as score of hyp 0
+ sortHypsByVal.insert(make_pair(bestScore, emptyHyp));
+
+
+ IFVERBOSE(3) {
+    for (multimap<float, const Hypothesis*>::const_reverse_iterator it = sortHypsByVal.rbegin(); it != sortHypsByVal.rend(); ++it) {
+ const Hypothesis* currHyp = it->second;
+ cerr << "Hyp " << currHyp->GetId() << ", estimated score: " << it->first << endl;
+ }
+ }
+
+
+ set <const Hypothesis*> survivingHyps; //store hyps that make the cut in this
+
+ VERBOSE(2, "BEST HYPO TARGET LENGTH : " << bestHypo->GetSize() << endl)
+ size_t numEdgesTotal = edgeDensity * bestHypo->GetSize(); //as per Shankar, aim for (density * target length of MAP solution) arcs
+ size_t numEdgesCreated = 0;
+ VERBOSE(2, "Target edge count: " << numEdgesTotal << endl);
+
+ float prevScore = -999999;
+
+ //now iterate over multimap
+  for (multimap<float, const Hypothesis*>::const_reverse_iterator it = sortHypsByVal.rbegin(); it != sortHypsByVal.rend(); ++it) {
+ float currEstimatedScore = it->first;
+ const Hypothesis* currHyp = it->second;
+
+ if (numEdgesCreated >= numEdgesTotal && prevScore > currEstimatedScore) //if this hyp has equal estimated score to previous, include its edges too
+ break;
+
+ prevScore = currEstimatedScore;
+ VERBOSE(3, "Num edges created : "<< numEdgesCreated << ", numEdges wanted " << numEdgesTotal << endl)
+ VERBOSE(3, "Considering hyp " << currHyp->GetId() << ", estimated score: " << it->first << endl)
+
+ survivingHyps.insert(currHyp); //CurrHyp made the cut
+
+ // is its best predecessor already included ?
+ if (survivingHyps.find(currHyp->GetPrevHypo()) != survivingHyps.end()) { //yes, then add an edge
+ vector <Edge>& edges = incomingEdges[currHyp];
+ Edge winningEdge(currHyp->GetPrevHypo(),currHyp,scale*(currHyp->GetScore() - currHyp->GetPrevHypo()->GetScore()),currHyp->GetCurrTargetPhrase());
+ edges.push_back(winningEdge);
+ ++numEdgesCreated;
+ }
+
+ //let's try the arcs too
+ const ArcList *arcList = currHyp->GetArcList();
+ if (arcList != NULL) {
+ ArcList::const_iterator iterArcList;
+ for (iterArcList = arcList->begin() ; iterArcList != arcList->end() ; ++iterArcList) {
+ const Hypothesis *loserHypo = *iterArcList;
+ const Hypothesis* loserPrevHypo = loserHypo->GetPrevHypo();
+ if (survivingHyps.find(loserPrevHypo) != survivingHyps.end()) { //found it, add edge
+ double arcScore = loserHypo->GetScore() - loserPrevHypo->GetScore();
+ Edge losingEdge(loserPrevHypo, currHyp, arcScore*scale, loserHypo->GetCurrTargetPhrase());
+ vector <Edge>& edges = incomingEdges[currHyp];
+ edges.push_back(losingEdge);
+ ++numEdgesCreated;
+ }
+ }
+ }
+
+ //Now if a successor node has already been visited, add an edge connecting the two
+ map < const Hypothesis*, set < const Hypothesis* > >::const_iterator outgoingIt = outgoingHyps.find(currHyp);
+
+ if (outgoingIt != outgoingHyps.end()) {//currHyp does have successors
+ const set<const Hypothesis*> & outHyps = outgoingIt->second; //the successors
+ for (set<const Hypothesis*>::const_iterator outHypIts = outHyps.begin(); outHypIts != outHyps.end(); ++outHypIts) {
+ const Hypothesis* succHyp = *outHypIts;
+
+ if (survivingHyps.find(succHyp) == survivingHyps.end()) //Have we encountered the successor yet?
+ continue; //No, move on to next
+
+ //Curr Hyp can be : a) the best predecessor of succ b) or an arc attached to succ
+ if (succHyp->GetPrevHypo() == currHyp) { //best predecessor
+ vector <Edge>& succEdges = incomingEdges[succHyp];
+ Edge succWinningEdge(currHyp, succHyp, scale*(succHyp->GetScore() - currHyp->GetScore()), succHyp->GetCurrTargetPhrase());
+ succEdges.push_back(succWinningEdge);
+ survivingHyps.insert(succHyp);
+ ++numEdgesCreated;
+ }
+
+ //now, let's find an arc
+ const ArcList *arcList = succHyp->GetArcList();
+ if (arcList != NULL) {
+ ArcList::const_iterator iterArcList;
+ //QUESTION: What happens if there's more than one loserPrevHypo?
+ for (iterArcList = arcList->begin() ; iterArcList != arcList->end() ; ++iterArcList) {
+ const Hypothesis *loserHypo = *iterArcList;
+ const Hypothesis* loserPrevHypo = loserHypo->GetPrevHypo();
+ if (loserPrevHypo == currHyp) { //found it
+ vector <Edge>& succEdges = incomingEdges[succHyp];
+ double arcScore = loserHypo->GetScore() - currHyp->GetScore();
+ Edge losingEdge(currHyp, succHyp,scale* arcScore, loserHypo->GetCurrTargetPhrase());
+ succEdges.push_back(losingEdge);
+ ++numEdgesCreated;
+ }
+ }
+ }
+ }
+ }
+ }
+
+ connectedHyp.clear();
+ for (set <const Hypothesis*>::iterator it = survivingHyps.begin(); it != survivingHyps.end(); ++it) {
+ connectedHyp.push_back(*it);
+ }
+
+ VERBOSE(2, "Done! Num edges created : "<< numEdgesCreated << ", numEdges wanted " << numEdgesTotal << endl)
+
+ IFVERBOSE(3) {
+ cerr << "Surviving hyps: " ;
+ for (set <const Hypothesis*>::iterator it = survivingHyps.begin(); it != survivingHyps.end(); ++it) {
+ cerr << (*it)->GetId() << " ";
+ }
+ cerr << endl;
+ }
+
+
+}
+
+void calcNgramExpectations(Lattice & connectedHyp, map<const Hypothesis*, vector<Edge> >& incomingEdges,
+ map<Phrase, float>& finalNgramScores, bool posteriors)
+{
+
+ sort(connectedHyp.begin(),connectedHyp.end(),ascendingCoverageCmp); //sort by increasing source word cov
+
+ /*cerr << "Lattice:" << endl;
+ for (Lattice::const_iterator i = connectedHyp.begin(); i != connectedHyp.end(); ++i) {
+ const Hypothesis* h = *i;
+ cerr << *h << endl;
+ const vector<Edge>& edges = incomingEdges[h];
+ for (size_t e = 0; e < edges.size(); ++e) {
+ cerr << edges[e];
+ }
+ }*/
+
+ map<const Hypothesis*, float> forwardScore;
+ forwardScore[connectedHyp[0]] = 0.0f; //forward score of hyp 0 is 1 (or 0 in logprob space)
+ set< const Hypothesis *> finalHyps; //store completed hyps
+
+ NgramScores ngramScores;//ngram scores for each hyp
+
+ for (size_t i = 1; i < connectedHyp.size(); ++i) {
+ const Hypothesis* currHyp = connectedHyp[i];
+ if (currHyp->GetWordsBitmap().IsComplete()) {
+ finalHyps.insert(currHyp);
+ }
+
+ VERBOSE(3, "Processing hyp: " << currHyp->GetId() << ", num words cov= " << currHyp->GetWordsBitmap().GetNumWordsCovered() << endl)
+
+ vector <Edge> & edges = incomingEdges[currHyp];
+ for (size_t e = 0; e < edges.size(); ++e) {
+ const Edge& edge = edges[e];
+ if (forwardScore.find(currHyp) == forwardScore.end()) {
+ forwardScore[currHyp] = forwardScore[edge.GetTailNode()] + edge.GetScore();
+ VERBOSE(3, "Fwd score["<<currHyp->GetId()<<"] = fwdScore["<<edge.GetTailNode()->GetId() << "] + edge Score: " << edge.GetScore() << endl)
+ } else {
+ forwardScore[currHyp] = log_sum(forwardScore[currHyp], forwardScore[edge.GetTailNode()] + edge.GetScore());
+ VERBOSE(3, "Fwd score["<<currHyp->GetId()<<"] += fwdScore["<<edge.GetTailNode()->GetId() << "] + edge Score: " << edge.GetScore() << endl)
+ }
+ }
+
+ //Process ngrams now
+ for (size_t j =0 ; j < edges.size(); ++j) {
+ Edge& edge = edges[j];
+ const NgramHistory & incomingPhrases = edge.GetNgrams(incomingEdges);
+
+ //let's first score ngrams introduced by this edge
+ for (NgramHistory::const_iterator it = incomingPhrases.begin(); it != incomingPhrases.end(); ++it) {
+ const Phrase& ngram = it->first;
+ const PathCounts& pathCounts = it->second;
+ VERBOSE(4, "Calculating score for: " << it->first << endl)
+
+ for (PathCounts::const_iterator pathCountIt = pathCounts.begin(); pathCountIt != pathCounts.end(); ++pathCountIt) {
+ //Score of an n-gram is forward score of head node of leftmost edge + all edge scores
+ const Path& path = pathCountIt->first;
+ //cerr << "path count for " << ngram << " is " << pathCountIt->second << endl;
+ float score = forwardScore[path[0]->GetTailNode()];
+ for (size_t i = 0; i < path.size(); ++i) {
+ score += path[i]->GetScore();
+ }
+ //if we're doing expectations, then the number of times the ngram
+ //appears on the path is relevant.
+ size_t count = posteriors ? 1 : pathCountIt->second;
+ for (size_t k = 0; k < count; ++k) {
+ ngramScores.addScore(currHyp,ngram,score);
+ }
+ }
+ }
+
+ //Now score ngrams that are just being propagated from the history
+ for (NgramScores::NodeScoreIterator it = ngramScores.nodeBegin(edge.GetTailNode());
+ it != ngramScores.nodeEnd(edge.GetTailNode()); ++it) {
+ const Phrase & currNgram = *(it->first);
+ float currNgramScore = it->second;
+ VERBOSE(4, "Calculating score for: " << currNgram << endl)
+
+ // For posteriors, don't double count ngrams
+ if (!posteriors || incomingPhrases.find(currNgram) == incomingPhrases.end()) {
+ float score = edge.GetScore() + currNgramScore;
+ ngramScores.addScore(currHyp,currNgram,score);
+ }
+ }
+
+ }
+ }
+
+  float Z = 9999999; //the total (log) score of the lattice; 9999999 marks "not yet set"
+
+  //Done - accumulate ngram posteriors over the final hyps
+ for (set< const Hypothesis *>::iterator finalHyp = finalHyps.begin(); finalHyp != finalHyps.end(); ++finalHyp) {
+ const Hypothesis* hyp = *finalHyp;
+
+ for (NgramScores::NodeScoreIterator it = ngramScores.nodeBegin(hyp); it != ngramScores.nodeEnd(hyp); ++it) {
+ const Phrase& ngram = *(it->first);
+ if (finalNgramScores.find(ngram) == finalNgramScores.end()) {
+ finalNgramScores[ngram] = it->second;
+ } else {
+ finalNgramScores[ngram] = log_sum(it->second, finalNgramScores[ngram]);
+ }
+ }
+
+ if (Z == 9999999) {
+ Z = forwardScore[hyp];
+ } else {
+ Z = log_sum(Z, forwardScore[hyp]);
+ }
+ }
+
+ //Z *= scale; //scale the score
+
+ for (map<Phrase, float>::iterator finalScoresIt = finalNgramScores.begin(); finalScoresIt != finalNgramScores.end(); ++finalScoresIt) {
+ finalScoresIt->second = finalScoresIt->second - Z;
+ IFVERBOSE(2) {
+ VERBOSE(2,finalScoresIt->first << " [" << finalScoresIt->second << "]" << endl);
+ }
+ }
+
+}
+
+const NgramHistory& Edge::GetNgrams(map<const Hypothesis*, vector<Edge> > & incomingEdges)
+{
+
+ if (m_ngrams.size() > 0)
+ return m_ngrams;
+
+ const Phrase& currPhrase = GetWords();
+ //Extract the n-grams local to this edge
+ for (size_t start = 0; start < currPhrase.GetSize(); ++start) {
+ for (size_t end = start; end < start + bleu_order; ++end) {
+ if (end < currPhrase.GetSize()) {
+ Phrase edgeNgram(end-start+1);
+ for (size_t index = start; index <= end; ++index) {
+ edgeNgram.AddWord(currPhrase.GetWord(index));
+ }
+ //cout << "Inserting Phrase : " << edgeNgram << endl;
+ vector<const Edge*> edgeHistory;
+ edgeHistory.push_back(this);
+ storeNgramHistory(edgeNgram, edgeHistory);
+ } else {
+ break;
+ }
+ }
+ }
+
+ map<const Hypothesis*, vector<Edge> >::iterator it = incomingEdges.find(m_tailNode);
+ if (it != incomingEdges.end()) { //node has incoming edges
+ vector<Edge> & inEdges = it->second;
+
+ for (vector<Edge>::iterator edge = inEdges.begin(); edge != inEdges.end(); ++edge) {//add the ngrams straddling prev and curr edge
+ const NgramHistory & edgeIncomingNgrams = edge->GetNgrams(incomingEdges);
+ for (NgramHistory::const_iterator edgeInNgramHist = edgeIncomingNgrams.begin(); edgeInNgramHist != edgeIncomingNgrams.end(); ++edgeInNgramHist) {
+ const Phrase& edgeIncomingNgram = edgeInNgramHist->first;
+ const PathCounts & edgeIncomingNgramPaths = edgeInNgramHist->second;
+ size_t back = min(edgeIncomingNgram.GetSize(), edge->GetWordsSize());
+ const Phrase& edgeWords = edge->GetWords();
+ IFVERBOSE(3) {
+ cerr << "Edge: "<< *edge <<endl;
+ cerr << "edgeWords: " << edgeWords << endl;
+ cerr << "edgeInNgram: " << edgeIncomingNgram << endl;
+ }
+
+ Phrase edgeSuffix(ARRAY_SIZE_INCR);
+ Phrase ngramSuffix(ARRAY_SIZE_INCR);
+ GetPhraseSuffix(edgeWords,back,edgeSuffix);
+ GetPhraseSuffix(edgeIncomingNgram,back,ngramSuffix);
+
+ if (ngramSuffix == edgeSuffix) { //we've got the suffix of previous edge
+ size_t edgeInNgramSize = edgeIncomingNgram.GetSize();
+
+ for (size_t i = 0; i < GetWordsSize() && i + edgeInNgramSize < bleu_order ; ++i) {
+ Phrase newNgram(edgeIncomingNgram);
+ for (size_t j = 0; j <= i ; ++j) {
+ newNgram.AddWord(GetWords().GetWord(j));
+ }
+ VERBOSE(3, "Inserting New Phrase : " << newNgram << endl)
+
+ for (PathCounts::const_iterator pathIt = edgeIncomingNgramPaths.begin(); pathIt != edgeIncomingNgramPaths.end(); ++pathIt) {
+ Path newNgramPath = pathIt->first;
+ newNgramPath.push_back(this);
+ storeNgramHistory(newNgram, newNgramPath, pathIt->second);
+ }
+ }
+ }
+ }
+ }
+ }
+ return m_ngrams;
+}
+
+//Add the last lastN words of origPhrase to targetPhrase
+void Edge::GetPhraseSuffix(const Phrase& origPhrase, size_t lastN, Phrase& targetPhrase) const
+{
+ size_t origSize = origPhrase.GetSize();
+ size_t startIndex = origSize - lastN;
+ for (size_t index = startIndex; index < origPhrase.GetSize(); ++index) {
+ targetPhrase.AddWord(origPhrase.GetWord(index));
+ }
+}
+
+bool Edge::operator< (const Edge& compare ) const
+{
+ if (m_headNode->GetId() < compare.m_headNode->GetId())
+ return true;
+ if (compare.m_headNode->GetId() < m_headNode->GetId())
+ return false;
+ if (m_tailNode->GetId() < compare.m_tailNode->GetId())
+ return true;
+ if (compare.m_tailNode->GetId() < m_tailNode->GetId())
+ return false;
+ return GetScore() < compare.GetScore();
+}
+
+ostream& operator<< (ostream& out, const Edge& edge)
+{
+ out << "Head: " << edge.m_headNode->GetId() << ", Tail: " << edge.m_tailNode->GetId() << ", Score: " << edge.m_score << ", Phrase: " << edge.m_targetPhrase << endl;
+ return out;
+}
+
+bool ascendingCoverageCmp(const Hypothesis* a, const Hypothesis* b)
+{
+ return a->GetWordsBitmap().GetNumWordsCovered() < b->GetWordsBitmap().GetNumWordsCovered();
+}
+
+void getLatticeMBRNBest(Manager& manager, TrellisPathList& nBestList,
+ vector<LatticeMBRSolution>& solutions, size_t n)
+{
+ const StaticData& staticData = StaticData::Instance();
+ std::map < int, bool > connected;
+ std::vector< const Hypothesis *> connectedList;
+ map<Phrase, float> ngramPosteriors;
+ std::map < const Hypothesis*, set <const Hypothesis*> > outgoingHyps;
+ map<const Hypothesis*, vector<Edge> > incomingEdges;
+ vector< float> estimatedScores;
+ manager.GetForwardBackwardSearchGraph(&connected, &connectedList, &outgoingHyps, &estimatedScores);
+ pruneLatticeFB(connectedList, outgoingHyps, incomingEdges, estimatedScores, manager.GetBestHypothesis(), staticData.GetLatticeMBRPruningFactor(),staticData.GetMBRScale());
+ calcNgramExpectations(connectedList, incomingEdges, ngramPosteriors,true);
+
+ vector<float> mbrThetas = staticData.GetLatticeMBRThetas();
+ float p = staticData.GetLatticeMBRPrecision();
+ float r = staticData.GetLatticeMBRPRatio();
+ float mapWeight = staticData.GetLatticeMBRMapWeight();
+ if (mbrThetas.size() == 0) { //thetas not specified on the command line, use p and r instead
+ mbrThetas.push_back(-1); //Theta 0
+ mbrThetas.push_back(1/(bleu_order*p));
+ for (size_t i = 2; i <= bleu_order; ++i) {
+ mbrThetas.push_back(mbrThetas[i-1] / r);
+ }
+ }
+ IFVERBOSE(2) {
+ VERBOSE(2,"Thetas: ");
+ for (size_t i = 0; i < mbrThetas.size(); ++i) {
+ VERBOSE(2,mbrThetas[i] << " ");
+ }
+ VERBOSE(2,endl);
+ }
+ TrellisPathList::const_iterator iter;
+ size_t ctr = 0;
+ LatticeMBRSolutionComparator comparator;
+ for (iter = nBestList.begin() ; iter != nBestList.end() ; ++iter, ++ctr) {
+ const TrellisPath &path = **iter;
+ solutions.push_back(LatticeMBRSolution(path,iter==nBestList.begin()));
+ solutions.back().CalcScore(ngramPosteriors,mbrThetas,mapWeight);
+ sort(solutions.begin(), solutions.end(), comparator);
+ while (solutions.size() > n) {
+ solutions.pop_back();
+ }
+ }
+ VERBOSE(2,"LMBR Score: " << solutions[0].GetScore() << endl);
+}
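When no thetas are given on the command line, the fallback above builds the linear-BLEU weights from precision p and ratio r: theta_0 = -1, theta_1 = 1/(order*p), and theta_n = theta_{n-1}/r thereafter (Tromble et al., 2008). A standalone sketch of just that computation (the function name is mine):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Default lattice-MBR thetas from precision p and ratio r, mirroring
// the fallback in getLatticeMBRNBest above (bleu_order = 4 by default):
// thetas[0] = -1, thetas[1] = 1/(order*p), thetas[n] = thetas[n-1]/r.
std::vector<float> defaultThetas(float p, float r, std::size_t order = 4) {
  std::vector<float> thetas;
  thetas.push_back(-1.0f);
  thetas.push_back(1.0f / (order * p));
  for (std::size_t i = 2; i <= order; ++i)
    thetas.push_back(thetas[i - 1] / r);
  return thetas;
}
```

For p = 0.25 and r = 0.5 this gives thetas of -1, 1, 2, 4, 8: the word-count penalty plus geometrically increasing rewards for longer matching n-grams.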
+
+vector<Word> doLatticeMBR(Manager& manager, TrellisPathList& nBestList)
+{
+
+ vector<LatticeMBRSolution> solutions;
+ getLatticeMBRNBest(manager, nBestList, solutions,1);
+ return solutions.at(0).GetWords();
+}
+
+const TrellisPath doConsensusDecoding(Manager& manager, TrellisPathList& nBestList)
+{
+ static const int BLEU_ORDER = 4;
+ static const float SMOOTH = 1;
+
+ //calculate the ngram expectations
+ const StaticData& staticData = StaticData::Instance();
+ std::map < int, bool > connected;
+ std::vector< const Hypothesis *> connectedList;
+ map<Phrase, float> ngramExpectations;
+ std::map < const Hypothesis*, set <const Hypothesis*> > outgoingHyps;
+ map<const Hypothesis*, vector<Edge> > incomingEdges;
+ vector< float> estimatedScores;
+ manager.GetForwardBackwardSearchGraph(&connected, &connectedList, &outgoingHyps, &estimatedScores);
+ pruneLatticeFB(connectedList, outgoingHyps, incomingEdges, estimatedScores, manager.GetBestHypothesis(), staticData.GetLatticeMBRPruningFactor(),staticData.GetMBRScale());
+ calcNgramExpectations(connectedList, incomingEdges, ngramExpectations,false);
+
+ //expected length is sum of expected unigram counts
+ //cerr << "Thread " << pthread_self() << " Ngram expectations size: " << ngramExpectations.size() << endl;
+ float ref_length = 0.0f;
+ for (map<Phrase,float>::const_iterator ref_iter = ngramExpectations.begin();
+ ref_iter != ngramExpectations.end(); ++ref_iter) {
+ //cerr << "Ngram: " << ref_iter->first << " score: " <<
+ // ref_iter->second << endl;
+ if (ref_iter->first.GetSize() == 1) {
+ ref_length += exp(ref_iter->second);
+ // cerr << "Expected for " << ref_iter->first << " is " << exp(ref_iter->second) << endl;
+ }
+ }
+
+ VERBOSE(2,"REF Length: " << ref_length << endl);
+
+ //use the ngram expectations to rescore the nbest list.
+ TrellisPathList::const_iterator iter;
+ TrellisPathList::const_iterator best = nBestList.end();
+ float bestScore = -100000;
+ //cerr << "nbest list size: " << nBestList.GetSize() << endl;
+ for (iter = nBestList.begin() ; iter != nBestList.end() ; ++iter) {
+ const TrellisPath &path = **iter;
+ vector<Word> words;
+ map<Phrase,int> ngrams;
+ GetOutputWords(path,words);
+ /*for (size_t i = 0; i < words.size(); ++i) {
+ cerr << words[i].GetFactor(0)->GetString() << " ";
+ }
+ cerr << endl;
+ */
+ extract_ngrams(words,ngrams);
+
+ vector<float> comps(2*BLEU_ORDER+1);
+ float logbleu = 0.0;
+ float brevity = 0.0;
+ int hyp_length = words.size();
+ for (int i = 0; i < BLEU_ORDER; ++i) {
+ comps[2*i] = 0.0;
+ comps[2*i+1] = max(hyp_length-i,0);
+ }
+
+ for (map<Phrase,int>::const_iterator hyp_iter = ngrams.begin();
+ hyp_iter != ngrams.end(); ++hyp_iter) {
+ map<Phrase,float>::const_iterator ref_iter = ngramExpectations.find(hyp_iter->first);
+ if (ref_iter != ngramExpectations.end()) {
+ comps[2*(hyp_iter->first.GetSize()-1)] += min(exp(ref_iter->second), (float)(hyp_iter->second));
+ }
+
+ }
+ comps[comps.size()-1] = ref_length;
+ /*for (size_t i = 0; i < comps.size(); ++i) {
+ cerr << comps[i] << " ";
+ }
+ cerr << endl;
+ */
+
+ float score = 0.0f;
+ if (comps[0] != 0) {
+ for (int i=0; i<BLEU_ORDER; i++) {
+ if ( i > 0 ) {
+ logbleu += log((float)comps[2*i]+SMOOTH)-log((float)comps[2*i+1]+SMOOTH);
+ } else {
+ logbleu += log((float)comps[2*i])-log((float)comps[2*i+1]);
+ }
+ }
+ logbleu /= BLEU_ORDER;
+ brevity = 1.0-(float)comps[comps.size()-1]/comps[1]; // comps[comps_n-1] is the ref length, comps[1] is the test length
+ if (brevity < 0.0) {
+ logbleu += brevity;
+ }
+ score = exp(logbleu);
+ }
+
+ //cerr << "score: " << score << " bestScore: " << bestScore << endl;
+ if (score > bestScore) {
+ bestScore = score;
+ best = iter;
+ VERBOSE(2,"NEW BEST: " << score << endl);
+ //for (size_t i = 0; i < comps.size(); ++i) {
+ // cerr << comps[i] << " ";
+ //}
+ //cerr << endl;
+ }
+ }
+
+ assert (best != nBestList.end());
+ return **best;
+ //vector<Word> bestWords;
+ //GetOutputWords(**best,bestWords);
+ //return bestWords;
+}
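The scoring loop above computes a smoothed sentence-level BLEU against the expected n-gram counts, with a one-sided brevity penalty. Factored out as a standalone sketch over the same `comps` layout (matches at even indices, possible n-grams at odd indices, reference length last; the function name is mine):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Smoothed sentence-level BLEU as in doConsensusDecoding above:
// comps[2i] = order-(i+1) matches, comps[2i+1] = max(hyp_len - i, 0),
// comps.back() = reference length. Unigram counts are unsmoothed;
// higher orders get add-SMOOTH; the brevity penalty only fires when
// the hypothesis is shorter than the reference.
float smoothedBleu(const std::vector<float>& comps, int order = 4, float smooth = 1.0f) {
  if (comps[0] == 0.0f) return 0.0f;  // no unigram matches at all
  float logbleu = 0.0f;
  for (int i = 0; i < order; ++i) {
    if (i > 0)
      logbleu += std::log(comps[2 * i] + smooth) - std::log(comps[2 * i + 1] + smooth);
    else
      logbleu += std::log(comps[2 * i]) - std::log(comps[2 * i + 1]);
  }
  logbleu /= order;
  const float brevity = 1.0f - comps[comps.size() - 1] / comps[1];
  if (brevity < 0.0f) logbleu += brevity;
  return std::exp(logbleu);
}
```

A hypothesis that matches a 4-word reference exactly (comps = 4,4, 3,3, 2,2, 1,1, ref length 4) scores 1.0, since every precision ratio and the brevity term are neutral.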
+
+}
+
+
diff --git a/contrib/relent-filter/src/LatticeMBR.h b/contrib/relent-filter/src/LatticeMBR.h
new file mode 100755
index 000000000..14a2e22da
--- /dev/null
+++ b/contrib/relent-filter/src/LatticeMBR.h
@@ -0,0 +1,153 @@
+/*
+ * LatticeMBR.h
+ * moses-cmd
+ *
+ * Created by Abhishek Arun on 26/01/2010.
+ * Copyright 2010 __MyCompanyName__. All rights reserved.
+ *
+ */
+
+#ifndef moses_cmd_LatticeMBR_h
+#define moses_cmd_LatticeMBR_h
+
+#include <map>
+#include <vector>
+#include <set>
+#include "Hypothesis.h"
+#include "Manager.h"
+#include "TrellisPathList.h"
+
+
+
+namespace MosesCmd
+{
+
+class Edge;
+
+typedef std::vector< const Moses::Hypothesis *> Lattice;
+typedef std::vector<const Edge*> Path;
+typedef std::map<Path, size_t> PathCounts;
+typedef std::map<Moses::Phrase, PathCounts > NgramHistory;
+
+class Edge
+{
+ const Moses::Hypothesis* m_tailNode;
+ const Moses::Hypothesis* m_headNode;
+ float m_score;
+ Moses::TargetPhrase m_targetPhrase;
+ NgramHistory m_ngrams;
+
+public:
+ Edge(const Moses::Hypothesis* from, const Moses::Hypothesis* to, float score, const Moses::TargetPhrase& targetPhrase) : m_tailNode(from), m_headNode(to), m_score(score), m_targetPhrase(targetPhrase) {
+ //cout << "Creating new edge from Node " << from->GetId() << ", to Node : " << to->GetId() << ", score: " << score << " phrase: " << targetPhrase << endl;
+ }
+
+ const Moses::Hypothesis* GetHeadNode() const {
+ return m_headNode;
+ }
+
+ const Moses::Hypothesis* GetTailNode() const {
+ return m_tailNode;
+ }
+
+ float GetScore() const {
+ return m_score;
+ }
+
+ size_t GetWordsSize() const {
+ return m_targetPhrase.GetSize();
+ }
+
+ const Moses::Phrase& GetWords() const {
+ return m_targetPhrase;
+ }
+
+ friend std::ostream& operator<< (std::ostream& out, const Edge& edge);
+
+ const NgramHistory& GetNgrams( std::map<const Moses::Hypothesis*, std::vector<Edge> > & incomingEdges) ;
+
+ bool operator < (const Edge & compare) const;
+
+ void GetPhraseSuffix(const Moses::Phrase& origPhrase, size_t lastN, Moses::Phrase& targetPhrase) const;
+
+ void storeNgramHistory(const Moses::Phrase& phrase, Path & path, size_t count = 1) {
+ m_ngrams[phrase][path]+= count;
+ }
+
+};
+
+/**
+* Data structure to hold the ngram scores as we traverse the lattice. Maps (hypo,ngram) to score
+*/
+class NgramScores
+{
+public:
+ NgramScores() {}
+
+ /** logsum this score to the existing score */
+ void addScore(const Moses::Hypothesis* node, const Moses::Phrase& ngram, float score);
+
+ /** Iterate through ngrams for selected node */
+ typedef std::map<const Moses::Phrase*, float>::const_iterator NodeScoreIterator;
+ NodeScoreIterator nodeBegin(const Moses::Hypothesis* node);
+ NodeScoreIterator nodeEnd(const Moses::Hypothesis* node);
+
+private:
+ std::set<Moses::Phrase> m_ngrams;
+ std::map<const Moses::Hypothesis*, std::map<const Moses::Phrase*, float> > m_scores;
+};
+
+
+/** Holds a lattice mbr solution, and its scores */
+class LatticeMBRSolution
+{
+public:
+ /** Read the words from the path */
+ LatticeMBRSolution(const Moses::TrellisPath& path, bool isMap);
+ const std::vector<float>& GetNgramScores() const {
+ return m_ngramScores;
+ }
+ const std::vector<Moses::Word>& GetWords() const {
+ return m_words;
+ }
+ float GetMapScore() const {
+ return m_mapScore;
+ }
+ float GetScore() const {
+ return m_score;
+ }
+
+ /** Initialise ngram scores */
+ void CalcScore(std::map<Moses::Phrase, float>& finalNgramScores, const std::vector<float>& thetas, float mapWeight);
+
+private:
+ std::vector<Moses::Word> m_words;
+ float m_mapScore;
+ std::vector<float> m_ngramScores;
+ float m_score;
+};
+
+struct LatticeMBRSolutionComparator {
+ bool operator()(const LatticeMBRSolution& a, const LatticeMBRSolution& b) {
+ return a.GetScore() > b.GetScore();
+ }
+};
+
+void pruneLatticeFB(Lattice & connectedHyp, std::map < const Moses::Hypothesis*, std::set <const Moses::Hypothesis* > > & outgoingHyps, std::map<const Moses::Hypothesis*, std::vector<Edge> >& incomingEdges,
+ const std::vector< float> & estimatedScores, const Moses::Hypothesis*, size_t edgeDensity,float scale);
+
+//Use the ngram scores to rerank the nbest list, return at most n solutions
+void getLatticeMBRNBest(Moses::Manager& manager, Moses::TrellisPathList& nBestList, std::vector<LatticeMBRSolution>& solutions, size_t n);
+//calculate expected ngram counts, clipping at 1 (i.e. calculating posteriors) if posteriors==true.
+void calcNgramExpectations(Lattice & connectedHyp, std::map<const Moses::Hypothesis*, std::vector<Edge> >& incomingEdges,
+                           std::map<Moses::Phrase, float>& finalNgramScores, bool posteriors);
+void GetOutputFactors(const Moses::TrellisPath &path, std::vector <Moses::Word> &translation);
+void extract_ngrams(const std::vector<Moses::Word >& sentence, std::map < Moses::Phrase, int > & allngrams);
+bool ascendingCoverageCmp(const Moses::Hypothesis* a, const Moses::Hypothesis* b);
+std::vector<Moses::Word> doLatticeMBR(Moses::Manager& manager, Moses::TrellisPathList& nBestList);
+const Moses::TrellisPath doConsensusDecoding(Moses::Manager& manager, Moses::TrellisPathList& nBestList);
+//std::vector<Moses::Word> doConsensusDecoding(Moses::Manager& manager, Moses::TrellisPathList& nBestList);
+
+}
+
+#endif
diff --git a/contrib/relent-filter/src/LatticeMBRGrid.cpp b/contrib/relent-filter/src/LatticeMBRGrid.cpp
new file mode 100755
index 000000000..71c387839
--- /dev/null
+++ b/contrib/relent-filter/src/LatticeMBRGrid.cpp
@@ -0,0 +1,213 @@
+// $Id: LatticeMBRGrid.cpp 3045 2010-04-05 13:07:29Z hieuhoang1972 $
+
+/***********************************************************************
+Moses - factored phrase-based language decoder
+Copyright (c) 2010 University of Edinburgh
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of the University of Edinburgh nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+***********************************************************************/
+/**
+* Lattice MBR grid search. Enables a grid search over the four parameters (p, r, scale and prune) used in lattice MBR.
+  See 'Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation' by Tromble, Kumar, Och and Macherey,
+  EMNLP 2008, for details of the parameters.
+
+  The grid search is controlled by specifying comma-separated lists for the lmbr parameters (-lmbr-p, -lmbr-r,
+  -lmbr-pruning-factor and -mbr-scale). All other parameters are passed through to moses. If any of the lattice MBR
+  parameters are missing, they are set to their default values. Output is of the form:
+  sentence-id ||| p r prune scale ||| translation-hypothesis
+**/
+
+#include <cstdlib>
+#include <iostream>
+#include <map>
+#include <stdexcept>
+#include <set>
+
+#include "IOWrapper.h"
+#include "LatticeMBR.h"
+#include "Manager.h"
+#include "StaticData.h"
+
+
+using namespace std;
+using namespace Moses;
+using namespace MosesCmd;
+
+// grid parameter keys
+enum gridkey {lmbr_p,lmbr_r,lmbr_prune,lmbr_scale};
+
+namespace MosesCmd
+{
+
+class Grid
+{
+public:
+ /** Add a parameter with key, command line argument, and default value */
+ void addParam(gridkey key, const string& arg, float defaultValue) {
+ m_args[arg] = key;
+ CHECK(m_grid.find(key) == m_grid.end());
+ m_grid[key].push_back(defaultValue);
+ }
+
+ /** Parse the arguments, removing those that define the grid and returning a copy of the rest */
+ void parseArgs(int& argc, char**& argv) {
+ char** newargv = new char*[argc+1]; //Space to add mbr parameter
+ int newargc = 0;
+ for (int i = 0; i < argc; ++i) {
+ bool consumed = false;
+ for (map<string,gridkey>::const_iterator argi = m_args.begin(); argi != m_args.end(); ++argi) {
+ if (!strcmp(argv[i], argi->first.c_str())) {
+ ++i;
+ if (i >= argc) {
+ cerr << "Error: missing parameter for " << argi->first << endl;
+ throw runtime_error("Missing parameter");
+ } else {
+ string value = argv[i];
+ gridkey key = argi->second;
+ if (m_grid[key].size() != 1) {
+ throw runtime_error("Duplicate grid argument");
+ }
+ m_grid[key].clear();
+ char delim = ',';
+ string::size_type lastpos = value.find_first_not_of(delim);
+ string::size_type pos = value.find_first_of(delim,lastpos);
+ while (string::npos != pos || string::npos != lastpos) {
+ float param = atof(value.substr(lastpos, pos-lastpos).c_str());
+ if (!param) {
+ cerr << "Error: Illegal grid parameter for " << argi->first << endl;
+ throw runtime_error("Illegal grid parameter");
+ }
+ m_grid[key].push_back(param);
+ lastpos = value.find_first_not_of(delim,pos);
+ pos = value.find_first_of(delim,lastpos);
+ }
+ consumed = true;
+ }
+ if (consumed) break;
+ }
+ }
+ if (!consumed) {
+ newargv[newargc] = new char[strlen(argv[i]) + 1];
+ strcpy(newargv[newargc],argv[i]);
+ ++newargc;
+ }
+ }
+ argc = newargc;
+ argv = newargv;
+ }
+
+ /** Get the grid for a particular key.*/
+ const vector<float>& getGrid(gridkey key) const {
+ map<gridkey,vector<float> >::const_iterator iter = m_grid.find(key);
+ assert (iter != m_grid.end());
+ return iter->second;
+
+ }
+
+private:
+ map<gridkey,vector<float> > m_grid;
+ map<string,gridkey> m_args;
+};
+
+} // namespace
+
+int main(int argc, char* argv[])
+{
+ cerr << "Lattice MBR Grid search" << endl;
+
+ Grid grid;
+ grid.addParam(lmbr_p, "-lmbr-p", 0.5);
+ grid.addParam(lmbr_r, "-lmbr-r", 0.5);
+ grid.addParam(lmbr_prune, "-lmbr-pruning-factor",30.0);
+ grid.addParam(lmbr_scale, "-mbr-scale",1.0);
+
+ grid.parseArgs(argc,argv);
+
+ Parameter* params = new Parameter();
+ if (!params->LoadParam(argc,argv)) {
+ params->Explain();
+ exit(1);
+ }
+ if (!StaticData::LoadDataStatic(params, argv[0])) {
+ exit(1);
+ }
+
+ StaticData& staticData = const_cast<StaticData&>(StaticData::Instance());
+ staticData.SetUseLatticeMBR(true);
+ IOWrapper* ioWrapper = GetIOWrapper(staticData);
+
+ if (!ioWrapper) {
+ throw runtime_error("Failed to initialise IOWrapper");
+ }
+ size_t nBestSize = staticData.GetMBRSize();
+
+ if (nBestSize <= 0) {
+    throw runtime_error("Non-positive size specified for n-best list");
+ }
+
+ size_t lineCount = 0;
+ InputType* source = NULL;
+
+ const vector<float>& pgrid = grid.getGrid(lmbr_p);
+ const vector<float>& rgrid = grid.getGrid(lmbr_r);
+ const vector<float>& prune_grid = grid.getGrid(lmbr_prune);
+ const vector<float>& scale_grid = grid.getGrid(lmbr_scale);
+
+ while(ReadInput(*ioWrapper,staticData.GetInputType(),source)) {
+ ++lineCount;
+ Sentence sentence;
+ const TranslationSystem& system = staticData.GetTranslationSystem(TranslationSystem::DEFAULT);
+ Manager manager(*source,staticData.GetSearchAlgorithm(), &system);
+ manager.ProcessSentence();
+ TrellisPathList nBestList;
+ manager.CalcNBest(nBestSize, nBestList,true);
+ //grid search
+ for (vector<float>::const_iterator pi = pgrid.begin(); pi != pgrid.end(); ++pi) {
+ float p = *pi;
+ staticData.SetLatticeMBRPrecision(p);
+ for (vector<float>::const_iterator ri = rgrid.begin(); ri != rgrid.end(); ++ri) {
+ float r = *ri;
+ staticData.SetLatticeMBRPRatio(r);
+ for (vector<float>::const_iterator prune_i = prune_grid.begin(); prune_i != prune_grid.end(); ++prune_i) {
+ size_t prune = (size_t)(*prune_i);
+ staticData.SetLatticeMBRPruningFactor(prune);
+ for (vector<float>::const_iterator scale_i = scale_grid.begin(); scale_i != scale_grid.end(); ++scale_i) {
+ float scale = *scale_i;
+ staticData.SetMBRScale(scale);
+ cout << lineCount << " ||| " << p << " " << r << " " << prune << " " << scale << " ||| ";
+ vector<Word> mbrBestHypo = doLatticeMBR(manager,nBestList);
+ OutputBestHypo(mbrBestHypo, lineCount, staticData.GetReportSegmentation(),
+ staticData.GetReportAllFactors(),cout);
+ }
+ }
+
+ }
+ }
+
+
+ }
+
+}
diff --git a/contrib/relent-filter/src/Main.cpp b/contrib/relent-filter/src/Main.cpp
new file mode 100755
index 000000000..07525cfc0
--- /dev/null
+++ b/contrib/relent-filter/src/Main.cpp
@@ -0,0 +1,282 @@
+/***********************************************************************
+Relative Entropy-based Phrase table Pruning
+Copyright (C) 2012 Wang Ling
+
+This library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this library; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+***********************************************************************/
+
+/**
+ * Moses main, for single-threaded and multi-threaded operation.
+ **/
+
+#include <exception>
+#include <fstream>
+#include <sstream>
+#include <vector>
+
+#ifdef WIN32
+// Include Visual Leak Detector
+//#include <vld.h>
+#endif
+
+#include "Hypothesis.h"
+#include "Manager.h"
+#include "IOWrapper.h"
+#include "StaticData.h"
+#include "Util.h"
+#include "ThreadPool.h"
+#include "TranslationAnalysis.h"
+#include "OutputCollector.h"
+#include "RelativeEntropyCalc.h"
+#include "LexicalReordering.h"
+#include "LexicalReorderingState.h"
+
+#ifdef HAVE_PROTOBUF
+#include "hypergraph.pb.h"
+#endif
+
+using namespace std;
+using namespace Moses;
+using namespace MosesCmd;
+
+namespace MosesCmd
+{
+// output floats with three decimal places
+static const size_t PRECISION = 3;
+
+/** Enforce rounding */
+void fix(std::ostream& stream, size_t size)
+{
+ stream.setf(std::ios::fixed);
+ stream.precision(size);
+}
+
+/** Translates a sentence.
+ * - calls the search (Manager)
+ * - applies the decision rule
+ * - outputs best translation and additional reporting
+ **/
+class TranslationTask : public Task
+{
+
+public:
+
+ TranslationTask(size_t lineNumber,
+ InputType* source, OutputCollector* searchGraphCollector) :
+ m_source(source), m_lineNumber(lineNumber),
+ m_searchGraphCollector(searchGraphCollector) {}
+
+ /** Translate one sentence
+ * gets called by main function implemented at end of this source file */
+ void Run() {
+
+ // report thread number
+#ifdef BOOST_HAS_PTHREADS
+ TRACE_ERR("Translating line " << m_lineNumber << " in thread id " << pthread_self() << std::endl);
+#endif
+
+ // shorthand for "global data"
+ const StaticData &staticData = StaticData::Instance();
+ // input sentence
+    Sentence sentence; // "Sentence sentence();" would declare a function (most vexing parse)
+ // set translation system
+ const TranslationSystem& system = staticData.GetTranslationSystem(TranslationSystem::DEFAULT);
+
+ // execute the translation
+ // note: this executes the search, resulting in a search graph
+ // we still need to apply the decision rule (MAP, MBR, ...)
+ Manager manager(m_lineNumber, *m_source,staticData.GetSearchAlgorithm(), &system);
+ manager.ProcessSentence();
+
+ // output search graph
+ if (m_searchGraphCollector) {
+ ostringstream out;
+ fix(out,PRECISION);
+
+ vector<SearchGraphNode> searchGraph;
+ manager.GetSearchGraph(searchGraph);
+ out << RelativeEntropyCalc::CalcRelativeEntropy(m_lineNumber,searchGraph) << endl;
+ m_searchGraphCollector->Write(m_lineNumber, out.str());
+
+ }
+ manager.CalcDecoderStatistics();
+ }
+
+ ~TranslationTask() {
+ delete m_source;
+ }
+
+private:
+ InputType* m_source;
+ size_t m_lineNumber;
+ OutputCollector* m_searchGraphCollector;
+ std::ofstream *m_alignmentStream;
+
+};
+
+static void PrintFeatureWeight(const FeatureFunction* ff)
+{
+
+ size_t weightStart = StaticData::Instance().GetScoreIndexManager().GetBeginIndex(ff->GetScoreBookkeepingID());
+ size_t weightEnd = StaticData::Instance().GetScoreIndexManager().GetEndIndex(ff->GetScoreBookkeepingID());
+ for (size_t i = weightStart; i < weightEnd; ++i) {
+ cout << ff->GetScoreProducerDescription(i-weightStart) << " " << ff->GetScoreProducerWeightShortName(i-weightStart) << " "
+ << StaticData::Instance().GetAllWeights()[i] << endl;
+ }
+}
+
+
+static void ShowWeights()
+{
+ fix(cout,6);
+ const StaticData& staticData = StaticData::Instance();
+ const TranslationSystem& system = staticData.GetTranslationSystem(TranslationSystem::DEFAULT);
+ const vector<const StatelessFeatureFunction*>& slf =system.GetStatelessFeatureFunctions();
+ const vector<const StatefulFeatureFunction*>& sff = system.GetStatefulFeatureFunctions();
+ const vector<PhraseDictionaryFeature*>& pds = system.GetPhraseDictionaries();
+ const vector<GenerationDictionary*>& gds = system.GetGenerationDictionaries();
+ for (size_t i = 0; i < sff.size(); ++i) {
+ PrintFeatureWeight(sff[i]);
+ }
+ for (size_t i = 0; i < slf.size(); ++i) {
+ PrintFeatureWeight(slf[i]);
+ }
+ for (size_t i = 0; i < pds.size(); ++i) {
+ PrintFeatureWeight(pds[i]);
+ }
+ for (size_t i = 0; i < gds.size(); ++i) {
+ PrintFeatureWeight(gds[i]);
+ }
+}
+
+} //namespace
+
+/** main function of the command line version of the decoder **/
+int main(int argc, char** argv)
+{
+ try {
+
+ // echo command line, if verbose
+ IFVERBOSE(1) {
+ TRACE_ERR("command: ");
+ for(int i=0; i<argc; ++i) TRACE_ERR(argv[i]<<" ");
+ TRACE_ERR(endl);
+ }
+
+ // set number of significant decimals in output
+ fix(cout,PRECISION);
+ fix(cerr,PRECISION);
+
+ // load all the settings into the Parameter class
+ // (stores them as strings, or array of strings)
+ Parameter* params = new Parameter();
+ if (!params->LoadParam(argc,argv)) {
+ params->Explain();
+ exit(1);
+ }
+
+
+ // initialize all "global" variables, which are stored in StaticData
+ // note: this also loads models such as the language model, etc.
+ if (!StaticData::LoadDataStatic(params, argv[0])) {
+ exit(1);
+ }
+
+ // setting "-show-weights" -> just dump out weights and exit
+ if (params->isParamSpecified("show-weights")) {
+ ShowWeights();
+ exit(0);
+ }
+
+ // shorthand for accessing information in StaticData
+ const StaticData& staticData = StaticData::Instance();
+
+
+ //initialise random numbers
+ srand(time(NULL));
+
+ // set up read/writing class
+ IOWrapper* ioWrapper = GetIOWrapper(staticData);
+ if (!ioWrapper) {
+    cerr << "Error: Failed to create IO object" << endl;
+ exit(1);
+ }
+
+ // check on weights
+ vector<float> weights = staticData.GetAllWeights();
+ IFVERBOSE(2) {
+ TRACE_ERR("The score component vector looks like this:\n" << staticData.GetScoreIndexManager());
+ TRACE_ERR("The global weight vector looks like this:");
+ for (size_t j=0; j<weights.size(); j++) {
+ TRACE_ERR(" " << weights[j]);
+ }
+ TRACE_ERR("\n");
+ }
+ // every score must have a weight! check that here:
+ if(weights.size() != staticData.GetScoreIndexManager().GetTotalNumberOfScores()) {
+ TRACE_ERR("ERROR: " << staticData.GetScoreIndexManager().GetTotalNumberOfScores() << " score components, but " << weights.size() << " weights defined" << std::endl);
+ exit(1);
+ }
+
+ // setting lexicalized reordering setup
+ PhraseBasedReorderingState::m_useFirstBackwardScore = false;
+
+
+ auto_ptr<OutputCollector> outputCollector;
+ outputCollector.reset(new OutputCollector());
+
+#ifdef WITH_THREADS
+ ThreadPool pool(staticData.ThreadCount());
+#endif
+
+ // main loop over set of input sentences
+ InputType* source = NULL;
+ size_t lineCount = 0;
+ while(ReadInput(*ioWrapper,staticData.GetInputType(),source)) {
+ IFVERBOSE(1) {
+ ResetUserTime();
+ }
+ // set up task of translating one sentence
+ TranslationTask* task =
+ new TranslationTask(lineCount,source, outputCollector.get());
+ // execute task
+#ifdef WITH_THREADS
+ pool.Submit(task);
+#else
+ task->Run();
+ delete task;
+#endif
+
+ source = NULL; //make sure it doesn't get deleted
+ ++lineCount;
+ }
+
+ // we are done, finishing up
+#ifdef WITH_THREADS
+ pool.Stop(true); //flush remaining jobs
+#endif
+
+ } catch (const std::exception &e) {
+ std::cerr << "Exception: " << e.what() << std::endl;
+ return EXIT_FAILURE;
+ }
+
+#ifndef EXIT_RETURN
+  //This avoids calling destructors, which can take a long time
+ exit(EXIT_SUCCESS);
+#else
+ return EXIT_SUCCESS;
+#endif
+}
diff --git a/contrib/relent-filter/src/Main.h b/contrib/relent-filter/src/Main.h
new file mode 100755
index 000000000..f0782144e
--- /dev/null
+++ b/contrib/relent-filter/src/Main.h
@@ -0,0 +1,39 @@
+/*********************************************************************
+Relative Entropy-based Phrase table Pruning
+Copyright (C) 2012 Wang Ling
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of the University of Edinburgh nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+***********************************************************************/
+
+#ifndef moses_cmd_Main_h
+#define moses_cmd_Main_h
+
+#include "StaticData.h"
+
+class IOWrapper;
+
+int main(int argc, char* argv[]);
+#endif
diff --git a/contrib/relent-filter/src/RelativeEntropyCalc.cpp b/contrib/relent-filter/src/RelativeEntropyCalc.cpp
new file mode 100755
index 000000000..212eedf87
--- /dev/null
+++ b/contrib/relent-filter/src/RelativeEntropyCalc.cpp
@@ -0,0 +1,83 @@
+/***********************************************************************
+Relative Entropy-based Phrase table Pruning
+Copyright (C) 2012 Wang Ling
+
+This library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this library; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+***********************************************************************/
+
+#include <vector>
+#include "Hypothesis.h"
+#include "StaticData.h"
+#include "RelativeEntropyCalc.h"
+#include "Manager.h"
+
+using namespace std;
+using namespace Moses;
+using namespace MosesCmd;
+
+namespace MosesCmd
+{
+ double RelativeEntropyCalc::CalcRelativeEntropy(int translationId, std::vector<SearchGraphNode>& searchGraph){
+ const StaticData &staticData = StaticData::Instance();
+ const Phrase *m_constraint = staticData.GetConstrainingPhrase(translationId);
+
+ double prunedScore = -numeric_limits<double>::max();
+ double unprunedScore = -numeric_limits<double>::max();
+ for (size_t i = 0; i < searchGraph.size(); ++i) {
+ const SearchGraphNode& searchNode = searchGraph[i];
+ int nodeId = searchNode.hypo->GetId();
+ if(nodeId == 0) continue; // initial hypothesis
+
+ int forwardId = searchNode.forward;
+ if(forwardId == -1){ // is final hypothesis
+ Phrase catOutput(0);
+ ConcatOutputPhraseRecursive(catOutput, searchNode.hypo);
+ if(catOutput == *m_constraint){ // is the output actually the same as the constraint (forced decoding does not always force the output)
+ const Hypothesis *prevHypo = searchNode.hypo->GetPrevHypo();
+ int backId = prevHypo->GetId();
+ double derivationScore = searchNode.hypo->GetScore();
+ if(backId != 0){ // derivation using smaller units
+ if(prunedScore < derivationScore){
+ prunedScore = derivationScore;
+ }
+ }
+ if(unprunedScore < derivationScore){
+ unprunedScore = derivationScore;
+ }
+ }
+ }
+ }
+
+ double neg_log_div = 0;
+ if( unprunedScore == -numeric_limits<double>::max()){
+      neg_log_div = numeric_limits<double>::max(); // could not find the phrase pair; give it the highest divergence so that it does not get pruned
+ }
+ else{
+ neg_log_div = unprunedScore - prunedScore;
+ }
+ if (neg_log_div > 100){
+ return 100;
+ }
+ return neg_log_div;
+ }
+
+ void RelativeEntropyCalc::ConcatOutputPhraseRecursive(Phrase& phrase, const Hypothesis *hypo){
+ int nodeId = hypo->GetId();
+ if(nodeId == 0) return; // initial hypothesis
+ ConcatOutputPhraseRecursive(phrase, hypo->GetPrevHypo());
+ const Phrase &endPhrase = hypo->GetCurrTargetPhrase();
+ phrase.Append(endPhrase);
+ }
+}
diff --git a/contrib/relent-filter/src/RelativeEntropyCalc.h b/contrib/relent-filter/src/RelativeEntropyCalc.h
new file mode 100755
index 000000000..efe8ba495
--- /dev/null
+++ b/contrib/relent-filter/src/RelativeEntropyCalc.h
@@ -0,0 +1,51 @@
+/*********************************************************************
+Relative Entropy-based Phrase table Pruning
+Copyright (C) 2012 Wang Ling
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of the University of Edinburgh nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
+IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
+***********************************************************************/
+
+#include <vector>
+#include "Hypothesis.h"
+#include "StaticData.h"
+#include "Manager.h"
+
+using namespace std;
+using namespace Moses;
+
+namespace MosesCmd
+{
+
+class RelativeEntropyCalc
+{
+public:
+ static double CalcRelativeEntropy(int translationId, std::vector<SearchGraphNode>& searchGraph);
+
+protected:
+ static void ConcatOutputPhraseRecursive(Phrase& phrase, const Hypothesis *hypo);
+};
+
+}
diff --git a/contrib/relent-filter/src/TranslationAnalysis.cpp b/contrib/relent-filter/src/TranslationAnalysis.cpp
new file mode 100755
index 000000000..89da48301
--- /dev/null
+++ b/contrib/relent-filter/src/TranslationAnalysis.cpp
@@ -0,0 +1,126 @@
+// $Id$
+
+#include <iostream>
+#include <sstream>
+#include <algorithm>
+#include "StaticData.h"
+#include "Hypothesis.h"
+#include "TranslationAnalysis.h"
+
+using namespace Moses;
+
+namespace TranslationAnalysis
+{
+
+void PrintTranslationAnalysis(const TranslationSystem* system, std::ostream &os, const Hypothesis* hypo)
+{
+ os << std::endl << "TRANSLATION HYPOTHESIS DETAILS:" << std::endl;
+ std::vector<const Hypothesis*> translationPath;
+
+ while (hypo) {
+ translationPath.push_back(hypo);
+ hypo = hypo->GetPrevHypo();
+ }
+
+ std::reverse(translationPath.begin(), translationPath.end());
+ std::vector<std::string> droppedWords;
+ std::vector<const Hypothesis*>::iterator tpi = translationPath.begin();
+ if(tpi == translationPath.end())
+ return;
+ ++tpi; // skip initial translation state
+ std::vector<std::string> sourceMap;
+ std::vector<std::string> targetMap;
+ std::vector<unsigned int> lmAcc(0);
+ size_t lmCalls = 0;
+ bool doLMStats = ((*tpi)->GetLMStats() != 0);
+ if (doLMStats)
+ lmAcc.resize((*tpi)->GetLMStats()->size(), 0);
+ for (; tpi != translationPath.end(); ++tpi) {
+ std::ostringstream sms;
+ std::ostringstream tms;
+ std::string target = (*tpi)->GetTargetPhraseStringRep();
+ std::string source = (*tpi)->GetSourcePhraseStringRep();
+ WordsRange twr = (*tpi)->GetCurrTargetWordsRange();
+ WordsRange swr = (*tpi)->GetCurrSourceWordsRange();
+ const AlignmentInfo &alignmentInfo = (*tpi)->GetCurrTargetPhrase().GetAlignmentInfo();
+ // language model backoff stats,
+ if (doLMStats) {
+ std::vector<std::vector<unsigned int> >& lmstats = *(*tpi)->GetLMStats();
+ std::vector<std::vector<unsigned int> >::iterator i = lmstats.begin();
+ std::vector<unsigned int>::iterator acc = lmAcc.begin();
+
+ for (; i != lmstats.end(); ++i, ++acc) {
+ std::vector<unsigned int>::iterator j = i->begin();
+ lmCalls += i->size();
+ for (; j != i->end(); ++j) {
+ (*acc) += *j;
+ }
+ }
+ }
+
+ bool epsilon = false;
+ if (target == "") {
+ target="<EPSILON>";
+ epsilon = true;
+ droppedWords.push_back(source);
+ }
+ os << " SOURCE: " << swr << " " << source << std::endl
+ << " TRANSLATED AS: " << target << std::endl
+ << " WORD ALIGNED: " << alignmentInfo << std::endl;
+ size_t twr_i = twr.GetStartPos();
+ size_t swr_i = swr.GetStartPos();
+ if (!epsilon) {
+ sms << twr_i;
+ }
+ if (epsilon) {
+ tms << "del(" << swr_i << ")";
+ } else {
+ tms << swr_i;
+ }
+ swr_i++;
+ twr_i++;
+ for (; twr_i <= twr.GetEndPos() && twr.GetEndPos() != NOT_FOUND; twr_i++) {
+ sms << '-' << twr_i;
+ }
+ for (; swr_i <= swr.GetEndPos() && swr.GetEndPos() != NOT_FOUND; swr_i++) {
+ tms << '-' << swr_i;
+ }
+ if (!epsilon) targetMap.push_back(sms.str());
+ sourceMap.push_back(tms.str());
+ }
+ std::vector<std::string>::iterator si = sourceMap.begin();
+ std::vector<std::string>::iterator ti = targetMap.begin();
+ os << std::endl << "SOURCE/TARGET SPANS:";
+ os << std::endl << " SOURCE:";
+ for (; si != sourceMap.end(); ++si) {
+ os << " " << *si;
+ }
+ os << std::endl << " TARGET:";
+ for (; ti != targetMap.end(); ++ti) {
+ os << " " << *ti;
+ }
+ os << std::endl << std::endl;
+ if (doLMStats && lmCalls > 0) {
+ std::vector<unsigned int>::iterator acc = lmAcc.begin();
+ const LMList& lmlist = system->GetLanguageModels();
+ LMList::const_iterator i = lmlist.begin();
+ for (; acc != lmAcc.end(); ++acc, ++i) {
+ char buf[256];
+ sprintf(buf, "%.4f", (float)(*acc)/(float)lmCalls);
+ os << (*i)->GetScoreProducerDescription() <<", AVG N-GRAM LENGTH: " << buf << std::endl;
+ }
+ }
+
+ if (droppedWords.size() > 0) {
+ std::vector<std::string>::iterator dwi = droppedWords.begin();
+ os << std::endl << "WORDS/PHRASES DROPPED:" << std::endl;
+ for (; dwi != droppedWords.end(); ++dwi) {
+ os << "\tdropped=" << *dwi << std::endl;
+ }
+ }
+ os << std::endl << "SCORES (UNWEIGHTED/WEIGHTED): ";
+ StaticData::Instance().GetScoreIndexManager().PrintLabeledWeightedScores(os, translationPath.back()->GetScoreBreakdown(), StaticData::Instance().GetAllWeights());
+ os << std::endl;
+}
+
+}
diff --git a/contrib/relent-filter/src/TranslationAnalysis.h b/contrib/relent-filter/src/TranslationAnalysis.h
new file mode 100755
index 000000000..1eb7a04fd
--- /dev/null
+++ b/contrib/relent-filter/src/TranslationAnalysis.h
@@ -0,0 +1,25 @@
+// $Id$
+
+/*
+ * also see moses/SentenceStats
+ */
+
+#ifndef moses_cmd_TranslationAnalysis_h
+#define moses_cmd_TranslationAnalysis_h
+
+#include <iostream>
+#include "Hypothesis.h"
+#include "TranslationSystem.h"
+
+namespace TranslationAnalysis
+{
+
+/***
+ * Print details about the translation represented in hypothesis to
+ * os. Information included: phrase alignment, dropped words, scores
+ */
+void PrintTranslationAnalysis(const Moses::TranslationSystem* system, std::ostream &os, const Moses::Hypothesis* hypo);
+
+}
+
+#endif
diff --git a/contrib/relent-filter/src/mbr.cpp b/contrib/relent-filter/src/mbr.cpp
new file mode 100755
index 000000000..7462d3fc6
--- /dev/null
+++ b/contrib/relent-filter/src/mbr.cpp
@@ -0,0 +1,178 @@
+#include <iostream>
+#include <fstream>
+#include <sstream>
+#include <iomanip>
+#include <vector>
+#include <map>
+#include <stdlib.h>
+#include <math.h>
+#include <algorithm>
+#include <stdio.h>
+#include "TrellisPathList.h"
+#include "TrellisPath.h"
+#include "StaticData.h"
+#include "Util.h"
+#include "mbr.h"
+
+using namespace std;
+using namespace Moses;
+
+
+/* Input:
+ 1. a sorted n-best list, with duplicates filtered out, in the following format
+ 0 ||| amr moussa is currently on a visit to libya , tomorrow , sunday , to hold talks with regard to the in sudan . ||| 0 -4.94418 0 0 -2.16036 0 0 -81.4462 -106.593 -114.43 -105.55 -12.7873 -26.9057 -25.3715 -52.9336 7.99917 -24 ||| -4.58432
+
+ 2. a weight vector
+ 3. bleu order (default = 4)
+ 4. scaling factor to weigh the weight vector (default = 1.0)
+
+ Output:
+ translations that minimise the Bayes Risk of the n-best list
+*/
+
+int BLEU_ORDER = 4;
+int SMOOTH = 1;
+float min_interval = 1e-4;
+void extract_ngrams(const vector<const Factor* >& sentence, map < vector < const Factor* >, int > & allngrams)
+{
+ vector< const Factor* > ngram;
+ for (int k = 0; k < BLEU_ORDER; k++) {
+ for(int i =0; i < max((int)sentence.size()-k,0); i++) {
+ for ( int j = i; j<= i+k; j++) {
+ ngram.push_back(sentence[j]);
+ }
+ ++allngrams[ngram];
+ ngram.clear();
+ }
+ }
+}
+
+float calculate_score(const vector< vector<const Factor*> > & sents, int ref, int hyp, vector < map < vector < const Factor *>, int > > & ngram_stats )
+{
+ int comps_n = 2*BLEU_ORDER+1;
+ vector<int> comps(comps_n);
+ float logbleu = 0.0, brevity;
+
+ int hyp_length = sents[hyp].size();
+
+ for (int i =0; i<BLEU_ORDER; i++) {
+ comps[2*i] = 0;
+ comps[2*i+1] = max(hyp_length-i,0);
+ }
+
+ map< vector < const Factor * > ,int > & hyp_ngrams = ngram_stats[hyp] ;
+ map< vector < const Factor * >, int > & ref_ngrams = ngram_stats[ref] ;
+
+ for (map< vector< const Factor * >, int >::iterator it = hyp_ngrams.begin();
+ it != hyp_ngrams.end(); it++) {
+ map< vector< const Factor * >, int >::iterator ref_it = ref_ngrams.find(it->first);
+ if(ref_it != ref_ngrams.end()) {
+ comps[2* (it->first.size()-1)] += min(ref_it->second,it->second);
+ }
+ }
+ comps[comps_n-1] = sents[ref].size();
+
+ for (int i=0; i<BLEU_ORDER; i++) {
+ if (comps[0] == 0)
+ return 0.0;
+ if ( i > 0 )
+ logbleu += log((float)comps[2*i]+SMOOTH)-log((float)comps[2*i+1]+SMOOTH);
+ else
+ logbleu += log((float)comps[2*i])-log((float)comps[2*i+1]);
+ }
+ logbleu /= BLEU_ORDER;
+ brevity = 1.0-(float)comps[comps_n-1]/comps[1]; // comps[comps_n-1] is the ref length, comps[1] is the test length
+ if (brevity < 0.0)
+ logbleu += brevity;
+ return exp(logbleu);
+}
+
+const TrellisPath doMBR(const TrellisPathList& nBestList)
+{
+ float marginal = 0;
+
+ vector<float> joint_prob_vec;
+ vector< vector<const Factor*> > translations;
+ float joint_prob;
+ vector< map < vector <const Factor *>, int > > ngram_stats;
+
+ TrellisPathList::const_iterator iter;
+
+ // get max score to prevent underflow
+ float maxScore = -1e20;
+ for (iter = nBestList.begin() ; iter != nBestList.end() ; ++iter) {
+ const TrellisPath &path = **iter;
+ float score = StaticData::Instance().GetMBRScale()
+ * path.GetScoreBreakdown().InnerProduct(StaticData::Instance().GetAllWeights());
+ if (maxScore < score) maxScore = score;
+ }
+
+ for (iter = nBestList.begin() ; iter != nBestList.end() ; ++iter) {
+ const TrellisPath &path = **iter;
+ joint_prob = UntransformScore(StaticData::Instance().GetMBRScale() * path.GetScoreBreakdown().InnerProduct(StaticData::Instance().GetAllWeights()) - maxScore);
+ marginal += joint_prob;
+ joint_prob_vec.push_back(joint_prob);
+
+ // get words in translation
+ vector<const Factor*> translation;
+ GetOutputFactors(path, translation);
+
+ // collect n-gram counts
+ map < vector < const Factor *>, int > counts;
+ extract_ngrams(translation,counts);
+
+ ngram_stats.push_back(counts);
+ translations.push_back(translation);
+ }
+
+ vector<float> mbr_loss;
+ float bleu, weightedLoss;
+ float weightedLossCumul = 0;
+ float minMBRLoss = 1000000;
+ int minMBRLossIdx = -1;
+
+ /* Main MBR computation done here */
+ iter = nBestList.begin();
+ for (unsigned int i = 0; i < nBestList.GetSize(); i++) {
+ weightedLossCumul = 0;
+ for (unsigned int j = 0; j < nBestList.GetSize(); j++) {
+ if ( i != j) {
+ bleu = calculate_score(translations, j, i, ngram_stats);
+ weightedLoss = ( 1 - bleu) * ( joint_prob_vec[j]/marginal);
+ weightedLossCumul += weightedLoss;
+ if (weightedLossCumul > minMBRLoss)
+ break;
+ }
+ }
+ if (weightedLossCumul < minMBRLoss) {
+ minMBRLoss = weightedLossCumul;
+ minMBRLossIdx = i;
+ }
+ iter++;
+ }
+ /* Find the sentence that minimises Bayes risk under 1-BLEU loss */
+ return nBestList.at(minMBRLossIdx);
+ //return translations[minMBRLossIdx];
+}
+
+void GetOutputFactors(const TrellisPath &path, vector <const Factor*> &translation)
+{
+ const std::vector<const Hypothesis *> &edges = path.GetEdges();
+ const std::vector<FactorType>& outputFactorOrder = StaticData::Instance().GetOutputFactorOrder();
+ assert (outputFactorOrder.size() == 1);
+
+ // print the surface factor of the translation
+ for (int currEdge = (int)edges.size() - 1 ; currEdge >= 0 ; currEdge--) {
+ const Hypothesis &edge = *edges[currEdge];
+ const Phrase &phrase = edge.GetCurrTargetPhrase();
+ size_t size = phrase.GetSize();
+ for (size_t pos = 0 ; pos < size ; pos++) {
+
+ const Factor *factor = phrase.GetFactor(pos, outputFactorOrder[0]);
+ translation.push_back(factor);
+ }
+ }
+}
+
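The `doMBR` routine added above selects, from the n-best list, the candidate with the lowest expected 1-BLEU loss under the model's (scaled, exponentiated) posterior. A minimal standalone sketch of the same selection rule, with a precomputed similarity matrix standing in for the smoothed sentence-BLEU calls and plain indices in place of Moses `TrellisPath` objects:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick the candidate i minimising expected loss sum_j p(j) * (1 - sim[i][j]).
// `sim` is any similarity in [0,1]; the decoder uses smoothed sentence BLEU.
std::size_t mbrSelect(const std::vector<std::vector<double> >& sim,
                      const std::vector<double>& prob) {
  std::size_t best = 0;
  double bestLoss = 1e30;
  for (std::size_t i = 0; i < prob.size(); ++i) {
    double loss = 0.0;
    for (std::size_t j = 0; j < prob.size(); ++j)
      if (i != j) loss += prob[j] * (1.0 - sim[i][j]);
    if (loss < bestLoss) { bestLoss = loss; best = i; }
  }
  return best;
}
```

Note the winner need not be the highest-probability candidate: a hypothesis that agrees with many probable competitors can accumulate less expected loss, which is exactly why `doMBR` can differ from plain 1-best decoding.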
diff --git a/contrib/relent-filter/src/mbr.h b/contrib/relent-filter/src/mbr.h
new file mode 100755
index 000000000..d08b11a98
--- /dev/null
+++ b/contrib/relent-filter/src/mbr.h
@@ -0,0 +1,28 @@
+// $Id$
+
+/***********************************************************************
+Moses - factored phrase-based language decoder
+Copyright (C) 2006 University of Edinburgh
+
+This library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this library; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+***********************************************************************/
+
+#ifndef moses_cmd_mbr_h
+#define moses_cmd_mbr_h
+
+const Moses::TrellisPath doMBR(const Moses::TrellisPathList& nBestList);
+void GetOutputFactors(const Moses::TrellisPath &path, std::vector <const Moses::Factor*> &translation);
+float calculate_score(const std::vector< std::vector<const Moses::Factor*> > & sents, int ref, int hyp, std::vector < std::map < std::vector < const Moses::Factor *>, int > > & ngram_stats );
+#endif
diff --git a/contrib/reranking/data/README b/contrib/reranking/data/README
deleted file mode 100644
index 59b20b32d..000000000
--- a/contrib/reranking/data/README
+++ /dev/null
@@ -1,5 +0,0 @@
-
-sample usage:
-
-../src/nbest -input-file nbest.small -output-file nbest.1best 1 -sort -weights weights
-
diff --git a/contrib/reranking/data/nbest.small b/contrib/reranking/data/nbest.small
deleted file mode 100644
index 0fcbc44ce..000000000
--- a/contrib/reranking/data/nbest.small
+++ /dev/null
@@ -1,7 +0,0 @@
-0 ||| Once a major milestone in the Balkans ||| d: 0 -0.608213 0 0 -0.512647 0 0 lm: -35.7187 tm: -3.97053 -17.5137 -3.24082 -15.8638 2.99969 w: -7 ||| -3.92049
-0 ||| Once a crucial period in the Balkans ||| d: 0 -0.944329 0 0 -1.06468 0 0 lm: -37.5341 tm: -4.27619 -19.441 -3.81074 -14.767 3.99959 w: -7 ||| -4.00353
-1 ||| Since the world is focused on Iraq , North Korea and a possible crisis with Iran on nuclear weapons , Kosovo is somewhat unnoticed . ||| d: -6 -5.80589 -0.65383 -1.29291 -6.19413 -0.0861354 -0.993748 lm: -112.868 tm: -42.7841 -61.6487 -16.5351 -23.8061 21.9977 w: -25 ||| -13.0796
-2 ||| The public will soon turn its attention back to that province during a decision regarding his fate . ||| d: -8 -4.61691 0 -3.62979 -4.85916 0 -4.43407 lm: -81.3478 tm: -46.0407 -63.79 -23.7663 -25.175 14.9984 w: -18 ||| -12.1226
-2 ||| The public will soon be able to turn its attention back into this province during a decision on his fate . ||| d: -8 -5.53064 0 -3.51999 -3.26708 0 -4.44003 lm: -84.7939 tm: -36.2621 -66.32 -21.0804 -33.9136 13.9985 w: -21 ||| -12.1227
-2 ||| The public will soon turn his attention to them at a decision on his destiny . ||| d: -8 -5.3448 0 -2.65118 -4.35949 0 -3.95447 lm: -67.451 tm: -54.851 -89.0503 -17.9389 -22.9488 12.9986 w: -16 ||| -12.1234
-2 ||| The public will soon turn his attention to them at a decision on his destiny . ||| d: -8 -5.3448 0 -2.65118 -4.35949 0 -3.95447 lm: -67.451 tm: -54.851 -89.0503 -17.9389 -22.9488 12.9986 w: -16 ||| -12.1234
diff --git a/contrib/reranking/data/weights b/contrib/reranking/data/weights
deleted file mode 100644
index c6b6c1ac0..000000000
--- a/contrib/reranking/data/weights
+++ /dev/null
@@ -1,11 +0,0 @@
-0
-1 2 3
-4
-5
-6
-7
-8
-9
-10
-11
-12 13
diff --git a/contrib/reranking/src/Hypo.cpp b/contrib/reranking/src/Hypo.cpp
deleted file mode 100644
index 0ceb21abd..000000000
--- a/contrib/reranking/src/Hypo.cpp
+++ /dev/null
@@ -1,59 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: Hypo.cpp
- * basic functions to process one hypothesis
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-
-#include "Hypo.h"
-#include <iostream>
-
-//const char* NBEST_DELIM = "|||";
-
-Hypo::Hypo()
-{
- //cerr << "Hypo: constructor called" << endl;
-}
-
-Hypo::~Hypo()
-{
- //cerr << "Hypo: destructor called" << endl;
-}
-
-void Hypo::Write(ofstream &outf)
-{
- outf << id << NBEST_DELIM2 << trg << NBEST_DELIM2;
- for (vector<float>::iterator i = f.begin(); i != f.end(); i++)
- outf << (*i) << " ";
- outf << NBEST_DELIM << " " << s << endl;
-
-}
-
-float Hypo::CalcGlobal(Weights &w)
-{
- //cerr << " HYP: calc global" << endl;
- int sz=w.val.size();
- if (sz<f.size()) {
- cerr << " - NOTE: padding weight vector with " << f.size()-sz << " zeros" << endl;
- w.val.resize(f.size());
- }
-
- s=0;
- for (int i=0; i<f.size(); i++) {
- //cerr << "i=" << i << ", " << w.val[i] << ", " << f[i] << endl;
- s+=w.val[i]*f[i];
- }
- //cerr << "s=" << s << endl;
- return s;
-}
-
-// this is actually a "greater than" since we want to sort in descending order
-bool Hypo::operator< (const Hypo &h2) const
-{
- return (this->s > h2.s);
-}
-
diff --git a/contrib/reranking/src/Hypo.h b/contrib/reranking/src/Hypo.h
deleted file mode 100644
index a85410289..000000000
--- a/contrib/reranking/src/Hypo.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: Hypo.h
- * basic functions to process one hypothesis
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-
-#ifndef _HYPO_H_
-#define _HYPO_H_
-
-using namespace std;
-
-#include <iostream>
-#include <fstream>
-#include <string>
-#include <vector>
-
-#include "Tools.h"
-
-#define NBEST_DELIM "|||"
-#define NBEST_DELIM2 " ||| "
-
-class Hypo
-{
- int id;
- string trg; // translation
- vector<float> f; // feature function scores
- float s; // global score
- // segmentation
-public:
- Hypo();
- Hypo(int p_id,string &p_trg, vector<float> &p_f, float p_s) : id(p_id),trg(p_trg),f(p_f),s(p_s) {};
- ~Hypo();
- float CalcGlobal(Weights&);
- void Write(ofstream&);
- bool operator< (const Hypo&) const;
- // bool CompareLikelihoods (const Hypo&, const Hypo&) const;
-};
-
-#endif
diff --git a/contrib/reranking/src/Main.cpp b/contrib/reranking/src/Main.cpp
deleted file mode 100644
index 4a20b013c..000000000
--- a/contrib/reranking/src/Main.cpp
+++ /dev/null
@@ -1,98 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: Main.cpp
- * command line interface
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-#include <iostream>
-#include <fstream>
-#include "ParameterNBest.h"
-#include "NBest.h"
-#include "Tools.h"
-
-#include "../../../moses/src/Util.h"
-
-
-using namespace std;
-
-int main (int argc, char *argv[])
-{
- // parse parameters
- ParameterNBest *parameter = new ParameterNBest();
- if (!parameter->LoadParam(argc, argv)) {
- parameter->Explain();
- delete parameter;
- return 1;
- }
-
- // read input
- ifstream inpf;
- PARAM_VEC p=parameter->GetParam("input-file");
- if (p.size()<1 || p.size()>2) Error("The option -input-file requires one or two arguments");
- int in_n=p.size()>1 ? Moses::Scan<int>(p[1]) : 0;
- cout << "NBest version 0.1, written by Holger.Schwenk@lium.univ-lemans.fr" << endl
- << " - reading input from file '" << p[0] << "'";
- if (in_n>0) cout << " (limited to the first " << in_n << " hypothesis)";
- cout << endl;
- inpf.open(p[0].c_str());
- if (inpf.fail()) {
- perror ("ERROR");
- exit(1);
- }
-
- // open output
- ofstream outf;
- p=parameter->GetParam("output-file");
- if (p.size()<1 || p.size()>2) Error("The option -output-file requires one or two arguments");
- int out_n=p.size()>1 ? Moses::Scan<int>(p[1]) : 0;
- cout << " - writing output to file '" << p[0] << "'";
- if (out_n>0) cout << " (limited to the first " << out_n << " hypothesis)";
- cout << endl;
- outf.open(p[0].c_str());
- if (outf.fail()) {
- perror ("ERROR");
- exit(1);
- }
-
- // eventually read weights
- Weights w;
- int do_calc=false;
- if (parameter->isParamSpecified("weights")) {
- p=parameter->GetParam("weights");
- if (p.size()<1) Error("The option -weights requires one argument");
- cout << " - reading weights from file '" << p[0] << "'";
- int n=w.Read(p[0].c_str());
- cout << " (found " << n << " values)" << endl;
- do_calc=true;
- cout << " - recalculating global scores" << endl;
- }
-
- // shall we sort ?
- bool do_sort = parameter->isParamSpecified("sort");
- if (do_sort) cout << " - sorting global scores" << endl;
-
- // main loop
- int nb_sent=0, nb_nbest=0;
- while (!inpf.eof()) {
- NBest nbest(inpf, in_n);
-
- if (do_calc) nbest.CalcGlobal(w);
- if (do_sort) nbest.Sort();
- nbest.Write(outf, out_n);
-
- nb_sent++;
- nb_nbest+=nbest.NbNBest();
- }
- inpf.close();
- outf.close();
-
- // display final statistics
- cout << " - processed " << nb_nbest << " n-best hypotheses in " << nb_sent << " sentences"
- << " (average " << (float) nb_nbest/nb_sent << ")" << endl;
-
- return 0;
-}
diff --git a/contrib/reranking/src/Makefile b/contrib/reranking/src/Makefile
deleted file mode 100644
index c2711741e..000000000
--- a/contrib/reranking/src/Makefile
+++ /dev/null
@@ -1,18 +0,0 @@
-
-# where to find include files and libraries from Moses
-MOSES_INC=../../../moses/src ../../..
-LIB_DIR=../../../moses/src/
-
-LIBS=-lmoses -lz
-OBJS=Main.o NBest.o Hypo.o Tools.o ParameterNBest.o
-
-CFLAGS=-I$(MOSES_INC)
-
-nbest-tool: $(OBJS)
- $(CXX) -o nbest $(OBJS) -L$(LIB_DIR) $(LIBS)
-
-%.o: %.cpp
- $(CXX) $(CFLAGS) -o $@ -c $<
-
-clean:
- -rm $(OBJS) nbest
diff --git a/contrib/reranking/src/NBest.cpp b/contrib/reranking/src/NBest.cpp
deleted file mode 100644
index 24a0f60c3..000000000
--- a/contrib/reranking/src/NBest.cpp
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: NBest.cpp
- * basic functions on n-best lists
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-
-#include "NBest.h"
-
-#include "Util.h" // from Moses
-
-#include <sstream>
-#include <algorithm>
-
-//NBest::NBest() {
-//cerr << "NBEST: constructor called" << endl;
-//}
-
-
-bool NBest::ParseLine(ifstream &inpf, const int n)
-{
- static string line; // used internally to buffer an input line
- static int prev_id=-1; // used to detect a change of the n-best ID
- int id;
- vector<float> f;
- float s;
- int pos=0, epos;
- vector<string> blocks;
-
-
- if (line.empty()) {
- getline(inpf,line);
- if (inpf.eof()) return false;
- }
-
- // split line into blocks
- //cerr << "PARSE line: " << line << endl;
- while ((epos=line.find(NBEST_DELIM,pos))!=string::npos) {
- blocks.push_back(line.substr(pos,epos-pos));
- // cerr << " block: " << blocks.back() << endl;
- pos=epos+strlen(NBEST_DELIM);
- }
- blocks.push_back(line.substr(pos,line.size()));
- // cerr << " block: " << blocks.back() << endl;
-
- if (blocks.size()<4) {
- cerr << line << endl;
- Error("can't parse the above line");
- }
-
- // parse ID
- id=Scan<int>(blocks[0]);
- if (prev_id>=0 && id!=prev_id) {
- prev_id=id; // new nbest list has started
- return false;
- }
- prev_id=id;
- //cerr << "same ID " << id << endl;
-
- if (n>0 && nbest.size() >= n) {
- //cerr << "skipped" << endl;
- line.clear();
- return true; // skip parsing of unused hypos
- }
-
- // parse feature function scores
- //cerr << "PARSE features: '" << blocks[2] << "' size: " << blocks[2].size() << endl;
- pos=blocks[2].find_first_not_of(' ');
- while (pos<blocks[2].size() && (epos=blocks[2].find(" ",pos))!=string::npos) {
- string feat=blocks[2].substr(pos,epos-pos);
- //cerr << " feat: '" << feat << "', pos: " << pos << ", " << epos << endl;
- if (feat.find(":",0)!=string::npos) {
- //cerr << " name: " << feat << endl;
- } else {
- f.push_back(Scan<float>(feat));
- //cerr << " value: " << f.back() << endl;
- }
- pos=epos+1;
- }
-
- // eventually parse segmentation
- if (blocks.size()>4) {
- Error("parsing segmentation not yet supported");
- }
-
- nbest.push_back(Hypo(id, blocks[1], f, Scan<float>(blocks[3])));
-
- line.clear(); // force read of new line
-
- return true;
-}
-
-
-NBest::NBest(ifstream &inpf, const int n)
-{
- //cerr << "NBEST: constructor with file called" << endl;
- while (ParseLine(inpf,n));
- //cerr << "NBEST: found " << nbest.size() << " lines" << endl;
-}
-
-
-NBest::~NBest()
-{
- //cerr << "NBEST: destructor called" << endl;
-}
-
-void NBest::Write(ofstream &outf, int n)
-{
- if (n<1 || n>nbest.size()) n=nbest.size();
- for (int i=0; i<n; i++) nbest[i].Write(outf);
-}
-
-
-float NBest::CalcGlobal(Weights &w)
-{
- //cerr << "NBEST: calc global of size " << nbest.size() << endl;
- for (vector<Hypo>::iterator i = nbest.begin(); i != nbest.end(); i++) {
- (*i).CalcGlobal(w);
- }
-}
-
-
-void NBest::Sort()
-{
- sort(nbest.begin(),nbest.end());
-}
-
diff --git a/contrib/reranking/src/NBest.h b/contrib/reranking/src/NBest.h
deleted file mode 100644
index 9a4aa9447..000000000
--- a/contrib/reranking/src/NBest.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: NBest.h
- * basic functions on n-best lists
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-
-#ifndef _NBEST_H_
-#define _NBEST_H_
-
-using namespace std;
-
-#include <iostream>
-#include <fstream>
-#include <string>
-#include <vector>
-
-#include "Tools.h"
-#include "Hypo.h"
-
-class NBest
-{
- int id;
- string src;
- vector<Hypo> nbest;
- bool ParseLine(ifstream &inpf, const int n);
-public:
- NBest(ifstream&, const int=0);
- ~NBest();
- int NbNBest() {
- return nbest.size();
- };
- float CalcGlobal(Weights&);
- void Sort(); // largest values first
- void Write(ofstream&, int=0);
-};
-
-void Error(char *msg);
-
-#endif
diff --git a/contrib/reranking/src/ParameterNBest.cpp b/contrib/reranking/src/ParameterNBest.cpp
deleted file mode 100644
index 005f3890c..000000000
--- a/contrib/reranking/src/ParameterNBest.cpp
+++ /dev/null
@@ -1,337 +0,0 @@
-// $Id: $
-
-/***********************************************************************
-nbest - tool to process Moses n-best list
-Copyright (C) 2008 Holger Schwenk, University of Le Mans, France
-
-This library is free software; you can redistribute it and/or
-modify it under the terms of the GNU Lesser General Public
-License as published by the Free Software Foundation; either
-version 2.1 of the License, or (at your option) any later version.
-
-This library is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this library; if not, write to the Free Software
-Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
-***********************************************************************/
-
-#include <iostream>
-#include <iterator>
-#include <fstream>
-#include <sstream>
-#include <algorithm>
-#include "ParameterNBest.h"
-#include "Tools.h"
-
-#include "Util.h" // from Moses
-#include "InputFileStream.h"
-#include "UserMessage.h"
-
-using namespace std;
-
-/** define allowed parameters */
-ParameterNBest::ParameterNBest()
-{
- AddParam("input-file", "i", "file name of the input n-best list");
- AddParam("output-file", "o", "file name of the output n-best list");
- AddParam("recalc", "r", "recalc global scores");
- AddParam("weights", "w", "coefficients of the feature functions");
- AddParam("sort", "s", "sort n-best list according to the global scores");
- AddParam("lexical", "l", "report number of lexically different hypothesis");
-}
-
-ParameterNBest::~ParameterNBest()
-{
-}
-
-/** initialize a parameter, sub of constructor */
-void ParameterNBest::AddParam(const string &paramName, const string &description)
-{
- m_valid[paramName] = true;
- m_description[paramName] = description;
-}
-
-/** initialize a parameter (including abbreviation), sub of constructor */
-void ParameterNBest::AddParam(const string &paramName, const string &abbrevName, const string &description)
-{
- m_valid[paramName] = true;
- m_valid[abbrevName] = true;
- m_abbreviation[paramName] = abbrevName;
- m_description[paramName] = description;
-}
-
-/** print descriptions of all parameters */
-void ParameterNBest::Explain()
-{
- cerr << "Usage:" << endl;
- for(PARAM_STRING::const_iterator iterParam = m_description.begin(); iterParam != m_description.end(); iterParam++) {
- const string paramName = iterParam->first;
- const string paramDescription = iterParam->second;
- cerr << "\t-" << paramName;
- PARAM_STRING::const_iterator iterAbbr = m_abbreviation.find( paramName );
- if ( iterAbbr != m_abbreviation.end() )
- cerr << " (" << iterAbbr->second << ")";
- cerr << ": " << paramDescription << endl;
- }
-}
-
-/** check whether an item on the command line is a switch or a value
- * \param token token on the command line to checked **/
-
-bool ParameterNBest::isOption(const char* token)
-{
- if (! token) return false;
- std::string tokenString(token);
- size_t length = tokenString.size();
- if (length > 0 && tokenString.substr(0,1) != "-") return false;
- if (length > 1 && tokenString.substr(1,1).find_first_not_of("0123456789") == 0) return true;
- return false;
-}
-
-/** load all parameters from the configuration file and the command line switches */
-bool ParameterNBest::LoadParam(const string &filePath)
-{
- const char *argv[] = {"executable", "-f", filePath.c_str() };
- return LoadParam(3, (char**) argv);
-}
-
-/** load all parameters from the configuration file and the command line switches */
-bool ParameterNBest::LoadParam(int argc, char* argv[])
-{
- // config file (-f) arg mandatory
- string configPath;
- /*
- if ( (configPath = FindParam("-f", argc, argv)) == ""
- && (configPath = FindParam("-config", argc, argv)) == "")
- {
- PrintCredit();
-
- UserMessage::Add("No configuration file was specified. Use -config or -f");
- return false;
- }
- else
- {
- if (!ReadConfigFile(configPath))
- {
- UserMessage::Add("Could not read "+configPath);
- return false;
- }
- }
- */
-
- // overwrite parameters with values from switches
- for(PARAM_STRING::const_iterator iterParam = m_description.begin(); iterParam != m_description.end(); iterParam++) {
- const string paramName = iterParam->first;
- OverwriteParam("-" + paramName, paramName, argc, argv);
- }
-
- // ... also shortcuts
- for(PARAM_STRING::const_iterator iterParam = m_abbreviation.begin(); iterParam != m_abbreviation.end(); iterParam++) {
- const string paramName = iterParam->first;
- const string paramShortName = iterParam->second;
- OverwriteParam("-" + paramShortName, paramName, argc, argv);
- }
-
- // logging of parameters that were set in either config or switch
- int verbose = 1;
- if (m_setting.find("verbose") != m_setting.end() &&
- m_setting["verbose"].size() > 0)
- verbose = Scan<int>(m_setting["verbose"][0]);
- if (verbose >= 1) { // only if verbose
- TRACE_ERR( "Defined parameters (per moses.ini or switch):" << endl);
- for(PARAM_MAP::const_iterator iterParam = m_setting.begin() ; iterParam != m_setting.end(); iterParam++) {
- TRACE_ERR( "\t" << iterParam->first << ": ");
- for ( size_t i = 0; i < iterParam->second.size(); i++ )
- TRACE_ERR( iterParam->second[i] << " ");
- TRACE_ERR( endl);
- }
- }
-
- // check for illegal parameters
- bool noErrorFlag = true;
- for (int i = 0 ; i < argc ; i++) {
- if (isOption(argv[i])) {
- string paramSwitch = (string) argv[i];
- string paramName = paramSwitch.substr(1);
- if (m_valid.find(paramName) == m_valid.end()) {
- UserMessage::Add("illegal switch: " + paramSwitch);
- noErrorFlag = false;
- }
- }
- }
-
- // check if parameters make sense
- return Validate() && noErrorFlag;
-}
-
-/** check that parameter settings make sense */
-bool ParameterNBest::Validate()
-{
- bool noErrorFlag = true;
-
- // required parameters
- if (m_setting["input-file"].size() == 0) {
- UserMessage::Add("No input-file");
- noErrorFlag = false;
- }
-
- if (m_setting["output-file"].size() == 0) {
- UserMessage::Add("No output-file");
- noErrorFlag = false;
- }
-
- if (m_setting["recalc"].size() > 0 && m_setting["weights"].size()==0) {
- UserMessage::Add("you need to spezify weight when recalculating global scores");
- noErrorFlag = false;
- }
-
-
- return noErrorFlag;
-}
-
-/** check whether a file exists */
-bool ParameterNBest::FilesExist(const string &paramName, size_t tokenizeIndex,std::vector<std::string> const& extensions)
-{
- typedef std::vector<std::string> StringVec;
- StringVec::const_iterator iter;
-
- PARAM_MAP::const_iterator iterParam = m_setting.find(paramName);
- if (iterParam == m_setting.end()) {
- // no param. therefore nothing to check
- return true;
- }
- const StringVec &pathVec = (*iterParam).second;
- for (iter = pathVec.begin() ; iter != pathVec.end() ; ++iter) {
- StringVec vec = Tokenize(*iter);
- if (tokenizeIndex >= vec.size()) {
- stringstream errorMsg("");
- errorMsg << "Expected at least " << (tokenizeIndex+1) << " tokens per emtry in '"
- << paramName << "', but only found "
- << vec.size();
- UserMessage::Add(errorMsg.str());
- return false;
- }
- const string &pathStr = vec[tokenizeIndex];
-
- bool fileFound=0;
- for(size_t i=0; i<extensions.size() && !fileFound; ++i) {
- fileFound|=FileExists(pathStr + extensions[i]);
- }
- if(!fileFound) {
- stringstream errorMsg("");
- errorMsg << "File " << pathStr << " does not exist";
- UserMessage::Add(errorMsg.str());
- return false;
- }
- }
- return true;
-}
-
-/** look for a switch in arg, update parameter */
-// TODO arg parsing like this does not belong in the library, it belongs
-// in moses-cmd
-string ParameterNBest::FindParam(const string &paramSwitch, int argc, char* argv[])
-{
- for (int i = 0 ; i < argc ; i++) {
- if (string(argv[i]) == paramSwitch) {
- if (i+1 < argc) {
- return argv[i+1];
- } else {
- stringstream errorMsg("");
- errorMsg << "Option " << paramSwitch << " requires a parameter!";
- UserMessage::Add(errorMsg.str());
- // TODO return some sort of error, not the empty string
- }
- }
- }
- return "";
-}
-
-/** update parameter settings with command line switches
- * \param paramSwitch (potentially short) name of switch
- * \param paramName full name of parameter
- * \param argc number of arguments on command line
- * \param argv values of paramters on command line */
-void ParameterNBest::OverwriteParam(const string &paramSwitch, const string &paramName, int argc, char* argv[])
-{
- int startPos = -1;
- for (int i = 0 ; i < argc ; i++) {
- if (string(argv[i]) == paramSwitch) {
- startPos = i+1;
- break;
- }
- }
- if (startPos < 0)
- return;
-
- int index = 0;
- m_setting[paramName]; // defines the parameter, important for boolean switches
- while (startPos < argc && (!isOption(argv[startPos]))) {
- if (m_setting[paramName].size() > (size_t)index)
- m_setting[paramName][index] = argv[startPos];
- else
- m_setting[paramName].push_back(argv[startPos]);
- index++;
- startPos++;
- }
-}
-
-
-/** read parameters from a configuration file */
-bool ParameterNBest::ReadConfigFile( string filePath )
-{
- InputFileStream inFile(filePath);
- string line, paramName;
- while(getline(inFile, line)) {
- // comments
- size_t comPos = line.find_first_of("#");
- if (comPos != string::npos)
- line = line.substr(0, comPos);
- // trim leading and trailing spaces/tabs
- line = Trim(line);
-
- if (line[0]=='[') {
- // new parameter
- for (size_t currPos = 0 ; currPos < line.size() ; currPos++) {
- if (line[currPos] == ']') {
- paramName = line.substr(1, currPos - 1);
- break;
- }
- }
- } else if (line != "") {
- // add value to parameter
- m_setting[paramName].push_back(line);
- }
- }
- return true;
-}
-
-
-void ParameterNBest::PrintCredit()
-{
- cerr << "NBest - A tool to process Moses n-best lists" << endl
- << "Copyright (C) 2008 Holger Schwenk" << endl << endl
-
- << "This library is free software; you can redistribute it and/or" << endl
- << "modify it under the terms of the GNU Lesser General Public" << endl
- << "License as published by the Free Software Foundation; either" << endl
- << "version 2.1 of the License, or (at your option) any later version." << endl << endl
-
- << "This library is distributed in the hope that it will be useful," << endl
- << "but WITHOUT ANY WARRANTY; without even the implied warranty of" << endl
- << "MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU" << endl
- << "Lesser General Public License for more details." << endl << endl
-
- << "You should have received a copy of the GNU Lesser General Public" << endl
- << "License along with this library; if not, write to the Free Software" << endl
- << "Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA" << endl << endl
- << "***********************************************************************" << endl << endl
- << "Built on " << __DATE__ << endl << endl
-
- << "Written by Holger Schwenk, Holger.Schwenk@lium.univ-lemans.fr" << endl << endl;
-}
-
diff --git a/contrib/reranking/src/ParameterNBest.h b/contrib/reranking/src/ParameterNBest.h
deleted file mode 100644
index bc554d4b9..000000000
--- a/contrib/reranking/src/ParameterNBest.h
+++ /dev/null
@@ -1,76 +0,0 @@
-// $Id: $
-
-/***********************************************************************
-nbest - tool to process Moses n-best list
-Copyright (C) 2008 Holger Schwenk, University of Le Mans, France
-
-This library is free software; you can redistribute it and/or
-modify it under the terms of the GNU Lesser General Public
-License as published by the Free Software Foundation; either
-version 2.1 of the License, or (at your option) any later version.
-
-This library is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-Lesser General Public License for more details.
-
-You should have received a copy of the GNU Lesser General Public
-License along with this library; if not, write to the Free Software
-Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
-***********************************************************************/
-
-#ifndef _PARAMETER_NBEST_H_
-#define _PARAMETER_NBEST_H_
-
-#include <string>
-#include <map>
-#include <vector>
-#include "TypeDef.h"
-
-typedef std::vector<std::string> PARAM_VEC;
-typedef std::map<std::string, PARAM_VEC > PARAM_MAP;
-typedef std::map<std::string, bool> PARAM_BOOL;
-typedef std::map<std::string, std::string > PARAM_STRING;
-
-/** Handles parameter values set in config file or on command line.
- * Process raw parameter data (names and values as strings) for StaticData
- * to parse; to get useful values, see StaticData. */
-class ParameterNBest
-{
-protected:
- PARAM_MAP m_setting;
- PARAM_BOOL m_valid;
- PARAM_STRING m_abbreviation;
- PARAM_STRING m_description;
-
- std::string FindParam(const std::string &paramSwitch, int argc, char* argv[]);
- void OverwriteParam(const std::string &paramSwitch, const std::string &paramName, int argc, char* argv[]);
- bool ReadConfigFile( std::string filePath );
- bool FilesExist(const std::string &paramName, size_t tokenizeIndex,std::vector<std::string> const& fileExtension=std::vector<std::string>(1,""));
- bool isOption(const char* token);
- bool Validate();
-
- void AddParam(const std::string &paramName, const std::string &description);
- void AddParam(const std::string &paramName, const std::string &abbrevName, const std::string &description);
-
- void PrintCredit();
-
-public:
- ParameterNBest();
- ~ParameterNBest();
- bool LoadParam(int argc, char* argv[]);
- bool LoadParam(const std::string &filePath);
- void Explain();
-
- /** return a vector of strings holding the whitespace-delimited values on the ini-file line corresponding to the given parameter name */
- const PARAM_VEC &GetParam(const std::string &paramName) {
- return m_setting[paramName];
- }
- /** check if parameter is defined (either in moses.ini or as switch) */
- bool isParamSpecified(const std::string &paramName) {
- return m_setting.find( paramName ) != m_setting.end();
- }
-
-};
-
-#endif
diff --git a/contrib/reranking/src/Tools.cpp b/contrib/reranking/src/Tools.cpp
deleted file mode 100644
index 8312c3370..000000000
--- a/contrib/reranking/src/Tools.cpp
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: Tools.cpp
- * basic utility functions
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-#include "Tools.h"
-
-int Weights::Read(const char *fname)
-{
- ifstream inpf;
-
- inpf.open(fname);
- if (inpf.fail()) {
- perror ("ERROR");
- exit(1);
- }
-
- float f;
- while (inpf >> f) val.push_back(f);
-
- inpf.close();
- return val.size();
-}
-
diff --git a/contrib/reranking/src/Tools.h b/contrib/reranking/src/Tools.h
deleted file mode 100644
index eb71746b0..000000000
--- a/contrib/reranking/src/Tools.h
+++ /dev/null
@@ -1,73 +0,0 @@
-/*
- * nbest: tool to process moses n-best lists
- *
- * File: Tools.cpp
- * basic utility functions
- *
- * Created by Holger Schwenk, University of Le Mans, 05/16/2008
- *
- */
-
-
-#ifndef _TOOLS_H_
-#define _TOOLS_H_
-
-using namespace std;
-
-#include <iostream>
-#include <fstream>
-#include <vector>
-
-class Weights
-{
- vector<float> val;
-public:
- Weights() {};
- ~Weights() {};
- int Read(const char *);
- friend class Hypo;
-};
-
-//******************************************************
-
-/*
-template<typename T>
-inline T Scan(const std::string &input)
-{
- std::stringstream stream(input);
- T ret;
- stream >> ret;
- return ret;
-}
-*/
-
-//******************************************************
-
-inline void Error (char *msg)
-{
- cerr << "ERROR: " << msg << endl;
- exit(1);
-}
-
-//******************************************************
-// From Moses code:
-
-
-/*
- * Outputting debugging/verbose information to stderr.
- * Use TRACE_ENABLE flag to redirect tracing output into oblivion
- * so that you can output your own ad-hoc debugging info.
- * However, if you use stderr diretly, please delete calls to it once
- * you finished debugging so that it won't clutter up.
- * Also use TRACE_ENABLE to turn off output of any debugging info
- * when compiling for a gui front-end so that running gui won't generate
- * output on command line
- * */
-#ifdef TRACE_ENABLE
-#define TRACE_ERR(str) std::cerr << str
-#else
-#define TRACE_ERR(str) {}
-#endif
-
-#endif
-
diff --git a/contrib/sigtest-filter/Makefile b/contrib/sigtest-filter/Makefile
index ddefc907b..71de9c45f 100644
--- a/contrib/sigtest-filter/Makefile
+++ b/contrib/sigtest-filter/Makefile
@@ -1,5 +1,5 @@
SALMDIR=/Users/hieuhoang/workspace/salm
-FLAVOR?=o32
+FLAVOR?=o64
INC=-I$(SALMDIR)/Src/Shared -I$(SALMDIR)/Src/SuffixArrayApplications -I$(SALMDIR)/Src/SuffixArrayApplications/SuffixArraySearch
OBJS=$(SALMDIR)/Distribution/Linux/Objs/Search/_SuffixArrayApplicationBase.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Search/_SuffixArraySearchApplicationBase.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Shared/_String.$(FLAVOR) $(SALMDIR)/Distribution/Linux/Objs/Shared/_IDVocabulary.$(FLAVOR)
diff --git a/contrib/sigtest-filter/filter-pt.cpp b/contrib/sigtest-filter/filter-pt.cpp
index 05a32967f..f06d2b430 100644
--- a/contrib/sigtest-filter/filter-pt.cpp
+++ b/contrib/sigtest-filter/filter-pt.cpp
@@ -17,14 +17,15 @@
#include <unistd.h>
#endif
-typedef std::set<TextLenType> SentIdSet;
-typedef std::map<std::string, SentIdSet> PhraseSetMap;
+typedef std::vector<TextLenType> SentIdSet;
+typedef std::pair<SentIdSet, clock_t> ClockedSentIdSet;
+typedef std::map<std::string, ClockedSentIdSet> PhraseSetMap;
#undef min
// constants
-const size_t MINIMUM_SIZE_TO_KEEP = 10000; // reduce this to improve memory usage,
-// increase for speed
+const size_t MINIMUM_SIZE_TO_KEEP = 10000; // increase this to improve memory usage,
+// reduce for speed
const std::string SEPARATOR = " ||| ";
const double ALPHA_PLUS_EPS = -1000.0; // dummy value
@@ -38,6 +39,7 @@ double sig_filter_limit = 0; // keep phrase pairs with -log(sig) > si
// higher = filter-more
bool pef_filter_only = false; // only filter based on pef
bool hierarchical = false;
+int max_cache = 0;
// globals
PhraseSetMap esets;
@@ -62,7 +64,8 @@ void usage()
<< " [-n num ] 0, 1...: 0=no filtering, >0 sort by P(e|f) and keep the top num elements\n"
<< " [-c ] add the cooccurrence counts to the phrase table\n"
<< " [-p ] add -log(significance) to the phrasetable\n"
- << " [-h ] filter hierarchical rule table\n";
+ << " [-h ] filter hierarchical rule table\n"
+ << " [-m num ] limit cache to num most recent phrases\n";
exit(1);
}
@@ -199,20 +202,10 @@ double fisher_exact(int cfe, int ce, int cf)
}
template <class setType>
-setType unordered_set_intersect(setType & set_1, setType & set_2)
+setType ordered_set_intersect(setType & set_1, setType & set_2)
{
setType set_out;
-
- if (set_1.size() < set_2.size()) {
- for (SentIdSet::iterator i=set_1.begin(); i != set_1.end(); ++i) {
- if (set_2.find(*i) != set_2.end()) set_out.insert(*i);
- }
- }
- else {
- for (SentIdSet::iterator i=set_2.begin(); i != set_2.end(); ++i) {
- if (set_1.find(*i) != set_1.end()) set_out.insert(*i);
- }
- }
+ std::set_intersection(set_1.begin(), set_1.end(), set_2.begin(), set_2.end(), inserter(set_out,set_out.begin()) );
return set_out;
}
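The hunk above swaps a hand-rolled `set::find` loop for `std::set_intersection` over sorted vectors, which runs in O(|a| + |b|) on sorted input. A minimal self-contained sketch of the new helper, with a plain `unsigned` standing in for SALM's `TextLenType` (an assumption for illustration only):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

typedef unsigned TextLenType;               // stand-in for SALM's sentence-id type
typedef std::vector<TextLenType> SentIdSet;

// Both inputs must already be sorted; std::set_intersection merges them
// linearly and writes the common elements to out.
SentIdSet ordered_set_intersect(const SentIdSet &a, const SentIdSet &b) {
    SentIdSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```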
@@ -227,8 +220,13 @@ SentIdSet lookup_phrase(const std::string & phrase, C_SuffixArraySearchApplicati
cerr<<"No occurrences found!!\n";
}
for (vector<S_SimplePhraseLocationElement>::iterator i=locations.begin(); i != locations.end(); ++i) {
- occur_set.insert(i->sentIdInCorpus);
+ occur_set.push_back(i->sentIdInCorpus);
}
+
+ std::sort(occur_set.begin(), occur_set.end());
+ SentIdSet::iterator it = std::unique(occur_set.begin(), occur_set.end());
+ occur_set.resize(it - occur_set.begin());
+
return occur_set;
}
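Because `occur_set` is now a plain vector rather than a `std::set`, it must be sorted and deduplicated before `set_intersection` and `binary_search` can operate on it; the three added lines above are the standard sort/unique/resize idiom. A standalone sketch under the same simplified id type:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef unsigned TextLenType;               // stand-in for SALM's sentence-id type
typedef std::vector<TextLenType> SentIdSet;

// Deduplicate a vector in place: sort, move duplicates to the tail with
// std::unique, then shrink the vector to the unique prefix.
void sort_unique(SentIdSet &v) {
    std::sort(v.begin(), v.end());
    SentIdSet::iterator it = std::unique(v.begin(), v.end());
    v.resize(it - v.begin());
}
```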
@@ -243,22 +241,28 @@ SentIdSet lookup_multiple_phrases(vector<std::string> & phrases, C_SuffixArraySe
else {
SentIdSet main_set;
- SentIdSet & first_set = cache[phrases.front()];
+ ClockedSentIdSet & clocked_first_set = cache[phrases.front()];
+ SentIdSet & first_set = clocked_first_set.first;
+ clocked_first_set.second = clock();
+
bool first = true;
if (first_set.empty()) {
first_set = lookup_phrase(phrases.front(), my_sa);
}
for (vector<std::string>::iterator phrase=phrases.begin()+1; phrase != phrases.end(); ++phrase) {
- SentIdSet & temp_set = cache[*phrase];
+ ClockedSentIdSet & clocked_temp_set = cache[*phrase];
+ SentIdSet & temp_set = clocked_temp_set.first;
+ clocked_temp_set.second = clock();
+
if (temp_set.empty()) {
temp_set = lookup_phrase(*phrase, my_sa);
}
if (first) {
- main_set = unordered_set_intersect(first_set,temp_set);
+ main_set = ordered_set_intersect(first_set,temp_set);
first = false;
}
else {
- main_set = unordered_set_intersect(main_set,temp_set);
+ main_set = ordered_set_intersect(main_set,temp_set);
}
if (temp_set.size() < MINIMUM_SIZE_TO_KEEP) {
cache.erase(*phrase);
@@ -328,7 +332,9 @@ void compute_cooc_stats_and_filter(std::vector<PTEntry*>& options)
for (std::vector<PTEntry*>::iterator i=options.begin(); i != options.end(); ++i) {
const std::string& e_phrase = (*i)->e_phrase;
size_t cef=0;
- SentIdSet& eset = esets[e_phrase];
+ ClockedSentIdSet& clocked_eset = esets[e_phrase];
+ SentIdSet & eset = clocked_eset.first;
+ clocked_eset.second = clock();
if (eset.empty()) {
eset = find_occurrences(e_phrase, e_sa, esets);
//std::cerr << "Looking up e-phrase: " << e_phrase << "\n";
@@ -336,11 +342,11 @@ void compute_cooc_stats_and_filter(std::vector<PTEntry*>& options)
size_t ce=eset.size();
if (ce < cf) {
for (SentIdSet::iterator i=eset.begin(); i != eset.end(); ++i) {
- if (fset.find(*i) != fset.end()) cef++;
+ if (std::binary_search(fset.begin(), fset.end(), *i)) cef++;
}
} else {
for (SentIdSet::iterator i=fset.begin(); i != fset.end(); ++i) {
- if (eset.find(*i) != eset.end()) cef++;
+ if (std::binary_search(eset.begin(), eset.end(), *i)) cef++;
}
}
double nlp = -log(fisher_exact(cef, cf, ce));
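With both id sets stored as sorted vectors, the membership test becomes `std::binary_search` over the larger set while iterating the smaller one, giving O(min * log(max)) without any tree or hash overhead. A sketch of the counting loop, with a hypothetical `count_cooc` helper and simplified types (neither appears in the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef unsigned TextLenType;               // stand-in for SALM's sentence-id type
typedef std::vector<TextLenType> SentIdSet;

// Count co-occurring sentence ids: iterate the smaller sorted vector and
// binary-search each id in the larger sorted vector.
size_t count_cooc(const SentIdSet &small_set, const SentIdSet &large_set) {
    size_t cef = 0;
    for (SentIdSet::const_iterator i = small_set.begin(); i != small_set.end(); ++i)
        if (std::binary_search(large_set.begin(), large_set.end(), *i))
            ++cef;
    return cef;
}
```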
@@ -356,13 +362,28 @@ void compute_cooc_stats_and_filter(std::vector<PTEntry*>& options)
options.erase(new_end,options.end());
}
+void prune_cache(PhraseSetMap & psm) {
+ if(max_cache && psm.size() > max_cache) {
+ std::vector<clock_t> clocks;
+ for(PhraseSetMap::iterator it = psm.begin(); it != psm.end(); it++)
+ clocks.push_back(it->second.second);
+
+ std::sort(clocks.begin(), clocks.end());
+ clock_t out = clocks[psm.size()-max_cache];
+
+  for(PhraseSetMap::iterator it = psm.begin(); it != psm.end(); )
+   if(it->second.second < out)
+    psm.erase(it++); // erase(it++): advance before erasing so the iterator stays valid
+   else
+    ++it;
+ }
+}
+
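`prune_cache` keeps only the `max_cache` most recently touched entries: it collects the recorded `clock()` stamps, sorts them, and evicts every entry older than the cutoff stamp (ties at the cutoff survive). A self-contained sketch of this policy, with a simplified map value type standing in for the cached id sets; note the `erase(it++)` idiom, which advances the iterator before `std::map::erase` invalidates it:

```cpp
#include <algorithm>
#include <cassert>
#include <ctime>
#include <map>
#include <string>
#include <vector>

// Simplified: the real code caches SentIdSet payloads; an empty vector
// stands in here so the sketch stays focused on the eviction policy.
typedef std::pair<std::vector<unsigned>, clock_t> ClockedSet;
typedef std::map<std::string, ClockedSet> PhraseSetMap;

void prune_cache(PhraseSetMap &psm, size_t max_cache) {
    if (max_cache == 0 || psm.size() <= max_cache) return;

    // Gather and sort all timestamps; the (size - max_cache)-th smallest
    // is the cutoff below which entries are evicted.
    std::vector<clock_t> clocks;
    for (PhraseSetMap::iterator it = psm.begin(); it != psm.end(); ++it)
        clocks.push_back(it->second.second);
    std::sort(clocks.begin(), clocks.end());
    clock_t cutoff = clocks[psm.size() - max_cache];

    for (PhraseSetMap::iterator it = psm.begin(); it != psm.end(); )
        if (it->second.second < cutoff)
            psm.erase(it++);  // safe: it is advanced before the erase
        else
            ++it;
}
```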
int main(int argc, char * argv[])
{
int c;
const char* efile=0;
const char* ffile=0;
int pfe_index = 2;
- while ((c = getopt(argc, argv, "cpf:e:i:n:l:h")) != -1) {
+ while ((c = getopt(argc, argv, "cpf:e:i:n:l:m:h")) != -1) {
switch (c) {
case 'e':
efile = optarg;
@@ -386,6 +407,9 @@ int main(int argc, char * argv[])
case 'h':
hierarchical = true;
break;
+ case 'm':
+ max_cache = atoi(optarg);
+ break;
case 'l':
std::cerr << "-l = " << optarg << "\n";
if (strcmp(optarg,"a+e") == 0) {
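The new `-m` option is wired through `getopt` like the existing flags, with `atoi` converting `optarg` (a malformed value silently yields 0, i.e. no cache limit). A minimal POSIX-only sketch; the `parse_max_cache` helper is hypothetical and not part of the patch:

```cpp
#include <cassert>
#include <cstdlib>
#include <unistd.h>  // POSIX getopt; not available as-is on Windows

// Parse -m <num> the way the patch does: getopt hands us optarg and
// atoi converts it, returning 0 (cache limit disabled) on garbage input.
int parse_max_cache(int argc, char **argv) {
    int max_cache = 0;
    optind = 1;  // reset getopt state (BSD libcs may also need optreset = 1)
    int c;
    while ((c = getopt(argc, argv, "m:")) != -1) {
        if (c == 'm') max_cache = std::atoi(optarg);
    }
    return max_cache;
}
```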
@@ -442,9 +466,14 @@ int main(int argc, char * argv[])
size_t pt_lines = 0;
while(!cin.eof()) {
cin.getline(tmpString,10000,'\n');
- if(++pt_lines%10000==0) {
+ if(++pt_lines%10000==0) {
std::cerr << ".";
- if(pt_lines%500000==0) std::cerr << "[n:"<<pt_lines<<"]\n";
+
+ prune_cache(esets);
+ prune_cache(fsets);
+
+ if(pt_lines%500000==0)
+ std::cerr << "[n:"<<pt_lines<<"]\n";
}
if(strlen(tmpString)>0) {
diff --git a/contrib/tmcombine/README.md b/contrib/tmcombine/README.md
index 2d21b95c8..2cbc83299 100644
--- a/contrib/tmcombine/README.md
+++ b/contrib/tmcombine/README.md
@@ -58,7 +58,7 @@ Regression tests (check if the output files (`test/phrase-table_testN`) differ f
FURTHER NOTES
-------------
- - Different combination algorithms require different statistics. To be on the safe side, apply `train_model.patch` to `train_model.perl` and use the option `-phrase-word-alignment` when training models.
+ - Different combination algorithms require different statistics. To be on the safe side, use the options `-phrase-word-alignment` and `-write-lexical-counts` when training models.
- The script assumes that phrase tables are sorted (to allow incremental, more memory-friendly processing). Sort the tables with `LC_ALL=C`. Phrase tables produced by Moses are sorted correctly.
diff --git a/contrib/tmcombine/tmcombine.py b/contrib/tmcombine/tmcombine.py
index 3c02eaf45..6560ad23b 100755
--- a/contrib/tmcombine/tmcombine.py
+++ b/contrib/tmcombine/tmcombine.py
@@ -15,7 +15,7 @@
# Some general things to note:
-# - Different combination algorithms require different statistics. To be on the safe side, apply train_model.patch to train_model.perl and use the option -phrase-word-alignment for training all models.
+# - Different combination algorithms require different statistics. To be on the safe side, use the options `-phrase-word-alignment` and `-write-lexical-counts` when training models.
# - The script assumes that phrase tables are sorted (to allow incremental, more memory-friendly processing). sort with LC_ALL=C.
# - Some configurations require additional statistics that are loaded in memory (lexical tables; complete list of target phrases). If memory consumption is a problem, use the option --lowmem (slightly slower and writes temporary files to disk), or consider pruning your phrase table before combining (e.g. using Johnson et al. 2007).
# - The script can read/write gzipped files, but the Python implementation is slow. You're better off unzipping the files on the command line and working with the unzipped files.
@@ -362,6 +362,11 @@ class Moses():
# information specific to Moses model: alignment info and comment section with target and source counts
alignment,comments = self.phrase_pairs[src][target][1]
+ if alignment:
+ extra_space = b' '
+ else:
+ extra_space = b''
+
if mode == 'counts':
i_e2f = flags['i_e2f']
i_f2e = flags['i_f2e']
@@ -376,8 +381,7 @@ class Moses():
origin_features = b' '.join([b'%.4f' %(f) for f in origin_features]) + ' '
else:
origin_features = b''
-
- line = b"%s ||| %s ||| %s 2.718 %s||| %s ||| %s\n" %(src,target,features,origin_features,alignment,comments)
+ line = b"%s ||| %s ||| %s 2.718 %s||| %s%s||| %s\n" %(src,target,features,origin_features,alignment,extra_space,comments)
return line
@@ -1233,7 +1237,7 @@ def handle_file(filename,action,fileobj=None,mode='r'):
if 'counts' in filename and os.path.isdir(filename):
sys.stderr.write('For a weighted counts combination, we need statistics that Moses doesn\'t write to disk by default.\n')
- sys.stderr.write('Apply train_model.patch to train_model.perl and repeat step 4 of Moses training for all models.\n')
+ sys.stderr.write('Repeat step 4 of Moses training for all models with the option -write-lexical-counts.\n')
exit()
@@ -1327,7 +1331,7 @@ class Combine_TMs():
output_lexical: If defined, also writes combined lexical tables. Writes to output_lexical.e2f and output_lexical.f2e, or output_lexical.counts.e2f in mode 'counts'.
mode: declares the basic mixture-model algorithm. there are currently three options:
- 'counts': weighted counts (requires some statistics that Moses doesn't produce. Apply train_model.patch to train_model.perl and repeat step 4 of Moses training to obtain them.)
+ 'counts': weighted counts (requires some statistics that Moses doesn't produce. Repeat step 4 of Moses training with the option -write-lexical-counts to obtain them.)
Only the standard Moses features are recomputed from weighted counts; additional features are linearly interpolated
(see number_of_features to allow more features, and i_e2f etc. if the standard features are in a non-standard position)
'interpolate': linear interpolation
diff --git a/contrib/tmcombine/train_model.patch b/contrib/tmcombine/train_model.patch
deleted file mode 100644
index d422a1628..000000000
--- a/contrib/tmcombine/train_model.patch
+++ /dev/null
@@ -1,24 +0,0 @@
---- train-model.perl 2011-11-01 15:17:04.763230934 +0100
-+++ train-model.perl 2011-11-01 15:17:00.033229220 +0100
-@@ -1185,15 +1185,21 @@
-
- open(F2E,">$lexical_file.f2e") or die "ERROR: Can't write $lexical_file.f2e";
- open(E2F,">$lexical_file.e2f") or die "ERROR: Can't write $lexical_file.e2f";
-+ open(F2E2,">$lexical_file.counts.f2e") or die "ERROR: Can't write $lexical_file.counts.f2e";
-+ open(E2F2,">$lexical_file.counts.e2f") or die "ERROR: Can't write $lexical_file.counts.e2f";
-
- foreach my $f (keys %WORD_TRANSLATION) {
- foreach my $e (keys %{$WORD_TRANSLATION{$f}}) {
- printf F2E "%s %s %.7f\n",$e,$f,$WORD_TRANSLATION{$f}{$e}/$TOTAL_FOREIGN{$f};
- printf E2F "%s %s %.7f\n",$f,$e,$WORD_TRANSLATION{$f}{$e}/$TOTAL_ENGLISH{$e};
-+ printf F2E2 "%s %s %i %i\n",$e,$f,$WORD_TRANSLATION{$f}{$e},$TOTAL_FOREIGN{$f};
-+ printf E2F2 "%s %s %i %i\n",$f,$e,$WORD_TRANSLATION{$f}{$e},$TOTAL_ENGLISH{$e};
- }
- }
- close(E2F);
- close(F2E);
-+ close(E2F2);
-+ close(F2E2);
- print STDERR "Saved: $lexical_file.f2e and $lexical_file.e2f\n";
- }
-