#!/usr/bin/env bash
# score-0.85
# copyright 2010, João L. A. C. Rosas
# licenced under the GPL licence, version 3
# date: 02/09/2010
# Special thanks to Hilário Leal Fontes and Maria José Machado, who researched this script, sent me experimental results, helped to test it and made very helpful suggestions

# ***Purpose***: This script processes all the Moses translation files present in $mosesdir/translation_files_for_tmx, if you want to prepare a translation to be used with a translation memory, or in $mosesdir/translation_output, if you want a plain translation. From the name of each Moses translation found there, it extracts the abbreviations of the source and target languages and the scorebasename (which must not include the "." character). With this information, it reconstructs the full names of the source file and of the reference translation file. For each set of source file, Moses translation file and reference (human-made) translation file, this script creates a report presenting, depending on the parameters set by the user, either 1) a score for the whole Moses translation or 2) a score for each segment of the Moses translation. In the latter case, each line of the report consists of a) the BLEU score and b) the NIST score of the Moses translation ***of that segment***, c) the number of the segment in the source document, and d) the source, e) reference and f) Moses translation segments, in that order. These 6 fields are separated by the "|" character. The lines are sorted in ascending order of BLEU score.

###########################################################################################################################################################
#THIS SCRIPT ASSUMES THAT AN IRSTLM AND RANDLM ENABLED MOSES HAS ALREADY BEEN INSTALLED WITH THE create script IN $mosesdir (BY DEFAULT $HOME/moses-irstlm-randlm), THAT A CORPUS HAS BEEN TRAINED WITH THE train script AND THAT A TRANSLATION HAS ALREADY BEEN MADE WITH THE translate script.
# IT ALSO ASSUMES THAT THE PACKAGES UPON WHICH IT DEPENDS, INDICATED IN THE create script, HAVE BEEN INSTALLED
###########################################################################################################################################################

##########################################################################################################################################################
#                             The values of the variables that follow should be filled according to your needs:                                          # ##########################################################################################################################################################
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# !!! THIS SCRIPT SHOULD NOT BE USED WITH DOCUMENTS TRANSLATED WITH THE translate script WITH ITS $translate_for_tmx PARAMETER SET TO 1 ***UNLESS*** the $othercleanings, $improvesegmentation and $removeduplicates parameters of that script were all set to 0 and $minseglen was set to -1 (this processing changes the order of the segments and can also make the source document have a number of segments that is different from the number of segments of the reference translation, namely because it can delete some segments and/or add some new ones) !!!

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# !!! The names of the source and target reference translation files used for scoring should not include spaces !!!
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# The source file name and the reference translation file MUST observe the following conventions:
#		Source file               : <basename>.<abbreviation of source language>      (ex: 100.en)
#		Reference translation file: <basename>.<abbreviation of target language>.ref  (ex: 100.pt.ref)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#Base directory of your Moses installation (made with the create script)
mosesdir=$HOME/moses-irstlm-randlm
#Scores documents prepared for TMX translation memories. If this parameter is set to 1, the script will look for the documents $s and $m in the $mosesdir/translation_files_for_tmx directory; if not set to 1, it will look for the $s document in the $mosesdir/translation_input directory and for the $m document in $mosesdir/translation_output; in both cases, it will look for the $r document in $mosesdir/translation_reference
scoreTMXdocuments=0
#This is an arbitrary note that you can use if you want to record something (a parameter used, whatever) in the name of the scorefile. That way, you might not have to open several files before finding the one you are really looking for (if you score the same document many times, translated with different parameters); it is most useful while you are trying to find the right combination of parameters for your specific situation; !!! Remember, however, that most Linux systems have a maximum file name length of 255 characters; if the name of the document to translate is already long, you might exceed that limit !!! Example of a note: "12-07-2010" (date of the batch score)
batch_user_note="12-07-2010"
#Create a report where each segment gets its own score; 0 = score the whole document; 1 = score each segment
score_line_by_line=0
#Remove moses translation segments that are equal to reference translation segments and whose BLEU score is zero (!!! Only active if score_line_by_line=1 !!!)
remove_equal=1
#Tokenize the source document and the reference and the Moses translation
tokenize=1
#Lowercase the source document and the reference and the Moses translation
lowercase=1
##########################################################################################################################################################
#                               DO NOT CHANGE THE LINES THAT FOLLOW ... unless you know what you are doing!                                              #
##########################################################################################################################################################
#Directory where Moses translation tools are located
toolsdir=$mosesdir/tools
if [ "$scoreTMXdocuments" = "1" ]; then
	sourcelanguagedir=$mosesdir/translation_files_for_tmx
	mosestranslationdir=$mosesdir/translation_files_for_tmx
else
	sourcelanguagedir=$mosesdir/translation_input
	mosestranslationdir=$mosesdir/translation_output
fi
reftranslationdir=$mosesdir/translation_reference

#Directory where the output of the present script, the translation scoring document, will be created
scoredir=$mosesdir/translation_scoring

# Create the input directories, if they do not yet exist; later steps will report any input files that are missing (this saves the user from also having to create these directories)
if [ ! -d $sourcelanguagedir ] ; then mkdir -p $sourcelanguagedir ; fi
if [ ! -d $reftranslationdir ] ; then mkdir -p $reftranslationdir ; fi
if [ ! -d $mosestranslationdir ] ; then mkdir -p $mosestranslationdir ; fi
if [ ! -d $scoredir ] ; then mkdir -p $scoredir ; fi

# Define functions
remove_garbage() {
	if [ -f $scoredir/$s ]; then
		rm $scoredir/$s
	fi
	if [ -f $scoredir/$r ]; then
		rm $scoredir/$r
	fi
	if [ -f $scoredir/$m ]; then
		rm $scoredir/$m
	fi
	if [ -f $scoredir/$scorebasename-src.$lang1.sgm ]; then
		rm $scoredir/$scorebasename-src.$lang1.sgm
	fi
	if [ -f $scoredir/$scorebasename-ref.$lang2.sgm ]; then
		rm $scoredir/$scorebasename-ref.$lang2.sgm
	fi
	if [ -f $scoredir/$scorebasename.moses.sgm ]; then
		rm $scoredir/$scorebasename.moses.sgm
	fi
}
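# Illustrative sketch only (hypothetical helper, not called anywhere in this script).
# It mirrors the parameter expansions used in the main loop below to split a Moses
# translation name of the assumed form <scorebasename>.<lang1>.<lang2>.moses
# (ex: 100.en.pt.moses) into its parts, and prints "<scorebasename> <lang1> <lang2>".
parse_moses_name_sketch() {
	local filename=${1##*/}	# drop any leading directory
	local temp=${filename%.*}	# strip the ".moses" extension  (ex: 100.en.pt)
	local temp1=${temp%.*}	# strip the target language     (ex: 100.en)
	printf '%s %s %s\n' "${temp1%.*}" "${temp1##*.}" "${temp##*.}"
}
# Possible use: parse_moses_name_sketch "$mosesdir/translation_output/100.en.pt.moses"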
log_wrong_file() {
	if [ ! -f $scoredir/$tmp ]; then
		echo "LIST OF NOT SCORED FILES (in the $mosestranslationdir directory):" > $scoredir/$tmp
		echo "==============================================================================================" >> $scoredir/$tmp
		echo "" >> $scoredir/$tmp
		echo "==============================================================================================" >> $scoredir/$tmp
	fi
	echo -e "***$filename*** file:" >> $scoredir/$tmp
	echo "----------------------------------------------------------------------------------------------" >> $scoredir/$tmp
	echo -e "\t$error_msg" >> $scoredir/$tmp
	echo "==============================================================================================" >> $scoredir/$tmp
}
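# Illustrative sketch only (hypothetical helper, not called anywhere in this script).
# It shows how the scoring steps below pull the two scores out of the scorer's summary,
# assuming the summary contains text of the form
#	NIST score = <n>  BLEU score = <b> for system "<id>"
# (this shape is inferred from the parameter expansions used below, not checked against the scorer).
# It prints "<BLEU> <NIST>".
extract_scores_sketch() {
	local score=$1
	local scoretemp=${score%% for system *}	# keep everything before the first " for system "
	local scoretemp1=${scoretemp#*NIST score = }	# drop everything up to the NIST value
	local nist=${scoretemp1%% *}	# the NIST value ends at the first space
	local bleutemp=${scoretemp1#*BLEU score = }	# drop everything up to the BLEU value
	printf '%s %s\n' "${bleutemp%% *}" "$nist"
}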
#-----------------------------------------------------------------------------------------------------------------------------------------
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
tmp="!!!SCORES-NOT-DONE!!!"
if [ -f $scoredir/$tmp ]; then
	rm $scoredir/$tmp
fi

i=0
for filetoscore in $mosestranslationdir/*; do
	if [ ! -d $filetoscore ]; then
		error_msg=""
		filename=${filetoscore##*/}
		tempbasename=${filename%.*}
		tempbasename1=${tempbasename%.*}
		scorebasename=${tempbasename1%.*}
		temp=${filename%.*}
		temp1=${temp%.*}
		lang1=${temp1##*.}
		lang2=${temp##*.}
		s=$scorebasename.$lang1
		m=$filename
		r=$scorebasename.$lang2.ref
		#-----------------------------------------------------------------------------------------------------------------------------------------
		#Define report name
		if [ "$lang1" = "$filename" -a "$lang2" = "$filename" ]; then
			lang1t=""
			lang2t=""
		else
			lang1t=$lang1
			lang2t=$lang2
		fi
		if [ "$score_line_by_line" = "1" ]; then
			scorefile=$scorebasename.$batch_user_note.$lang1t-$lang2t.F-$scoreTMXdocuments-R-$remove_equal-T-$tokenize.L-$lowercase.line-by-line
		else
			scorefile=$scorebasename-$batch_user_note-$lang1t-$lang2t.F-$scoreTMXdocuments-R-$remove_equal-T-$tokenize.L-$lowercase.whole-doc
		fi
		#-----------------------------------------------------------------------------------------------------------------------------------------
		scorefile_name_len=${#scorefile}
		if [ "${filetoscore##*.}" = "moses" ]; then
			echo "--------------------------------------------------------------------"
			echo "MOSES TRANSLATION: $filename (in the $mosestranslationdir directory)"
			let i=$i+1
			if [ "$scorefile_name_len" -gt "229" -a "$score_line_by_line" != "1" ]; then
				error_msg="The translated file name and/or the \$batch_user_note parameter would result in a scorefile name that exceeds the maximal limit of 255 characters. Please try to use translation files and user notes that do not lead to file names exceeding the maximal allowable length."
				echo -e "$error_msg Analysing now next Moses translation."
				log_wrong_file
				scorefile=$(echo $scorefile | cut -c1-229)
				continue
			fi 
			if [ "$scorefile_name_len" -gt "242" -a "$score_line_by_line" = "1" ]; then
				error_msg="The translated file name and/or the \$batch_user_note parameter would result in a scorefile name that exceeds the maximal limit of 255 characters. Please try to use translation files and user notes that do not lead to files with names exceeding their maximal allowable length."
				echo -e "$error_msg Analysing now next Moses translation."
				log_wrong_file
				scorefile=$(echo $scorefile | cut -c1-242)
				continue
			fi 
			#-----------------------------------------------------------------------------------------------------------------------------------------
			if [ "$lang1" = "$lang2" ]; then
				error_msg="You did not respect the Moses for Mere Mortals conventions for naming the source and/or the reference files.\n\tSource file\t\t\t: <scorebasename>.<source language abbreviation> (ex: 100.pt)\n\tReference translation file\t: <scorebasename>.<target language abbreviation>.ref (ex: 100.en.ref)\nPlease correct the name of the files and then run this script again."
				echo -e "$error_msg Analysing now next Moses translation."
				log_wrong_file
				continue
			fi 
			#-----------------------------------------------------------------------------------------------------------------------------------------
			#Get number of segments for each input file (source, reference and Moses translation)
			#avoid wc error messages when the file does not exist
			exec 3>&2 2> /dev/null
			lines_s=`wc -l "$sourcelanguagedir/$s" | awk '{print $1}'`
			if [ "$lines_s" ]; then 
				echo "Source file      : $lines_s lines"
			else
				echo "Source file      : doesn't exist"
			fi
			lines=`wc -l "$mosestranslationdir/$m" | awk '{print $1}'`
			if [ "$lines" ]; then 
				echo "Moses translation: $lines lines"
			else
				echo "Moses translation: doesn't exist"
			fi
			lines_r=`wc -l "$reftranslationdir/$r" | awk '{print $1}'`
			if [ "$lines_r" ]; then 
				echo "Reference file   : $lines_r lines"
			else
				echo "Reference file   : doesn't exist"
			fi
			exec 2>&3 3>&-

			#Check that source, reference and Moses translation files have the same number of segments
			if [ "$lines_s" != "$lines_r" ]; then
				if [ "$lines_s" = "" ]; then
					lines_s=0
				fi
				if [ "$lines_r" = "" ]; then
					lines_r=0
				fi
				error_msg="Source and reference files do not have the same number of lines (source = $lines_s and reference = $lines_r lines) or one or both of them might not exist. If you verify manually that they do have the same number of segments, then wc (a Linux command) is interpreting at least one of the characters of one of the files as something it isn't. If that is the case, you will have to isolate the line(s) that is (are) causing problems and to substitute the character in question by some other character."
				echo "$error_msg Analysing now next Moses translation."
				log_wrong_file
				remove_garbage 
				continue
			fi
			if [ "$lines" != "$lines_r" ]; then
				if [ "$lines" = "" ]; then
					lines=0
				fi
				if [ "$lines_r" = "" ]; then
					lines_r=0
				fi
				error_msg="Reference and moses translation files do not have the same number of lines (reference = $lines_r lines and moses translation = $lines) or one or both of them might not exist. If you verify manually that they do have the same number of segments, then wc (a Linux command) is interpreting at least one of the characters of one of the files as something it isn't. If that is the case, you will have to isolate the line(s) that is (are) causing problems and to substitute the character in question by some other character."
				echo "$error_msg Analysing now next Moses translation."
				log_wrong_file
				remove_garbage 
				continue
			fi
			#-----------------------------------------------------------------------------------------------------------------------------------------
			#Check that $s, $r and $m exist
			if [ ! -f $sourcelanguagedir/$s ] ; then 
				error_msg="The expected source language file ($sourcelanguagedir/$s) needed for scoring the Moses translation ($mosestranslationdir/$m) does not exist. Did you respect the file naming conventions described at the top of the score-0.85 script or did you use the wrong language pair for translating?"
				echo "$error_msg Analysing now next Moses translation."
				log_wrong_file
				continue
			else
				cp $sourcelanguagedir/$s $scoredir
				if [ "$tokenize" = "1" -a "$lowercase" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang1 < $scoredir/$s > $scoredir/$s.tok
					$toolsdir/scripts/lowercase.perl < $scoredir/$s.tok > $scoredir/$s
					rm -f $scoredir/$s.tok
				elif [ "$tokenize" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang1 < $scoredir/$s > $scoredir/$s.tok
					mv -f $scoredir/$s.tok $scoredir/$s
				elif [ "$lowercase" = "1" ]; then
					$toolsdir/scripts/lowercase.perl < $scoredir/$s > $scoredir/$s.lower
					mv -f $scoredir/$s.lower $scoredir/$s
				fi
				sed 's/\\$/\\ /g' < $scoredir/$s > $scoredir/$s.clean
				mv -f $scoredir/$s.clean $scoredir/$s
			fi
			if [ ! -f $reftranslationdir/$r ] ; then 
				error_msg="The expected reference (human-made) file ($reftranslationdir/$r) needed for scoring the Moses translation ($mosestranslationdir/$m) does not exist. Did you respect the file naming conventions described at the top of the score-0.85 script or did you use the wrong language pair for translating?"
				echo "$error_msg Analysing now next Moses translation."
				log_wrong_file
				continue
			else
				cp $reftranslationdir/$r $scoredir
				if [ "$tokenize" = "1" -a "$lowercase" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang2 < $scoredir/$r > $scoredir/$r.tok
					$toolsdir/scripts/lowercase.perl < $scoredir/$r.tok > $scoredir/$r
					rm -f $scoredir/$r.tok
				elif [ "$tokenize" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang2 < $scoredir/$r > $scoredir/$r.tok
					mv -f $scoredir/$r.tok $scoredir/$r
				elif [ "$lowercase" = "1" ]; then
					$toolsdir/scripts/lowercase.perl < $scoredir/$r > $scoredir/$r.lower
					mv -f $scoredir/$r.lower $scoredir/$r
				fi
				sed 's/\\$/\\ /g' < $scoredir/$r > $scoredir/$r.clean
				mv -f $scoredir/$r.clean $scoredir/$r
			fi
			if [ ! -f $mosestranslationdir/$m ] ; then 
				error_msg="The Moses translation file ($mosestranslationdir/$m) does not exist. Did you respect the file naming conventions described at the top of the score-0.85 script?"
				echo "$error_msg Analysing now next Moses translation."
				log_wrong_file
				continue
			else
				cp $mosestranslationdir/$m $scoredir
				if [ "$tokenize" = "1" -a "$lowercase" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang2 < $scoredir/$m > $scoredir/$m.tok
					$toolsdir/scripts/lowercase.perl < $scoredir/$m.tok > $scoredir/$m
					rm -f $scoredir/$m.tok
				elif [ "$tokenize" = "1" ]; then
					$toolsdir/scripts/tokenizer.perl -l $lang2 < $scoredir/$m > $scoredir/$m.tok
					mv -f $scoredir/$m.tok $scoredir/$m
				elif [ "$lowercase" = "1" ]; then
					$toolsdir/scripts/lowercase.perl < $scoredir/$m > $scoredir/$m.lower
					mv -f $scoredir/$m.lower $scoredir/$m
				fi
				sed 's/\\$/\\ /g' < $scoredir/$m > $scoredir/$m.clean
				mv -f $scoredir/$m.clean $scoredir/$m
			fi

			echo "===================================================================================" > $scoredir/temp
			echo "*** Script version ***: score-0.85" >> $scoredir/temp
			echo "===================================================================================" >> $scoredir/temp
			echo "===================================================================================" >> $scoredir/temp
			echo "Extracted file names and other data  (extracted automatically; errors are possible):" >> $scoredir/temp
			echo "===================================================================================" >> $scoredir/temp
			echo "source language    : $lang1" >> $scoredir/temp
			echo "target language    : $lang2" >> $scoredir/temp
			echo "-----------------------------------------------------------------------------------" >> $scoredir/temp
			echo "source file        : $sourcelanguagedir/$s" >> $scoredir/temp
			echo "moses translation  : $mosestranslationdir/$m" >> $scoredir/temp
			echo "reference file     : $reftranslationdir/$r" >> $scoredir/temp
			echo "-----------------------------------------------------------------------------------" >> $scoredir/temp
			echo "batch_user_note    : $batch_user_note" >> $scoredir/temp
			echo "===================================================================================" >> $scoredir/temp
			echo "score_line_by_line : $score_line_by_line" >> $scoredir/temp
			if [ "$score_line_by_line" = "1" ]; then
				echo "tokenize           : $tokenize" >> $scoredir/temp
				echo "lowercase          : $lowercase" >> $scoredir/temp
				echo "remove_equal       : $remove_equal" >> $scoredir/temp
			fi
			echo "===================================================================================" >> $scoredir/temp
			#=========================================================================================================================================================
				#1. SCORE LINE BY LINE
			#=========================================================================================================================================================
			if [ "$score_line_by_line" = "1" ]; then
				if [ -f $scoredir/$scorefile ]; then
					rm -f $scoredir/$scorefile
				fi
				echo "************************** Score line by line"
				counter=0
				echo "BLEU|NIST|<segnum>|source seg|ref seg|Moses seg" >> $scoredir/temp
				echo "" >> $scoredir/temp

				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$s > $scoredir/$s.tmp
				mv $scoredir/$s.tmp $scoredir/$s
				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$r > $scoredir/$r.tmp
				mv $scoredir/$r.tmp $scoredir/$r
				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$m > $scoredir/$m.tmp
				mv $scoredir/$m.tmp $scoredir/$m
				echo "***** Score each segment:"
				while [ "$counter" -lt "$lines" ]; do
					let "counter += 1"
					echo "Segment $counter"
					source_sentence=`awk "NR==$counter{print;exit}" $scoredir/$s`
					ref_sentence=`awk "NR==$counter{print;exit}" $scoredir/$r`
					moses_sentence=`awk "NR==$counter{print;exit}" $scoredir/$m`
				#-----------------------------------------------------------------------------------------------------------------------------------------
					# ******** wrap source file
					if [ "$source_sentence" != "" ]; then
						echo '<srcset setid="'$scorebasename'" srclang="'$lang1'">' > $scoredir/$scorebasename-src.$lang1.sgm
						echo '<DOC docid="'$scorebasename'">' >> $scoredir/$scorebasename-src.$lang1.sgm
					   	echo "<seg id=$counter>"$source_sentence"</seg>" >> $scoredir/$scorebasename-src.$lang1.sgm
						echo "</DOC>" >> $scoredir/$scorebasename-src.$lang1.sgm
						echo "</srcset>" >> $scoredir/$scorebasename-src.$lang1.sgm
					fi
				#-----------------------------------------------------------------------------------------------------------------------------------------
					# ******** wrap reference (human-made) translation
					if [ "$ref_sentence" != "" ]; then
						echo '<refset setid="'$scorebasename'" srclang="'$lang1'" trglang="'$lang2'">' > $scoredir/$scorebasename-ref.$lang2.sgm
						echo '<DOC docid="'$scorebasename'" sysid="ref">' >> $scoredir/$scorebasename-ref.$lang2.sgm
					   	echo "<seg id=$counter>"$ref_sentence"</seg>" >> $scoredir/$scorebasename-ref.$lang2.sgm
						echo "</DOC>" >> $scoredir/$scorebasename-ref.$lang2.sgm
						echo "</refset>" >> $scoredir/$scorebasename-ref.$lang2.sgm
					fi
				#-----------------------------------------------------------------------------------------------------------------------------------------
					# ******** wrap Moses translation
					if [ "$moses_sentence" != "" ]; then
						echo '<tstset setid="'$scorebasename'" srclang="'$lang1'" trglang="'$lang2'">' > $scoredir/$scorebasename.moses.sgm
						echo '<DOC docid="'$scorebasename'" sysid="moses">' >> $scoredir/$scorebasename.moses.sgm
					   	echo "<seg id=$counter>"$moses_sentence"</seg>" >> $scoredir/$scorebasename.moses.sgm
						echo "</DOC>" >> $scoredir/$scorebasename.moses.sgm
						echo "</tstset>" >> $scoredir/$scorebasename.moses.sgm
					fi
				#-----------------------------------------------------------------------------------------------------------------------------------------
					sed -e 's/\x1E/\-/g' $scoredir/$scorebasename-src.$lang1.sgm > $scoredir/temp2
					mv $scoredir/temp2 $scoredir/$scorebasename-src.$lang1.sgm
					sed -e 's/\x1E/\-/g' $scoredir/$scorebasename-ref.$lang2.sgm > $scoredir/temp2
					mv $scoredir/temp2 $scoredir/$scorebasename-ref.$lang2.sgm
					sed -e 's/\x1E/\-/g' $scoredir/$scorebasename.moses.sgm > $scoredir/temp2
					mv $scoredir/temp2 $scoredir/$scorebasename.moses.sgm

					# ******** get segment score
					#in our experience, the mteval-v13a and the mteval-v12 (more recent scorers) stopped with errors (and no score) with strings like " & " and U+001E
					score=`$toolsdir/mteval-v11b.pl -s $scoredir/$scorebasename-src.$lang1.sgm -r $scoredir/$scorebasename-ref.$lang2.sgm -t $scoredir/$scorebasename.moses.sgm -c`
					scoretemp=${score%% for system *}
					scoretemp1=${scoretemp#*NIST score = }
					NIST=${scoretemp1%% *}
					BLEUtemp=${scoretemp1#*BLEU score = }
					BLEU=${BLEUtemp%% *}
					set -f
					BLEUcorr=$(echo "scale=0; $BLEU*10000" | bc)
					set +f
					if [ "$remove_equal" = "1" ]; then
						if [ "$ref_sentence" != "$moses_sentence" ]; then
							echo "$BLEU|$NIST|<$counter>|<seg>$source_sentence</seg>|<seg>$ref_sentence</seg>|<seg>$moses_sentence</seg>" >> $scoredir/$scorefile
						elif [ "$BLEUcorr" = "0" ]; then
							: #do nothing
						else
							echo "$BLEU|$NIST|<$counter>|<seg>$source_sentence</seg>|<seg>$ref_sentence</seg>|<seg>$moses_sentence</seg>" >> $scoredir/$scorefile
						fi
					else
						echo "$BLEU|$NIST|<$counter>|<seg>$source_sentence</seg>|<seg>$ref_sentence</seg>|<seg>$moses_sentence</seg>" >> $scoredir/$scorefile
					fi
				done
				#-----------------------------------------------------------------------------------------------------------------------------------------
				#Sort the output file by score
				sort -g $scoredir/$scorefile -o $scoredir/$scorefile
				echo "===========================================================================" >> $scoredir/temp
				cat $scoredir/$scorefile >> $scoredir/temp
				mv $scoredir/temp $scoredir/$scorefile
				remove_garbage 
			else
			#=========================================================================================================================================================
				#2. SCORE WHOLE DOCUMENT
			#=========================================================================================================================================================
				if [ -f $scoredir/$scorefile ]; then
					rm -f $scoredir/$scorefile
				fi
				echo "************************** Score whole document"
				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$s > $scoredir/$s.tmp
				mv $scoredir/$s.tmp $scoredir/$s
				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$r > $scoredir/$r.tmp
				mv $scoredir/$r.tmp $scoredir/$r
				sed -e 's#\& #\&amp\; #g' -e 's#<#\&lt\;#g' $scoredir/$m > $scoredir/$m.tmp
				mv $scoredir/$m.tmp $scoredir/$m
				echo "***************** wrap test result in SGM"
				echo "******** wrap source file"
				exec<$scoredir/$s
				echo '<srcset setid="'$scorebasename'" srclang="'$lang1'">' > $scoredir/$scorebasename-src.$lang1.sgm
				echo '<DOC docid="'$scorebasename'">' >> $scoredir/$scorebasename-src.$lang1.sgm
				numseg=0
				while read line
				   do
					numseg=$(($numseg+1))
				   	echo "<seg id=$numseg>"$line"</seg>" >> $scoredir/$scorebasename-src.$lang1.sgm
				   done
				echo "</DOC>" >> $scoredir/$scorebasename-src.$lang1.sgm
				echo "</srcset>" >> $scoredir/$scorebasename-src.$lang1.sgm
				#-----------------------------------------------------------------------------------------------------------------------------------------
				echo "******** wrap reference (human-made) translation"
				exec<$scoredir/$r
				echo '<refset setid="'$scorebasename'" srclang="'$lang1'" trglang="'$lang2'">' > $scoredir/$scorebasename-ref.$lang2.sgm
				echo '<DOC docid="'$scorebasename'" sysid="ref">' >> $scoredir/$scorebasename-ref.$lang2.sgm
				numseg=0
				while read line
				   do
					numseg=$(($numseg+1))
				   	echo "<seg id=$numseg>"$line"</seg>" >> $scoredir/$scorebasename-ref.$lang2.sgm
				   done
				echo "</DOC>" >> $scoredir/$scorebasename-ref.$lang2.sgm
				echo "</refset>" >> $scoredir/$scorebasename-ref.$lang2.sgm
				#-----------------------------------------------------------------------------------------------------------------------------------------
				echo "******** wrap Moses translation"
				exec<$scoredir/$m
				echo '<tstset setid="'$scorebasename'" srclang="'$lang1'" trglang="'$lang2'">' > $scoredir/$scorebasename.moses.sgm
				echo '<DOC docid="'$scorebasename'" sysid="moses">' >> $scoredir/$scorebasename.moses.sgm
				numseg=0
				while read line
				   do
					numseg=$(($numseg+1))
				   	echo "<seg id=$numseg>"$line"</seg>" >> $scoredir/$scorebasename.moses.sgm
				   done
				echo "</DOC>" >> $scoredir/$scorebasename.moses.sgm
				echo "</tstset>" >> $scoredir/$scorebasename.moses.sgm

				sed -e 's/\x1E/\-/g' $scoredir/$scorebasename-src.$lang1.sgm > $scoredir/temp2
				mv $scoredir/temp2 $scoredir/$scorebasename-src.$lang1.sgm
				sed -e 's/\x1E/\-/g' $scoredir/$scorebasename-ref.$lang2.sgm > $scoredir/temp2
				mv $scoredir/temp2 $scoredir/$scorebasename-ref.$lang2.sgm
				sed -e 's/\x1E/\-/g' $scoredir/$scorebasename.moses.sgm > $scoredir/temp2
				mv $scoredir/temp2 $scoredir/$scorebasename.moses.sgm

				if [ ! -f $scoredir/$scorebasename-src.$lang1.sgm -o ! -f $scoredir/$scorebasename-ref.$lang2.sgm -o ! -f $scoredir/$scorebasename.moses.sgm ]; then
					echo "There was a problem creating the files used by the scorer. Exiting..."
					IFS=$SAVEIFS
					exit 1
				else
					#-----------------------------------------------------------------------------------------------------------------------------------------
					echo "***************** scoring"
					startscoringdate=`date +day:%d/%m/%y-time:%H:%M:%S`
					#in our experience, the mteval-v13a and the mteval-v12 (more recent scorers) stopped with errors (and no score) with strings like " & " and U+001E
					score=`$toolsdir/mteval-v11b.pl -s $scoredir/$scorebasename-src.$lang1.sgm -r $scoredir/$scorebasename-ref.$lang2.sgm -t $scoredir/$scorebasename.moses.sgm -c`
					scoretemp=${score%% for system *}
					scoretemp1=${scoretemp#*NIST score = }
					NIST=${scoretemp1%% *}
					BLEUtemp=${scoretemp1#*BLEU score = }
					BLEU=${BLEUtemp%% *}
					echo $score
					scoretemp2=${score#*NIST score =}
					echo "NIST score = $scoretemp2" > $scoredir/$scorefile
					newscorefile=$scorebasename-BLEU-$BLEU-NIST-$NIST-$batch_user_note-$lang1-$lang2.F-$scoreTMXdocuments-R-$remove_equal-T-$tokenize.L-$lowercase.whole-doc
					echo "===================================================================================" >> $scoredir/$scorefile
					mv -f $scoredir/$scorefile $scoredir/$newscorefile
					#-----------------------------------------------------------------------------------------------------------------------------------------
				fi
				cat $scoredir/$newscorefile >> $scoredir/temp
				mv $scoredir/temp $scoredir/$newscorefile
				remove_garbage 
			fi
		else
			filename=${filetoscore##*/}
			if [ "$filename" != "*" ]; then
				let i=$i+1
				echo "--------------------------------------------------------------------"
				echo -e "$filename file (in the $mosestranslationdir directory):\n\tName of moses translation file is illegal (doesn't end in '.moses' or includes spaces)."
				error_msg="Name of moses translation file is illegal (doesn't end in '.moses' or includes spaces)."
				log_wrong_file
				continue
			fi
		fi
	fi
done
IFS=$SAVEIFS

echo "--------------------------------------------------------------------"
echo -e "Score finished.\n$i files treated.\nResults directory:\n\t$scoredir"
#=================================================================================================================================================
# Changes in version 0.85
#=================================================================================================================================================
# Allows batch processing of the whole $mosesdir/translation_output directory
# Extracts automatically the source language and target language, the names of the source file, moses translation file and reference translation file and the batch_user_note
# Checks for more file naming errors and informs about them
# More informative report, even in case of error
# Creation of a new file that lists the translations that could not be scored and the reason why
# Corrects a bug that made it fail when the scorer files included the word "for" in their name
# Maintains SGM scorer because newer scorers have caused us more problems with characters that crashed them (ex: " & " and U+001E)
#=================================================================================================================================================