
github.com/marian-nmt/marian-regression-tests.git
-rw-r--r-- .gitignore | 9
-rw-r--r-- Makefile | 1
-rw-r--r-- models/download-lm.sh | 10
-rw-r--r-- models/lmgec/config.yml | 6
-rw-r--r-- models/lmgec/preprocess.sh | 11
-rw-r--r-- tests/scorer/lm/lm_scores.expected | 100
-rw-r--r-- tests/scorer/lm/setup.sh | 1
-rw-r--r-- tests/scorer/lm/test_lm_scores.sh | 18
-rw-r--r-- tests/scorer/lm/text.prep.en | 100
9 files changed, 252 insertions, 4 deletions
diff --git a/.gitignore b/.gitignore
index d593105..ade9b32 100644
--- a/.gitignore
+++ b/.gitignore
@@ -24,10 +24,11 @@ models/wmt17_systems/scripts
models/wmt17_systems/vars
models/char-s2s
models/wnmt18
-models/transformer/model.*
-models/transformer/tc.*
-models/transformer/vocab.*
-models/transformer/*.bpe
+models/*/model.npz*yml
+models/*/*.npz
+models/*/*.bpe
+models/*/tc.*
+models/*/vocab.*
data/*/corpus.*
data/*/*.bpe
diff --git a/Makefile b/Makefile
index 0553745..00e378b 100644
--- a/Makefile
+++ b/Makefile
@@ -29,6 +29,7 @@ models:
cd $@ && bash ./download-char-s2s.sh
cd $@ && bash ./download-wnmt18.sh
cd $@ && bash ./download-transformer.sh
+ cd $@ && bash ./download-lm.sh
data:
mkdir -p $@
diff --git a/models/download-lm.sh b/models/download-lm.sh
new file mode 100644
index 0000000..ac7a429
--- /dev/null
+++ b/models/download-lm.sh
@@ -0,0 +1,10 @@
+#!/bin/bash -x
+
+test -e lmgec/lm.npz && exit
+
+mkdir -p lmgec
+cd lmgec
+wget -nc http://data.statmt.org/romang/gec-naacl18/models.tgz
+tar zxvf models.tgz lm.npz tc.model gec.bpe vocab.yml
+rm models.tgz
+cd ..
diff --git a/models/lmgec/config.yml b/models/lmgec/config.yml
new file mode 100644
index 0000000..a5c5146
--- /dev/null
+++ b/models/lmgec/config.yml
@@ -0,0 +1,6 @@
+relative-paths: true
+model: lm.npz
+vocabs:
+ - vocab.yml
+mini-batch: 1
+maxi-batch: 1
diff --git a/models/lmgec/preprocess.sh b/models/lmgec/preprocess.sh
new file mode 100644
index 0000000..8d2de7c
--- /dev/null
+++ b/models/lmgec/preprocess.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+
+ROOTDIR=$(realpath ../..)
+
+cat \
+ | perl $ROOTDIR/tools/moses-scripts/scripts/recaser/detruecase.perl \
+ | perl $ROOTDIR/tools/moses-scripts/scripts/tokenizer/detokenizer.perl -l en \
+ | python ./nltk_tok.py \
+ | perl $ROOTDIR/tools/moses-scripts/scripts/tokenizer/escape-special-chars.perl \
+ | perl $ROOTDIR/tools/moses-scripts/scripts/recaser/truecase.perl --model tc.model \
+ | python $ROOTDIR/tools/subword-nmt/subword_nmt/apply_bpe.py -c gec.bpe
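The last stage of this pipeline runs subword-nmt's `apply_bpe.py`, which splits rare words into subword units and marks every non-final unit with a trailing `@@`; this is why `text.prep.en` contains tokens such as `hop@@ s` and `Ta@@ ji@@ kistan`. The following is a minimal sketch of how such output can be mapped back to whole words, assuming the default `@@ ` separator. `remove_bpe` is a hypothetical helper for illustration, not part of this repository or of subword-nmt.

```python
import re

# Hypothetical helper (not part of this repository): undo subword-nmt BPE
# segmentation, assuming the default "@@ " continuation separator.
BPE_MARKER = re.compile(r"@@( |$)")


def remove_bpe(line: str) -> str:
    """Join subword units, e.g. 'hop@@ s' -> 'hops'.

    A unit ending in '@@' continues into the next unit, so the marker and the
    following space are deleted; a dangling '@@' at end of line is dropped too.
    """
    return BPE_MARKER.sub("", line)


if __name__ == "__main__":
    print(remove_bpe("Mr Xa@@ ver May@@ er , for a valuable report"))
    # Mr Xaver Mayer , for a valuable report
```

Only the BPE step is reversed here; undoing truecasing, escaping and tokenization would need the corresponding Moses scripts.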
diff --git a/tests/scorer/lm/lm_scores.expected b/tests/scorer/lm/lm_scores.expected
new file mode 100644
index 0000000..fcd66d9
--- /dev/null
+++ b/tests/scorer/lm/lm_scores.expected
@@ -0,0 +1,100 @@
+-102.146797
+-56.570122
+-179.359634
+-126.033112
+-151.495895
+-179.698151
+-194.555847
+-218.278519
+-183.315292
+-290.833252
+-102.915894
+-170.672165
+-52.901627
+-166.206558
+-49.160263
+-139.223511
+-152.224136
+-30.153316
+-53.275631
+-190.775604
+-58.611706
+-68.133881
+-24.469469
+-35.131294
+-168.187225
+-49.200638
+-136.020294
+-327.748901
+-177.979187
+-48.333916
+-192.816528
+-82.605606
+-135.975204
+-54.720943
+-164.225433
+-191.112335
+-124.949036
+-204.887207
+-157.517319
+-93.470726
+-192.979294
+-95.325439
+-92.605972
+-141.128265
+-50.027866
+-52.459736
+-139.888809
+-112.474449
+-107.640236
+-110.293877
+-132.735626
+-68.751846
+-64.823151
+-126.765007
+-32.195976
+-47.674992
+-64.521729
+-166.688812
+-75.829742
+-47.022652
+-83.426292
+-154.526764
+-97.985588
+-95.690933
+-170.144775
+-174.160675
+-179.407593
+-91.198380
+-198.941437
+-202.614502
+-100.660248
+-253.774704
+-50.770256
+-195.531281
+-64.387291
+-77.049728
+-86.907028
+-63.171913
+-73.030006
+-94.385803
+-104.468475
+-90.391045
+-97.847717
+-147.599380
+-130.965668
+-31.314968
+-73.762161
+-202.152832
+-54.794056
+-60.364326
+-114.973816
+-107.025894
+-116.153397
+-37.122612
+-67.105400
+-144.332703
+-125.365646
+-64.766396
+-70.662804
+-115.290909
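Each line of `lm_scores.expected` holds one score for the corresponding sentence of `text.prep.en`. Assuming marian-scorer's default output is the unnormalized sum of natural-log token probabilities (i.e. no length normalization), a score can be turned into a per-token perplexity once the number of scored tokens is known. The sketch below only illustrates that arithmetic; the helper names and the choice to count one end-of-sentence token are assumptions, not something this commit specifies.

```python
import math
from typing import Sequence

# Hypothetical helpers (not part of marian or this repository). They assume a
# sentence score is the unnormalized sum of natural-log token probabilities.


def sentence_score(token_logprobs: Sequence[float]) -> float:
    """Sum per-token log-probabilities into one sentence-level score."""
    return math.fsum(token_logprobs)


def perplexity(score: float, num_tokens: int) -> float:
    """Per-token perplexity exp(-score / N) for a sentence of N scored tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(-score / num_tokens)


def count_tokens(line: str, add_eos: bool = True) -> int:
    """Count whitespace-separated subword tokens, optionally plus one </s>."""
    return len(line.split()) + (1 if add_eos else 0)


if __name__ == "__main__":
    # Toy example: four tokens, each with probability 0.5.
    score = sentence_score([math.log(0.5)] * 4)
    print(round(score, 6), round(perplexity(score, 4), 6))  # -2.772589 2.0
```

Because the scores are sums rather than averages, longer sentences naturally receive more negative values, which is visible in the spread of the expected scores.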
diff --git a/tests/scorer/lm/setup.sh b/tests/scorer/lm/setup.sh
new file mode 100644
index 0000000..37dc69f
--- /dev/null
+++ b/tests/scorer/lm/setup.sh
@@ -0,0 +1 @@
+test -f $MRT_MODELS/lmgec/lm.npz || exit 1
diff --git a/tests/scorer/lm/test_lm_scores.sh b/tests/scorer/lm/test_lm_scores.sh
new file mode 100644
index 0000000..5a55419
--- /dev/null
+++ b/tests/scorer/lm/test_lm_scores.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+#####################################################################
+# SUMMARY: Test scoring sentences with a pretrained language model
+# AUTHOR: snukky
+#####################################################################
+
+# Exit on error
+set -e
+
+# Run scorer
+$MRT_MARIAN/marian-scorer -c $MRT_MODELS/lmgec/config.yml -t $(pwd)/text.prep.en > lm_scores.out
+
+# Compare scores
+$MRT_TOOLS/diff-nums.py lm_scores.out lm_scores.expected -p 0.0003 -o lm_scores.diff
+
+# Exit with success code
+exit 0
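The test passes when every score in `lm_scores.out` matches `lm_scores.expected` within the precision given by `-p 0.0003`. `diff-nums.py` itself is not part of this commit, so the following sketch only approximates the check, assuming `-p` is an absolute per-number tolerance and that both files hold one number per line. `compare_scores` and `compare_score_files` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# (line number, score produced, score expected)
Mismatch = Tuple[int, float, float]


def compare_scores(output: Sequence[str], expected: Sequence[str],
                   tolerance: float = 0.0003) -> List[Mismatch]:
    """Return every line whose scores differ by more than the tolerance.

    Assumes an absolute tolerance; the real diff-nums.py may behave differently.
    """
    if len(output) != len(expected):
        raise ValueError(
            f"line count mismatch: {len(output)} vs {len(expected)}")
    mismatches: List[Mismatch] = []
    for line_no, (got_text, want_text) in enumerate(zip(output, expected), 1):
        got, want = float(got_text), float(want_text)
        if abs(got - want) > tolerance:
            mismatches.append((line_no, got, want))
    return mismatches


def compare_score_files(out_path: str, expected_path: str,
                        tolerance: float = 0.0003) -> List[Mismatch]:
    """Read one score per non-empty line from each file and compare them."""
    def read(path: str) -> List[str]:
        with open(path, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]
    return compare_scores(read(out_path), read(expected_path), tolerance)
```

For example, `compare_score_files("lm_scores.out", "lm_scores.expected")` returns an empty list when the scorer reproduces the expected scores; a tolerance is used instead of exact comparison because floating-point results can vary slightly across hardware and builds.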
diff --git a/tests/scorer/lm/text.prep.en b/tests/scorer/lm/text.prep.en
new file mode 100644
index 0000000..2ea61e5
--- /dev/null
+++ b/tests/scorer/lm/text.prep.en
@@ -0,0 +1,100 @@
+hop@@ s are not a typical enough example for us to understand the importance of the common agricultural policy for farmers in the European Union .
+they do , however , show the extent to which it actually helps our farmers .
+as Mr May@@ er said , hop@@ s are a traditional product which is particularly important to the quality of beer produced , although production is very limited ; some 4 000 hec@@ tar@@ es of land throughout the whole of Europe .
+yet a sufficient number of farming families in the countries where hop@@ s are produced , particularly in Ba@@ vari@@ a , make their living from that product alone .
+these families should not be left to the mercy of continual price falls , neither should they be forced to desert specific rural areas because of difficulties arising from ir@@ regul@@ arities within the market .
+there have been a number of changes to the main regulation relating to this particular product as a result of the market fluctu@@ ations and the changing needs of farmers , the most recent being the Council decision to set a uniform level of aid to producers for a period of five years .
+this latest decision al@@ ters the obligations of the Commission arising from the previous regime , that is of having to grant annual aid , and Members States no longer need to grant aid for setting up production teams .
+this development means that certain articles of the old regulation need to be re@@ vo@@ ked which , rightly so , is carried out in the new regulation for which we will be voting , together with Mr May@@ er 's amend@@ ments , noting that the proposed regulation will not in any way affect the budget .
+Mr President , I should like to congratulate the rapp@@ or@@ te@@ ur on the report . I can inform you that the Group of the European Liberal , Demo@@ crat and Reform Party will support the report when it is put to the vote .
+Mr President , ladies and gentlemen , I first of all want to thank the rapp@@ or@@ te@@ ur , Mr Xa@@ ver May@@ er , for a valuable report - and perhaps especially for his enthusiastic presentation of the hop paradise of Ba@@ vari@@ a - together with the Committee on Agriculture and R@@ ural Development for its constructive attitude .
+I am very pleased that our proposal for changing the way in which the market for hop@@ s is organised has met with a positive reception .
+the Commission 's proposal is , of course , aimed at removing those sti@@ pu@@ lations which are no longer valid , either because deadlines have run out or because of previous changes to the common regime under which hop@@ s are organised .
+these changes must be implemented before the basic regulation is consoli@@ dated .
+owing to the fact that the Council has resolved that the level of support is to remain constant for a period of five years , the Commission does not consider that it is necessary to submit a report every year on the situation concerning the production and marketing of hop@@ s .
+the Commission therefore considers that Article 11 can be removed .
+according to Article 18 of the proposal , we shall , however , be presenting , by 1 September 2000 , a thorough assessment of the situation regarding the production and marketing of hop@@ s .
+I am therefore afraid that the European Parliament 's two amend@@ ments complicate the text unnecessarily and that the requirement to receive information each year is already covered by the new proposal .
+this information will also be made available on the Internet .
+that is why the Commission can not adopt these amend@@ ments in this situation .
+Mr President , firstly , I would be pleased to invite the Com@@ mission@@ er to K@@ los@@ ter An@@ de@@ chs in Ba@@ vari@@ a , a place where seven different types of beer are brewed ...
+secondly , I would like to make it known that next ...
+Mr Pos@@ sel@@ t , this is not a proce@@ du@@ ral motion .
+the debate is closed .
+we shall now proceed to the vote .
+Mr President , before leaving for Stra@@ s@@ bourg , the pen@@ sion@@ ers who took me to the airport asked me " Is there going to be a debate about beer on Friday morning ? "
+I replied " Yes , certainly . "
+" well , you have to give an explanation of vote and say that we pen@@ sion@@ ers are in favour of the production and development of beer . "
+we are in favour not just because ten years ago , the Pen@@ sion@@ ers ' Party put forward as candidate for Rome 's mayor the model Sol@@ vei@@ g tu@@ bing , who was born in Berlin and was a great con@@ no@@ isse@@ ur and lover of beer , but also because my own personal studies on beer show that drinking it makes you younger .
+I know that welfare institutions and governments are against developing beer , because this means that they have to pay out pensions for longer , but as representative of the Pen@@ sion@@ ers ' Party , I am in favour .
+extension of exceptional financial assistance to Ta@@ ji@@ kistan
+Mr President , Com@@ mission@@ er , Ta@@ ji@@ kistan is not only the poorest of all the countries formed from the Soviet Union , it has also been the one to suffer the most on account of the turmoil caused by tribal fe@@ uding , which ultimately escal@@ ated into civil war .
+the country failed to grasp how to employ the financial aid provided so far in a targeted manner .
+the situation has only calmed down to some extent over the last few months , once the warring parties had ceased hos@@ ti@@ lities and resolved that their next step would be to form a coalition government .
+general free elections are set for March 2000 .
+the international donor community , which includes Swiss organisations for the most part , is now prepared to carry on where it left off delivering financial aid , but with certain provi@@ sos .
+now that the situation has ab@@ ated and there are more favourable prospects for future progress overall , the Sa@@ vary report now attempts to provide renewed support for the macro@@ economic financial aid for this country in the form of loans .
+we hope that this will make it sufficiently clear to Ta@@ ji@@ kistan that it needs to improve its state machinery by emb@@ racing democratic development and undertaking the necessary reforms .
+however , the financial aid in the form of loans should only be granted if there is a real possibility of the European Union being able to properly monitor the situation , if the process of national reconci@@ liation continues and the elections , in particular the parliament@@ ary elections set for March , are free and democratic .
+as Mr Sa@@ vary rightly said , this is also what we aim to achieve with proposed amend@@ ments no@@ s 8 and 9 , to which we give our un@@ equi@@ vocal support .
+if Ta@@ ji@@ kistan 's credi@@ t@@ wor@@ thiness is to be restored , then the proposal in Budget 2000 is also to be welcomed .
+the rapp@@ or@@ te@@ ur , Mr Bour@@ lang@@ es , has just confirmed to me that as far as this is concerned , a commentary is to provide for a particular form of financial aid to be made available again under the T@@ AC@@ IS programme .
+on a final note , I would like to say that the P@@ PE group supports this report not@@ withstanding all the associated risks .
+it represents a renewed , hopefully successful attempt to resume and promote economic and technical cooperation with Ta@@ ji@@ kistan .
+Mr President , the loan which Ta@@ ji@@ kistan will receive equals this small and poor country 's share in an outstanding debt to the former Soviet Union .
+as such , this will not solve any problems within Ta@@ ji@@ kistan .
+the loan only prevents the outstanding debts from continuing to exist .
+central Asia , the majority of whose population is Tur@@ ki@@ sh-@@ speaking and a small part of which is Ir@@ ani@@ an-@@ speaking , was conquered in the previous century by the Russian ts@@ ari@@ st empire .
+this empire did not look for colonies far from home or overseas , like most Western European States , but close by .
+although they were decol@@ on@@ ised in 19@@ 22 , they have remained linked to Russia in the form of Federal States of the Soviet Union .
+the boundaries drawn by Stal@@ in between the various linguistic and cultural regions in the '@@ 20s and ' 30s are now state borders .
+this prolonged European influence means that we in the European Union should feel especially responsible for the vic@@ is@@ sit@@ udes of the five States which appeared after the collapse of the Soviet Union .
+the economy and environment are in a sad state of affairs in all fifteen States .
+authorit@@ arian regi@@ mes have come to power and leave little or no scope for political opponents .
+by means of refer@@ en@@ du@@ ms and intimid@@ ation , some presidents have their periods of office extended by ten years , without there being rival candidates .
+in this respect , Ta@@ ji@@ kistan is no exception .
+should European money be spent on a country like this ?
+in general , my group is not in favour of funding un@@ democratic regi@@ mes .
+all too often , we have noticed that they receive funding in the expectation that they will regard this money as a reward for taking small steps towards greater democracy and human rights and as an encouragement to take further such steps .
+in practice , however , this method does not work , as we have since found out in Turkey and Russia .
+the funding is received , but the situation does not improve .
+with the collapse of the Soviet Union , Ta@@ ji@@ kistan has rever@@ ted to the situation in the nineteenth and at the beginning of the twentieth century .
+there are several , regi@@ onally powerful families and groups which are fighting each other in a situation where war@@ lords seize upon political and religious differences as an excuse to justify armed action .
+the fate of Ta@@ ji@@ kistan largely depends upon what is happening in its immediate surroundings , such as the hopeless , violent conflict in Afghanistan .
+a large proportion of the Ta@@ ji@@ ki@@ stani population lives in north-east Afghanistan , the area which is not in the hands of the Taliban .
+the North of Ta@@ ji@@ kistan stretches out as far as the den@@ sely populated Fer@@ g@@ ana Valley which is partly located in Uzbekistan and is completely integrated into the economy and road network of this neighbouring country .
+as a front@@ line area fl@@ an@@ ked , on the one hand , by the Russian sphere of influence and , on the other hand , by Islamic fundament@@ alism in Afghanistan , the present Ta@@ ji@@ ki@@ stani State has little chance of survival .
+the only reason to inject European funding into Ta@@ ji@@ kistan despite all this is that funding increases the chance of survival of the Ta@@ ji@@ ki@@ stani population and offers more chance of peace than there would be without such aid .
+this is the reason why my group can nevertheless agree with the proposals made in the Sa@@ vary report .
+Mr President , for our part , we will not be voting for the Sa@@ vary report . this is both for reasons concerning the choice of this country and out of more general considerations involving financial aid .
+although , of course , we have nothing against the sovereign State of Ta@@ ji@@ kistan , we nevertheless do not think that European States should drop their priorities , or to be more precise , the priority that they set a long time ago on the subject of cooperation .
+this priority has now been in force for more than a quarter of a century through the L@@ om@@ é A@@ gree@@ ments .
+now , at the same time , so-called exceptional financial aid to the most diverse countries in the world is multi@@ plying , without any overall plan emerging , which means that our cooperation policy is nothing but a vague , huge scratching of the surface or , to sum it up , it is no longer a policy at all .
+to this particular consideration we can add a second .
+Ta@@ ji@@ kistan may have been spared the economic problems described in the report , moreover like so many other countries in the world , but it is nevertheless the victim of an ill-@@ considered opening up of its borders and of the huge game w@@ aged by emp@@ ires .
+Mr President , the political tide in Ta@@ ji@@ kistan seems to be turning .
+only last week , President R@@ ag@@ man@@ ov called for parliament@@ ary elections to be held next spring .
+after months of tu@@ g-@@ of-@@ war between the government and the opposition , agreement has finally been reached regarding the new elector@@ al law .
+I should point out , however , that these developments mark only the beginning of the democra@@ ti@@ sation process .
+Ta@@ ji@@ kistan still shows features which are incompatible with a democratic constitutional state .
+indeed , the downside of the present positive developments is that during the next elections , a number of parties will remain on the sidel@@ ines .
+they are excluded from participating . this is hardly surprising as permission to participate in elections is still in the hands of former communi@@ sts .
+this remark regarding Ta@@ ji@@ kistan 's democratic status does not de@@ tract from the fact that quite a few changes have already taken place .
+as such , international organisations and bilateral don@@ ors no longer see good reason for su@@ spending aid to Ta@@ ji@@ kistan .
+even the European Commission , with the proposal it is making , seems to think it should put its o@@ ar in . however , the Commission is losing sight of one important factor .
+earlier this year , the three institutions of the European Union concluded the inter@@ institu@@ tional agreement for a period of seven years , stipul@@ ating the financial ceil@@ ings for the various policy areas .
+I would like to remind the Commission of this .
+in the proposal to grant aid to Ta@@ ji@@ kistan , this agreement is not given much consideration .
+neither the urgent appeal by the IMF and World Bank to the European Union to increase aid to Ta@@ ji@@ kistan nor the argument of moral duty in the light of Ta@@ ji@@ kistan 's debts to the Union are in themselves good enough reasons to grant aid .
+we are first of all faced with the European Union 's financial limitations .
+the above agreement does not allow for making gifts to Ta@@ ji@@ kistan .
+moreover , we have recent experiences of entering into financial commitments which we can not honour , as illustrated in the reconstruction of Kos@@ ovo .
+the Commission has pledged a sum of EUR 500 million while the Member States do not want to make the necessary increase in the European budget at this stage .
+a vague declaration of intent has since been drafted by the Council to prevent similar problems from occurring in future , but it remains to be seen what will come of this .
+Kos@@ ovo is no better off at the moment .
+aid has been reduced to EUR 360 million and also spread over several years .
+this incident has given me grave concerns regarding the Member States ' willingness to make conce@@ ssions once again within the context of aid to Ta@@ ji@@ kistan , even if only relatively small amounts are involved .
+member States find it hard to sell the idea within their own countries if the outcome of the negotiations at the Berlin Summit are under@@ mined by reality .
+apart from a limited budget , the European Union has little political interest in Ta@@ ji@@ kistan .
+the geographical remot@@ eness makes it impossible to have any real influence on the democra@@ ti@@ sation process .
+although the European Union has an interest in being surrounded by large , stable regions , the tools it has available in order to achieve this are still very limited .