Age | Commit message | Author | |
---|---|---|---|
2019-09-04 | The dot before an acronym should be optional. (alvations-patch-regexes) | alvations | |
2019-08-22 | Merge pull request #211 from achimr/master | Hieu Hoang | |
Support for Urdu in sentence splitter | |||
2019-07-10 | Support for Urdu in sentence splitter | Achim Ruopp | |
2019-06-08 | tweak readme | Hieu Hoang | |
2019-04-27 | Merge pull request #210 from mjpost/patch-1 | Hieu Hoang | |
escape angle brackets | |||
2019-04-26 | escape angle brackets | Matt Post | |
The script doesn't escape angle brackets which can result in bad SGML / XML output. This fixes that, although ideally, this should be implemented with a proper parser and dumper. | |||
2019-03-02 | Merge pull request #209 from joelb-git/multi-bleu-detok-non-ascii-fix | Hieu Hoang | |
Fix non-ASCII lowercasing | |||
2019-02-27 | Fix non-ASCII lowercasing | Joel Barry | |
2019-01-17 | check state object are not null before using it. For alternate weights setting where some feature functions are not used for a particular sentence | Hieu Hoang | |
2019-01-04 | Revert "use ucfirst instead of defined uppercase function" | Hieu Hoang | |
This reverts commit dfbb17e549d4cb4ece452c7224ae47a590b7a4da. | |||
2019-01-03 | Merge pull request #207 from alvations/patch-truecaser | Hieu Hoang | |
Reverting split_xml() | |||
2019-01-03 | Reverting split_xml() | alvations | |
2018-12-30 | consistent output | Hieu Hoang | |
2018-12-29 | Merge pull request #206 from alvations/patch-truecaser | Hieu Hoang | |
Patching truecaser | |||
2018-12-28 | rename file so it appears on github website. Clarify mailing list | Hieu Hoang | |
2018-12-20 | use ucfirst instead of defined uppercase function | alvations | |
2018-12-20 | split_xml should be consistent for training and using | alvations | |
2018-12-10 | increase cores to 16. For bitextor azure pipeline | Hieu Hoang | |
2018-12-10 | put fix into UnorderedComparer again. Maybe weird template bug | Hieu Hoang | |
2018-12-10 | fix weird unordered set error on ubuntu 18.04, gcc 7.3.0, boost 1.65. May be over-optimizing or bug in gcc or boost | Hieu Hoang | |
2018-12-10 | debug | Hieu Hoang | |
2018-12-08 | ems config for moses2 | Hieu Hoang | |
2018-12-04 | sacre bleu | Hieu Hoang | |
2018-12-04 | sacre bleu | Hieu Hoang | |
2018-12-04 | use --discount_fallback | Hieu Hoang | |
2018-11-12 | Merge branch 'master' of github.com:moses-smt/mosesdecoder | Hieu Hoang | |
2018-11-12 | removing python port. Sacremoses is newer | Hieu Hoang | |
2018-11-11 | Merge pull request #205 from coylz/master | Hieu Hoang | |
Add option "-b" (unbuffer output) to tokenizer scripts | |||
2018-11-10 | Add option "-b" (unbuffer output) to tokenizer scripts | Loïc Vial | |
2018-11-09 | rename directory to work with python import | Hieu Hoang | |
2018-11-09 | python wrapper works | Hieu Hoang | |
2018-11-07 | start borging Luis Gomes code | Hieu Hoang | |
2018-11-07 | Merge pull request #204 from ozancaglayan/nb-fix | Hieu Hoang | |
tokenizer.perl: split final dots unconditionally | |||
2018-11-07 | tokenizer.perl: split final dots unconditionally | Ozan Caglayan | |
Allow tokenization of non-breaking prefixes at end of sentences. This should be a fair compromise in many cases to construct a cleaner vocabulary. EN-old: `So am I.` EN-new: `So am I .` DE-old: `... schwer wie ein iPhone 5.` DE-new: `... schwer wie ein iPhone 5 .` FR-old: `Des gens admirent une œuvre d' art.` FR-new: `Des gens admirent une œuvre d' art .` CS-old: `Dvě děti, které běží bez bot.` CS-new: `Dvě děti, které běží bez bot .` | |||
2018-10-30 | basic support for Gujarati and Hindi, backported from one of the many upstreams | Barry Haddow | |
2018-10-26 | Merge branch 'master' of github.com:moses-smt/mosesdecoder | Hieu Hoang | |
2018-10-26 | bump again | Hieu Hoang | |
2018-10-26 | Merge pull request #203 from maxthomas/contrib-modular-boost | Hieu Hoang | |
contrib: make boost variable modular; update version to 1.68.0 | |||
2018-10-26 | bump | Hieu Hoang | |
2018-10-25 | contrib: make boost variable modular; update version to 1.68.0 | max thomas | |
2018-09-27 | Merge pull request #202 from thuvh/python3_compatible | Hieu Hoang | |
fix print to be compatible with python2 and python3 | |||
2018-09-26 | fix print to be compatible with python2 and python3 | Hoai-Thu Vuong | |
2018-09-26 | multi-bleu-detok should take raw reference | Rico Sennrich | |
2018-09-16 | grammar | Hieu Hoang | |
2018-09-10 | Merge branch 'master' of github.com:moses-smt/mosesdecoder | Hieu Hoang | |
2018-09-10 | unused script | Hieu Hoang | |
2018-09-06 | Handle glottal stops in Somalian | Barry Haddow | |
2018-07-05 | Merge pull request #201 from louismartin/bleu-fix-newline | Hieu Hoang | |
[BLEU] Fix multi-bleu.perl bug (no newline at end of file) | |||
2018-07-03 | Fix multi-bleu.perl bug when file does not end with newline | Louis MARTIN | |
When reading hypothesis and reference files, multi-bleu.perl uses the chop function to remove the trailing newline character. If one of these files does not end with a newline, chop removes the last character of the last line instead of a newline. This causes the BLEU score to be slightly off from its theoretical value. Using the safer chomp function solves this problem, because it only removes a newline when one is present. | |||
2018-06-25 | Merge branch 'RELEASE-4.0' of github.com:jowagner/mosesdecoder | Hieu Hoang | |
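
The chop versus chomp difference described in the 2018-07-03 entry can be illustrated with a small sketch. This is not the code from multi-bleu.perl: the helper names `chop` and `chomp` below are hypothetical Python stand-ins that mimic the behaviour of the Perl built-ins (with the default newline separator), and the sample lines are made up for illustration.

```python
def chop(line: str) -> str:
    """Mimic Perl's chop: always drop the last character, whatever it is."""
    return line[:-1]


def chomp(line: str) -> str:
    """Mimic Perl's chomp with the default separator: drop one trailing
    newline only if the line actually ends with one."""
    return line[:-1] if line.endswith("\n") else line


# Hypothetical file contents whose last line has no trailing newline.
lines = ["the cat sat\n", "on the mat"]

chopped = [chop(line) for line in lines]
chomped = [chomp(line) for line in lines]

print(chopped)  # ['the cat sat', 'on the ma']   <- last word is damaged
print(chomped)  # ['the cat sat', 'on the mat']  <- text is preserved
```

With the chop-like behaviour, the final token of the last line loses a character, which is enough to change n-gram matches and shift the score slightly.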