github.com/moses-smt/mosesdecoder.git

Age	Commit message	Author
2022-05-08	nonbreaking_prefix.tdt: add "Nu" for "Numeru"	Raphaël Merx
	E.g. "Dekretu-Lei Nu. 18/2022" -> "Decree Law No. 18/2022"
2022-01-21	Modify a comment on usage in the script	swk0627

2021-03-13	Add tokenisation support for the Tetun language	Raphael Merx

2020-08-03	Allow Arabic letters to begin a fa sentence	Kenneth Heafield

2020-07-31	adding rules for Catalan	Cristina España i Bonet
	special characters within words and contractions closer to French than to English
2020-06-30	escape ampersands	Barry Haddow

2020-06-02	Merge pull request #221 from HjalmarrSv/master	Hieu Hoang
	Added some for sv
2020-05-23	Update nonbreaking_prefix.sv	HjalmarrSv
	Added Å, Ä and Ö, which are not unusual initials in names, e.g. Åke, Ärling, Östen. Added some new entries, but mostly variations on the existing ones. Both a dot after each letter (or pair) and a dot only after the last letter are accepted forms. A couple of decades ago there had to be a space after the dot, which explains the third form. The file for sv is much more useful with these few additions, although it is still far from complete. Removed: G (occurred twice). In this list there is one item that is also a word, even when case is kept: tom. If all words are lowercased, then tex, mao and tom (again) may be confused with names, and iaf, etc with named entities.
2020-03-19	sentence splitter -k option to keep line boundaries	Kenneth Heafield

2020-03-19	Add Pashto ؟ as a sentence splitting character	Kenneth Heafield

2020-02-26	flag to turn off sentence splitter from emitting <P>	William Waites

2020-02-20	Revert "line buffering for tokeniser and truecaser"	Kenneth Heafield
	This reverts commit 691717c42569fc94b9454d5ac862041684465654.
2020-02-17	line buffering for tokeniser and truecaser	William Waites

2020-01-06	Proper spacing [alvations-patch-2]	alvations

2019-12-17	Modernized	HjalmarrSv
	I wanted to properly parse the links on https://dumps.wikimedia.org/mirrors.html when the page is copied as text. My proposed changes do the job. Basically, I had to replace the + at the end of line 5 with *(\/)? The pipe symbol could lead to crashes, which is why I broke line 5 up into three lines. After reading various posts, I suggest not using the pipe (\|).
2019-12-16	attempt to handle Korean better; only consider horizontal space in final split	Barry Haddow

2019-12-09	split word on any type of space	Barry Haddow

2019-11-25	Single quotes should be escaped as single quotes. [alvations-patch-normalization]	alvations

2019-11-08	2 letter codes	Barry Haddow

2019-11-08	support for several Indic languages	Barry Haddow

2019-11-05	initial hi non-breaking prefixes	Barry Haddow

2019-11-05	list items	Barry Haddow

2019-11-05	rupees	Barry Haddow

2019-11-05	fix abbrev rule	Barry Haddow

2019-11-01	devanagari fix	Barry Haddow

2019-10-31	reorganise indic support	Barry Haddow

2019-10-31	use block notation for indic scripts	Barry Haddow

2019-10-31	fix nbp	Barry Haddow

2019-10-28	full cjk test	Barry Haddow

2019-10-28	Merge branch 'master' of github.com:moses-smt/mosesdecoder	Barry Haddow

2019-10-14	Update replace-unicode-punctuation.perl	Kevin Canwen Xu

2019-10-01	Undoing 05788925812f0d3265e355565cbb1701a0ad7510	alvations
	Causes abbreviations to not split when ending with a fullstop. E.g. > The restructuring of IBM was essential to enable it organisationally to take up the responsibilities entrusted in the role with the recent changes in the policy and legislations, revised charter of function of IBM and the new activities and initiatives undertaken by IBM. IBM is also engaged in handholding the States for auction of mineral blocks for greater transparency in allocation of mineral concessions.
2019-09-30	debug	Barry Haddow

2019-09-30	revert 05788925	Barry Haddow

2019-09-30	enable custom non breaking prefixes	Barry Haddow

2019-09-30	Merge branch 'master' of github.com:moses-smt/mosesdecoder	Barry Haddow

2019-09-30	do not add spaces in cjk	Barry Haddow

2019-09-23	Enable use strict pragma	titsuki

2019-09-04	The dot before an acronym should be optional. [alvations-patch-regexes]	alvations

2019-07-10	Support for Urdu in sentence splitter	Achim Ruopp

2019-04-26	escape angle brackets	Matt Post
	The script doesn't escape angle brackets which can result in bad SGML / XML output. This fixes that, although ideally, this should be implemented with a proper parser and dumper.
2019-02-27	Fix non-ASCII lowercasing	Joel Barry

2019-01-04	Revert "use ucfirst instead of defined uppercase function"	Hieu Hoang
	This reverts commit dfbb17e549d4cb4ece452c7224ae47a590b7a4da.
2019-01-03	Merge pull request #207 from alvations/patch-truecaser	Hieu Hoang
	Reverting split_xml()
2019-01-03	Reverting split_xml()	alvations

2018-12-30	consistent output	Hieu Hoang

2018-12-20	use ucfirst instead of defined uppercase function	alvations

2018-12-20	split_xml should be consistent for training and using	alvations

2018-12-10	increase cores to 16. For bitextor azure pipeline	Hieu Hoang

2018-12-08	ems config for moses2	Hieu Hoang