#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Author: Barry Haddow
# Distributed under MIT license

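#
# Normalise Romanian s-comma and t-comma: replace the cedilla forms
# (U+015E/U+015F, U+0162/U+0163) with the comma-below forms
# (U+0218/U+0219, U+021A/U+021B). Reads UTF-8 from stdin, writes UTF-8 to stdout.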

import io
import sys
istream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
ostream = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

for line in istream:
  # S/s with cedilla -> S/s with comma below
  line = line.replace("\u015e", "\u0218").replace("\u015f", "\u0219")
  # T/t with cedilla -> T/t with comma below
  line = line.replace("\u0162", "\u021a").replace("\u0163", "\u021b")
  ostream.write(line)
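
# The four replacements above can also be sketched as a single translation
# table. This is a minimal illustration, not part of the original script:
# the names CEDILLA_TO_COMMA and normalise_romanian are hypothetical, and the
# mapping simply restates the code points used in the loop above.

```python
# Hypothetical helper: same cedilla -> comma-below mapping as the script,
# expressed as one str.translate() table instead of chained replace() calls.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})


def normalise_romanian(text):
    """Return text with Romanian cedilla letters mapped to comma-below letters."""
    return text.translate(CEDILLA_TO_COMMA)
```

# Under these assumptions, normalise_romanian(line) could stand in for the two
# replace() lines inside the loop; characters outside the table pass through
# unchanged.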