# en-de data
## Training
The training data is a subset of data from the [WMT21] news task.
| Dataset | Sentences |
|---------------------|--------------:|
| Europarl v10 | 1,828,521 |
| News Commentary v16 | 398,981 |
| Common Crawl corpus | 2,399,123 |
| **Total** | **4,626,625** |
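
The total in the table can be checked, and each corpus's share of the training set derived, with a few lines of Python. This is a minimal sketch that uses only the counts listed above; the helper names `total_sentences` and `shares` are illustrative and not part of this repository.

```python
# Sentence counts copied from the training data table.
SENTENCES = {
    "Europarl v10": 1_828_521,
    "News Commentary v16": 398_981,
    "Common Crawl corpus": 2_399_123,
}


def total_sentences(counts):
    """Return the number of sentences summed over all corpora."""
    return sum(counts.values())


def shares(counts):
    """Return each corpus's fraction of the combined training set."""
    total = total_sentences(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"Total: {total_sentences(SENTENCES):,}")
    for name, share in shares(SENTENCES).items():
        print(f"{name}: {share:.1%}")
```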
## Validation
The validation set uses the [WMT19] news task test set via [sacrebleu].
## Testing
Evaluation of the model uses the [WMT20] news task test set via [sacrebleu].
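
As a rough sketch, the validation and test sets could be retrieved and scored through the [sacrebleu] command-line tool along the following lines. It assumes `sacrebleu` is installed and on the `PATH`, and that the language pair is `en-de` (inferred from the title). The helper names and the `hypotheses.de` file name are hypothetical, and this repository's own tooling may invoke sacrebleu differently.

```python
import subprocess

LANG_PAIR = "en-de"  # assumption: inferred from the "en-de data" title


def echo_command(test_set, side):
    """Build a sacrebleu command that prints one side of a test set.

    side is "src" for the source sentences or "ref" for the references.
    """
    if side not in ("src", "ref"):
        raise ValueError("side must be 'src' or 'ref'")
    return ["sacrebleu", "-t", test_set, "-l", LANG_PAIR, "--echo", side]


def score_command(test_set, hypothesis_path):
    """Build a sacrebleu command that scores a file of system output."""
    return ["sacrebleu", "-t", test_set, "-l", LANG_PAIR, "-i", hypothesis_path]


def run(command):
    """Run a command and return its standard output (requires sacrebleu)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is downloaded.
    print(" ".join(echo_command("wmt19", "src")))            # validation source
    print(" ".join(score_command("wmt20", "hypotheses.de")))  # test scoring
```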

[wmt19]: https://www.statmt.org/wmt19/translation-task.html
[wmt20]: https://www.statmt.org/wmt20/translation-task.html
[wmt21]: https://www.statmt.org/wmt21/translation-task.html
[sacrebleu]: https://github.com/mjpost/sacrebleu