{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "
\n\n--------------------------------------------------------------------------------\n\nFairseq(-py) is a sequence modeling toolkit that allows researchers and\ndevelopers to train custom models for translation, summarization, language\nmodeling, and other text generation tasks. A minimal example of loading a\npre-trained model is sketched after the list of papers below.\nWe provide reference implementations of various sequence modeling papers:\n\n- **Convolutional Neural Networks (CNN)**\n - [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)\n - [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)\n - [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)\n - [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)\n - [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)\n- **LightConv and DynamicConv models**\n - [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)\n- **Long Short-Term Memory (LSTM) networks**\n - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)\n- **Transformer (self-attention) networks**\n - Attention Is All You Need (Vaswani et al., 2017)\n - [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)\n - [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)\n - [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/transformer_lm/README.md)\n - [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples/constrained_decoding/README.md)\n - [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)\n - [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)\n - [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)\n - [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)\n - [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)\n - [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)\n - [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)\n - [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)\n - [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md)\n - [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)\n - [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)\n - [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)\n- **Non-autoregressive Transformers**\n - Non-Autoregressive Neural Machine Translation (Gu et al., 2017)\n - Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
\n - Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)\n - Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)\n - [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)\n- **Finetuning**\n - [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)](examples/rxf/README.md)\n\n
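Many of the pre-trained models referenced above can be loaded through PyTorch's `torch.hub` interface. The snippet below is a minimal sketch of translating with the WMT'19 English-German model; the checkpoint name and the `tokenizer`/`bpe` arguments follow the wmt19 example README and may differ between releases, and it assumes the `fastBPE` and `sacremoses` Python packages are installed:

```python
import torch

# Load a pre-trained En-De transformer via torch.hub
# (downloads the checkpoint on first use; assumes fastBPE and sacremoses are installed)
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model',
                       tokenizer='moses', bpe='fastbpe')
en2de.eval()  # disable dropout for evaluation

# Translate a sentence with beam search
print(en2de.translate('Hello world!', beam=5))
```

`torch.hub.list('pytorch/fairseq')` should enumerate the available model names; see the example READMEs linked above for the training and evaluation commands specific to each paper.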
\n\n### What's New:\n\n- March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)\n- February 2020: [mBART model and code released](examples/mbart/README.md)\n- February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german)\n- December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)\n- November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)\n- November 2019: [CamemBERT model and code released](examples/camembert/README.md)\n- November 2019: [BART model and code released](examples/bart/README.md)\n- November 2019: [XLM-R models and code released](examples/xlmr/README.md)\n- September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)\n- August 2019: [WMT'19 models released](examples/wmt19/README.md)\n- July 2019: fairseq relicensed under MIT license\n- July 2019: [RoBERTa models and code released](examples/roberta/README.md)\n- June 2019: [wav2vec models and code released](examples/wav2vec/README.md)\n\n