References

Below is a selection of the publications that the training pipeline is based on. More related publications can be found on the Bergamot project website.

  1. V. M. Sánchez-Cartagena, M. Bañón, S. Ortiz-Rojas and G. Ramírez-Sánchez, “Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task”, in Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers. Brussels, Belgium: Association for Computational Linguistics, October 2018.
  2. Gema Ramírez-Sánchez, Jaume Zaragoza-Bernabeu, Marta Bañón and Sergio Ortiz Rojas, “Bifixer and Bicleaner: two open-source tools to clean your parallel data”, in Proceedings of the 22nd Annual Conference of the European Association for Machine Translation. Lisboa, Portugal: European Association for Machine Translation, November 2020.
  3. F. Mölder, K. P. Jablonski, B. Letcher, et al., “Sustainable data analysis with Snakemake”, F1000Research, 10:33, January 2021. doi:10.12688/f1000research.29032.2.
  4. Bogoychev et al., “Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task”, NGT 2020.
  5. Kim et al., “From Research to Production and Back: Ludicrously Fast Neural Machine Translation”, EMNLP 2019.
  6. Bawden et al., “The University of Edinburgh’s Submissions to the WMT19 News Translation Task”, WMT 2019.
  7. Jörg Tiedemann, “Parallel Data, Tools and Interfaces in OPUS”, in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), 2012.
  8. Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone and Philip Williams, “The University of Edinburgh’s Neural MT Systems for WMT17”, in Proceedings of the EMNLP 2017 Second Conference on Machine Translation (WMT17), 2017.
  9. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins and Alexandra Birch, “Marian: Fast Neural Machine Translation in C++”, in Proceedings of ACL 2018, System Demonstrations, 2018.
  10. Rico Sennrich, Barry Haddow and Alexandra Birch, “Improving Neural Machine Translation Models with Monolingual Data”, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
  11. Matt Post, “A Call for Clarity in Reporting BLEU Scores”, WMT 2018.
  12. Goyal et al. (Facebook AI), “The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation”, 2021.
  13. Gowda et al., “Many-to-English Machine Translation Tools, Data, and Pretrained Models”, ACL 2021.
  14. Chris Dyer, Victor Chahuneau and Noah A. Smith, “A Simple, Fast, and Effective Reparameterization of IBM Model 2”, in Proceedings of NAACL, 2013.
  15. Sennrich et al., “Neural Machine Translation of Rare Words with Subword Units”, ACL 2016.
  16. Taku Kudo, “Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates”, 2018.
  17. Zaragoza-Bernabeu et al., “Bicleaner AI: Bicleaner Goes Neural”, LREC 2022.
  18. Yoon Kim and Alexander M. Rush, “Sequence-Level Knowledge Distillation”, EMNLP 2016.