Release of the text translation models of VinAI Translate

12/10/2023 / AI Update

VinAI is pleased to publicly release the pre-trained text translation models “vinai/vinai-translate-vi2en” and “vinai/vinai-translate-en2vi”, which are currently used in the translation component of our VinAI Translate system. These are state-of-the-art models for Vietnamese-to-English and English-to-Vietnamese text translation, and they can be used with the popular “transformers” library.

Please find details about the pre-trained models at: https://github.com/VinAIResearch/VinAI_Translate.
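As a rough illustration of what using the models with “transformers” might look like, here is a minimal sketch. It assumes the models are mBART-style sequence-to-sequence checkpoints that use the language codes “vi_VN” and “en_XX”; these codes, the helper names `resolve` and `translate`, and the beam-search settings are assumptions made for this example, so please treat the repository linked above as the authoritative reference.

```python
from typing import Tuple

# Model identifiers as named in this announcement.
MODEL_IDS = {
    "vi2en": "vinai/vinai-translate-vi2en",
    "en2vi": "vinai/vinai-translate-en2vi",
}

# ASSUMPTION: mBART-style language codes (source, target) per direction.
LANG_CODES = {
    "vi2en": ("vi_VN", "en_XX"),
    "en2vi": ("en_XX", "vi_VN"),
}


def resolve(direction: str) -> Tuple[str, str, str]:
    """Return (model_id, source_code, target_code) for a direction.

    Hypothetical helper for this sketch, not part of any library.
    """
    if direction not in MODEL_IDS:
        raise ValueError(f"unknown direction: {direction!r}")
    src_lang, tgt_lang = LANG_CODES[direction]
    return MODEL_IDS[direction], src_lang, tgt_lang


def translate(text: str, direction: str = "vi2en") -> str:
    """Translate one string; downloads the model on first use."""
    # Imported here so the heavy dependency is only loaded when needed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id, src_lang, tgt_lang = resolve(direction)
    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # ASSUMPTION: decoding starts from the target-language token.
        decoder_start_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        num_beams=5,  # illustrative setting, not a recommended value
        early_stopping=True,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `translate("Xin chào", "vi2en")` would download the Vietnamese-to-English checkpoint and return an English string. For repeated use, loading the tokenizer and model once and reusing them would avoid reloading on every call.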

Experimental results of the pre-trained models can be found in our VinAI Translate system paper “A Vietnamese-English Neural Machine Translation System”, which will be presented at the Interspeech 2022 Show & Tell session.

Other NLP resources from VinAI:

  • BARTpho (INTERSPEECH 2022): Pre-trained sequence-to-sequence models for Vietnamese.
  • QA-CarManual (IUI 2022): Demo video of a Vietnamese speech-based question answering system over car manuals.
  • PhoMT (EMNLP 2021): A high-quality and large-scale benchmark dataset for Vietnamese-English machine translation.
  • PhoATIS (INTERSPEECH 2021): An intent detection and slot filling dataset for Vietnamese.
  • PhoNLP (NAACL 2021): A BERT-based multi-task learning toolkit for Vietnamese POS tagging, named entity recognition and dependency parsing.
  • PhoNER_COVID19 (NAACL 2021): A dataset for Vietnamese named entity recognition.
  • ViText2SQL (EMNLP 2020 Findings): A dataset for Vietnamese Text2SQL semantic parsing.
  • PhoBERT (EMNLP 2020 Findings): Pre-trained language models for Vietnamese.
  • BERTweet (EMNLP 2020): A pre-trained language model for English Tweets.
  • COVID19Tweet (WNUT 2020): A dataset released for the WNUT 2020 Shared Task on “Identification of informative COVID-19 English Tweets”.