VinAI is pleased to publicly release the pre-trained text translation models “vinai/vinai-translate-vi2en” and “vinai/vinai-translate-en2vi”, which are currently used in the translation component of our VinAI Translate system. These are state-of-the-art text translation models for Vietnamese-to-English and English-to-Vietnamese, and they can be used with the popular “transformers” library.
Please find details about the pre-trained models at: https://github.com/VinAIResearch/VinAI_Translate.
Experimental results of the pre-trained models can be found in our VinAI Translate system paper “A Vietnamese-English Neural Machine Translation System”, which will be presented at the Interspeech 2022 Show & Tell session.
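As a quick illustration, the checkpoints can be loaded with the standard “transformers” sequence-to-sequence API. The sketch below is a minimal, unofficial example: the model names come from this announcement, while the language codes (“vi_VN”, “en_XX”) and generation settings (beam search, early stopping) are assumptions based on the underlying mBART-style setup; see the GitHub repository above for the official usage.

```python
# Minimal sketch of Vietnamese-to-English translation with the released
# checkpoint. Language codes and generation settings are illustrative
# assumptions, not official defaults.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def translate_vi2en(text: str) -> str:
    # src_lang follows mBART-style language codes (assumed here).
    tokenizer = AutoTokenizer.from_pretrained(
        "vinai/vinai-translate-vi2en", src_lang="vi_VN"
    )
    model = AutoModelForSeq2SeqLM.from_pretrained("vinai/vinai-translate-vi2en")
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # Force the decoder to start generating in English.
        decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
        num_beams=5,
        early_stopping=True,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

The English-to-Vietnamese direction would follow the same pattern with “vinai/vinai-translate-en2vi” and the source/target language codes swapped.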
Other NLP resources from VinAI:
- BARTpho (INTERSPEECH 2022): Pre-trained sequence-to-sequence models for Vietnamese.
- QA-CarManual (IUI 2022): Demo video of a Vietnamese speech-based question answering system over car manuals.
- PhoMT (EMNLP 2021): A high-quality and large-scale benchmark dataset for Vietnamese-English machine translation.
- PhoATIS (INTERSPEECH 2021): An intent detection and slot filling dataset for Vietnamese.
- PhoNLP (NAACL 2021): A BERT-based multi-task learning toolkit for Vietnamese POS tagging, named entity recognition and dependency parsing.
- PhoNER_COVID19 (NAACL 2021): A dataset for Vietnamese named entity recognition.
- ViText2SQL (EMNLP 2020 Findings): A dataset for Vietnamese Text2SQL semantic parsing.
- PhoBERT (EMNLP 2020 Findings): Pre-trained language models for Vietnamese.
- BERTweet (EMNLP 2020): A pre-trained language model for English Tweets.
- COVID19Tweet (WNUT 2020): A dataset released for the WNUT 2020 Shared Task on “Identification of informative COVID-19 English Tweets”.