Morphology

How Transliteration Enhances Machine Translation: The HeArBERT Approach | HackerNoon

Authors: (1) Aviad Rom, The Data Science Institute, Reichman University, Herzliya, Israel; (2) Kfir Bar, The Data Science Institute, Reichman University, Herzliya, Israel.
Table of Links: Abstract and Introduction; Related Work; Methodology; Experimental Settings; Results; Conclusion and Limitations; Bibliographical References.
3. Methodology
We begin by pre-training a new language model on texts written in both Arabic and Hebrew. This model, named HeArBERT, is subsequently fine-tuned to enhance performance in machine translation between Arabic and Hebrew.

Software

HeArBERT: A Bilingual Model for Arabic-Hebrew Translation Using Transliteration | HackerNoon

Authors: (1) Aviad Rom, The Data Science Institute, Reichman University, Herzliya, Israel; (2) Kfir Bar, The Data Science Institute, Reichman University, Herzliya, Israel.
K et al. (2020) suggested that structural similarity between languages is essential for a language model’s multilingual generalization capabilities. Their suggestion was further discussed by Dufter and Schütze (2020), who highlighted the essential components for

Software

Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space | HackerNoon

Authors: (1) Aviad Rom, The Data Science Institute, Reichman University, Herzliya, Israel; (2) Kfir Bar, The Data Science Institute, Reichman University, Herzliya, Israel.
Abstract
We train a bilingual Arabic-Hebrew language model using a transliterated version of Arabic texts in Hebrew, to ensure both languages are represented in the same script. Given the morphological, structural similarities, and the extensive
