Cross-Lingual Word Embeddings with Universal Concepts and Their Applications

OCLC: 1311288770

Book synopsis: Cross-Lingual Word Embeddings with Universal Concepts and Their Applications, by Pezhman Sheinidashtegol (2020)

Enormous amounts of data are generated in many languages every day due to our increasing global connectivity, which increases the demand for the ability to read and classify data regardless of language. Word embedding is a popular Natural Language Processing (NLP) strategy that uses language modeling and feature learning to map words to vectors of real numbers. However, these models need a significant amount of annotated training data. Although the availability of labeled data is gradually increasing, most of it exists only in high-resource languages such as English. Meanwhile, researchers proficient in different sets of languages seek to address new problems with multilingual NLP applications.

In this dissertation, I present multiple approaches to generating cross-lingual word embeddings (CWE) using universal concepts (UC) shared among languages to address the limitations of existing methods. My work consists of three approaches to building multilingual/bilingual word embeddings. The first approach has two steps: pre-processing and processing. In the pre-processing step, we build a bilingual corpus that contains knowledge of both languages in the form of sentences for the most frequent English words and their translations in the target language. In this step, knowledge of the source language is shared with the target language, and vice versa, by swapping one word per sentence with its corresponding translation. In the processing step, we use a monolingual embedding estimator to generate the CWE. The second approach generates multilingual word embeddings using UCs and consists of three parts.
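The word-swapping pre-processing step of the first approach can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the dissertation's code: the toy sentences and dictionaries are hypothetical, the dictionary is assumed to be already restricted to the most frequent words, the swapped word is picked at random, and each original sentence is kept next to its swapped copy.

```python
import random


def build_bilingual_corpus(sentences, dictionary, seed=0):
    """Return each sentence followed by a copy with one word swapped for its translation.

    sentences  -- tokenized sentences (lists of str) in one language
    dictionary -- word -> translation in the other language (assumed to cover
                  only the most frequent words)
    """
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    corpus = []
    for tokens in sentences:
        corpus.append(list(tokens))  # keep the original sentence (an assumption)
        # Indices of words that have a known translation.
        candidates = [i for i, tok in enumerate(tokens) if tok in dictionary]
        if not candidates:
            continue  # nothing to swap in this sentence
        i = rng.choice(candidates)  # swap exactly one word per sentence
        swapped = list(tokens)
        swapped[i] = dictionary[tokens[i]]
        corpus.append(swapped)
    return corpus


# Hypothetical toy data: English and French sentences and a tiny dictionary.
en_sentences = [["the", "cat", "sleeps"], ["a", "dog", "runs"]]
fr_sentences = [["le", "chat", "dort"]]
en_to_fr = {"cat": "chat", "dog": "chien", "sleeps": "dort"}
fr_to_en = {fr: en for en, fr in en_to_fr.items()}

# Share knowledge in both directions: English -> French and French -> English.
corpus = (build_bilingual_corpus(en_sentences, en_to_fr)
          + build_bilingual_corpus(fr_sentences, fr_to_en))
for sentence in corpus:
    print(" ".join(sentence))
```

The mixed corpus would then be fed to a monolingual estimator such as word2vec, so that a word and its translation are learned from shared contexts.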
In part I, we introduce and build UCs from bilingual dictionaries using graph theory, defining words as nodes and translation pairs as edges. In part II, we describe the word2vec configuration used to generate encoded-word embeddings. Finally, part III decodes the generated embeddings using UCs. The third approach uses the supervised method of the MUSE project, but with the model trained on our UCs. Lastly, we apply the last two proposed methods to several practical NLP applications: document classification, cross-lingual sentiment analysis, and code-switching sentiment analysis. Our proposed methods outperform the state-of-the-art MUSE method on the majority of these applications.
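Part I's graph construction, together with one plausible reading of the encoding and decoding steps in parts II and III, can be sketched as follows. The assumptions are ours: each connected component of the translation graph is treated as one UC; words carry a language prefix such as `en:` so identical spellings in different languages stay distinct; the `UC<n>` token names are hypothetical; and the small vector table stands in for the output of a word2vec run, whose actual configuration is not reproduced here.

```python
from collections import defaultdict


def build_universal_concepts(translation_pairs):
    """Map each word to a UC id: words are nodes, translation pairs are edges,
    and every connected component of the graph becomes one universal concept."""
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving keeps trees shallow
            word = parent[word]
        return word

    for a, b in translation_pairs:  # union-find over the translation edges
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = defaultdict(set)
    for word in parent:
        components[find(word)].add(word)

    word_to_uc = {}
    # Sort components so the UC ids are deterministic.
    for idx, members in enumerate(sorted(components.values(), key=sorted)):
        for word in members:
            word_to_uc[word] = f"UC{idx}"
    return word_to_uc


def encode(sentences, word_to_uc):
    """Replace every word that belongs to a UC by its concept token."""
    return [[word_to_uc.get(tok, tok) for tok in sent] for sent in sentences]


def decode(token_vectors, vocabulary, word_to_uc):
    """Give each word the vector learned for its UC token (or for itself)."""
    decoded = {}
    for word in vocabulary:
        token = word_to_uc.get(word, word)
        if token in token_vectors:  # skip words the model never saw
            decoded[word] = token_vectors[token]
    return decoded


pairs = [("en:dog", "fr:chien"), ("en:dog", "es:perro"), ("en:cat", "fr:chat")]
uc = build_universal_concepts(pairs)
encoded = encode([["en:the", "en:dog", "en:barks"], ["fr:le", "fr:chat"]], uc)
print(encoded)

# Hypothetical stand-in for the vectors a word2vec model would learn on `encoded`.
token_vectors = {"UC0": [0.1, 0.2], "UC1": [0.3, 0.4], "en:the": [0.0, 1.0]}
vocab = ["en:dog", "fr:chien", "es:perro", "en:the", "en:barks"]
embeddings = decode(token_vectors, vocab, uc)
print(embeddings)
```

Because all members of a UC share a single token during training, translations receive identical vectors in this sketch, which is what places them in one cross-lingual space.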

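MUSE's supervised mode aligns two monolingual embedding spaces with an orthogonal mapping fitted on a seed dictionary, the Procrustes solution W = U V^T, where U S V^T is the SVD of Y X^T. The NumPy sketch below shows only that core step on synthetic data. It is not MUSE's code: it omits MUSE's iterative refinement, uses plain cosine nearest-neighbour retrieval instead of MUSE's CSLS criterion, and assumes the dictionary pairs (which in the third approach would come from the UCs) are already given as matched columns of `X` and `Y`; `nearest_target` is a hypothetical helper.

```python
import numpy as np


def procrustes(X, Y):
    """Orthogonal W minimizing ||W X - Y||_F.

    Column i of X (source space) and column i of Y (target space) hold the
    embeddings of the i-th seed-dictionary pair.
    """
    U, _, Vt = np.linalg.svd(Y @ X.T)  # Y X^T = U S V^T
    return U @ Vt                      # W = U V^T


def nearest_target(W, x, Y):
    """Index of the target column with the highest cosine similarity to W x."""
    mapped = W @ x
    sims = (Y.T @ mapped) / (np.linalg.norm(Y, axis=0) * np.linalg.norm(mapped))
    return int(np.argmax(sims))


# Synthetic check: the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
dim, n_pairs = 4, 50
X = rng.normal(size=(dim, n_pairs))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
Y = Q @ X

W = procrustes(X, Y)
print(np.allclose(W, Q))
print(nearest_target(W, X[:, 7], Y))
```

Restricting W to be orthogonal preserves distances and angles within the source space, which is why this constraint is the usual choice for supervised alignment.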
Related Books

Cross-Lingual Word Embeddings
Language: en
Pages: 134
Authors: Anders Søgaard
Categories: Computers
Type: BOOK - Published: 2019-06-04 - Publisher: Morgan & Claypool Publishers

The majority of natural language processing (NLP) is English language processing, and while there is good language technology support for (standard varieties of
Embeddings in Natural Language Processing
Language: en
Pages: 177
Authors: Mohammad Taher Pilehvar
Categories: Computers
Type: BOOK - Published: 2020-11-13 - Publisher: Morgan & Claypool Publishers

Embeddings have undoubtedly been one of the most influential research areas in Natural Language Processing (NLP). Encoding information into a low-dimensional ve
Cross-lingual Word Embeddings for Low-resource and Morphologically-rich Languages
Language: en
Authors: Ali Hakimi Parizi
Type: BOOK - Published: 2021

Despite recent advances in natural language processing, there is still a gap in state-of-the-art methods to address problems related to low-resource and morphol
ECAI 2020
Language: en
Pages: 3122
Authors: G. De Giacomo
Categories: Computers
Type: BOOK - Published: 2020-09-11 - Publisher: IOS Press

This book presents the proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020), held in Santiago de Compostela, Spain, from 29 August