Abstract
Chinese name translation is a special case of named entity translation.
It is a challenging problem because many Romanization systems coexist and some people add extra words to their English names.
Chinese name translation is in great demand because correctly translating a scholar’s name into its English counterpart helps locate information about his or her academic achievements.
In this thesis, we first propose a classification of Chinese names and then propose a novel methodology for mining Chinese name translations from Web corpora. Our methodology uses two kinds of features,
the phonetic and the distant features, to extract name translation candidates by means of a query expansion technique and a Support Vector Machine (SVM).
The query expansion technique retrieves, more effectively and more precisely, the Web pages that contain both the input Chinese name and the name’s translation.
Using an SVM to learn verification rules for name translation candidates from training samples avoids the side effects caused by heuristic rules.
We classify Chinese names into eight name types according to their corresponding name translations. The experimental results show that our methodology can effectively mine the correct name translations for three common name types.