Assessing relevance using automatically translated documents for cross-language information retrieval

Orengo, Viviane Moreira (2004) Assessing relevance using automatically translated documents for cross-language information retrieval. PhD thesis, Middlesex University.


Abstract

This thesis focuses on the Relevance Feedback (RF) process, and the scenario considered is that of a Portuguese-English Cross-Language Information Retrieval (CLIR) system. CLIR deals with the retrieval of documents in one natural language in response to a query expressed in another language. RF is an automatic process for query reformulation. The idea behind it is that users are unlikely to produce perfect queries, especially if given just one attempt. The process aims at improving the query specification, which will lead to more relevant documents being retrieved. The method consists of asking the user to analyse an initial sample of documents retrieved in response to a query and judge them for relevance.

In that context, two main questions were posed. The first relates to users' ability to assess the relevance of texts in a foreign language, texts hand-translated into their language, and texts automatically translated into their language. The second concerns the relationship between the accuracy of the participants' judgements and the improvement achieved through the RF process.

To answer those questions, this work performed an experiment in which Portuguese speakers were asked to judge the relevance of English documents, documents hand-translated into Portuguese, and documents automatically translated into Portuguese. The results show that machine translation is as effective as hand translation in helping users to assess relevance. In addition, the impact of misjudged documents on the performance of RF is, overall, only moderate and varies greatly across query topics.

This work advances the existing research on RF by considering a CLIR scenario and carrying out user experiments that analyse aspects of RF and CLIR which had remained unexplored until now. The contributions of this work also include: the investigation of CLIR using a new language pair; the design and implementation of a stemming algorithm for Portuguese; and several experiments using Latent Semantic Indexing, which contribute data points to CLIR theory.

Item Type: Thesis (PhD)
Research Areas: B. > Theses
Item ID: 13606
Depositing User: Adam Miller
Date Deposited: 06 Feb 2015 14:42
Last Modified: 30 May 2019 23:36
URI: https://eprints.mdx.ac.uk/id/eprint/13606
