Full Text Available

Mixed-Language Arabic-English Information Retrieval

Includes abstract.

Bibliographic Details
Main Author:	Mustafa, Ali Mohammed
Other Authors:	Suleman, Hussein
Format:	Thesis
Language:	English
Published:	Department of Computer Science 2014
Subjects:	Computer Science

_version_	1867613546664165376
access_status_str	Open Access
author	Mustafa, Ali Mohammed
author2	Suleman, Hussein
author_browse	Mustafa, Ali Mohammed Suleman, Hussein
author_facet	Suleman, Hussein Mustafa, Ali Mohammed
author_sort	Mustafa, Ali Mohammed
collection	Thesis
description	Includes abstract.
format	Thesis
id	oai:open.uct.ac.za:11427/6421
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:37:52.426Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2014
publishDateRange	2014
publishDateSort	2014
publisher	Department of Computer Science
publisherStr	Department of Computer Science
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/6421 Mixed-Language Arabic-English Information Retrieval Mustafa, Ali Mohammed Suleman, Hussein Computer Science Includes abstract. Includes bibliographical references. This thesis attempts to address the problem of mixed querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents, regardless of their languages. To achieve this goal, however, it is first essential to suppress the impact of most of the problems that are caused by the mixed-language feature in both queries and documents and that bias the final ranked list. Therefore, a cross-lingual re-weighting model was developed. In this cross-lingual model, the term frequency, document frequency and document length components in mixed queries are estimated and adjusted, regardless of language, while at the same time the model considers the unique mixed-language features in queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly those in the non-English language) are likely to outweigh and skew the impact of the technical terms (mostly those in English), because the latter have high document frequencies (and thus low weights) in their corresponding collection (mostly the English collection). This phenomenon is caused by the dominance of the English language in scientific domains. Accordingly, this thesis also proposes a re-weighted Inverse Document Frequency (IDF) to moderate the effect of overweighted terms in mixed queries. 2014-08-13T19:31:35Z 2014-08-13T19:31:35Z 2013 Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/6421 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle	Computer Science Mustafa, Ali Mohammed Mixed-Language Arabic-English Information Retrieval
thesis_degree_str	Doctoral
title	Mixed-Language Arabic-English Information Retrieval
title_full	Mixed-Language Arabic-English Information Retrieval
title_fullStr	Mixed-Language Arabic-English Information Retrieval
title_full_unstemmed	Mixed-Language Arabic-English Information Retrieval
title_short	Mixed-Language Arabic-English Information Retrieval
title_sort	mixed language arabic english information retrieval
topic	Computer Science
url	http://hdl.handle.net/11427/6421
work_keys_str_mv	AT mustafaalimohammed mixedlanguagearabicenglishinformationretrieval

Similar Items