National Taiwan University of Science and Technology, Department of Electronic Engineering
Fuzzy Neuron Laboratory Research Paper

Paper by 林盛康, master's graduate, class of 89


Interactive Query Expansion Based on Association Thesaurus for Web Information Retrieval (以關聯式索引典為基礎之互動式查詢擴展應用於網頁資訊檢索)

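Abstract

As more and more information becomes readily available on the World Wide Web, efficiently retrieving the parts one actually needs has become both important and practical. Today's search engines are designed to help users filter out irrelevant information and keep what interests them. However, several phenomena limit the overall effectiveness of these search tools, such as misuse of vocabulary, overly terse queries, and the difficulty of Chinese word segmentation. These problems motivate us to use an interactive query mechanism that helps users formulate query terms more easily and thereby obtain satisfactory answers. In the proposed method, an association thesaurus based on co-occurrence is used to assist the query task. After the user submits a query string, we look up other terms related to it in the thesaurus and let the user tick those that are truly relevant. The system then merges the original query terms with the expansion terms selected by the user and performs the final search. In the experiments, we used two document collections to construct the association thesaurus in order to compare the effects of corpora of different natures. The results show that a homogeneous corpus yields a more robust thesaurus, which in turn is more helpful for interactive query expansion. We also examined two different query modification methods; the experiments show that each has its strengths under different conditions. In summary, interactive query expansion based on an association thesaurus yields a significant improvement in both the precision and the recall of the system.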
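
The workflow in the abstract above (build a co-occurrence thesaurus, suggest related terms, let the user pick, merge the picks into the query) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the documents are a made-up collection that is already segmented into terms, and the Dice coefficient stands in for the association measure, which the abstract does not specify.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(documents):
    """Count, per term pair, the number of documents in which both appear."""
    doc_freq = Counter()               # term -> number of documents containing it
    pair_freq = defaultdict(Counter)   # term -> {other term -> shared documents}
    for terms in documents:
        unique = set(terms)
        doc_freq.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_freq[a][b] += 1
            pair_freq[b][a] += 1
    return doc_freq, pair_freq


def suggest(term, doc_freq, pair_freq, top_k=3):
    """Rank co-occurring terms by Dice coefficient (an assumed measure)."""
    scores = {
        other: 2 * n / (doc_freq[term] + doc_freq[other])
        for other, n in pair_freq[term].items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def expand_query(query_terms, selected_terms):
    """Merge original terms with the user's picks, keeping order, no duplicates."""
    return list(dict.fromkeys(list(query_terms) + list(selected_terms)))


# Toy, pre-segmented collection (made up for illustration).
docs = [
    ["search", "engine", "web"],
    ["search", "engine", "query"],
    ["neural", "network", "fuzzy"],
    ["search", "query", "expansion"],
]
doc_freq, pair_freq = build_thesaurus(docs)
candidates = suggest("search", doc_freq, pair_freq)
picked = candidates[:2]   # stands in for the terms a user would tick
final_query = expand_query(["search"], picked)
print(candidates)    # ['engine', 'query', 'expansion']
print(final_query)   # ['search', 'engine', 'query']
```

In an interactive system the `picked` list would come from the user's selection rather than from a slice of the candidates; the slice here only keeps the sketch self-contained.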


Interactive Query Expansion Based on Association Thesaurus for Web Information Retrieval

Abstract

With the increasing availability of information on the World Wide Web (WWW), retrieving information efficiently and effectively has become both more important and more practical. Current search engines are designed to sift out non-relevant information and retrieve only the items that match the user's interests. However, several difficulties limit these search tools, such as users' misuse of words, short queries, and ambiguities in Chinese word identification. We therefore propose an interactive search scheme that gives users an easy way to articulate their queries and to retrieve the information that best fits their interests. In this research, a co-occurrence-based association thesaurus is consulted when users submit their initial queries. The thesaurus is arranged by means of an organization technique, so that users can easily decide which of the suggested terms to add. The reformulated queries, together with query modification methods, are then submitted for another round of searching. Two test collections were used to construct the association thesaurus in order to see how the characteristics of the dataset affect the resulting thesaurus. Experimental results show that a homogeneous collection yields a robust thesaurus that is useful for interactive query expansion. Two weighting schemes for query modification were also examined, and the results show that each involves trade-offs. In summary, we conclude that interactive query expansion based on an association thesaurus significantly improves both precision and recall.
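
For readers unfamiliar with the two evaluation measures named at the end of the abstract, the short Python sketch below computes them from their standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The function name and the document identifiers are hypothetical, and the numbers are made up for illustration; they are not results from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set overlap."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up relevance judgments and result lists (illustration only).
relevant = {"d1", "d2", "d3", "d4"}
before_expansion = ["d1", "d5", "d6", "d7"]
after_expansion = ["d1", "d2", "d3", "d5"]

print(precision_recall(before_expansion, relevant))  # (0.25, 0.25)
print(precision_recall(after_expansion, relevant))   # (0.75, 0.75)
```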