National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering
Intelligent System Laboratory Research Paper

Paper by Sheng-Yuan Yang (楊勝源), Ph.D. graduate, Class of 95


FAQ-master: A New-Generation Intelligent Web Information System

Abstract

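This thesis presents the results of our work in developing FAQ-master, an intelligent Web information system. FAQ-master can effectively discover, retrieve, filter, proxy, rank, and present high-quality information from the vast Web to satisfy users' information needs. By high-quality information we mean answers that are in-depth, up-to-date, and close to the user's question. The thesis examines the following problems: how to capture the user's intention faithfully and vividly, how to effectively discover and integrate loose, unstructured Web information, how to present relevant query results to the user, and how to provide an effective proxy mechanism that shortens the system's response time. The proposed techniques include ontology, user models, website models, and data integration and proxy mechanisms. The thesis also outlines the system architecture of FAQ-master's four main components, namely the Interface Agent, Proxy Agent, Answerer Agent, and Search Agent, with the aim of effectively improving Web search results from three viewpoints: user intention, webpage document processing, and website search.

The Interface Agent acts as the communicator between the user and the system and captures the user's true query intention. Supported by user modeling, templates, and ontology, the agent provides natural language queries with enhanced pattern and template matching, assistance and guidance for human-machine communication, and better personalized information services; it can also handle user feedback on the answers provided. The Proxy Agent acts as a two-stage mediator between the Interface Agent and the backend Answerer Agent; it introduces an ontology-enhanced intelligent proxy mechanism that effectively reduces the retrieval load on the backend server database. The Answerer Agent is responsible for cleaning, retrieving, and transforming information from different websites and storing it in an ontology-directed database. The agent introduces wrapper techniques to perform ontology-directed aggregation, at the system backend, of the webpage information collected by the Search Agent. Finally, the Search Agent uses ontology-supported website models to carry out Web information retrieval that is both user-oriented and domain-related. This semantic-level approach enables the Search Agent to provide domain-specific, focused Web information discovery with high user satisfaction.

The first result of this system is the development of ontology-supported, template-based user modeling and query processing techniques for the Interface Agent. Our preliminary experimental results show that for nearly eighty percent of user questions the system correctly identifies the user's query intention and focus. The experiments also verify the completeness of the pattern-matching technique, which is quite effective in understanding the intention and focus of user questions.

The second result is the development of query prediction techniques for the Proxy Agent. The agent's features are: (1) it uses an improved sequential pattern mining technique based on perfect hashing and database decomposition to mine user query behavior from the users' query history, performing fast user-oriented mining and prediction; (2) it uses the VRelationships in the PC ontology to perform improved, ontology-directed case-based reasoning. Experiments show that the system relieves the backend Answerer Agent of roughly 70% of its workload, a clear improvement in overall query performance.

The third result is the development of techniques with which the Answerer Agent organizes and processes loose, unstructured Web information. The agent introduces ontology-supported wrapper techniques to clean, retrieve, and transform FAQ information from heterogeneous environments and stores it in an integrated database built according to the ontology structure. With ontology support it removes noisy, inconsistent, or conflicting FAQs, and it uses full-keyword or partial-keyword inclusion to retrieve more FAQs that can serve as replies. To present the most correct and effective answers, the agent introduces a rich set of ranking measures, namely appearance rate, satisfaction value, compatibility value, and similarity value, to strengthen the ranking order of query results and give the user a better presentation. Experiments show that the agent does raise the precision of query answers and produces better ranking.

The fourth result is the development of Web search techniques for the Search Agent that are both user-oriented and domain-focused. Ontology-supported website models give the search engine a semantic-level solution and thus yield fast, precise, stable, and highly satisfactory search results. Because the website models are closely tied to the domain ontology, their construction and application support query expansion, webpage annotation, webpage and website classification, and the collection of Web resources that takes both domain relevance and user interests into account. The agent's features are: (1) ontology-supported construction of website models, which introduces domain semantics into the collection and storage of Web resources; the key result is OntoClassifier, a new ontological classifier that classifies webpages precisely and stably and supports correct semantic annotation, and experiments show that it does achieve satisfactory webpage classification results; (2) website-model-supported Web resource discovery, which takes both user interests and domain specificity into account; one result is a new Focused Crawler that effectively expands the website models through innovative strategies such as user query-driven webpage expansion, autonomous website expansion, and in-depth exploitation of query results; (3) website-model-supported webpage retrieval, which demonstrates the effectiveness of using ontology feature values as a fast index structure to locate the webpages that meet the user's needs.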


Development of FAQ-master as a New Intelligent Web Information System

Abstract

This thesis describes the results of our research in developing FAQ-master, an intelligent Web information system. The system performs intelligent discovery, retrieval, filtering, proxying, ranking, and presentation of Web information in order to provide high-quality FAQ solutions that meet users' information requests. By a high-quality answer we mean one that is in-depth, up-to-date, and relevant to the user’s question. We address four problems: how to faithfully capture user intention, how to effectively discover and aggregate Web information, how to present relevant results to the user, and how to provide an efficient proxy mechanism that shortens the turnaround time. To tackle these issues we propose the following techniques: ontology, user models, website models, and data aggregation and proxy mechanisms. Based on these techniques, FAQ-master comprises four agents, namely the Interface Agent, Proxy Agent, Answerer Agent, and Search Agent, which improve search results from three aspects of Web search activity: user intention, document processing, and website search.

The Interface Agent works as an assistant between the user and the FAQ system to capture the user’s true intention. Drawing on user modeling, template-based, and ontology-supported techniques, the agent supports natural language queries, enhanced by pattern-match and template-based techniques; assistance and guidance for human-machine interaction; and better personalized information services. It also handles user feedback on the suitability of the proposed responses. The Proxy Agent works as a two-tier mediator between the Interface Agent and the backend Answerer Agent. It employs an ontology-enhanced intelligent proxy mechanism to alleviate the overloading problem usually associated with a backend server. The Answerer Agent cleans, retrieves, and transforms FAQ information collected from a heterogeneous environment, such as the Web, and stores it in an ontological database. It works as a backend process that performs ontology-directed information aggregation, supported by the wrapper technique, on the webpages collected by the Search Agent. Finally, the Search Agent performs both user-oriented and domain-related Web information retrieval with the help of ontology-supported website models. This approach gives the Search Agent a semantic-level solution, so that it can provide domain-specific, focused Web information discovery with a high degree of user satisfaction.

Our first contribution lies in the user modeling and query processing techniques developed for the Interface Agent, which features ontology-supported, template-based user modeling and query processing. Our preliminary experiments show that the intention and focus of up to eighty percent of user queries can be correctly understood by the system. The experiments also verify the robustness of the linguistic pattern-matching technique by demonstrating its effectiveness in analyzing the intention and focus of users’ queries.

The second contribution lies in the query prediction techniques of the Proxy Agent, which has two notable features. First, it performs fast user-oriented mining and prediction by discovering frequent queries and predicted queries from the user query history; the improved sequential pattern mining algorithm is made more efficient by perfect hashing and database decomposition. Second, it performs ontology-directed case-based reasoning: the semantics of the PC ontology, in particular the VRelationships, are used in determining similar cases, adapting cases, and retaining cases. Our experiments show that the agent can take over up to 70% of the query load from the backend process, which noticeably improves overall query performance.

The third contribution lies in the techniques for organizing and processing unstructured Web information in the Answerer Agent. The agent employs ontology as the key technique, supported by wrapper techniques, to clean, retrieve, and transform unstructured FAQ information collected from a heterogeneous environment, and stores it in an ontological database whose structure reflects the ontology. When retrieving FAQs, the agent trims irrelevant query keywords, uses either full or partial keyword matching to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user, all with the support of ontology. In addition, to present the search results more effectively, the agent employs an enhanced ranking technique that ranks the FAQs by four properly weighted measures: Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value. Our experiments show that the agent does improve the precision rate and produces better ranking results.

The final contribution lies in the techniques that reflect both user-oriented and domain-focused aspects of Web search in the Search Agent. The agent features an ontology-supported website modeling technique that provides a semantic-level solution for a search engine, so that it can deliver fast, precise, and stable search results with a high degree of user satisfaction. The website modeling technique is closely connected to the domain ontology, which supports the following functions in both website model construction and application: query expansion, webpage annotation, webpage/website classification, and focused collection of domain-related and user-interested Web resources. The agent has the following characteristics.

1) Ontology-supported construction of website models. We attach domain semantics to the Web resources collected and stored in the local database. One important contribution here is the new ontology-supported OntoClassifier, which classifies webpages accurately and stably and thereby supports more correct annotation of domain semantics. Our experiments show that OntoClassifier performs very well in obtaining accurate and stable webpage classification.
2) Website-model-supported Web resource discovery. We take into account both user interests and domain specificity. The contribution here is the new Focused Crawler, which employs progressive strategies (user query-driven webpage expansion, autonomous website expansion, and query results exploitation) to effectively expand the website models.
3) Website-model-supported webpage retrieval. We leverage ontology features as a fast index structure to locate the webpages most wanted by the user.
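
As an illustration of the Answerer Agent's ranking step described in the abstract, the following is a minimal Python sketch of ranking FAQs by four weighted measures. The abstract names the measures (Appearance Probability, Satisfaction Value, Compatibility Value, Statistic Similarity Value) but does not say how they are computed or combined, so the weighted sum, the equal weights, the [0, 1] value range, and the names FaqCandidate, WEIGHTS, score, and rank_faqs are all assumptions made for this example, not the thesis's actual method.

```python
from dataclasses import dataclass


@dataclass
class FaqCandidate:
    """A retrieved FAQ with its four measures (assumed to lie in [0, 1])."""
    question: str
    appearance_probability: float
    satisfaction_value: float
    compatibility_value: float
    statistic_similarity_value: float


# Hypothetical weights: the abstract only says "proper weights".
WEIGHTS = {
    "appearance_probability": 0.25,
    "satisfaction_value": 0.25,
    "compatibility_value": 0.25,
    "statistic_similarity_value": 0.25,
}


def score(candidate, weights=WEIGHTS):
    """Combine the four measures as a weighted sum (an assumed combination rule)."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())


def rank_faqs(candidates, weights=WEIGHTS):
    """Return the candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


if __name__ == "__main__":
    faqs = [
        FaqCandidate("b", 0.2, 0.3, 0.4, 0.5),
        FaqCandidate("a", 0.9, 0.8, 0.7, 0.6),
        FaqCandidate("c", 0.5, 0.5, 0.5, 0.5),
    ]
    for faq in rank_faqs(faqs):
        print(faq.question, round(score(faq), 3))
```

Keeping the weights in a single dictionary makes it easy to try other weightings, for example by passing a different dictionary to rank_faqs to emphasize user satisfaction over keyword similarity.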