Abstract
This thesis describes the results of our research in developing FAQ-master as an intelligent Web information system. The system performs intelligent discovery, retrieval, filtering, proxy, ranking, and presentation of Web information to provide high-quality FAQ solutions that meet users' information requests. By a high-quality answer we mean an answer that is profound, up-to-date, and relevant to the user's question. We summarize the problems as follows: how to faithfully capture user intention, how to effectively discover and aggregate Web information, how to present relevant results to the user, and how to provide an efficient proxy mechanism to speed up the turnaround time. We propose the following techniques to tackle these issues: ontology, user models, website models, and data aggregation and proxy mechanisms. Based on these techniques, FAQ-master was developed with four agents, namely the Interface Agent, Proxy Agent, Answerer Agent, and Search Agent, which effectively and efficiently improve search results from three aspects of Web search activity: user intention, document processing, and website search.
The Interface Agent was developed to work as an assistant between the user and the FAQ system, capturing the user's true intention. Based on user modeling together with template-based and ontology-supported techniques, the agent supports natural language queries, enhanced by pattern-matching and template-based techniques; assistance and guidance for human-machine interaction; and better personalized information services. It also handles user feedback on the suitability of the proposed responses. The Proxy Agent was developed to work as a two-tier mediator between the Interface Agent and the back-end Answerer Agent. It employs an ontology-enhanced intelligent proxy mechanism to effectively alleviate the overloading problem usually associated with a back-end server. The Answerer Agent was developed to help clean, retrieve, and transform FAQ information collected from a heterogeneous environment, such as the Web, and store it in an ontological database. It works as a back-end process that performs ontology-directed information aggregation, supported by the wrapper technique, over the webpages collected by the Search Agent. Finally, the Search Agent was developed to perform both user-oriented and domain-related Web information retrieval with the help of ontology-supported website models. This approach provides a semantic-level solution for the Search Agent so that it can provide domain-specific, focused Web information discovery with a high degree of user satisfaction.
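To make the division of labor among the four agents concrete, the following Python sketch outlines the pipeline at a conceptual level. All class and method names are illustrative assumptions made for this abstract; the thesis does not prescribe this exact API, and the real agents are considerably richer.

class SearchAgent:
    """Collects domain-related webpages guided by website models (an offline role)."""
    def crawl(self) -> list[str]:
        return []  # placeholder for focused, ontology-guided crawling

class AnswererAgent:
    """Aggregates FAQ information into an ontological database and answers queries."""
    def __init__(self, pages: list[str]):
        self.faq_db = self._aggregate(pages)      # ontology-directed, wrapper-based aggregation
    def _aggregate(self, pages: list[str]) -> dict[str, list[str]]:
        return {}                                 # placeholder for cleaning/transforming FAQs
    def answer(self, query: str) -> list[str]:
        return self.faq_db.get(query, [])         # ontology-supported FAQ retrieval

class ProxyAgent:
    """Two-tier mediator that serves cached or predicted answers before hitting the backend."""
    def __init__(self, backend: AnswererAgent):
        self.backend, self.cache = backend, {}
    def handle(self, query: str) -> list[str]:
        if query in self.cache:                   # cache hit: the back-end server is not disturbed
            return self.cache[query]
        answer = self.backend.answer(query)
        self.cache[query] = answer
        return answer

class InterfaceAgent:
    """Captures the user's true intention and personalizes the final presentation."""
    def __init__(self, proxy: ProxyAgent):
        self.proxy = proxy
    def ask(self, natural_language_query: str) -> list[str]:
        intention = natural_language_query.strip().lower()   # stand-in for template-based analysis
        return self.proxy.handle(intention)

# Example wiring: the Search Agent feeds the Answerer Agent; queries flow Interface -> Proxy -> Answerer.
system = InterfaceAgent(ProxyAgent(AnswererAgent(SearchAgent().crawl())))
print(system.ask("Why does my PC fail to boot?"))   # -> [] with the placeholder database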
Our first contribution concerns the user modeling and query processing techniques involved in the development of the Interface Agent, which features an ontology-supported, template-based user modeling technique and query processing. Our preliminary experiments demonstrate that the system correctly understands the intention and focus of up to eighty percent of user queries. In addition, the experiments verify the robustness of the linguistic pattern-matching technique by demonstrating its effectiveness in analyzing users' query intention and focus.
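As an illustration of how template-based pattern matching can expose a query's intention and focus, consider the minimal Python sketch below. The templates and intention labels here are invented for this example; they are not the linguistic templates actually used by the Interface Agent.

import re

TEMPLATES = [
    # (intention label, regex with capture groups for the query focus)
    ("ask_price",   re.compile(r"how much (?:is|does) (?:a |an |the )?(.+?)(?: cost)?\??$")),
    ("ask_compare", re.compile(r"which is better, (.+?) or (.+?)\??$")),
    ("ask_howto",   re.compile(r"how (?:do|can) i (.+?)\??$")),
]

def analyze(query: str):
    """Return (intention, focus terms) for a natural-language query, or None if no template matches."""
    q = query.strip().lower()
    for intention, pattern in TEMPLATES:
        m = pattern.search(q)
        if m:
            return intention, [g for g in m.groups() if g]
    return None

print(analyze("How much is a 17-inch LCD monitor?"))
# -> ('ask_price', ['17-inch lcd monitor'])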
The second contribution concerns the query prediction techniques in the Proxy Agent. The agent features the following points. First, it performs fast user-oriented mining and prediction by discovering frequent queries and predicted queries from users' query histories. The improved sequential pattern mining algorithm is made more efficient by the techniques of perfect hashing and database decomposition. Second, it performs ontology-directed case-based reasoning; the semantics of the PC ontology, in particular the VRelationships, are used in determining similar cases, performing case adaptation, and retaining cases. Our experiments show that the agent can take over up to 70% of the query load from the back-end process, which significantly improves overall query performance.
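The following Python sketch conveys the idea of query prediction in a much simplified form: it only counts immediate successors in users' query sessions, whereas the Proxy Agent mines full sequential patterns with an improved algorithm that uses perfect hashing and database decomposition. All data and names here are illustrative.

from collections import Counter, defaultdict

def build_successor_model(sessions: list[list[str]]) -> dict[str, Counter]:
    """Count how often each query is immediately followed by each other query."""
    model: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current_q, next_q in zip(session, session[1:]):
            model[current_q][next_q] += 1
    return model

def predict_next(model: dict[str, Counter], query: str, k: int = 2) -> list[str]:
    """Return the k most frequent follow-up queries, candidates for pre-fetching into the proxy cache."""
    return [q for q, _ in model.get(query, Counter()).most_common(k)]

sessions = [
    ["cpu price", "cpu overclock", "cooler choice"],
    ["cpu price", "cpu overclock"],
    ["cpu price", "motherboard compatibility"],
]
model = build_successor_model(sessions)
print(predict_next(model, "cpu price"))   # -> ['cpu overclock', 'motherboard compatibility']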
The third contribution concerns the techniques for organizing and processing unstructured Web information in the Answerer Agent. The agent employs ontology as the key technique, supported by wrapper techniques, to help clean, retrieve, and transform unstructured FAQ information collected from a heterogeneous environment, and stores it in an ontological database that reflects the ontological structure. For the retrieval of FAQs, the agent trims irrelevant query keywords, employs either full or partial keywords to retrieve FAQs, and removes conflicting FAQs before returning the final results to the user, all supported by ontology. In addition, to produce a more effective presentation of the search results, the agent employs an enhanced ranking technique, which uses Appearance Probability, Satisfaction Value, Compatibility Value, and Statistic Similarity Value as four measures with proper weights to rank the FAQs. Our experiments show that the agent improves the precision rate and produces better ranking results.
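To illustrate how the four measures can be combined, the sketch below scores each candidate FAQ with a weighted sum. The weights and value ranges are placeholder assumptions; the thesis determines the actual measures and their weights empirically.

from dataclasses import dataclass

@dataclass
class FAQCandidate:
    text: str
    appearance_probability: float   # assumed normalized to [0, 1]
    satisfaction_value: float       # assumed normalized to [0, 1]
    compatibility_value: float      # assumed normalized to [0, 1]
    statistic_similarity: float     # assumed normalized to [0, 1]

WEIGHTS = (0.3, 0.2, 0.2, 0.3)      # illustrative weights summing to 1

def score(faq: FAQCandidate) -> float:
    """Weighted combination of the four ranking measures."""
    w1, w2, w3, w4 = WEIGHTS
    return (w1 * faq.appearance_probability
            + w2 * faq.satisfaction_value
            + w3 * faq.compatibility_value
            + w4 * faq.statistic_similarity)

def rank(faqs: list[FAQCandidate]) -> list[FAQCandidate]:
    """Order candidate FAQs from highest to lowest combined score."""
    return sorted(faqs, key=score, reverse=True)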
The final contribution concerns the techniques for reflecting both user-oriented and domain-focused aspects of Web search in the Search Agent. The agent features an ontology-supported website modeling technique that provides a semantic-level solution for a search engine, so that it can deliver fast, precise, and stable search results with a high degree of user satisfaction. The website modeling technique is closely connected to the domain ontology, which supports the following functions in both website model construction and application: query expansion, webpage annotation, webpage/website classification, and focused collection of domain-related and user-interested Web resources. The agent features the following characteristics. 1) Ontology-supported construction of website models: we attach domain semantics to the Web resources collected and stored in the local database. One important contribution here is the new ontology-supported OntoClassifier, which performs highly accurate and stable classification of webpages to support more correct annotation of domain semantics; our experiments show that OntoClassifier performs very well in producing accurate and stable webpage classification. 2) Website model-supported Web resource discovery: we take into account both user interests and domain specificity. The contribution here is the new Focused Crawler, which employs progressive strategies of user query-driven webpage expansion, autonomous website expansion, and query result exploitation to effectively expand the website models. 3) Website model-supported webpage retrieval: we leverage ontology features as a fast index structure to locate the webpages most wanted by the user.
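As a simplified illustration of ontology-supported webpage classification in the spirit of OntoClassifier, the sketch below assigns a page to the ontology class whose associated terms it mentions most often. The classes and term sets are illustrative assumptions; the real OntoClassifier and the PC ontology are considerably richer.

# Illustrative mapping from ontology classes to characteristic terms (not the thesis ontology).
ONTOLOGY_CLASS_TERMS = {
    "CPU":         {"cpu", "processor", "clock", "core", "cache"},
    "Motherboard": {"motherboard", "chipset", "socket", "bios"},
    "Memory":      {"memory", "ram", "dimm", "ddr"},
}

def classify(page_text: str) -> str | None:
    """Assign the page to the class whose terms occur most often, or None if no term occurs."""
    tokens = page_text.lower().split()
    scores = {cls: sum(tokens.count(t) for t in terms)
              for cls, terms in ONTOLOGY_CLASS_TERMS.items()}
    best_class, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_class if best_score > 0 else None

print(classify("This quad core processor has a large cache and high clock speed"))
# -> 'CPU'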