Neural Networks with Grey Prediction Learning Ability
and Their Applications on Web Mining
Abstract
The World Wide Web (WWW) has grown very rapidly in recent
years, and it contains an enormous amount of data and information that
can be extracted with computer-assisted tools, intelligent agents, search
engines, and Web mining techniques. Consequently, how to explore useful
information and knowledge on the WWW has gradually become an urgent need.
However, searching for or retrieving information and data from the WWW manually
is a difficult and time-consuming job, because the WWW has become a huge
database providing abundant information. Thus, how to effectively
search, extract, and filter data and information from the Internet using
intelligent agents and Web mining techniques has become an important research
issue.
Past research indicates that machine learning methods, and neural-based
prediction or classification methods in particular, are extensively used in Web mining
techniques. Among these machine learning methods, the gradient descent
method is widely used to train various classifiers, such as the back-propagation
neural network and the linear text classifier. However, the gradient descent
method is easily trapped in local minima and converges slowly.
This study therefore presents a gradient forecasting search method (GFSM),
based on prediction methods, to enhance the performance of the gradient
descent method and thereby develop a more efficient and precise machine
learning method for Web mining.
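The full GFSM is developed in the body of this work; purely as an illustration of the idea, the sketch below augments plain gradient descent with a one-step forecast of the recent iterate trajectory and accepts the forecasted point only when it lowers the objective. The forecasting function (a linear extrapolation standing in for a grey prediction model), the step size, and the acceptance rule are illustrative assumptions rather than the actual GFSM.

import numpy as np

def f(x):
    # Simple multimodal test objective with several local minima.
    return x**2 + 10.0 * np.sin(x)

def grad_f(x):
    return 2.0 * x + 10.0 * np.cos(x)

def forecast_next(trail):
    # Placeholder one-step forecast of the iterate trajectory; the thesis
    # uses a grey/difference-equation prediction model in this role, while
    # a linear extrapolation of the last few iterates stands in for it here.
    h = np.asarray(trail[-4:])
    slope, intercept = np.polyfit(np.arange(len(h)), h, 1)
    return slope * len(h) + intercept

def gfsm_like_search(x0, lr=0.01, steps=200):
    x, trail = x0, [x0]
    for _ in range(steps):
        x = x - lr * grad_f(x)           # ordinary gradient-descent step
        trail.append(x)
        x_hat = forecast_next(trail)     # forecasted next search point
        if f(x_hat) < f(x):              # accept the forecast only if it improves f
            x = x_hat
            trail.append(x)
    return x

print(gfsm_like_search(4.0))             # settles near the local minimum at about x = 3.8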
However, a prediction method that needs only a few sample data items
yet forecasts precisely is a key issue for the gradient forecasting search
method. Statistic-based prediction methods are unsuitable for implementing GFSM
because they require a large number of data items to build
a prediction model. In contrast with statistic-based prediction
methods, the GM(1,1) grey prediction model does not need a large number
of data items to build a prediction model, and it has a low computational
load. However, the original GM(1,1) grey prediction model uses a mathematical
hypothesis and approximation to transform a continuous differential
equation into a discrete difference equation in order to build a forecasting
model. This is not a logical approach, because the modeling sequence
data are invariably discrete. Moreover, the GM(1,1) model considers only
two neighboring sequence data in modeling, which is not sufficient to build a
precise forecasting model. To construct a more precise prediction model,
a discrete difference equation prediction model (DDEPM) is presented herein to
support the gradient forecasting search method.
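For reference, the conventional GM(1,1) modeling steps criticized above are, in outline: an accumulated generating operation (AGO), a least-squares fit of the grey differential equation x(0)(k) + a*z(1)(k) = b, and an exponential time-response function derived from the continuous whitening equation. The sketch below follows this standard formulation and is not the proposed DDEPM; it mainly shows that a usable forecast can be built from only four data points.

import numpy as np

def gm11_forecast(x0, horizon=1):
    """Standard GM(1,1) forecast of a short, positive data sequence (not the DDEPM)."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                               # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # background values (mean of two neighbors)
    B = np.column_stack([-z1, np.ones(n - 1)])       # grey differential eq.: x0(k) + a*z1(k) = b
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0] # least-squares estimate of a and b
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time response of the whitening equation
    x0_hat = np.diff(x1_hat, prepend=0.0)            # inverse AGO restores the original series
    return x0_hat[n:]                                # forecast values beyond the sample

# Four observations are enough to fit the model and forecast two steps ahead.
print(gm11_forecast([10.0, 10.9, 12.1, 13.2], horizon=2))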
Web mining is the use of data mining techniques to automatically discover
and extract information from Web documents and services. Some previous
studies have indicated that the main challenges in Web mining lie in
handling high-dimensional data, achieving incremental learning (or
incremental mining), scalability, and parallel and distributed mining algorithms.
However, many traditional data mining methods cannot satisfy these needs
of Web mining. In contrast with previous analyses, Albus's CMAC (Cerebellar
Model Arithmetic Computer) neural network model has high potential
for developing effective data mining techniques owing to its fast
learning, good generalization capability, inherently parallel and
distributed processing, and ease of hardware implementation.
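These properties follow from the CMAC's table-lookup structure: each input activates only a handful of overlapping cells, and training touches only those cells. The minimal one-dimensional sketch below illustrates this mechanism; it is a plain Albus-style CMAC with binary tiles, not the Gaussian-basis HCMAC developed in this work, and the table sizes and learning rate are arbitrary illustrative choices.

import numpy as np

class TinyCMAC:
    """Minimal one-dimensional Albus-style CMAC (illustrative only)."""

    def __init__(self, n_tilings=8, n_cells=16, lr=0.3):
        self.g, self.n, self.lr = n_tilings, n_cells, lr
        self.w = np.zeros((n_tilings, n_cells))      # one weight table per tiling

    def _active_cells(self, x):
        # Each tiling quantizes x in [0, 1) with a slightly different offset,
        # so a given input activates exactly one cell per tiling.
        offsets = np.arange(self.g) / (self.g * self.n)
        return np.minimum(((x + offsets) * self.n).astype(int), self.n - 1)

    def predict(self, x):
        idx = self._active_cells(x)
        return self.w[np.arange(self.g), idx].sum()  # sum of the g activated weights

    def update(self, x, y):
        # Delta rule applied only to the g activated cells: this locality is
        # what makes CMAC training fast and naturally incremental.
        idx = self._active_cells(x)
        err = y - self.predict(x)
        self.w[np.arange(self.g), idx] += self.lr * err / self.g

cmac = TinyCMAC()
for _ in range(200):                                 # a few sweeps over noiseless samples
    for x in np.linspace(0.0, 0.99, 20):
        cmac.update(x, np.sin(2 * np.pi * x))
print(round(cmac.predict(0.25), 3))                  # approximately sin(pi/2) = 1.0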
However, the conventional CMAC has an enormous memory requirement, so
it cannot be applied to solve high-dimensional problems. This
shortcoming imposes limitations and inconveniences when the
CMAC neural network is used to develop Web mining techniques. Consequently,
a hierarchical CMAC (HCMAC) model based on the concept of a differentiable
CMAC is presented, which is capable of resolving both the enormous memory
requirement of the conventional CMAC and high-dimensional problems.
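A rough cell count makes the contrast concrete: if each of d input variables is quantized into q levels, a flat CMAC addresses on the order of q^d cells before hashing, whereas a hierarchy assembled from two-input sub-CMACs needs only about (d - 1) tables of q^2 cells each. The binary-tree decomposition and the numbers below are illustrative assumptions, not the exact HCMAC structure or figures from this work.

# Illustrative memory comparison (not measurements from the thesis).
q = 100                                   # quantization levels per input variable
for d in (4, 8, 16):                      # input dimension
    flat = q ** d                         # conventional CMAC: exponential in d
    hierarchical = (d - 1) * q ** 2       # tree of two-input sub-CMACs: linear in d
    print(f"d={d:2d}: flat ~{flat:.1e} cells, hierarchical ~{hierarchical:.1e} cells")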
Moreover, a self-organizing input space approach is proposed to automatically
determine the memory structure of the HCMAC neural network according to the
distribution of the training data sets. In addition, a learning algorithm that
can learn incrementally from newly added data without forgetting prior knowledge
is proposed to train the self-organizing HCMAC neural network. Finally,
our proposed method is applied to develop Web mining techniques for
personalized Web page navigation.
Experiments on searching several benchmark functions, training back-propagation
neural networks, and mining news page categories with a linear text classifier
indicate that the proposed GFSM can accelerate the search
speed of the gradient descent method and help it
escape from local minima. These properties can effectively raise
the accuracy rates of classification algorithms, leading to more efficient
and precise Web mining techniques. In addition, the experiments also confirm
that the self-organizing HCMAC neural network has advantages in terms of
constructing its memory structure automatically, fast learning, good generalization
ability, low memory requirement, incremental learning, scalability, and
parallel and distributed processing ability. The proposed method is indeed
capable of resolving both the enormous memory requirement of the
conventional CMAC and high-dimensional problems. Finally, our proposed
method is applied to incrementally learn user profiles from user feedback
for personalized Web page navigation. Experiments on four topics
of user profiles show that the self-organizing HCMAC neural network
achieves a better prediction accuracy rate in identifying Web pages that
interest the user than other well-known classifiers do.