Instance-based learning is a machine learning technique that has proven to be successful over a wide range of classification problems. Despite its high classification accuracy, however, it has a relatively high storage requirement, and because it must search through all stored instances to classify each new input, it is also computationally expensive.
Instance-based learning methods such as the nearest neighbour classifier have proven to perform well in pattern classification across many fields. Despite their high classification accuracy, however, they suffer from high storage requirements, high computational cost, and sensitivity to noise. In this paper, we present a data reduction method for classification techniques based on entropy-based partitioning and center instances. Experimental results show that the new algorithms achieve a high data reduction rate as well as high classification accuracy.
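A minimal sketch of the general idea, not the paper's exact algorithm: the training set is condensed to a few "center instances" per class (here the partitioning is a crude stand-in, sorting on one feature, rather than true entropy-based partitioning), and new points are then classified with a 1-nearest-neighbour rule over the reduced set. Function names and the number of partitions are illustrative assumptions.

import numpy as np

def reduce_to_centers(X, y, parts=4):
    # Split each class into `parts` groups and keep one mean vector
    # ("center instance") per group, drastically shrinking storage.
    centers, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        order = np.argsort(Xc[:, 0])          # stand-in for entropy-based partitioning
        for group in np.array_split(Xc[order], parts):
            if len(group):
                centers.append(group.mean(axis=0))
                labels.append(c)
    return np.array(centers), np.array(labels)

def nn_classify(x, centers, labels):
    # 1-NN decision over the stored center instances only.
    d = np.linalg.norm(centers - x, axis=1)
    return labels[np.argmin(d)]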
Classification is an important area in data mining. There are various classification methodologies, such as decision trees and neural networks. Recently, rough set theory has been proposed as a method for classification; it is an approach to decision making in the presence of uncertainty and vagueness. In the process of constructing a decision tree, appropriate attributes have to be selected as nodes of the tree. In this paper, we present a new approach to attribute selection for decision tree construction using rough set theory. The suggested method produces simpler classification rules in the decision tree and reduces the volume of data to be processed.
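As an illustrative sketch only, assuming the usual rough-set degree-of-dependency measure rather than the paper's specific criterion: an attribute is scored by the fraction of rows whose value for that attribute determines the decision unambiguously, and the highest-scoring attribute is chosen as the next tree node. The function names and the toy table are hypothetical.

from collections import defaultdict

def dependency(rows, attr, decision):
    # gamma(attr) = |positive region| / |U|: share of rows whose attribute
    # value maps to exactly one decision class.
    decisions_seen = defaultdict(set)
    counts = defaultdict(int)
    for r in rows:
        decisions_seen[r[attr]].add(r[decision])
        counts[r[attr]] += 1
    pos = sum(n for v, n in counts.items() if len(decisions_seen[v]) == 1)
    return pos / len(rows)

def best_attribute(rows, attrs, decision="class"):
    return max(attrs, key=lambda a: dependency(rows, a, decision))

# 'outlook' determines 'class' completely here, so it is selected as the node.
table = [
    {"outlook": "sunny", "windy": "yes", "class": "no"},
    {"outlook": "sunny", "windy": "no",  "class": "no"},
    {"outlook": "rain",  "windy": "yes", "class": "yes"},
    {"outlook": "rain",  "windy": "no",  "class": "yes"},
]
print(best_attribute(table, ["outlook", "windy"]))   # -> 'outlook'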
In recent years, data mining has been widely used in the information industry to turn huge amounts of data into useful information and knowledge. When analyzing data sets with continuous values, we often apply a process called discretization, which divides an attribute's values into intervals. These intervals form new values for the attribute and reduce the size of the data set. In addition, discretization based on rough set theory has the advantage of being easy to apply. In this paper, we suggest a discretization algorithm based on rough sets and SOM (Self-Organizing Map) as a means of extracting valuable information from large data sets, one that can be employed even when expert knowledge of the field is lacking.
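For illustration only: the paper combines rough sets with a SOM, but the basic discretization step it builds on replaces each continuous value with an interval label, as in this simple equal-frequency binning sketch (the function name, bin count, and sample data are assumptions).

import numpy as np

def discretize(values, n_bins=3):
    # Choose quantile cut points, then map each value to its interval index.
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(cuts, values, side="right"), cuts

ages = np.array([18, 22, 25, 31, 40, 47, 52, 63, 70])
bins, cuts = discretize(ages)
print(cuts)   # the quantile cut points defining the intervals
print(bins)   # the new discrete value (interval index) for each age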
In this paper, we address mining association rules with weighted items and multiple minimum supports. We generalize standard association rule mining to the case where items are given weights to reflect their importance to the user. To find rules that involve both frequent and rare items, we specify multiple minimum supports that reflect the frequencies of the items. In rule mining, different rules may then need to satisfy different minimum supports depending on which items appear in them.
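A hedged sketch of the idea, not the paper's exact formulation: each item carries its own minimum item support (MIS) and a weight, and an itemset is accepted only if its weighted support clears the threshold implied by its items, taken here as the smallest MIS among them. All names, thresholds, and the toy transactions are assumptions.

def itemset_support(transactions, itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def is_large(transactions, itemset, mis, weight):
    sup = itemset_support(transactions, itemset)
    threshold = min(mis[i] for i in itemset)            # rare items get lower thresholds
    w = sum(weight[i] for i in itemset) / len(itemset)  # average item weight
    return w * sup >= threshold

transactions = [{"bread", "milk"}, {"bread", "caviar"}, {"milk"}, {"bread", "milk"}]
mis    = {"bread": 0.5, "milk": 0.5, "caviar": 0.2}     # rare item: smaller minimum support
weight = {"bread": 1.0, "milk": 1.0, "caviar": 2.0}     # rare item: larger importance
print(is_large(transactions, {"bread", "caviar"}, mis, weight))   # accepted despite low frequency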
In this paper, we present temporal association rules based on item time intervals. A temporal association rule is an association rule that holds in specific time intervals. If we consider itemsets within their frequently purchased periods, we can discover more significant itemsets satisfying the minimum support. Because previous studies did not consider the time intervals of purchased items, they could miss itemsets that fail the minimum support over the whole data set when some items are purchased frequently in a specific period but rarely or never in other periods. Our approach uses interval support, which is counted per period, together with support and confidence in the association rule to discover large itemsets.
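A minimal sketch under assumed definitions (the function name, date format, and sample data are illustrative): interval support counts support only over the transactions that fall inside a given time interval, so an itemset bought heavily in one period is not diluted by the rest of the year.

from datetime import date

def interval_support(transactions, itemset, start, end):
    # transactions: list of (purchase_date, set_of_items) pairs.
    in_period = [items for d, items in transactions if start <= d <= end]
    if not in_period:
        return 0.0
    return sum(1 for items in in_period if itemset <= items) / len(in_period)

sales = [
    (date(2024, 12, 20), {"turkey", "wine"}),
    (date(2024, 12, 23), {"turkey"}),
    (date(2025, 3, 5),   {"wine"}),
    (date(2025, 6, 9),   {"bread"}),
]
# Whole-year support of {"turkey"} is 0.5, but within December it is 1.0.
print(interval_support(sales, {"turkey"}, date(2024, 12, 1), date(2024, 12, 31)))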
We study clustering algorithms for sequences of categorical values. Clustering is a data mining problem that has received significant attention from the database community. Traditional clustering algorithms deal with numerical or categorical data points; however, they are not directly applicable to sequences of categorical values.
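As a generic illustration, not necessarily the algorithm studied in this work: one common way to make categorical sequences clusterable is to define a pairwise similarity, such as a normalized longest common subsequence (LCS), which standard clustering algorithms can then consume. The function name and example sequences are assumptions.

def lcs_similarity(a, b):
    # Length of the LCS of two sequences, normalized by the longer length.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i-1][j-1] + 1 if a[i-1] == b[j-1] else max(dp[i-1][j], dp[i][j-1])
    return dp[m][n] / max(m, n)

print(lcs_similarity(["login", "browse", "buy"], ["login", "buy"]))   # -> 2/3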