Abstract
In this paper, missing attribute values in incomplete data sets are given three interpretations: lost values, attribute-concept values, and ‘do not care’ conditions. Additionally, the process of data mining is based on two types of probabilistic approximations, global and saturated. We present results of experiments on mining incomplete data sets using six approaches, each combining one of the three interpretations of missing attribute values with one of the two types of probabilistic approximations. We compare the six approaches using the error rate computed by ten-fold cross validation as the criterion of quality. We show that for some data sets the error rate is significantly smaller (at the 5% level of significance) for lost values, for some data sets the smaller error rate is associated with attribute-concept values, and for others with ‘do not care’ conditions. Similarly, for some approaches the error rate is significantly smaller for saturated probabilistic approximations than for global probabilistic approximations, while for other approaches it is the other way around. Thus, for a given incomplete data set, the best approach to data mining should be chosen by trying all six approaches.
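The selection procedure recommended above can be sketched as a loop over the six combinations. This is only an illustrative outline, not the implementation used in the experiments: the string labels and the `error_rate` callable are hypothetical placeholders, and `error_rate` is assumed to run rule induction and return the ten-fold cross-validation error rate for one combination on the data set at hand. The sketch picks the smallest error rate and does not perform the significance testing used in the comparisons.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Labels are assumptions chosen for illustration only.
INTERPRETATIONS = ("lost", "attribute-concept", "do-not-care")
APPROXIMATIONS = ("global", "saturated")

Approach = Tuple[str, str]


def select_best_approach(
    error_rate: Callable[[str, str], float],
) -> Tuple[Approach, Dict[Approach, float]]:
    """Evaluate all six approaches and return the one with the lowest error rate.

    `error_rate(interpretation, approximation)` is a hypothetical callable
    supplied by the user; it is assumed to return the ten-fold
    cross-validation error rate for that combination.
    """
    results = {
        (interp, approx): error_rate(interp, approx)
        for interp, approx in product(INTERPRETATIONS, APPROXIMATIONS)
    }
    # On ties, the first combination in enumeration order is returned.
    best = min(results, key=results.get)
    return best, results
```

Returning the full `results` mapping alongside the winner keeps all six error rates available for any follow-up statistical comparison.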