Attribute grouping is an effective approach to high-dimensional outlier detection because it alleviates the interference of “the curse of dimensionality”. However, existing attribute-grouping outlier detection methods fail to reflect the differences among attribute groups and the degree to which each attribute group deviates. This significantly harms both the efficiency and the performance of high-dimensional outlier detection. We propose an attribute group weight-based outlier detection method for categorical data that uses the information entropy cumulative sum to describe the differences among attribute groups. First, an attribute group deviation factor is defined from data pattern frequencies and code lengths, and it serves as the criterion for merging attribute groups; it captures the deviation among attribute groups and improves search efficiency during attribute grouping. Second, the information entropy cumulative sum is used to define attribute group weights, which reflect the differences among attribute groups. Third, the outlier score function is redefined in terms of these attribute group weights, and an outlier detection algorithm for categorical data is built on it. Finally, experimental results on UCI, NTU, KEEL and synthetic datasets show that the algorithm achieves high detection accuracy and efficiency, together with good extensibility and scalability, so it can be applied to outlier detection on high-dimensional, massive categorical datasets.
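The abstract does not give the exact formulas. The sketch below therefore only illustrates the general idea of entropy-weighted attribute groups, and every detail in it is an assumption rather than the authors' definition. It takes the weight of a group to be the sum of the Shannon entropies of its attributes, normalized so the weights sum to 1. It scores each record as a weighted sum, over groups, of the code length -log2(frequency) of the record's value pattern in that group. The deviation factor and the group-merging step are not modeled. The names `attribute_entropy`, `group_weights` and `outlier_scores` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def attribute_entropy(data: Sequence[Record], j: int) -> float:
    """Shannon entropy (in bits) of the values of attribute j."""
    n = len(data)
    counts = Counter(row[j] for row in data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def group_weights(data: Sequence[Record], groups: List[List[int]]) -> List[float]:
    """Assumed weight: cumulative (summed) entropy of each group's attributes, normalized to sum to 1."""
    sums = [sum(attribute_entropy(data, j) for j in g) for g in groups]
    total = sum(sums)
    if total == 0:  # every attribute is constant, so fall back to uniform weights
        return [1.0 / len(groups)] * len(groups)
    return [s / total for s in sums]


def outlier_scores(data: Sequence[Record], groups: List[List[int]]) -> List[float]:
    """Assumed score: weighted sum of -log2(pattern frequency) over attribute groups.

    Rare value patterns have long code lengths, so records with rare patterns score higher.
    """
    n = len(data)
    weights = group_weights(data, groups)
    # Frequency of each projected value pattern, computed separately for every group.
    pattern_counts = [Counter(tuple(row[j] for j in g) for row in data) for g in groups]
    scores = []
    for row in data:
        score = 0.0
        for w, g, counts in zip(weights, groups, pattern_counts):
            freq = counts[tuple(row[j] for j in g)] / n
            score += w * -math.log2(freq)
        scores.append(score)
    return scores


# Toy categorical data: attributes 0 and 1 form one group, attribute 2 forms another.
data = [
    ("a", "x", "p"),
    ("a", "x", "p"),
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("b", "y", "q"),
]
groups = [[0, 1], [2]]
scores = outlier_scores(data, groups)
```

In this toy example, the last record has the rare pattern ("b", "y") in the first group, so it receives the largest score. Summing per-attribute entropies is only one plausible reading of "information entropy cumulative sum". The paper's actual weighting and scoring functions may differ.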