Towards publishing set-valued data with high utility
- School of Software: Published Papers
Set-valued data are common in databases and often contain sensitive information associated with the data owners, so publishing them may lead to identity breaches. Pioneering techniques de-identify such data through k-anonymity, which may produce anonymized data of low utility. K-anonymity, moreover, must be carried out under the assumption that a predefined taxonomy tree exists. In this paper, we investigate the negative influence of the taxonomy tree on data utility and propose a novel method that anonymizes data in a utility-preserving manner: we artificially construct a pseudo taxonomy tree based on utility metrics. Experiments show that our construct-then-anonymize method is not only applicable to set-valued data but also considerably improves data utility.
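
As a rough illustration of the setting, and not of the method proposed in the paper, the sketch below shows how generalizing items through a taxonomy tree can make a small set-valued dataset k-anonymous. Everything in it is an assumption made for illustration: the toy `TAXONOMY`, the sample records, the helper names `generalize` and `is_k_anonymous`, and the working definition of k-anonymity used here (every published record is identical to at least k - 1 others).

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
TAXONOMY = {
    "apple": "fruit",
    "pear": "fruit",
    "milk": "dairy",
    "cheese": "dairy",
    "fruit": "food",
    "dairy": "food",
}


def generalize(record, items_to_lift):
    """Replace each item listed in items_to_lift by its parent in TAXONOMY."""
    return frozenset(
        TAXONOMY.get(item, item) if item in items_to_lift else item
        for item in record
    )


def is_k_anonymous(records, k):
    """Assumed definition: each distinct record occurs at least k times."""
    counts = Counter(frozenset(r) for r in records)
    return all(count >= k for count in counts.values())


# Four hypothetical transactions, all distinct, so not 2-anonymous.
records = [
    {"apple", "milk"},
    {"pear", "milk"},
    {"apple", "cheese"},
    {"pear", "cheese"},
]

# Lifting "apple" and "pear" to "fruit" merges the records pairwise.
generalized = [generalize(r, {"apple", "pear"}) for r in records]

print(is_k_anonymous(records, 2))
print(is_k_anonymous(generalized, 2))
```

In this toy case the tree decides which items can be merged and therefore how much detail is lost, which is the dependence of utility on the taxonomy tree that the abstract refers to.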