Leveraging data deduplication to improve the performance of primary storage systems in the cloud
- School of Software: Conference Papers
Recent studies have shown that moderate to high data redundancy exists in primary storage systems, such as VM-based, enterprise, and HPC storage systems, which indicates that data deduplication can effectively reduce write traffic and storage space consumption in such environments. However, our experimental studies reveal that applying data deduplication to primary storage systems causes space contention in main memory and data fragmentation on disks. This is partly because deduplication adds significant index memory overhead to the existing system, and partly because, after deduplication, a file or block is split into multiple small data chunks that are often stored in non-sequential locations on disk. This fragmentation can cause a subsequent read operation to issue many disk I/O requests, leading to performance degradation.
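
The two effects described above can be illustrated with a toy model. The following is a minimal sketch, not the system evaluated in the paper: the class `DedupStore`, the helper `count_io_requests`, the 4-byte chunk size, and the append-only "disk" are all hypothetical choices made for illustration. It assumes fixed-size chunking with SHA-1 fingerprints, and it counts one disk I/O request per run of physically contiguous chunks.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


class DedupStore:
    """Toy chunk-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.index = {}  # fingerprint -> chunk position on "disk" (lives in memory)
        self.disk = []   # append-only list of unique chunks

    def write(self, data):
        """Store data and return the disk positions of its chunks, in file order."""
        locations = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).digest()
            if fp not in self.index:
                # New chunk: append it and remember where it lives.
                self.index[fp] = len(self.disk)
                self.disk.append(chunk)
            # Duplicate chunks reuse the existing position, so no new write.
            locations.append(self.index[fp])
        return locations

    def read(self, locations):
        """Reassemble data from its chunk positions."""
        return b"".join(self.disk[loc] for loc in locations)


def count_io_requests(locations):
    """Count runs of contiguous positions; each run is assumed to cost one I/O."""
    if not locations:
        return 0
    runs = 1
    for prev, cur in zip(locations, locations[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


store = DedupStore()
first = store.write(b"AAAABBBBCCCC")       # all chunks new, laid out sequentially
second = store.write(b"DDDDAAAAEEEECCCC")  # shares chunks with the first file

first_ios = count_io_requests(first)
second_ios = count_io_requests(second)
index_entries = len(store.index)
```

In this sketch the first file occupies positions 0 to 2 and can be read with a single sequential request, while the second file maps to positions 3, 0, 4, 2 and needs four separate requests even though it is only four chunks long. The index also grows by one entry per unique chunk, which hints at why index memory becomes a concern at realistic chunk counts.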