TreeSize and Duplicate Files – a Case Study


Duplicate Files – the Solution: TreeSize

A well-known construction company stores all files relevant to a project in a dedicated project folder. After a few years, however, their NAS had filled up.

Files add up

Employees working on the projects have to be able to access the files pertaining to their work at any time. Keeping each project in its own folder was the easiest non-proprietary way to keep the files in order.

Unnoticed, those files added up. Soon the company began running out of NAS space. They were unsure how to approach the problem: The NAS itself offered no help.

A solution was required: the files had to stay accessible without taking up more disk space than necessary.

Professional Disk Space Management

The company’s administrators started looking for a way out and concluded that systematic, professional disk space management was the way to go. They began searching for the right software for the job.

During their search, the company’s administrators found TreeSize. The disk space management software offers a range of tools for precise analysis of disk space usage, which made it well suited to the task.

The administrators scanned the servers. By comparing file names and MD5 checksums, they compiled a list of duplicates.
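The idea of matching files by name and MD5 checksum can be illustrated with a short Python sketch. It is not TreeSize's implementation, only a minimal example under the assumption that two files count as duplicates when both their file name and their MD5 digest match. The function names md5_of_file, find_duplicates and demo, as well as the folder names in the demo, are made up for illustration.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by (file name, MD5 digest); keep groups of 2 or more."""
    groups = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.islink(path):
                continue  # symbolic links are not counted as copies
            groups[(name, md5_of_file(path))].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


def demo():
    """Build a throwaway tree with two identical files and one unrelated file."""
    with tempfile.TemporaryDirectory() as root:
        for project in ("project_a", "project_b"):  # hypothetical folder names
            os.makedirs(os.path.join(root, project))
            with open(os.path.join(root, project, "plan.txt"), "wb") as handle:
                handle.write(b"same content")
        with open(os.path.join(root, "project_a", "notes.txt"), "wb") as handle:
            handle.write(b"unique content")
        duplicates = find_duplicates(root)
        # Return paths relative to the temporary root, one list per duplicate group.
        return [[os.path.relpath(p, root) for p in paths]
                for paths in duplicates.values()]
```

Calling demo() yields a single group containing the two plan.txt files; notes.txt is left out because no other file shares its name and checksum. On a real share it would be cheaper to compare file sizes first and hash only files of equal size.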

Finding and Getting Rid of File Duplicates

The TreeSize File Search offers several options for dealing with duplicate files. Since the files could not be deleted without breaking the law, the administrators decided on a combined approach using NTFS hard links and symbolic links.

Deduplicating: NTFS Hard Links

NTFS hard links can replace duplicate files stored on the same partition: they make a file appear in another folder without creating a copy of it. Hard links placed in several different folders all show the same content, and users can work with them just as they would with a normal file. In the background, however, only one instance of the file content takes up space on the hard disk. All instances share the same underlying data, which saves disk space.

NTFS hard links

Three instances of a file – one actual file content.

A file is permanently deleted once the last hard link to the file is gone.
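The behaviour described above can be observed with a few lines of Python. This is a sketch only, run in a temporary folder with made-up file names. It relies on os.link, which creates a hard link on NTFS as well as on common Unix file systems.

```python
import os
import tempfile


def hard_link_demo():
    """Show that two hard links are two names for one file content."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "plan.txt")      # hypothetical file names
        linked = os.path.join(root, "plan_link.txt")
        with open(original, "w") as handle:
            handle.write("version 1")

        os.link(original, linked)  # second directory entry, no second copy
        same_file = os.path.samefile(original, linked)
        link_count = os.stat(original).st_nlink

        # A change written through one name is visible through the other.
        with open(linked, "w") as handle:
            handle.write("version 2")
        with open(original) as handle:
            seen_via_original = handle.read()

        # Removing one name does not delete the content while another link exists.
        os.remove(original)
        with open(linked) as handle:
            survives = handle.read()

        return same_file, link_count, seen_via_original, survives
```

In this sketch the content stays readable through plan_link.txt after plan.txt has been removed; it would only be gone once that last link is removed as well. One caveat worth keeping in mind: an application that saves by writing a new file and renaming it over the old one replaces the directory entry, so the edited name no longer shares its content with the other links.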

TreeSize can replace duplicate files with NTFS hard links or symbolic links pointing to a single file.
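What such a replacement involves can be sketched as follows. This is not how TreeSize does it internally; it is a minimal illustration that assumes both paths are on the same volume. The function name replace_with_hard_link and the ".linktmp" suffix are hypothetical.

```python
import filecmp
import os
import tempfile


def replace_with_hard_link(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` if the contents are identical.

    Returns True if a replacement happened, False if both paths already
    refer to the same file. Raises ValueError if the contents differ.
    """
    if os.path.samefile(keep, duplicate):
        return False
    if not filecmp.cmp(keep, duplicate, shallow=False):  # byte-by-byte comparison
        raise ValueError("file contents differ, refusing to replace")
    temp = duplicate + ".linktmp"  # hypothetical temporary name in the same folder
    os.link(keep, temp)            # raises OSError if the paths are on different volumes
    os.replace(temp, duplicate)    # swap the link in place of the duplicate
    return True


def replace_demo():
    """Create two identical files, deduplicate them, and report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        keep = os.path.join(root, "plan.txt")
        duplicate = os.path.join(root, "plan_copy.txt")
        for path in (keep, duplicate):
            with open(path, "w") as handle:
                handle.write("same content")
        replaced = replace_with_hard_link(keep, duplicate)
        return replaced, os.path.samefile(keep, duplicate), os.stat(keep).st_nlink
```

Creating the link under a temporary name first and then swapping it in means the duplicate's path never disappears if the link step fails. The full content comparison before replacing guards against a stale or mistaken duplicate list.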

Deduplicating: Symbolic Links

Hard links only work within the same volume, so if files are stored on different volumes, symbolic links are the way to go. Each symbolic link contains the path to the original file. While they are more flexible about where their target is stored, symbolic links will not update automatically if the target file is moved or deleted. Therefore, they should be used carefully.
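The difference from hard links shows up as soon as the target moves. The following Python sketch, again with made-up file names in a temporary folder, creates a symbolic link and then renames its target. Note that creating symbolic links on Windows may require elevated rights or Developer Mode, so the sketch is easiest to try on a Unix-like system.

```python
import os
import tempfile


def symlink_demo():
    """Show that a symbolic link stores a path and breaks when the target moves."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "plan.txt")        # hypothetical file names
        moved = os.path.join(root, "plan_moved.txt")
        link = os.path.join(root, "plan_link.txt")
        with open(target, "w") as handle:
            handle.write("content")

        os.symlink(target, link)                 # the link stores the target's path
        stores_target_path = os.readlink(link) == target
        resolves_before = os.path.exists(link)   # exists() follows the link

        os.rename(target, moved)                 # move the target elsewhere
        resolves_after = os.path.exists(link)    # the stored path no longer resolves
        still_a_link = os.path.islink(link)      # the link itself is still there

        return stores_target_path, resolves_before, resolves_after, still_a_link
```

After the rename, the link is still present in the folder but points to a path that no longer exists, which is exactly the situation the paragraph above warns about.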

Links as a Solution

The administrators decided to test whether TreeSize would manage to solve the problem and clean up the NAS.

Find duplicate files with the TreeSize File Search.

TreeSize offers several solutions for duplicate content.

First trial runs showed a remarkable amount of freed-up disk space. The company therefore decided to go ahead with the deduplication.

Free Server Space

Using TreeSize, the administrators were able to reduce the load on the company NAS and recover disk space formerly taken up by duplicates.