[X-Unix] Finding duplicates

Philip J Robar philip.robar at myrealbox.com
Mon Jun 28 14:01:13 PDT 2004


On Jun 28, 2004, at 9:44 AM, peter boardman wrote:

> Is there a clever way to find files that are identical to each other?
> (I know that diff can compare two files, I’m wondering how to find
> them...)

You could do this with ksh, awk, Perl, etc. Something like this comes 
to mind:

for all files below this point
	# A checksum almost always uniquely identifies a file's contents
	filesChecksum = checksum file
	# Save the file's checksum with its name as the index
	files{file} = filesChecksum
	# Count how many times we've seen this checksum
	increment checksums{filesChecksum}

for every checksum in checksums
	# More than one file had this checksum: found duplicate files
	if checksums{checksum} > 1
		# Look for them in the array of files
		for every file in files
			# Found one of the duplicates
			if files{file} equals checksum
				print "duplicates:" file checksum
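
For what it's worth, here is roughly how the outline above might look as a 
runnable script. This is only a sketch in Python, not tested against a 
large tree. The choice of SHA-256 and the names file_checksum and 
find_duplicates are my own assumptions for illustration. It also departs 
from the outline in one respect: it keeps a list of file names per checksum, 
so there is no need for a second pass over the whole file table.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    # SHA-256 is an assumed choice; any strong checksum would do here.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each checksum seen more than once under root to its file paths."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_checksum[file_checksum(path)].append(path)
            except OSError:
                continue  # unreadable file: leave it out of the report
    # Keep only checksums shared by more than one file.
    return {c: sorted(p) for c, p in by_checksum.items() if len(p) > 1}


if __name__ == "__main__":
    # Start from the directory given on the command line, else the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for checksum, paths in find_duplicates(root).items():
        for path in paths:
            print("duplicates:", path, checksum)
```

Saved under a name of your choosing and run with a directory as its only 
argument, it prints one "duplicates:" line per file that shares a checksum 
with another. A matching checksum is strong evidence rather than proof, so 
running diff or cmp on each reported pair before deleting anything is a 
sensible last step.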


I found several OS X programs on VersionTracker and MacUpdate that do 
what you want.


Phil


