De-duping iTunes

Every so often I notice that my iTunes library has a lot of duplicate items and I need to get rid of them. Mostly that means finding an audio file that has the same name as another audio file but with a space and a digit added right before the extension: Song.mp3 and Song 1.mp3, for example. They’ll be in the same directory. Duplicates in different directories are a different matter, and I’ve handled those too by comparing file digests. That’s not this problem.

Here’s what I did. It’s not pretty because I didn’t care to make it so. It’s not exciting because I wasn’t researching anything or exploring an idea. It’s just getting rid of files. I solved the problem and moved on.

I develop these little tools incrementally. Can I get all the files? Can I find the files that have the name without the space and digit? I program a step and check the result. Then I add the next step. The process drives the procedural structure.

And during this process I forked then completely rewrote a File::Find module. You don’t need my version for this though. The one zef install File::Find gives you should be fine.

A few quick notes:

  • I really like the #`( ... ) comments
  • An array interpolated into a pattern is an alternation of the elements
  • .IO objects know how to make or tear apart paths.
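
The last two notes can be tried on their own before pointing anything at a real library. This is only a minimal sketch: the file names and the /music and /backup paths are made up for illustration, and it matches any single digit with \d rather than the literal '1' the program below uses.

```raku
# Hypothetical names and paths, used only for illustration.
my @extensions = <mp3 m4a m4p>;

# An array in a regex matches any one of its elements as a literal string.
for 'Song 1.mp3', 'Song.mp3', 'Song 1.txt' -> $name {
	my $is-copy = so $name ~~ / \h \d '.' @extensions $ /;
	put "$name: $is-copy";
}

# Capture the extension and rebuild the name without the space and digit.
# The replacement is a block so that $0 is read after the match happens.
my $original = 'Song 1.m4a'.subst: / \h \d '.' (@extensions) $ /, { ".$0" };
put $original;

# .IO turns a string into an IO::Path, which can take a path apart ...
my $path = '/music/Artist/Album/Song 1.mp3'.IO;
put $path.basename;
put $path.extension;
put $path.parent;

# ... or re-root it under a different directory.
my $rel = $path.relative( '/music' );
put $rel;
put $rel.IO.absolute( '/backup' );
```

Nothing here touches the filesystem; the path methods only manipulate strings, so it is safe to experiment with.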

use File::Find:auth<BDFOY>;

#`( iTunes loves screwing up files. It imports a file multiple
times instead of realizing it's already in there.

	some song.mp3
	some song 1.mp3

Let's find these pairs where the names differ by the addition of
a space and a single digit (although I've had problems like this with
digits as high as 4).
)

my $dir = '/Users/brian/Dropbox/iTunes/Music';
my $target = '/Users/brian/BackupMusic';

#`(
I actually used this module to count extensions and these are the
ones I want to focus on. An array in a regex is an alternation.
)
my @extensions = <mp3 m4a m4p>;
my $sequence := find(
	dir  => $dir,
	name => / \h '1' '.' @extensions $ /,
	);

my $count = 0;
my $dry-run = True; # try it before we move any files
for $sequence -> $file {
	#`(
	other = the file of the same name without a numbered copy
	if that file doesn't exist we don't have a problem
	)
	my $other = $file.subst:
		/ \h '1' '.' (@extensions) $ /,
		{ ".$0" };
	my $exists = $other.IO.e;
	next unless $exists;
	$count++; # count only the copies whose original also exists

	put '-' x 50;
	put "file: $file";
	put "other: $other ($exists)";

	#`(
	We need the part after the starting directory because we'll add
	that to the new target directory. We might have to make a subdir.
	)
	my $rel = $file.IO.relative: $dir;

	my $new = $rel.IO.absolute: $target;
	my $new-dir = $new.IO.parent.IO;
	$new-dir.mkdir unless $new-dir.e;
	$other.IO.rename( $new ) unless $dry-run;
	}

say "Found $count files";

Here’s the program I used to survey the file extensions I used to populate @extensions:

use File::Find:auth<BDFOY>;
use PrettyDump;

my $list := find( dir => $*HOME.child( 'Dropbox/iTunes/Music' ) );

my %extensions;
for $list -> $item {
	next if $item.d;
	%extensions{ $item.extension }++;
	}

pd %extensions;

And here’s what I had done the last time I had this problem. It’s Perl 5 interpreting the results of an external find. I think last time I ended up throwing an unlink in there at some point:

use v5.14;
open my $fh, 'find . -name "* [1234].m[p4][ap3]" |';

while( <$fh> ) {
	chomp; # drop the newline so the -e test sees the real filename
	my $other = s/\s+(\d+)(?=\.[^.]+\z)//r;
	next unless -e $other;
	print "$_\n\t$other\n";
	}

close $fh or die "Error in Find!";

I’ll probably lose my Perl 6 program, forget I wrote this, and recreate this in nine months. I might even post about it again.


    1. You’re right. $*SPEC is deprecated in 6.d and is scheduled for removal in 6.e. I already have bad habits in Perl 6!
