Yesterday, I made one of those classic command-line blunders that anyone with root access to a *nix shell makes from time to time. Basically, instead of doing:
sudo cp -r ./usr/* /usr/local/
I did:
sudo cp -r /usr/* /usr/local/
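Notice the missing ‘.’ in the second command line! Instead of copying the local ./usr directory, it copied the contents of the system's /usr tree into /usr/local.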
The really embarrassing part is that I only realised my error after about 10 seconds despite the fact that I was only copying a handful of files. Anyhow, mistakes like this are an opportunity to learn, so how do I fix this?
My first attempt looked like this:
#!/bin/bash
count=0
for file in `cat usr_files` # Where usr_files contains the result from find /usr/ -type f
do
    filename=`basename $file`
    path=`echo $file | cut -d '/' -f3-`
    found=`find /usr/local -name $filename`
    if [ -n "$found" ]; then
        csfile=`cksum $file | awk '{print $1}'`
        csfound=`cksum $found | awk '{print $1}'`
        if [ "$csfile" == "$csfound" ]; then
            echo "Match: deleting duplicate file: $found"
            rm -f $found
            let count=$count+1
        else
            echo "$filename: No match"
        fi
    fi
done
echo "$count files deleted"
This kind of works, but there are a couple of problems with this line:
found=`find /usr/local -name $filename`
- It makes the script horribly slow because it searches through some 100k+ files on each iteration of the for loop
- It contains a nasty bug: it matches on the file name alone. If we have a file /usr/foo in the /usr tree, the script will find /usr/local/foo and /usr/local/bar/foo and /usr/local/baz/bar/foo, when all we really want to delete is /usr/local/foo
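The second problem is easy to reproduce in a scratch directory. The sketch below is only an illustration: the temporary tree and the names foo, bar and baz are made up for the example, and nothing under the real /usr is touched.

```shell
#!/bin/bash
# Build a throwaway tree that mimics /usr/local (hypothetical layout).
scratch=$(mktemp -d)
mkdir -p "$scratch/local/bar" "$scratch/local/baz/bar"
touch "$scratch/local/foo" "$scratch/local/bar/foo" "$scratch/local/baz/bar/foo"

# Searching by name alone matches every file called foo, at any depth.
matches=$(find "$scratch/local" -name foo | wc -l | tr -d ' ')
echo "find -name foo matched $matches files"

# Building the path explicitly tests exactly one candidate.
target="$scratch/local/foo"
if [ -f "$target" ]; then
    echo "exact target exists: ${target#"$scratch"}"
fi

rm -rf "$scratch"
```

With three files named foo in the tree, the name search returns all of them, while the explicit path check only ever looks at one.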
In order to rectify these problems, I needed to modify the script, so that it took a path of the form /usr/baz/bar and created a comparison path /usr/local/baz/bar. I could have used awk, but the *nix cut command gets things done in a quicker and simpler fashion:
path=`echo /usr/baz/bar | cut -d '/' -f3-`
target="/usr/local/$path"
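This gives us $target as /usr/local/baz/bar. cut -d '/' tells cut to use / as the field delimiter, and -f3- selects everything from field 3 onwards, keeping the delimiters between the selected fields. Because the path starts with /, field 1 is the empty string before the first slash and field 2 is usr, so we get baz/bar.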
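To see how the fields fall out, it can help to run cut on the sample path. The snippet below is a small sketch using the example path from above. The variable names field2 and path_alt are just for illustration, and the ${file#/usr/} form is an alternative worth considering rather than what the script uses: shell parameter expansion strips the prefix without starting any extra processes.

```shell
#!/bin/bash
file=/usr/baz/bar

# Field 1 is empty (the text before the leading /), field 2 is "usr".
field2=$(echo "$file" | cut -d '/' -f2)
path=$(echo "$file" | cut -d '/' -f3-)
echo "field 2: $field2"
echo "fields 3-: $path"

# Same result with parameter expansion: strip the leading "/usr/".
path_alt=${file#/usr/}
target="/usr/local/$path_alt"
echo "target: $target"
```

Over a few hundred thousand files, avoiding the echo and cut processes on every iteration should save some time, although I have not measured how much.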
The final script looks like this:
#!/bin/bash
# usr_files contains the result from: find /usr/ -type f
deleted=0
count=0
total=$(( $(wc -l < usr_files) ))
# Read one path per line so that file names containing spaces survive.
while IFS= read -r file
do
    count=$((count + 1))
    echo "Processing file $count of $total:"
    # /usr/baz/bar -> baz/bar -> /usr/local/baz/bar
    path=$(echo "$file" | cut -d '/' -f3-)
    target="/usr/local/$path"
    if [ -f "$target" ]; then
        csfile=$(cksum "$file" | awk '{print $1}')
        cstarget=$(cksum "$target" | awk '{print $1}')
        if [ "$csfile" == "$cstarget" ]; then
            echo " Match: deleting duplicate file: $target"
            rm -f "$target"
            deleted=$((deleted + 1))
        else
            echo " No match"
        fi
    fi
done < usr_files
echo "Finished! $deleted files deleted"
This still took a couple of hours to run on my 2.0GHz Core Duo laptop, but nothing seems to have broken... yet!
Anybody got any suggestions for further improvements?
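One possible suggestion, offered as a sketch and not something run against a real /usr: compare the files directly with cmp -s instead of computing two checksums and parsing them with awk. cmp stops at the first differing byte and needs one process per pair. The function name remove_duplicates and the scratch tree are hypothetical, chosen only so the example can be tried safely. Paths are read one per line, so file names containing newlines are not handled.

```shell
#!/bin/bash
# Sketch: delete files under DST that are byte-identical to the file at
# the same relative path under SRC. remove_duplicates is a made-up name.
remove_duplicates() {
    src=$1
    dst=$2
    # Skip DST itself while walking SRC, then check each file's twin.
    find "$src" -path "$dst" -prune -o -type f -print | {
        deleted=0
        while IFS= read -r file; do
            target="$dst/${file#"$src"/}"
            # cmp -s is silent and exits 0 only if the contents match.
            if [ -f "$target" ] && cmp -s "$file" "$target"; then
                rm -f -- "$target"
                deleted=$((deleted + 1))
            fi
        done
        echo "$deleted files deleted"
    }
}

# Try it on a scratch tree first, never directly on /usr.
scratch=$(mktemp -d)
mkdir -p "$scratch/usr/bin" "$scratch/usr/local/bin"
echo "same" > "$scratch/usr/bin/tool"
echo "same" > "$scratch/usr/local/bin/tool"   # accidental copy
echo "other" > "$scratch/usr/local/bin/keep"  # legitimate local file
result=$(remove_duplicates "$scratch/usr" "$scratch/usr/local")
remaining=$(find "$scratch/usr/local" -type f | wc -l | tr -d ' ')
echo "$result, $remaining file(s) left under usr/local"
rm -rf "$scratch"
```

In the scratch tree the accidental copy of tool is removed while keep, which has no identical twin under usr, is left alone. Walking the tree with find inside the function also removes the need for the intermediate usr_files list.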