Fixing *nix blunders (Pt 1)

Yesterday, I made one of those classic command-line blunders that anyone with root access to a *nix shell makes from time to time. Basically, instead of doing:

sudo cp -r ./usr/* /usr/local/

I did:

sudo cp -r /usr/* /usr/local/

Notice the missing ‘.’ in the second command line!

The really embarrassing part is that I only realised my error after about 10 seconds despite the fact that I was only copying a handful of files. Anyhow, mistakes like this are an opportunity to learn, so how do I fix this?

My first attempt looked like this:

#!/bin/bash
count=0
for file in `cat usr_files` # Where usr_files contains the result from find /usr/ -type f
do
    filename=`basename $file`
    path=`echo $file | cut -d '/' -f3-`
    found=`find /usr/local -name $filename`

    if [ -n "$found" ]; then
        csfile=`cksum $file | awk '{print $1}'`
        csfound=`cksum $found | awk '{print $1}'`
        if [ "$csfile" == "$csfound" ]; then
            echo "Match: deleting duplicate file: $found"
            rm -f $found
            let count=$count+1
        else
            echo "$filename: No match"
        fi
    fi
done

echo "$count files deleted"

This kind of works, but there are a couple of problems with this line:

found=`find /usr/local -name $filename`
  1. It makes the script horribly slow because it searches through some 100k+ files on each iteration of the for loop
  2. It contains a nasty bug: if we have a file /usr/foo in the /usr tree, the script will find /usr/local/foo and /usr/local/bar/foo and /usr/local/baz/bar/foo, when the only copy we really want to delete is /usr/local/foo
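The second problem is easy to reproduce in a scratch directory. Here is a small sketch that builds a throwaway tree (the directory and variable names are made up for the demo, nothing under the real /usr is touched) and compares a search by name with a test of one exact path:

```shell
#!/bin/bash
# Build a throwaway tree that mimics the layout described above.
demo=$(mktemp -d)
mkdir -p "$demo/local/bar" "$demo/local/baz/bar"
touch "$demo/local/foo" "$demo/local/bar/foo" "$demo/local/baz/bar/foo"

# -name matches the base name at any depth, so all three paths come back.
by_name=$(find "$demo/local" -name foo | wc -l | tr -d ' ')

# Testing the one exact path we care about gives a single yes/no answer.
exact=0
[ -f "$demo/local/foo" ] && exact=1

echo "find -name matched $by_name files; exact path check found $exact"
rm -rf "$demo"
```

The exact-path test also avoids walking the whole tree on every iteration, which is where the slowness in the first problem comes from.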

To fix both problems, I needed to modify the script so that it takes a path of the form /usr/baz/bar and builds the comparison path /usr/local/baz/bar. I could have used awk, but the *nix cut command gets the job done more quickly and simply:

path=`echo /usr/baz/bar | cut -d '/' -f3-`
target="/usr/local/$path"

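This gives us $target as /usr/local/baz/bar. cut -d '/' tells cut to use / as the field delimiter, and -f3- means everything from field 3 onwards, keeping the delimiters between fields. Because the path starts with /, field 1 is empty and field 2 is usr, so field 3 onwards is baz/bar.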
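To check the field numbering, the mapping can be run on its own. This is only a sketch: the path_alt line is an alternative I am assuming would work equally well (shell parameter expansion instead of cut), not something the script itself uses.

```shell
#!/bin/bash
file="/usr/baz/bar"   # example path from the text above

# Split on '/': field 1 is empty, field 2 is "usr", field 3 onwards is "baz/bar".
path=$(echo "$file" | cut -d '/' -f3-)

# Alternative (an assumption, not used in the script): strip the leading
# "/usr/" in the shell itself, which avoids starting two extra processes.
path_alt=${file#/usr/}

target="/usr/local/$path"
echo "$path | $path_alt | $target"
```

Both forms give the same relative path here; the parameter-expansion form only works if every input path really begins with /usr/.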

The final script looks like this:

#!/bin/bash
deleted=0
count=0
# usr_files contains the result from find /usr/ -type f
for file in `cat usr_files`
do
    let count=$count+1
    echo "Processing file $count of 500000:"
    # Map /usr/baz/bar to /usr/local/baz/bar
    path=`echo "$file" | cut -d '/' -f3-`
    target="/usr/local/$path"

    # Only compare when a file exists at exactly the mapped path
    if [ -f "$target" ]; then
        csfile=`cksum "$file" | awk '{print $1}'`
        cstarget=`cksum "$target" | awk '{print $1}'`
        if [ "$csfile" == "$cstarget" ]; then
            echo "  Match: deleting duplicate file: $target"
            rm -f "$target"
            let deleted=$deleted+1
        else
            echo "  No match"
        fi
    fi
done

echo "Finished! $deleted files deleted"

This still took a couple of hours to run on my 2.0GHz Core Duo laptop, but nothing seems to have broken... yet!
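One possible speed-up, as a rough sketch rather than something I have run against a real /usr: compare the files directly with cmp -s instead of computing two checksums per pair. cmp -s prints nothing and can stop at the first differing byte, and it replaces the two cksum and two awk processes with a single one. The function name remove_duplicates and its two arguments are hypothetical, and the sketch assumes no file names contain newlines and that the source path is given without a trailing slash.

```shell
#!/bin/bash
# remove_duplicates SRC DST (hypothetical helper): delete files under DST that
# are byte-identical to the file at the same relative path under SRC.
remove_duplicates() {
    src=$1
    dst=$2
    # -path "$dst" -prune stops find from descending into DST itself.
    find "$src" -path "$dst" -prune -o -type f -print | {
        deleted=0
        while IFS= read -r file; do
            # Map SRC/baz/bar to DST/baz/bar
            target="$dst/${file#"$src"/}"
            if [ -f "$target" ] && cmp -s "$file" "$target"; then
                rm -f -- "$target"
                deleted=$((deleted + 1))
            fi
        done
        echo "$deleted files deleted"
    }
}

# Intended use for the situation in this post (run as root, at your own risk):
#   remove_duplicates /usr /usr/local
```

Like the script above, this cannot tell a stray copy from a file that legitimately lived in /usr/local and happens to be identical to its /usr counterpart, so it is worth trying on a scratch copy first.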

Anybody got any suggestions for further improvements?

About

I work at Birmingham Conservatoire as senior researcher and software development manager for the Integra Project. I live with my wife and three beautiful children in Birmingham, UK.
