
Yesterday, I made one of those classic command-line blunders that anyone with root access to a *nix shell makes from time to time. Basically, instead of typing:

sudo cp -r ./usr/* /usr/local/

I did:

sudo cp -r /usr/* /usr/local/

Notice the missing ‘.’ in the second command line!

The really embarrassing part is that it took me about 10 seconds to realise my mistake, even though I was only supposed to be copying a handful of files. Anyhow, mistakes like this are an opportunity to learn, so how do I fix this?

My first attempt looked like this:

#!/bin/bash
count=0
for file in `cat usr_files` # Where usr_files contains the result from find /usr/ -type f
do
    filename=`basename $file`
    path=`echo $file | cut -d '/' -f3-`
    found=`find /usr/local -name $filename`

    if [ -n "$found" ]; then
        csfile=`cksum $file | awk '{print $1}'`
        csfound=`cksum $found | awk '{print $1}'`
        if [ "$csfile" == "$csfound" ]; then
            echo "Match: deleting duplicate file: $found"
            rm -f $found
            let count=$count+1
        else
            echo "$filename: No match"
        fi
    fi
done

echo "$count files deleted"

This kind of works, but there are a couple of problems with this line:

found=`find /usr/local -name $filename`
  1. It makes the script horribly slow, because it searches through some 100k+ files on every iteration of the for loop.
  2. It contains a nasty bug: if we have a file /usr/foo in the /usr tree, the script will find /usr/local/foo, /usr/local/bar/foo and /usr/local/baz/bar/foo, when the only file we actually want to delete is /usr/local/foo.

To fix both problems, I needed to modify the script so that it takes a path of the form /usr/baz/bar and builds the corresponding path /usr/local/baz/bar to compare against. I could have used awk, but the *nix cut command does the job more quickly and simply:

path=`echo /usr/baz/bar | cut -d '/' -f3-`
target="/usr/local/$path"

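This gives $target the value /usr/local/baz/bar. The -d '/' option tells cut to use / as the field delimiter, and -f3- selects field 3 onwards, keeping the delimiters between the selected fields. Because the path starts with /, field 1 is empty and field 2 is usr, so we get baz/bar.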
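To check what cut actually produces, here is a small sketch that wraps the same transformation in two hypothetical functions, to_local_cut and to_local_expansion. The second one swaps cut for the shell's own ${var#prefix} parameter expansion; that is an alternative I'm suggesting, not what the script below uses. It avoids running a pipeline in a subshell for every file, which adds up over hundreds of thousands of iterations.

```shell
#!/bin/bash
# Two ways to map a path under /usr to the matching path under /usr/local.

# 1. cut, as in the post: with "/" as the delimiter, field 1 is empty and
#    field 2 is "usr", so -f3- keeps everything after "/usr/".
to_local_cut() {
    local path
    path=$(echo "$1" | cut -d '/' -f3-)
    echo "/usr/local/$path"
}

# 2. Parameter expansion: ${1#/usr/} removes the leading "/usr/" inside the
#    shell itself, without a command substitution or pipeline.
to_local_expansion() {
    echo "/usr/local/${1#/usr/}"
}

to_local_cut /usr/baz/bar         # prints /usr/local/baz/bar
to_local_expansion /usr/baz/bar   # prints /usr/local/baz/bar
```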

The final script looks like this:

#!/bin/bash
# usr_files holds one path per line, e.g. from: find /usr/ -type f > usr_files
total=`wc -l < usr_files`
deleted=0
count=0
while IFS= read -r file    # read whole lines, so paths containing spaces survive
do
    let count=$count+1
    echo "Processing file $count of $total:"
    path=`echo "$file" | cut -d '/' -f3-`    # /usr/baz/bar -> baz/bar
    target="/usr/local/$path"

    if [ -f "$target" ]; then
        csfile=`cksum "$file" | awk '{print $1}'`
        cstarget=`cksum "$target" | awk '{print $1}'`
        if [ "$csfile" == "$cstarget" ]; then
            echo "  Match: deleting duplicate file: $target"
            rm -f "$target"
            let deleted=$deleted+1
        else
            echo "  No match"
        fi
    fi
done < usr_files

echo "Finished! $deleted files deleted"

This still took a couple of hours to run on my 2.0GHz Core Duo laptop, but nothing seems to have broken… yet!

Anybody got any suggestions for further improvements?
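One possible improvement, offered as a sketch that I haven't timed against the original: compare the files with cmp -s instead of two cksum pipelines, and build the target path with the shell's ${file#prefix} expansion instead of cut. cmp stops reading at the first differing byte, and a match means the files are byte-for-byte identical rather than merely sharing a CRC. The function name remove_duplicates and its three arguments are my own invention, and it assumes no path in the list contains a newline.

```shell
#!/bin/bash
# remove_duplicates LIST SRC DEST
# For each path in LIST (one per line, each starting with SRC), delete the
# matching file under DEST if it is byte-for-byte identical to the original.
remove_duplicates() {
    local list src dest deleted file target
    list=$1
    src=${2%/}          # drop any trailing slash
    dest=${3%/}
    deleted=0
    while IFS= read -r file; do
        target="$dest/${file#"$src"/}"   # e.g. /usr/baz/bar -> /usr/local/baz/bar
        # cmp -s prints nothing and succeeds only if the contents are identical
        if [ -f "$target" ] && cmp -s -- "$file" "$target"; then
            echo "  Match: deleting duplicate file: $target"
            rm -f -- "$target"
            deleted=$((deleted + 1))
        fi
    done < "$list"
    echo "Finished! $deleted files deleted"
}

# Example: remove_duplicates usr_files /usr /usr/local
```

For a first run it is worth swapping rm -f for echo, so you can see what would be deleted before anything is.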

I’m sure there are a number of ways to rename all the files in a directory to lowercase, including the mmv utility for ‘wildcard’ renaming and copying. However, this simple one-liner is likely to work on a wider range of systems without installing additional programs, since it needs only bash and the standard tr and mv commands:

for file in *; do lower=`echo "$file" | tr '[:upper:]' '[:lower:]'`; [ "$file" = "$lower" ] || mv -i -- "$file" "$lower"; done

This converts every uppercase letter in the names of the files in the current directory to lowercase. To convert the other way, swap the two arguments to tr. Variants can be created by changing which files the loop iterates over, or by changing the character sets passed to tr (tr works with sets of characters, not regular expressions). For example, to change the first letter of the name of every directory in the current directory to upper case:

for dir in */; do dir=${dir%/}   # the */ glob matches only directories; drop its trailing /
    new=`echo "${dir:0:1}" | tr '[:lower:]' '[:upper:]'`${dir:1}
    [ "$dir" = "$new" ] || mv -i -- "$dir" "$new"; done
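To try the lowercase loop without risking real files, the sketch below wraps a quoted, glob-based variant of it in a hypothetical lowercase_names function and runs it in a throwaway directory created with mktemp -d. The [:upper:] and [:lower:] classes behave like A-Z and a-z for plain ASCII names.

```shell
#!/bin/bash
# lowercase_names DIR: rename every entry in DIR (not recursive) to lowercase.
lowercase_names() {
    (   # subshell, so the cd does not affect the caller
        cd -- "$1" || exit 1
        for file in *; do
            [ -e "$file" ] || continue   # empty directory: * stays literal
            lower=$(printf '%s' "$file" | tr '[:upper:]' '[:lower:]')
            # skip names that are already lowercase; mv -i asks before overwriting
            [ "$file" = "$lower" ] || mv -i -- "$file" "$lower"
        done
    )
}

demo=$(mktemp -d)
touch "$demo/README.TXT" "$demo/Photo One.JPG" "$demo/notes.txt"
lowercase_names "$demo"
ls "$demo"
```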

Merry Christmas!

UnRTF!

If you are a GNU/Linux user and you have been working with a Mac user or Mac software, chances are that you have received files in RTF format, because RTF is the default format of TextEdit, the standard Mac ‘text’ editor. I have nothing against RTF as a format; it just annoys me when it is used for plain text documents that don’t have (or need) any formatting.

There are a number of ways to deal with RTF on your GNU/Linux system: you could open the document in a word processor such as OpenOffice Writer or AbiWord, or use the RTF editor Ted. However, if the document has no special formatting, or you don’t care about the formatting, you probably just want to strip out the RTF markup.

UnRTF

UnRTF is an aptly named GNU command-line application that converts RTF files to a variety of useful output formats, including HTML, plain text, text with VT100 codes, LaTeX and PostScript. All but HTML are currently flagged as alpha; however, for simple documents I have found them to be fine. To run UnRTF, issue a command like the following from a shell prompt:

unrtf --text mydoc.rtf > mydoc.txt

This will convert the RTF file mydoc.rtf to plain text. For complex documents, you might use HTML (UnRTF’s default output format, so no option is needed) as an intermediate format, and then process it with other tools. For example, htmldoc could be used to convert it to PDF:

unrtf mydoc.rtf |  htmldoc --webpage -f mydoc.pdf -

And you could even use the wonderful ImageMagick convert utility to rasterize the resulting PDF:

convert -density 196 mydoc.pdf mydoc.png

I’m not sure what the purpose of that would be, but I thought I’d throw it in for fun!
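If a Mac-using colleague sends a whole folder of RTF files, the same unrtf --text command can be run in a loop. This is a minimal sketch, assuming the files end in lowercase .rtf and that you want each .txt written next to its source; rtf_to_text_all is a hypothetical name.

```shell
#!/bin/bash
# rtf_to_text_all DIR: convert every DIR/*.rtf to plain text with UnRTF,
# writing mydoc.txt next to mydoc.rtf.
rtf_to_text_all() {
    local f
    for f in "$1"/*.rtf; do
        [ -e "$f" ] || continue                 # no .rtf files: the glob stays literal
        unrtf --text "$f" > "${f%.rtf}.txt"     # mydoc.rtf -> mydoc.txt
    done
}

# Example: rtf_to_text_all ~/Documents/from-mac
```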

Despite the availability of check-as-you-type spellcheckers (even for vim, which I use for all text editing), I still prefer to write a document and spell check it afterwards. There are two reasons for this. The first is that when I am writing, I don’t want my flow and concentration broken every time I make a typing error. The second is that I use my editor for a variety of tasks: writing ‘normal’ documents, writing code in a variety of languages, and writing documents that contain code and a lot of technical acronyms. I don’t want the visual noise of a lot of ‘mis-spellings’ that I want to ignore, or to have to keep training the spell checker to ignore them.

Aspell is perfect for this kind of after-the-fact checking. It supports a variety of (spoken) languages, document encodings and run-together words. Most importantly, it supports a range of filter modes, which skip the markup or keywords of particular document types so that they don’t get checked. Modes I often use are --mode=tex (LaTeX commands), --mode=sgml (SGML and HTML tags) and --mode=ccpp (C and C++ source).

Checking multiple documents

To check a selection of documents, the -exec action of the find command is aspell’s friend. For example, to check every file in the current directory and all of its subdirectories:

find ./ -type f -exec aspell --master=british -c '{}' \;

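Here find replaces {} with the path of each file it finds and runs aspell once per file. Each path is passed to aspell as a single argument, so filenames containing spaces already work; the single quotes around {} just stop some shells from treating the braces specially. The escaped semicolon, \;, marks the end of the command to be executed by find.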

This can be combined with find’s regular expression matching to filter the files, and of course with aspell’s filter modes:

find ./ -type f -iregex '.*\.[hs][tg]ml' -exec aspell --master=british --mode=sgml -c '{}' \;

This recursively finds all files in the current directory and the directories beneath it whose names end in ‘.html’ or ‘.sgml’ (case insensitively), and runs aspell on them, ignoring all HTML and SGML tags. (Strictly, the character classes also allow the unlikely extensions ‘.hgml’ and ‘.stml’.)
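aspell -c is interactive, which is what I want when fixing a document, but sometimes a quick report is handier. Here is a sketch built on aspell’s list command, which reads text on standard input and prints each misspelled word on its own line. The function name report_misspellings and its arguments (directory, filename pattern, optional filter mode) are hypothetical, it assumes the british dictionary used above, and it assumes no filename contains a newline.

```shell
#!/bin/bash
# report_misspellings DIR PATTERN [MODE]
# Print "file: word" for each distinct word aspell flags in every file under
# DIR whose name matches PATTERN (case-insensitive), e.g. '*.html' sgml.
report_misspellings() {
    local dir pattern mode
    dir=${1:-.}
    pattern=${2:-*.txt}
    mode=${3:-}
    find "$dir" -type f -iname "$pattern" |
    while IFS= read -r f; do
        if [ -n "$mode" ]; then
            aspell --master=british --mode="$mode" list < "$f"
        else
            aspell --master=british list < "$f"
        fi | sort -u |
        while IFS= read -r word; do
            printf '%s: %s\n' "$f" "$word"
        done
    done
}

# Example: report_misspellings . '*.html' sgml
```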

grep vs 'grep -E'

Well, this might seem blindingly obvious to some, but I was stumped for half an hour this morning trying to figure out why ‘grep he+’ wasn’t finding the word ‘hello’ in a test file when it was obviously there. I would have found the answer much sooner had I read the manpage more carefully. The all-important paragraph reads as follows:

grep understands three different versions of regular expression syntax: “basic,” “extended,” and “perl.” In GNU grep, there is no difference in available functionality using either of the first two syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards.

Confused? So was I, until I read the Wikipedia article on regular expressions, which states that:

The basic Unix regular expression syntax is now defined as obsolete by POSIX, but is still widely used for backwards compatibility. Many regular-expression-aware Unix utilities including grep and sed use it by default while providing support for extended regular expressions with command line arguments.

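In other words, plain grep uses basic syntax, in which + is an ordinary character, so ‘grep he+’ was searching for the literal text ‘he+’. (GNU grep also accepts ‘he\+’ in basic syntax as an extension.) So, the moral of the story is: if you want to use the regular expression syntax you found on that cheat sheet you downloaded, use ‘grep -E’!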
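A quick way to see the difference for yourself is to feed grep two sample lines, one containing ‘hello’ and one containing the literal text ‘he+’. The sample_text helper is just a hypothetical name for the test input; the last command relies on a GNU grep extension.

```shell
#!/bin/bash
# Two lines of sample text: one contains "he", the other the literal "he+".
sample_text() {
    printf 'hello\nhe+ is literal\n'
}

# Basic syntax (plain grep): "+" is an ordinary character, so this
# searches for the literal text "he+".
sample_text | grep 'he+'       # prints: he+ is literal

# Extended syntax (grep -E): "+" means "one or more of the preceding item".
sample_text | grep -E 'he+'    # prints both lines

# GNU grep only: a backslash gives "+" its special meaning in basic syntax.
sample_text | grep 'he\+'
```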
