Thursday, April 15, 2010

Using the find command

Today I was working on a problem with a coworker, and I saw him start to type the following:

find . | grep

I said, "no, I'm not going to let you do that. You need to do it the right way." (Yes, I do that sort of thing. Don't act like you didn't expect it.)

Don't get me wrong, there's nothing wrong with grep. It's a fine tool, and I encourage people to constantly try to become better with it. But it has its place, and this was not it. Using the find command properly in this case would result in less typing, and would spawn one fewer process. It may not matter for a one-line command, but in a larger script that runs it many times it can add up. Better to get in the right habit now, so that when you do find yourself working on that big script, you do it right the first time.
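To make the comparison concrete, here is a small sketch that runs both versions against a throwaway directory. The directory layout and filenames are made up for the example; nothing here comes from the actual problem we were working on.

```shell
# Build a scratch tree so nothing real is touched (names are invented).
demo=$(mktemp -d)
mkdir -p "$demo/docs"
touch "$demo/notes.txt" "$demo/docs/report.txt" "$demo/docs/image.png"

# The habit I was complaining about: two processes, and grep is matching
# against the whole path rather than just the filename.
piped=$(find "$demo" | grep '\.txt$' | sort)

# The same search done by find alone.
direct=$(find "$demo" -name '*.txt' | sort)

echo "$direct"
rm -rf "$demo"
```

Both variables end up holding the same list of files; the second one just gets there without the extra process.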

The find command is extremely powerful. Unlike locate, which uses a pre-built database of files and paths on your system, find searches your filesystem in real time, looking at the individual files and their properties. It may be slower than locate, but its results are never stale, and it's far more flexible.

I see people use find mostly to search by filename, but it has plenty of other options. Let's start with filename and build from there. There are two relevant options:

-name
-iname

They are identical, except that -iname is case insensitive. Since files on a *nix system are traditionally all lowercase, you might want to save yourself a few processor cycles and just go with -name. If there's a chance that case may be an issue, use -iname instead.

find -name 'myfile.txt'

Using quotes is not strictly necessary with most filenames, but it's a good habit to get into. Keep in mind that by default, find searches by exact filenames. If you're not sure what the extension is, or you want to look anywhere in the filename, you can use globs. This is where the quotes start to matter, because they keep the shell from expanding the pattern before find ever sees it:

find -iname '*myfile*'
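Here is a quick sketch of how the two options and a glob differ. The scratch directory and the filenames in it are invented for the example, and -iname, while not part of POSIX, is available in both GNU and BSD find as far as I know.

```shell
# Scratch tree with made-up names; the two spellings live in separate
# directories so this also works on case-insensitive filesystems.
demo=$(mktemp -d)
mkdir -p "$demo/tree/a" "$demo/tree/b"
touch "$demo/tree/a/myfile.txt" "$demo/tree/b/MyFile.TXT" "$demo/tree/a/old_myfile.bak"

# Exact, case-sensitive name.
exact=$(find "$demo/tree" -name 'myfile.txt' | wc -l)

# Exact name, ignoring case.
nocase=$(find "$demo/tree" -iname 'myfile.txt' | wc -l)

# Glob, ignoring case: anything with "myfile" somewhere in the name.
glob=$(find "$demo/tree" -iname '*myfile*' | wc -l)

echo "exact=$exact nocase=$nocase glob=$glob"
rm -rf "$demo"
```

Each step casts a slightly wider net than the one before it, which is usually the order I try them in.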

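The above commands will process files recursively, inside the current working directory. (Leaving the directory off entirely is a GNU find convenience; other versions of find expect you to spell it out, as in find . -name 'myfile.txt'.) If you want to search a different directory, you need to specify it before any other options: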

find /etc -name passwd

There are a few subtleties of find that you will encounter. They're not usually a big deal, but they can be annoying sometimes. For instance, find does not sort its results. If I expect a lot of results, I generally pipe it through the sort command. It also isn't very good at searching its own results, which is where grep can come in handy:

find / -name 'Net' | sort | grep -i perl
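A scaled-down version of that pipeline, run against a scratch directory with invented names, looks something like this. I've also included -ipath, which (assuming GNU or BSD find) lets find match against the whole path by itself, so treat the grep as a convenience rather than a necessity.

```shell
# Scratch tree; the directory names are made up for the example.
demo=$(mktemp -d)
mkdir -p "$demo/perl5/Net" "$demo/python/Net" "$demo/Perl/lib/Net"

# -name only looks at the last path component, so sort the results and
# let grep filter on the rest of the path.
hits=$(cd "$demo" && find . -name 'Net' | sort | grep -i perl)

# The same filter done inside find with a case-insensitive path glob.
same=$(cd "$demo" && find . -name 'Net' -ipath '*perl*' | sort)

echo "$hits"
rm -rf "$demo"
```

The python directory is dropped in both cases, and the two result lists match.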

Now that you have the basics of find, let's explore some of the other options. Two that I use extensively are:

-ok
-exec

Again, these options are identical in purpose, but there is an important difference in how they behave. Both of them will execute a command on each file found, but -ok will ask for permission first (for each file) while -exec will just do it.

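find /etc -type f -name '*conf' -exec mv {} {}.orig \;

First off, everything between -exec (or -ok) and \; is the command that you want to run. Make sure you escape that semicolon at the end with a backslash, or the shell will treat it as its own command separator and find will never see it. The {} is a placeholder for the filename that was found by find. I've added -type f so that only regular files get renamed; without it, a directory such as /etc/httpd/conf would match too, and find would rename it out from under itself before it got to the files inside. In this case, we're actually going to be performing a series of commands that looks like this: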

mv /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.orig
mv /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.orig
mv /etc/httpd/conf.d/perl.conf /etc/httpd/conf.d/perl.conf.orig
...SNIP...
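If you'd like to try this somewhere safer than /etc, here is a sketch against a scratch directory (the filenames are invented). One assumption worth flagging: GNU find expands {} even when it's glued to other text, as in {}.orig, but POSIX only promises that for a {} standing alone, so this version passes the name to a tiny sh script instead.

```shell
# Scratch tree with made-up config files.
demo=$(mktemp -d)
mkdir -p "$demo/conf.d"
touch "$demo/httpd.conf" "$demo/conf.d/ssl.conf" "$demo/conf.d/readme.txt"

# For each match, sh receives the filename as $1 and does the rename.
# The lone "sh" before {} just fills in $0.
find "$demo" -type f -name '*.conf' -exec sh -c 'mv "$1" "$1.orig"' sh {} \;

renamed=$(find "$demo" -type f -name '*.conf.orig' | wc -l)
leftover=$(find "$demo" -type f -name '*.conf' | wc -l)

echo "renamed=$renamed leftover=$leftover"
rm -rf "$demo"
```

Swap -exec for -ok if you'd rather be asked before each rename.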

You're not limited to just searching filenames by glob. The find command does actually have support for regular expressions, using the following:

-regex
-iregex

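I don't think I need to tell you that -iregex is the case-insensitive version of -regex. If you already know how to use grep, this isn't much of a stretch, with one catch: the expression has to match the whole path that find prints, not just a piece of it, which is why the pattern below starts with .* instead of jumping straight to the name: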

find -regex ".*deskto."
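The whole-path behavior is the part that trips up grep users, so here is a small sketch of it using a scratch directory and invented filenames. It assumes a find that supports -regex, which GNU and BSD find both do.

```shell
# Scratch tree with made-up names.
demo=$(mktemp -d)
mkdir -p "$demo/home"
touch "$demo/home/desktop" "$demo/home/desktop.ini"

# The pattern must cover the entire path, so desktop.ini is left out:
# nothing in the pattern accounts for the trailing ".ini".
whole=$(cd "$demo" && find . -regex '.*deskto.' | sort)

# Without the leading .* nothing matches, because every path that
# find prints here starts with "./".
partial=$(cd "$demo" && find . -regex 'deskto.' | wc -l)

echo "$whole"
rm -rf "$demo"
```

With grep, both patterns would have matched both files, since grep is happy with a match anywhere on the line.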

The find command also supports boolean logic:

-and (or -a)
-or (or -o)
-not
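One thing the list doesn't show is grouping. As a rough sketch (scratch directory, invented filenames): tests sitting next to each other are joined by an implied -and, which binds tighter than -or, so you'll usually want escaped parentheses around the alternatives. I've used -a, -o and ! here because those are the spellings POSIX defines; the longer names work in GNU find and, as far as I know, BSD find.

```shell
# Scratch tree with made-up names.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.log" "$demo/c.conf" "$demo/d.txt.bak"

# Regular files whose names end in .txt OR .log.
either=$(find "$demo" -type f \( -name '*.txt' -o -name '*.log' \) | wc -l)

# Regular files that match NEITHER pattern.
neither=$(find "$demo" -type f ! \( -name '*.txt' -o -name '*.log' \) | wc -l)

echo "either=$either neither=$neither"
rm -rf "$demo"
```

The parentheses are escaped for the same reason as the semicolon after -exec: the shell would otherwise try to interpret them itself.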

Let's combine these with a couple more options from the man page:

touch /tmp/jayceweb.tar
find / -user jayce -and -group apache -exec tar --remove-files -rf /tmp/jayceweb.tar {} \;
bzip2 /tmp/jayceweb.tar

This is the sort of command a person might run if they found a user on their system that they didn't trust, and wanted to quarantine all of their web files. First we make an empty tar file, then we add the suspicious files to it, removing them once they've been archived. The archive stays uncompressed while we append to it, because tar refuses to append (-r) to a compressed archive; bzip2 then turns it into jayceweb.tar.bz2 at the end. It assumes that the user that owns the files is jayce, and the group that owns the files is apache. You could also make use of:

-uid
-gid
-nouser
-nogroup
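As a harmless way to try the ownership tests, this sketch looks for files owned by whoever is running it, inside a scratch directory with an invented filename. It assumes a find with -uid, which GNU and BSD find both have.

```shell
# Scratch tree; the file is created by, and so owned by, the current user.
demo=$(mktemp -d)
touch "$demo/mine.txt"

# Match on the numeric user ID instead of the username.
me=$(id -u)
by_uid=$(find "$demo" -type f -uid "$me" | wc -l)

# List anything whose owner or group has no entry on this system.
orphans=$(find "$demo" \( -nouser -o -nogroup \) | wc -l)

echo "by_uid=$by_uid orphans=$orphans"
rm -rf "$demo"
```

The numeric forms are handy when the account has already been deleted and only the ID is left on the files, which is exactly the situation -nouser and -nogroup are meant to catch.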

That's probably enough of a primer to get you started. Now would be a good time to check the man page for some of the other myriad options that you can use to check by date stamp(s), file size, file type and even permissions. A little practice with this powerful command will save you time and energy, and increase your productivity like you won't believe.
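To give you a head start on that man page reading, here is a sketch of four of those tests run against a scratch directory (the filenames and permissions are made up for the example).

```shell
# Scratch tree: one subdirectory, one small file, one empty file.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'hello\n' > "$demo/small.txt"
: > "$demo/empty.txt"
chmod 600 "$demo/small.txt"
chmod 644 "$demo/empty.txt"

# File type: directories only (the scratch directory itself and sub).
dirs=$(find "$demo" -type d | wc -l)

# File size: files that take up zero blocks.
empty=$(find "$demo" -type f -size 0 | wc -l)

# Date stamp: files modified less than one day ago.
recent=$(find "$demo" -type f -mtime -1 | wc -l)

# Permissions: files whose mode is exactly 600.
private=$(find "$demo" -type f -perm 600 | wc -l)

echo "dirs=$dirs empty=$empty recent=$recent private=$private"
rm -rf "$demo"
```

Each of these can be combined with -name, the boolean operators, and -exec in the same way as everything above.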