Winter 2017 - January to April 2017 - Updated 2019-01-06 04:26 EST
find
This is optional material for CST8207.
The Problem:
The find command is showing me pathnames. I could use the mouse to copy-and-paste these pathnames into many cp commands, but surely there must be a way to automate this? Can the cp command select file names the same way that find can?
The idea of Unix/Linux is that every command does one thing well, so the features of find are not built into cp. You use find to generate the names and you use cp to copy the names. The trick is getting the names generated by find to be used by cp.
For an introductory assignment, I don’t expect more knowledge than copy and paste using your mouse, but that’s not how a real sysadmin would do it. Here are some optional hints on how a real sysadmin would get the pathnames copied without using a mouse or copy-and-paste.
find -exec
The designers of the find command built in a mechanism to run a command using the pathnames that find finds. It’s the -exec option. Go read man find and look at how -exec works. The man page for find has one example in the EXAMPLES section (along with lots of other uses of find) and you can actually use this example to run file on a whole bunch of files:
find . -type f -exec file '{}' \;
You can append the above -exec and following arguments to any already-working find command you have, replacing the . starting point and -type f expression in the example with your own starting point and expression to find the pathnames you want. The find command line with the above added -exec expression will then run file on each of the pathnames found by find, one at a time.
The find command will run the -exec command once per pathname. The pathname generated by find is inserted into the -exec command line where that quoted set of braces is. You might be able to see it better if you insert an echo in front of the command line being run by find, to echo on your screen the command that is being built and executed:
find . -type f -exec echo file '{}' \;
(Make sure you get this simple -exec echo file example working on your own set of pathnames before you try to modify it to do something more complicated such as a file copy.)
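As a minimal sketch of swapping in your own starting point and expression, the following works in a throwaway directory. The directory and file names are made up for illustration, and -name '*.txt' stands in for whatever expression your own find command uses:

```shell
# Hypothetical scratch directory and file names, for illustration only.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/todo.txt" "$demo/image.png"

# Own starting point ("$demo") and own expression (-name '*.txt').
# The echo shows each command line that find builds, without running file.
preview=$(find "$demo" -type f -name '*.txt' -exec echo file '{}' \;)
printf '%s\n' "$preview"

# Clean up the scratch directory.
rm -r "$demo"
```

Only the two .txt pathnames should appear, each on its own line after the word file.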
But of course you don’t want to simply run file on each pathname; you want to copy each pathname into a single destination directory. I’ll leave most of this as an “exercise for the student”, with the following hint:
find will supply the source pathname argument to cp; what is missing in the above line that uses file is the destination directory needed by cp. You will have to add the destination directory name in the right place and also change the command name file to be the command name cp in the above line. Leave the echo ahead of the command line you are building until you see find generate on your screen the cp command lines that you know will work, then take out the echo and let find run the multiple cp commands for you.
The above is just one way to automate the copy by having find do the work for you. It has the disadvantage that it runs a separate cp command for every pathname find finds, which is no problem if there are only three pathnames but is a huge problem if there are a million pathnames because find will have to run cp a million times (and that takes time).
Modern versions of find have a modified -exec statement ending in + instead of ; that can pack multiple file names into the same command execution, reducing the number of times the command has to be executed by increasing the number of pathnames passed to each execution:
find . -type f -exec file '{}' +
This works similarly to xargs, which is described in the next section.
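To see the difference between the two -exec endings, here is a small sketch that counts how many command lines find builds in each case. It uses echo so nothing is really run on the files, and the scratch directory and its three file names are invented for the demonstration:

```shell
# Hypothetical scratch directory holding three empty files.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# Each line of output is one execution of the echoed command,
# so counting lines counts how many times find ran the command.
runs_semicolon=$(find "$demo" -type f -exec echo file '{}' \; | grep -c '^file')
runs_plus=$(find "$demo" -type f -exec echo file '{}' + | grep -c '^file')

echo "ending in ';': $runs_semicolon command(s)"
echo "ending in '+': $runs_plus command(s)"

# Clean up the scratch directory.
rm -r "$demo"
```

With only three short pathnames, the ; form runs the command three times and the + form runs it once.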
xargs
If you have a million files to copy, using find with the traditional version of -exec is not the way to do it, since you will have to run the cp program once per pathname, and that means running cp a million times. Even if cp did nothing, it would take a long time to re-execute cp a million times. We can do this more efficiently.
The cp command is designed to allow multiple source pathnames if they are all being copied into the same destination directory. We could reduce the number of cp commands run if we could put multiple source pathnames into each cp command line. If we could fit a million source pathnames on one cp command line, we would only need one single cp command to do the work. This is a huge savings compared to running cp a million times.
Alas, most Unix systems have a limit on the total length of a command line. You can’t fit a million pathnames on one single cp command line. This is why the xargs program was written.
The xargs program reads a (usually large) list of pathnames from standard input. It will read those pathnames and pack a command line with as many of those pathnames as can possibly fit, then call the command, then repeat with another large number of pathnames, and repeat again until all the pathnames are processed. By packing each command line as full of pathnames as it possibly can, it uses the minimum number of commands needed to get the job done.
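The packing can be observed with a rough sketch like the one below. It assumes the getconf and seq utilities are installed, and it uses plain numbers as stand-ins for pathnames. The number of batches depends on the system, so no particular count is promised:

```shell
# Show this system's limit, in bytes, on the combined size of the
# arguments and environment passed to a new command.
getconf ARG_MAX

# Feed 100000 short stand-in "pathnames" (just numbers) to xargs.
# Each output line is one echo command line that xargs built and ran,
# so the line count is the number of commands that were needed.
batches=$(seq 1 100000 | xargs echo | wc -l)
echo "xargs ran echo $batches time(s) for 100000 arguments"
```

The count should be a small number, far fewer than the 100000 runs that one command per pathname would need.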
See man xargs and look at the EXAMPLES section for examples using find to generate pathnames that get sent into xargs. Sysadmins always use the -print0 option to find and the -0 option to xargs so that blanks in pathnames don’t cause problems. (See the man pages.)
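The effect of those two options can be checked with a short sketch. It assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do), and it assumes the scratch directory path created by mktemp contains no blanks itself. The file name is invented for the demonstration:

```shell
# Hypothetical scratch directory containing one file with a blank in its name.
demo=$(mktemp -d)
touch "$demo/my file.txt"

# The small sh -c script prints how many arguments xargs handed to it.
plain=$(find "$demo" -type f | xargs sh -c 'echo "$#"' sh)
safe=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "without -print0 and -0: $plain argument(s)"
echo "with -print0 and -0:    $safe argument(s)"

# Clean up the scratch directory.
rm -r "$demo"
```

Without the options, xargs splits the single pathname at the blank and passes more than one argument; with them, the pathname arrives intact as one argument.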
Since xargs can only add lists of pathnames to the end of a command line (where most commands expect them), this poses a problem for a file copy that expects all the source filenames to precede the destination directory name. The maintainers of cp invented the -t option to cp so that you could specify the destination directory first on the command line, allowing all the source pathnames to be stacked at the end just the way xargs generates them:
$ cp -t /tmp file1 file2 file3 # file4 file5 etc...
You need to use the -t option when you use cp inside xargs so that the list of source pathnames can appear at the end of the command line.
Again, insert echo at the start of your xargs command lines (and start with only a few pathnames on standard input, not hundreds) until you see echoing on your screen the command lines you know will work. Then take out the echo and feed the full list of pathnames.
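Put together, the preview-then-run approach might look like the sketch below. It assumes a cp that supports -t (GNU cp does), and the source and destination directories and file names are scratch ones invented for the demonstration; substitute your own find expression and destination:

```shell
# Hypothetical scratch source and destination directories.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/one.txt" "$src/two words.txt"

# Step 1: preview.  The echo prints the cp command line instead of running it.
# (echo does not show quoting, so a name with a blank looks like two words.)
find "$src" -type f -print0 | xargs -0 echo cp -t "$dest"

# Step 2: the preview looks right, so take out the echo and really copy.
find "$src" -type f -print0 | xargs -0 cp -t "$dest"

# Count what arrived in the destination directory.
copied=$(find "$dest" -type f | grep -c .)
echo "copied $copied file(s)"

# Clean up both scratch directories.
rm -r "$src" "$dest"
```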
As described in the previous section, modern versions of find have a modified -exec statement ending in + instead of ; that can pack multiple file names into the same command execution, reducing the number of times the command has to be executed by increasing the number of pathnames passed to each execution.
$(command)
The shells have a command substitution feature that lets you take the standard output of any command and insert it into a command line. (See the heading Command Substitution in man bash, and also previous class notes such as CST8207 Command Substitution or CST8129 Command Substitution.)
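You might think of using this handy feature to take the standard output of find (a list of pathnames) and insert it into a cp command line. This command substitution might work, but it has serious limitations: it breaks on pathnames that contain blanks or other shell meta-characters, and it fails if the list of pathnames is too long to fit on one command line.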
In other words, command substitution only works sometimes, whereas the other two solutions presented earlier work every time (provided you use -print0 in your find command!).
Since sysadmins want solutions that always work and won’t mysteriously start failing in the future, avoid using command substitution to naïvely generate pathnames needed by other commands if those pathnames might ever contain blanks or other shell meta-characters, or if the list of pathnames might be very large. The embedded blanks and shell meta-characters in the pathnames, or the sheer number of pathnames, will some day cause errors if you rely on command substitution.
(With correct use of shell options to turn off file GLOBbing and suppress the splitting of words on blanks, you can almost safely write a shell script that does use command substitution and pathnames, but it isn’t pretty, doesn’t work for file names with newlines in them, and the options used are unsuitable for interactive shell use. It can still stop working if the list of pathnames is longer than is allowed on a command line. Don’t do it!)
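The blank problem is easy to reproduce. The following sketch uses an invented file name in a scratch directory and assumes the scratch directory path itself contains no blanks:

```shell
# Hypothetical scratch directory containing one file with a blank in its name.
demo=$(mktemp -d)
touch "$demo/my file.txt"

# Unquoted command substitution: the shell splits the output of find on
# blanks, so the single pathname becomes more than one word.
set -- $(find "$demo" -type f)
words=$#
echo "find printed 1 pathname; the shell produced $words word(s)"

# Clean up the scratch directory.
rm -r "$demo"
```

A cp command built this way would be handed pieces of a pathname, none of which names a real file.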