Winter 2019 - January to April 2019 - Updated 2019-01-06 04:26 EST
find
This is optional material for CST8207
The Problem:
The find command is showing me pathnames. I could use the mouse to copy-and-paste these pathnames into many cp commands, but surely there must be a way to automate this? Can the cp command select file names the same way that find can?
The idea of Unix/Linux is that every command does one thing well, so they don’t put features of find into cp. You use find to generate the names and you use cp to copy the names. The trick is getting the names generated by find to be used by cp.
For an introductory assignment, I don’t expect more knowledge than copy and paste using your mouse, but that’s not how a real sysadmin would do it. Here are some optional hints on how a real sysadmin would get the pathnames copied without using a mouse or copy-and-paste.
find -exec
The designers of the find command built in a mechanism to run a command using the pathnames that find finds. It’s the -exec option. Go read man find and look at how -exec works. The man page for find has one example in the EXAMPLES section of the man page (along with lots of other uses of find) and you can actually use this example to run file on a whole bunch of files:
find . -type f -exec file '{}' \;
You can append the above -exec and following arguments to any already-working find command you have, replacing the . starting point and -type f expression in the example with your own starting point and expression to find the pathnames you want. The find command line with the above added -exec expression will then run file on each of the pathnames found by find, one at a time.
The find command will run the -exec command once per pathname. The pathname generated by find is inserted into the -exec command line where that quoted set of braces is. You might be able to see it better if you insert an echo in front of the command line being run by find, to echo on your screen the command that is being built and executed:
find . -type f -exec echo file '{}' \;
(Make sure you get this simple -exec echo file example working on your own set of pathnames before you try to modify it to do something more complicated such as a file copy.)
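As a rough sketch of the echo technique, the following builds a small throwaway directory and shows the command lines that find would run. The directory comes from mktemp and the file names a.txt and b.txt are made up for illustration; substitute your own starting point and expression.

```shell
# Build a small throwaway tree so nothing important is touched.
# (The file names here are hypothetical, chosen only for the demo.)
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt"

# With echo in front, find prints each command line instead of running it.
# One line is printed per pathname found.
dry_run=$(find "$demo" -type f -exec echo file '{}' \; | sort)
printf '%s\n' "$dry_run"

# Clean up the throwaway tree.
rm -r "$demo"
```

Each printed line is a command that find would have executed if the echo were not there.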
But of course you don’t want to simply run file on each pathname; you want to copy each pathname into a single destination directory. I’ll leave most of this as an “exercise for the student”, with the following hint:
find will supply the source pathname argument to cp; what is missing in the above line that uses file is the destination directory needed by cp. You will have to add the destination directory name in the right place and also change the command name file to be the command name cp in the above line. Leave the echo ahead of the command line you are building until you see find generate on your screen the cp command lines that you know will work, then take out the echo and let find run the multiple cp commands for you.
The above is just one way to automate the copy by having find do the work for you. It has the disadvantage that it runs a separate cp command for every pathname find finds, which is no problem if there are only three pathnames but is a huge problem if there are a million pathnames because find will have to run cp a million times (and that takes time).
Modern versions of find have a modified -exec statement ending in + instead of ; that can pack multiple file names into the same command execution, reducing the number of times the command has to be executed by increasing the number of pathnames passed to each execution:
find . -type f -exec file '{}' +
This works similarly to xargs, which is described next:
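One way to see the difference between the two endings is to count how many times the command runs. This is only a sketch: it uses echo as a stand-in command (each run of echo prints exactly one line) and a throwaway directory with three made-up file names.

```shell
# Throwaway tree with three hypothetical files.
demo=$(mktemp -d)
touch "$demo/one" "$demo/two" "$demo/three"

# Each echo invocation prints exactly one line, so counting lines counts runs.
runs_semicolon=$(find "$demo" -type f -exec echo '{}' \; | wc -l)
runs_plus=$(find "$demo" -type f -exec echo '{}' + | wc -l)

echo "runs when -exec ends in a semicolon: $runs_semicolon"
echo "runs when -exec ends in a plus sign: $runs_plus"

rm -r "$demo"
```

With only three pathnames the saving is trivial, but the same ratio applies when the list is very long.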
xargs
If you have a million files to copy, using find with the traditional version of -exec is not the way to do it, since you will have to call and run the cp program once per pathname, and that means running cp a million times. Even if cp did nothing, it would take a long time to re-execute cp a million times. We can do this more efficiently.
The cp command is designed to allow multiple source pathnames if they are all being copied into the same destination directory. We could reduce the number of cp commands run if we could put multiple source pathnames into each cp command line. If we could fit a million source pathnames on one cp command line, we would only need one single cp command to do the work. This is a huge savings compared to running cp a million times.
Alas, most Unix systems have a limit on the total length of a command line. You can’t fit a million pathnames on one single cp command line. This is why the xargs program was written.
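If you are curious what the limit is on your own system, the getconf command can report it. This is a small sketch; the number differs between systems, and it counts the bytes of the arguments plus the environment, not the number of pathnames.

```shell
# Ask the system for its limit, in bytes, on the combined size of the
# arguments and environment that can be passed to a new program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"
```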
The xargs program reads a (usually large) list of pathnames from standard input. It will read those pathnames and pack a command line with as many of those pathnames as can possibly fit, then call the command, then repeat with another large number of pathnames, and repeat again until all the pathnames are processed. By packing each command line as full of pathnames as it possibly can, it uses the minimum number of commands needed to get the job done.
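The packing behaviour is hard to see with a short list, because everything fits in one command line. As an illustration only, the sketch below uses the -n option to cap each command line at three arguments so that the batches become visible; in normal use you would leave -n out and let xargs pack each line as full as it can.

```shell
# -n 3 limits each command line to three arguments, purely so the
# batching is visible with a tiny input of seven made-up names.
batches=$(printf '%s\n' a b c d e f g | xargs -n 3 echo)
printf '%s\n' "$batches"
```

Seven names arrive on standard input and echo is run three times: twice with three arguments and once with the single name left over.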
See man xargs and look at the EXAMPLES section for examples using find to generate pathnames that get sent into xargs. Sysadmins always use the -print0 option to find and the -0 option to xargs so that blanks in pathnames don’t cause problems. (See the man pages.)
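To see why the -print0 and -0 pair matters, the sketch below counts how many arguments reach the command with and without them. It assumes a find and xargs that support these options (the GNU and BSD versions do) and uses a throwaway directory holding two made-up file names, one of which contains a blank.

```shell
# Throwaway tree: one ordinary name and one name containing a blank.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/has blank.txt"

# The small sh -c script just prints how many arguments it was given.
# Without -print0/-0, xargs splits "has blank.txt" into two arguments.
unsafe_count=$(find "$demo" -type f | xargs sh -c 'echo "$#"' sh)
# With NUL separators, each pathname arrives as exactly one argument.
safe_count=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "arguments without -print0 and -0: $unsafe_count"
echo "arguments with -print0 and -0:    $safe_count"

rm -r "$demo"
```

Two files should mean two arguments; the first pipeline gets it wrong because the blank is treated as a separator.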
Since xargs can only add lists of pathnames to the end of a command line (where most commands expect them), this poses a problem for a file copy that expects all the source filenames to precede the destination directory name. The maintainers of cp invented the -t option to cp so that you could specify the destination directory first on the command line, allowing all the source pathnames to be stacked at the end just the way xargs generates them:
$ cp -t /tmp file1 file2 file3 # file4 file5 etc...
You need to use the -t option when you use cp inside xargs so that the list of source pathnames can appear at the end of the command line.
Again, insert echo at the start of your xargs command lines (and start with only a few pathnames on standard input, not hundreds) until you see echoing on your screen the command lines you know will work. Then take out the echo and feed the full list of pathnames.
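A dry run along these lines might look like the sketch below. The echo is deliberately left in place, so cp never actually runs and nothing is copied. The destination /tmp/dest and the file names are hypothetical; the find expression is a placeholder for your own.

```shell
# Throwaway source tree with two made-up file names.
demo=$(mktemp -d)
touch "$demo/file1" "$demo/file2"

# echo shows the cp command line that xargs would build; nothing is copied.
# /tmp/dest is a hypothetical destination directory that need not exist.
preview=$(find "$demo" -type f -print0 | xargs -0 echo cp -t /tmp/dest)
printf '%s\n' "$preview"

rm -r "$demo"
```

The destination appears first, right after -t, and the source pathnames are stacked at the end of the line where xargs puts them.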
As described in the previous section, modern versions of find have a modified -exec statement ending in + instead of ; that can pack multiple file names into the same command execution, reducing the number of times the command has to be executed by increasing the number of pathnames passed to each execution.
$(command)
The shells have a command substitution feature that lets you take the standard output of any command and insert it into a command line. (See the heading Command Substitution in man bash, and also previous class notes such as CST8207 Command Substitution or CST8129 Command Substitution.)
You might think of using this handy feature to take the standard output of find (a list of pathnames) and insert it into a cp command line.
This command substitution might work, but it has serious limitations:
In other words, command substitution only works sometimes, where the other two solutions presented earlier work every time (provided you use -print0 in your find command!).
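The blank problem is easy to demonstrate. In this sketch a throwaway directory holds a single made-up file whose name contains a blank, and the shell's positional parameters are used to count how many words the command substitution produces.

```shell
# Throwaway tree with exactly one file, whose name contains a blank.
demo=$(mktemp -d)
touch "$demo/has blank.txt"

# The shell splits the unquoted substitution on blanks, so the single
# pathname is broken into two separate words.
set -- $(find "$demo" -type f)
split_count=$#
echo "one file arrived as $split_count arguments"

rm -r "$demo"
```

A command given these two words would look for two files that do not exist.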
Since sysadmins want solutions that always work and won’t mysteriously start failing in the future, avoid using command substitution to naïvely generate pathnames needed by other commands if those pathnames might ever contain blanks or other shell meta-characters, or if the list of pathnames might be very large. The embedded blanks and shell meta-characters in the pathnames, or the sheer number of pathnames, will some day cause errors if you rely on command substitution.
(With correct use of shell options to turn off file GLOBbing and suppress the splitting of words on blanks, you can almost safely write a shell script that does use command substitution and pathnames, but it isn’t pretty, doesn’t work for file names with newlines in them, and the options used are unsuitable for interactive shell use. It can still stop working if the list of pathnames is longer than is allowed on a command line. Don’t do it!)