% Searching for and finding files by name, size, userid, modify time, etc.
% Ian! D. Allen -- [www.idallen.com]
% Winter 2016 - January to April 2016 - Updated 2018-10-10 00:34 EDT

- [Course Home Page]
- [Course Outline]
- [All Weeks]
- [Plain Text]

Searching for and finding files
===============================

> | How can we look for a file name?
> | What if we don't know which directory that file is located in?
> | Can we start a `fgrep` at the root and ask the `fgrep` command to look in
>   all subdirectories?

People confuse `fgrep`, which looks for text inside files (and doesn't look
at the file names), with `find`, which finds files by name (and doesn't look
inside the files).

The `fgrep` (and `grep` and `egrep`) commands look for patterns or text
*inside* files whose names you already know. They aren't directly useful for
finding or generating the *names* of files. (To use `fgrep` to find the full
name of a pathname, you would first have to use some other command to
generate a list of *all* pathnames and feed that to `fgrep` in a pipeline as
standard input; see the example below.)

You can tell `fgrep` to search the contents of an entire directory tree of
files by turning on the `fgrep` "recursive" option, but that won't help you
find the *name* of a pathname in that directory. The `fgrep` command searches
content *inside* files; it doesn't find the *names* of files.

To find pathnames by name you can use the `find` command. If the pathname has
existed for some time and is saved in the right database, you might be able
to use the faster `locate` or `slocate` commands to search by name.

To find pathnames by anything other than name, e.g. size, or owner, or modify
date (etc.), use the `find` command with the right expression. See below.

Five common ways to use the `find` command
==========================================

The `Usage` line for `find` given below is abbreviated from `man find`:

    Usage:  find [options...] [startdir...] [expression]

The `[startdir...]` is an optional list of *starting directories* in which
`find` will do the search, instead of using the current directory. You rarely
need to use any options, so the first thing following the `find` command name
is usually the one directory or list of directories in which to look. The
current directory is the default.

The `find` command has its own huge set of expressions for finding pathnames
precisely and efficiently. The expressions follow the *starting directories*
on the command line.

Below are five important uses of `find`, each explained in detail. You can
find:

1.  all pathnames under a list of *starting directories*
2.  only pathnames containing a particular `basename` pattern
3.  only pathnames owned by a particular `userid`
4.  only pathnames modified within some number of days
5.  only pathnames with a size greater than some number

The matching command lines, in the same order:

    1.  find [startdir...] -print
    2.  find [startdir...] -name 'basename' -print
    3.  find [startdir...] -user 'userid' -print
    4.  find [startdir...] -mtime -30 -print
    5.  find [startdir...] -size +100M -print

The optional *startdir...* list in which to search comes first, followed by
the optional *expression* that says what to find. If you don't specify any
*startdir*, `find` uses `.` (the current directory).

The optional *expression* must *follow* the *starting directories*. The
*expression* limits the pathnames that are found or changes the output
format. It consists of keywords, each preceded by a dash and usually followed
by some argument, e.g. `-name 'basename'`, `-size +100M`, `-print`, or `-ls`.

1.  Without any *expression*, `find` finds *all* pathnames. With modern
    versions of `find`, you can omit the `-print` keyword; it's the default
    behaviour.
2.  The `-name` expression allows you to give a matching pattern for the
    *basename*, found in any directory, starting from each of the
    *starting directories*.
    The *basename* patterns can include shell-GLOB-style path metacharacters
    such as `*` and `?`, and the patterns must be quoted to protect them from
    GLOB expansion by the shell, e.g. `find -name '*.txt'`
3.  The `-user` expression allows you to give a userid that must be the owner
    of the pathnames found, e.g. `find -user 'root'`
4.  The `-mtime` expression allows you to give a modify time expression.
    Using `+10` means "older than 10 days" and `-5` means "younger than 5
    days", etc., e.g. `find -mtime +365`
5.  The `-size` expression allows you to give a size number, to match
    pathnames based on their rounded-up size. The size can have various size
    multipliers such as `M` for "MegaByte", and the actual size of the file
    is rounded up to that multiplier before comparing. A leading minus on the
    number means "less than" and a leading plus on the number means "greater
    than" the given rounded size. The expression `-size -100k` means
    "rounded-up size less than 100 kilobytes".

    Using `-size 0` is a useful expression, to find pathnames that are empty
    (zero size), and `-size +0` finds pathnames that are *not* empty (size
    greater than zero).

    Note that the size rounding means that `-size -1M` only matches zero-size
    files (because even a one-byte file rounds up to 1M and doesn't match)!
    If you really want to see all files smaller than 1M, you have to avoid
    the rounding and use 1024x1024 characters: `-1048576c`

See the man page for more help, and search the net for examples. For example,
the `-type f` and `-type d` expressions are useful for finding only file
names or only directory names:

    $ find . -type f
    $ find . -type d

The `find` command has expressions that can find pathnames based on any
combination of any of the attributes you see in the output of `ls -dils`.

Using multiple expressions
==========================

You can use multiple expressions, and the pathnames found must meet *all* the
conditions of the expressions used, e.g.

    $ find /bin /etc/ -name '*word' -user 'root' -size +1k

With more syntax, you can also have `find` show pathnames that match one
expression *or* another expression, or any Boolean combination of
expressions. See the man page.

Showing detailed output using `-ls`
===================================

The `find` command can output detailed attribute information about the
pathnames it displays using the `-ls` expression instead of using the default
`-print` expression:

    $ find . -ls

The detailed output is similar to the output you would get if you typed
`ls -dils` for the displayed names:

    $ find /etc/passwd -ls
    2101779 4 -rw-r--r-- 1 root root 2879 Oct 4 10:59 /etc/passwd

    $ ls -dils /etc/passwd
    2101779 4 -rw-r--r-- 1 root root 2879 Oct 4 10:59 /etc/passwd

This is the expression to use if you want to display attribute information
about the pathnames as well as the names.

Examples of uses of `find`, including World-Writable
====================================================

You can try these examples. Some will produce error messages as well as
pathnames, since you don't have permission to search all the system
directories. Ignore the errors (or redirect standard error to `/dev/null`);
look at the results:

    $ find /bin -name '*sh'
    /bin/bash
    /bin/dash
    /bin/static-sh
    /bin/sh
    /bin/rbash

    $ find /bin -type f -size +500k
    /bin/bash
    /bin/busybox

    $ find /tmp -maxdepth 1 -user root -type d
    /tmp
    /tmp/.X11-unix
    /tmp/.ICE-unix
    /tmp/ssh-mZgPJ11302

In all the examples below, in modern versions of `find`, you can leave off
the default `-print` action:

-   `find . -print` or `find .` or simply `find`
    -   list pathnames under the current directory (the default starting
        directory for `find`)
-   `find /etc -name 'passwd' -print`
    -   pathnames under directory `/etc` with basename `passwd`
-   `find /etc /lib -name '*.conf' -print`
    -   pathnames under directories `/etc` or `/lib` ending in `.conf`
-   `find /bin -name '?ash' -print`
    -   pathnames under directory `/bin` with four-character basenames ending
        in `ash`
-   `find /var/mail -user root -print`
    -   pathnames under directory `/var/mail` owned by the `root` user
-   `find "$HOME" -mtime -30 -print`
    -   pathnames in your HOME directory modified within last 30 days
        ("*less than* 30")
-   `find . -type f -print`
    -   show only files under the current directory, not directories or other
        things
-   `find . -type d -print`
    -   show only directories under the current directory, not files or other
        things
-   `find . -size -100M -print`
    -   pathnames with size *less than* 100 MB
-   `find . -size +1k -print`
    -   pathnames with size *greater than* 1 KB
-   `find . ! -type l -perm /o+w -ls`
    -   world-writable (other-writable) pathnames (the `! -type l` means
        "not symbolic links")
    -   put this in your notebook; you will need it later in the course
-   `find /bin /etc/ -name '*word' -user 'root' -size +1k -ls`
    -   you can combine multiple `find` expressions and only pathnames that
        meet *all* the criteria will be listed

> If you use a pattern in the `-name` expression, remember to quote the
> pattern to protect any GLOB pattern characters from expansion by the shell!
> The characters need to be quoted to be GLOB-expanded by `find`, not by the
> shell.

Using `fgrep` on the output of `find`
=====================================

While `find` has powerful pattern matching expressions, some people prefer to
pipe the pathname output of `find` into one of the `grep` family of text
searching programs because they are more familiar with `grep` regular
expression pattern matching.
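
As a rough illustration of these two approaches, the sketch below builds a
small scratch directory and selects the same pathnames twice: once with a
quoted `-name` GLOB pattern, and once by filtering the full `find` listing
with `grep`. The directory and file names (`demo`, `sub`, `bash`, `dash`,
`rbash`, `sh.distrib`) are made up for this example, and `mktemp -d` is
assumed to be available, as it is on most GNU/Linux systems:

```shell
# Hypothetical scratch tree; every name below is made up for this sketch.
scratch=$(mktemp -d)
top="$scratch/demo"
mkdir -p "$top/sub"
touch "$top/bash" "$top/dash" "$top/sub/rbash" "$top/sh.distrib"

# Approach 1: find matches each basename against a quoted GLOB pattern.
by_name=$(find "$top" -name '*sh' | sort)

# Approach 2: find lists every pathname and grep keeps lines ending in "sh".
by_grep=$(find "$top" | grep 'sh$' | sort)

printf 'find -name:\n%s\n' "$by_name"
printf 'find | grep:\n%s\n' "$by_grep"

# In this tree both approaches should select the same three pathnames;
# sh.distrib contains "sh" but does not end in it, so neither selects it.
if [ "$by_name" = "$by_grep" ]; then
    echo "same result"
fi

# Clean up the scratch directory.
rm -r "$scratch"
```
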
You can generate a list of all pathnames under a given directory using the
`find` command and then use `grep` on that piped output, e.g.

    $ find
    ... all pathnames under the current directory list here ...

    $ find "$HOME"
    ... all pathnames under your HOME directory list here ...

    $ find /bin | wc -l           # count pathnames under /bin
    107

    $ find /bin | fgrep 'sh'      # only pathnames containing 'sh'
    /bin/bash
    /bin/dash
    /bin/static-sh
    /bin/sh
    /bin/rbash
    /bin/sh.distrib

    $ find /bin | grep 'sh$'      # only pathnames ending in 'sh'
    /bin/bash
    /bin/dash
    /bin/static-sh
    /bin/sh
    /bin/rbash

> The `grep` program uses a pattern matching language similar to GLOB
> patterns but more powerful called **Regular Expressions**. Don't use `grep`
> to look for text until you become familiar with this pattern matching
> language; use the safer `fgrep` command instead.

NOT running `find` on the ROOT directory
----------------------------------------

Yes, you can do `find /` (find, starting at the top-most ROOT directory) and
it will generate a list of all the pathnames on the *whole machine* that you
have permissions to see -- tens of thousands of them. This will take a long
time. (You will also see many error messages about permissions, since your
userid does not have permissions to look in every directory on the whole
system.)

Don't run `find /` on a shared computer unless you really have to. (But feel
free to try it on your own machine!)

Finding files using the `locate` or `slocate` commands
======================================================

Many Unix systems run a weekly or nightly `find /` late at night and save the
results in a small database. The `locate` or `slocate` commands can quickly
search that saved database for you much faster than `find`, even using a file
GLOB pattern, e.g.

    $ locate passwd | less
    ... see all the names containing the string "passwd" here ...

    $ locate '/etc/*passwd*'
    /etc/passwd
    /etc/passwd-
    /etc/cron.daily/passwd
    /etc/dovecot/conf.d/auth-passwdfile.conf.ext
    /etc/init/passwd.conf
    /etc/init.d/passwd
    /etc/pam.d/chpasswd
    /etc/pam.d/passwd
    /etc/security/opasswd

Note the use of quotes to stop the shell from interpreting the GLOB pattern.
(We want the `locate` command to process the GLOB pattern against the
pathnames in the database; we do not want the shell to process the GLOB
pattern against current pathnames in the file system before it calls the
`locate` command.)

If you are looking for a pathname that has been around for a while and is
entered into the database, the `locate` database lookup is much, much faster
than a huge ROOT `find /`.

If you are looking for a new pathname that isn't in the locate database yet,
only `find` will find it for you.

--
| Ian! D. Allen, BA, MMath - idallen@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Free/Libre GNU+Linux) at: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/

[Plain Text] - plain text version of this page in [Pandoc Markdown] format

[www.idallen.com]: http://www.idallen.com/
[Course Home Page]: ..
[Course Outline]: course_outline.pdf
[All Weeks]: indexcgi.cgi
[Plain Text]: 180_finding_files.txt
[Pandoc Markdown]: http://johnmacfarlane.net/pandoc/