Winter 2017 - January to April 2017 - Updated 2019-03-10 16:44 EDT
Your shell has a pathname-matching (wildcard) feature that makes operating on large numbers of pathnames easy:
$ cp a* .. # copy all names starting with 'a' to the parent directory
The Unix name for wildcard pattern matching is GLOBbing, from the idea that the pattern matches a “global” list of names. Other operating systems may call these wildcard characters.
GLOB patterns do not only match file names. The names being matched by a GLOB pattern can be anything: files, directories, symbolic links, etc. Sometimes we say that GLOB patterns match “file names”, but what we really mean is that they match any kind of name.
The shell will try to expand any tokens on the command line that contain unquoted GLOB characters into existing pathnames in the file system. The shell will try to match the GLOB patterns in those tokens against existing pathnames in the file system to produce a list of existing pathnames.
GLOB patterns cannot generate any names that do not exist. The GLOB patterns must always match existing names.
The letter a and the digit 1 have no special meaning to the shell; they are not shell metacharacters. The asterisk (or star) * is a metacharacter, and so are semicolons, blanks/spaces and many other symbols. You can usually turn a metacharacter into an ordinary non-special character by quoting it. (See below for “quoting”.)
The unquoted string one two contains two (blank-separated) tokens. The unquoted string one;two contains three tokens, because the shell treats the metacharacter semicolon ; specially and splits the string into three pieces: the word one, the metacharacter ;, and the word two. The four-character string cd.. contains only one token. (The shell does not split the cd away from the .. part; it is all treated as one word because neither letters nor periods are metacharacters to the shell.)
Quoting means turning off the special meaning of metacharacters by surrounding them with single or double quotes, or by putting a backslash in front of them:
$ echo 'This is a semicolon; the quotes hide it from the shell.'
$ echo "This is a semicolon; the quotes hide it from the shell."
$ echo This is a semicolon\; the backslash hides it from the shell.
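The three quoting styles above can be compared directly. This is a minimal sketch, assuming bash; the variable names are made up for illustration. Each command substitution captures what echo prints, so we can confirm that all three styles hide the semicolon in the same way.

```shell
# Capture the output of each quoting style (variable names are illustrative).
single=$(echo 'one;two')
double=$(echo "one;two")
backslash=$(echo one\;two)

# All three print the same text because the semicolon was hidden each time.
echo "single quotes: $single"
echo "double quotes: $double"
echo "backslash:     $backslash"
```

Without any quoting, the shell would split one;two at the semicolon and try to run two as a separate command.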
Names that start with a period (.) are not shown by default by some commands (e.g. ls), and leading periods are never matched by GLOB metacharacters, not even by GLOB lists that contain a period such as [.]. Names starting with periods are called “hidden” names, e.g. hidden files and hidden directories.
These are the GLOB metacharacters recognized and processed by the shells:
* - (asterisk, star) matches zero or more of any characters
? - (question mark) matches any one (1) character
[list] - (square brackets)
- Matches any one (1) character in the list of characters, e.g.
[abc] matches one a or one b or one c (only one of the three).
- WARNING: The list can contain a range of characters, but don't
use ranges until you read more on Internationalization!
- The list can be inverted/complemented by using ! at the start,
e.g. [!abc] means "any one character not a or b or c".
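A short sketch of each metacharacter in action, assuming bash and a throwaway directory created with mktemp. The file names and variable names are invented for this demonstration only.

```shell
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a b c ab abc xyz

star=$(echo a*)            # 'a' followed by zero or more characters
question=$(echo ??)        # names exactly two characters long
list=$(echo [abc])         # one character: a or b or c
inverted=$(echo [!abc]*)   # first character is not a, b or c

echo "a*      matches: $star"
echo "??      matches: $question"
echo "[abc]   matches: $list"
echo "[!abc]* matches: $inverted"

cd / && rm -rf "$scratch"
```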
The shell always processes GLOB characters that it finds on the command line, even for commands that do not take pathnames. (The shell doesn’t know which commands do or do not take pathnames.) For example:
$ mail *
will call the mail program and give it all the non-hidden names in the current directory, which makes no sense because mail programs want email addresses, not pathnames. The shell can’t know which programs want pathnames and which ones don’t, so it always expands GLOB patterns!
GLOB metacharacters never match the leading period on any hidden names, so you don’t have to worry about matching the names . or .. accidentally.
* to match any number of any characters
As a GLOB metacharacter, the asterisk * matches zero or more of any character in a name, including spaces or other strange characters. The * never matches the leading period on a hidden name, so echo * never shows any names starting with a period.
The GLOB pattern *foo matches non-hidden names ending with foo, including foo, xxxfoo and 123foo.
The GLOB pattern foo* matches names beginning with foo, including foo, fooxxx and foo123.
The GLOB pattern *foo* matches non-hidden names containing foo anywhere, including foo, fooxxx, 123foo and ZZZfoo@@@.
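The asterisk rules can be tried out safely in a scratch directory. This is a sketch, assuming bash and mktemp; the name .hiddenfoo is a made-up hidden file used to show that *foo skips it even though it ends in foo. A pattern that itself begins with a period does match hidden names.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch foo xxxfoo foo123 .hiddenfoo

ending=$(echo *foo)       # non-hidden names ending with foo
beginning=$(echo foo*)    # names beginning with foo
explicit=$(echo .h*foo)   # the pattern supplies the leading period itself

echo "*foo    matches: $ending"
echo "foo*    matches: $beginning"
echo ".h*foo  matches: $explicit"

cd / && rm -rf "$scratch"
```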
? to match only one single character, any character
As a GLOB metacharacter, the question mark ? matches exactly one of any character in a name, including a space or other strange character. The ? never matches the leading period on a hidden name, so echo ? never shows the current directory name . that is a single period.
The GLOB pattern ??? matches non-hidden names that are exactly three characters long.
The GLOB pattern ???* matches non-hidden names that are three or more characters long.
[list] to match single characters from a list
As a GLOB metacharacter pair, the square brackets [list] match exactly one character in a name from a list of characters. The list of characters can never match the leading period on a hidden name, so echo [.] never shows the current directory name . that is a single period.
The GLOB pattern [abc] does not match the three-character name abc; it matches only the one-character names a, or b, or c:
$ touch a b c abc
$ echo [abc]
a b c
No matter how many characters are in the list, a [list] pattern will only match exactly one of the characters in the list, not more than one, not less.
The GLOB patterns [aA] and [a][A] are very different:
- [aA] is one list, so it matches only one-character names. It matches any one character from the list [aA], so it matches any one-character name that is either a or A.
- [a][A] is made up of two lists, so it matches only two-character names. It matches only the two-character name aA that is made up of a (taken from the first list) followed by A (taken from the second list). It only matches aA.
Having a GLOB square bracket list with only one character in it, e.g. [a], is not usually useful, since it matches exactly one character a, so rather than write *[a]bc use the equivalent and much simpler *abc that matches exactly the same names.
Aside: There are shell options that affect how GLOB patterns are evaluated, and some of these shell options (e.g. nullglob) may make single-character lists useful in special cases. Don’t worry about that until you are more advanced.
No matter how many characters are in the list, a []
pattern will only match exactly one of the characters, not more than one, not less.
[...] lists
You can use a dash - to indicate a range of digits inside a [] list, e.g. [1-5], but don’t use ranges of letters, e.g. [a-c], unless you fully understand the effects of your machine’s internationalization locale setting.
On many modern Linux machines with a modern locale setting (e.g. en_US.utf8), the trivial character range [a-c] can actually match the five characters a A b B c, depending on the shell version and its options. Don’t use character ranges!
For a fuller explanation see http://teaching.idallen.com/cst8177/15w/notes/000_character_sets.html
[] to match case insensitive
Normally, GLOB patterns are case-sensitive so that abc* does not match any of the names ABC, aBc, Abc, etc. If you want to match both upper- and lower-case letters in names, make each letter into its own little two-character [] list so that the list matches either upper- or lower-case:
$ touch abc aBc aBC ABc ABC Abc
$ echo abc*
abc
$ echo [aA]bc
Abc abc
$ echo [aA][bB]c
ABc Abc aBc abc
$ echo [aA][bB][cC]
ABC ABc Abc aBC aBc abc
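The same two-character-list trick is often applied to file extensions. The sketch below assumes bash (it uses bash arrays to count the matches rather than rely on sort order) and uses invented file names.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch holiday.jpg PARTY.JPG scan.Jpg notes.txt

exact=(*.jpg)               # case-sensitive: lower-case extension only
anycase=(*.[jJ][pP][gG])    # one two-character list per letter

echo "*.jpg matches ${#exact[@]} name(s)"
echo "*.[jJ][pP][gG] matches ${#anycase[@]} name(s)"

cd / && rm -rf "$scratch"
```

Counting with an array avoids depending on the order in which the locale sorts upper- and lower-case names.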
If you are matching any letters, not specific ones, you can use POSIX character classes such as [:upper:], [:lower:] or [:alpha:] inside the [] list:
$ touch a b c A B C abc ABC 1 2 3 123 1a 2b 3c x1 y2 z3
$ echo [[:lower:]]
a b c
$ echo [[:upper:]]
A B C
$ echo [[:alpha:]]
A B C a b c
$ echo [[:alpha:]]*
A ABC B C a abc b c x1 y2 z3
$ echo *[[:alpha:]]
1a 2b 3c A ABC B C a abc b c
No matter how many characters are in the list, a []
pattern will only match exactly one of the characters, not more than one, not less.
Until you are sure you know how the shell uses GLOB patterns to match names, use the echo or ls command to see what names are being matched (if any). Before you do any of these commands using a GLOB pattern:
$ rm [abc]* # DON'T DO THIS until you verify the GLOB pattern!
$ touch [abc]* # DON'T DO THIS until you verify the GLOB pattern!
$ cp [abc]* ~/savedir/ # DON'T DO THIS until you verify the GLOB pattern!
try one of these harmless display commands to verify that the GLOB pattern matches the correct names:
$ echo [abc]* # this verifies that the GLOB pattern works
$ ls -d [abc]* # this verifies that the GLOB pattern works
(The -d option makes ls show only the names of any directories, not their contents as it usually does.)
If either of these harmless commands displays on your screen the correct list of matching names, then you can replace it with the rm command (or whatever command you really wanted to use).
If there are too many pathnames to view comfortably on your screen, you can use other commands to limit the display or count the names:
$ ls -d [aA]* | less # view all the pathnames in a pager program
$ ls -d [aA]* | head # look at first ten pathnames
$ ls -d [aA]* | tail # look at last ten pathnames
$ ls -d [aA]* | wc # count the pathnames; is the count right?
$ cp [aA]* ~/savedir/ # if GLOB is correct, use the actual command
Make sure that the GLOB pattern is correct before you use it.
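In a script, the same "look before you act" habit can be sketched by expanding the pattern once into an array, displaying it, and only then running the real command. This is an illustrative sketch, assuming bash; the directory and file names are hypothetical.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch apple Avocado banana
mkdir savedir

# Expand the GLOB pattern once and keep the resulting names.
matches=([aA]*)
count=${#matches[@]}
printf 'would copy: %s\n' "${matches[@]}"

# An unmatched pattern is left as the literal text [aA]*, which does not
# exist as a name, so test the first entry before doing the real work.
if [ -e "${matches[0]}" ]; then
    cp -- "${matches[@]}" savedir/
fi
copied=$(ls savedir | wc -l)
echo "matched $count name(s); savedir now holds $copied"
```

Because the array is expanded only once, the names that were displayed are exactly the names that get copied.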
Unlike DOS/Windows wildcarding, the GLOB features are done by the shell, not by the individual programs called by the shell.
The shell will do this GLOB expansion irrespective of the name of the command being used, even if the command being used does not accept or process pathnames. The shell does not know which commands expect pathnames; the shell always expands tokens that have GLOB patterns in them, no matter what the command is. Typing:
$ wc -l *
counts the lines in all the non-hidden names in the current directory, and using pathnames with wc is useful. Typing the following commands (that will get the same list of GLOB-expanded pathnames as wc, but that do not operate on pathnames) is probably not going to work well:
$ echo * # first verify what the GLOB pattern * matches
$ wc * # yes, wc expects pathnames and this works - good
$ kill * # probably wrong - kill expects process numbers!
$ mail * # probably wrong - mail expects mail addresses!
In all the command lines above, the shell GLOB expansion delivers the same pathnames to the command being typed. Two of the commands will likely not work, because these commands don’t take pathnames as arguments.
If you do not want GLOB processing to happen, hide the GLOB characters from the shell by using quoting – surround the token with single or double quotes or precede each GLOB metacharacter with a backslash:
$ echo *
a b c
$ echo "*"
*
$ echo '*'
*
$ echo \*
*
The first argument to the grep command is a regular expression pattern and is almost never intended to be expanded by the shell as a GLOB pattern, so you must quote the first argument to stop GLOB expansion:
$ grep '?' /etc/passwd
$ grep '.*sh' /etc/passwd
If you don’t want the shell to use GLOB on your command arguments, you must quote the arguments to commands so that the shell doesn’t see the metacharacters. Here are several examples using quoting to hide GLOB characters:
$ echo "*** Warning: assuming the worst ***"
$ mail -s "Coming to dinner?" idallen@idallen.ca
$ tr '[:lower:]' '[:upper:]' <lowercase >uppercase
$ find /usr/bin -name '*go*' -ls
The shell will always try to expand unquoted GLOB characters.
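One way to see exactly which arguments a command receives is to have printf wrap each argument in angle brackets. This is a small sketch, assuming bash and a scratch directory; the file names are made up.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch a b c

# printf repeats its format once for every argument it is given.
unquoted=$(printf '<%s>' *)     # the shell expands * before printf runs
quoted=$(printf '<%s>' '*')     # the quotes hide * from the shell

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd / && rm -rf "$scratch"
```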
By default, GLOB patterns will match any existing pathname component – the pathname might be a file, a directory, or some other Unix pathname type (e.g. symbolic link, socket, fifo):
$ mkdir a ; touch b
$ echo *
a b
Be careful that the GLOB pattern expands to be pathnames compatible with the command you are using!
$ mkdir a ; touch b
$ rm *
rm: cannot remove 'a': Is a directory
$ mkdir a ; touch b
$ rmdir *
rmdir: failed to remove 'b': Not a directory
GLOB patterns match any names.
Each separate token found by the shell in the command line is examined for GLOB characters, and has those characters expanded separately. A single token containing GLOB patterns never produces duplicate names; every name appears exactly once. The only way for the same pathname to appear twice is for the shell to find two separate tokens containing GLOB characters and expand them separately.
Some of the command lines below produce duplicate or triplicate output, since each blank-separated token is separately GLOB-expanded by the shell:
$ touch a b c
$ echo *
a b c
$ echo * *
a b c a b c
$ echo * * *
a b c a b c a b c
The command lines below produce identical output, since only one token is found and expanded by the shell, and * means the same thing as ** or *** when there are no spaces between the metacharacters:
$ touch a b c
$ echo *
a b c
$ echo **
a b c
$ echo ***
a b c
If a GLOB pattern expansion matches and produces a name with a space or other shell metacharacter in it, that space or shell metacharacter is not seen or processed a second time by the shell. The space or other meta-character in the name is not re-interpreted by the shell and it does not produce extra command line arguments. Whatever the GLOB pattern expands to is kept together as a single pathname:
$ touch 'a b *' # create a five-character name with space and asterisk
$ echo *
a b * # the GLOB * matches the name 'a b *'
$ wc *
0 0 0 a b * # the expanded pathname is still one name
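The claim that the expanded name stays together can be checked by counting arguments. This is a sketch, assuming bash (it uses a bash array) and a scratch directory.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch 'a b *'

# Expand the GLOB into an array: one array element per matched pathname.
names=(*)
argcount=${#names[@]}
printf 'argument: <%s>\n' "${names[@]}"
echo "number of arguments: $argcount"

cd / && rm -rf "$scratch"
```

The single name containing two spaces and an asterisk arrives as one element; it is not split into three words and the asterisk inside it is not expanded again.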
In Bourne-style shells (e.g. BASH on Linux), if the shell cannot match a GLOB pattern on the command line against any existing pathname, the GLOB pattern is silently left unchanged by the shell and passed unchanged to the command being used.
No error message is generated by the shell by the failed GLOB; the command runs with the unchanged argument still containing the unmatched GLOB metacharacters. This is almost never what you want. For example:
$ touch someverylongfilename.txt
$ ls
someverylongfilename.txt
$ cp /etc/passwd sme* # typing error; should be some* !
$ ls
sme* someverylongfilename.txt # silently created a new file name
If GLOB matching fails, because no names match the pattern, the GLOB pattern is passed unchanged to the command. Usually the unexpanded GLOB pattern isn’t what you want. The command may produce an error message for the nonexistent pathname passed to it by the shell:
$ ls
happybirthday.txt
$ rm hp* # typing error; should be ha* !
rm: cannot remove 'hp*': No such file or directory
$ rm ha* # this GLOB pattern matches
Another example of a GLOB pattern failing and being passed unchanged to the command (not useful):
$ ls
abc
$ wc abb* # typing error; should be ab*
wc: abb*: No such file or directory
$ wc ab* # this GLOB pattern matches
0 0 0 abc
C-Shells do produce error messages when GLOB patterns fail, and refuse to run the command. You can optionally make BASH behave this way, too, by setting the BASH failglob option using the shopt built-in command (highly recommended!). That option will make it an error to use a GLOB pattern that doesn’t match anything.
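A sketch of the failglob behaviour, assuming bash is installed and on the PATH. The mistyped cp command from the earlier example is run in a child bash with failglob turned on, so the option does not affect the calling shell. The variable names are illustrative.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
touch someverylongfilename.txt

# With failglob set, the unmatched pattern sme* is an error and cp never runs.
bash -c 'shopt -s failglob; cp someverylongfilename.txt sme*' 2>/dev/null
status=$?

if [ -e 'sme*' ]; then created=yes; else created=no; fi
echo "exit status: $status; stray file created: $created"
```

To enable the option in an interactive shell, the command is shopt -s failglob; shopt -u failglob turns it off again.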
GLOB patterns are matched and expanded by the shell before the command is run.
This command below with two GLOB characters creates two files in an empty directory because neither GLOB character matches any files at the time the shell goes looking for them:
$ mkdir empty ; cd empty
$ touch ? * # GLOB doesn't match; creates two new files
$ ls
* ?
The apparently similar set of two touch commands below creates only one file because the second touch command GLOB pattern matches the file created by the first touch command:
$ mkdir empty ; cd empty
$ touch ? # GLOB doesn't match anything; creates new file
$ touch * # GLOB matches the ?; no new file created
$ ls
?
GLOB patterns are expanded on the command line by the shell before the command is run.
GLOB patterns apply in the directory indicated by the place in the pathname where they appear. For example (try these yourself):
$ echo * # match names in the current directory
$ echo ./* # match names in the current directory
$ echo ../* # match names in the parent directory
$ echo /* # match names in the ROOT directory
$ echo /usr/* # match names in the /usr/ directory
$ echo /usr/bin/* # match names in the /usr/bin directory
A GLOB pattern that appears to the left of any slash must match one or more directory names (or symbolic links to directory names), because only directory names can appear to the left of slashes in valid pathnames, and GLOB patterns only match existing, valid pathnames. In the examples below, the GLOB pattern must match …
$ echo */ls # directories in the current directory that contain "ls"
$ echo ../*/ls # directories in the parent directory that contain "ls"
$ echo /*/ls # directories in the ROOT directory that contain "ls"
$ echo /usr/*/ls # directories in the /usr directory that contain "ls"
GLOB characters can match any character in a pathname component, including spaces, newlines, and unprintable characters, but they do not match or cross the slashes that separate pathname components. Ordinary shell GLOB patterns do not match slashes in a pathname. (The find command can cross slashes, as can GLOB patterns using the bash shell option globstar.)
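The difference between an ordinary GLOB and the globstar option can be sketched as follows. This assumes bash version 4.0 or later (older versions do not have globstar); the directory tree and file names are invented for the demonstration.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
mkdir -p top/mid/low
touch one.txt top/two.txt top/mid/three.txt top/mid/low/four.txt

plain=(*/*.txt)        # ordinary GLOB: matches names with exactly one slash
shopt -s globstar      # turn on the bash globstar option
deep=(**/*.txt)        # ** followed by / may now span zero or more directories
shopt -u globstar      # turn the option off again

echo "*/*.txt  matched ${#plain[@]} name(s)"
echo "**/*.txt matched ${#deep[@]} name(s)"

cd / && rm -rf "$scratch"
```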
If a token containing GLOB patterns has two non-adjacent slashes, all the matched existing pathnames must also have exactly two slashes:
$ echo /*/ls
/bin/ls
$ echo /*/*x
/bin/ex /dev/psaux /dev/ptmx /etc/mgetty+sendfax /etc/postfix
/mnt/knoppix /sbin/fsck.minix /sbin/mkfs.minix /sbin/partx
$ echo /[bs]*/*x
/bin/ex /sbin/fsck.minix /sbin/mkfs.minix /sbin/partx
A GLOB pattern does not cross slashes in a pathname.
If the token containing the GLOB pattern has N non-adjacent slashes, all the matched existing pathnames will also have exactly the same number of slashes:
$ echo /* # matches names directly under the ROOT but no deeper
$ echo /bin/* # matches names directly under /bin/ but no deeper
$ echo /usr/bin/* # matches names directly under /usr/bin/ but no deeper
To match multiple levels of pathnames in a directory hierarchy, you must use a separate GLOB pattern between each slash:
$ ls /* # GLOB applies in / only (no deeper - one slash only)
$ ls /*/* # GLOB applies in / and then in /*/
$ ls /*/*/* # GLOB applies in / and then in /*/ and then in /*/*/
$ ls /usr/bin/* # GLOB applies only in /usr/bin/
$ ls /usr/*/* # GLOB applies in /usr/ and then in /usr/*/
The lists of pathnames produced by the GLOB patterns /* and /*/* have no pathnames in common. The first list contains only pathnames with one slash; the second list contains only pathnames with two slashes. The * GLOB pattern does not match the slash that separates pathname components.
$ echo /* | wc -w
26
$ echo /*/* | wc -w
2034
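The counts above depend on the machine, so here is the same idea in a small, predictable tree. This is a sketch, assuming bash and mktemp, with relative patterns and made-up names so that nothing outside the scratch directory is examined.

```shell
scratch=$(mktemp -d) && cd "$scratch" || exit 1
mkdir -p d1/sub d2
touch f1 d1/f2 d1/sub/f3 d2/f4

none=$(echo *)        # names with no slash
one=$(echo */*)       # names with exactly one slash
two=$(echo */*/*)     # names with exactly two slashes

echo "*     -> $none"
echo "*/*   -> $one"
echo "*/*/* -> $two"

cd / && rm -rf "$scratch"
```

No name appears in more than one of the three lists, because each pattern fixes the number of slashes.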
Basic GLOB patterns matching non-hidden names in the current directory:
$ touch a ab abc abcde abcdef
$ ls
a ab abc abcde abcdef
$ echo *
a ab abc abcde abcdef
$ echo **
a ab abc abcde abcdef
$ echo ****
a ab abc abcde abcdef
$ echo * *
a ab abc abcde abcdef a ab abc abcde abcdef
$ echo ?
a
$ echo ??
ab
$ echo ? ?
a a
$ echo ?? ??
ab ab
$ echo ???
abc
$ echo ???*
abc abcde abcdef
$ echo *???
abc abcde abcdef
$ echo ?*?
ab abc abcde abcdef
$ echo *[cf]
abc abcdef
$ echo *c*
abc abcde abcdef
These shell patterns match all non-hidden names in the /tmp/idallen directory:
$ pwd
/tmp/idallen
$ echo *
$ echo ./*
$ echo ../idallen/*
$ echo ../../tmp/idallen/*
$ echo /tmp/idallen/*
These patterns match all non-hidden names in the /tmp directory (parent):
$ pwd
/tmp/idallen
$ echo ../*
$ echo ./../*
$ echo .././*
$ echo /tmp/*
$ echo /tmp/./*
$ echo /././././tmp/./././*
These patterns match all non-hidden names in the ROOT directory (the parent of the parent directory of /tmp/idallen):
$ pwd
/tmp/idallen
$ echo ../../*
$ echo ../../../../../../*
$ echo /*
$ echo /tmp/../*
$ echo /tmp/idallen/../../*
Given an existing directory dir1, this pathname argument is valid:
$ ls dir1/.
...files list here...
Given file name file1, this similar pathname argument is not valid:
$ ls file1/.
ls: cannot access 'file1/.': Not a directory
A file name file1 cannot be used as if it were a directory. Only directory names can appear to the left of slashes in valid Unix pathnames.
If we replace the name dir1 in the token dir1/. with a GLOB pattern, as in */. , the result can only expand to be a valid pathname if the GLOB pattern matches a directory.
Question: What does echo */. output, and how does it differ from the echo * pattern? We know that echo * echoes every non-hidden name in the current directory, both files and directory names. How does using */. change this?
Hint: Wildcards (GLOB patterns) only expand to match valid names, so the wildcard */. only matches names where */. is a valid pathname. The pattern */. can only be valid if the GLOB pattern to the left of a slash matches a particular kind of name. What is it?