Regular Expression Skill Assessment
Here are some descriptions of text manipulation problems of varying levels of
difficulty. The skills needed to do these problems come from Appendix A in your Unix text.
These are all examples of text and data manipulation. Some problems may be
solved using Unix utilities that don't use regular expressions. Many of the problems
require more than one Unix utility, or the same utility used repeatedly.
To succeed in becoming a Web Programmer, you must be able to do all the Elementary
manipulations given here. You must be able to do most of the Basic manipulations.
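As a rough illustration of the points above (some problems need no regular expression at all, and some need the same utility applied more than once), here is a minimal sketch. The function names squeeze_sed, squeeze_tr and trim_blanks are made up for this example and are not part of the assessment; it handles blanks only, not tabs.

```shell
#!/bin/sh
# Hypothetical helper names, chosen only for this illustration.

# Squeeze runs of blanks to one blank using a regular expression.
squeeze_sed() { sed 's/  */ /g'; }

# The same job with tr, which uses no regular expression.
squeeze_tr() { tr -s ' '; }

# One utility, two expressions: strip leading blanks, then trailing blanks.
trim_blanks() { sed -e 's/^ *//' -e 's/ *$//'; }

printf 'a   b  c\n' | squeeze_sed      # prints: a b c
printf 'a   b  c\n' | squeeze_tr       # prints: a b c
printf '  hi there  \n' | trim_blanks  # prints: hi there
```

Which tool is preferable depends on the problem; tr is often the simpler choice for single-character work, while sed is needed once position in the line matters.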
- Change the letters "dog" to "HORSE" everywhere it occurs on all
- Change all occurrences of the letters "Man" at the beginning of a line to
- Change all occurrences of "stick" followed by any punctuation at the end of a
line to "Stick.". (The punctuation is replaced by a period.)
- Change all occurrences of "Dog" or "dog" to "COW".
- Change all Canadian or American spellings of colour (color) to "Color".
- Double all vowels in every word on every line.
- Triple the amount of space between every word.
- Find and print lines that contain "dog" followed by any number of digits then
- Find and print lines that contain the letters "dog" followed anywhere by the
- Change all occurrences of one or more digits to the single word "NUMBER".
- Replace all occurrences of one or more blanks with a single blank.
- Replace all occurrences of one or more tabs or blanks with a single blank.
- Remove the first 8 characters from every line.
- Remove all leading blanks or tabs from all lines.
- Remove all trailing blanks or tabs from all lines.
- Replace all tab characters with eight spaces.
- Change all punctuation so that the sentence period lies outside of the closing double
quote, e.g. "Hello there." becomes "Hello there".
- Remove everything leading up to and including the last blank on each line.
- Remove everything including and after the first blank on each line.
- Put double quotes around every occurrence of the phrase "user-friendly".
- Add an extra blank after every period at the end of a sentence.
- Make sure that every period at the end of a sentence is followed by exactly two blanks.
- Truncate every line to ten characters.
- Exchange the first 10 characters with the next 15 characters on every line.
- Exchange the first number with the second number on every line.
- Remove all leading zeroes from the first number on each line. Don't mishandle
single digit zeroes.
- Find and print lines that contain all the vowels in alphabetical order, a
before e before i before o before u.
Test using /usr/dict/words.
- Find and print lines that contain all the vowels in any order. Test using
- Change all occurrences of one or more digits surrounded by spaces to the word
"NUMBER" also surrounded by spaces.
- Change only the second occurrence of a single blank to a colon in each line.
- Change only the second-to-last single blank to a colon in each line.
- Change only the second occurrence of a string of one or more blanks to a colon in each
- Change only the second-to-last occurrence of a string of one or more blanks to a colon
in each line.
- Remove all occurrences of HTML tags whose open and closing angle brackets are on the
same line (e.g. <BR>, <TABLE>, <A HREF="...">, etc.).
Remove all of them, not just the first ones.
- Remove everything on every line that appears between double quotes, leaving only the
quotes. (Example: a "bcd" efg "h i" j --> a "" efg "" j) Handle empty strings
(adjacent quotes) correctly.
- Find lines that contain only one single quote character (an unmatched quote).
- Put double quotes around every occurrence of the phrase "user-friendly",
unless the phrase already has double quotes around it.
- Find all numbers prefixed by a dollar sign, remove the dollar sign, and suffix the
number with "CDN", e.g. $123.45 becomes 123.45CDN. Now do the reverse.
- Find all numbers with periods separating decimals and change the periods to commas, e.g.
123.45 becomes 123,45. Now do the reverse.
- Find all numbers with commas separating sets of three digits and change all the commas
to spaces, e.g. 1,234,567.23 becomes 1 234 567.23. (You may assume the only use of a
comma immediately followed by three digits is as a separator.)
- Locate common misspellings and mistypings of "@algonquincollege.com" and fix them
all. (e.g. fix algonqinc.ont.can, etc.)
- Find all occurrences of your name with or without initials and embedded spaces.
(e.g. "Ian D. Allen", "Ian Allen", "I. D. Allen",
"ID Allen", "IDAllen", "iallen", etc.) Try to minimize
false hits in the middle of words. (e.g. fallen, challenge, Wallenstein, etc.)
- Remove either single or double quotes from around all strings of one or more digits,
e.g. "10" or '10' become just 10. Now do the reverse (add quotes to all
- Locate hexadecimal numbers having the form "0xA0FF2375C3" and prefix them with
the string "(HEX:)", e.g. 0xDEAD would appear as (HEX:)0xDEAD and 0xBEAD00BEAD00
would appear as (HEX:)0xBEAD00BEAD00. Now do the reverse (remove the prefixes).
- Use a single regular expression to change every occurrence of the word
"dog" to be "dog-eat-dog" and "cat" to be
"cat-eat-cat". Now do the reverse.
- Produce a plain list of mail addresses and home pages for everyone with an account on
- Write a script that will perform a simple substitution on the contents of each of the
files given on the command line, e.g. $ ./script 's/dog/cat/g'
- Have every new sentence in a document start at the beginning of a line. (Insert
newline characters at the end of every sentence.)
- Find and print lines where all vowels are in strict alphabetical order, i.e. no e
precedes an a, no i precedes an e, no o
precedes an i, etc. All vowels that appear are in alphabetical
order in the input, from left to right. Test your expression on /usr/dict/words.
- Change the second and all subsequent occurrences of one or more blanks to single blanks.
(The first occurrence of a string of blanks is untouched.)
- A file has a large number of columns of numbers separated by blanks. Change every
second string of blanks to a colon. (A line of output might appear thus:
12 34:56 78:90 12) You don't know how many columns are in the input files.
- Exchange the first number with the last number on every line.
- Remove all leading zeroes from all numbers on each line. Don't mishandle
a single digit zero.
- Produce an HTML table of active links (the links are clickable) to mail addresses and
home pages for everyone with an account on this system. Include the full names of the
people with the accounts.
- Turn any text file into an approximation of Pig Latin. (For examples of Pig Latin,
see: Club Girl Tech's
Pig Latin Translator and Pig Latin Page or Pig Latin Converter)
[I don't know if regular expressions can do this with 100% accuracy; but, even 90%
will be amusing to read.] See also: C Language Pig Latin program
- Write a script that will rename files according to a sed substitute pattern
given as the first argument to the script, for example:
$ ./rename 's/txt$/dat/' *.txt
would take all the file names in the current directory that end in ".txt" and
rename them to end in ".dat". What pattern would you use to rename files
with names such as "file1.day.mon" to "file1.mon.day", e.g.
"foo.31.01" would become "foo.01.31" and "bar.30.12" would
become "bar.12.30", etc.?
- Write a script that will convert alphabetic dated file names to numeric names, e.g. a
file named "Mar.12.99" would be renamed "1999-03-12" and
"Jul.31.54" would become "1954-07-31". Make sure your script
doesn't overwrite any existing files. Does your script handle all the possible forms
of each month name, e.g. "Mar", "MAR", "mar",
"March", "MARCH", "march"? (Hint: You can use multiple
"-e" options to sed.) (p.s. Does your script handle the year
- Advanced script/regexp problem: Write a script that will generate and execute a sed
expression that will do a substitution on the Nth occurrence of a string. The N is given
as the first argument. The string and replacement are given as the next two arguments. The
script will process lines from standard input. For example:
$ echo aaaaaaaaaa | ./script 8 'a' 'b'
You can make some simplifying assumptions to make it easier:
  - the string and replacement won't contain slashes
  - the string and replacement won't contain any blanks or other shell metacharacters or
  - the string and replacement won't contain any regular expression characters, e.g. . [ ] *
Super advanced problem: Solve the same problem; but, remove the simplifying
restrictions. Removing all the restrictions is hard. You may have to extensively
pre-process the string and the replacement to protect embedded special characters.
Removing some of the restrictions (e.g. blanks) is not too hard; but, you may
need to know more advanced shell. Hint: See the "eval" built-in shell command.
("man sh", "man bash")
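For the simplified version of the Nth-occurrence problem (no slashes, blanks, shell metacharacters or regular expression characters in the arguments), one possible sketch is shown here. It leans on the numeric flag of sed's s command, which may or may not be the construction the exercise has in mind; the function name nth_subst and the variable names are assumptions made for this example.

```shell
#!/bin/sh
# nth_subst N STRING REPLACEMENT  (hypothetical name, for illustration only)
# Reads lines from standard input and changes the Nth occurrence of STRING
# on each line. Assumes the arguments contain no slashes, blanks, shell
# metacharacters or regular expression characters.
nth_subst() {
    n=$1
    string=$2
    replacement=$3
    # Generate the sed expression as text, then execute it.
    sed_expr="s/${string}/${replacement}/${n}"
    sed -e "$sed_expr"
}

echo aaaaaaaaaa | nth_subst 8 a b    # prints: aaaaaaabaa
```

Lifting the restrictions means escaping the delimiter and any special characters in the two strings before sed_expr is built, which this sketch does not attempt.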
sh$ x="'nested quoted string'"
sh$ ./argprint $x
sh$ eval ./argprint $x
[nested quoted string]
sh$ y="This is y"
sh$ echo $x
sh$ eval echo $x
This is y
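The effect of eval can be reproduced with a self-contained sketch. The argprint function below is a hypothetical stand-in for the ./argprint program used in the session (its real source is not shown here); it prints each argument on its own line inside square brackets. The variable z, holding the literal text $y, is also an assumption added so the second half runs on its own.

```shell
#!/bin/sh
# Hypothetical stand-in for ./argprint: one bracketed line per argument.
argprint() {
    printf '[%s]\n' "$@"
}

x="'nested quoted string'"
argprint $x        # word splitting only: three arguments, quotes stay literal
eval argprint $x   # eval re-parses the line: one argument, quotes removed

y="This is y"
z='$y'             # assumed variable holding the literal text $y
echo $z            # prints the literal text: $y
eval echo $z       # second parsing pass expands $y: This is y
```

The point for the scripting problems is that eval makes the shell parse its arguments a second time, so quotes and variable references stored inside a variable take effect.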
See the entries for Regular Expressions in the FastTrack Resources