Winter 2016 - January to April 2016 - Updated 2019-03-08 04:21 EST
gzip and gunzip
You can compress a file using the gzip command, and the result is a new binary compressed file with a .gz suffix added on the end:
$ cp -p /etc/passwd foo
$ gzip foo
$ ls -ls /etc/passwd foo.gz
96 -rw-r--r-- 1 root root 97450 Feb 10 13:08 /etc/passwd
28 -rw-r--r-- 1 idallen idallen 26884 Feb 10 13:08 foo.gz
$ file foo.gz
foo.gz: gzip compressed data, was "foo", from Unix, last modified: Wed Feb 10 13:08:27 2016
The original file is removed after being compressed. The modify time of the original file is preserved.
You can decompress/uncompress the file with gunzip, which restores the original file contents and removes the suffix from the name:
$ gunzip foo.gz # "gunzip foo" works too
$ ls -ls foo
96 -rw-r--r-- 1 idallen idallen 97450 Feb 10 13:08 foo
The compressed file is removed after being uncompressed. The modify time of the file is preserved.
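The round trip above can be tried safely in a scratch directory. This is only a sketch: the directory and the names sample.txt and keep.txt are made up for illustration, and the variables simply record what happened at each step.

```shell
# Scratch directory; all names below are illustrative only.
work=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 > "$work/sample.txt"
cp "$work/sample.txt" "$work/keep.txt"      # reference copy for the final comparison

gzip "$work/sample.txt"                     # sample.txt becomes sample.txt.gz
[ -e "$work/sample.txt" ] && orig_after_gzip=present || orig_after_gzip=removed

gunzip "$work/sample.txt.gz"                # sample.txt.gz becomes sample.txt again
[ -e "$work/sample.txt.gz" ] && gz_after_gunzip=present || gz_after_gunzip=removed

# cmp -s is silent and succeeds only if the two files are byte-identical.
cmp -s "$work/sample.txt" "$work/keep.txt" && roundtrip=same || roundtrip=different

echo "original after gzip: $orig_after_gzip"
echo "compressed file after gunzip: $gz_after_gunzip"
echo "round trip: $roundtrip"
rm -rf "$work"
```

If the commands behave as described in the text, the three lines report removed, removed, and same.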
The gunzip command will not uncompress a file by name unless the file name ends in the .gz suffix:
$ gzip </etc/passwd >foo
$ file foo
foo: gzip compressed data, last modified: Wed Mar 6 21:13:03 2019, from Unix
$ gunzip foo
gzip: foo: unknown suffix -- ignored
$ mv foo foo.gz
$ gunzip foo.gz
$ ls -l /etc/passwd foo
-rw-r--r-- 1 root root 168835 Mar 6 16:13 /etc/passwd
-rw-rw-r-- 1 idallen idallen 168835 Mar 8 04:03 foo
You can use either command as a filter (reading standard input and writing standard output) if you don’t give it a file name:
$ fgrep 'refused connect' /var/log/auth.log | gzip >bad.txt.gz
$ gunzip <bad.txt.gz | wc
$ gunzip <bad.txt.gz | less
When used as a filter (no file name), the commands cannot actually compress or decompress the original file and remove it because there is no file name. Filter commands simply compress or decompress the data in the input stream; the file is not changed.
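A minimal sketch of filter use, assuming a scratch directory and made-up file names (words.txt, words.compressed). It shows that the input file survives and that no .gz suffix is needed when the data arrives on standard input.

```shell
# Scratch directory; all names below are illustrative only.
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$work/words.txt"

# Compress as a filter: input comes from a redirect, output goes to any name we like.
gzip < "$work/words.txt" > "$work/words.compressed"

# The input file is still there, because gzip was never given its name.
[ -e "$work/words.txt" ] && input_state=kept || input_state=removed

# Decompress as a filter; the missing .gz suffix does not matter on standard input.
line_count=$(gunzip < "$work/words.compressed" | wc -l | tr -d ' ')

echo "input file: $input_state, lines recovered: $line_count"
rm -rf "$work"
```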
zless zfgrep zcat zdiff zgrep
Some helpful z-commands have been created to directly access compressed files and save typing gunzip in a pipe all the time:
$ gunzip <bad.txt.gz | less # hard way to paginate contents
$ zless bad.txt.gz # easy way
$ gunzip <bad.txt.gz | fgrep '.cn' # hard way to fgrep contents
$ zfgrep '.cn' bad.txt.gz # easy way
Since all of the z-commands are filters (they are small shell scripts), none of the z-commands affect the given file. The file is not decompressed and then removed. Only the file contents are decompressed and sent to standard output.
See also: zcat zdiff zgrep
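The other helpers work the same way. The sketch below assumes the helper scripts shipped with GNU gzip are installed, and the log lines in bad.txt are invented sample data. It uses zgrep -F, which searches for a fixed string as zfgrep does.

```shell
# Scratch directory; the file contents are invented sample data.
work=$(mktemp -d)
printf 'host1.example.cn refused\nhost2.example.org ok\nhost3.example.cn refused\n' > "$work/bad.txt"
gzip "$work/bad.txt"                        # leaves only bad.txt.gz

# zcat writes the decompressed contents to standard output.
first_line=$(zcat "$work/bad.txt.gz" | head -n 1)

# zgrep -F searches for a fixed string; -c counts the matching lines.
cn_matches=$(zgrep -F -c '.cn' "$work/bad.txt.gz")

# The compressed file is untouched and no decompressed copy appears.
remaining=$(ls "$work")

echo "first line: $first_line"
echo "lines containing .cn: $cn_matches"
echo "files left: $remaining"
rm -rf "$work"
```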
bzip2 and bunzip2
The commands bzip2 and bunzip2 are similar to gzip and gunzip but they use a different compression algorithm that often produces smaller files. The default file extension is .bz2 instead of .gz:
$ cp /etc/passwd foo
$ bzip2 foo
$ ls -ls /etc/passwd foo.bz2 foo.gz
96 -rw-r--r-- 1 root root 97450 Feb 10 13:08 /etc/passwd
24 -rw-r--r-- 1 idallen idallen 22235 Feb 10 13:08 foo.bz2
28 -rw-r--r-- 1 idallen idallen 26884 Feb 10 13:08 foo.gz
$ file foo.bz2
foo.bz2: bzip2 compressed data, block size = 900k
As with gzip, the original file is removed after being compressed, unless the command is used as a filter (without a file name). The modify time of the original file is preserved.
If you give bunzip2 a file name that does not end in .bz2, it decompresses the file into the same file name with .out appended:
$ bzip2 </etc/passwd >foo
$ file foo
foo: bzip2 compressed data, block size = 900k
$ bunzip2 foo
bunzip2: Can't guess original name for foo -- using foo.out
$ ls -l /etc/passwd foo.out
-rw-r--r-- 1 root root 168835 Mar 6 16:13 /etc/passwd
-rw-rw-r-- 1 idallen idallen 168835 Mar 8 04:05 foo.out
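To compare the two compressors on your own system, something like the following sketch can be used. The sample data is generated and highly repetitive, so the sizes it shows are not representative of real files, and the check at the top is there because bzip2 is not installed everywhere.

```shell
# bzip2 is not installed on every system, so check before using it.
if command -v bzip2 >/dev/null 2>&1; then
    work=$(mktemp -d)
    # Generated sample data; real files will compress differently.
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "user$i:x:$i:$i:Sample User:/home/user$i:/bin/sh"
        i=$((i + 1))
    done > "$work/sample"

    gzip  -c "$work/sample" > "$work/sample.gz"     # -c writes to standard output, keeping the original
    bzip2 -c "$work/sample" > "$work/sample.bz2"

    ls -l "$work"                                   # compare the three sizes

    # Decompress as a filter and confirm the contents survived unchanged.
    bunzip2 < "$work/sample.bz2" | cmp -s - "$work/sample" && bz_check=same || bz_check=different
    rm -rf "$work"
else
    bz_check=skipped
fi
echo "bzip2 round trip: $bz_check"
```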
bzless bzfgrep bzcat bzdiff bzgrep
Some helpful bz-commands have been created to directly access compressed files and save typing bunzip2 in a pipe all the time: bzcat bzdiff bzfgrep bzgrep bzless:
$ bunzip2 <bad.txt.bz2 | less # hard way to paginate contents
$ bzless bad.txt.bz2 # easy way
These helpers have similar names and work the same way as the gzip helper z-commands. See the man pages for the other helpers.
tar file (tarball)
Read the mouse-over text in the above tar-related comic from the XKCD webcomic.
Long before software package managers such as YUM, RPM, and APT, there were tar archives. Originally written as a magnetic Tape ARchiver, the command is common to every Unix/Linux system. A tar archive file is the Unix version of a zip file: it is one file that contains many other files inside it. You can download and extract a tar format archive file on almost any Unix/Linux system dating back to the late 1970s.
A tar archive, also called a “tarball”, is a single file that contains multiple uncompressed files and directories. Unix/Linux software source is often distributed as a “tarball”.
The syntax of the tar command is irregular – you don’t have to put dashes in front of the operation letters (but you can if you like):
Syntax: tar <operation> [options] -f <archive_file> [<pathnames>]
$ tar cf /tmp/my.tar . # create archive of current directory
$ tar -cf stuff.tar *.c # archive all the .c files
$ tar -xvf my.tar # extract everything into current dir
$ tar xvf my.tar mydir # only extract mydir from the archive
The name of the tar archive can be anything; the suffixes are there simply for human readers to better know what the files contain.
The archive name must always directly follow the -f option with no other option letters in between:
$ tar -tvf my.tar # correct use of -f
$ tar -vft my.tar # WRONG use of -f
$ tar -fvt my.tar # WRONG use of -f
You must always use one of three major operation letters:
-t: list the pathnames in the archive (a table of contents)
-x: extract (all or some) pathnames from the archive
-c: create a new tar archive (erases existing contents!)
You may optionally use some other relevant options:
-f: select the archive pathname (almost always used; must be last option)
-p: preserve permissions when extracting
-v: verbose (more messages about what is happening, or more detail)
-z: the entire archive is gzip compressed (or uncompressed if extracting)
-j: the entire archive is bzip2 compressed (or uncompressed if extracting)
The -f archive pathname option is almost always used, unless you happen to own a tape drive! Always use -f and an archive file name. The archive file name must immediately follow the -f option with no other option letters in between, i.e. tar -tvf my.tar
The -v “verbose” option above lists all the file names as they are put into an archive file, or as they are extracted. This is useful for debugging, but isn’t usually used for a production system where you know exactly what is going into the archive; leave it out for normal use.
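The three operation letters can be tried together in a scratch directory. This is a sketch only: the directory layout and the names main.c, mydir, and notes.txt are made up, and the subshells with cd are used so the archive stores short relative pathnames.

```shell
# Scratch directory; all names below are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/src/mydir" "$work/out"
echo 'int main(void) { return 0; }' > "$work/src/main.c"
echo 'notes' > "$work/src/mydir/notes.txt"

# c = create; the archive name follows the f immediately.
( cd "$work/src" && tar -cf ../my.tar main.c mydir )

# t = table of contents; sorted here so the order is predictable.
listing=$(tar -tf "$work/my.tar" | sort)

# x = extract; naming mydir extracts only that part of the archive.
( cd "$work/out" && tar -xf ../my.tar mydir )
extracted=$(cd "$work/out" && find . -type f | sort)

echo "archive contains:"
echo "$listing"
echo "extracted files:"
echo "$extracted"
rm -rf "$work"
```

Only mydir/notes.txt should appear under the out directory, because main.c was not named on the extract command line.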
If an uncompressed tarball file is damaged, the damage may affect only some of the files in the tarball and the other files, even files stored after the damage point, may still be recoverable.
tarball.tar.gz and tarball.tar.bz2
A compressed tarball is simply a single tarball file that has been compressed with either gzip or bzip2. The compression compresses the entire tarball, not the individual files inside the tarball.
A tarball file may be first created and then compressed as a whole using either the gzip or bzip2 file compression commands:
$ tar -cf tarball.tar *.c # create archive named tarball.tar
$ gzip tarball.tar # compress into tarball.tar.gz
$ tar -cf tarball.tar *.c # create archive named tarball.tar
$ bzip2 tarball.tar # compress into tarball.tar.bz2
Modern versions of tar have option letters that do this compression for you (less typing), so a compressed tar archive can be created and compressed in one step:
$ tar -czf tarball.tar.gz *.c # create and gzip compress into tarball.tar.gz
$ tar -cjf tarball.tar.bz2 *.c # create and bzip2 compress into tarball.tar.bz2
You generate a table of contents, or extract all the files, using the appropriate de-compression option depending on if and how the tarball file was compressed:
$ tar -tf tarball.tar # table of contents if uncompressed
$ tar -tzf tarball.tar.gz # table of contents if gzip compressed
$ tar -tjf tarball.tar.bz2 # table of contents if bzip2 compressed
$ tar -xf tarball.tar # extract contents (uncompressed)
$ tar -xzf tarball.tar.gz # extract contents (gzip compressed)
$ tar -xjf tarball.tar.bz2 # extract contents (bzip2 compressed)
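The two-step and one-step ways of making a gzip compressed tarball can be compared directly. A minimal sketch, assuming a scratch directory and two made-up source files a.c and b.c:

```shell
# Scratch directory; all names below are illustrative only.
work=$(mktemp -d)
mkdir "$work/src"
echo 'first'  > "$work/src/a.c"
echo 'second' > "$work/src/b.c"

# Two steps: create the tarball, then compress the whole thing.
( cd "$work/src" && tar -cf ../two-step.tar a.c b.c )
gzip "$work/two-step.tar"                   # becomes two-step.tar.gz

# One step: let tar do the gzip compression itself.
( cd "$work/src" && tar -czf ../one-step.tar.gz a.c b.c )

# Both archives should list the same members.
two_step=$(tar -tzf "$work/two-step.tar.gz")
one_step=$(tar -tzf "$work/one-step.tar.gz")
[ "$two_step" = "$one_step" ] && listings=match || listings=differ

echo "$one_step"
echo "listings: $listings"
rm -rf "$work"
```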
The tar command doesn’t care what you name your archive file. The gzip compressed tarballs usually have names ending with *.tar.gz or *.tgz and bzip2 compressed tarballs usually have names ending with *.tar.bz2 or *.tb2.
Modern versions of the tar command automatically recognize existing compressed archives and thus don’t require the extra z or j option letters to read compressed archives. You still need the appropriate letter to create a new compressed archive file.
If a compressed tarball file is damaged, all the files following the damage point cannot be decompressed and are usually unrecoverable.
tar to archive or restore a directory
The tar command will automatically recursively archive entire directories into a tarball if you give it directories. Software is often distributed as a tarball file.
$ cd # go to my home directory
$ tar czf /tmp/homedir.tar.gz . # archive current directory into a file
Do not place the output tarball file in any of the directories being used as input to tar!
When you have a tarball, you can then extract it into the current directory:
$ mkdir /some/backupdir
$ cd /some/backupdir
$ tar xzpf /tmp/homedir.tar.gz # extract the whole archive into current directory
The p option preserves the modes (permissions) of the files as they are extracted.
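The archive-and-restore steps can be rehearsed without touching a real home directory. In this sketch the directories home and restore and the script hello.sh are stand-ins created under a scratch directory, and the tarball is deliberately written outside the directory being archived.

```shell
# Scratch directory standing in for a home directory and a backup location.
work=$(mktemp -d)
mkdir -p "$work/home/bin" "$work/restore"
echo 'echo hello' > "$work/home/bin/hello.sh"
chmod 750 "$work/home/bin/hello.sh"

# The tarball goes outside the directory being archived.
( cd "$work/home" && tar czf "$work/homedir.tar.gz" . )

# Restore into a different directory; p keeps the recorded permissions.
( cd "$work/restore" && tar xzpf "$work/homedir.tar.gz" )

# The first ten characters of ls -l are the file type and permission bits.
orig_mode=$(ls -l "$work/home/bin/hello.sh" | cut -c1-10)
new_mode=$(ls -l "$work/restore/bin/hello.sh" | cut -c1-10)

echo "original: $orig_mode restored: $new_mode"
rm -rf "$work"
```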
tar to copy a directory
This legacy use of tar to copy an entire directory has been replaced by cp -a or the rsync command.
You can do a directory copy with tar using a pipe instead of an output file by using the special file name - that stands for either standard output (when creating) or standard input (when extracting):
$ cd
$ tar cf - . | ( cd /some/backupdir && tar xpf - ) # local copy
$ tar cf - . | ( ssh otherhost 'cd /some/dir && tar xpf -' ) # remote host copy
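The local form of the pipe copy can be checked in a scratch directory. A small sketch, with made-up directories from and to, that copies a tree and then compares the two file lists:

```shell
# Scratch directory; all names below are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/from/sub" "$work/to"
echo 'top'    > "$work/from/top.txt"
echo 'nested' > "$work/from/sub/nested.txt"

# The writer archives to standard output (-); the reader extracts from standard input (-).
( cd "$work/from" && tar cf - . ) | ( cd "$work/to" && tar xpf - )

# Compare the pathnames found in the source and in the copy.
from_list=$(cd "$work/from" && find . | sort)
to_list=$(cd "$work/to" && find . | sort)
[ "$from_list" = "$to_list" ] && copy=complete || copy=incomplete

echo "copy: $copy"
rm -rf "$work"
```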
The above uses of tar to copy a directory have been largely supplanted by the -a (archive) option to cp or by the rsync command.
zip and unzip
A ZIP file is a single file containing individually compressed files. (This is not the same format as a compressed tarball, which is a single compressed file containing individual uncompressed files.)
Unix/Linux can also manipulate ZIP format file archives (often used on Microsoft systems) using zip and unzip:
$ touch file1 file2 file3
$ zip foo file1 file2 file3 # create foo.zip with three files
adding: file1 (stored 0%)
adding: file2 (stored 0%)
adding: file3 (stored 0%)
$ ls -l foo.zip
-rw-rw-r-- 1 idallen idallen 436 Mar 9 03:44 foo.zip
$ unzip -l foo.zip # list the contents (do not extract)
Archive: foo.zip
Length Date Time Name
--------- ---------- ----- ----
0 2016-03-09 03:44 file1
0 2016-03-09 03:44 file2
0 2016-03-09 03:44 file3
--------- -------
0 3 files
$ rm file?
$ unzip foo.zip # extract all the files
Archive: foo.zip
extracting: file1
extracting: file2
extracting: file3
Other options can preserve directory hierarchy and do other things. See the man page.
If a ZIP file is damaged, the damage usually affects only some of the files in the ZIP file and the other files, even files stored after the damage point, may still be recoverable.
[...] zip file), or does tar archive together all the files first (uncompressed) and then compress the whole archive?
[...] zip file or a compressed tar file, and why? (Hint: Consider archiving 1000 copies of the same file.)
[...] zip file or a compressed tar file, and why?
diff
The diff command compares two files: diff file1 file2
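A minimal sketch of diff in use, with two made-up three-line files that differ only in their second line. diff exits with status 0 when the files match and 1 when they differ, so it can be used directly as a test in an if statement.

```shell
# Scratch directory; the file names and contents are illustrative only.
work=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$work/file1"
printf 'alpha\nBETA\ngamma\n' > "$work/file2"

# diff writes the differing lines to standard output; the exit status drives the if.
if diff "$work/file1" "$work/file2" > "$work/changes"; then
    verdict=identical
else
    verdict=different
fi
changes=$(cat "$work/changes")

echo "$changes"
echo "the files are $verdict"
rm -rf "$work"
```

In the output, lines starting with < come from the first file and lines starting with > come from the second.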
vimdiff and gvimdiff
diff3
meld
Student Tammy Rediger (17F) tells me that “the program 7zip does work with .gz, .bzip2 and .tar files” under Microsoft Windows.