Everything you need to know about TAR.GZ
TAR (Tape Archive, .tar) is the Unix archiving format dating from 1979. It was originally designed for sequential tape backup, hence the name. Unlike ZIP, TAR doesn't compress: it just bundles files into one stream while preserving permissions, timestamps, and Unix-specific metadata. Compression is added externally, which gives .tar.gz (gzip), .tar.bz2 (bzip2), and .tar.xz (xz, which uses LZMA2).
How it works under the hood
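- Block-based. TAR is a stream of fixed-size 512-byte blocks. Each file is preceded by a header block containing filename, mode, owner, size, and a checksum; the file's data follows, zero-padded to a multiple of 512 bytes. Two all-zero blocks mark the end of the archive.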
- No compression. Pure TAR is just concatenation with metadata. Combine with gzip/bzip2/xz externally for compression.
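- Preserves Unix permissions. Owner UID/GID, mode bits, symbolic links, hard links - all faithfully preserved. ZIP stores some of this only through optional extensions that tools support inconsistently.
- Streamable. Output to stdout, pipe to a compressor, send over SSH - all without intermediate files. `tar czf - . | ssh server 'tar xzf -'` is a classic (the explicit `f -` selects stdout/stdin, which not every tar build uses by default).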
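To make the block layout concrete, here is a minimal Python sketch that builds a one-file archive in memory and decodes its first 512-byte header by hand. The field offsets follow the ustar header layout; the helper names (`build_archive`, `parse_header`) and the sample file are made up for illustration, and the parser deliberately reads only a few fields.

```python
import io
import tarfile

BLOCK = 512  # TAR block size in bytes


def build_archive(files):
    """Return the raw bytes of an uncompressed ustar archive (illustrative helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def parse_header(block):
    """Decode a few fields of one 512-byte header block."""

    def text(lo, hi):
        # Text fields are NUL-terminated ASCII.
        return block[lo:hi].split(b"\0", 1)[0].decode("ascii")

    def octal(lo, hi):
        # Numeric fields are ASCII octal, padded with NULs or spaces.
        return int(block[lo:hi].strip(b" \0") or b"0", 8)

    stored = octal(148, 156)
    # The checksum is the byte sum of the header with the checksum
    # field itself (bytes 148-155) counted as eight spaces.
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return {
        "name": text(0, 100),
        "mode": octal(100, 108),
        "size": octal(124, 136),
        "checksum_ok": stored == computed,
    }


raw = build_archive({"hello.txt": b"hello, tar\n"})
header = parse_header(raw[:BLOCK])
print(header)
print("archive length:", len(raw), "bytes for 11 bytes of data")
```

The file's data starts at byte 512, right after the header, and is zero-padded up to the next block boundary, which is why even a tiny file occupies at least 1024 bytes before the end-of-archive padding.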
Where you'll actually use it
- Linux/Unix package distribution (.tar.gz is the universal source code archive)
- System backups preserving permissions and ownership
- Docker image layers (each layer is a TAR file)
- Pipelining file transfer over networks
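The pipelining use case relies on TAR never needing to seek. As a rough sketch of the same idea in Python, the `tarfile` stream modes (`w|gz` and `r|gz`) write and read strictly front to back, so the sink could be a socket or a pipe. Here an in-memory buffer stands in for the network, and `pack_stream` / `unpack_stream` are hypothetical helper names.

```python
import io
import tarfile


def pack_stream(files, sink):
    """Write files to sink as a gzip-compressed TAR stream, without seeking."""
    with tarfile.open(fileobj=sink, mode="w|gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def unpack_stream(source):
    """Read members in arrival order; each must be consumed before the next."""
    received = {}
    with tarfile.open(fileobj=source, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                received[member.name] = tar.extractfile(member).read()
    return received


# An in-memory buffer stands in for the pipe or SSH connection.
pipe = io.BytesIO()
pack_stream({"a.txt": b"alpha", "dir/b.txt": b"beta"}, pipe)
pipe.seek(0)
received = unpack_stream(pipe)
print(sorted(received))
```

In stream mode members can only be read in the order they arrive, which mirrors what the receiving `tar` does at the far end of a shell pipeline.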
How it compares to alternatives
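TAR vs ZIP: TAR preserves Unix permissions and ownership natively; ZIP is more widely supported across platforms and lets you extract a single file without reading the whole archive. .tar.gz vs .zip: ZIP compresses each file independently, while gzip compresses the TAR stream as a whole, so redundancy shared between files can be exploited. The result is often a smaller archive; 5-15% is a rough rule of thumb, and the gain depends heavily on the data, with many small, similar files benefiting most. TAR vs 7z: 7z usually compresses better but isn't pre-installed on most Linux distributions.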
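The cross-file compression effect is easy to check on synthetic data. The sketch below packs the same set of small, similar files into a ZIP (deflate) and a .tar.gz, both in memory, and compares the sizes. The sample data and function names are invented for illustration, and the ratio on real data will differ.

```python
import io
import tarfile
import zipfile


def sample_files(count=200):
    """Many small, near-identical files: the case where .tar.gz shines."""
    return {
        f"logs/entry_{i:04d}.json":
            f'{{"id": {i}, "status": "ok", "message": "nothing to report"}}\n'.encode()
        for i in range(count)
    }


def zip_size(files):
    buf = io.BytesIO()
    # ZIP deflates every member on its own.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_gz_size(files):
    buf = io.BytesIO()
    # gzip sees one continuous stream of headers and data.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


files = sample_files()
zip_bytes = zip_size(files)
tgz_bytes = tar_gz_size(files)
print(f"zip: {zip_bytes} bytes, tar.gz: {tgz_bytes} bytes")
```

On a handful of large, dissimilar files the two formats should land much closer together, since there is little shared redundancy for gzip to exploit.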
Things that will trip you up
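- Pure TAR (uncompressed) can be larger than the source files, because every entry adds a 512-byte header and its data is padded to a 512-byte boundary - use a compression layer when size matters
- Tarbombs: archives that extract many files straight into the current directory instead of a single subdirectory - always inspect with `tar tvf archive.tar.gz` first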
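- Filename encoding can break: older TAR formats store names as raw bytes with no encoding marker, so non-ASCII names can come out garbled on a system with a different locale - use `LANG=C tar` for ASCII-only environments, or stick to ASCII filenames when portability matters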
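The tarbomb check can also be automated. The following is a minimal sketch, not a complete safety audit: it lists the top-level entries of an archive and flags member names that are absolute or contain `..`. The function names and the two demo archives are hypothetical.

```python
import io
import tarfile
from pathlib import PurePosixPath


def inspect_archive(tar):
    """Report top-level entries and suspicious member paths of an open TarFile."""
    top_level = set()
    unsafe = []
    for member in tar.getmembers():
        path = PurePosixPath(member.name)
        # Absolute paths and ".." components could escape the target directory.
        if path.is_absolute() or ".." in path.parts:
            unsafe.append(member.name)
            continue
        if path.parts:
            top_level.add(path.parts[0])
    return {
        "top_level": sorted(top_level),
        # More than one top-level entry means files spill into the current directory.
        "is_tarbomb": len(top_level) > 1,
        "unsafe": unsafe,
    }


def make_tar(names):
    """Build an in-memory archive of empty files, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


tidy = inspect_archive(make_tar(["proj/a.txt", "proj/b.txt"]))
messy = inspect_archive(make_tar(["a.txt", "b.txt", "../evil.txt"]))
print(tidy)
print(messy)
```

For real archives, pass `tarfile.open("archive.tar.gz")` to `inspect_archive` and extract into a fresh directory whenever `is_tarbomb` is true.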