sort, uniq & wc – Counting & Ordering#
sort#
Common flags#
| Flag | Meaning |
|---|---|
| -n | Numeric sort |
| -r | Reverse order |
| -k N | Sort on field N |
| -k N,M | Sort on fields N through M |
| -t SEP | Field delimiter (default: whitespace) |
| -u | Unique – remove duplicate lines |
| -f | Case-insensitive (fold case) |
| -h | Human-readable sizes (2K, 3M, 1G) |
| -V | Version sort (1.2 < 1.10) |
| -R | Random shuffle |
| -s | Stable sort (preserve input order of equal lines) |
| -c | Check if already sorted; exit 1 if not |
| -m | Merge pre-sorted files (no sort step) |
| -o FILE | Write output to FILE (can be the same as the input) |
| -z | NUL-terminated lines |
Basic sort#
sort file.txt # lexicographic ascending
sort -r file.txt # reverse
sort -u file.txt # unique lines only
sort -f file.txt # case-insensitive
sort -n numbers.txt # numeric
sort -rn numbers.txt # numeric descending
sort -h sizes.txt # human sizes: 1K < 2M < 3G
sort -V versions.txt # version: 1.9 < 1.10 < 2.0
sort -R file.txt # shuffle
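A quick sanity check of the lexicographic-vs-numeric pitfall, using throwaway inline data:

```shell
# Without -n, numbers are compared as strings, so "10" sorts before "2"
printf '10\n2\n1\n' | sort      # 1, 10, 2
printf '10\n2\n1\n' | sort -n   # 1, 2, 10
```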
Multi-key sort#
# Sort by field 2 numerically, then field 1 lexicographically
sort -t, -k2,2n -k1,1 data.csv
# Sort by field 3 descending, field 1 ascending
sort -k3,3rn -k1,1 data.txt
# Sort CSV by 4th column (numeric) descending
sort -t, -k4,4rn report.csv
# Sort by month name
sort -M months.txt # Jan < Feb < ... < Dec
# Sort /etc/passwd by username (first colon-delimited field)
sort -t: -k1,1 /etc/passwd
# Sort IP addresses correctly (4-field numeric)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
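To see multi-key sorting in action, a sketch with made-up name,score pairs: primary key is field 2 numeric descending, ties broken by field 1 ascending:

```shell
# bob,12 first (highest score); ann,7 before cy,7 (tie broken by name)
printf 'ann,7\nbob,12\ncy,7\n' | sort -t, -k2,2rn -k1,1
```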
Sort by partial field#
# -k START.CHAR,END.CHAR
sort -k1.3,1.5 file # characters 3β5 of field 1
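For example, with hypothetical IDs whose numeric suffix starts at character 3, sorting on characters 3–5 orders by that suffix:

```shell
# Compare only characters 3-5 of field 1 ("300", "100", "200")
printf 'id300\nid100\nid200\n' | sort -k1.3,1.5
# id100, id200, id300
```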
In-place sort#
sort -o file.txt file.txt # overwrite in-place
sort file.txt | sponge file.txt # with moreutils
uniq#
uniq collapses consecutive duplicate lines, so the input must be sorted first for a full deduplication.
Common flags#
| Flag | Meaning |
|---|---|
| -c | Prefix each line with its occurrence count |
| -d | Print only duplicate lines (once each) |
| -D | Print all copies of duplicate lines |
| -u | Print only unique (non-repeated) lines |
| -i | Case-insensitive comparison |
| -f N | Skip the first N fields |
| -s N | Skip the first N characters |
| -w N | Compare only the first N characters |
sort file.txt | uniq # deduplicate
sort file.txt | uniq -c # count occurrences
sort file.txt | uniq -cd # count + only duplicates
sort file.txt | uniq -u # lines appearing exactly once
sort -f file.txt | uniq -i # case-insensitive dedup
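The "sort first" rule matters because uniq only compares adjacent lines; a small made-up example:

```shell
# Unsorted input: duplicates are not adjacent, so nothing collapses
printf 'b\na\nb\na\n' | uniq -c         # each line counted once
# Sorted input: duplicates become adjacent and are counted correctly
printf 'b\na\nb\na\n' | sort | uniq -c  # 2 a, 2 b
```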
Frequency table pattern#
# Most common words in a file
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -20
# Most frequent IPs in access log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Most common HTTP status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn
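The same pattern on an inline sample sentence, splitting on non-letters rather than whitespace:

```shell
# "the" appears twice and ranks first; the other words appear once
printf 'the cat and the hat\n' | tr -sc '[:alpha:]' '\n' \
  | sort | uniq -c | sort -rn | head -3
```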
wc – Word Count#
| Flag | Counts |
|---|---|
| -l | Lines |
| -w | Words |
| -c | Bytes |
| -m | Characters (multibyte-aware) |
| -L | Length of the longest line |
wc -l file.txt # line count
wc -w file.txt # word count
wc -c file.txt # byte count
wc file.txt # lines + words + bytes
wc -l *.log # count per file + total
# Count matching lines
grep -c "ERROR" app.log
# Count files in a directory (miscounts names containing newlines)
ls | wc -l
# Length of longest line (useful for column-width decisions)
wc -L report.txt
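A quick check of the line/word distinction on inline input:

```shell
printf 'one two\nthree\n' | wc -l   # 2 lines
printf 'one two\nthree\n' | wc -w   # 3 words
```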
nl – Number Lines#
nl file.txt # number non-empty lines (default)
nl -b a file.txt # number all lines including empty
nl -b p'^[A-Z]' file.txt # number lines matching pattern
nl -v 0 file.txt # start numbering at 0
nl -s '. ' file.txt # custom separator after number
nl -n rz file.txt # right-justified, zero-padded (000001)
nl -n ln file.txt # left-justified
nl -w 3 file.txt # width of line number field
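The formatting flags combine, e.g. zero-padded width-3 numbers with a custom separator:

```shell
printf 'alpha\nbeta\n' | nl -w 3 -n rz -s ': '
# 001: alpha
# 002: beta
```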
Practical pipelines#
# 10 largest entries (files or directories) in the current directory
du -sh * 2>/dev/null | sort -rh | head -10
# Count unique visitors in access log (by IP)
awk '{print $1}' access.log | sort -u | wc -l
# Distribution of response sizes
awk '{print $10}' access.log | grep -v '-' | sort -n | uniq -c
# Find the 5 most recently modified files
ls -lt | grep '^-' | head -5
# Sort a CSV by 3rd column (numeric), keep header
{ head -1 data.csv; tail -n +2 data.csv | sort -t, -k3,3n; }
# Check if a file is already sorted
sort -c file.txt && echo "sorted" || echo "not sorted"
# Merge two pre-sorted files
sort -m sorted1.txt sorted2.txt
# Deduplicate IPs while preserving first-seen order
awk '!seen[$0]++' ips.txt
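A minimal check of the awk dedup idiom, using made-up IPs with a repeat:

```shell
# First occurrence of each line is kept; later repeats are dropped
printf '1.1.1.1\n2.2.2.2\n1.1.1.1\n' | awk '!seen[$0]++'
# 1.1.1.1
# 2.2.2.2
```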
# Rank word frequency across multiple files
cat *.txt | tr '[:upper:]' '[:lower:]' | tr -sc '[:alpha:]' '\n' \
| sort | uniq -c | sort -rn | head -30
# Show only lines that appear in both files
sort file1.txt > /tmp/s1; sort file2.txt > /tmp/s2
comm -12 /tmp/s1 /tmp/s2
# Lines only in file1 (not in file2)
comm -23 <(sort file1.txt) <(sort file2.txt)
comm – Compare Sorted Files#
comm compares two sorted files line by line, outputting three columns.
comm file1.txt file2.txt # col1: only in f1, col2: only in f2, col3: both
comm -12 f1 f2 # only lines in BOTH (suppress cols 1 and 2)
comm -23 f1 f2 # only in f1 (suppress cols 2 and 3)
comm -13 f1 f2 # only in f2
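A small worked example with throwaway set files (the /tmp paths are arbitrary):

```shell
printf 'a\nb\nc\n' > /tmp/set1
printf 'b\nc\nd\n' > /tmp/set2
comm -12 /tmp/set1 /tmp/set2   # in both: b, c
comm -23 /tmp/set1 /tmp/set2   # only in set1: a
```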
[!TIP] The idiom `sort file | uniq -c | sort -rn` (sort → count → sort by count descending) is one of the most useful pipelines for log analysis and data exploration. Appending `| head -20` gives the top 20 most frequent items.