sort, uniq & wc — Counting & Ordering

Sort lines (numerically, by field, human-readable sizes), deduplicate with uniq, count lines/words/bytes with wc, and number lines with nl. With real-world pipeline recipes.

sort

Common flags

| Flag | Meaning |
|------|---------|
| `-n` | Numeric sort |
| `-r` | Reverse order |
| `-k N` | Sort on field N |
| `-k N,M` | Sort on fields N through M |
| `-t SEP` | Field delimiter (default: whitespace) |
| `-u` | Unique — remove duplicate lines |
| `-f` | Case-insensitive (fold) |
| `-h` | Human-readable sizes (2K, 3M, 1G) |
| `-V` | Version sort (1.2 < 1.10) |
| `-R` | Random shuffle |
| `-s` | Stable sort (preserve order of equal lines) |
| `-c` | Check if already sorted; exit 1 if not |
| `-m` | Merge pre-sorted files (no sort step) |
| `-o FILE` | Write output to FILE (can be same as input) |
| `-z` | NUL-terminated lines |
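
Two flags from the table that don't appear in the examples below, sketched briefly (`data.csv` here is just a placeholder):

# Stable sort: lines with equal keys keep their input order
sort -s -t, -k1,1 data.csv

# NUL-terminated records: safe for filenames containing newlines
find . -type f -print0 | sort -z | xargs -0 ls -l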

Basic sort

sort file.txt              # lexicographic ascending
sort -r file.txt           # reverse
sort -u file.txt           # unique lines only
sort -f file.txt           # case-insensitive
sort -n numbers.txt        # numeric
sort -rn numbers.txt       # numeric descending
sort -h sizes.txt          # human sizes: 1K < 2M < 3G
sort -V versions.txt       # version: 1.9 < 1.10 < 2.0
sort -R file.txt           # shuffle

Multi-key sort

# Sort by field 2 numerically, then field 1 lexicographically
sort -t, -k2,2n -k1,1 data.csv

# Sort by field 3 descending, field 1 ascending
sort -k3,3rn -k1,1 data.txt

# Sort CSV by 4th column (numeric) descending
sort -t, -k4,4rn report.csv

# Sort /etc/passwd by username (colon-delimited)
sort -t: -k1,1 /etc/passwd

# Sort IP addresses correctly (4-field numeric)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
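
A common gotcha: without an end field, a key extends to the end of the line.

# -k2 keys on field 2 THROUGH END OF LINE; -k2,2 keys on field 2 alone
sort -t, -k2 data.csv
sort -t, -k2,2 data.csv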

Sort by partial field

# -k START.CHAR,END.CHAR
sort -k1.3,1.5 file     # characters 3–5 of field 1

In-place sort

sort -o file.txt file.txt    # overwrite in-place
sort file.txt | sponge file.txt  # with moreutils
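
Plain redirection does not work here: the shell truncates the output file before sort reads it.

# WRONG: file.txt is emptied before sort ever sees it
# sort file.txt > file.txt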

uniq

uniq collapses consecutive duplicate lines only, so input must be sorted (or otherwise grouped) first.
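
A quick illustration of the consecutive-only behavior:

printf 'a\nb\na\n' | uniq          # a b a: the non-adjacent duplicate survives
printf 'a\nb\na\n' | sort | uniq   # a b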

Common flags

| Flag | Meaning |
|------|---------|
| `-c` | Prefix each line with occurrence count |
| `-d` | Print only duplicate lines (once each) |
| `-D` | Print all copies of duplicate lines |
| `-u` | Print only unique (non-repeated) lines |
| `-i` | Case-insensitive comparison |
| `-f N` | Skip first N fields |
| `-s N` | Skip first N characters |
| `-w N` | Compare only first N characters |

sort file.txt | uniq           # deduplicate
sort file.txt | uniq -c        # count occurrences
sort file.txt | uniq -cd       # count + only duplicates
sort file.txt | uniq -u        # lines appearing exactly once
sort -f file.txt | uniq -i     # case-insensitive dedup
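
The skip/compare flags help when lines differ only in a prefix; a sketch, with `app.log` as a placeholder file already grouped by date:

# Dedupe by the first 10 characters (e.g. a YYYY-MM-DD prefix)
uniq -w 10 app.log

# Ignore the first field (e.g. a timestamp) when comparing
uniq -f 1 app.log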

Frequency table pattern

# Most common words in a file
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn | head -20

# Most frequent client IPs in an access log ($1 in common/combined log format)
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

# Most common HTTP status codes ($9 in common/combined log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn

wc — Word Count

| Flag | Counts |
|------|--------|
| `-l` | Lines |
| `-w` | Words |
| `-c` | Bytes |
| `-m` | Characters (multibyte-aware) |
| `-L` | Length of longest line |

wc -l file.txt            # line count
wc -w file.txt            # word count
wc -c file.txt            # byte count
wc file.txt               # lines + words + bytes
wc -l *.log               # count per file + total

# Count matching lines
grep -c "ERROR" app.log

# Count entries in a directory (miscounts filenames containing newlines)
ls | wc -l

# Length of longest line (useful for column-width decisions)
wc -L report.txt
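
With a filename argument, wc prints the name after the count; read from stdin to get the bare number:

wc -l < file.txt          # just the number, no filename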

nl — Number Lines

nl file.txt               # number non-empty lines (default)
nl -b a file.txt          # number all lines including empty
nl -b p'^[A-Z]' file.txt  # number lines matching pattern
nl -v 0 file.txt          # start numbering at 0
nl -s '. ' file.txt       # custom separator after number
nl -n rz file.txt         # right-justified, zero-padded (000001)
nl -n ln file.txt         # left-justified
nl -w 3 file.txt          # width of line number field
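
The formatting options combine; for example:

# Number every line, zero-padded to width 6, followed by ': '
nl -b a -n rz -w 6 -s ': ' file.txt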

Practical pipelines

# 10 largest entries (files or directories) under the current directory
du -sh * 2>/dev/null | sort -rh | head -10

# Count unique visitors in access log (by IP)
awk '{print $1}' access.log | sort -u | wc -l

# Distribution of response sizes
awk '{print $10}' access.log | grep -v '-' | sort -n | uniq -c

# Find the 5 most recently modified files
ls -lt | grep '^-' | head -5

# Sort a CSV by 3rd column (numeric), keep header
{ head -1 data.csv; tail -n +2 data.csv | sort -t, -k3,3n; }

# Check if a file is already sorted
sort -c file.txt && echo "sorted" || echo "not sorted"

# Merge two pre-sorted files
sort -m sorted1.txt sorted2.txt

# Deduplicate IPs while preserving first-seen order
# (!seen[$0]++ is true only the first time a line appears)
awk '!seen[$0]++' ips.txt

# Rank word frequency across multiple files
cat *.txt | tr '[:upper:]' '[:lower:]' | tr -sc '[:alpha:]' '\n' \
  | sort | uniq -c | sort -rn | head -30

# Show only lines that appear in both files
sort file1.txt > /tmp/s1; sort file2.txt > /tmp/s2
comm -12 /tmp/s1 /tmp/s2

# Lines only in file1 (not in file2)
comm -23 <(sort file1.txt) <(sort file2.txt)

comm — Compare Sorted Files

comm compares two sorted files line by line and prints three tab-indented columns.

comm file1.txt file2.txt     # col1: only in f1, col2: only in f2, col3: both
comm -12 f1 f2               # only lines in BOTH (suppress cols 1 and 2)
comm -23 f1 f2               # only in f1 (suppress cols 2 and 3)
comm -13 f1 f2               # only in f2
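
A small worked example with hypothetical contents, f1 holding "apple, banana" and f2 holding "banana, cherry":

comm f1 f2        # output, in merged sorted order:
# apple               column 1 (no indent): only in f1
#         banana      column 3 (two tabs): in both
#     cherry          column 2 (one tab): only in f2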

> [!TIP]
> The idiom `sort file | uniq -c | sort -rn` (sort → count → sort by count descending) is one of the most useful pipelines for log analysis and data exploration. `sort -rn | head -20` gives the top 20 most frequent items.