
qsv — CSV Toolkit

Comprehensive reference for qsv: count, headers, stats, moarstats, select, search, sort, dedup, frequency, join, sqlp, luau, apply, schema, validate, sample, split, MCP server, and more — with examples and outputs.


qsv — CSV Toolkit#

What it is#

qsv is a blazing-fast, Rust-based CSV toolkit with 80+ subcommands for querying, transforming, analyzing, and validating tabular data — a maintained, feature-rich fork of the original xsv project. It adds Polars-backed acceleration, an embedded Luau scripting engine, and support for CSV, TSV, Excel, JSON, Parquet, and Apache Arrow formats. Reach for qsv when you need to slice, filter, join, or summarize structured tabular data from the command line without loading it into a full database or spreadsheet.

Install#

# macOS
brew install qsv

# Windows
scoop install qsv

# Cargo
cargo install qsv --locked --features all_features

# Or download binary from releases
curl -LO https://github.com/dathere/qsv/releases/latest/download/qsv-x86_64-unknown-linux-gnu.zip

Output: (none — exits 0 on success)

Variants: qsv (full), qsvlite (no Luau/Python), qsvmcp (Model Context Protocol), qsvpy (Python integration).

Sample data#

All examples below use these two files:

cat > people.csv << 'EOF'
name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
EOF

cat > dept.csv << 'EOF'
name,department
Alice,Engineering
Bob,Marketing
Carol,Engineering
Dave,Sales
Eve,Engineering
EOF

Output: (none — exits 0 on success)


Discovery#

Commands for understanding an unfamiliar CSV before you commit to processing it.

sniff — Detect schema without reading the whole file#

Samples the first few thousand bytes of a file to detect delimiter, quoting, field count, types, and record count without reading the whole file. Use it as a fast first look before running heavier commands.

qsv sniff people.csv

Output:

Sniff Results for people.csv
  Last Modified  : 2026-04-26 10:00:00 UTC
  File Size      : 123 bytes
  Delimiter      : ,
  Has Header Row : true
  Quote Char     : "
  Num Records    : 5
  Num Fields     : 4
  Fields         :
                    0: name     (String)
                    1: age      (Integer)
                    2: city     (String)
                    3: salary   (Integer)
# Sniff a remote file
qsv sniff https://example.com/data.csv

Output: the same report layout as above, describing the remote file.

count — Count rows#

Returns the number of data rows (excluding the header). More reliable than wc -l because it parses quoted fields, so a newline embedded in a quoted value is not counted as an extra row. Near-instant on indexed files.

qsv count people.csv

Output:

5
# Human-readable (useful for millions of rows)
qsv count --human-readable largefile.csv

Output:

1,482,309
# Include record width statistics
qsv count --width people.csv

Output:

5
32-27-28-22-5
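
To see why a line count and a record count can disagree, the following Python sketch compares the two on a small in-memory CSV. The sample data and the helper names count_lines and count_records are illustrative assumptions; it uses Python's csv module rather than qsv itself.

```python
import csv
import io

# Hypothetical sample: the first data record has a newline inside a quoted field.
data = 'name,note\nAlice,"line one\nline two"\nBob,plain\n'

def count_lines(text: str) -> int:
    """Count newline characters, which is what wc -l reports."""
    return text.count("\n")

def count_records(text: str) -> int:
    """Count parsed CSV records, excluding the header row."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)

print(count_lines(data))    # physical lines, header included
print(count_records(data))  # logical data records
```

Here the line count is 4 while the record count is 2, because the quoted note spans two physical lines.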

headers — List column names#

Prints each column name with its 1-based index. Use --just-names when you need a plain list for scripting, or --intersect to find the common columns across two files before a join.

qsv headers people.csv

Output:

1   name
2   age
3   city
4   salary
# Just names (for scripting)
qsv headers --just-names people.csv

Output:

name
age
city
salary
# Find common columns across two files
qsv headers --intersect people.csv dept.csv

Output:

name
# Count only
qsv headers --just-count people.csv

Output:

4

Summary statistics#

Commands for computing numeric and categorical summaries across columns without writing a full query.

stats — Per-column statistics#

Computes sum, min, max, mean, stddev, null count, and type for every column in a single pass. Results are cached alongside the file, so repeated runs are instant; use --everything to add median, quartiles, and mode.

qsv stats people.csv

Output:

field,type,sum,min,max,range,sortorder,min_length,max_length,mean,stddev,variance,cv,nullcount,max_precision,sparsity
name,String,,Alice,Eve,,Ascending,3,5,,,,,,0,0
age,Integer,150,25,35,10,Unsorted,2,2,30,3.4059,11.6,0.1135,0,0,0
city,String,,Boston,New York,,Unsorted,6,8,,,,,,0,0
salary,Integer,391000,62000,95000,33000,Unsorted,5,5,78200,11855.8003,140560000,0.1516,0,0,0
# Infer types only (fast — no numeric computation)
qsv stats --typesonly people.csv

Output:

field,type
name,String
age,Integer
city,String
salary,Integer
# Full statistics including mode, median, quartiles
qsv stats --everything people.csv

Output:

field,type,...,mode,median,mad,q1,q2_median,q3,...
name,String,...,Alice|Bob|Carol|Dave|Eve,,,,,...
age,Integer,...,25|28|30|32|35,30,2,27,30,33,...
salary,Integer,...,62000|71000|75000|88000|95000,75000,13000,66500,75000,91500,...
# Stats for specific columns only
qsv stats -s salary,age people.csv

Output: the same layout as the default run, limited to the salary and age rows.

[!TIP] qsv stats caches results in a .stats.csv.bin.sz file alongside the input. Subsequent calls are instant. Use --force to recompute.
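
As a rough cross-check of what the numeric columns mean, this Python sketch derives a few of them for the age and salary values from people.csv. It assumes population variance (dividing by n) and the usual median-absolute-deviation definition; qsv's own definitions and rounding may differ, and the summarize helper is hypothetical.

```python
import statistics

# Values copied from people.csv above.
ages = [30, 25, 35, 28, 32]
salaries = [75000, 62000, 88000, 71000, 95000]

def summarize(values):
    """Return a handful of summary measures for one numeric column."""
    med = statistics.median(values)
    return {
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # population variance (assumption)
        "stddev": statistics.pstdev(values),
        "median": med,
        "mad": statistics.median([abs(v - med) for v in values]),
    }

for label, column in (("age", ages), ("salary", salaries)):
    print(label, summarize(column))
```

Swapping pvariance/pstdev for variance/stdev gives the sample (n - 1) versions, which is the first thing to check when numbers from two tools disagree.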

moarstats — Extended statistics (qsv 12+)#

Augments a stats output file with up to 55 additional advanced measures — extended outlier, robust, and bivariate statistics (covariance, correlation, kurtosis, MAD, IQR, Pearson/Spearman, etc.). Run stats first, then moarstats on the resulting .stats.csv to enrich it without re-scanning the original data.

# Produce stats.csv, then enrich it with advanced measures
qsv stats people.csv -o people.stats.csv
qsv moarstats people.stats.csv

Output:

field,type,...,kurtosis,iqr,skewness,covariance,pearson_r,spearman_r,...
age,Integer,...,-1.30,6,0.21,...
salary,Integer,...,-1.20,25000,0.34,...
# Restrict to a subset of advanced measures
qsv moarstats --select kurtosis,iqr,skewness people.stats.csv

Output: (none — exits 0 on success)

[!NOTE] moarstats was introduced in qsv 12.0.0 and refined in 13.0.0. It also powers the per-column “FAIR metadata” inference used by the MCP server and TOON output.


Selecting columns#

Commands for narrowing or reordering the columns in a file before downstream processing.

select — Pick, reorder, or drop columns#

Outputs a subset (or reordering) of columns by name, index, range, or regex. Prefix a selector with ! to exclude it; the order of selectors controls the output order.

# Pick two columns
qsv select name,salary people.csv

Output:

name,salary
Alice,75000
Bob,62000
Carol,88000
Dave,71000
Eve,95000
# Drop a column (! prefix = all except)
qsv select '!age' people.csv

Output:

name,city,salary
Alice,New York,75000
Bob,Chicago,62000
Carol,New York,88000
Dave,Chicago,71000
Eve,Boston,95000
# Select by column range
qsv select 1-3 people.csv

Output:

name,age,city
Alice,30,New York
Bob,25,Chicago
Carol,35,New York
Dave,28,Chicago
Eve,32,Boston
# Select by regex (columns starting with 'a' or 'c')
qsv select '/^[ac]/' people.csv

Output:

age,city
30,New York
25,Chicago
35,New York
28,Chicago
32,Boston

Filtering rows#

Commands for keeping or discarding rows based on patterns or positional ranges.

search — Filter rows by regex#

Filters rows using a regular expression, optionally scoped to one or more columns with -s. Use -v to invert (exclude matches), or --flag to add a match-indicator column instead of dropping rows.

# Keep rows matching a pattern
qsv search "New York" people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
# Search in a specific column only
qsv search -s city "Chicago" people.csv

Output:

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
# Invert match (exclude Chicago)
qsv search -s city -v "Chicago" people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Add a match flag column instead of filtering (matched rows get their row number, others 0)
qsv search -s city --flag matched "New York" people.csv

Output:

name,age,city,salary,matched
Alice,30,New York,75000,1
Bob,25,Chicago,62000,0
Carol,35,New York,88000,3
Dave,28,Chicago,71000,0
Eve,32,Boston,95000,0
# Count matches only (written to stderr)
qsv search -s city -c "New York" people.csv 2>&1 >/dev/null

Output:

2

slice — Extract row ranges#

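Extracts a contiguous range of rows by start index, end index, length, or a single row. Differs from search in that it operates by position, not content; on indexed files it seeks straight to the first requested row instead of scanning from the top, so the cost depends on the slice length rather than the file size.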

# First 3 rows
qsv slice -l 3 people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
# Rows 2–4 (0-based start, exclusive end)
qsv slice -s 1 -e 4 people.csv

Output:

name,age,city,salary
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
# Single row by index
qsv slice -i 4 people.csv

Output:

name,age,city,salary
Eve,32,Boston,95000
# Last 2 rows (negative index)
qsv slice -s -2 people.csv

Output:

name,age,city,salary
Dave,28,Chicago,71000
Eve,32,Boston,95000
# JSON output for a single row
qsv slice -i 0 --json people.csv

Output:

[{"name":"Alice","age":"30","city":"New York","salary":"75000"}]

Sorting and deduplication#

Commands for ordering rows and removing duplicates, often a prerequisite for joins or frequency counts.

sort — Sort rows#

Sorts rows by one or more columns; add -N for numeric comparison and -R for descending order. Also supports --random for reproducible shuffles with a --seed.

# Sort by salary numerically (ascending)
qsv sort -s salary -N people.csv

Output:

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Sort by salary descending
qsv sort -s salary -N -R people.csv

Output:

name,age,city,salary
Eve,32,Boston,95000
Carol,35,New York,88000
Alice,30,New York,75000
Dave,28,Chicago,71000
Bob,25,Chicago,62000
# Multi-key sort (city then salary)
qsv sort -s city,salary -N people.csv

Output:

name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Dave,28,Chicago,71000
Alice,30,New York,75000
Carol,35,New York,88000
# Reproducible random shuffle
qsv sort --random --seed 42 people.csv

Output:

name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
Dave,28,Chicago,71000

dedup — Remove duplicate rows#

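Removes rows that are identical across one or more key columns, keeping the first occurrence. The input is sorted by the key columns first (pass --sorted if it already is), which is why the output below comes back in city order. Use -D to write the dropped duplicates to a separate file for auditing.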

# Dedup by city (keep first occurrence per city)
qsv dedup -s city people.csv

Output:

name,age,city,salary
Eve,32,Boston,95000
Bob,25,Chicago,62000
Alice,30,New York,75000
2  (duplicates removed, written to stderr)
# Write duplicates to a separate file
qsv dedup -s city -D dupes.csv people.csv

Output: the deduplicated rows shown above, on stdout.

dupes.csv:

name,age,city,salary
Dave,28,Chicago,71000
Carol,35,New York,88000
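
The sort-then-keep-first behaviour can be sketched in a few lines of Python. This is a simplified model under the assumption that rows are sorted by the key with a stable sort before duplicates are dropped; dedup_by_key is a hypothetical helper, not part of qsv.

```python
import csv
import io

PEOPLE = """name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
"""

def dedup_by_key(rows, key):
    """Sort by key, keep the first row per key value, and collect the rest."""
    kept, dropped, seen = [], [], set()
    for row in sorted(rows, key=lambda r: r[key]):  # sorted() is stable
        if row[key] in seen:
            dropped.append(row)
        else:
            seen.add(row[key])
            kept.append(row)
    return kept, dropped

rows = list(csv.DictReader(io.StringIO(PEOPLE)))
kept, dropped = dedup_by_key(rows, "city")
print([r["name"] for r in kept])     # ['Eve', 'Bob', 'Alice']
print([r["name"] for r in dropped])  # ['Dave', 'Carol']
```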

Frequency analysis#

Commands for counting distinct values and understanding the distribution of categorical columns.

frequency — Value counts per column#

Produces a ranked value-count table for each column (or a subset with -s), including the percentage each value represents. Use --no-other to suppress the catch-all “Other” bucket when there are many distinct values.

qsv frequency -s city people.csv

Output:

field,value,count,percentage
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
# All columns, no truncation
qsv frequency --no-other people.csv

Output:

field,value,count,percentage
name,Alice,1,20.0000
name,Bob,1,20.0000
name,Carol,1,20.0000
name,Dave,1,20.0000
name,Eve,1,20.0000
age,25,1,20.0000
age,28,1,20.0000
age,30,1,20.0000
age,32,1,20.0000
age,35,1,20.0000
city,Chicago,2,40.0000
city,New York,2,40.0000
city,Boston,1,20.0000
salary,62000,1,20.0000
salary,71000,1,20.0000
salary,75000,1,20.0000
salary,88000,1,20.0000
salary,95000,1,20.0000
# JSON output
qsv frequency -s city --json people.csv

Output:

[{"field":"city","data":[{"value":"Chicago","count":2,"percentage":40.0},{"value":"New York","count":2,"percentage":40.0},{"value":"Boston","count":1,"percentage":20.0}]}]
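
For intuition, the count-and-percentage table amounts to a grouped tally. The Python sketch below reproduces the city figures with collections.Counter; the frequency helper and its alphabetical tie-break are assumptions for illustration, not a description of how qsv orders ties.

```python
from collections import Counter

# The city column from people.csv, in file order.
cities = ["New York", "Chicago", "New York", "Chicago", "Boston"]

def frequency(values):
    """Return (value, count, percentage) tuples, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    # Ties are broken alphabetically here (an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(value, n, round(100 * n / total, 4)) for value, n in ranked]

for value, n, pct in frequency(cities):
    print(f"city,{value},{n},{pct:.4f}")
```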

Transforming columns#

Commands for reshaping, renaming, filling, and computing new columns without leaving the command line.

rename — Rename column headers#

Renames columns by supplying a comma-separated list of new names in positional order. Use --pairwise to rename only specific columns by specifying old,new pairs, leaving the rest untouched.

# Rename all columns by position
qsv rename full_name,years_old,location,annual_pay people.csv

Output:

full_name,years_old,location,annual_pay
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
# Pairwise rename (only rename specific columns)
qsv rename --pairwise age,years,salary,income people.csv

Output:

name,years,city,income
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000

fill — Forward-fill empty values#

Propagates the last non-empty value in a column downward to fill blanks — useful for sparse exports where a value is only written on the first row of a group. Use --default to fill with a fixed string instead.

# Create a CSV with gaps
printf 'name,city,salary\nAlice,New York,75000\nBob,,62000\nCarol,,88000\nDave,Chicago,\nEve,Boston,95000\n' > gaps.csv

# Forward-fill city
qsv fill city gaps.csv

Output:

name,city,salary
Alice,New York,75000
Bob,New York,62000
Carol,New York,88000
Dave,Chicago,
Eve,Boston,95000
# Fill with a fixed default
qsv fill --default "N/A" salary gaps.csv

Output:

name,city,salary
Alice,New York,75000
Bob,,62000
Carol,,88000
Dave,Chicago,N/A
Eve,Boston,95000
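
Both behaviours above (carry the last value forward, or substitute a fixed default) can be modelled with a short Python generator. This is a sketch of the technique, not qsv's implementation; the fill helper and its default parameter are hypothetical names.

```python
import csv
import io

GAPS = (
    "name,city,salary\n"
    "Alice,New York,75000\n"
    "Bob,,62000\n"
    "Carol,,88000\n"
    "Dave,Chicago,\n"
    "Eve,Boston,95000\n"
)

def fill(rows, column, default=None):
    """Fill empty cells in `column` with `default`, or with the last non-empty value."""
    last = ""
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        if row[column] == "":
            row[column] = default if default is not None else last
        else:
            last = row[column]
        yield row

rows = list(csv.DictReader(io.StringIO(GAPS)))
print([r["city"] for r in fill(rows, "city")])
print([r["salary"] for r in fill(rows, "salary", default="N/A")])
```

Note that a forward fill has nothing to propagate if the very first row is empty; in this sketch such a cell simply stays empty.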

reverse — Reverse row order#

Outputs all rows in reverse order without sorting. Use this when the last record is the most recent and you want newest-first output without a sort key.

qsv reverse people.csv

Output:

name,age,city,salary
Eve,32,Boston,95000
Dave,28,Chicago,71000
Carol,35,New York,88000
Bob,25,Chicago,62000
Alice,30,New York,75000

transpose — Swap rows and columns#

Rotates the CSV so rows become columns and columns become rows. Useful for turning a wide stat table into a narrow key-value layout, or for feeding column-oriented data into row-oriented tools.

qsv transpose people.csv

Output:

name,Alice,Bob,Carol,Dave,Eve
age,30,25,35,28,32
city,New York,Chicago,New York,Chicago,Boston
salary,75000,62000,88000,71000,95000

enum — Add a row number column#

Appends an index column containing the 0-based row number (or a custom name and start value). Use it to add a stable surrogate key or to restore original ordering after a shuffle.

qsv enum people.csv

Output:

name,age,city,salary,index
Alice,30,New York,75000,0
Bob,25,Chicago,62000,1
Carol,35,New York,88000,2
Dave,28,Chicago,71000,3
Eve,32,Boston,95000,4
# Custom column name, 1-based
qsv enum --new-column row_id --start 1 people.csv

Output:

name,age,city,salary,row_id
Alice,30,New York,75000,1
Bob,25,Chicago,62000,2
Carol,35,New York,88000,3
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,5

pseudo — Pseudonymize a column#

Replaces each distinct value in a column with an incremental identifier, so the same input always maps to the same output within a file. Use it to pseudonymize PII before sharing data while keeping rows that refer to the same entity linkable.

# Replace names with consistent incremental IDs
qsv pseudo name people.csv

Output:

name,age,city,salary
0,30,New York,75000
1,25,Chicago,62000
2,35,New York,88000
3,28,Chicago,71000
4,32,Boston,95000

safenames — Sanitize column names for SQL/Python#

Rewrites column headers so they are valid identifiers for SQL, pandas, or R by replacing spaces and special characters with underscores. Use --mode check first to count unsafe headers without modifying the file.

# Create a CSV with messy headers
printf 'Full Name,Age (Years),City/Region,Annual Salary $\nAlice,30,NYC,75000\n' > messy.csv
qsv safenames messy.csv

Output:

Full_Name,Age__Years_,City_Region,Annual_Salary__
Alice,30,NYC,75000
# Verify names are safe (check mode)
qsv safenames --mode check messy.csv

Output:

4 unsafe header(s) found.

Format conversion#

Commands for converting between CSV, TSV, JSONL, Excel, and other tabular formats.

fmt — Change delimiter or quoting#

Rewrites a CSV with a different delimiter, quote character, or quoting style without altering the data. The result goes to stdout (or to the file given with -o); the input file is left untouched. Use it to convert CSV to TSV before piping into tools that expect tab-delimited input.

# CSV to TSV
qsv fmt -t T people.csv

Output:

name	age	city	salary
Alice	30	New York	75000
Bob	25	Chicago	62000
Carol	35	New York	88000
Dave	28	Chicago	71000
Eve	32	Boston	95000
# Pipe-delimited
qsv fmt -t '|' people.csv

Output:

name|age|city|salary
Alice|30|New York|75000
Bob|25|Chicago|62000
Carol|35|New York|88000
Dave|28|Chicago|71000
Eve|32|Boston|95000
# Quote every field
qsv fmt --quote-always people.csv

Output:

"name","age","city","salary"
"Alice","30","New York","75000"
"Bob","25","Chicago","62000"
"Carol","35","New York","88000"
"Dave","28","Chicago","71000"
"Eve","32","Boston","95000"

tojsonl — Convert CSV to JSONL#

Converts each CSV row to a JSON object on its own line (JSON Lines format), with automatic type inference so numeric and boolean columns are emitted without quotes. Use it to feed CSV data into JSON-native tools or APIs.

qsv tojsonl people.csv

Output:

{"name":"Alice","age":30,"city":"New York","salary":75000}
{"name":"Bob","age":25,"city":"Chicago","salary":62000}
{"name":"Carol","age":35,"city":"New York","salary":88000}
{"name":"Dave","age":28,"city":"Chicago","salary":71000}
{"name":"Eve","age":32,"city":"Boston","salary":95000}

[!NOTE] Type inference is automatic: age and salary are emitted as integers (not quoted), boolean columns become true/false, and nulls become JSON null.
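
The idea behind the type inference can be approximated in Python. The sketch below infers a type per value rather than per column, which is a simplification, and the infer and to_jsonl helpers are hypothetical names rather than qsv internals.

```python
import csv
import io
import json

def infer(value):
    """Map a CSV string to None, bool, int, float, or leave it as a string."""
    if value == "":
        return None
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value

def to_jsonl(text):
    """Convert CSV text to a list of compact JSON lines."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        json.dumps({k: infer(v) for k, v in row.items()}, separators=(",", ":"))
        for row in reader
    ]

sample = "name,age,city,salary\nAlice,30,New York,75000\nBob,25,Chicago,62000\n"
print("\n".join(to_jsonl(sample)))
```

Inferring per column, as a real converter would, avoids the case where one stray non-numeric value makes a single row's field a string while its neighbours are numbers.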

excel — Extract Excel sheet to CSV#

Reads .xlsx or .xls files and converts a sheet to CSV, handling merged cells, date formatting, and formula results. Use --metadata j to list all sheets before deciding which to extract.

# First sheet
qsv excel data.xlsx -o output.csv

# Specific sheet by name
qsv excel data.xlsx --sheet "Sales" -o sales.csv

# List all sheets as JSON
qsv excel data.xlsx --metadata j

Output:

{"filename":"data.xlsx","format":"Xlsx","num_sheets":3,"sheets":[{"index":0,"name":"Sheet1","typ":"WorkSheet","visible":"Visible","headers":["name","age","city","salary"],"num_columns":4,"num_rows":6},...]}
# Extract a specific cell range
qsv excel data.xlsx --range "A1:C4" -o range.csv

Output: (none — exits 0 on success)


Combining files#

Commands for stacking, joining, splitting, and partitioning CSV files.

cat — Concatenate CSVs#

Stacks multiple CSV files vertically (rows) or side-by-side (columns). Use rowskey when the files have different or overlapping schemas — it aligns by column name and fills missing fields with empty strings.

# Stack vertically (same schema required)
qsv cat rows people.csv people2.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Carol,35,New York,88000
Dave,28,Chicago,71000
Eve,32,Boston,95000
Frank,40,Seattle,105000
Grace,29,Austin,67000
# Stack with differing schemas (fills missing fields with empty)
qsv cat rowskey --group fname people.csv dept.csv

Output:

file,name,age,city,salary,department
people.csv,Alice,30,New York,75000,
people.csv,Bob,25,Chicago,62000,
dept.csv,Alice,,,,Engineering
dept.csv,Bob,,,,Marketing
...
# Concatenate side by side (columns)
qsv cat columns people.csv dept.csv

Output:

name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering

join — Join two CSVs#

Performs an inner, outer, semi, anti, or cross join between two CSV files on one or more key columns. Differs from sqlp joins in that it does not require SQL syntax and is optimized for streaming large files.

# Inner join on name
qsv join name people.csv name dept.csv

Output:

name,age,city,salary,name,department
Alice,30,New York,75000,Alice,Engineering
Bob,25,Chicago,62000,Bob,Marketing
Carol,35,New York,88000,Carol,Engineering
Dave,28,Chicago,71000,Dave,Sales
Eve,32,Boston,95000,Eve,Engineering
# Left anti-join (people NOT in dept.csv)
qsv join --left-anti name people.csv name dept.csv

Output (empty if all names matched):

name,age,city,salary
# Cross join (cartesian product)
qsv join --cross name people.csv name dept.csv | qsv count

Output:

25

Join type          Flag
Inner (default)    (none)
Left outer         --left
Right outer        --right
Full outer         --full
Left anti          --left-anti
Left semi          --left-semi
Right anti         --right-anti
Cross (cartesian)  --cross
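
To make the join types concrete, here is a minimal hash-join sketch in Python covering the inner and left-anti cases. It is an illustration of the general technique under simplifying assumptions, not qsv's algorithm: hash_join is a hypothetical helper, and merging dictionaries collapses the duplicated name column that the qsv output above keeps.

```python
people = [
    {"name": "Alice", "salary": "75000"},
    {"name": "Bob", "salary": "62000"},
    {"name": "Frank", "salary": "105000"},  # hypothetical row with no department
]
dept = [
    {"name": "Alice", "department": "Engineering"},
    {"name": "Bob", "department": "Marketing"},
]

def hash_join(left, right, key, kind="inner"):
    """Join two lists of dict rows on `key`; kind is 'inner' or 'left-anti'."""
    index = {}
    for row in right:                      # build a lookup on the right side
        index.setdefault(row[key], []).append(row)
    out = []
    for row in left:                       # probe it with each left row
        matches = index.get(row[key], [])
        if kind == "inner":
            out.extend({**row, **match} for match in matches)
        elif kind == "left-anti" and not matches:
            out.append(row)
    return out

print(hash_join(people, dept, "name"))
print(hash_join(people, dept, "name", kind="left-anti"))
```

The inner join returns the Alice and Bob rows with their departments; the left-anti join returns only Frank, the row with no match on the right.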

split — Split into multiple files#

Writes sequential chunks of a CSV to separate files in an output directory, either by fixed row count (-s) or by total number of chunks (-c). Use --pad and --filename to control zero-padding and naming.

mkdir /tmp/split_out
# 2 rows per chunk
qsv split -s 2 /tmp/split_out people.csv
ls /tmp/split_out

Output:

0.csv  2.csv  4.csv
# 3 chunks with padded, custom filenames
qsv split -c 3 --pad 3 --filename "chunk_{}.csv" /tmp/split_out people.csv

Output: (none — exits 0 on success)

partition — Partition by column value#

Creates one output file per distinct value in a key column, named after that value. Differs from split in that grouping is by content rather than row count — ideal for producing per-department or per-region files.

mkdir /tmp/by_city
qsv partition city /tmp/by_city people.csv
ls /tmp/by_city

Output:

Boston.csv  Chicago.csv  New York.csv

Chicago.csv:

name,age,city,salary
Bob,25,Chicago,62000
Dave,28,Chicago,71000
# Drop the partition column from output files
qsv partition --drop city /tmp/by_city people.csv

Output: (none — exits 0 on success)


Scripting and queries#

Commands for running SQL, embedded Lua scripts, and built-in string operations directly against CSV files.

sqlp — SQL queries via Polars#

The filename (without extension) becomes the table name.

# WHERE filter and ORDER BY
qsv sqlp people.csv "SELECT name, salary FROM people WHERE salary > 70000 ORDER BY salary DESC"

Output:

name,salary
Eve,95000
Carol,88000
Alice,75000
Dave,71000
# GROUP BY aggregation
qsv sqlp people.csv "SELECT city, COUNT(*) as n, AVG(salary) as avg_salary FROM people GROUP BY city ORDER BY avg_salary DESC"

Output:

city,n,avg_salary
Boston,1,95000.0
New York,2,81500.0
Chicago,2,66500.0
# Join two files in SQL
qsv sqlp people.csv dept.csv \
  "SELECT p.name, p.salary, d.department
   FROM people p JOIN dept d ON p.name = d.name
   WHERE d.department = 'Engineering'
   ORDER BY p.salary DESC"

Output:

name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
# Window function: salary rank
qsv sqlp people.csv \
  "SELECT name, salary, RANK() OVER (ORDER BY salary DESC) as rank FROM people"

Output:

name,salary,rank
Eve,95000,1
Carol,88000,2
Alice,75000,3
Dave,71000,4
Bob,62000,5
# Output as JSON
qsv sqlp --format json people.csv "SELECT * FROM people WHERE city = 'Chicago'"

Output:

[{"name":"Bob","age":25,"city":"Chicago","salary":62000},{"name":"Dave","age":28,"city":"Chicago","salary":71000}]

[!TIP] Use --streaming for files larger than RAM. Add --try-parsedates to auto-parse date columns.

luau — Scripted transforms with embedded Lua#

Runs a Luau (sandboxed Lua 5.1) expression per row to map a new column or filter rows, with optional --begin/--end blocks for initialization and aggregation. Reach for this when apply operations are too limited but a full sqlp query is overkill.

# Add computed column (salary in thousands)
qsv luau map salary_k \
  "string.format('%.1f', col.salary / 1000)" \
  people.csv

Output:

name,age,city,salary,salary_k
Alice,30,New York,75000,75.0
Bob,25,Chicago,62000,62.0
Carol,35,New York,88000,88.0
Dave,28,Chicago,71000,71.0
Eve,32,Boston,95000,95.0
# Filter rows with a script
qsv luau filter "tonumber(col.salary) > 75000" people.csv

Output:

name,age,city,salary
Carol,35,New York,88000
Eve,32,Boston,95000
# Add a seniority label (conditional logic)
qsv luau map seniority \
  "if tonumber(col.age) >= 32 then return 'Senior' else return 'Junior' end" \
  people.csv

Output:

name,age,city,salary,seniority
Alice,30,New York,75000,Junior
Bob,25,Chicago,62000,Junior
Carol,35,New York,88000,Senior
Dave,28,Chicago,71000,Junior
Eve,32,Boston,95000,Senior
# Aggregation using BEGIN/END blocks
qsv luau map dummy \
  --begin "total = 0" \
  "total = total + tonumber(col.salary); return ''" \
  --end "print('Total salary: ' .. total)" \
  people.csv > /dev/null

Output:

Total salary: 391000

[!NOTE] Reference columns with col.column_name or col["col name"]. Use _IDX for the current row number. Scripts run with Luau 0.716 — a safe, sandboxed Lua 5.1 subset.

apply — Built-in string and numeric operations#

Applies one or more named operations (case conversion, trimming, encoding, similarity, NLP sentiment, etc.) to a column without writing a script. Use dynfmt to produce a new column from a format string that interpolates other columns.

# Uppercase a column
qsv apply operations upper name people.csv

Output:

name,age,city,salary
ALICE,30,New York,75000
BOB,25,Chicago,62000
CAROL,35,New York,88000
DAVE,28,Chicago,71000
EVE,32,Boston,95000
# Compute string length into a new column
qsv apply operations len name -c name_len people.csv

Output:

name,age,city,salary,name_len
Alice,30,New York,75000,5
Bob,25,Chicago,62000,3
Carol,35,New York,88000,5
Dave,28,Chicago,71000,4
Eve,32,Boston,95000,3
# Dynamic format string → computed description column
qsv apply dynfmt \
  --formatstr "{name} earns \${salary} in {city}" \
  --new-column description people.csv

Output:

name,age,city,salary,description
Alice,30,New York,75000,Alice earns $75000 in New York
Bob,25,Chicago,62000,Bob earns $62000 in Chicago
Carol,35,New York,88000,Carol earns $88000 in New York
Dave,28,Chicago,71000,Dave earns $71000 in Chicago
Eve,32,Boston,95000,Eve earns $95000 in Boston

Available apply operations:

Category    Operations
Case        lower, upper, titlecase
Whitespace  trim, ltrim, rtrim, squeeze
String      len, strip_prefix, strip_suffix, escape, replace, regex_replace
Encoding    encode64, decode64, encode62, decode62, crc32
Math        round, thousands
Financial   currencytonum, numtocurrency
Similarity  simdl, simjw, simsd, simhm
NLP         sentiment, whatlang, gender_guess, eudex

Schema and validation#

Commands for inferring structure from a CSV and checking that data conforms to expected types and constraints.

schema — Infer JSON Schema from CSV#

Scans a CSV and generates a JSON Schema (Draft 2020-12) file capturing field types, enum values, and numeric ranges. The output schema can be fed directly to validate to enforce those constraints on new data.

qsv schema people.csv

Output: (none — exits 0 on success)

Generates people.csv.schema.json:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "people.csv",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]
    },
    "age": { "type": "integer", "minimum": 25, "maximum": 35 },
    "city": {
      "type": "string",
      "enum": ["Boston", "Chicago", "New York"]
    },
    "salary": { "type": "integer", "minimum": 62000, "maximum": 95000 }
  },
  "required": ["name", "age", "city", "salary"]
}
# Polars schema (for use with sqlp/Parquet pipelines)
qsv schema --polars people.csv

Output:

{"name":"Utf8","age":"Int64","city":"Utf8","salary":"Int64"}

validate — Validate CSV against JSON Schema#

Without a schema argument, checks that the CSV is well-formed per RFC 4180. With a schema, validates each row against it and writes passing rows to .valid, failing rows to .invalid, and a validation-errors.tsv describing each violation.

# RFC 4180 well-formedness check
qsv validate people.csv

Output:

people.csv is valid.
# Schema validation (generates .valid, .invalid, and validation-errors.tsv)
qsv schema people.csv
printf 'name,age,city,salary\nBadRow,notanumber,Unknown,0\n' > bad.csv
qsv validate bad.csv people.csv.schema.json

Output: a summary of the invalid rows on stderr. The command exits non-zero because bad.csv fails validation.

validation-errors.tsv:

row_number	field	error
2	name	BadRow is not one of ["Alice","Bob","Carol","Dave","Eve"]
2	age	notanumber is not of type "integer"
2	city	Unknown is not one of ["Boston","Chicago","New York"]
2	salary	0 is less than the minimum value of 62000
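
The kind of per-row check that the generated schema implies can be sketched without any JSON Schema library. The Python below hand-rolls the type, range, and enum rules from the people.csv schema shown earlier; validate_row and its message wording are hypothetical and only approximate what a real validator reports.

```python
# Constraints copied from the generated schema for people.csv.
SCHEMA = {
    "name": {"type": "string", "enum": ["Alice", "Bob", "Carol", "Dave", "Eve"]},
    "age": {"type": "integer", "minimum": 25, "maximum": 35},
    "city": {"type": "string", "enum": ["Boston", "Chicago", "New York"]},
    "salary": {"type": "integer", "minimum": 62000, "maximum": 95000},
}

def validate_row(row, schema=SCHEMA):
    """Return a list of (field, message) tuples; an empty list means the row passes."""
    errors = []
    for field, rules in schema.items():
        raw = row.get(field, "")
        if rules["type"] == "integer":
            try:
                value = int(raw)
            except ValueError:
                errors.append((field, f'{raw} is not of type "integer"'))
                continue
            if value < rules["minimum"]:
                errors.append((field, f'{value} is less than the minimum of {rules["minimum"]}'))
            elif value > rules["maximum"]:
                errors.append((field, f'{value} is greater than the maximum of {rules["maximum"]}'))
        elif raw not in rules["enum"]:
            errors.append((field, f"{raw} is not one of {rules['enum']}"))
    return errors

bad = {"name": "BadRow", "age": "notanumber", "city": "Unknown", "salary": "0"}
for field, message in validate_row(bad):
    print(field, message, sep="\t")
```

Because the inferred schema pins name and city to the values seen in the sample, any new name fails validation; loosen or remove those enum entries before using the schema on fresh data.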

Sampling#

Commands for drawing representative subsets from large files without loading everything into memory.

sample — Random sampling#

Draws rows using reservoir sampling by default, guaranteeing a uniform random sample in a single pass without knowing the file size upfront. Supports stratified, Bernoulli, systematic, cluster, weighted, and time-series sampling modes; use --seed for reproducibility.

# Reservoir sample (3 random rows)
qsv sample 3 people.csv

Output:

name,age,city,salary
Bob,25,Chicago,62000
Alice,30,New York,75000
Eve,32,Boston,95000
# Reproducible sample with seed
qsv sample --seed 42 3 people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# 50% Bernoulli sample (each row independently included with probability 0.5)
qsv sample --bernoulli --seed 42 0.5 people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Carol,35,New York,88000
Eve,32,Boston,95000
# Stratified: 1 row per unique city
qsv sample --stratified city --seed 42 1 people.csv

Output:

name,age,city,salary
Alice,30,New York,75000
Bob,25,Chicago,62000
Eve,32,Boston,95000

Method               Flag                 Use case
Reservoir (default)  (none)               General random sample
Indexed              (none, with .idx)    Random I/O, large files
Bernoulli            --bernoulli          Independent row probability
Systematic           --systematic <col>   Every nth record
Stratified           --stratified <col>   Representative subgroup samples
Weighted             --weighted <col>     Probability proportional to weight
Cluster              --cluster <col>      Sample entire clusters
Timeseries           --timeseries <col>   One record per time interval
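
Reservoir sampling itself is short enough to show in full. The Python sketch below implements the classic single-pass variant (often called Algorithm R) on an iterator of unknown length; it illustrates the technique named above and is not qsv's code, and reservoir_sample is a hypothetical helper.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = row     # replace with probability k / (i + 1)
    return reservoir

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
print(reservoir_sample(iter(names), 3, seed=42))
```

Memory use is bounded by k regardless of input size, which is why the approach suits streams and very large files. The output order reflects reservoir slots, not file order.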

Flattening and display#

Commands for rendering CSV records in a human-readable layout rather than a dense columnar format.

flatten — View records one at a time#

Prints each record as a vertical key-value block separated by #, making wide or deeply nested CSVs readable in a terminal. Use -c to truncate long values to a fixed character limit for a quick overview.

qsv flatten people.csv

Output:

name    Alice
age     30
city    New York
salary  75000
#
name    Bob
age     25
city    Chicago
salary  62000
#
...
# Condense long values for a quick overview
qsv flatten -c 8 people.csv

Output:

name    Alice
age     30
city    New York
salary  75000
#
...

Indexing#

An index file dramatically speeds up commands that support random access (slice, split, sample, count, dedup).

qsv index people.csv
# Creates people.csv.idx alongside the source file

Output: (none — exits 0 on success)

After indexing, qsv count reads the row count from the index, and qsv slice seeks directly to the requested row instead of scanning from the top of the file.

# Force rebuild
qsv index --force people.csv

Output: (none — exits 0 on success)
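
Why an index helps can be shown with a toy version in Python: record the byte offset where each record starts, then jump straight to any row. This is an assumption-level illustration only; the layout of qsv's .idx file is not described here, and build_index and read_record are hypothetical helpers working on in-memory bytes rather than a real file.

```python
import csv
import io

DATA = (
    b"name,age,city,salary\n"
    b"Alice,30,New York,75000\n"
    b"Bob,25,Chicago,62000\n"
    b"Carol,35,New York,88000\n"
    b"Dave,28,Chicago,71000\n"
    b"Eve,32,Boston,95000\n"
)

def build_index(data):
    """Return the byte offset at which each data record starts (header excluded)."""
    offsets, start, in_quotes = [], 0, False
    for pos, byte in enumerate(data):
        if byte == 0x22:                      # a double quote toggles quoted state
            in_quotes = not in_quotes
        elif byte == 0x0A and not in_quotes:  # an unquoted newline ends a record
            offsets.append(start)
            start = pos + 1
    if start < len(data):                     # final record without trailing newline
        offsets.append(start)
    return offsets[1:]                        # drop the header's offset

def read_record(data, offsets, i):
    """Parse only record i by slicing between its offset and the next one."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    line = data[offsets[i]:end].decode("utf-8")
    return next(csv.reader(io.StringIO(line, newline="")))

offsets = build_index(DATA)
print(len(offsets))                   # row count comes straight from the index
print(read_record(DATA, offsets, 4))  # last row, without parsing the first four
```

With a file on disk the slice would become a seek followed by a bounded read, so the work no longer grows with the position of the row.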


Configuration#

qsv reads runtime defaults from QSV_* environment variables and from a dotenv file. Use this to set delimiters, buffer sizes, parallelism, and remote-fetch behaviour project-wide without repeating flags on every invocation.

Environment variables#

Many options have a matching QSV_* variable that sets a default, so you do not have to repeat the flag on every invocation. Precedence is not uniform: the table below notes where a variable overrides the flag. Run qsv --envlist to dump the active set.

# Show every QSV_* variable currently in effect
qsv --envlist

Output:

QSV_DEFAULT_DELIMITER: ,
QSV_NO_HEADERS: false
QSV_COMMENT_CHAR:
QSV_MAX_JOBS: 8
QSV_CACHE_DIR: /home/alice/.qsv-cache
...
# Project-wide TSV default + parallel job cap
export QSV_DEFAULT_DELIMITER=$'\t'
export QSV_MAX_JOBS=4
qsv stats data.tsv

Output: the stats table for data.tsv, in the same layout as the stats example earlier.

Variable               Purpose
QSV_DEFAULT_DELIMITER  One ASCII char; overrides --delimiter.
QSV_SNIFF_DELIMITER    If set, auto-detect delimiter per file.
QSV_NO_HEADERS         Treat first row as data, not a header.
QSV_MAX_JOBS           Cap parallel workers (default = logical CPUs).
QSV_CACHE_DIR          Where stats/fetch cache files are written.
QSV_DOTENV_PATH        Explicit dotenv file path; "" disables loading.
QSV_LOG_LEVEL          error/warn/info/debug/trace.
QSV_LOG_DIR            Directory for structured log output.
QSV_PROGRESSBAR        1 to show a TTY progress bar on long runs.

Dotenv file#

On startup, qsv loads a .env file from the current directory (or the path in QSV_DOTENV_PATH) and applies any QSV_*=value lines as if they were exported. Useful for pinning per-project defaults next to a dataset.

cat > .env << 'EOF'
QSV_DEFAULT_DELIMITER=|
QSV_MAX_JOBS=4
QSV_LOG_LEVEL=info
EOF

qsv count people.csv   # picks up the .env automatically

Output:

5

# Point at a shared dotenv outside the cwd
QSV_DOTENV_PATH=/home/alice/projects/etl/.env qsv stats people.csv

# Disable dotenv loading for one invocation
QSV_DOTENV_PATH= qsv stats people.csv

Output: the stats table for people.csv in both cases, in the layout shown in the stats section.


MCP server (qsv 13+)#

qsv 13 added a built-in Model Context Protocol server that lets AI agents (Claude Desktop, Claude Code, and other MCP clients) query and transform local CSV/Parquet/Excel files without uploading raw data — only statistical metadata and result rows cross the wire. Reach for it when you want a chatbot to drive qsv against your own files.

# Start the MCP server on stdio (default transport)
qsvmcp serve

# Or use the full binary
qsv mcp serve

Output: (none; the server keeps running and exchanges MCP messages over stdin/stdout until the client disconnects)

# List the MCP-exposed tools and exit (handy for debugging)
qsvmcp list-skills

# Regenerate the bundled skill definitions
qsvmcp --update-mcp-skills

Output: (none — exits 0 on success)

Register with Claude Desktop by adding the server to claude_desktop_config.json:

{
  "mcpServers": {
    "qsv": {
      "command": "qsvmcp",
      "args": ["serve"],
      "env": { "QSV_CACHE_DIR": "/home/alice/.qsv-cache" }
    }
  }
}

[!NOTE] The qsvmcp binary ships ~63 of qsv’s commands — enough for the MCP skill set with a smaller footprint. Use the full qsv binary if you need commands outside the MCP surface (e.g. geocode, python).

tojsonl --toon — Token-efficient output for LLMs#

qsv 12 introduced TOON, a token-optimized tabular format designed for LLM contexts — denser than JSON, still parseable. Useful when piping CSV summaries into a prompt.

qsv tojsonl --toon people.csv

Output:

[name|age|city|salary]
Alice|30|New York|75000
Bob|25|Chicago|62000
...

Piping commands together#

qsv is designed to be composed — pipe subcommands to build multi-step pipelines:

# Filter to Engineering dept, sort by salary desc, pick 3 columns
qsv join name people.csv name dept.csv \
  | qsv search -s department "Engineering" \
  | qsv select name,salary,department \
  | qsv sort -s salary -N -R

Output:

name,salary,department
Eve,95000,Engineering
Carol,88000,Engineering
Alice,75000,Engineering
# Top city by total salary
qsv sqlp people.csv \
  "SELECT city, SUM(salary) as total FROM people GROUP BY city ORDER BY total DESC LIMIT 1"

Output:

city,total
New York,163000
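# Count rows matching a pattern across a directory of CSVs
# (rowskey aligns columns by name, so each file's header is parsed instead of being treated as data)
qsv cat rowskey *.csv | qsv search "New York" | qsv count

Output: a single integer, the number of matching rows across all files.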

[!TIP] Use qsv input to normalize messy CSVs (trim whitespace, fix quoting, skip comment lines) before piping to other commands. Use qsv fixlengths to pad rows with missing fields so downstream commands don’t choke on ragged files.

