pathlib — Object-Oriented File Paths

Work with filesystem paths using Python's built-in pathlib module. Covers Path creation, navigation, reading/writing files, glob patterns, and stat.

pathlib — Object-Oriented File Paths#

What it is#

pathlib is part of the Python standard library (no install needed). It represents filesystem paths as Path objects instead of plain strings, giving you methods for reading, writing, navigating, and querying files — all in a cross-platform way that handles Windows backslashes automatically.

It replaces os.path, os.getcwd(), open() boilerplate, and glob.glob() for most filesystem tasks.

Quick example#

from pathlib import Path

p = Path("documents/readme.txt")
print(p.name)       # filename with extension
print(p.stem)       # filename without extension
print(p.suffix)     # extension with dot
print(p.parent)     # containing directory
print(p.parts)      # tuple of path components

Output:

readme.txt
readme
.txt
documents
('documents', 'readme.txt')

When / why to use it#

  • Any time you manipulate file paths — pathlib is cleaner and more readable than os.path.join().
  • Reading and writing files without managing file handles.
  • Recursive directory searches with rglob("*.py").
  • Cross-platform code: Path uses the right separator on every OS.

[!TIP] Prefer Path over str for all path values in new code. If a library requires a string, wrap: str(p) or use p.as_posix() for forward-slash strings.

Common pitfalls#

[!WARNING] The / operator builds paths; it does not divide. Path("/home") / "user" / "file.txt" is path concatenation. If the right operand is an absolute path, it replaces everything to the left: Path("/home") / "/etc/passwd" → Path("/etc/passwd").

[!WARNING] Path.open() vs open() — both work, but path.read_text() / path.write_text() are shorter for simple file reads/writes and handle encoding arguments directly.

Special paths#

Path.home() returns the current user’s home directory; Path.cwd() returns the process working directory. In scripts, Path(__file__).parent gives the directory of the script itself — useful for building paths relative to the source file rather than wherever the script is invoked from.

from pathlib import Path

print(Path.cwd())        # current working directory
print(Path.home())       # home directory (~)
print(Path("/").root)    # "/"

Output:

/home/user/myproject
/home/user
/
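
The script-relative idiom mentioned above can be sketched as a small helper. The name sibling_of and the file names in the usage comment are hypothetical; the only pathlib calls used are resolve(), parent, and joinpath().

```python
from pathlib import Path


def sibling_of(script_file: str, *parts: str) -> Path:
    """Return a path located next to the given script file.

    resolve() makes the result independent of the current working
    directory and of any symlink used to launch the script.
    """
    return Path(script_file).resolve().parent.joinpath(*parts)


# In a real script you would pass __file__ (hypothetical file names):
# CONFIG = sibling_of(__file__, "config", "settings.toml")
```

Taking the file name as a parameter, rather than reading __file__ inside the helper, keeps the function usable from other modules and easy to test.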

Reading and writing#

read_text() and write_text() handle the open/read/close cycle in one call and accept an encoding parameter. read_bytes() and write_bytes() do the same for binary data. For appending or more control, fall back to p.open("a") as a context manager.

from pathlib import Path

p = Path("notes.txt")

# Write text (creates or overwrites)
p.write_text("Hello\nWorld\n", encoding="utf-8")

# Read text
content = p.read_text(encoding="utf-8")
print(repr(content))

# Append (open explicitly; pass the same encoding used for writing)
with p.open("a", encoding="utf-8") as f:
    f.write("More text\n")

# Binary
p.write_bytes(b"\x89PNG\r\n")
data = p.read_bytes()
print(len(data), "bytes")

Output:

'Hello\nWorld\n'
6 bytes

Richer example — directory operations and glob#

from pathlib import Path

base = Path("/tmp/demo_project")

# Create a directory tree
(base / "src").mkdir(parents=True, exist_ok=True)
(base / "tests").mkdir(parents=True, exist_ok=True)

# Write some files
(base / "src" / "app.py").write_text("print('app')\n")
(base / "src" / "utils.py").write_text("# utilities\n")
(base / "tests" / "test_app.py").write_text("# tests\n")
(base / "README.md").write_text("# Demo\n")

# Find all Python files recursively
py_files = sorted(base.rglob("*.py"))
print("Python files:")
for f in py_files:
    print(f"  {f.relative_to(base)}  ({f.stat().st_size} bytes)")

# Find only in src/ (non-recursive)
src_files = sorted((base / "src").glob("*.py"))
print("\nSrc files:")
for f in src_files:
    print(f"  {f.name}")

Output:

Python files:
  src/app.py  (13 bytes)
  src/utils.py  (12 bytes)
  tests/test_app.py  (8 bytes)

Src files:
  app.py
  utils.py

Checking existence and type#

exists() returns True for any path that exists on the filesystem, whether file or directory. It follows symlinks, so a symlink whose target is missing reports False. Use the more specific is_file() or is_dir() when you need to distinguish between them; both return False if the path doesn’t exist at all.

from pathlib import Path

p = Path("/tmp/demo_project/src/app.py")

print(p.exists())     # True if path exists (any type)
print(p.is_file())    # True if regular file
print(p.is_dir())     # True if directory
print(p.is_symlink()) # True if symlink

Output:

True
True
False
False

Quick reference#

| Task | Code |
| --- | --- |
| Build path | Path("dir") / "sub" / "file.txt" |
| Absolute path | p.resolve() |
| Home dir | Path.home() |
| Current dir | Path.cwd() |
| Read text | p.read_text(encoding="utf-8") |
| Write text | p.write_text("content") |
| Read bytes | p.read_bytes() |
| Create dir | p.mkdir(parents=True, exist_ok=True) |
| List dir | list(p.iterdir()) |
| Glob (non-recursive) | list(p.glob("*.py")) |
| Glob (recursive) | list(p.rglob("*.py")) |
| File size | p.stat().st_size |
| Rename / move | p.rename(new_path) |
| Copy (no Path method; use shutil) | shutil.copy2(src, dst) |
| Delete file | p.unlink() |
| Delete empty dir | p.rmdir() |
| Delete tree | shutil.rmtree(p) |
| Change extension | p.with_suffix(".txt") |
| Change name | p.with_name("other.txt") |
| Change stem (Python 3.9+) | p.with_stem("other") |
| Relative path | p.relative_to(base) |
| Parent dirs | p.parents[0], p.parents[1], … |

Path vs PurePath#

PurePath is the string-manipulation base class — it understands path syntax (separators, suffixes, parts) but never touches the filesystem. Path is the concrete subclass that adds I/O methods (read_text, iterdir, exists, stat). Reach for PurePath when you are constructing paths for another system (e.g., building a Linux path on Windows for a remote machine) or when writing pure logic that should not hit the disk.

from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Pure: string-only, no filesystem access
p = PurePosixPath("/srv/data/logs.txt")
print(p.suffix, p.parts)

w = PureWindowsPath(r"C:\Users\Alice\Desktop\notes.txt")
print(w.drive, w.parts)

# Concrete: hits the filesystem
c = Path.home() / ".bashrc"
print(c.exists())

Output:

.txt ('/', 'srv', 'data', 'logs.txt')
C: ('C:\\', 'Users', 'Alice', 'Desktop', 'notes.txt')
True

PurePath("a/b") instantiates as either PurePosixPath or PureWindowsPath depending on the host OS; instantiate the specific class to force one flavour's semantics.
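
As a sketch of the "building a path for another system" use case, the helper below maps a Windows-style relative path onto a POSIX root without touching the disk. The function name to_remote and the example paths are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_remote(remote_root: str, local_rel: str) -> PurePosixPath:
    """Map a Windows-style relative path onto a POSIX root (no filesystem access)."""
    rel = PureWindowsPath(local_rel)
    # anchor is the drive plus root; a non-empty anchor means the input
    # is not a plain relative path and would not join cleanly.
    if rel.anchor:
        raise ValueError(f"expected a relative path, got {local_rel!r}")
    # Re-join component by component so backslashes never leak through.
    return PurePosixPath(remote_root).joinpath(*rel.parts)


print(to_remote("/srv/app", r"data\logs\app.log"))  # /srv/app/data/logs/app.log
```

Because only pure classes are involved, this behaves the same on every host OS.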

OS-specific subclasses — PosixPath and WindowsPath#

On a POSIX system, Path() returns a PosixPath; on Windows it returns a WindowsPath. Both inherit the same API, but each applies its own parsing rules (separators, drive letters, case sensitivity when comparing). Instantiating the other platform's concrete class raises an error, so you rarely name the subclass directly: use the Path factory.

from pathlib import Path

p = Path("/tmp/example")
print(type(p).__name__)

# On Windows this would print:
#   WindowsPath
# On macOS/Linux:
#   PosixPath

# Force POSIX-style serialisation regardless of host
print(p.as_posix())
print(p.as_uri())   # → file:///tmp/example

Output (Linux/macOS):

PosixPath
/tmp/example
file:///tmp/example

[!TIP] When writing to JSON/YAML/TOML for cross-platform tools, store p.as_posix() rather than str(p). The Posix form round-trips on every OS.
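
A minimal sketch of that round trip, using the pure classes so it behaves the same on any host. The key name "logo" and the example path are made up for illustration.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# A relative path as it might be built on a Windows machine
win = PureWindowsPath(r"assets\img\logo.png")

# Serialise with forward slashes instead of str(win)
payload = json.dumps({"logo": win.as_posix()})
print(payload)

# Both flavours parse the stored string into the same components
stored = json.loads(payload)["logo"]
print(PurePosixPath(stored).parts)
print(PureWindowsPath(stored).parts)
```

Storing str(win) instead would write backslashes, which a POSIX reader treats as part of a single file name rather than as separators.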

The / operator and joinpath#

The division operator on Path performs path joining, not arithmetic. It accepts any number of str or Path operands on the right side and is the idiomatic replacement for os.path.join(...). joinpath(*parts) is the method form, useful when you have a list of components.

from pathlib import Path

base = Path.home()

# Operator form (preferred)
log = base / "logs" / "app.log"
print(log)

# Method form — same result
log2 = base.joinpath("logs", "app.log")
print(log2)

# Joining with a list (use *unpacking)
parts = ["src", "lib", "core.py"]
p = base.joinpath(*parts)
print(p)

Output:

/home/alice/logs/app.log
/home/alice/logs/app.log
/home/alice/src/lib/core.py

[!WARNING] Absolute paths on the right reset everything. Path("/home/alice") / "/etc/passwd" returns Path("/etc/passwd") — anything to the left is discarded. This mirrors os.path.join behaviour but surprises newcomers. Validate untrusted user input with is_absolute() before joining.
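
One way to apply that validation is sketched below. The helper name join_untrusted is hypothetical. It checks anchor rather than is_absolute(), on the assumption that any root or drive on the right-hand side should be rejected; anchor is non-empty in every case where is_absolute() is true.

```python
from pathlib import PurePath, PurePosixPath


def join_untrusted(base: PurePath, user_part: str) -> PurePath:
    """Join user input onto base, refusing parts that would discard base."""
    candidate = type(base)(user_part)
    # A non-empty anchor (root and/or drive) would replace base when joined.
    if candidate.anchor:
        raise ValueError(f"absolute path not allowed: {user_part!r}")
    return base / candidate


base = PurePosixPath("/home/alice")
print(join_untrusted(base, "logs/app.log"))  # /home/alice/logs/app.log
```

This check does not stop ".." segments from climbing out of base; it only prevents the silent replacement described in the warning.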

Resolution, normalisation, and absolute paths#

absolute() prepends the current working directory but does not resolve symlinks or .. segments. resolve(strict=False) does the full thing: it makes the path absolute and resolves symlinks, dots, and .. to a canonical form. Use resolve(strict=True) to raise FileNotFoundError if any path component is missing — a useful sanity check.

from pathlib import Path

p = Path("../docs/./readme.md")   # the single dot is collapsed at construction
print(p.absolute())   # cwd-prepended; .. is kept
print(p.resolve())    # canonical; symlinks and .. resolved
print(p.expanduser()) # no leading ~, so the path is unchanged

Output:

/home/alice/project/../docs/readme.md
/home/alice/docs/readme.md
../docs/readme.md

expanduser() expands a leading ~ or ~user. Combine with resolve() for full canonicalisation: Path("~/notes.md").expanduser().resolve().
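
The expanduser-then-resolve chain, combined with strict=True as an existence check, might be wrapped like this. The helper name canonical is hypothetical, and returning None for a missing path is one possible convention, not the only one.

```python
from pathlib import Path
from typing import Optional


def canonical(raw: str) -> Optional[Path]:
    """Expand ~ and resolve to a canonical path; None if it does not exist."""
    try:
        # strict=True raises FileNotFoundError when a component is missing
        return Path(raw).expanduser().resolve(strict=True)
    except FileNotFoundError:
        return None


print(canonical("~"))                      # the home directory, fully resolved
print(canonical("~/no/such/file/hopefully"))
```

Other OSError subclasses (for example a permission error on an intermediate directory) are deliberately left to propagate.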

Glob and rglob — pattern matching#

glob(pattern) matches files in a directory using shell-style wildcards (*, ?, [abc], **); rglob(pattern) is the recursive form, equivalent to glob("**/" + pattern). Both return generators — wrap in list(...) or sorted(...) when you need a concrete sequence.

from pathlib import Path

root = Path("/tmp/demo_project")

# All Python files anywhere under root (recursive)
py = sorted(root.rglob("*.py"))

# Direct children only matching a pattern
direct = sorted(root.glob("src/*.py"))

# Multiple patterns with chain.from_iterable
from itertools import chain
images = sorted(chain.from_iterable(root.rglob(p)
                                     for p in ("*.png", "*.jpg", "*.webp")))

# Use ** explicitly for arbitrary-depth match
deep = sorted(root.glob("**/test_*.py"))

# Glob with character class
test_files = sorted(root.rglob("test_[a-z]*.py"))

Output: (paths printed depend on filesystem contents)

[PosixPath('/tmp/demo_project/src/app.py'), ...]

[!NOTE] rglob traverses hidden directories (starting with .) as well. If you want to skip .git or node_modules, filter the iterator or use os.walk with explicit pruning. Pathlib added a case_sensitive= argument in Python 3.12.
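
Filtering the iterator could look like the sketch below. The SKIP set and the helper name visible_files are illustrative choices. Note that this filters after the fact: rglob still walks into the skipped directories, so it saves no I/O.

```python
from pathlib import Path
from typing import Iterator

# Directory names to ignore (an illustrative set, adjust as needed)
SKIP = {".git", "node_modules", "__pycache__"}


def visible_files(root: Path, pattern: str = "*.py") -> Iterator[Path]:
    """Yield rglob matches that are not inside a hidden or skipped directory."""
    for p in root.rglob(pattern):
        # Inspect only the directory components between root and the match
        dirs = p.relative_to(root).parts[:-1]
        if any(d in SKIP or d.startswith(".") for d in dirs):
            continue
        yield p
```

When the skipped directories are large, a walk that prunes them before descending is the cheaper option.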

[!TIP] For complex filesystem queries (e.g. “Python files under 1 MB modified this week”), chain rglob with stat():

import time
cutoff = time.time() - 7 * 86400
recent = []
for p in root.rglob("*.py"):
    info = p.stat()  # one syscall per file, reused for both checks
    if info.st_size < 1_000_000 and info.st_mtime > cutoff:
        recent.append(p)

iterdir and walking trees#

iterdir() yields the immediate children (files and directories) of a directory; it does not recurse. For full tree walks, use rglob("*") or Path.walk() (Python 3.12+), which mirrors os.walk but yields Path objects.

from pathlib import Path

root = Path("/tmp/demo_project")

# Direct children
for child in sorted(root.iterdir()):
    kind = "DIR " if child.is_dir() else "FILE"
    print(kind, child.name)

Output:

FILE README.md
DIR  src
DIR  tests

Path.walk() (Python 3.12+) is the modern recursive walker; on earlier versions, use os.walk(root) and wrap results in Path.

# Python 3.12+
from pathlib import Path

for dirpath, dirnames, filenames in Path("/tmp/demo_project").walk():
    # Prune unwanted dirs in-place (just like os.walk)
    dirnames[:] = [d for d in dirnames if d not in {".git", "__pycache__"}]
    for name in filenames:
        print(dirpath / name)

# Pre-3.12 fallback
import os
from pathlib import Path

for dirpath, dirnames, filenames in os.walk("/tmp/demo_project"):
    dirnames[:] = [d for d in dirnames if d not in {".git"}]
    for name in filenames:
        print(Path(dirpath) / name)

Creating directories — mkdir#

mkdir() creates a single directory and raises FileExistsError if it exists or FileNotFoundError if a parent is missing. parents=True creates missing intermediates (mkdir -p); exist_ok=True makes the call idempotent. The canonical safe-create idiom is p.mkdir(parents=True, exist_ok=True).

from pathlib import Path

target = Path("/tmp/myapp/logs/2026")

target.mkdir(parents=True, exist_ok=True)
print(target.is_dir())

# Mode argument controls permissions on Unix (subject to umask)
restricted = Path("/tmp/myapp/secrets")
restricted.mkdir(mode=0o700, exist_ok=True)
print(oct(restricted.stat().st_mode & 0o777))

Output:

True
0o700

Renaming, moving, and replacing#

rename(target) renames or moves a path. On POSIX it silently overwrites an existing file at target (permissions allowing); on Windows it raises FileExistsError instead. replace(target) overwrites the target on every platform, atomically when source and target are on the same filesystem, so use it for safe writes (write to temp, then replace). Both methods fail with OSError across filesystems; fall back to shutil.move for that case, which copies and then deletes and is therefore not atomic.

from pathlib import Path
import shutil

src = Path("/tmp/notes.txt")
src.write_text("v1")

# Rename within the same dir
new = src.rename(src.with_name("notes-2026.txt"))
print(new)

# Atomic replace (overwrites)
tmp = Path("/tmp/notes-2026.txt.tmp")
tmp.write_text("v2")
tmp.replace(new)
print(new.read_text())

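# Cross-device move (or unknown FS): use shutil
archive = Path("/tmp/archive")
archive.mkdir(exist_ok=True)  # shutil.move does not create missing parents
shutil.move(str(new), str(archive / new.name))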

Output:

/tmp/notes-2026.txt
v2

[!TIP] The atomic-write pattern: write to path.with_suffix(path.suffix + ".tmp"), fsync, then replace(path). Readers always see either the old or the complete new file — never a partial write.

Stat and metadata#

stat() returns an os.stat_result with size, timestamps, mode, and inode. Use it to filter, sort, or audit large directories. lstat() does the same but does not follow symlinks (returns metadata of the symlink itself).

from pathlib import Path
import time

p = Path("/tmp/demo_project/src/app.py")
s = p.stat()

print(f"size:   {s.st_size} bytes")
print(f"mode:   {oct(s.st_mode & 0o777)}")
print(f"mtime:  {time.ctime(s.st_mtime)}")
print(f"is_dir: {p.is_dir()}")
print(f"owner:  {p.owner()}  group: {p.group()}")  # Unix only

Output:

size:   13 bytes
mode:   0o644
mtime:  Sat Apr 25 14:30:00 2026
is_dir: False
owner:  alice  group: alice

[!NOTE] Every p.stat() call is a syscall. When filtering thousands of files, cache the result: for p in root.rglob('*'): info = p.stat(); .... Repeating p.stat().st_size and p.stat().st_mtime doubles the I/O.
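
A sketch of the cache-the-result advice: each file is stat'ed once and the same os.stat_result supplies type, size, and mtime. The helper name newest_files is hypothetical.

```python
import stat
from pathlib import Path
from typing import List, Tuple


def newest_files(root: Path, limit: int = 5) -> List[Tuple[float, int, Path]]:
    """Return up to `limit` (mtime, size, path) rows, newest first."""
    rows = []
    for p in root.rglob("*"):
        try:
            info = p.stat()          # the only syscall for this path
        except OSError:
            continue                 # vanished or unreadable: skip it
        if stat.S_ISREG(info.st_mode):   # regular file, checked without a second stat
            rows.append((info.st_mtime, info.st_size, p))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Using stat.S_ISREG on the cached mode avoids the extra stat that p.is_file() would issue.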

Symlinks and hard links#

is_symlink() checks symlink status without following; resolve() follows the symlink to its real target; readlink() (Python 3.9+) returns the immediate target (a single hop, not transitively resolved). Create symlinks with symlink_to() and hardlinks with hardlink_to() (Python 3.10+).

from pathlib import Path

target = Path("/tmp/real_file.txt")
target.write_text("data")

link = Path("/tmp/alias.txt")
link.unlink(missing_ok=True)
link.symlink_to(target)

print(link.is_symlink())
print(link.readlink())          # one hop
print(link.resolve())           # full canonical target
print(link.read_text())         # symlink follows transparently

Output:

True
/tmp/real_file.txt
/tmp/real_file.txt
data

Deleting files and directories#

unlink() removes a single file or symlink; rmdir() removes an empty directory; shutil.rmtree(path) recursively removes a whole tree. None of these are reversible: there is no recycle bin. Use missing_ok=True (Python 3.8+) to suppress the FileNotFoundError for idempotent cleanup.

from pathlib import Path
import shutil

# Single file (idempotent)
Path("/tmp/old.log").unlink(missing_ok=True)

# Empty directory
Path("/tmp/emptydir").rmdir()

# Recursive
shutil.rmtree("/tmp/demo_project", ignore_errors=True)

[!WARNING] shutil.rmtree is permanent and recursive. Pass ignore_errors=True only when you’ve already verified the path. A bug in path construction can wipe production data — always log the path first and use a confirmation in interactive scripts (click.confirm if you’re using click).
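
One possible guard along those lines is sketched here: it resolves both paths, refuses anything outside an allowed root, logs the target, and defaults to a dry run. The name remove_tree and the allowed_root parameter are hypothetical.

```python
import shutil
from pathlib import Path


def remove_tree(target: Path, allowed_root: Path, dry_run: bool = True) -> bool:
    """Delete target recursively, but only if it lies strictly inside allowed_root."""
    target = target.resolve()
    allowed_root = allowed_root.resolve()
    # Refuse the root itself and anything that is not below it
    if allowed_root not in target.parents:
        raise ValueError(f"refusing to delete {target}")
    print(f"{'would delete' if dry_run else 'deleting'} {target}")
    if dry_run:
        return False
    shutil.rmtree(target)   # no ignore_errors: failures should be visible
    return True
```

Defaulting dry_run to True means a forgotten argument prints the path instead of deleting it.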

Integration with os and shutil#

Pathlib covers most day-to-day filesystem work. For the rest, os and shutil are still the right answer. Path objects implement os.PathLike, so current os and shutil functions accept them directly; for legacy APIs that insist on strings, convert with str(p) or os.fspath(p).

| Task | Module | Call |
| --- | --- | --- |
| Copy file (with metadata) | shutil | shutil.copy2(src, dst) |
| Copy file tree | shutil | shutil.copytree(src, dst) |
| Move (cross-device-safe) | shutil | shutil.move(src, dst) |
| Recursive delete | shutil | shutil.rmtree(path) |
| Disk usage (total, used, free) | shutil | shutil.disk_usage(path) |
| Free space on the path's volume | shutil | shutil.disk_usage(path).free |
| Temp directory | tempfile | tempfile.TemporaryDirectory() |
| Temp file | tempfile | tempfile.NamedTemporaryFile(delete=False) |
| Get/change cwd | os | os.getcwd() / os.chdir(path) |
| File mode | os | os.chmod(path, 0o644) |
| Path expansion | os.path | os.path.expandvars("$HOME/...") |
| Walk a tree | os | os.walk(path) (or Path.walk() on 3.12+) |

import shutil
from pathlib import Path

src  = Path("photo.jpg")
dst  = Path("/tmp/backup") / src.name
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copy2(src, dst)

# Walk for tree size
size = sum(f.stat().st_size for f in dst.parent.rglob("*") if f.is_file())
print(f"Tree size: {size} bytes")

Output:

Tree size: 245801 bytes

with_name, with_stem, with_suffix#

These three methods return new Path objects with one component swapped — they do not mutate the original (pathlib paths are immutable). Use them when transforming filenames in bulk renames or output-path derivation.

from pathlib import Path

p = Path("/var/log/app.log.gz")

print(p.with_suffix(".zip"))    # last suffix only
print(p.with_stem("alice"))     # everything before the last suffix
print(p.with_name("new.txt"))   # name = stem + suffix

# Strip every suffix with a loop
q = Path("backup.tar.gz")
while q.suffix:
    q = q.with_suffix("")
print(q)

Output:

/var/log/app.log.zip
/var/log/alice.gz
/var/log/new.txt
backup

[!TIP] p.suffixes returns all suffixes (['.tar', '.gz']) — useful for double-extension files. p.suffix is just the last one (.gz).

Comparing and sorting paths#

Path objects implement __eq__ and __hash__ based on their parsed components (case-folded on Windows), and they sort component by component in lexicographic order. A Path compares equal to a PurePath of the same flavour, but paths of different flavours (PurePosixPath vs PureWindowsPath) are never equal, and ordering them with < or sorted() raises TypeError.

from pathlib import Path

ps = [Path("b/2.txt"), Path("a/10.txt"), Path("a/2.txt")]
for p in sorted(ps):
    print(p)

Output:

a/10.txt
a/2.txt
b/2.txt

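[!NOTE] Lexicographic order treats 10.txt as less than 2.txt. For natural ordering, use a sort key that splits each name into text and integer chunks, or use the natsort library. A key that returns int(stem) for some files and the string stem for others fails with TypeError as soon as both kinds appear in the same directory.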
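
A standard-library sketch of a natural sort key follows. The name natural_key is hypothetical. It relies on re.split with a capturing group, which always alternates text and digit chunks, so integers are only ever compared with integers.

```python
import re
from pathlib import Path


def natural_key(p: Path):
    """Sort key: each path component split into alternating text/int chunks."""
    key = []
    for part in p.parts:
        chunks = re.split(r"(\d+)", part)
        # Even indexes are text (possibly empty), odd indexes are digit runs
        key.append([int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)])
    return key


ps = [Path("b/2.txt"), Path("a/10.txt"), Path("a/2.txt")]
for p in sorted(ps, key=natural_key):
    print(p.as_posix())
```

With this key, a/2.txt sorts before a/10.txt. Lower-casing the text chunks makes the order case-insensitive, which is a design choice rather than a requirement.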

Relative paths and is_relative_to#

p.relative_to(base) returns the path of p relative to base, raising ValueError if p is not under base. is_relative_to(base) (Python 3.9+) is the boolean form — no exception, just True/False.

from pathlib import Path

root = Path("/srv/myapp")
log  = Path("/srv/myapp/logs/2026/app.log")

print(log.relative_to(root))
print(log.is_relative_to(root))
print(log.is_relative_to("/etc"))

Output:

logs/2026/app.log
True
False

This is the standard guard against directory-traversal attacks: validate that user-supplied paths resolve to inside an allowed base before opening them.

from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads").resolve()

def safe_read(rel: str) -> str:
    p = (UPLOAD_ROOT / rel).resolve()
    if not p.is_relative_to(UPLOAD_ROOT):
        raise PermissionError(f"refusing to access {p}")
    return p.read_text()

Common pitfalls#

[!WARNING] Path is immutable — methods like with_suffix, with_name, and joinpath return new objects. p.with_suffix(".bak") alone does nothing; assign the result: p = p.with_suffix(".bak").

[!WARNING] exists() follows symlinks — a broken symlink returns False. Use p.is_symlink() (does not follow) to detect the symlink itself, even if its target is missing.

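[!WARNING] Path("") is the current directory: an empty string normalises to Path("."), so str(Path("")) is "." and Path("") == Path("."). An empty value from config or user input therefore silently points at the working directory. Check for empty strings before building a path, and write Path.cwd() or Path(".") when you mean the current directory.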

[!WARNING] rename can clobber silently on POSIX — if the destination exists, POSIX rename(2) overwrites it. Use replace() for explicit overwrite or check existence first.

[!WARNING] p.stat() raises FileNotFoundError for missing paths — there is no “soft” stat. Either check exists() first or wrap in try/except FileNotFoundError.

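[!WARNING] rglob and symlinked directories: when expanding ** (which rglob does implicitly), pathlib does not descend into symlinked directories by default, so files reachable only through a symlink are skipped. Python 3.13+ accepts recurse_symlinks=True to follow them, at the risk of looping on symlink cycles. Path.walk() has its own follow_symlinks flag, also False by default.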
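
Two of the warnings above (broken symlinks, and stat() raising on missing paths) can be handled with small helpers. This is a sketch with hypothetical names soft_stat and is_broken_symlink; it assumes a platform where symlinks are available.

```python
import os
from pathlib import Path
from typing import Optional


def soft_stat(p: Path) -> Optional[os.stat_result]:
    """Return p.stat(), or None when the path (or a symlink's target) is missing."""
    try:
        return p.stat()
    except FileNotFoundError:
        return None


def is_broken_symlink(p: Path) -> bool:
    """True when p is a symlink whose target does not exist."""
    # is_symlink() inspects the link itself; exists() follows it
    return p.is_symlink() and not p.exists()
```

Catching the exception is generally safer than calling exists() first, because the file can disappear between the check and the stat.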

Real-world recipes#

Recursive size of a directory#

Summing stat().st_size across rglob("*") is the pathlib idiom for “how big is this directory?”. Filter is_file() to skip the directory entries themselves.

from pathlib import Path

def tree_size(root: Path) -> int:
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

print(f"{tree_size(Path('/tmp/demo_project'))} bytes")

Output:

40 bytes

Batch rename — date-prefix every file#

Walk a directory, build the new name with with_name, and call rename. Always print the planned rename first in dry-run mode before applying.

from pathlib import Path
from datetime import datetime

def date_prefix(root: Path, dry_run=True):
    today = datetime.now().strftime("%Y-%m-%d")
    for p in sorted(root.iterdir()):
        if p.is_file() and not p.name.startswith(today):
            target = p.with_name(f"{today}_{p.name}")
            print(f"{'DRY' if dry_run else 'MV '} {p} -> {target}")
            if not dry_run:
                p.rename(target)

date_prefix(Path("/tmp/inbox"), dry_run=True)

Output:

DRY /tmp/inbox/notes.md -> /tmp/inbox/2026-05-25_notes.md
DRY /tmp/inbox/photo.jpg -> /tmp/inbox/2026-05-25_photo.jpg

Find the largest N files under a tree#

Feed a generator of (size, path) tuples from rglob to heapq.nlargest for an O(n log k) top-K that never holds the full file list in memory.

import heapq
from pathlib import Path

def largest(root: Path, n: int = 10):
    files = ((p.stat().st_size, p) for p in root.rglob("*") if p.is_file())
    for size, p in heapq.nlargest(n, files, key=lambda pair: pair[0]):
        print(f"{size:>10}  {p}")

largest(Path.home(), n=5)

Output:

  10240000  /home/alice/Videos/clip.mp4
   2456789  /home/alice/Downloads/dataset.csv
    893456  /home/alice/Pictures/photo.jpg
    102400  /home/alice/notes.tar.gz
     45120  /home/alice/.bash_history

Atomic config write#

Write to a temp sibling and replace the target. This guarantees readers never see a partially written file. Pair with os.fsync if you need durability across power loss.

from pathlib import Path
import json, os

def write_atomic(path: Path, data: dict) -> None:
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(path)

write_atomic(Path("/tmp/config.json"), {"host": "myhost", "port": 8080})

Find git repos under a tree#

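Recursively locate every directory containing a .git subdirectory. rglob(".git") is the shortest way to write this, but it cannot prune: it still descends into each repository's working tree and into .git itself. On large trees, a walk that stops descending once .git is found (os.walk, or Path.walk() on 3.12+) is faster.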

from pathlib import Path

def find_repos(root: Path):
    for p in root.rglob(".git"):
        if p.is_dir():
            yield p.parent

for repo in find_repos(Path.home() / "Code"):
    print(repo)

Output:

/home/alice/Code/jockey
/home/alice/Code/dotfiles
/home/alice/Code/notes
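
A pruning variant can be sketched with os.walk, which works on all supported Python versions. The name find_repos_pruned is hypothetical. As a consequence of pruning, repositories nested inside another repository's working tree are not reported.

```python
import os
from pathlib import Path
from typing import Iterator


def find_repos_pruned(root: Path) -> Iterator[Path]:
    """Yield directories that contain a .git directory, without descending into them."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".git" in dirnames:
            yield Path(dirpath)
            # Clearing the list in place tells os.walk to skip every
            # subdirectory of this repo, including .git itself.
            dirnames[:] = []
```

os.walk lists only directories in dirnames, so a .git file (as used by some worktree and submodule layouts) does not count as a match here.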