dataclasses — Boilerplate-Free Classes

Define typed data containers with @dataclass — frozen, slots, kw_only, default_factory, __post_init__, asdict, replace, and how it compares to attrs, pydantic, NamedTuple, TypedDict.

What it is

dataclasses is the standard library module added in Python 3.7 (PEP 557) that auto-generates __init__, __repr__, __eq__, and optionally __hash__ / __lt__ from class-level type annotations. Reach for it when you want a typed record (a config, a row, a DTO, a message) without writing constructor boilerplate, and you don’t need runtime validation. For runtime validation against types use pydantic; for finer control over the generated methods use attrs. dataclasses lives in between — zero dependencies, pure Python, no coercion.

Install

dataclasses is part of the Python standard library (3.7+) and requires no installation. Verify it loads:

python -c "from dataclasses import dataclass; print(dataclass)"

Output:

<function dataclass at 0x7f9c1a2b3e60>

Quick example

The @dataclass decorator inspects the class body’s annotated attributes and synthesizes the dunder methods you’d otherwise write by hand. The annotations are not enforced at runtime — they’re hints for tools (mypy, IDEs) and metadata for dataclasses.fields().

from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    age: int = 0
    active: bool = True

u = User(name="Alice Dev", email="alice@example.com", age=30)
print(u)
print(u == User("Alice Dev", "alice@example.com", 30))

Output:

User(name='Alice Dev', email='alice@example.com', age=30, active=True)
True
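
To see what the decorator actually produced, you can inspect the class. In this sketch, which re-declares the User class above so it runs on its own, the generated methods sit in the class namespace and inspect.signature reports the synthesized __init__ parameters in field order.

```python
from dataclasses import dataclass
import inspect

@dataclass
class User:
    name: str
    email: str
    age: int = 0
    active: bool = True

# The synthesized dunders are ordinary functions stored on the class itself.
generated = [name for name in ("__init__", "__repr__", "__eq__") if name in vars(User)]
print(generated)   # ['__init__', '__repr__', '__eq__']

# The generated __init__ takes one parameter per field, in declaration order.
print(list(inspect.signature(User).parameters))   # ['name', 'email', 'age', 'active']
```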

What the decorator generates

By default @dataclass synthesizes __init__, __repr__, and __eq__. Each can be turned off via decorator arguments; additional methods are opt-in. The full set of toggles is:

| Argument | Default | Effect |
|---|---|---|
| init | True | Generate __init__ |
| repr | True | Generate __repr__ |
| eq | True | Generate __eq__ |
| order | False | Generate __lt__, __le__, __gt__, __ge__ |
| unsafe_hash | False | Force __hash__ even when not safe |
| frozen | False | Make instances immutable (no attribute assignment) |
| match_args | True | Generate __match_args__ for structural pattern matching (3.10+) |
| kw_only | False | All fields are keyword-only (3.10+) |
| slots | False | Generate __slots__ (3.10+) |
| weakref_slot | False | Add a __weakref__ slot when slots=True (3.11+) |

from dataclasses import dataclass

@dataclass(order=True, frozen=True, slots=True)
class Point:
    x: int
    y: int

a, b = Point(1, 2), Point(3, 4)
print(sorted([b, a]))

Output:

[Point(x=1, y=2), Point(x=3, y=4)]
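
The eq, frozen, and unsafe_hash flags interact when it comes to __hash__: with the defaults (eq=True, frozen=False) the decorator sets __hash__ to None, so instances are unhashable; frozen=True generates a hash built from the fields; unsafe_hash=True forces one onto a mutable class. A small sketch with three hypothetical classes:

```python
from dataclasses import dataclass

@dataclass                      # eq=True, frozen=False: __hash__ is set to None
class Mutable:
    x: int

@dataclass(frozen=True)         # eq=True, frozen=True: hash computed from the fields
class Frozen:
    x: int

@dataclass(unsafe_hash=True)    # hash forced on a mutable class: never mutate it while it is a key
class Forced:
    x: int

print(Mutable.__hash__)                       # None
print(hash(Frozen(1)) == hash(Frozen(1)))     # True: equal instances hash equally

try:
    hash(Mutable(1))
    mutable_hashable = True
except TypeError as e:
    mutable_hashable = False
    print("unhashable:", e)

print(len({Forced(1), Forced(1)}))            # 1: equal and hashable, so deduplicated
```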

field(): per-field configuration

When the class default isn't enough (mutable defaults, init exclusion, hash exclusion, metadata), wrap the default in field(...). It's the per-attribute equivalent of the decorator arguments.

| field() argument | Effect |
|---|---|
| default | Default value |
| default_factory | Zero-arg callable for the default (lists, dicts, sets, dataclasses) |
| init | If False, exclude from __init__ |
| repr | If False, exclude from __repr__ |
| compare | If False, exclude from __eq__ and the ordering methods (__lt__ etc.) |
| hash | If False, exclude from __hash__ |
| metadata | Arbitrary mapping kept (read-only) on the Field object; use it for docs, validators, serializers |
| kw_only | Per-field keyword-only flag (3.10+) |

from dataclasses import dataclass, field

@dataclass
class Cart:
    user_id: int
    items: list[str] = field(default_factory=list)
    discount: float = 0.0
    _cache: dict = field(default_factory=dict, repr=False, compare=False)

c = Cart(user_id=42)
c.items.append("book")
print(c)

Output:

Cart(user_id=42, items=['book'], discount=0.0)

[!WARNING] Mutable defaults raise. Writing items: list = [] raises ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory. The error catches the classic shared-state bug at class-creation time.
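
The check can be observed by defining the offending class inside a try block. The same sketch also shows that default_factory hands every instance its own fresh list; Bad and Good are purely illustrative names.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Bad:
        items: list = []            # rejected while the class is being created
except ValueError as e:
    error = str(e)
    print(error)

@dataclass
class Good:
    items: list = field(default_factory=list)   # list() is called once per instance

a, b = Good(), Good()
a.items.append("x")
print(a.items, b.items)             # ['x'] []
```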

__post_init__: the validation hook

__post_init__ runs automatically at the end of the generated __init__, after all fields are assigned. Use it for validation, normalization, or computing derived fields. Combine with field(init=False) to populate an attribute the user never passes.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Order:
    id: int
    items: list[str]
    total: float
    created_at: datetime = field(init=False)

    def __post_init__(self) -> None:
        if self.total < 0:
            raise ValueError("total cannot be negative")
        self.created_at = datetime.now()

o = Order(id=1, items=["a", "b"], total=19.99)
print(o.created_at.year, o.total)

Output:

2026 19.99
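
__post_init__ is also a natural place for normalization. A minimal sketch, using a hypothetical Contact class, that cleans up an input field, rejects bad values, and then fills a derived init=False field:

```python
from dataclasses import dataclass, field

@dataclass
class Contact:
    email: str
    domain: str = field(init=False)   # derived, never passed by the caller

    def __post_init__(self) -> None:
        # Normalize first, then validate, then derive.
        self.email = self.email.strip().lower()
        if "@" not in self.email:
            raise ValueError(f"not an email address: {self.email!r}")
        self.domain = self.email.split("@", 1)[1]

c = Contact("  Alice@Example.COM ")
print(c)                     # Contact(email='alice@example.com', domain='example.com')

try:
    Contact("nobody")
    rejected = False
except ValueError as e:
    rejected = True
    print(e)                 # not an email address: 'nobody'
```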

InitVar: pseudo-fields visible only to __post_init__

InitVar[T] is an annotation that creates a constructor parameter without storing it as an attribute. The value is forwarded to __post_init__ and then discarded. Useful for derivation inputs (e.g. raw password -> hashed) that should never persist on the instance.

from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class User:
    username: str
    password_hash: str = field(init=False)
    password: InitVar[str]              # no default: an InitVar default would stay behind as a class attribute

    def __post_init__(self, password: str) -> None:
        self.password_hash = hashlib.sha256(password.encode()).hexdigest()

u = User(username="alicedev", password="hunter2")
print(u)                                # no `password` attr
print(hasattr(u, "password"))           # False

Output:

User(username='alicedev', password_hash='f52fbd32b2b3b86ff88ef6c490628285f482af15ddcb29541f94bcf526a3f6c7')
False
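
Two details are worth knowing: InitVar pseudo-fields never show up in fields(), and a default given to an InitVar is left on the class as an ordinary class attribute, so attribute lookup on an instance still finds it. A sketch with a hypothetical Scaled class:

```python
from dataclasses import dataclass, field, fields, InitVar

@dataclass
class Scaled:
    value: int = field(init=False)
    factor: InitVar[int] = 2      # the default 2 remains as a plain class attribute

    def __post_init__(self, factor: int) -> None:
        self.value = 10 * factor

s = Scaled(factor=3)
print(s)                                 # Scaled(value=30)
print([f.name for f in fields(Scaled)])  # ['value']: InitVars are not fields
print(Scaled.factor, s.factor)           # 2 2: the class-level default, not the 3 passed in
```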

frozen=True: immutable instances

frozen=True makes attribute assignment raise FrozenInstanceError. Frozen dataclasses are hashable by default (so they can live in sets and dict keys) — exactly what you want for config objects, message envelopes, and value types.

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class AppConfig:
    debug: bool
    host: str = "127.0.0.1"
    port: int = 8000

cfg = AppConfig(debug=False)
try:
    cfg.port = 9000
except FrozenInstanceError as e:
    print("immutable:", e)

# Hashable -> usable as dict key
configs = {cfg: "default"}
print(configs)

Output:

immutable: cannot assign to field 'port'
{AppConfig(debug=False, host='127.0.0.1', port=8000): 'default'}

[!NOTE] frozen=True is shallow: the attribute bindings are frozen, the objects they point to are not. If a field holds a list, obj.items.append(...) still works. Use tuple / frozenset (or nested frozen dataclasses) for full immutability.
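
A quick sketch of the shallow freeze, with hypothetical Shallow and Deep classes. Besides allowing in-place mutation, a list field also makes the generated __hash__ fail, because that hash is computed from the field values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Shallow:
    names: list[str]

@dataclass(frozen=True)
class Deep:
    names: tuple[str, ...]

s = Shallow(["a"])
s.names.append("b")           # allowed: only the attribute binding is frozen
print(s)                      # Shallow(names=['a', 'b'])

try:
    hash(s)                   # hashes the field values, and a list has no hash
    shallow_hashable = True
except TypeError as e:
    shallow_hashable = False
    print("unhashable:", e)

d = Deep(("a", "b"))
print(hash(d) == hash(Deep(("a", "b"))))   # True
```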

slots=True: memory and speed win (3.10+)

slots=True generates __slots__ and removes the per-instance __dict__. The benefits: noticeably less memory per instance, somewhat faster attribute access (the exact gains depend on the Python version and the number of fields), and a guard against typo attributes (u.naem = "..." raises AttributeError instead of silently creating a new attribute).

from dataclasses import dataclass
import sys

@dataclass
class WithoutSlots:
    x: int
    y: int

@dataclass(slots=True)
class WithSlots:
    x: int
    y: int

a, b = WithoutSlots(1, 2), WithSlots(1, 2)
print("__dict__ ->", sys.getsizeof(a.__dict__))   # exact size varies by Python version
try:
    print(b.__dict__)
except AttributeError as e:
    print("slotted has no __dict__:", e)

try:
    b.z = 3
except AttributeError as e:
    print("typo guard:", e)

Output:

__dict__ -> 296
slotted has no __dict__: 'WithSlots' object has no attribute '__dict__'
typo guard: 'WithSlots' object has no attribute 'z'

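[!WARNING] slots=True creates a brand-new class and returns it in place of the one you decorated. Decorators written above @dataclass(slots=True) run after it and receive the slotted class; decorators written below it run first and see the original, as does any reference to the original class captured before decoration (for example in a registry or cache).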

kw_only=True: force keyword-only arguments (3.10+)

At the decorator level, kw_only=True makes every field keyword-only in __init__. At the field level, field(kw_only=True) marks individual fields as keyword-only, which is handy for adding a field without a default after fields that have one, as in a subclass of a base class that ends with defaulted fields.

from dataclasses import dataclass

@dataclass(kw_only=True)
class Request:
    method: str
    path: str
    body: bytes = b""
    timeout: float = 30.0

# Positional args now raise
try:
    Request("GET", "/")
except TypeError as e:
    print(e)

r = Request(method="GET", path="/")
print(r)

Output:

Request.__init__() takes 1 positional argument but 3 were given
Request(method='GET', path='/', body=b'', timeout=30.0)

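The per-field form solves a common inheritance headache: when a base class ends with a defaulted field, a subclass can still add a required field by making it keyword-only.

from dataclasses import dataclass, field

@dataclass
class Base:
    name: str
    enabled: bool = True                # base class ends with a defaulted field

@dataclass
class Child(Base):
    note: str = field(kw_only=True)     # required; legal only because it is keyword-only

print(Child("widget", note="hello"))

Output:

Child(name='widget', enabled=True, note='hello')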

order=True: comparison operators

order=True synthesizes <, <=, >, >= by comparing the fields as a tuple, in declaration order. Combined with frozen=True, dataclasses become hashable, orderable value types: a readable stand-in for plain tuples with named fields, though instances are not indexable or unpackable the way tuples are.

from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 9, 0)
v2 = Version(1, 10, 0)
print(v1 < v2)
print(sorted([v2, v1, Version(2, 0, 0)]))

Output:

True
[Version(major=1, minor=9, patch=0), Version(major=1, minor=10, patch=0), Version(major=2, minor=0, patch=0)]
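
Two refinements follow from the same rules: field(compare=False) drops a field from both equality and ordering, and the generated operators refuse to compare against other types. The sketch below (a hypothetical Task class) uses this for a heapq priority queue, the same pattern the heapq documentation suggests for prioritized items.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)   # ignored by ==, <, <=, >, >=

tasks: list[Task] = []
for t in (Task(2, "write docs"), Task(1, "fix bug"), Task(3, "refactor")):
    heapq.heappush(tasks, t)

first = heapq.heappop(tasks)
print(first.name)                      # fix bug
print(Task(1, "a") == Task(1, "b"))    # True: only priority is compared

try:
    Task(1, "a") < (1,)                # the generated operators only compare the same class
    cross_type_ok = True
except TypeError as e:
    cross_type_ok = False
    print(e)
```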

asdict / astuple: convert to plain containers

dataclasses.asdict(obj) recursively converts a dataclass into a plain dict: nested dataclasses become dicts, lists, tuples, and dicts are rebuilt, and any other value is copied with copy.deepcopy. astuple does the same but produces tuples. The result feeds straight into json.dumps; pass default=str (or a custom default function) for non-JSON-native types such as datetime.

from dataclasses import dataclass, field, asdict, astuple
import json

@dataclass
class Address:
    street: str
    city: str

@dataclass
class User:
    name: str
    address: Address
    tags: list[str] = field(default_factory=list)

u = User("Alice Dev", Address("1 Main St", "Springfield"), ["admin"])
print(asdict(u))
print(astuple(u))
print(json.dumps(asdict(u)))

Output:

{'name': 'Alice Dev', 'address': {'street': '1 Main St', 'city': 'Springfield'}, 'tags': ['admin']}
('Alice Dev', ('1 Main St', 'Springfield'), ['admin'])
{"name": "Alice Dev", "address": {"street": "1 Main St", "city": "Springfield"}, "tags": ["admin"]}

replace: copy with overrides

dataclasses.replace(obj, **changes) creates a new instance with the listed fields overridden — the equivalent of namedtuple._replace. Indispensable for frozen dataclasses where you can’t mutate in place.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Window:
    title: str
    width: int = 800
    height: int = 600

w = Window("editor")
w2 = replace(w, title="editor (modified)", width=1024)
print(w)
print(w2)

Output:

Window(title='editor', width=800, height=600)
Window(title='editor (modified)', width=1024, height=600)
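
Because replace() builds the new object through the class's __init__, __post_init__ validation runs again, and fields declared with init=False cannot be overridden. A sketch with a hypothetical Temperature class:

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Temperature:
    celsius: float
    label: str = field(init=False, default="reading")

    def __post_init__(self) -> None:
        if self.celsius < -273.15:
            raise ValueError("below absolute zero")

t = Temperature(21.5)
errors = []
for changes in ({"celsius": -300}, {"label": "other"}):
    try:
        replace(t, **changes)
    except ValueError as e:
        errors.append(str(e))

# 1) replace() calls __init__, so __post_init__ rejects the bad value
# 2) fields declared with init=False cannot be passed to replace()
for message in errors:
    print(message)

print(replace(t, celsius=0.0))   # Temperature(celsius=0.0, label='reading')
```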

fields(): introspection

dataclasses.fields(cls) returns a tuple of Field objects describing every declared field — name, type, default, metadata. Useful for writing serializers, form builders, or CLI generators that walk a dataclass at runtime.

from dataclasses import dataclass, field, fields

@dataclass
class Setting:
    key: str
    value: str = field(metadata={"help": "the value to set"})
    sensitive: bool = field(default=False, metadata={"help": "redact in logs"})

for f in fields(Setting):
    print(f"{f.name:<10} {f.type!s:<10} default={f.default!r:<10} {f.metadata}")

Output:

key        <class 'str'> default=<dataclasses._MISSING_TYPE object at 0x7f9c...> {}
value      <class 'str'> default=<dataclasses._MISSING_TYPE object at 0x7f9c...> {'help': 'the value to set'}
sensitive  <class 'bool'> default=False      {'help': 'redact in logs'}
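
The MISSING sentinel in that output is public as dataclasses.MISSING, so a generic helper can tell required fields from optional ones. A sketch, where required_fields is a hypothetical helper and a tags field is added to Setting to show that default_factory also counts as a default:

```python
from dataclasses import dataclass, field, fields, MISSING

@dataclass
class Setting:
    key: str
    value: str = field(metadata={"help": "the value to set"})
    sensitive: bool = field(default=False, metadata={"help": "redact in logs"})
    tags: list[str] = field(default_factory=list)

def required_fields(cls) -> list[str]:
    """Names the caller must supply: init fields with neither default nor default_factory."""
    return [
        f.name
        for f in fields(cls)
        if f.init and f.default is MISSING and f.default_factory is MISSING
    ]

print(required_fields(Setting))   # ['key', 'value']
print({f.name: f.metadata.get("help", "") for f in fields(Setting)})
```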

Inheritance

A subclass @dataclass can add fields and override defaults. Fields are collected base class first, so among the positional __init__ parameters every field without a default must come before any field with a default; otherwise class creation fails with a TypeError saying a non-default argument follows a default argument. Making the subclass fields keyword-only with kw_only=True escapes this constraint cleanly.

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    species: str = "unknown"            # parent ends with a defaulted field

@dataclass(kw_only=True)
class Pet(Animal):
    owner: str                          # required; legal only because it is keyword-only
    nickname: str = ""

p = Pet(name="Whiskers", species="cat", owner="alicedev", nickname="Mr. W")
print(p)

Output:

Pet(name='Whiskers', species='cat', owner='alicedev', nickname='Mr. W')

Structural pattern matching (3.10+)

@dataclass generates __match_args__ from the positional __init__ parameters, enabling match/case patterns by class and field. This is the cleanest way to dispatch on a sum-type style hierarchy.

from dataclasses import dataclass

@dataclass
class Move:
    dx: int
    dy: int

@dataclass
class Quit:
    reason: str

def step(event):
    match event:
        case Move(dx=0, dy=dy):
            return f"vertical {dy}"
        case Move(dx, dy):
            return f"step {dx},{dy}"
        case Quit(reason):
            return f"quit: {reason}"

print(step(Move(0, 5)))
print(step(Move(3, -2)))
print(step(Quit("user-asked")))

Output:

vertical 5
step 3,-2
quit: user-asked

dataclasses vs alternatives

The Python ecosystem now offers half a dozen ways to define a record type. The right choice depends on whether you want runtime validation, immutability, performance, or zero dependencies.

| Tool | Validates at runtime | Frozen by default | Mutable defaults | Subclass-able | Notes |
|---|---|---|---|---|---|
| dataclasses (stdlib) | No | No (opt-in) | default_factory | Yes | Zero deps, minimal magic |
| pydantic v2 | Yes | No (opt-in) | default_factory; plain mutable defaults are copied per instance | Yes | Coerces; great for I/O |
| attrs | Optional via validator= | Opt-in | factory= / Factory | Yes | Predecessor; slots by default with @define; richer config |
| msgspec.Struct | On decode (fast) | Opt-in | Yes | Yes | Built for fast serialization |
| typing.NamedTuple | No | Yes (tuple) | No | Limited | Indexable; immutable tuple |
| typing.TypedDict | No (type checkers only) | No (it's a dict) | n/a | Yes (multiple inheritance) | Annotation for dict shape |
| collections.namedtuple | No | Yes | No | Limited | Untyped, pre-PEP 526; avoid for new code |
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:
    name: str
    age: int = 0

class UserNT(NamedTuple):
    name: str
    age: int = 0

class UserTD(TypedDict):
    name: str
    age: int

dc = UserDC("Alice", 30)
nt = UserNT("Alice", 30)
td: UserTD = {"name": "Alice", "age": 30}

print(dc.name, nt[0], td["name"])
print(type(dc).__name__, type(nt).__name__, type(td).__name__)

Output:

Alice Alice Alice
UserDC UserNT dict
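
Because all three describe the same named fields, converting between these shapes is mostly keyword unpacking. A sketch that re-declares the three User types above:

```python
from dataclasses import dataclass, asdict
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:
    name: str
    age: int = 0

class UserNT(NamedTuple):
    name: str
    age: int = 0

class UserTD(TypedDict):
    name: str
    age: int

dc = UserDC("Alice", 30)

# dataclass -> TypedDict: calling a TypedDict class just builds a plain dict
td: UserTD = UserTD(**asdict(dc))
# dataclass -> NamedTuple, and back again via NamedTuple._asdict()
nt = UserNT(**asdict(dc))
back = UserDC(**nt._asdict())

print(td, nt, back == dc)   # {'name': 'Alice', 'age': 30} UserNT(name='Alice', age=30) True
```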

When to pick which

  • dataclasses: internal records, configs, CLI option objects, message envelopes. No external data crossing the boundary, no need to validate.
  • pydantic: anything parsed from JSON/YAML/env vars/HTTP. Use it when bad input is a runtime concern.
  • msgspec.Struct: high-volume serialization (event streams, MQ payloads). It is designed for encode/decode throughput.
  • NamedTuple: when you also want tuple unpacking and indexing semantics.
  • TypedDict: when the wire format is already a dict and you want type hints without converting.

Common pitfalls

  1. Bare mutable defaults are an error: items: list = [] raises ValueError at class creation. Use field(default_factory=list).
  2. Field ordering with defaults and inheritance: a default-less field in a subclass after a default-bearing field in the parent raises TypeError. Use kw_only=True.
  3. frozen=True is shallow: references to mutable objects can still be mutated through. Combine with immutable types (tuple, frozenset, frozen dataclass) for transitively immutable values.
  4. __eq__ requires the exact same class: the generated __eq__ returns NotImplemented for any other class, so instances of two different dataclasses with identical fields (or of a subclass and its parent) are not equal. Compare asdict() output or use a single class.
  5. A hand-written __init__ replaces the generated one: @dataclass never overwrites a method you define in the class body, so your __init__ is kept and the default_factory defaults and the __post_init__ call are silently lost unless you reimplement them. Use __post_init__ instead.
  6. asdict is recursive and deep-copies: for large nested structures this is expensive. Use dataclasses.fields() plus manual conversion for hot paths.
  7. Type annotations are not enforced: User(name=42, email=None, age="thirty") succeeds. Use pydantic or write a __post_init__ validator if you need runtime checks.
  8. ClassVar and InitVar are special: ClassVar[T] is skipped by field generation entirely; InitVar[T] creates an __init__ argument but no attribute (unless it has a default, which stays behind as a class attribute). Forgetting either makes things appear or disappear unexpectedly.
  9. slots=True returns a new class: the name bound by the class statement refers to a replacement class, not the one originally created. Code that captured the original class object (caches or registries keyed by class, decorators applied before @dataclass) ends up holding a different object.
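
Two of these pitfalls, the hand-written __init__ and ClassVar, are easy to confirm. In this sketch (Custom and Counter are hypothetical classes), the hand-written __init__ is kept so the default_factory never runs, and the ClassVar annotation produces no field:

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar

@dataclass
class Custom:
    items: list[str] = field(default_factory=list)

    def __init__(self) -> None:      # kept by @dataclass; the generated __init__ is skipped
        pass

c = Custom()
print(hasattr(c, "items"))           # False: default_factory never ran

@dataclass
class Counter:
    total: int = 0
    instances: ClassVar[int] = 0     # ClassVar: shared class attribute, not a field

print([f.name for f in fields(Counter)])   # ['total']
```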

Real-world recipes

Config object loaded from env vars (frozen + slotted)

A common deployment pattern: load every setting from environment variables once at startup, freeze the result so no code can accidentally mutate it, and slot it for compactness.

from dataclasses import dataclass
import os

def _bool_env(key: str, default: bool = False) -> bool:
    return os.environ.get(key, str(default)).lower() in {"1", "true", "yes", "on"}

@dataclass(frozen=True, slots=True, kw_only=True)
class AppConfig:
    database_url: str
    redis_url: str = "redis://localhost:6379/0"
    debug: bool = False
    log_level: str = "INFO"
    workers: int = 4
    allowed_hosts: tuple[str, ...] = ()

    @classmethod
    def from_env(cls) -> "AppConfig":
        hosts = os.environ.get("ALLOWED_HOSTS", "")
        return cls(
            database_url=os.environ["DATABASE_URL"],
            redis_url=os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
            debug=_bool_env("DEBUG"),
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
            workers=int(os.environ.get("WORKERS", "4")),
            allowed_hosts=tuple(h.strip() for h in hosts.split(",") if h.strip()),
        )

os.environ["DATABASE_URL"] = "postgres://alicedev@myhost.local/app"
cfg = AppConfig.from_env()
print(cfg)

Output:

AppConfig(database_url='postgres://alicedev@myhost.local/app', redis_url='redis://localhost:6379/0', debug=False, log_level='INFO', workers=4, allowed_hosts=())

Round-trip through JSON

A dataclass round-trips cleanly through JSON if every field is a JSON-native type or has a known converter. Build from_dict and to_json helpers and you have a zero-dependency serializer.

from dataclasses import dataclass, field, asdict
from datetime import datetime
import json

@dataclass
class Event:
    id: int
    name: str
    created_at: datetime = field(default_factory=datetime.now)
    tags: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), default=str)

    @classmethod
    def from_dict(cls, d: dict) -> "Event":
        return cls(
            id=d["id"],
            name=d["name"],
            created_at=datetime.fromisoformat(d["created_at"]),
            tags=list(d.get("tags", [])),
        )

e = Event(id=1, name="signup", tags=["alpha"])
blob = e.to_json()
print(blob)
print(Event.from_dict(json.loads(blob)))

Output:

{"id": 1, "name": "signup", "created_at": "2026-05-25 15:01:11.842371", "tags": ["alpha"]}
Event(id=1, name='signup', created_at=datetime.datetime(2026, 5, 25, 15, 1, 11, 842371), tags=['alpha'])
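
Hand-writing from_dict for every class gets repetitive. A generic alternative, sketched below under the assumption that the payload's values already have the right types, uses fields() to keep only the keys the constructor accepts, so extra keys in the JSON do not raise TypeError. from_dict_lenient and Signup are hypothetical names; values such as datetimes would still need per-field conversion like the from_dict above.

```python
from dataclasses import dataclass, field, fields
from typing import Any, TypeVar

T = TypeVar("T")

def from_dict_lenient(cls: type[T], data: dict[str, Any]) -> T:
    """Hypothetical helper: build `cls` from `data`, ignoring keys that are not init fields."""
    names = {f.name for f in fields(cls) if f.init}
    return cls(**{k: v for k, v in data.items() if k in names})

@dataclass
class Signup:
    id: int
    name: str
    tags: list[str] = field(default_factory=list)

payload = {"id": 1, "name": "signup", "tags": ["alpha"], "schema_version": 2}
print(from_dict_lenient(Signup, payload))
# Signup(id=1, name='signup', tags=['alpha'])
```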

Pattern-matched message dispatch

A frozen dataclass per message variant + match is the idiomatic way to write a state machine, parser, or reducer. The code reads top-down like a spec.

from dataclasses import dataclass

@dataclass(frozen=True)
class Connect:
    host: str
    port: int

@dataclass(frozen=True)
class Send:
    payload: bytes

@dataclass(frozen=True)
class Disconnect:
    reason: str

def handle(msg):
    match msg:
        case Connect(host=h, port=p):
            return f"opening connection to {h}:{p}"
        case Send(payload=b) if len(b) > 1024:
            return f"chunking {len(b)} bytes"
        case Send(payload=b):
            return f"sending {len(b)} bytes"
        case Disconnect(reason=r):
            return f"closed: {r}"

print(handle(Connect("myhost.local", 5432)))
print(handle(Send(b"x" * 2048)))
print(handle(Disconnect("timeout")))

Output:

opening connection to myhost.local:5432
chunking 2048 bytes
closed: timeout

Diff two configs

Walking fields() of a dataclass produces a compact, reusable diff for any pair of same-typed instances. Useful for “what changed?” log lines on reload.

from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Config:
    debug: bool = False
    workers: int = 4
    host: str = "127.0.0.1"

def diff(a, b) -> dict:
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }

old = Config()
new = Config(debug=True, workers=8)
print(diff(old, new))

Output:

{'debug': (False, True), 'workers': (4, 8)}