A pattern-matching mini-language for searching, validating, and rewriting text — implemented (with subtly different dialects) by every modern language and CLI tool.
Parse, search, and mutate HTML/XML with BeautifulSoup 4. Covers parser choice (html.parser/lxml/html5lib), find/find_all/select, tree navigation, attribute access, and pairing with requests/httpx/playwright for end-to-end scraping.
Work with dates, times, and timezones in Python using the stdlib datetime module and zoneinfo. Covers aware vs naive datetimes, ISO-8601 parsing, strftime/strptime, timedelta arithmetic, and DST handling.
Encode and decode JSON in Python with the stdlib json module. Covers dumps/loads, indent/sort_keys/separators, custom default= and JSONEncoder, object_hook decoding, JSONL streaming, and orjson/ujson/msgspec comparison.
Extract structured text from PDFs, Word docs, HTML, images, and more with the unstructured library. Covers partitioning, chunking, cleaning, metadata, and pipeline integrations.
navigation
actions
cheat sheet pages