pandas — DataFrames#

What it is#

pandas provides two main structures: DataFrame (2-D labeled table) and Series (1-D labeled array). It is the standard library for reading, cleaning, transforming, and analyzing tabular data in Python.

Install#

pip install pandas

Quick example#

import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "dept": ["Eng", "Eng", "HR"],
    "score": [92, 78, 85],
})
print(df[df["score"] > 80])

Output:

      name dept  score
0    Alice  Eng     92
2  Charlie   HR     85

When / why to use it#

Reading CSV, Excel, JSON, or SQL tables: pd.read_csv(), pd.read_excel(), pd.read_sql().
Cleaning and reshaping data before analysis or ML.
Exploratory data analysis with df.describe(), groupby, pivot_table.
Joining multiple datasets: pd.merge().

Common pitfalls#

[!WARNING] SettingWithCopyWarning — modifying a slice of a DataFrame may not modify the original. Use .loc for assignment:
df.loc[df["score"] > 80, "grade"] = "A"  # ✅ correct
df[df["score"] > 80]["grade"] = "A"       # ❌ may silently do nothing

[!WARNING] inplace=True is a trap — many operations accept inplace=True but it doesn’t save memory and makes code harder to chain. Prefer reassignment: df = df.dropna().

[!TIP] Use df.dtypes and df.info() early to understand what you loaded. Numeric columns imported as object (string) will break aggregations silently.

Reading and writing files#

import pandas as pd

# CSV
df = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)

# Excel
df = pd.read_excel("report.xlsx", sheet_name="Sheet1")

# JSON
df = pd.read_json("records.json", orient="records")

# from a SQL query
import sqlite3
conn = sqlite3.connect("app.db")
df = pd.read_sql("SELECT * FROM users WHERE active = 1", conn)

Richer example — groupby and merge#

import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "B", "A", "B", "A"],
    "region": ["East", "East", "West", "West", "East"],
    "amount": [100, 200, 150, 250, 120],
})

# Group by region and product, aggregate
summary = (
    sales.groupby(["region", "product"])["amount"]
    .agg(total="sum", orders="count")
    .reset_index()
)
print(summary)
print()

# Merge with a product lookup table
products = pd.DataFrame({"product": ["A", "B"], "name": ["Widget", "Gadget"]})
merged = summary.merge(products, on="product")
print(merged[["region", "name", "total", "orders"]])

Output:

  region product  total  orders
0   East       A    220       2
1   East       B    200       1
2   West       A    150       1
3   West       B    250       1

  region    name  total  orders
0   East  Widget    220       2
1   East  Gadget    200       1
2   West  Widget    150       1
3   West  Gadget    250       1

Useful operations quick reference#

Task	Code
Filter rows	`df[df["col"] > 5]`
Select columns	`df[["a", "b"]]`
Add column	`df["new"] = df["a"] * 2`
Drop rows with NaN	`df.dropna(subset=["col"])`
Fill NaN	`df["col"].fillna(0)`
Sort	`df.sort_values("col", ascending=False)`
Rename columns	`df.rename(columns={"old": "new"})`
Apply function	`df["col"].apply(lambda x: x.strip())`
Pivot	`df.pivot_table(values="sales", index="region", columns="product", aggfunc="sum")`
Value counts	`df["status"].value_counts()`
Describe	`df.describe()`
Shape	`df.shape`

g h	home
g l	Linux section
g w	Windows section
g z	z/OS section
g o	macOS section
g a	AI section
g p	Python section
g g	graph view
g t	tags index

⌘K / /	open search palette
t	cycle theme (dark → light → system)
?	toggle this panel

[ / ]	previous / next sheet in section
j / k	scroll down / up