numpy — Numerical Arrays#
What it is#
NumPy provides the ndarray — a fixed-type, C-backed multi-dimensional array. It is the foundation of the entire Python scientific stack (pandas, scipy, matplotlib all build on it). Vectorized operations on ndarray are typically 100–1000× faster than equivalent pure-Python loops.
Install#
pip install numpy
Output: (none — exits 0 on success)
Quick example#
import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a * 2)
print(f"mean={a.mean():.1f}, std={a.std():.4f}")
print(np.dot(a, a))
Output:
[ 2 4 6 8 10]
mean=3.0, std=1.4142
55
When / why to use it#
- You need fast numeric computation without pandas overhead.
- Working with images, audio, or other multi-dimensional data (images are just 3-D arrays).
- Linear algebra (matrix multiply, decompositions).
- Feeding data into machine learning libraries (all expect numpy arrays or array-like objects).
Common pitfalls#
[!WARNING]
dtypedefaults can surprise you —np.array([1, 2, 3])createsint64, butnp.zeros((3, 3))createsfloat64. Mixing dtypes in operations silently upcasts. Be explicit:np.array([1, 2], dtype=np.float32).
[!WARNING] Views vs copies — slicing a numpy array returns a view, not a copy. Modifying the slice modifies the original. Use
.copy()when you need independence:b = a[1:3].copy().
[!TIP] Check shapes before operations:
a.shape,a.dtype,a.ndim. Mismatched shapes are the #1 cause ofValueError: operands could not be broadcast.
Array creation#
NumPy provides a family of constructors for the common cases: np.array() converts Python lists; np.zeros/np.ones/np.full fill with a constant; np.arange mirrors Python’s range but returns an array; np.linspace gives evenly spaced floats inclusive of the endpoint. Always specify dtype explicitly when the default (float64 or int64) isn’t what you need.
import numpy as np
np.array([1, 2, 3]) # from list
np.zeros((3, 4)) # 3×4 zeros (float64)
np.ones((2, 2), dtype=int) # 2×2 ones (int)
np.eye(3) # 3×3 identity
np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1.0]
np.random.default_rng(42).random((3, 3)) # random 3×3 (seeded)
Indexing and slicing#
Basic indexing (a[1, 2], a[:, 1]) returns a view — modifying it modifies the original. Boolean masks (a[a > 5]) and fancy indexing (integer arrays) return copies. For multi-dimensional arrays, each comma-separated expression selects along one axis.
a = np.arange(12).reshape(3, 4)
print(a)
print(a[1, 2]) # row 1, col 2
print(a[:, 1]) # all rows, col 1
print(a[a > 5]) # boolean mask → 1-D array
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
6
[1 5 9]
[ 6 7 8 9 10 11]
Richer example — matrix operations and broadcasting#
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix product:")
print(A @ B)
print("\nElement-wise product:")
print(A * B)
# Broadcasting: add a row vector to every row of a matrix
offsets = np.array([10, 20])
print("\nBroadcast add:")
print(A + offsets)
# Trig over evenly spaced angles
x = np.linspace(0, 2 * np.pi, 5)
print("\nsin values:")
print(np.round(np.sin(x), 4))
Output:
Matrix product:
[[19 22]
[43 50]]
Element-wise product:
[[ 5 12]
[21 32]]
Broadcast add:
[[11 22]
[13 24]]
sin values:
[ 0. 1. 0. -1. -0.]
Useful functions quick reference#
| Function | Purpose |
|---|---|
np.concatenate([a, b], axis=0) | Join arrays along an axis |
np.stack([a, b], axis=0) | Stack along a new axis |
np.split(a, 3) | Split into N equal parts |
np.sort(a) | Sorted copy |
np.argsort(a) | Indices that would sort the array |
np.unique(a) | Unique values |
np.where(a > 0, a, 0) | Conditional element selection |
np.clip(a, 0, 1) | Clamp values to [0, 1] |
np.linalg.norm(a) | Euclidean norm |
np.linalg.eig(A) | Eigenvalues and eigenvectors |
np.fft.fft(signal) | Fast Fourier transform |
dtypes in depth#
A NumPy dtype records the binary layout of each element: kind (int, float, complex, bool, bytes, datetime), width in bytes, and byte order. Mismatched dtypes silently upcast (an int8 + float64 returns float64) and can blow up memory by 8x on big arrays, so set the dtype explicitly whenever the default doesn’t match the data.
import numpy as np
print(np.array([1, 2, 3]).dtype) # platform default int (usually int64)
print(np.array([1.0, 2.0]).dtype) # float64
print(np.array([True, False]).dtype) # bool, 1 byte each
print(np.array(["abc", "de"]).dtype) # <U3 (Unicode, width 3)
print(np.zeros(3, dtype=np.float32).itemsize) # 4 bytes per element
print(np.zeros(3, dtype=np.float64).itemsize) # 8 bytes per element
Output:
int64
float64
bool
<U3
4
8
Cast with .astype() — it always returns a copy:
a = np.array([1.7, 2.3, 3.9])
print(a.astype(np.int32)) # truncates toward zero
print(np.round(a).astype(int)) # round-half-to-even, then cast
Output:
[1 2 3]
[2 2 4]
[!TIP] For 100M+ element arrays consider
float32overfloat64andint8/int16overint64— half/quarter the RAM and faster on SIMD hardware. The downside is reduced range and precision.
Broadcasting rules#
Broadcasting is how NumPy applies an operation to arrays of different shapes without materialising an explicit copy. The rules: starting from the right, two dimensions are compatible if they are equal or one of them is 1. A length-1 axis is virtually “stretched” along the other operand’s matching axis. If any non-1 pair disagrees, you get ValueError: operands could not be broadcast together.
import numpy as np
a = np.arange(12).reshape(3, 4) # (3, 4)
row = np.array([10, 20, 30, 40]) # (4,) — stretched along axis 0
col = np.array([[100], [200], [300]]) # (3, 1) — stretched along axis 1
print(a + row)
print()
print(a + col)
Output:
[[10 21 32 43]
[14 25 36 47]
[18 29 40 51]]
[[100 101 102 103]
[204 205 206 207]
[308 309 310 311]]
Use np.newaxis (or None) to introduce a length-1 axis when you need to broadcast against a different axis:
a = np.array([1, 2, 3])
b = np.array([10, 20])
# outer-product style (3,) * (2,) -> (3, 2)
print(a[:, np.newaxis] * b)
Output:
[[10 20]
[20 40]
[30 60]]
Fancy indexing and boolean masks#
Fancy indexing uses integer arrays to pick arbitrary elements; boolean indexing uses a same-shape bool array as a mask. Both return copies — unlike basic slicing, you cannot use them to assign through a view in chained form. Combined indexing (a[idx, :]) can be cryptic, so name your index arrays.
import numpy as np
a = np.arange(10) * 10
# Fancy indexing — pick specific positions
idx = np.array([0, 3, 7, 9])
print(a[idx])
# Boolean mask
mask = (a > 20) & (a < 80)
print(a[mask])
# 2-D fancy indexing — pair rows with cols
m = np.arange(16).reshape(4, 4)
print(m[[0, 2, 3], [1, 1, 3]]) # picks (0,1), (2,1), (3,3)
Output:
[ 0 30 70 90]
[30 40 50 60 70]
[ 1 9 15]
[!WARNING]
a[mask] = 0modifiesain place even thougha[mask]returns a copy, because Python translates this intoa.__setitem__(mask, 0). The same does NOT work fora[mask][:] = 0— the intermediate copy is what gets mutated.
Vectorisation vs Python loops#
A Python-level for loop on an ndarray is 50–500x slower than the equivalent vectorised expression because each iteration crosses the C/Python boundary. Almost any element-wise operation has a ufunc form (+, np.sin, np.where); use it. When you absolutely cannot vectorise, drop into Numba, Cython, or a list comprehension over scalars — never for i in range(len(a)): a[i] = ....
import numpy as np
import time
x = np.arange(1_000_000, dtype=np.float64)
# Slow: Python loop
t0 = time.perf_counter()
out = np.empty_like(x)
for i in range(len(x)):
out[i] = x[i] ** 2 + 1
slow = time.perf_counter() - t0
# Fast: vectorised
t0 = time.perf_counter()
out = x ** 2 + 1
fast = time.perf_counter() - t0
print(f"loop: {slow*1000:.1f} ms")
print(f"vec: {fast*1000:.1f} ms")
print(f"speedup: {slow/fast:.0f}x")
Output (typical):
loop: 240.0 ms
vec: 3.2 ms
speedup: 75x
Linear algebra (np.linalg)#
np.linalg covers the common matrix routines: solve (Ax=b), inv, det, eig, svd, qr, cholesky, and matrix_rank. Always prefer solve(A, b) over inv(A) @ b — it is faster and numerically more stable. For larger problems or specialised decompositions, switch to scipy.linalg, which links to system LAPACK/BLAS.
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print("x =", x)
print("det =", np.linalg.det(A))
print("rank =", np.linalg.matrix_rank(A))
# Singular Value Decomposition
U, S, Vt = np.linalg.svd(A)
print("singular values =", S.round(4))
Output:
x = [2. 3.]
det = 5.0
rank = 2
singular values = [3.6180 1.382 ]
Random numbers — modern Generator API#
The legacy np.random.rand / np.random.seed functions use a global, non-thread-safe state and are deprecated for new code. Use np.random.default_rng(seed) instead — it returns a Generator with explicit state, better statistical quality (PCG64), and faster sampling. Always seed in tests and pipelines so results are reproducible.
import numpy as np
rng = np.random.default_rng(42)
print(rng.random(3)) # uniform [0, 1)
print(rng.integers(0, 10, size=5)) # uniform ints [0, 10)
print(rng.normal(loc=0.0, scale=1.0, size=4)) # N(0, 1)
print(rng.choice(["A", "B", "C"], size=5, p=[0.5, 0.3, 0.2]))
print(rng.permutation(np.arange(5))) # shuffled copy
Output:
[0.77395605 0.43887844 0.85859792]
[1 7 6 4 4]
[ 0.42175859 -1.06560158 0.54470015 0.0717954 ]
['A' 'A' 'B' 'A' 'C']
[2 4 0 3 1]
Structured arrays and record arrays#
A structured dtype lets one ndarray hold heterogeneous fields per element — like a fixed-schema, in-memory database row. Useful for parsing binary file formats and interop with C structs, but for general tabular data, pandas or polars is almost always a better choice.
import numpy as np
dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f4")])
people = np.array(
[("Alice", 30, 65.5), ("Bob", 25, 78.2), ("Carol", 35, 60.1)],
dtype=dt,
)
print(people)
print(people["name"])
print(people["age"].mean())
Output:
[('Alice', 30, 65.5) ('Bob', 25, 78.2) ('Carol', 35, 60.1)]
['Alice' 'Bob' 'Carol']
30.0
Memory layout — C vs Fortran order#
Multi-dimensional arrays store elements as a flat memory buffer; C order (the default) is row-major (last axis varies fastest), F order is column-major (first axis varies fastest, like MATLAB/Fortran). Choosing the right order can speed up cache-friendly access patterns 2–10x; mismatches force a copy when handing arrays to libraries that demand one or the other (e.g. BLAS).
import numpy as np
a = np.arange(12).reshape(3, 4) # C-order (default)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])
print(a.strides) # (32, 8): step 32 bytes per row, 8 per col
b = np.asfortranarray(a)
print(b.flags["C_CONTIGUOUS"], b.flags["F_CONTIGUOUS"])
print(b.strides) # (8, 24): step 8 bytes per row, 24 per col
# Transpose is free — it just re-labels strides, no copy
print(a.T.flags["F_CONTIGUOUS"])
Output:
True False
(32, 8)
False True
(8, 24)
True
Saving and loading#
NumPy’s native binary formats round-trip dtypes and shapes exactly, unlike CSV. Use .npy for a single array, .npz for multiple, and memmap when an array is too big for RAM but you only need to touch a slice at a time. For interop with other languages, prefer HDF5 (via h5py) or Parquet.
import numpy as np
a = np.arange(12).reshape(3, 4)
b = np.linspace(0, 1, 5)
# Single array
np.save("a.npy", a)
loaded = np.load("a.npy")
# Multiple arrays in one file
np.savez("bundle.npz", first=a, second=b)
data = np.load("bundle.npz")
print(data["first"].shape, data["second"].shape)
# Compressed
np.savez_compressed("bundle.npz", first=a, second=b)
# Memory-mapped — read/write without loading into RAM
mm = np.memmap("big.dat", dtype="float32", mode="w+", shape=(10_000, 10_000))
mm[0, 0] = 3.14
mm.flush() # write to disk
del mm # close
mm2 = np.memmap("big.dat", dtype="float32", mode="r", shape=(10_000, 10_000))
print(mm2[0, 0])
Output:
(3, 4) (5,)
3.14
Interop with pandas, polars, and Arrow#
A pandas Series or DataFrame exposes its underlying buffer through .to_numpy(); numeric DataFrames give a 2-D array with no copy when dtypes are uniform. Polars uses Arrow internally so df.to_numpy() is zero-copy for fixed-width numeric columns and a quick memory-buffer copy otherwise. Going the other way is just pd.DataFrame(arr) or pl.from_numpy(arr).
import numpy as np
import pandas as pd
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
df = pd.DataFrame(a, columns=["x", "y", "z"])
back = df.to_numpy()
print(back.dtype, back.shape)
# polars
import polars as pl
pl_df = pl.from_numpy(a, schema=["x", "y", "z"])
print(pl_df.to_numpy().shape)
Output:
float64 (2, 3)
(2, 3)
GPU and JIT alternatives#
When CPU-bound NumPy code is the bottleneck, three drop-in replacements expose nearly identical APIs:
- CuPy — NumPy on NVIDIA GPUs.
import cupy as cpand usecp.array(...); most functions matchnp.*. Best for large arrays and operations that fit in GPU memory. - JAX —
jax.numpy as jnpprovides NumPy semantics plus automatic differentiation, JIT compilation via XLA, and seamless GPU/TPU execution. Arrays are immutable; in-place mutation becomesx.at[i].set(v). - Numba —
@numba.njitdecorates a Python function and compiles it to native code at first call. Best for tight numeric loops that don’t vectorise cleanly.
# Numba example
from numba import njit
import numpy as np
@njit
def sum_squares(x):
total = 0.0
for i in range(len(x)):
total += x[i] * x[i]
return total
x = np.arange(1_000_000, dtype=np.float64)
print(sum_squares(x))
Output:
3.3333328333335e+17
[!TIP] Before reaching for GPU/JIT, confirm you have actually vectorised the NumPy version. A clean NumPy expression often outperforms a hand-rolled Numba loop on small or medium arrays because of better cache behaviour.
Real-world recipes#
Image processing: invert and normalise#
Images are just H × W × 3 uint8 arrays. Convert to float, normalise to [0, 1], invert, save back.
import numpy as np
from PIL import Image
img = np.asarray(Image.open("photo.jpg")) # uint8, shape (H, W, 3)
arr = img.astype(np.float32) / 255.0 # normalise
inverted = 1.0 - arr
out = (inverted * 255).astype(np.uint8)
Image.fromarray(out).save("inverted.jpg")
Audio: detect zero-crossings#
A zero-crossing is where the signal sign flips; counting them gives a rough frequency estimate.
import numpy as np
signal = np.sin(np.linspace(0, 10 * np.pi, 5000))
sign = np.sign(signal)
crossings = np.where(np.diff(sign) != 0)[0]
print(f"zero-crossings: {len(crossings)}")
Output:
zero-crossings: 10
Sliding-window mean (cumulative-sum trick)#
For a flat numeric series, a sliding window mean via cumsum is O(n) and beats np.convolve for large windows.
import numpy as np
x = np.arange(1, 11, dtype=float)
w = 3
cs = np.cumsum(np.insert(x, 0, 0))
windowed = (cs[w:] - cs[:-w]) / w
print(windowed)
Output:
[2. 3. 4. 5. 6. 7. 8. 9.]
One-hot encode integer labels#
Useful before feeding labels into a neural net or sklearn.linear_model.
import numpy as np
labels = np.array([0, 2, 1, 2, 0])
n_classes = labels.max() + 1
one_hot = np.eye(n_classes, dtype=np.int8)[labels]
print(one_hot)
Output:
[[1 0 0]
[0 0 1]
[0 1 0]
[0 0 1]
[1 0 0]]
Pairwise Euclidean distance matrix#
Broadcasting computes a full pairwise distance matrix without an explicit loop.
import numpy as np
pts = np.array([[0, 0], [3, 4], [6, 8]], dtype=float)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :] # (N, N, 2)
dist = np.linalg.norm(diff, axis=-1)
print(dist)
Output:
[[ 0. 5. 10.]
[ 5. 0. 5.]
[10. 5. 0.]]
For larger point clouds use scipy.spatial.distance_matrix or scipy.spatial.KDTree — they avoid the full N×N intermediate.
See also#
- pandas and polars — DataFrame libraries that wrap NumPy/Arrow buffers.
- scipy — domain-specific algorithms (stats, optimize, signal, sparse) on top of NumPy.
- matplotlib — pairs naturally with NumPy for plotting arrays.
- scikit-learn — ML models consume NumPy arrays directly.