skip to content

numpy — Numerical Arrays

Create and manipulate N-dimensional arrays with NumPy. Covers array creation, broadcasting, vectorized math, indexing, and matrix operations.

12 min read 43 snippets deep dive

numpy — Numerical Arrays#

What it is#

NumPy provides the ndarray — a fixed-type, C-backed multi-dimensional array. It is the foundation of the entire Python scientific stack (pandas, scipy, matplotlib all build on it). Vectorized operations on ndarray are typically 100–1000× faster than equivalent pure-Python loops.

Install#

pip install numpy

Output: (none — exits 0 on success)

Quick example#

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a * 2)
print(f"mean={a.mean():.1f}, std={a.std():.4f}")
print(np.dot(a, a))

Output:

[ 2  4  6  8 10]
mean=3.0, std=1.4142
55

When / why to use it#

  • You need fast numeric computation without pandas overhead.
  • Working with images, audio, or other multi-dimensional data (images are just 3-D arrays).
  • Linear algebra (matrix multiply, decompositions).
  • Feeding data into machine learning libraries (all expect numpy arrays or array-like objects).

Common pitfalls#

[!WARNING] dtype defaults can surprise younp.array([1, 2, 3]) creates int64, but np.zeros((3, 3)) creates float64. Mixing dtypes in operations silently upcasts. Be explicit: np.array([1, 2], dtype=np.float32).

[!WARNING] Views vs copies — slicing a numpy array returns a view, not a copy. Modifying the slice modifies the original. Use .copy() when you need independence: b = a[1:3].copy().

[!TIP] Check shapes before operations: a.shape, a.dtype, a.ndim. Mismatched shapes are the #1 cause of ValueError: operands could not be broadcast.

Array creation#

NumPy provides a family of constructors for the common cases: np.array() converts Python lists; np.zeros/np.ones/np.full fill with a constant; np.arange mirrors Python’s range but returns an array; np.linspace gives evenly spaced floats inclusive of the endpoint. Always specify dtype explicitly when the default (float64 or int64) isn’t what you need.

import numpy as np

np.array([1, 2, 3])               # from list
np.zeros((3, 4))                  # 3×4 zeros (float64)
np.ones((2, 2), dtype=int)        # 2×2 ones (int)
np.eye(3)                         # 3×3 identity
np.arange(0, 10, 2)               # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)              # [0, 0.25, 0.5, 0.75, 1.0]
np.random.default_rng(42).random((3, 3))  # random 3×3 (seeded)

Indexing and slicing#

Basic indexing (a[1, 2], a[:, 1]) returns a view — modifying it modifies the original. Boolean masks (a[a > 5]) and fancy indexing (integer arrays) return copies. For multi-dimensional arrays, each comma-separated expression selects along one axis.

a = np.arange(12).reshape(3, 4)
print(a)
print(a[1, 2])       # row 1, col 2
print(a[:, 1])       # all rows, col 1
print(a[a > 5])      # boolean mask → 1-D array

Output:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
6
[1 5 9]
[ 6  7  8  9 10 11]

Richer example — matrix operations and broadcasting#

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix product:")
print(A @ B)

print("\nElement-wise product:")
print(A * B)

# Broadcasting: add a row vector to every row of a matrix
offsets = np.array([10, 20])
print("\nBroadcast add:")
print(A + offsets)

# Trig over evenly spaced angles
x = np.linspace(0, 2 * np.pi, 5)
print("\nsin values:")
print(np.round(np.sin(x), 4))

Output:

Matrix product:
[[19 22]
 [43 50]]

Element-wise product:
[[ 5 12]
 [21 32]]

Broadcast add:
[[11 22]
 [13 24]]

sin values:
[ 0.  1.  0. -1. -0.]

Useful functions quick reference#

FunctionPurpose
np.concatenate([a, b], axis=0)Join arrays along an axis
np.stack([a, b], axis=0)Stack along a new axis
np.split(a, 3)Split into N equal parts
np.sort(a)Sorted copy
np.argsort(a)Indices that would sort the array
np.unique(a)Unique values
np.where(a > 0, a, 0)Conditional element selection
np.clip(a, 0, 1)Clamp values to [0, 1]
np.linalg.norm(a)Euclidean norm
np.linalg.eig(A)Eigenvalues and eigenvectors
np.fft.fft(signal)Fast Fourier transform

dtypes in depth#

A NumPy dtype records the binary layout of each element: kind (int, float, complex, bool, bytes, datetime), width in bytes, and byte order. Mismatched dtypes silently upcast (an int8 + float64 returns float64) and can blow up memory by 8x on big arrays, so set the dtype explicitly whenever the default doesn’t match the data.

import numpy as np

print(np.array([1, 2, 3]).dtype)               # platform default int (usually int64)
print(np.array([1.0, 2.0]).dtype)              # float64
print(np.array([True, False]).dtype)           # bool, 1 byte each
print(np.array(["abc", "de"]).dtype)           # <U3 (Unicode, width 3)
print(np.zeros(3, dtype=np.float32).itemsize)  # 4 bytes per element
print(np.zeros(3, dtype=np.float64).itemsize)  # 8 bytes per element

Output:

int64
float64
bool
<U3
4
8

Cast with .astype() — it always returns a copy:

a = np.array([1.7, 2.3, 3.9])
print(a.astype(np.int32))     # truncates toward zero
print(np.round(a).astype(int))  # round-half-to-even, then cast

Output:

[1 2 3]
[2 2 4]

[!TIP] For 100M+ element arrays consider float32 over float64 and int8/int16 over int64 — half/quarter the RAM and faster on SIMD hardware. The downside is reduced range and precision.

Broadcasting rules#

Broadcasting is how NumPy applies an operation to arrays of different shapes without materialising an explicit copy. The rules: starting from the right, two dimensions are compatible if they are equal or one of them is 1. A length-1 axis is virtually “stretched” along the other operand’s matching axis. If any non-1 pair disagrees, you get ValueError: operands could not be broadcast together.

import numpy as np

a = np.arange(12).reshape(3, 4)   # (3, 4)
row = np.array([10, 20, 30, 40])  # (4,)  — stretched along axis 0
col = np.array([[100], [200], [300]])  # (3, 1) — stretched along axis 1

print(a + row)
print()
print(a + col)

Output:

[[10 21 32 43]
 [14 25 36 47]
 [18 29 40 51]]

[[100 101 102 103]
 [204 205 206 207]
 [308 309 310 311]]

Use np.newaxis (or None) to introduce a length-1 axis when you need to broadcast against a different axis:

a = np.array([1, 2, 3])
b = np.array([10, 20])
# outer-product style (3,) * (2,) -> (3, 2)
print(a[:, np.newaxis] * b)

Output:

[[10 20]
 [20 40]
 [30 60]]

Fancy indexing and boolean masks#

Fancy indexing uses integer arrays to pick arbitrary elements; boolean indexing uses a same-shape bool array as a mask. Both return copies — unlike basic slicing, you cannot use them to assign through a view in chained form. Combined indexing (a[idx, :]) can be cryptic, so name your index arrays.

import numpy as np

a = np.arange(10) * 10

# Fancy indexing — pick specific positions
idx = np.array([0, 3, 7, 9])
print(a[idx])

# Boolean mask
mask = (a > 20) & (a < 80)
print(a[mask])

# 2-D fancy indexing — pair rows with cols
m = np.arange(16).reshape(4, 4)
print(m[[0, 2, 3], [1, 1, 3]])  # picks (0,1), (2,1), (3,3)

Output:

[ 0 30 70 90]
[30 40 50 60 70]
[ 1  9 15]

[!WARNING] a[mask] = 0 modifies a in place even though a[mask] returns a copy, because Python translates this into a.__setitem__(mask, 0). The same does NOT work for a[mask][:] = 0 — the intermediate copy is what gets mutated.

Vectorisation vs Python loops#

A Python-level for loop on an ndarray is 50–500x slower than the equivalent vectorised expression because each iteration crosses the C/Python boundary. Almost any element-wise operation has a ufunc form (+, np.sin, np.where); use it. When you absolutely cannot vectorise, drop into Numba, Cython, or a list comprehension over scalars — never for i in range(len(a)): a[i] = ....

import numpy as np
import time

x = np.arange(1_000_000, dtype=np.float64)

# Slow: Python loop
t0 = time.perf_counter()
out = np.empty_like(x)
for i in range(len(x)):
    out[i] = x[i] ** 2 + 1
slow = time.perf_counter() - t0

# Fast: vectorised
t0 = time.perf_counter()
out = x ** 2 + 1
fast = time.perf_counter() - t0

print(f"loop: {slow*1000:.1f} ms")
print(f"vec:  {fast*1000:.1f} ms")
print(f"speedup: {slow/fast:.0f}x")

Output (typical):

loop: 240.0 ms
vec:  3.2 ms
speedup: 75x

Linear algebra (np.linalg)#

np.linalg covers the common matrix routines: solve (Ax=b), inv, det, eig, svd, qr, cholesky, and matrix_rank. Always prefer solve(A, b) over inv(A) @ b — it is faster and numerically more stable. For larger problems or specialised decompositions, switch to scipy.linalg, which links to system LAPACK/BLAS.

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)
print("x =", x)
print("det =", np.linalg.det(A))
print("rank =", np.linalg.matrix_rank(A))

# Singular Value Decomposition
U, S, Vt = np.linalg.svd(A)
print("singular values =", S.round(4))

Output:

x = [2. 3.]
det = 5.0
rank = 2
singular values = [3.6180 1.382 ]

Random numbers — modern Generator API#

The legacy np.random.rand / np.random.seed functions use a global, non-thread-safe state and are deprecated for new code. Use np.random.default_rng(seed) instead — it returns a Generator with explicit state, better statistical quality (PCG64), and faster sampling. Always seed in tests and pipelines so results are reproducible.

import numpy as np

rng = np.random.default_rng(42)

print(rng.random(3))                            # uniform [0, 1)
print(rng.integers(0, 10, size=5))              # uniform ints [0, 10)
print(rng.normal(loc=0.0, scale=1.0, size=4))   # N(0, 1)
print(rng.choice(["A", "B", "C"], size=5, p=[0.5, 0.3, 0.2]))
print(rng.permutation(np.arange(5)))            # shuffled copy

Output:

[0.77395605 0.43887844 0.85859792]
[1 7 6 4 4]
[ 0.42175859 -1.06560158  0.54470015  0.0717954 ]
['A' 'A' 'B' 'A' 'C']
[2 4 0 3 1]

Structured arrays and record arrays#

A structured dtype lets one ndarray hold heterogeneous fields per element — like a fixed-schema, in-memory database row. Useful for parsing binary file formats and interop with C structs, but for general tabular data, pandas or polars is almost always a better choice.

import numpy as np

dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f4")])
people = np.array(
    [("Alice", 30, 65.5), ("Bob", 25, 78.2), ("Carol", 35, 60.1)],
    dtype=dt,
)
print(people)
print(people["name"])
print(people["age"].mean())

Output:

[('Alice', 30, 65.5) ('Bob', 25, 78.2) ('Carol', 35, 60.1)]
['Alice' 'Bob' 'Carol']
30.0

Memory layout — C vs Fortran order#

Multi-dimensional arrays store elements as a flat memory buffer; C order (the default) is row-major (last axis varies fastest), F order is column-major (first axis varies fastest, like MATLAB/Fortran). Choosing the right order can speed up cache-friendly access patterns 2–10x; mismatches force a copy when handing arrays to libraries that demand one or the other (e.g. BLAS).

import numpy as np

a = np.arange(12).reshape(3, 4)             # C-order (default)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])
print(a.strides)                            # (32, 8): step 32 bytes per row, 8 per col

b = np.asfortranarray(a)
print(b.flags["C_CONTIGUOUS"], b.flags["F_CONTIGUOUS"])
print(b.strides)                            # (8, 24): step 8 bytes per row, 24 per col

# Transpose is free — it just re-labels strides, no copy
print(a.T.flags["F_CONTIGUOUS"])

Output:

True False
(32, 8)
False True
(8, 24)
True

Saving and loading#

NumPy’s native binary formats round-trip dtypes and shapes exactly, unlike CSV. Use .npy for a single array, .npz for multiple, and memmap when an array is too big for RAM but you only need to touch a slice at a time. For interop with other languages, prefer HDF5 (via h5py) or Parquet.

import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.linspace(0, 1, 5)

# Single array
np.save("a.npy", a)
loaded = np.load("a.npy")

# Multiple arrays in one file
np.savez("bundle.npz", first=a, second=b)
data = np.load("bundle.npz")
print(data["first"].shape, data["second"].shape)

# Compressed
np.savez_compressed("bundle.npz", first=a, second=b)

# Memory-mapped — read/write without loading into RAM
mm = np.memmap("big.dat", dtype="float32", mode="w+", shape=(10_000, 10_000))
mm[0, 0] = 3.14
mm.flush()                       # write to disk
del mm                           # close
mm2 = np.memmap("big.dat", dtype="float32", mode="r", shape=(10_000, 10_000))
print(mm2[0, 0])

Output:

(3, 4) (5,)
3.14

Interop with pandas, polars, and Arrow#

A pandas Series or DataFrame exposes its underlying buffer through .to_numpy(); numeric DataFrames give a 2-D array with no copy when dtypes are uniform. Polars uses Arrow internally so df.to_numpy() is zero-copy for fixed-width numeric columns and a quick memory-buffer copy otherwise. Going the other way is just pd.DataFrame(arr) or pl.from_numpy(arr).

import numpy as np
import pandas as pd

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

df = pd.DataFrame(a, columns=["x", "y", "z"])
back = df.to_numpy()
print(back.dtype, back.shape)

# polars
import polars as pl
pl_df = pl.from_numpy(a, schema=["x", "y", "z"])
print(pl_df.to_numpy().shape)

Output:

float64 (2, 3)
(2, 3)

GPU and JIT alternatives#

When CPU-bound NumPy code is the bottleneck, three drop-in replacements expose nearly identical APIs:

  • CuPy — NumPy on NVIDIA GPUs. import cupy as cp and use cp.array(...); most functions match np.*. Best for large arrays and operations that fit in GPU memory.
  • JAXjax.numpy as jnp provides NumPy semantics plus automatic differentiation, JIT compilation via XLA, and seamless GPU/TPU execution. Arrays are immutable; in-place mutation becomes x.at[i].set(v).
  • Numba@numba.njit decorates a Python function and compiles it to native code at first call. Best for tight numeric loops that don’t vectorise cleanly.
# Numba example
from numba import njit
import numpy as np

@njit
def sum_squares(x):
    total = 0.0
    for i in range(len(x)):
        total += x[i] * x[i]
    return total

x = np.arange(1_000_000, dtype=np.float64)
print(sum_squares(x))

Output:

3.3333328333335e+17

[!TIP] Before reaching for GPU/JIT, confirm you have actually vectorised the NumPy version. A clean NumPy expression often outperforms a hand-rolled Numba loop on small or medium arrays because of better cache behaviour.

Real-world recipes#

Image processing: invert and normalise#

Images are just H × W × 3 uint8 arrays. Convert to float, normalise to [0, 1], invert, save back.

import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.jpg"))           # uint8, shape (H, W, 3)
arr = img.astype(np.float32) / 255.0                 # normalise
inverted = 1.0 - arr
out = (inverted * 255).astype(np.uint8)
Image.fromarray(out).save("inverted.jpg")

Audio: detect zero-crossings#

A zero-crossing is where the signal sign flips; counting them gives a rough frequency estimate.

import numpy as np

signal = np.sin(np.linspace(0, 10 * np.pi, 5000))
sign = np.sign(signal)
crossings = np.where(np.diff(sign) != 0)[0]
print(f"zero-crossings: {len(crossings)}")

Output:

zero-crossings: 10

Sliding-window mean (cumulative-sum trick)#

For a flat numeric series, a sliding window mean via cumsum is O(n) and beats np.convolve for large windows.

import numpy as np

x = np.arange(1, 11, dtype=float)
w = 3
cs = np.cumsum(np.insert(x, 0, 0))
windowed = (cs[w:] - cs[:-w]) / w
print(windowed)

Output:

[2. 3. 4. 5. 6. 7. 8. 9.]

One-hot encode integer labels#

Useful before feeding labels into a neural net or sklearn.linear_model.

import numpy as np

labels = np.array([0, 2, 1, 2, 0])
n_classes = labels.max() + 1
one_hot = np.eye(n_classes, dtype=np.int8)[labels]
print(one_hot)

Output:

[[1 0 0]
 [0 0 1]
 [0 1 0]
 [0 0 1]
 [1 0 0]]

Pairwise Euclidean distance matrix#

Broadcasting computes a full pairwise distance matrix without an explicit loop.

import numpy as np

pts = np.array([[0, 0], [3, 4], [6, 8]], dtype=float)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]   # (N, N, 2)
dist = np.linalg.norm(diff, axis=-1)
print(dist)

Output:

[[ 0.  5. 10.]
 [ 5.  0.  5.]
 [10.  5.  0.]]

For larger point clouds use scipy.spatial.distance_matrix or scipy.spatial.KDTree — they avoid the full N×N intermediate.

See also#

  • pandas and polars — DataFrame libraries that wrap NumPy/Arrow buffers.
  • scipy — domain-specific algorithms (stats, optimize, signal, sparse) on top of NumPy.
  • matplotlib — pairs naturally with NumPy for plotting arrays.
  • scikit-learn — ML models consume NumPy arrays directly.