Architecture
Overview
pg_ext_memcheck is a standard PostgreSQL extension (.so loaded via shared_preload_libraries) composed of eight C modules. It instruments the backend process from the inside using official PostgreSQL hook APIs; no patching of PostgreSQL source is required.

    ┌─────────────────────────────────────────────────────┐
    │                   Backend Process                   │
    │                                                     │
    │  SQL Layer ──► memcheck_hooks.c (executor hooks)    │
    │                       │                             │
    │          ┌────────────┼────────────┐                │
    │          ▼            ▼            ▼                │
    │  context_walker  shmem_probe  dsm_tracker           │
    │     (Phase 1)     (Phase 1)    (Phase 1)            │
    │                       │                             │
    │                       ▼                             │
    │     violation_log.c (shared ring buffer)            │
    │                       │                             │
    │                       ▼                             │
    │  SQL: flush_violations() ──► violations table       │
    │                                                     │
    │  worker_harness.c  (BGWorker crash harness)         │
    │  gucs.c            (GUC parameters)                 │
    └─────────────────────────────────────────────────────┘

Modules
memcheck_hooks.c
The entry point. Registers ExecutorStart_hook, ExecutorEnd_hook, and planner_hook. These bracket every query execution with:
- Planner hook (ALL mode only) → calls context_walker for a pre-snapshot before plan generation, so planning-phase allocations are captured.
- ExecutorStart (EXECUTOR mode) → calls context_walker for a pre-snapshot if the planner hook did not already take one.
- ExecutorEnd → calls context_walker for a post-snapshot, runs the context diff, passes context_leak violations to violation_log, and runs check_wrong_context_alloc() to detect wrong_ctx_alloc violations.
In EXECUTOR mode only the executor hooks are active. In ALL mode the planner hook fires first, giving broader coverage of planning-phase allocations.
Nested query support: Pre-snapshots are stored in a fixed-depth stack (snapshot_stack[16]) rather than a single pointer. Each ExecutorStart/planner_hook call pushes a new frame; each ExecutorEnd pops and analyzes it. This allows nested SQL (e.g. PL/pgSQL functions calling SQL) to be correctly tracked up to 16 levels deep without the outer query's snapshot being clobbered by the inner ExecutorEnd.
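The push/pop discipline described above can be sketched in plain C. This is a minimal illustration, not the extension's code: SnapshotFrame, SnapshotStack, snapshot_push, and snapshot_pop are hypothetical names, the frame payload is a placeholder for the real pre-snapshot, and refusing the push when the stack is full is an assumption about overflow behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SNAPSHOT_STACK_DEPTH 16   /* matches snapshot_stack[16] in the text */

/* Hypothetical stand-in for a pre-snapshot frame. */
typedef struct SnapshotFrame {
    int query_id;                 /* illustrative payload only */
} SnapshotFrame;

typedef struct SnapshotStack {
    SnapshotFrame frames[SNAPSHOT_STACK_DEPTH];
    int           depth;          /* number of frames currently pushed */
} SnapshotStack;

/* Push a frame; returns false once 16 frames are already pushed
 * (assumed behaviour: deeper nesting is simply not tracked). */
static bool
snapshot_push(SnapshotStack *st, SnapshotFrame frame)
{
    assert(st != NULL);
    if (st->depth >= SNAPSHOT_STACK_DEPTH)
        return false;
    st->frames[st->depth++] = frame;
    return true;
}

/* Pop the innermost frame into *out; returns false if the stack is empty. */
static bool
snapshot_pop(SnapshotStack *st, SnapshotFrame *out)
{
    assert(st != NULL && out != NULL);
    if (st->depth == 0)
        return false;
    *out = st->frames[--st->depth];
    return true;
}
```

Because each pop returns the most recently pushed frame, an inner ExecutorEnd only ever consumes the inner query's snapshot, and the outer frame stays in place until its own ExecutorEnd runs.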
Targeting state is stored as module-level variables set by memcheck_begin() and cleared by memcheck_end():
| Variable | Purpose |
|---|---|
| ext_context_pattern | SQL LIKE pattern; only contexts whose names match are analysed. Empty string = all. |
| ext_allowed_contexts[] | Allowlist of context names that may grow without triggering a wrong_ctx_alloc violation. |
| ext_n_allowed_contexts | Number of valid entries in the allowlist. |
| ext_track_shmem | Whether shmem sentinel checks are active this session. When false, run_scenario('shmem_sentinel_probe', …) returns immediately without registering or checking any sentinels. |
| ext_track_dsm | Whether DSM leak checks are active this session. When false, end() skips the dsm_tracker_check_leaks() call so no dsm_leak violations are emitted. |
analyze_and_log_diff() calls ctx_matches_target() and skips any diff entry whose context name does not match the pattern. analyze_bloat() applies the same guard so that ctx_bloat violations are also scoped to the target pattern. check_wrong_context_alloc() calls is_allowed_context_target() in both passes and skips allowlisted contexts.
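The two filters can be illustrated with a small stand-alone sketch. It assumes the pattern follows basic SQL LIKE rules ('%' for any run of characters, '_' for exactly one, no escape character) and that the allowlist is compared by exact name; the function names like_match, ctx_matches_target_sketch, and is_allowed_sketch are hypothetical and do not reflect the extension's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal SQL LIKE matcher: '%' matches any run of characters,
 * '_' matches exactly one. Escape characters are not supported. */
static bool
like_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    while (*pattern != '\0')
    {
        if (*pattern == '%')
        {
            while (*pattern == '%')      /* collapse consecutive '%' */
                pattern++;
            if (*pattern == '\0')
                return true;             /* trailing '%' matches the rest */
            for (; *text != '\0'; text++)
                if (like_match(pattern, text))
                    return true;
            return false;
        }
        if (*text == '\0')
            return false;
        if (*pattern != '_' && *pattern != *text)
            return false;
        pattern++;
        text++;
    }
    return *text == '\0';
}

/* An empty pattern means "all contexts", as stated in the table above. */
static bool
ctx_matches_target_sketch(const char *pattern, const char *ctx_name)
{
    return pattern[0] == '\0' || like_match(pattern, ctx_name);
}

/* Assumed exact-name comparison against the allowlist. */
static bool
is_allowed_sketch(const char *const *allowed, int n_allowed,
                  const char *ctx_name)
{
    for (int i = 0; i < n_allowed; i++)
        if (strcmp(allowed[i], ctx_name) == 0)
            return true;
    return false;
}
```

With the pattern 'MyExtCtx%', a context named MyExtCtxCache passes the target filter while TopMemoryContext does not; a diff entry that fails the filter is skipped before any violation is logged.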
context_walker.c
Recursively walks the MemoryContext tree rooted at TopMemoryContext. Produces a CtxTree that holds an array of CtxSnapshot entries for all contexts in the tree, along with their parent-child relationships. Snapshots are taken both before and after the test window and then compared to find leaks and bloat:

    // One entry per context in the tree
    typedef struct CtxSnapshot {
        char   name[NAMEDATALEN];
        Size   totalAllocated;
        Size   totalFree;
        int    depth;        /* depth in tree */
        uint32 parentHash;   /* hash of parent name+depth for diff */
    } CtxSnapshot;

    // Context Tree Structure
    typedef struct CtxTree {
        CtxSnapshot *entries;
        int          count;
        int          capacity;
    } CtxTree;

Diff algorithm: After a query, the post-snapshot is compared against the pre-snapshot. Contexts in post but not pre → context_leak. Contexts in both with monotonically increasing totalAllocated across iterations → context_leak.
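A rough stand-alone sketch of the two diff rules follows. It makes several simplifying assumptions: entries are matched by name and depth rather than by parentHash, size_t stands in for Size, "monotonically increasing" is read as strictly increasing, and the names Snap, find_snap, count_new_contexts, and grows_monotonically are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64   /* stand-in for NAMEDATALEN */

/* Simplified snapshot entry (the real CtxSnapshot has more fields). */
typedef struct Snap {
    char   name[NAME_LEN];
    size_t totalAllocated;
    int    depth;
} Snap;

/* Find the entry with the same name and depth as key; NULL if absent. */
static const Snap *
find_snap(const Snap *set, int n, const Snap *key)
{
    for (int i = 0; i < n; i++)
        if (set[i].depth == key->depth && strcmp(set[i].name, key->name) == 0)
            return &set[i];
    return NULL;
}

/* Rule 1: count contexts present in post but missing from pre. */
static int
count_new_contexts(const Snap *pre, int n_pre, const Snap *post, int n_post)
{
    int found = 0;

    assert(pre != NULL && post != NULL);
    for (int i = 0; i < n_post; i++)
        if (find_snap(pre, n_pre, &post[i]) == NULL)
            found++;
    return found;
}

/* Rule 2: returns 1 if totalAllocated samples for one context rise on
 * every iteration (strict increase is an assumption), else 0. */
static int
grows_monotonically(const size_t *samples, int n)
{
    if (n < 2)
        return 0;
    for (int i = 1; i < n; i++)
        if (samples[i] <= samples[i - 1])
            return 0;
    return 1;
}
```

A context that appears only in the post-snapshot is counted by the first function; a context whose per-iteration samples never level off is flagged by the second.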
violation_log.c
Manages a fixed-size ring buffer (2048 entries) in shared memory, protected by a single LWLock. Consumers call flush_violations() to drain the buffer into the ext_memcheck.violation_log table; end() drains only the current session's entries.

    typedef struct ViolationEntry {
        TimestampTz ts;
        int         backend_pid;
        char        check_type[32];   // e.g., "context_leak", "wrong_ctx_alloc", etc.
        char        severity[16];     // "ERROR", "WARNING", "INFO"
        char        detail[256];      // Detailed message about the violation
        char        source_lib[64];   // basename of the .so that triggered the violation
    } ViolationEntry;

Buffer capacity is fixed at 2048 entries at startup. When the buffer is full, the oldest entries are overwritten first. This is intentional: recent violations are more actionable.
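The overwrite-oldest behaviour can be sketched as a head-plus-count ring. This is an illustration under stated assumptions: RingEntry is a reduced stand-in for ViolationEntry, the names Ring, ring_push, and ring_drain are hypothetical, and the LWLock that guards the real buffer is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 2048   /* fixed capacity, as in the text */

/* Reduced stand-in for ViolationEntry. */
typedef struct RingEntry {
    int backend_pid;
    int seq;
} RingEntry;

typedef struct Ring {
    RingEntry slots[RING_CAPACITY];
    size_t    head;    /* index of the oldest entry */
    size_t    count;   /* number of valid entries */
} Ring;

/* Append an entry; when the ring is full the oldest entry is overwritten. */
static void
ring_push(Ring *r, RingEntry e)
{
    size_t tail;

    assert(r != NULL);
    tail = (r->head + r->count) % RING_CAPACITY;
    r->slots[tail] = e;
    if (r->count < RING_CAPACITY)
        r->count++;
    else
        r->head = (r->head + 1) % RING_CAPACITY;   /* oldest was replaced */
}

/* Copy up to max entries into out, oldest first, removing them from the
 * ring. Returns the number of entries copied. */
static size_t
ring_drain(Ring *r, RingEntry *out, size_t max)
{
    size_t n = 0;

    assert(r != NULL && out != NULL);
    while (r->count > 0 && n < max)
    {
        out[n++] = r->slots[r->head];
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count--;
    }
    return n;
}
```

In a shared-memory setting both functions would run with the lock held, so a drain never observes a half-written slot.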
gucs.c
Registers all pg_ext_memcheck.* GUC parameters with DefineCustomBoolVariable / DefineCustomIntVariable.
shmem_probe.c
Maintains a ProbeRegistry in shared memory (up to 32 entries). probe_register(seg_name, alloc_size, data_end) looks up an existing ShmemInitStruct allocation by its exact size and writes a sentinel byte (0xDE) at base_ptr[data_end]. After a workload runs, probe_check_all() re-reads each sentinel and logs shmem_overrun violations where the byte has changed.
Two sentinel placement strategies are used:
- Own segments (allocated with +1 byte in _PG_init): data_end = sizeof(struct), so the sentinel occupies the reserved guard byte.
- External segments (no extra allocation): data_end = alloc_size, so the sentinel uses the alignment slack guaranteed by CACHELINEALIGN(alloc_size).
Users can register their own extension's segments via the SQL function ext_memcheck.register_shmem_probe(seg_name, allocated_size).
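The sentinel write/check cycle and the slack calculation for external segments can be sketched as follows. The helper names are hypothetical, the buffer is an ordinary array rather than a shared-memory segment, and CACHE_LINE_SIZE is set to 128 on the assumption that it matches the server's PG_CACHE_LINE_SIZE. Note that under this rounding rule the slack is zero when alloc_size is already a multiple of the line size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SENTINEL_BYTE   0xDE   /* sentinel value named in the text */
#define CACHE_LINE_SIZE 128    /* assumption: equals PG_CACHE_LINE_SIZE */

/* Round len up to the next multiple of CACHE_LINE_SIZE, mirroring the
 * effect of CACHELINEALIGN on an allocation size. */
static size_t
cacheline_align(size_t len)
{
    return (len + CACHE_LINE_SIZE - 1) & ~((size_t) CACHE_LINE_SIZE - 1);
}

/* Bytes between the requested size and the rounded-up size. */
static size_t
alignment_slack(size_t alloc_size)
{
    return cacheline_align(alloc_size) - alloc_size;
}

/* Write the sentinel at base[data_end]. */
static void
sentinel_write(unsigned char *base, size_t data_end)
{
    assert(base != NULL);
    base[data_end] = SENTINEL_BYTE;
}

/* Returns true while the sentinel byte is unchanged. */
static bool
sentinel_intact(const unsigned char *base, size_t data_end)
{
    assert(base != NULL);
    return base[data_end] == SENTINEL_BYTE;
}
```

A write that runs one byte past data_end - 1 replaces the sentinel, which the later check reports as a changed byte. A single-byte sentinel cannot detect an overrun that happens to write 0xDE at that position.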
dsm_tracker.c
Maintains a shared-memory DsmTrackerState table of observed DSM handles (manual registration via ext_memcheck.track_dsm_handle(); live/detached status is probed at query time). At test window close, any handle still reachable is logged as a dsm_leak violation.
worker_harness.c
A BackgroundWorker that runs crash-inducing scenarios (use_after_reset, oom_simulation) in an isolated process so that SIGSEGV or OOM cannot kill the calling session.
Communication uses a shared-memory WorkerSlot:

    typedef struct WorkerSlot {
        LWLock       lock;
        WorkerStatus status;                  /* IDLE | RUNNING | DONE | CRASHED */
        char         scenario[64];            /* e.g. "use_after_reset" */
        char         database[NAMEDATALEN];
        int          requestor_pid;
        int          exit_code;               /* non-zero → crash detected */
    } WorkerSlot;

launch_crash_isolation_worker() fills the slot, registers a BackgroundWorker, and blocks on WaitForBackgroundWorkerShutdown(). After the worker exits, a non-zero exit_code signals a confirmed crash. The worker uses elog(FATAL) rather than triggering a true SIGSEGV so the postmaster does not initiate cluster-wide crash recovery.
Data flow for a typical check

    Client backend
      │
      ├─ SET pg_ext_memcheck.memcheck_mode = 'executor';
      │
      ├─ SELECT ext_memcheck.begin('MyExtCtx%',
      │           '{"allowed_contexts": ["TopMemoryContext"]}');
      │     └─ writes ext_context_pattern + allowlist into memcheck_hooks.c state
      │
      ├─ SELECT ext_memcheck.run_scenario('growth_benchmark', 200);
      │
      ├─ SELECT ext_memcheck.end();
      │     ├─ context_walker: post-snapshot
      │     ├─ diff: pre vs. post
      │     │     └─ ctx_matches_target() filters to 'MyExtCtx%' only
      │     ├─ check_wrong_context_alloc()
      │     │     └─ is_allowed_context_target() skips 'TopMemoryContext'
      │     ├─ violation_log: qualifying entries written to ring buffer (LWLock)
      │     └─ targeting state cleared (pattern, allowlist reset to defaults)
      │
      OR
      │
      └─ SELECT * FROM ext_memcheck.flush_violations();
            ├─ violation_log: drain ring buffer → INSERT into violation_log table
            └─ RETURN violations to client

PostgreSQL version compatibility
PostgreSQL 15+ is supported.