Architecture

Overview

pg_ext_memcheck is a standard PostgreSQL extension (.so loaded via shared_preload_libraries) composed of eight C modules. It instruments the backend process from the inside, using official PostgreSQL hook APIs — no patching of PostgreSQL source is required.

┌─────────────────────────────────────────────────────┐
│                   Backend Process                    │
│                                                      │
│  SQL Layer  ──►  memcheck_hooks.c  (executor hooks) │
│                       │                              │
│          ┌────────────┼────────────┐                 │
│          ▼            ▼            ▼                 │
│  context_walker   shmem_probe   dsm_tracker          │
│      (Phase 1)    (Phase 1)     (Phase 1)            │
│          │                                           │
│          ▼                                           │
│   violation_log.c  (shared ring buffer)              │
│          │                                           │
│          ▼                                           │
│   SQL: flush_violations() ──► violations table       │
│                                                      │
│  worker_harness.c  (BGWorker crash harness)          │
│  gucs.c            (GUC parameters)                  │
└─────────────────────────────────────────────────────┘

Modules

memcheck_hooks.c

The entry point. Registers ExecutorStart_hook, ExecutorEnd_hook, and planner_hook, which together bracket every query execution as follows:

Planner hook (ALL mode only) → calls context_walker for a pre-snapshot before plan generation, so planning-phase allocations are captured.
ExecutorStart (EXECUTOR mode) → calls context_walker for a pre-snapshot if the planner hook did not already take one.
ExecutorEnd → calls context_walker for a post-snapshot, runs the context-diff, passes context_leak violations to violation_log, and runs check_wrong_context_alloc() to detect wrong_ctx_alloc violations.

In EXECUTOR mode only the executor hooks are active. In ALL mode the planner hook fires first, giving broader coverage of planning-phase allocations.

Nested query support: Pre-snapshots are stored in a fixed-depth stack (snapshot_stack[16]) rather than a single pointer. Each ExecutorStart/planner_hook call pushes a new frame; each ExecutorEnd pops and analyzes it. This allows nested SQL (e.g. PL/pgSQL functions calling SQL) to be correctly tracked up to 16 levels deep without the outer query's snapshot being clobbered by the inner ExecutorEnd.
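The push/pop discipline described above can be sketched in standalone C. This is a minimal illustration, not the extension's code: SnapshotFrame, SnapshotStack, snapshot_push, and snapshot_pop are hypothetical names, and refusing the push when nesting exceeds 16 levels is an assumption (the text does not say what happens on overflow).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SNAPSHOT_STACK_DEPTH 16   /* mirrors snapshot_stack[16] in the text */

/* Hypothetical stand-in for a pre-snapshot; the real frame layout is not shown. */
typedef struct SnapshotFrame {
    size_t total_allocated;
} SnapshotFrame;

typedef struct SnapshotStack {
    SnapshotFrame frames[SNAPSHOT_STACK_DEPTH];
    int           depth;          /* number of live frames */
} SnapshotStack;

/* Push a pre-snapshot. Returns false when nesting would exceed the fixed
 * depth (assumed behaviour: the frame is simply not recorded). */
static bool snapshot_push(SnapshotStack *st, SnapshotFrame frame)
{
    assert(st != NULL);
    if (st->depth >= SNAPSHOT_STACK_DEPTH)
        return false;
    st->frames[st->depth++] = frame;
    return true;
}

/* Pop the innermost pre-snapshot. Returns false when the stack is empty. */
static bool snapshot_pop(SnapshotStack *st, SnapshotFrame *out)
{
    assert(st != NULL && out != NULL);
    if (st->depth == 0)
        return false;
    *out = st->frames[--st->depth];
    return true;
}
```

Because each ExecutorEnd pops only the innermost frame, the inner query's analysis never touches the outer query's pre-snapshot.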

Targeting state is stored as module-level variables set by memcheck_begin() and cleared by memcheck_end():

| Variable | Purpose |
| --- | --- |
| `ext_context_pattern` | SQL `LIKE` pattern; only contexts whose names match are analysed. Empty string = all. |
| `ext_allowed_contexts[]` | Allowlist of context names that may grow without triggering a `wrong_ctx_alloc` violation. |
| `ext_n_allowed_contexts` | Number of valid entries in the allowlist. |
| `ext_track_shmem` | Whether shmem sentinel checks are active this session. When `false`, `run_scenario('shmem_sentinel_probe', …)` returns immediately without registering or checking any sentinels. |
| `ext_track_dsm` | Whether DSM leak checks are active this session. When `false`, `end()` skips the `dsm_tracker_check_leaks()` call so no `dsm_leak` violations are emitted. |

analyze_and_log_diff() calls ctx_matches_target() and skips any diff entry whose context name does not match the pattern. analyze_bloat() applies the same guard so that ctx_bloat violations are also scoped to the target pattern. check_wrong_context_alloc() calls is_allowed_context_target() in both passes and skips allowlisted contexts.
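The two filters can be illustrated with a small standalone sketch. It assumes a case-sensitive LIKE matcher that supports only % and _ with no escape character; the actual matching rules used by ctx_matches_target() are not documented here. like_match, ctx_matches_target_sketch, and is_allowed_sketch are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal LIKE matcher: '%' matches any run of characters (including none),
 * '_' matches exactly one character. No escape handling (assumption). */
static bool like_match(const char *pat, const char *s)
{
    assert(pat != NULL && s != NULL);
    if (*pat == '\0')
        return *s == '\0';
    if (*pat == '%') {
        /* Let '%' absorb zero, one, two, ... characters of s. */
        for (;;) {
            if (like_match(pat + 1, s))
                return true;
            if (*s == '\0')
                return false;
            s++;
        }
    }
    if (*s == '\0')
        return false;
    if (*pat == '_' || *pat == *s)
        return like_match(pat + 1, s + 1);
    return false;
}

/* An empty pattern means "analyse every context". */
static bool ctx_matches_target_sketch(const char *pattern, const char *ctx_name)
{
    if (pattern[0] == '\0')
        return true;
    return like_match(pattern, ctx_name);
}

/* Exact-name lookup in the allowlist. */
static bool is_allowed_sketch(const char *const *allowed, int n_allowed,
                              const char *ctx_name)
{
    for (int i = 0; i < n_allowed; i++)
        if (strcmp(allowed[i], ctx_name) == 0)
            return true;
    return false;
}
```

With the pattern 'MyExtCtx%', a context named MyExtCtxCache passes the first filter while TopMemoryContext does not; an allowlist containing TopMemoryContext exempts it from the wrong_ctx_alloc check.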

context_walker.c

Recursively walks the MemoryContext tree rooted at TopMemoryContext. Produces a CtxTree that holds an array of CtxSnapshot entries, one per context in the tree, each recording its depth and a hash identifying its parent. Snapshots are taken both before and after the test window and then compared to find leaks and bloat:

// One entry per context in the tree
typedef struct CtxSnapshot {
    char   name[NAMEDATALEN];
    Size   totalAllocated;
    Size   totalFree;
    int    depth;           /* depth in tree */
    uint32 parentHash;      /* hash of parent name+depth for diff */
} CtxSnapshot;

// Context Tree Structure
typedef struct CtxTree {
    CtxSnapshot *entries;
    int          count;
    int          capacity;
} CtxTree;

Diff algorithm: After a query, the post-snapshot is compared against the pre-snapshot. Contexts in post but not pre → context_leak. Contexts in both with monotonically increasing totalAllocated across iterations → context_leak.
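A rough standalone sketch of the two diff rules follows. It simplifies in ways that are assumptions rather than facts about the extension: entries are matched by name and depth instead of parentHash, "monotonically increasing" is read as strictly increasing, and Snap, classify, and grows_monotonically are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64   /* stands in for NAMEDATALEN */

/* Reduced form of CtxSnapshot with only the fields the diff needs. */
typedef struct Snap {
    char   name[NAME_LEN];
    size_t totalAllocated;
    int    depth;
} Snap;

typedef enum DiffKind { DIFF_NONE, DIFF_NEW_CONTEXT, DIFF_GREW } DiffKind;

/* Linear search for the pre-snapshot entry matching a post-snapshot entry. */
static const Snap *find_snap(const Snap *arr, int n, const Snap *key)
{
    for (int i = 0; i < n; i++)
        if (arr[i].depth == key->depth && strcmp(arr[i].name, key->name) == 0)
            return &arr[i];
    return NULL;
}

/* Rule 1: present in post but not in pre -> new context.
 * Otherwise report whether the context grew during this window. */
static DiffKind classify(const Snap *pre, int n_pre, const Snap *post_entry)
{
    const Snap *before = find_snap(pre, n_pre, post_entry);
    if (before == NULL)
        return DIFF_NEW_CONTEXT;
    if (post_entry->totalAllocated > before->totalAllocated)
        return DIFF_GREW;
    return DIFF_NONE;
}

/* Rule 2: growth counts as a leak only if every iteration grew. */
static int grows_monotonically(const size_t *totals, int n)
{
    assert(totals != NULL && n >= 2);
    for (int i = 1; i < n; i++)
        if (totals[i] <= totals[i - 1])
            return 0;
    return 1;
}
```

Requiring growth on every iteration is what separates a steady leak from a one-off warm-up allocation that levels out after the first run.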

violation_log.c

Manages a fixed-size ring buffer (2048 entries) in shared memory, protected by a single LWLock. Consumers call flush_violations() to drain the buffer into the ext_memcheck.violation_log table; end() drains only the current session's entries.

typedef struct ViolationEntry {
    TimestampTz   ts;
    int           backend_pid;
    char          check_type[32];      // e.g., "context_leak", "wrong_ctx_alloc", etc.
    char          severity[16];        // "ERROR", "WARNING", "INFO"
    char          detail[256];         // Detailed message about the violation
    char          source_lib[64];      // basename of the .so that triggered the violation
} ViolationEntry;

Buffer capacity is fixed at 2048 entries at startup. When the buffer is full, each new entry overwrites the oldest one (oldest-first eviction). This is intentional: recent violations are more actionable.
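The eviction behaviour can be sketched as a plain ring buffer. This is an illustration only: the Entry payload is reduced to two fields, locking is omitted (the real buffer is guarded by an LWLock), and Ring, ring_push, and ring_drain are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 2048

/* Reduced stand-in for ViolationEntry. */
typedef struct Entry {
    int backend_pid;
    int seq;
} Entry;

typedef struct Ring {
    Entry  slots[RING_CAPACITY];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of valid entries */
} Ring;

/* Append an entry; when full, the slot holding the oldest entry is reused. */
static void ring_push(Ring *r, Entry e)
{
    assert(r != NULL);
    size_t tail = (r->head + r->count) % RING_CAPACITY;
    r->slots[tail] = e;
    if (r->count < RING_CAPACITY)
        r->count++;
    else
        r->head = (r->head + 1) % RING_CAPACITY;  /* oldest was overwritten */
}

/* Copy entries oldest-first into out (at most max) and remove them. */
static size_t ring_drain(Ring *r, Entry *out, size_t max)
{
    assert(r != NULL && out != NULL);
    size_t n = 0;
    while (r->count > 0 && n < max) {
        out[n++] = r->slots[r->head];
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count--;
    }
    return n;
}
```

Pushing 2050 entries into an empty ring and then draining it yields the 2048 most recent ones, starting with the third entry pushed.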

gucs.c

Registers all pg_ext_memcheck.* GUC parameters with DefineCustomBoolVariable / DefineCustomIntVariable.

shmem_probe.c

Maintains a ProbeRegistry in shared memory (up to 32 entries). probe_register(seg_name, alloc_size, data_end) looks up an existing ShmemInitStruct allocation by name, which requires passing its exact allocated size, and writes a sentinel byte (0xDE) at base_ptr[data_end]. After a workload runs, probe_check_all() re-reads each sentinel and logs a shmem_overrun violation wherever the byte has changed.

Two sentinel placement strategies are used:

Own segments (allocated with +1 byte in _PG_init): data_end = sizeof(struct) — the sentinel occupies the reserved guard byte.
External segments (no extra allocation): data_end = alloc_size. The sentinel uses the alignment slack left by CACHELINEALIGN(alloc_size); that slack exists only when alloc_size is not already a multiple of the cache line size.
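The sentinel mechanics can be sketched without any PostgreSQL headers. The helpers below are hypothetical, and the 128-byte cache line is an assumption standing in for the server's configured value; the point is only to show the write, the re-check, and how to tell whether an external segment has any slack byte to host the sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SENTINEL_BYTE 0xDE
#define CACHE_LINE    128   /* assumed cache line size */

/* Round n up to the next multiple of CACHE_LINE. */
static size_t cacheline_align(size_t n)
{
    return (n + CACHE_LINE - 1) & ~((size_t) CACHE_LINE - 1);
}

/* True if rounding alloc_size up leaves at least one spare byte. */
static bool has_alignment_slack(size_t alloc_size)
{
    return cacheline_align(alloc_size) > alloc_size;
}

/* Place the sentinel immediately after the payload. */
static void sentinel_write(unsigned char *base, size_t data_end)
{
    assert(base != NULL);
    base[data_end] = SENTINEL_BYTE;
}

/* Re-read the sentinel; a changed byte indicates an overrun. */
static bool sentinel_intact(const unsigned char *base, size_t data_end)
{
    assert(base != NULL);
    return base[data_end] == SENTINEL_BYTE;
}
```

A write that stays within the payload leaves the sentinel intact, while a write that runs one byte past data_end flips it. An overrun that happens to write 0xDE itself would go unnoticed, a limitation of any single-byte sentinel.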

Users can register their own extension's segments via the SQL function ext_memcheck.register_shmem_probe(seg_name, allocated_size).

dsm_tracker.c

Maintains a shared-memory DsmTrackerState table of observed DSM handles. Handles are registered manually via ext_memcheck.track_dsm_handle(), and their live or detached status is probed at query time. When the test window closes, any handle still reachable is logged as a dsm_leak violation.
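As a rough sketch of the bookkeeping, the tracker can be modelled as a fixed table of handles plus a liveness probe that is consulted when the window closes. Everything here is hypothetical: the table capacity, the Tracker layout, and the demo probe, which only stands in for whatever check the extension performs against the DSM subsystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64   /* hypothetical capacity */

typedef struct TrackedHandle {
    uint32_t handle;
    bool     in_use;
} TrackedHandle;

typedef struct Tracker {
    TrackedHandle slots[MAX_TRACKED];
} Tracker;

/* Callback that reports whether a handle is still reachable. */
typedef bool (*is_live_fn)(uint32_t handle);

/* Record a handle in the first free slot; false if the table is full. */
static bool tracker_add(Tracker *t, uint32_t handle)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].handle = handle;
            t->slots[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Count tracked handles that the probe still reports as live. */
static int tracker_count_leaks(const Tracker *t, is_live_fn is_live)
{
    assert(t != NULL && is_live != NULL);
    int leaks = 0;
    for (int i = 0; i < MAX_TRACKED; i++)
        if (t->slots[i].in_use && is_live(t->slots[i].handle))
            leaks++;
    return leaks;
}

/* Demo probe only: pretends odd-numbered handles are still attached. */
static bool demo_odd_is_live(uint32_t handle)
{
    return (handle & 1u) != 0;
}
```

Probing liveness at check time, rather than caching it at registration, means a segment detached during the workload is not reported.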

worker_harness.c

A BackgroundWorker that runs crash-inducing scenarios (use_after_reset, oom_simulation) in an isolated process so that SIGSEGV or OOM cannot kill the calling session.

Communication uses a shared-memory WorkerSlot:

typedef struct WorkerSlot {
    LWLock       lock;
    WorkerStatus status;          /* IDLE | RUNNING | DONE | CRASHED */
    char         scenario[64];    /* e.g. "use_after_reset" */
    char         database[NAMEDATALEN];
    int          requestor_pid;
    int          exit_code;       /* non-zero → crash detected */
} WorkerSlot;

launch_crash_isolation_worker() fills the slot, registers a BackgroundWorker, and blocks on WaitForBackgroundWorkerShutdown(). After the worker exits, a non-zero exit_code signals a confirmed crash. The worker uses elog(FATAL) rather than triggering a true SIGSEGV so the postmaster does not initiate cluster-wide crash recovery.
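The slot's life cycle can be read as a small state machine. The sketch below is an interpretation under assumptions, not the extension's code: it assumes a launch moves the slot to RUNNING and that the exit code alone decides between DONE and CRASHED. SlotState, slot_begin, slot_finish, and crash_confirmed are hypothetical names, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum WorkerStatus {
    WORKER_IDLE,
    WORKER_RUNNING,
    WORKER_DONE,
    WORKER_CRASHED
} WorkerStatus;

/* Reduced view of WorkerSlot: only the fields that drive the transitions. */
typedef struct SlotState {
    WorkerStatus status;
    int          exit_code;
} SlotState;

/* Claim the slot for a new scenario; refuse if a worker is still running. */
static bool slot_begin(SlotState *s)
{
    assert(s != NULL);
    if (s->status == WORKER_RUNNING)
        return false;
    s->status = WORKER_RUNNING;
    s->exit_code = 0;
    return true;
}

/* Record the worker's exit: non-zero exit code means a crash was detected. */
static void slot_finish(SlotState *s, int exit_code)
{
    assert(s != NULL && s->status == WORKER_RUNNING);
    s->exit_code = exit_code;
    s->status = (exit_code != 0) ? WORKER_CRASHED : WORKER_DONE;
}

/* What the requesting backend checks after the worker has shut down. */
static bool crash_confirmed(const SlotState *s)
{
    assert(s != NULL);
    return s->status == WORKER_CRASHED;
}
```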

Data flow for a typical check

Client backend
  │
  ├─ SET pg_ext_memcheck.memcheck_mode = 'executor';
  │
  ├─ SELECT ext_memcheck.begin('MyExtCtx%',
  │         '{"allowed_contexts": ["TopMemoryContext"]}');
  │       └─ writes ext_context_pattern + allowlist into memcheck_hooks.c state
  │
  ├─ SELECT ext_memcheck.run_scenario('growth_benchmark', 200);
  │
  ├─ SELECT ext_memcheck.end();
  │       ├─ context_walker: post-snapshot
  │       ├─ diff: pre vs. post
  │       │     └─ ctx_matches_target() filters to 'MyExtCtx%' only
  │       ├─ check_wrong_context_alloc()
  │       │     └─ is_allowed_context_target() skips 'TopMemoryContext'
  │       ├─ violation_log: qualifying entries written to ring buffer (LWLock)
  │       └─ targeting state cleared (pattern, allowlist reset to defaults)
  OR
  └─ SELECT * FROM ext_memcheck.flush_violations();
          ├─ violation_log: drain ring buffer → INSERT into violation_log table
          └─ RETURN violations to client

PostgreSQL version compatibility

PostgreSQL 15+ is supported.