Pinax - Shipping a Production-Grade Obsidian Plugin with a Coding Agent :: SREKubeCraft

I spend my days building platforms with strong contracts: schemas, admission policies, CI gates, provenance. Last week I applied the same discipline to something completely different, a plugin for my note-taking app, and let a coding agent do most of the typing. Two days later Pinax passed Obsidian’s automated community review with a clean scorecard and went live in the plugin directory.

This post is not “look, AI wrote my code”. It is about the engineering system around the agent: the machine-checkable success criteria, the verification harness, the trust model, and the three rounds of automated review rejections that only strict guardrails made cheap to fix. The lessons transfer directly to how we ship operators, controllers, and internal tooling at work.

Who Should Read This?

This post is for:

- SREs and Platform Engineers experimenting with coding agents and wondering how to keep quality high without reviewing every line by hand
- Tool builders who want a concrete example of “LLM-buildable” as a design constraint, not a marketing word
- Anyone shipping to a store or marketplace with an automated review gate (Obsidian, Grafana plugins, Backstage, k8s operator hubs) and losing rounds to it

What is Pinax?

Pinax (πίναξ, ancient Greek for a board, tablet, or panel, which is close to what we now call a dashboard) is a domain-agnostic dashboard framework for Obsidian. You describe the dashboard you want, and a single profile.json renders it on top of the notes you already have:

```json
{
  "schemaVersion": 1,
  "name": "Bookshelf",
  "layout": "grid",
  "panes": [
    { "type": "table", "title": "LIBRARY", "source": { "folder": "reading/books" } },
    { "type": "stat", "title": "FINISHED LAST 12 MONTHS", "agg": "count",
      "source": { "folder": "reading/books",
        "where": [{ "field": "finished", "after": "{{today-365d}}" }] } },
    { "type": "board", "title": "PIPELINE", "groupBy": "status",
      "source": { "folder": "reading/books" } }
  ]
}
```

Pinax ships eleven built-in widgets, including tables, kanban boards, heatmaps with streaks, stat tiles with sparklines, quick-capture forms, and embeds. Dashboards re-render live when the underlying notes change, and dragging a kanban card writes the new value back to the note's frontmatter as an undoable change. The same engine ships three bundled profiles with no code differences between them: an SRE command center, a personal multi-tab dashboard, and a reading tracker. That is the point: it is a framework, not an app.
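
The `{{today-365d}}` value in the Bookshelf profile is a relative-date token. The sketch below shows one way such tokens could be resolved to ISO dates before a `where` filter compares them against frontmatter; the `{{today}}` / `{{today+Nd}}` / `{{today-Nd}}` grammar, the UTC convention, and the `resolveDateToken` name are assumptions for illustration, not Pinax's documented behavior.

```typescript
// Hypothetical resolver for relative-date tokens such as "{{today-365d}}".
// Assumed grammar: {{today}}, {{today+Nd}} or {{today-Nd}}, resolved in UTC.
const TOKEN = /^\{\{today(?:([+-])(\d+)d)?\}\}$/;

function resolveDateToken(value: string, now: Date = new Date()): string {
  const match = TOKEN.exec(value);
  if (!match) return value; // not a token: pass the literal through unchanged

  const [, sign, days] = match;
  const offset = sign ? (sign === "-" ? -1 : 1) * Number(days) : 0;

  // Anchor at UTC midnight so the result does not depend on the local time zone.
  const base = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const shifted = new Date(base + offset * 24 * 60 * 60 * 1000);
  return shifted.toISOString().slice(0, 10); // "YYYY-MM-DD"
}

console.log(resolveDateToken("{{today-365d}}", new Date("2025-06-10T12:00:00Z"))); // 2024-06-10
```

Anchoring at UTC midnight keeps results stable on any machine; a real implementation might prefer the user's local calendar day instead.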

What Pinax is NOT

- Not a query language: use Dataview if you want inline queries inside notes; Pinax renders whole dashboards from config
- Not a code-execution surface: the released plugin never executes code from your vault; custom widgets come from tiny companion plugins through a public API
- Not opinionated about your vault: the core is grep-verifiably domain-neutral; a CI check fails if a vault-specific word leaks into src/core/
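
A check like that can be as small as a whole-word scan. Here is a minimal TypeScript sketch, assuming file contents have already been read into a path-to-text map and using an illustrative word list; the real `check:generic` script and its banned words are not shown in this post.

```typescript
// Hypothetical banned-word scan for src/core/: input is path -> file text,
// so a CI script, a file walker, or a unit test can all feed it.
interface Hit {
  path: string;
  line: number;
  word: string;
}

const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function findBannedWords(files: Record<string, string>, banned: string[]): Hit[] {
  if (banned.length === 0) return [];
  // Whole-word, case-insensitive: "book" matches "Book" but not "notebook".
  const pattern = new RegExp(`\\b(?:${banned.map(escapeRegExp).join("|")})\\b`, "gi");
  const hits: Hit[] = [];
  for (const [path, text] of Object.entries(files)) {
    text.split("\n").forEach((content, i) => {
      for (const m of content.matchAll(pattern)) {
        hits.push({ path, line: i + 1, word: m[0].toLowerCase() });
      }
    });
  }
  return hits;
}

// Illustrative word list; a CI wrapper would exit non-zero if any hits remain.
const sample = findBannedWords(
  { "src/core/render.ts": "// render a pane\nconst total = 0; // incident count\n" },
  ["kubernetes", "incident", "book"],
);
console.log(sample); // one hit: line 2, "incident"
```

Whole-word matching avoids false positives such as `notebook` for `book`, at the cost of missing compound identifiers like `bookCount`; which trade-off is right depends on the word list.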

“LLM-Buildable” as a Design Constraint

The interesting design decision was making profiles a target for language models, not just humans. That sounds like a feature; it is actually a constraint that shaped the whole architecture:

| Constraint | Consequence in the design |
| --- | --- |
| A profile might be wrong | JSON Schema validation at load, error panel instead of partial render, unknown widget ids render placeholders, never crashes |
| A profile might be hostile | Per-profile trust gates (web embeds, command buttons, note writing), all OFF by default, never inherited on import |
| An LLM writes the profile | One schema file plus one authoring guide are the entire contract; if the docs are ambiguous, generation fails, so ambiguity is a bug |
| The agent writes the plugin | Every feature lands with a machine-checkable acceptance test before the implementation is considered done |
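
The trust-gate row can be sketched in a few lines. Everything here is hypothetical: the gate names mirror the three gates listed in the table, and the types are invented for illustration. Gates start OFF, and importing a profile always resets them, even when the incoming data claims otherwise.

```typescript
// Hypothetical shape of per-profile trust gates; names are illustrative.
interface TrustGates {
  webEmbeds: boolean;
  commandButtons: boolean;
  noteWriting: boolean;
}

interface StoredProfile {
  name: string;
  trust: TrustGates;
  body: unknown; // the validated profile.json contents
}

// Every gate starts OFF; the user flips them explicitly, per profile.
const DEFAULT_TRUST: Readonly<TrustGates> = Object.freeze({
  webEmbeds: false,
  commandButtons: false,
  noteWriting: false,
});

// Importing never carries trust over, whatever the incoming data says.
function importProfile(incoming: { name: string; body: unknown; trust?: Partial<TrustGates> }): StoredProfile {
  return { name: incoming.name, body: incoming.body, trust: { ...DEFAULT_TRUST } };
}

function canEmbedWeb(profile: StoredProfile): boolean {
  return profile.trust.webEmbeds;
}
```

Copying the defaults (rather than sharing the frozen object) means a later per-profile opt-in never leaks into other profiles.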

The last row is the one that matters for agentic development. The repo has a headless verification harness: a mock of the Obsidian API just complete enough to boot the real bundled main.js, seed a fake vault, click buttons, and assert on rendered DOM. It started at 59 checks and ended at 129:

```shell
npm test                  # 63 unit tests
npm run verify:criteria   # 129 end-to-end checks against the real bundle
npm run check:generic     # banned-word grep: core stays domain-neutral
npm run lint:obsidian     # mirror of the store's review bot (more on this below)
```

When the acceptance criteria are executable, the agent can loop on its own: implement, run, read the failure, fix. My job shrinks to deciding what to build and reviewing diffs, which is exactly where a human should sit.
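
A minimal sketch of that loop's core, assuming each acceptance criterion is a named predicate (the `Check` type and `verifyCriteria` function are hypothetical, not the repo's actual harness). The important property is that failures come back as readable reasons, so the agent has something concrete to fix.

```typescript
// Hypothetical criteria runner: each acceptance criterion is a named,
// machine-checkable predicate, and "done" means every one of them passes.
type Check = { id: string; run: () => boolean | Promise<boolean> };

interface Report {
  passed: string[];
  failed: { id: string; reason: string }[];
}

async function verifyCriteria(checks: Check[]): Promise<Report> {
  const report: Report = { passed: [], failed: [] };
  for (const check of checks) {
    try {
      if (await check.run()) report.passed.push(check.id);
      else report.failed.push({ id: check.id, reason: "assertion returned false" });
    } catch (err) {
      // A thrown error becomes a failure with a message the agent can act on.
      report.failed.push({ id: check.id, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return report;
}
```

A CI entry point would set a non-zero exit code whenever `failed` is non-empty, which is what turns the report into a gate.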

```mermaid
flowchart TB
    subgraph Contract["The contract"]
        S[profile.schema.json] --> G[AUTHORING.md]
    end
    subgraph Loop["Agent loop"]
        I[Implement] --> V[verify-criteria: 129 checks]
        V -->|fail| I
        V -->|pass| R[Human reviews the diff]
    end
    subgraph Gate["External gates"]
        CI[CI: tests, lint, schema validation] --> REV[Obsidian automated review]
    end
    Contract --> Loop --> Gate
```

The Automated Review Battles

Obsidian’s community directory runs an automated review (an ESLint ruleset, eslint-plugin-obsidianmd) against your repository, and your plugin is not installable until it passes. We lost three rounds to it. Each loss taught something worth writing down.

Round 1: APIs newer than your declared minAppVersion

The bot flagged Plugin.settings usage at nine call sites. We never called any such API. The actual cause: Obsidian 1.13 introduced its own Plugin.settings property, and our plugin’s own settings field now collided with it, so the linter resolved our property to their new API. The fix was renaming our field, plus making minAppVersion honest about the APIs we actually use: Workspace.revealLeaf returns a promise since 1.7.2 and App#saveLocalStorage exists since 1.8.7, so minAppVersion moved to 1.8.7 and the revealLeaf calls are now awaited.
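
The version half of this round is mechanical enough to script. This is a hypothetical sketch (not the review bot's logic, and the `API_SINCE` table is hand-written) that flags used APIs whose `since` version is newer than the manifest's minAppVersion, using the two versions named above. Note the numeric comparison: compared as strings, "1.13.0" sorts before "1.8.7".

```typescript
// Hypothetical "since" table, filled in from the round-1 findings above.
const API_SINCE: Record<string, string> = {
  "Workspace.revealLeaf (returns Promise)": "1.7.2",
  "App#saveLocalStorage": "1.8.7",
};

// Compare dotted versions numerically, segment by segment (missing = 0).
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Return the used APIs that need a newer app than minAppVersion declares.
function apisNewerThan(minAppVersion: string, used: string[]): string[] {
  return used.filter((api) => {
    const since = API_SINCE[api];
    return since !== undefined && compareVersions(since, minAppVersion) > 0;
  });
}

console.log(apisNewerThan("1.5.0", Object.keys(API_SINCE))); // both APIs flagged
console.log(apisNewerThan("1.8.7", Object.keys(API_SINCE))); // []
```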

The transferable lesson: shadowing platform namespaces is a time bomb. The same applies to a CRD field named like a future Kubernetes core field or a Helm value colliding with a builtin.

Round 2: the `new Function` dead end

Pinax originally supported a profile-local widgets.js: drop a JavaScript file next to your profile, flip an explicit “Custom widget code” trust toggle, and get custom widgets. Executing that file required new Function, which the review bot flags as a hard error. We tried the justification route first, but the ruleset’s own config forbids inline eslint-disable comments for its rules, and the documentation is explicit that a plugin is not installable until the automated review passes. There is no documented appeal path.

So the feature came out. The loader lives on a feature/widgets-js branch, and extensibility moved to a sanctioned mechanism that already existed: any tiny companion plugin can call window.pinax.registerWidget(...). The repo ships a copy-paste template (a manifest.json and a 20-line main.js) written so an LLM can generate a working widget plugin from it.
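
A companion plugin's call has to land in some kind of registry on the Pinax side. The sketch below assumes an `(id, render)` signature and uses a plain `{ text }` object in place of an HTMLElement so it runs headless; only the `registerWidget` name comes from this post, and the `WidgetRegistry` class is hypothetical. It also shows the placeholder behavior from the constraint table: unknown ids and throwing widgets degrade to a message instead of breaking the dashboard.

```typescript
// Hypothetical Pinax-side registry behind window.pinax.registerWidget(...).
// Assumed signature: (id, render). `{ text }` stands in for an HTMLElement.
type Container = { text: string };
type RenderFn = (container: Container, config: Record<string, unknown>) => void;

class WidgetRegistry {
  private readonly widgets = new Map<string, RenderFn>();

  registerWidget(id: string, render: RenderFn): () => void {
    if (this.widgets.has(id)) throw new Error(`widget "${id}" is already registered`);
    this.widgets.set(id, render);
    // An unregister handle lets a companion plugin clean up when it unloads.
    return () => {
      this.widgets.delete(id);
    };
  }

  render(type: string, container: Container, config: Record<string, unknown>): void {
    const fn = this.widgets.get(type);
    if (!fn) {
      // Unknown ids render a placeholder instead of crashing the dashboard.
      container.text = `Unknown widget: ${type}`;
      return;
    }
    try {
      fn(container, config);
    } catch (err) {
      // A throwing widget is contained to its own pane.
      container.text = `Widget "${type}" failed: ${err instanceof Error ? err.message : String(err)}`;
    }
  }
}
```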

The transferable lesson: when a gate has no appeal path, redesign instead of arguing. We kept the capability, changed the mechanism, and shipped the same week.

Round 3: warnings are a scorecard, not noise

With zero errors the plugin was listed, but the directory page showed a “Risks” review bar, driven by roughly a hundred no-unsafe-* warnings. They only reproduced in the bot’s environment: it lints without @types/node, so every nodeRequire<typeof import("fs")> collapsed to any. The fix was self-contained minimal typings for the handful of node APIs we touch:

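```typescript
// Self-contained minimal typings for the node APIs the plugin touches, so a
// lint run without @types/node sees concrete types instead of `any`.
// NodeDirent and NodeStats are minimal interfaces defined the same way (not shown).
export interface NodeFs {
  existsSync(p: string): boolean;
  readdirSync(p: string, opts: { withFileTypes: true }): NodeDirent[];
  statSync(p: string): NodeStats;
  readFileSync(p: string, enc: "utf8"): string;
}

// nodeRequire<NodeFs> replaces nodeRequire<typeof import("fs")>.
const fs = nodeRequire<NodeFs>("fs");
```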

The transferable lesson: reproduce the reviewer’s environment locally. We added npm run lint:obsidian, a config that runs the exact plugin the bot runs, with the same severity mapping. After that, every fix was verified before pushing instead of discovered one review round later. It is the same move as running kubeconform or conftest locally with the same policy bundle your admission controller enforces.

Guardrails That Did the Heavy Lifting

Beyond the review gate, four practices made agent-driven development safe enough to move fast:

| Guardrail | Implementation | What it caught |
| --- | --- | --- |
| Executable acceptance criteria | 129-check headless harness booting the real bundle | A flaky “latest file” test that only passed when two writes landed in the same millisecond; CI exposed it, deterministic mock mtimes fixed it |
| Invariant checks | `check:generic` banned-word grep over `src/core/` | Domain terms leaking into the framework core during feature rounds |
| Supply chain hygiene | SHA-pinned actions, build provenance via `actions/attest-build-provenance`, Dependabot on the pins | Verifiable releases: `gh attestation verify main.js --repo sphragis-oss/pinax` |
| Repo rulesets | `protect-main` (no force push, no deletion, no bypass) plus `require-pr` (1 approval, admin bypass) | Force-push protection stays absolute while a solo maintainer keeps a direct-push workflow |
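
The mtime fix from the first row generalizes: give the mock vault its own clock instead of reading the wall clock. This is a hypothetical sketch (the `MockVault` class is invented for illustration) in which every write gets a strictly increasing timestamp, so "latest file" is always well defined.

```typescript
// Hypothetical mock vault with a deterministic clock: each write gets a
// strictly increasing mtime, so "latest file" never depends on timing.
interface MockFile {
  path: string;
  mtime: number;
}

class MockVault {
  private readonly files = new Map<string, MockFile>();
  private clock: number;

  constructor(start = 1_700_000_000_000) {
    this.clock = start;
  }

  write(path: string): MockFile {
    this.clock += 1; // never reuses a timestamp, unlike Date.now()
    const file = { path, mtime: this.clock };
    this.files.set(path, file);
    return file;
  }

  latest(): MockFile | undefined {
    let best: MockFile | undefined;
    for (const f of this.files.values()) {
      if (!best || f.mtime > best.mtime) best = f;
    }
    return best;
  }
}
```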

The ruleset split deserves a note because it is easy to get wrong. Bypass actors apply to a whole ruleset, so if you attach one to a single ruleset containing all your rules, your admins can bypass force-push protection too. Two rulesets, one without bypass actors for the invariants and one with admin bypass for the PR requirement, keep the guarantees you actually want:

```json
{
  "name": "protect-main",
  "rules": [{ "type": "deletion" }, { "type": "non_fast_forward" }],
  "bypass_actors": []
}
```
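
The reasoning behind the split can be turned into an audit. This hypothetical sketch reads rulesets shaped like the JSON above (`name`, `rules[].type`, `bypass_actors`) and reports invariant rules that sit in a ruleset with any bypass actor; treating `deletion` and `non_fast_forward` as the invariants is this post's choice, not a GitHub default.

```typescript
// Hypothetical audit: a rule is only as strong as the ruleset holding it,
// because bypass actors apply to every rule in that ruleset.
interface Ruleset {
  name: string;
  rules: { type: string }[];
  bypass_actors: unknown[];
}

// Rule types treated here as invariants nobody should be able to bypass.
const INVARIANTS = ["deletion", "non_fast_forward"];

function bypassableInvariants(rulesets: Ruleset[]): string[] {
  const findings: string[] = [];
  for (const rs of rulesets) {
    if (rs.bypass_actors.length === 0) continue; // no bypass: rules are absolute
    for (const rule of rs.rules) {
      if (INVARIANTS.includes(rule.type)) findings.push(`${rs.name}: ${rule.type} is bypassable`);
    }
  }
  return findings;
}
```

Run against a single combined ruleset with an admin bypass, it reports both invariants; run against the two-ruleset split, it reports nothing.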

Pros and Cons of Building This Way

Pros

| Advantage | Description |
| --- | --- |
| Iteration speed | Ten feature rounds, three review-fix releases, and full docs in roughly two days of part-time direction |
| Quality floor is explicit | The harness, not reviewer stamina, defines “done”; regressions fail loudly |
| Review gates become cheap | With a local mirror of the bot, each rejection round cost minutes, not days |
| Docs stay honest | The same agent that changed behavior updated SECURITY.md, the changelog, and the templates in the same commit |

Cons

| Limitation | Description |
| --- | --- |
| Guardrails are upfront cost | The mock harness and schema work took real effort before features felt fast |
| Agents inherit your blind spots | The flaky mtime test was generated with the same tie-dependent assumption a rushed human would make; only CI variance exposed it |
| Gates without appeal paths force redesigns | Budget for the possibility that a store policy deletes a feature, as it did here |
| Session drift is real | Long agent sessions accumulate stale context; externalized state (memory files, changelogs, verification scripts) matters more than prompt cleverness |

Quick Start

```shell
# In Obsidian: Settings -> Community plugins -> search "Pinax"
# or from the directory page:
open https://community.obsidian.md/plugins/pinax

# Verify what you installed (build provenance):
gh attestation verify main.js --repo sphragis-oss/pinax

# Build a profile with an LLM: paste profile.schema.json + AUTHORING.md
# into your model of choice and describe the dashboard you want.
```

Community profile bundles live in pinax-profiles and can be imported from a raw URL directly in the plugin settings (behind the per-profile web gate, like everything else that touches the network).

Conclusion

Pinax shipped because the boring parts were rigorous: a schema as the contract, a harness as the definition of done, a local mirror of the external review gate, and supply chain hygiene from the first commit. The agent provided speed; the guardrails provided trust. Neither works alone.

If you take one thing into your platform work, take the pattern: make the external gate reproducible locally, and make “done” executable. It is the difference between an agent that ships and an agent that generates plausible diffs.

- Machine-checkable acceptance criteria let agents loop without supervision
- Reproduce reviewer and admission environments locally before pushing
- Split protection rulesets so bypass never weakens your invariants
- When a gate has no appeal path, redesign the mechanism, keep the capability

If you found this useful, you might also enjoy my related posts:

Pinax - config-driven dashboards for Obsidian
