Pinax - Shipping a Production-Grade Obsidian Plugin with a Coding Agent
// How I built and shipped Pinax, an LLM-buildable dashboard framework for Obsidian, using Claude Code: the guardrails that made agentic development work, the automated review battles, and why machine-checkable success criteria beat vibe coding.
I spend my days building platforms with strong contracts: schemas, admission policies, CI gates, provenance. Last week I applied the same discipline to something completely different, a plugin for my note-taking app, and let a coding agent do most of the typing. Two days later Pinax passed Obsidian’s automated community review with a clean scorecard and went live in the plugin directory.
This post is not “look, AI wrote my code”. It is about the engineering system around the agent: the machine-checkable success criteria, the verification harness, the trust model, and the three rounds of automated review rejections that only strict guardrails made cheap to fix. The lessons transfer directly to how we ship operators, controllers, and internal tooling at work.
Who Should Read This?#
This post is for:
- SREs and Platform Engineers experimenting with coding agents and wondering how to keep quality high without reviewing every line by hand
- Tool builders who want a concrete example of “LLM-buildable” as a design constraint, not a marketing word
- Anyone shipping to a store or marketplace with an automated review gate (Obsidian, Grafana plugins, Backstage, k8s operator hubs) and losing rounds to it
What is Pinax?#
Pinax (πίναξ, an ancient board or panel, literally a dashboard) is a domain-agnostic dashboard framework for Obsidian. You describe the dashboard you want; a single profile.json renders it on top of the notes you already have:
{
  "schemaVersion": 1,
  "name": "Bookshelf",
  "layout": "grid",
  "panes": [
    { "type": "table", "title": "LIBRARY", "source": { "folder": "reading/books" } },
    { "type": "stat", "title": "FINISHED LAST 12 MONTHS", "agg": "count",
      "source": { "folder": "reading/books",
                  "where": [{ "field": "finished", "after": "{{today-365d}}" }] } },
    { "type": "board", "title": "PIPELINE", "groupBy": "status",
      "source": { "folder": "reading/books" } }
  ]
}
Pinax ships eleven built-in widgets (including tables, kanban boards, heatmaps with streaks, stat tiles with sparklines, quick-capture forms, and embeds), re-renders live when the underlying notes change, and writes frontmatter back, undoably, when you drag a kanban card. The same engine ships three bundled profiles with zero code difference: an SRE command center, a personal multi-tab dashboard, and a reading tracker. That is the point: it is a framework, not an app.
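The `{{today-365d}}` token in the stat pane's filter is what lets a static profile stay correct over time. Here is a minimal sketch of how such a token could be resolved. The grammar (`{{today}}` with an optional `+Nd` or `-Nd` offset), the names `resolveDateToken` and `isAfter`, and the assumption that frontmatter dates are ISO `YYYY-MM-DD` strings are illustrative guesses based on the example above, not the actual Pinax implementation.

```typescript
// Hypothetical token grammar inferred from the example profile:
// "{{today}}", "{{today-365d}}", "{{today+7d}}".
const DATE_TOKEN = /^\{\{today(?:([+-])(\d+)d)?\}\}$/;

// Resolves a token to an ISO date (YYYY-MM-DD); literal values pass through.
function resolveDateToken(value: string, now: Date): string {
  const match = DATE_TOKEN.exec(value);
  if (!match) return value;
  const sign = match[1] === "-" ? -1 : 1;
  const days = match[2] ? sign * Number(match[2]) : 0;
  // Work in UTC so the result does not depend on the machine's time zone.
  const shifted = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + days)
  );
  return shifted.toISOString().slice(0, 10);
}

// Hypothetical "after" filter. ISO dates sort correctly as plain strings,
// so a string comparison is enough here.
function isAfter(fieldValue: string | undefined, bound: string, now: Date): boolean {
  if (!fieldValue) return false; // notes without the field never match
  return fieldValue > resolveDateToken(bound, now);
}
```

Passing `now` in as a parameter, rather than calling `new Date()` inside, keeps the function deterministic, which matters once the acceptance checks run in CI.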
What Pinax is NOT#
- Not a query language - Use Dataview if you want inline queries inside notes; Pinax renders whole dashboards from config
- Not a code-execution surface - The released plugin never executes code from your vault; custom widgets come from tiny companion plugins through a public API
- Not opinionated about your vault - The core is grep-verifiably domain-neutral; a CI check fails if a vault-specific word leaks into src/core/
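The domain-neutrality check in the last bullet is simple enough to sketch. The real check is a grep wired into CI; the version below is an illustrative TypeScript equivalent, and the `findLeaks` name, the `Leak` shape, and whatever banned-word list you pass in are assumptions, not the repo's actual script.

```typescript
interface Leak {
  file: string;
  line: number; // 1-based line number of the offending line
  word: string;
}

// Scans in-memory file contents and reports banned words found under src/core/.
// Files outside the core (profiles, docs) may use domain terms freely.
function findLeaks(files: Record<string, string>, banned: string[]): Leak[] {
  const leaks: Leak[] = [];
  for (const [file, text] of Object.entries(files)) {
    if (!file.startsWith("src/core/")) continue;
    text.split("\n").forEach((content, index) => {
      const lower = content.toLowerCase();
      for (const word of banned) {
        if (lower.includes(word.toLowerCase())) {
          leaks.push({ file, line: index + 1, word });
        }
      }
    });
  }
  return leaks;
}
```

A CI wrapper would read the files from disk, call `findLeaks`, and exit non-zero when the result is non-empty.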
“LLM-Buildable” as a Design Constraint#
The interesting design decision was making profiles a target for language models, not just humans. That sounds like a feature; it is actually a constraint that shaped the whole architecture:
| Constraint | Consequence in the design |
|---|---|
| A profile might be wrong | JSON Schema validation at load, error panel instead of partial render, unknown widget ids render placeholders, never crashes |
| A profile might be hostile | Per-profile trust gates (web embeds, command buttons, note writing), all OFF by default, never inherited on import |
| An LLM writes the profile | One schema file plus one authoring guide are the entire contract; if the docs are ambiguous, generation fails, so ambiguity is a bug |
| The agent writes the plugin | Every feature lands with a machine-checkable acceptance test before the implementation is considered done |
The last row is the one that matters for agentic development. The repo has a headless verification harness: a mock of the Obsidian API just complete enough to boot the real bundled main.js, seed a fake vault, click buttons, and assert on rendered DOM. It started at 59 checks and ended at 129:
npm test # 63 unit tests
npm run verify:criteria # 129 end-to-end checks against the real bundle
npm run check:generic # banned-word grep: core stays domain-neutral
npm run lint:obsidian # mirror of the store's review bot (more on this below)
When the acceptance criteria are executable, the agent can loop on its own: implement, run, read the failure, fix. My job shrinks to deciding what to build and reviewing diffs, which is exactly where a human should sit.
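What makes that loop work is that every failure comes back as a named check with a readable reason. A minimal sketch of such a runner, under the assumption that a check is just a function that throws on failure; `Check`, `Report`, `runChecks`, and `expectEqual` are hypothetical names, not the harness's real API:

```typescript
interface Check {
  id: string;      // stable identifier the agent can grep for
  run: () => void; // throws when the criterion is not met
}

interface Report {
  passed: string[];
  failed: { id: string; reason: string }[];
}

// Runs every check, even after a failure, so one run reports all regressions.
function runChecks(checks: Check[]): Report {
  const report: Report = { passed: [], failed: [] };
  for (const check of checks) {
    try {
      check.run();
      report.passed.push(check.id);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      report.failed.push({ id: check.id, reason });
    }
  }
  return report;
}

// Tiny assertion helper that produces a message a model (or human) can act on.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}
```

The script wrapping this would exit non-zero when `failed` is non-empty, which is the only signal the agent needs to keep iterating.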
flowchart TB
subgraph Contract["The contract"]
S[profile.schema.json] --> G[AUTHORING.md]
end
subgraph Loop["Agent loop"]
I[Implement] --> V[verify-criteria: 129 checks]
V -->|fail| I
V -->|pass| R[Human reviews the diff]
end
subgraph Gate["External gates"]
CI[CI: tests, lint, schema validation] --> REV[Obsidian automated review]
end
Contract --> Loop --> Gate
The Automated Review Battles#
Obsidian’s community directory runs an automated review (an ESLint ruleset, eslint-plugin-obsidianmd) against your repository, and your plugin is not installable until it passes. We lost three rounds to it. Each loss taught something worth writing down.
Round 1: APIs newer than your declared minAppVersion#
The bot flagged Plugin.settings usage at nine call sites. We never called any such API. The actual cause: Obsidian 1.13 introduced its own Plugin.settings property, and our plugin’s own settings field now collided with it, so the linter resolved our property to their new API. The fix was renaming our field, plus honest version work: Workspace.revealLeaf returns a promise since 1.7.2 and App#saveLocalStorage exists since 1.8.7, so minAppVersion moved to 1.8.7 and the calls got awaited.
The transferable lesson: shadowing platform namespaces is a time bomb. The same applies to a CRD field named like a future Kubernetes core field or a Helm value colliding with a builtin.
Round 2: the new Function dead end#
Pinax originally supported profile-local widgets.js: drop a JavaScript file next to your profile, flip an explicit “Custom widget code” trust toggle, get custom widgets. Executing it required new Function, and the review bot flags that as a hard error. We tried the justification route first, without success: the ESLint plugin’s own config forbids inline eslint-disable comments for its rules, and the documentation is explicit that a plugin is not installable until the automated review passes. There is no documented appeal path.
So the feature came out. The loader lives on a feature/widgets-js branch, and extensibility moved to a sanctioned mechanism that already existed: any tiny companion plugin can call window.pinax.registerWidget(...). The repo ships a copy-paste template (a manifest.json and a 20-line main.js) written so an LLM can generate a working widget plugin from it.
The transferable lesson: when a gate has no appeal path, redesign instead of arguing. We kept the capability, changed the mechanism, and shipped the same week.
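To make the mechanism concrete, here is a sketch of what a registry behind a `registerWidget` call could look like. The signature (an id plus a render function), the `WidgetRegistry` class, and the string output are assumptions chosen to keep the example runnable without a DOM; the real Pinax API may differ and renders into Obsidian's DOM.

```typescript
// Hypothetical render signature: takes pane config, returns markup as a string.
type WidgetRender = (config: Record<string, unknown>) => string;

class WidgetRegistry {
  private widgets = new Map<string, WidgetRender>();

  // What a companion plugin would call once at load time.
  registerWidget(id: string, render: WidgetRender): void {
    this.widgets.set(id, render);
  }

  // Unknown ids and throwing widgets degrade to placeholders instead of
  // taking the whole dashboard down.
  render(id: string, config: Record<string, unknown>): string {
    const widget = this.widgets.get(id);
    if (!widget) return `[unknown widget: ${id}]`;
    try {
      return widget(config);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      return `[widget ${id} failed: ${reason}]`;
    }
  }
}
```

The point of the design is that third-party code enters through a function call from another reviewed plugin, so the core never needs to evaluate a string as code.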
Round 3: warnings are a scorecard, not noise#
With zero errors the plugin was listed, but the directory page showed a “Risks” review bar, driven by roughly a hundred no-unsafe-* warnings. They only reproduced in the bot’s environment: it lints without @types/node, so every nodeRequire<typeof import("fs")> collapsed to any. The fix was self-contained minimal typings for the handful of Node APIs we touch:
export interface NodeFs {
existsSync(p: string): boolean;
readdirSync(p: string, opts: { withFileTypes: true }): NodeDirent[];
statSync(p: string): NodeStats;
readFileSync(p: string, enc: "utf8"): string;
}
const fs = nodeRequire<NodeFs>("fs");
The transferable lesson: reproduce the reviewer’s environment locally. We added npm run lint:obsidian, a config that runs the same ESLint plugin the bot runs, with the same severity mapping. After that, every fix was verified before pushing instead of discovered one review round later. It is the same move as running kubeconform or conftest locally with the same policy bundle your admission controller enforces.
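Narrow, self-owned typings have a second benefit beyond silencing the linter: code that depends on a small interface can be exercised with an in-memory fake. The sketch below uses its own hypothetical names (`FsLike`, `DirentLike`, `listMarkdown`, `fakeFs`) rather than the NodeFs interface above, and assumes only the two members it needs.

```typescript
// Assumed shapes: only the members this sketch uses; real Node objects have more.
interface DirentLike {
  name: string;
  isDirectory(): boolean;
}

interface FsLike {
  readdirSync(p: string, opts: { withFileTypes: true }): DirentLike[];
}

// Recursively collects .md files. Fully typed without @types/node,
// because it depends on FsLike rather than on the real "fs" module type.
function listMarkdown(fs: FsLike, dir: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = `${dir}/${entry.name}`;
    if (entry.isDirectory()) found.push(...listMarkdown(fs, full));
    else if (entry.name.endsWith(".md")) found.push(full);
  }
  return found.sort();
}

// In-memory fake: keys are directory paths, values are the entry names inside.
function fakeFs(tree: Record<string, string[]>): FsLike {
  return {
    readdirSync(p) {
      return (tree[p] ?? []).map((name) => ({
        name,
        isDirectory: () => `${p}/${name}` in tree,
      }));
    },
  };
}
```

In production the real module is passed in; in a headless harness the fake is, and neither path ever widens to any.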
Guardrails That Did the Heavy Lifting#
Beyond the review gate, four practices made agent-driven development safe enough to move fast:
| Guardrail | Implementation | What it caught |
|---|---|---|
| Executable acceptance criteria | 129-check headless harness booting the real bundle | A flaky “latest file” test that only passed when two writes landed in the same millisecond; CI exposed it, deterministic mock mtimes fixed it |
| Invariant checks | check:generic banned-word grep over src/core/ | Domain terms leaking into the framework core during feature rounds |
| Supply chain hygiene | SHA-pinned actions, build provenance via actions/attest-build-provenance, Dependabot on the pins | Verifiable releases: gh attestation verify main.js --repo sphragis-oss/pinax |
| Repo rulesets | protect-main (no force push, no deletion, no bypass) plus require-pr (1 approval, admin bypass) | Force-push protection stays absolute while a solo maintainer keeps a direct-push workflow |
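The rule behind the last row is that an invariant only holds if at least one ruleset enforces it with no bypass actors at all. That rule can be checked mechanically. The sketch below runs against ruleset JSON shaped like GitHub's (`name`, `rules[].type`, `bypass_actors`); the `findBypassableInvariants` helper and the choice of which rule types count as invariants are assumptions for illustration, and the check ignores targeting conditions and enforcement status.

```typescript
interface Ruleset {
  name: string;
  rules: { type: string }[];
  bypass_actors: unknown[];
}

// Rule types treated as non-negotiable in this sketch.
const INVARIANT_RULES = ["deletion", "non_fast_forward"];

// Returns the invariant rule types that some actor could still bypass,
// i.e. those not enforced by any ruleset with an empty bypass list.
function findBypassableInvariants(rulesets: Ruleset[]): string[] {
  const problems: string[] = [];
  for (const type of INVARIANT_RULES) {
    const enforcing = rulesets.filter((r) => r.rules.some((rule) => rule.type === type));
    const absolute = enforcing.some((r) => r.bypass_actors.length === 0);
    if (!absolute) problems.push(type);
  }
  return problems;
}
```

Run against an export of the repository's rulesets, an empty result means the split is intact; anything else names the protection an admin could sidestep.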
The ruleset split deserves a note because most people configure this wrong. If you attach a bypass actor to a single ruleset containing all your rules, your admins can bypass force-push protection too. Two rulesets, one without bypass actors for the invariants and one with admin bypass for the PR requirement, keep the guarantees you actually want:
{ "name": "protect-main",
"rules": [{ "type": "deletion" }, { "type": "non_fast_forward" }],
"bypass_actors": [] }
Pros and Cons of Building This Way#
Pros#
| Advantage | Description |
|---|---|
| Iteration speed | Ten feature rounds, three review-fix releases, and full docs in roughly two days of part-time direction |
| Quality floor is explicit | The harness, not reviewer stamina, defines “done”; regressions fail loudly |
| Review gates become cheap | With a local mirror of the bot, each rejection round cost minutes, not days |
| Docs stay honest | The same agent that changed behavior updated SECURITY.md, the changelog, and the templates in the same commit |
Cons#
| Limitation | Description |
|---|---|
| Guardrails are upfront cost | The mock harness and schema work took real effort before features felt fast |
| Agents inherit your blind spots | The flaky mtime test was generated with the same tie-dependent assumption a rushed human would make; only CI variance exposed it |
| Gates without appeal paths force redesigns | Budget for the possibility that a store policy deletes a feature, as it did here |
| Session drift is real | Long agent sessions accumulate stale context; externalized state (memory files, changelogs, verification scripts) matters more than prompt cleverness |
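The flaky “latest file” test is easier to appreciate in code. Wall-clock mtimes may or may not tie depending on how fast two writes land, so any test that relies on them inherits that variance. A minimal sketch of the deterministic alternative, assuming a mock vault with a logical clock; `FakeVault` and its methods are hypothetical and much smaller than the real harness mock:

```typescript
interface FakeFile {
  path: string;
  mtime: number;
}

class FakeVault {
  private files = new Map<string, FakeFile>();
  private tick = 0; // logical clock: strictly increases on every write

  // Each write gets a unique, ordered mtime regardless of real elapsed time.
  write(path: string): void {
    this.tick += 1;
    this.files.set(path, { path, mtime: this.tick });
  }

  // Returns the most recently written path, or undefined for an empty vault.
  latest(): string | undefined {
    let best: FakeFile | undefined;
    for (const file of this.files.values()) {
      if (!best || file.mtime > best.mtime) best = file;
    }
    return best?.path;
  }
}
```

With a counter instead of a timestamp, ties cannot happen, so the test result no longer depends on machine speed.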
Quick Start#
# In Obsidian: Settings -> Community plugins -> search "Pinax"
# or from the directory page:
open https://community.obsidian.md/plugins/pinax
# Verify what you installed (build provenance):
gh attestation verify main.js --repo sphragis-oss/pinax
# Build a profile with an LLM: paste profile.schema.json + AUTHORING.md
# into your model of choice and describe the dashboard you want.
Community profile bundles live in pinax-profiles and can be imported from a raw URL directly in the plugin settings (behind the per-profile web gate, like everything else that touches the network).
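Importing from a URL is where the “never inherited on import” rule for trust gates matters most. Here is a minimal sketch of that rule, assuming a hypothetical top-level `trust` key and gate names derived from the three gates mentioned earlier (web embeds, command buttons, note writing); the real storage layout in Pinax may differ.

```typescript
interface TrustGates {
  webEmbeds: boolean;
  commandButtons: boolean;
  noteWriting: boolean;
}

interface StoredProfile {
  profile: Record<string, unknown>;
  trust: TrustGates;
}

const ALL_OFF: TrustGates = { webEmbeds: false, commandButtons: false, noteWriting: false };

// Imports a profile bundle. Any trust block carried by the bundle is dropped:
// grants are a local decision by the user, never something a download can claim.
function importProfile(rawJson: string): StoredProfile {
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Profile must be a JSON object");
  }
  const profile: Record<string, unknown> = { ...(parsed as Record<string, unknown>) };
  delete profile["trust"];
  return { profile, trust: { ...ALL_OFF } };
}
```

Schema validation would still run on the result; this step only guarantees that a hostile bundle cannot arrive with its own gates already open.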
Conclusion#
Pinax shipped because the boring parts were rigorous: a schema as the contract, a harness as the definition of done, a local mirror of the external review gate, and supply chain hygiene from the first commit. The agent provided speed; the guardrails provided trust. Neither works alone.
If you take one thing into your platform work, take the pattern: make the external gate reproducible locally, and make “done” executable. It is the difference between an agent that ships and an agent that generates plausible diffs.
- Machine-checkable acceptance criteria let agents loop without supervision
- Reproduce reviewer and admission environments locally before pushing
- Split protection rulesets so bypass never weakens your invariants
- When a gate has no appeal path, redesign the mechanism, keep the capability

