Daniel Ivanov

Generalist Engineer

Why I Build Claude Code Skills with Bun and SQLite

When I started writing Claude Code skills, I reached for the obvious pattern: fetch data from an API, return JSON, let the agent figure it out. Grep through files. Pipe to jq. Parse the output.

It worked, until it didn't. The agent would miss nested fields. Regex patterns would break on edge cases. jq queries would silently return empty arrays when the structure shifted slightly. The skill ran fine. The agent just couldn't do anything useful with the output.

The fix was simple: store data in SQLite and let the agent query it directly.

The Pattern

Instead of returning JSON for the agent to parse, the skill writes whatever it fetches into a SQLite file and lets the agent query it with SQL:

#!/usr/bin/env -S npx -y bun
// This shebang works anywhere npm exists, but adds ~500ms latency.
// For production skills, install Bun and use: #!/usr/bin/env bun

import { Database } from "bun:sqlite"

const db = new Database("papers.db")

// Schema documents the structure for the agent
db.run(`
  CREATE TABLE IF NOT EXISTS papers (
    id TEXT PRIMARY KEY,
    title TEXT,
    authors TEXT,
    abstract TEXT,
    year INTEGER,
    citations INTEGER
  )
`)

// Clear stale data from previous runs (or use :memory: for ephemeral)
db.run(`DELETE FROM papers`)

// Fetch from API and store
const response = await fetch("https://api.dimensions.ai/...")
const data = await response.json()

// Prepare the statement once, run it for each row
const insert = db.query(`INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?, ?, ?)`)

for (const paper of data.publications) {
  // Joining authors as comma-separated text is lazy normalization,
  // but LLMs handle LIKE '%Smith%' better than json_extract() syntax.
  insert.run(
    paper.id, paper.title, paper.authors.join(", "),
    paper.abstract, paper.year, paper.citations
  )
}

// Agent queries with SQL, not jq
const highCited = db.query(`
  SELECT title, citations
  FROM papers
  WHERE year >= 2024 AND citations > 100
  ORDER BY citations DESC
`).all()

console.log(JSON.stringify(highCited, null, 2))

Why does this work better?

SQL queries are verifiable. The agent can inspect a query before running it, and failed SQL throws an error that names the problem: the missing column, the unexpected token. A failed jq query returns null or an empty array by default. (Yes, jq -e exists, but the error is still just "exit status 1", not "no such field.")

The schema documents the data structure. The agent doesn't need to parse example JSON to understand what fields exist. It can query sqlite_master or just look at the CREATE TABLE statement.
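
Asking for the schema is itself just a query. A minimal sketch against the papers.db file from the script above:

import { Database } from "bun:sqlite"

const db = new Database("papers.db")

// Returns every CREATE TABLE statement verbatim: built-in documentation
const schema = db.query(`SELECT sql FROM sqlite_master WHERE type = 'table'`).all()
console.log(schema)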

Queries are composable. Need papers by a specific author with more than 50 citations from the last two years? That's a WHERE clause, not a nested jq filter piped through grep.
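
That follow-up question really is one WHERE clause. A sketch, reusing the db handle from the script above and reading "the last two years" as 2024 onward:

// Hypothetical author name; bind the LIKE pattern as a parameter
const recent = db.query(`
  SELECT title, year, citations
  FROM papers
  WHERE authors LIKE ?
    AND citations > 50
    AND year >= 2024
  ORDER BY citations DESC
`).all("%Smith%")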

Why Bun Specifically

Bun has SQLite built in. No npm install. No native bindings to compile. No node-gyp errors. Just import { Database } from "bun:sqlite" and it works.

The same goes for file I/O. Bun.file() and Bun.write() handle reading and writing without importing fs. For skills that fetch data, cache it, and output results, Bun covers the entire flow with zero dependencies.
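
A rough sketch of that flow with Bun built-ins only; the URL is a placeholder, not a real endpoint:

const cache = Bun.file("cache.json")

let data
if (await cache.exists()) {
  // Cache hit: read the stored response
  data = await cache.json()
} else {
  // Cache miss: fetch and persist for next time
  data = await (await fetch("https://example.com/api/papers")).json()
  await Bun.write("cache.json", JSON.stringify(data))
}
console.log(data)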

My Dimensions skill has authentication, caching, rate limiting, six output formats, and a full test suite. The only external dependency is Zod for schema validation. Everything else is Bun built-ins.

import { Database } from "bun:sqlite"
const db = new Database(":memory:")
db.query("SELECT 1 + 1 as result").get() // { result: 2 }

This matters for skills because skills run in unpredictable environments. The agent might invoke your skill from a project directory, a monorepo root, or a temp folder. Native bindings that compile against a specific Node version break in these contexts. bun:sqlite doesn't have this problem because there's nothing to compile.

Bun's SQLite is also fast. Bun's benchmarks claim 3-6x faster than better-sqlite3 for read queries. For skills that cache API responses and query them repeatedly, this adds up.

I should be honest about the alternatives. Python's sqlite3 is also built in, and if you're already writing skills in Python, that's a fine choice. Simon Willison's llm CLI uses Python with SQLite, and it's excellent. Deno added SQLite support via node:sqlite, though it requires the Node.js compatibility layer.

Bun's advantage is strongest when you want JavaScript or TypeScript, zero configuration, and a single distributable binary. bun build --compile produces a standalone executable that includes the runtime and SQLite. No dependencies to install on the target machine.
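
The packaging step is a single command, something like:

$ bun build --compile ./skill.ts --outfile skill
$ ./skill    # no Bun or Node install needed on the target machine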

When This Pattern Makes Sense

Not every skill needs a database. The pattern pays off when:

You're querying the same data more than once. If the skill fetches data, uses it, and exits, SQLite is overhead. If the agent might ask follow-up questions against the same dataset, SQLite turns a re-fetch into a fast local query.

The data is too large for context. A research query might return 100,000 papers. You can't inject that into the conversation. You need to give the agent a reference to something it can query. SQLite is that reference. The agent asks for "top 10 by citation count" or "papers containing 'transformer' in the abstract" and gets back 10 rows, not 100,000.

You need joins or aggregations. How many papers per author? Average citations by year? These are trivial in SQL, painful in jq.
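
The year aggregate, for example, is a single GROUP BY. A sketch against the same papers table, reusing the db handle from earlier:

// Papers per year with average citations
const byYear = db.query(`
  SELECT year, COUNT(*) AS papers, AVG(citations) AS avg_citations
  FROM papers
  GROUP BY year
  ORDER BY year DESC
`).all()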

The agent needs to explore the data. SQL's declarative nature means the agent can try different queries without you anticipating every possible question. "Show me the schema" followed by "count rows grouped by year" followed by "find outliers" is a natural exploration pattern.

When JSON and jq Still Win

SQLite is overkill for:

  • Quick one-off extractions from a small API response
  • Streaming data where you can't wait for the full payload
  • ETL preprocessing before you know the schema
  • Anything small enough to fit comfortably in context

If the data fits in a single message and you're only looking at it once, just return JSON. The agent can parse it inline.

Practical Considerations

Where does the database live? I typically put it in a skill-specific directory: ~/.local/share/my-skill/cache.db. This keeps the database out of the user's working directory and makes it easy to clear.
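
Setting that up is a few lines at the top of the skill. A sketch, with my-skill as a placeholder name:

import { Database } from "bun:sqlite"
import { mkdirSync } from "node:fs"

// Create the data directory if it doesn't exist yet
const dir = `${process.env.HOME}/.local/share/my-skill`
mkdirSync(dir, { recursive: true })
const db = new Database(`${dir}/cache.db`)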

When does it get cleared? I add a --clear-cache flag. Some skills clear on each run; others persist indefinitely. It depends on whether the data is a cache (expendable) or a log (valuable).
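
The flag itself is two lines; a sketch, reusing the db handle from the sketch above:

// Bun.argv holds the runtime path, the script path, then the arguments
if (Bun.argv.includes("--clear-cache")) {
  db.run(`DELETE FROM papers`)
}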

What about schema changes? For simple skills, I drop and recreate on schema mismatch. For anything more complex, I version the schema in a metadata table and run migrations.
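
A sketch of the metadata-table approach, with a hypothetical version-2 migration that adds a doi column:

db.run(`CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)`)

const row = db.query(`SELECT value FROM meta WHERE key = 'schema_version'`)
  .get() as { value: string } | null
const version = row ? Number(row.value) : 1

if (version < 2) {
  db.run(`ALTER TABLE papers ADD COLUMN doi TEXT`) // hypothetical v2 change
  db.run(`INSERT OR REPLACE INTO meta VALUES ('schema_version', '2')`)
}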

What if the agent writes bad SQL? It happens. LLM accuracy on SQL varies widely by benchmark—anywhere from 50% to 80% depending on query complexity. The upside is that bad SQL fails loudly:

# jq with a typo: silent failure
$ cat papers.json | jq '.papers[] | .authr'
null

# SQL with a typo: loud failure
$ sqlite3 papers.db "SELECT authr FROM papers"
Error: no such column: authr

The agent sees the error and can fix the typo. Silent failures are debugging nightmares.

Large result sets? Use LIMIT, or query for IDs first and fetch details in a follow-up. Both keep the agent's view small while the full dataset stays local.
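
The IDs-first variant, sketched against the same db handle:

// Step 1: a cheap overview query; step 2: details only for the chosen rows
const top = db.query(`SELECT id FROM papers ORDER BY citations DESC LIMIT 10`)
  .all() as { id: string }[]
const detail = db.query(`SELECT * FROM papers WHERE id = ?`)
const rows = top.map(({ id }) => detail.get(id))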

The Ecosystem Is Converging Here

This pattern is gaining traction.

Simon Willison's llm CLI logs every prompt and response to SQLite. You can query your entire conversation history with SQL, or explore it with Datasette.

Claude-Flow uses SQLite for agent memory at .swarm/memory.db, enabling cross-session state and coordination between agents.

MCP has SQLite server implementations that let agents inspect schemas and run queries against databases.

The Library of Congress recommends SQLite as a format for long-term data preservation. It's one of four recommended formats for datasets.

When Anthropic acquired Bun, this pattern clicked into place. The company building Claude Code now owns a runtime with SQLite built in. That's not a coincidence.

What I'm Building With This

I've been writing skills that wrap research APIs. Dimensions for academic papers. Semantic Scholar for citations.

Returning JSON works for small results. But when you're exploring 50,000 papers across a research domain, it breaks. Every question requires a new API call. The agent can't slice the data differently without starting over.

With SQLite, the skill fetches once and the agent explores indefinitely. The pattern is always the same: authenticate, fetch, store in SQLite, let the agent query.

The agent doesn't need to know the API's pagination quirks or rate limits or authentication flow. It just needs to know the schema and how to write SQL.

> Find papers about "AI alignment" with more than 200 citations published after 2022

SELECT title, authors, citations, year
FROM papers
WHERE abstract LIKE '%AI alignment%'
  AND citations > 200
  AND year > 2022
ORDER BY citations DESC

The skill handles the API complexity. SQLite handles the storage. The agent handles the query. Each layer does what it's good at.

The future of agent tooling isn't making agents better at parsing unstructured data. It's giving them data structures that don't require parsing at all.