2026 · Live

In-Browser Semantic Search

Search my writing by meaning — the language model runs in your tab.

React, Gatsby, Transformers.js, WebAssembly, Embeddings

Inspiration

Can a Static Site Do This?

Everyone assumes semantic search needs a backend, an embedding API, and a vector database. By 2026, transformers.js and small quantized models have made that assumption false for small corpora. The whole thing fits in a tab.

The question I wanted to answer was: can a plain Gatsby site on Netlify, with no functions and no keys, still offer the kind of search that usually requires an OpenAI embedding endpoint and a Pinecone instance? For a portfolio's worth of text, the answer turns out to be yes.

The Idea

Build-Time Embeddings, Runtime Lookup

A Node script runs at build time. It walks blogs.js and projects.js, chunks each document, computes embeddings with Xenova/all-MiniLM-L6-v2, and writes the result to static/search-index.json. The index is served like any other static asset, CDN-cached, at around 120KB gzipped for this corpus.

At runtime, the search page fetches the index immediately and runs keyword search on it with zero dependencies. Semantic search kicks in after the user's first search: transformers.js is dynamically imported, and the ~25MB quantized model downloads once and then stays in the browser's cache storage (transformers.js uses the Cache API by default) until the browser evicts it. Every subsequent search is a local cosine-similarity scan.

Architecture

Keyword First, Semantic When Ready

Progressive enhancement by design. Keyword search works instantly on first load. The embedding model loads in the background on the first search, and once ready the page flips seamlessly into semantic mode.
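
The switch from keyword to semantic mode can be sketched as a small closure. This is a minimal illustration, not the site's actual code: `createSearch`, `Chunk`, `Embedder`, and the injected `loadEmbedder` are hypothetical names, and the dot-product scoring assumes the embeddings are already unit-length.

```typescript
// Hypothetical shapes for illustration; the real index has more fields.
type Chunk = { id: string; text: string; embedding: number[] };
type Result = { id: string; mode: "keyword" | "semantic" };
type Embedder = (query: string) => Promise<number[]>;

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function createSearch(chunks: Chunk[], loadEmbedder: () => Promise<Embedder>) {
  let embedder: Embedder | null = null;
  let loading: Promise<void> | null = null;

  // Start the model load once; later calls reuse the same promise.
  function ready(): Promise<void> {
    if (loading === null) {
      loading = loadEmbedder().then((e) => {
        embedder = e;
      });
    }
    return loading;
  }

  async function search(query: string): Promise<Result[]> {
    const current = embedder;
    if (current === null) {
      // Model not ready: trigger the background load and answer with keywords.
      // If the load fails, the page simply stays in keyword mode.
      ready().catch(() => undefined);
      const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
      return chunks
        .filter((c) => terms.some((t) => c.text.toLowerCase().includes(t)))
        .map((c) => ({ id: c.id, mode: "keyword" as const }));
    }
    // Model ready: rank every chunk by similarity to the query embedding.
    const q = await current(query);
    return chunks
      .map((c) => ({ id: c.id, score: dot(q, c.embedding) }))
      .sort((a, b) => b.score - a.score)
      .map((r) => ({ id: r.id, mode: "semantic" as const }));
  }

  return { search, ready };
}
```

The first call to `search` always returns keyword results, even if the model finishes loading a moment later, so the user never waits on the download.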

Build Script

scripts/build-search-index.mjs reads blogs + projects, chunks each doc by section, embeds with transformers.js in Node, writes static/search-index.json.
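
A rough sketch of the chunk-then-embed loop follows. The `SourceDoc` shape is an assumption (the real exports of blogs.js and projects.js are not shown here), the `type:slug:index` id format is invented for illustration, and `embed` is injected so the sketch does not depend on any particular library.

```typescript
// Assumed input shape; the real blogs.js / projects.js exports may differ.
interface SourceDoc {
  type: "blog" | "project";
  slug: string;
  title: string;
  sections: { heading: string; body: string }[];
}

interface IndexChunk {
  id: string;
  type: string;
  slug: string;
  title: string;
  section: string;
  text: string;
  embedding: number[];
}

type EmbedFn = (text: string) => Promise<number[]>;

// One chunk per non-empty section of a document.
function chunkBySection(doc: SourceDoc): Omit<IndexChunk, "embedding">[] {
  return doc.sections
    .filter((s) => s.body.trim().length > 0)
    .map((s, i) => ({
      id: `${doc.type}:${doc.slug}:${i}`, // hypothetical id scheme
      type: doc.type,
      slug: doc.slug,
      title: doc.title,
      section: s.heading,
      text: s.body.trim(),
    }));
}

// Embed chunks one at a time; fine for a few dozen chunks at build time.
async function buildIndex(docs: SourceDoc[], embed: EmbedFn): Promise<IndexChunk[]> {
  const index: IndexChunk[] = [];
  for (const doc of docs) {
    for (const chunk of chunkBySection(doc)) {
      index.push({ ...chunk, embedding: await embed(chunk.text) });
    }
  }
  return index;
}
```

In the build script, `embed` would wrap the transformers.js feature-extraction pipeline, and the returned array would be serialized with `JSON.stringify` and written to static/search-index.json.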

Static Index

One JSON file per build, ~120KB gzipped. Each chunk: id, type, slug, title, section, text, 384-dim embedding vector.
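
The chunk record can be written down as a type, with a guard that checks the index when the page loads it. This sketch assumes the file is a top-level JSON array (the layout is not specified above); `isSearchChunk` and `parseIndex` are hypothetical helper names.

```typescript
interface SearchChunk {
  id: string;
  type: string;
  slug: string;
  title: string;
  section: string;
  text: string;
  embedding: number[]; // 384 values from all-MiniLM-L6-v2
}

const EMBEDDING_DIM = 384;
const STRING_FIELDS = ["id", "type", "slug", "title", "section", "text"];

// Runtime check that an unknown JSON value has the expected chunk shape.
function isSearchChunk(value: unknown): value is SearchChunk {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const embedding = v["embedding"];
  return (
    STRING_FIELDS.every((k) => typeof v[k] === "string") &&
    Array.isArray(embedding) &&
    embedding.length === EMBEDDING_DIM &&
    embedding.every((x) => typeof x === "number" && Number.isFinite(x))
  );
}

// Assumes the index file is a JSON array; malformed entries are dropped.
function parseIndex(json: string): SearchChunk[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("search index must be a JSON array");
  return data.filter(isSearchChunk);
}
```

On the search page this would be fed the body of the fetched index, for example `parseIndex(await response.text())`.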

Keyword Mode

Initial search mode — O(n) scan over the chunks comparing lowercased query terms to chunk text. Runs in under 5ms for this corpus.
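
A minimal version of that scan might look like this. Ranking by the number of matched terms and capping at 8 results are assumptions made for the sketch; only the lowercased substring comparison comes from the description above.

```typescript
interface TextChunk {
  id: string;
  text: string;
}

// O(n) scan: count how many query terms appear in each chunk's text.
function keywordSearch(chunks: TextChunk[], query: string, limit = 8): TextChunk[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  if (terms.length === 0) return [];

  const scored: { chunk: TextChunk; hits: number }[] = [];
  for (const chunk of chunks) {
    const haystack = chunk.text.toLowerCase();
    const hits = terms.filter((t) => haystack.includes(t)).length;
    if (hits > 0) scored.push({ chunk, hits });
  }

  // More matched terms first; ties keep their original index order.
  scored.sort((a, b) => b.hits - a.hits);
  return scored.slice(0, limit).map((s) => s.chunk);
}
```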

Model Load

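On the first query, @xenova/transformers is dynamically imported and a progress bar reports the model download. After the first load, the model is served from the browser's cache storage (the Cache API, which transformers.js uses by default).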
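
One way to wire the download to a progress bar is sketched below. The library is injected through `importLib` so the sketch stays self-contained. The `TransformersLib` and `DownloadEvent` shapes are simplified assumptions based on the `pipeline` options (`quantized`, `progress_callback`) in @xenova/transformers v2, so the exact event fields should be checked against the installed version.

```typescript
// Simplified, assumed shapes; verify against the installed library version.
interface DownloadEvent {
  status: string;
  file?: string;
  progress?: number; // percent for the current file
}

type Extractor = (
  text: string,
  opts: { pooling: "mean"; normalize: boolean },
) => Promise<{ data: ArrayLike<number> }>;

interface TransformersLib {
  pipeline: (
    task: string,
    model: string,
    opts: { quantized: boolean; progress_callback: (e: DownloadEvent) => void },
  ) => Promise<Extractor>;
}

async function loadEmbedder(
  importLib: () => Promise<TransformersLib>,
  onPercent: (percent: number) => void,
): Promise<(text: string) => Promise<number[]>> {
  const { pipeline } = await importLib();
  const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
    quantized: true,
    progress_callback: (e) => {
      // Report only the large weights file; tokenizer and config are tiny.
      if (e.status === "progress" && e.file?.endsWith(".onnx") && typeof e.progress === "number") {
        onPercent(Math.round(e.progress));
      }
    },
  });
  // Mean pooling plus normalization yields one unit-length vector per text.
  return async (text) => {
    const output = await extractor(text, { pooling: "mean", normalize: true });
    return Array.from(output.data);
  };
}
```

On the search page the call would be roughly `loadEmbedder(() => import("@xenova/transformers"), setPercent)`, where `setPercent` is a hypothetical state setter; a type cast may be needed because the real module's types are richer than the simplified interface here.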

Semantic Mode

Query embedded locally, cosine similarity scored against every chunk embedding, top 8 returned. Same JSON index, different scoring function.
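
The scoring function itself is short. The sketch below is a generic cosine top-k scan under assumed names (`Embedded`, `cosine`, `topK`); the default of 8 matches the result count mentioned above.

```typescript
interface Embedded {
  id: string;
  embedding: ArrayLike<number>;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query and keep the k best.
function topK<T extends Embedded>(
  query: ArrayLike<number>,
  chunks: T[],
  k = 8,
): { chunk: T; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

If the stored and query vectors are already normalized, the two norm terms are 1 and the score reduces to a plain dot product.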

Tech Deep Dive

Under the Hood

Transformers.js + ONNX Runtime

Runs ONNX models in the browser via WebAssembly. all-MiniLM-L6-v2 quantized is ~25MB, 384-dim embeddings, sub-50ms query embedding on a modern laptop.

Build-Time Node Pipeline

Same transformers.js package runs in Node during the Gatsby build. Runs per deploy on Netlify's build machines. The user never pays for index generation.

Dynamic Import

@xenova/transformers is import()-ed inside the /search page component only. Gatsby code-splits by route, so the library adds zero bytes to the homepage bundle.
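
A small helper can make sure the import fires only once, however many times the search page asks for it. `lazyOnce` is a hypothetical name, and the retry-after-failure behavior is a design choice made for this sketch rather than something described above.

```typescript
// Wrap an async loader so it runs at most once per successful load.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending === null) {
      pending = load().catch((err) => {
        pending = null; // a failed download can be retried on the next call
        throw err;
      });
    }
    return pending;
  };
}
```

In the /search component this might be declared as `const getLib = lazyOnce(() => import("@xenova/transformers"))` and called from a `useEffect` or the first submit handler. Because the specifier only appears inside `import()`, the bundler emits the library as a separate chunk that other pages never request.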

Cosine Similarity, Flat Scan

For 34 chunks of 384 floats, brute-force cosine is 34 × 384 = 13,056 multiply-adds per query. No vector DB and no ANN index: the corpus is small enough that any cleverness would be a worse tradeoff.

Challenges

What Made It Hard

Keeping the homepage fast. The fix was never importing transformers.js at module level — it's behind a dynamic import inside a useEffect on the /search route, so Gatsby's route-chunking automatically keeps it off every other page.
First-search UX. The model download is ~25MB, which is noticeable. Solution: keyword search runs instantly on the pre-built index while the model downloads in the background. Users get useful results immediately, and upgrade to semantic when the model finishes.
Index generation in CI. transformers.js in Node downloads model weights to a cache dir; on a cold Netlify build this adds 20–40 seconds. I considered caching the download but decided the complexity wasn't worth it for a small personal site.