
May 4, 2026 · 4 min read

Running ONNX models in the browser without losing your weekend

A working recipe for shipping image segmentation in a tab — Web Workers, WASM, pre-encoded embeddings, and the small things that decide whether the demo is fast or felt-fast.

Most of the AI demos I’ve seen on the web in 2026 punt: they POST an image to a server, render a spinner, then paint the mask. That works, but it costs you per call, leaks user data, and feels like 2018. The alternative — running the model directly in the user’s tab — is shockingly close to boring now. Here’s the recipe I’ve been using.

The shape of the runtime

Three rules that have served me well:

  1. Run inference in a Web Worker. Otherwise a 60ms decoder pass freezes your animation loop. Workers are cheap, and onnxruntime-web ships an ESM-friendly worker entry.
  2. Pre-encode whatever you can at build time. For interactive segmentation (think Meta’s SAM family), the heavy work is the encoder. Run it once on a server (or your laptop), serialise the resulting tensors to disk, and ship them next to the image. The browser only ever runs the lightweight decoder.
  3. Use WASM unless you really need WebGPU. WASM is universally supported, debugs cleanly, and is fast enough for decoder-only inference. WebGPU is a nice optimisation when the model is bigger.

Loading the runtime in a worker

public/workers/sam2-decoder-worker.js
import * as ort from "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.21.0/dist/ort.wasm.bundle.min.mjs";
 
ort.env.wasm.wasmPaths =
  "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.21.0/dist/";
ort.env.wasm.numThreads = navigator.hardwareConcurrency || 4;
 
let session = null;
 
self.onmessage = async (e) => {
  const { type, data } = e.data;
  if (type === "load") {
    const buf = await fetch(data.modelUrl).then((r) => r.arrayBuffer());
    session = await ort.InferenceSession.create(buf, {
      executionProviders: ["wasm"],
    });
    self.postMessage({ type: "ready" });
  }
};

A few things worth noting:

  • The CDN import keeps your main bundle tiny — none of onnxruntime-web reaches your app code until the worker boots.
  • numThreads will quietly fall back to 1 unless your page is cross-origin-isolated. Don’t fight it; single-threaded is fine for a decoder.
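The worker above only answers the "load" message. When a click arrives, the main thread would post it over, and the worker would turn it into the decoder's point inputs before calling session.run. A pure sketch of that preprocessing step — the 1024×1024 encoder input space and the point_coords / point_labels input names are assumptions based on the SAM2 ONNX export, not something the worker above guarantees:

```javascript
// Scale clicks from original-image pixels into the encoder's input space and
// flatten them into the arrays the decoder consumes. Assumes SAM2's 1024x1024
// encoder resolution; adjust ENCODER_SIZE if your export differs.
const ENCODER_SIZE = 1024;

function buildPointInputs(clicks, originalWidth, originalHeight) {
  // clicks: [{ x, y, label }] where label 1 = foreground, 0 = background
  const coords = new Float32Array(clicks.length * 2);
  const labels = new Float32Array(clicks.length);
  clicks.forEach((c, i) => {
    coords[i * 2] = (c.x / originalWidth) * ENCODER_SIZE;
    coords[i * 2 + 1] = (c.y / originalHeight) * ENCODER_SIZE;
    labels[i] = c.label;
  });
  // Shapes the decoder expects: [1, N, 2] for coords, [1, N] for labels.
  return {
    coords,
    labels,
    coordsShape: [1, clicks.length, 2],
    labelsShape: [1, clicks.length],
  };
}
```

Inside the worker, these arrays would be wrapped as `new ort.Tensor("float32", coords, coordsShape)` and merged into the feeds alongside the pre-encoded embeddings.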

Talking to the worker from React

Keep the worker outside React state and lazy-load it via IntersectionObserver. The model file is ~3–20MB; you don’t want to fetch it before the section is on screen.

components/sam2Tile.tsx
"use client";
import { useEffect, useRef, useState } from "react";
 
export default function Sam2Tile() {
  const [active, setActive] = useState(false);
  const wrap = useRef<HTMLDivElement>(null);
 
  useEffect(() => {
    if (!wrap.current || active) return;
    const io = new IntersectionObserver(([e]) => {
      if (e.isIntersecting) {
        setActive(true);
        io.disconnect();
      }
    }, { rootMargin: "200px" });
    io.observe(wrap.current);
    return () => io.disconnect();
  }, [active]);
 
  useEffect(() => {
    if (!active) return;
    const w = new Worker("/workers/sam2-decoder-worker.js", { type: "module" });
    w.postMessage({ type: "load", data: { modelUrl: "/models/decoder.onnx" } });
    return () => w.terminate();
  }, [active]);
 
  return <div ref={wrap}>{active ? "loading…" : "scroll into view"}</div>;
}
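The component above fires a "load" message and never reads a reply. Once the worker answers clicks too, it's nicer to await responses than to juggle onmessage callbacks. A minimal request/response wrapper, assuming the worker echoes back an `id` field — the `id` convention is this sketch's, not something the worker code above implements:

```javascript
// Wrap one postMessage round-trip in a Promise. Each request gets a unique
// id; the listener ignores replies meant for other in-flight requests and
// detaches itself once its own reply arrives.
let nextId = 0;

function request(worker, type, data) {
  const id = nextId++;
  return new Promise((resolve) => {
    const onMessage = (e) => {
      if (e.data.id !== id) return; // reply to some other request
      worker.removeEventListener("message", onMessage);
      resolve(e.data);
    };
    worker.addEventListener("message", onMessage);
    worker.postMessage({ id, type, data });
  });
}
```

On the worker side, every `self.postMessage` reply would copy the incoming `id` back, so `const { masks } = await request(w, "segment", { clicks })` reads naturally in the component.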

Pre-encoding embeddings

For SAM, the trick that makes click-to-segment feel snappy is the encoder / decoder split. The encoder produces three tensors — image_embed, high_res_feats_0, high_res_feats_1 — that the decoder consumes alongside your click points. Encode once, serialise as MessagePack, ship next to the image:

import { encode } from "@msgpack/msgpack";
import fs from "node:fs/promises";
 
// after running the encoder server-side once:
const buf = encode({
  tensors: {
    image_embed: { data: imageEmbed, shape: [1, 256, 64, 64] },
    high_res_feats_0: { data: f0, shape: [1, 32, 256, 256] },
    high_res_feats_1: { data: f1, shape: [1, 64, 128, 128] },
  },
  original_size: [imageHeight, imageWidth],
});
 
await fs.writeFile("public/demo/portrait/embeddings.bin", buf);

That embeddings.bin is now a static asset. Decoding it in the worker is two lines:

const buf = await fetch(url).then((r) => r.arrayBuffer());
const decoded = decode(new Uint8Array(buf)); // from @msgpack/msgpack
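Two lines of decoding, plus a little glue to get from the decoded records back to tensors. A sketch of that glue, assuming the build step serialised each tensor's raw little-endian float32 bytes (so `data` comes back as a Uint8Array after msgpack decode) — the `{ data, shape }` layout mirrors the encode step above:

```javascript
// Rebuild a Float32Array from one decoded tensor record and sanity-check
// that the byte length matches the declared shape.
function toFloat32Tensor({ data, shape }) {
  const elements = shape.reduce((a, b) => a * b, 1);
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  if (bytes.byteLength !== elements * 4) {
    throw new Error(
      `tensor byte length ${bytes.byteLength} does not match shape [${shape}]`
    );
  }
  // Copy to a fresh buffer: the decoded bytes may sit at an arbitrary offset
  // inside the msgpack payload, and Float32Array needs 4-byte alignment.
  const aligned = bytes.slice();
  return new Float32Array(aligned.buffer, 0, elements);
}
```

In the worker, each feed then becomes `new ort.Tensor("float32", toFloat32Tensor(rec), rec.shape)`.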

The small things that decide whether the demo feels fast

  • Lazy-load via IntersectionObserver. If the section never enters the viewport, the user never paid for it.
  • Show a status string. "Loading SAM2 decoder…", "Loading embeddings…", "Click anywhere on the image". People wait calmly when they know what they’re waiting for.
  • Render the image to canvas first. Then composite the mask on top via globalAlpha. Avoids the “mask appears, image flashes” flicker.
  • Terminate the worker on unmount. Otherwise it keeps a 50MB heap alive in the background while the user reads the next section.
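For the compositing step, the mask logits coming out of the decoder need to become pixels first. A pure sketch of that conversion — thresholding at 0 (i.e. sigmoid > 0.5) is the usual SAM convention, and the teal tint is arbitrary:

```javascript
// Convert decoder logits into an RGBA buffer that can be wrapped in an
// ImageData and drawn over the photo. Pixels outside the mask stay fully
// transparent (all zeros), so the overlay never hides the image.
function maskToRGBA(logits, width, height) {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > 0) {
      rgba[i * 4] = 45;       // R
      rgba[i * 4 + 1] = 212;  // G
      rgba[i * 4 + 2] = 191;  // B
      rgba[i * 4 + 3] = 140;  // A: semi-transparent so the photo shows through
    }
  }
  return rgba;
}
```

On the main thread, `ctx.putImageData(new ImageData(rgba, width, height), 0, 0)` onto an overlay canvas does the compositing; baking the alpha into the pixels works just as well as a globalAlpha draw.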

Where to go next

  • Swap WASM for WebGPU when the model is large enough and the user’s device supports it. Keeping rendering inside <canvas> while inference runs on the webgpu execution provider is the cleanest way to keep frames smooth.
  • Use transformers.js for pure encoder-side work — it’s the simplest API I’ve found for image classification, embeddings and small text models.
  • Cache ort.InferenceSession instances when you have multiple tiles. The WASM load is the expensive part.
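The session-caching point can be sketched in a few lines. Caching the promise rather than the resolved session means concurrent callers during the first load share it too — `createSession` here is a stand-in for the `ort.InferenceSession.create` call in the worker, injected so the cache stays testable:

```javascript
// Memoise session creation by model URL so multiple tiles share one WASM
// load. The Map holds promises, so a second caller that arrives mid-load
// gets the same in-flight promise instead of triggering a second fetch.
const sessionCache = new Map();

function getSession(url, createSession) {
  if (!sessionCache.has(url)) {
    sessionCache.set(url, createSession(url));
  }
  return sessionCache.get(url);
}
```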

The segmentation tile on my home page follows this exact recipe. Click the portrait — that’s ONNX, in your tab, no server.

Tags in this post

  • #onnx
  • #wasm
  • #ai
  • #web-worker

Keep reading

  • Making AI feel realtime with hybrid segmentation

    Segmentation is the substrate for nearly every AI photo workflow worth shipping in 2026 — inpainting, object swaps, controlled generation. Here is how to make it feel instant on the web by splitting SAM2 across a notebook on the user's hardware and a decoder in their browser.

    23 min · May 5, 2026

  • Building a 3D ring configurator in Expo

    A React-Native-first take on the classic R3F ring configurator: GLB loading on device, four metal materials, gesture-driven rotation, Zustand state, and ARKit / ARCore preview — all behind one Expo build.

    11 min · May 5, 2026

  • Welcome to the new blog

    A short tour of the new MDX-powered writing setup, complete with syntax-highlighted code blocks rendered by Shiki at build time.

    1 min · May 4, 2026
