BM1: Our First Local AI Model for Email Categorization

Published February 27, 2026

This post describes work in progress. BM1 is not shipped yet.

BM1 is our first local AI model for email categorization. It is being designed as a fine-tuned model over Gemma 4, focused on one narrow job: helping Banger classify email threads locally into useful operational categories.

The goal is not to build a general assistant. The goal is to make email organization faster, more private, cheaper to run, and more predictable.

For Banger, categorization is not a decorative AI feature. It feeds the workflow. A category can affect triage, visibility, automation, and whether a thread should be archived or reviewed. That makes the model’s output part of the product’s state, not just text in a side panel.

Why local categorization matters

Email is sensitive.

Even when a cloud model is technically allowed, sending every thread to a remote inference endpoint is not always the right product shape. It can add latency, increase cost, complicate privacy expectations, and make core inbox behavior depend on external availability.

Local categorization gives us a different set of tradeoffs:

lower latency for common classification work
better privacy posture for sensitive inboxes
lower marginal inference cost
more predictable behavior for a constrained task
potential for offline or degraded-mode operation
tighter integration with local sync state

BM1 is our attempt to make that local path good enough to matter.

Why categorization is the first target

Categorization is a good first local AI task because it is narrow and high leverage.

The model does not need to answer arbitrary questions. It needs to look at a thread and decide what kind of work it represents.

Examples:

customer support
billing
legal
partnership
hiring
product feedback
engineering issue
notification
newsletter
promotion
spam or phishing

That classification can then feed Banger’s workflow. A thread that clearly does not need human attention can stay out of triage. A customer issue can be surfaced quickly. A suspicious message can be separated from real work.

The task is bounded, but the product impact is large.
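As a rough illustration of how a category could feed the workflow, here is a minimal sketch. The `Route` names and the routing table are assumptions made up for this example, not Banger's actual policy. The category strings are the examples listed above.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    TRIAGE = "triage"          # needs human attention
    BACKGROUND = "background"  # low-signal mail kept out of triage
    QUARANTINE = "quarantine"  # suspicious mail separated from real work


# Hypothetical routing table for illustration only.
ROUTES = {
    "customer support": Route.TRIAGE,
    "billing": Route.TRIAGE,
    "legal": Route.TRIAGE,
    "engineering issue": Route.TRIAGE,
    "notification": Route.BACKGROUND,
    "newsletter": Route.BACKGROUND,
    "promotion": Route.BACKGROUND,
    "spam or phishing": Route.QUARANTINE,
}


def route_for(category: Optional[str]) -> Route:
    # Unknown or missing categories fall back to triage,
    # so nothing is hidden from a human by mistake.
    return ROUTES.get(category, Route.TRIAGE)
```

The fallback is the important design choice here: when the category is missing or unrecognized, the thread stays visible.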

Fine-tuning over Gemma 4

BM1 is planned as a fine-tuned model over Gemma 4.

That base gives us a strong local-model foundation while keeping the project focused. We do not need to invent a foundation model. We need a model adapted to Banger’s email categorization language, label behavior, and workflow expectations.

The fine-tuning target is not just “predict a label.” The useful output needs to fit the product:

one best label when confident
a triage decision
an optional suggested action
conservative behavior when uncertain
consistent treatment of low-signal mail
safe handling of spam and phishing-like messages

The model should be opinionated enough to be useful and conservative enough not to damage trust.
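To make that output shape concrete, here is one possible sketch of a structured result and a conservative parser. The field names, the JSON format, and the `parse_model_output` helper are all hypothetical, since BM1's real output format is not decided. The point is only that malformed or out-of-range output collapses to an abstention instead of a guess.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategorizationResult:
    label: Optional[str]                    # None means the model abstained
    confidence: float                       # assumed to be a score in [0, 1]
    needs_triage: bool                      # the triage decision
    suggested_action: Optional[str] = None  # optional, may be absent


# Conservative default: no label, and the thread stays in triage.
ABSTAIN = CategorizationResult(label=None, confidence=0.0, needs_triage=True)


def parse_model_output(text: str) -> CategorizationResult:
    """Parse a JSON object emitted by the model, abstaining on any problem."""
    try:
        data = json.loads(text)
        label = data["label"]
        confidence = float(data["confidence"])
        needs_triage = data["needs_triage"]
    except (ValueError, KeyError, TypeError):
        return ABSTAIN
    if not isinstance(label, str) or not isinstance(needs_triage, bool):
        return ABSTAIN
    if not 0.0 <= confidence <= 1.0:
        return ABSTAIN
    action = data.get("suggested_action")
    if not isinstance(action, str):
        action = None
    return CategorizationResult(label, confidence, needs_triage, action)
```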

The sync engine is the integration point

BM1 should not write directly into random UI state.

The right integration point is Banger’s sync engine. The local runtime already has normalized thread state, local bodies when available, pending action support, and deterministic projection rules. BM1 can consume the local thread context and produce structured categorization results.

Those results should become actions:

add a category label
remove stale category labels
mark whether the thread belongs in triage
optionally archive low-signal mail when allowed
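The list above can be sketched as a pure function that turns one categorization result into a batch of actions. The `Action` type, the action kind strings, and the `plan_actions` signature are assumptions for illustration. They are not the sync engine's real API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Action:
    kind: str                    # "add_label", "remove_label", "set_triage", "archive"
    thread_id: str
    value: Optional[str] = None


def plan_actions(
    thread_id: str,
    current_category_labels: Set[str],
    new_label: Optional[str],
    needs_triage: bool,
    low_signal: bool,
    allow_auto_archive: bool,
) -> List[Action]:
    actions: List[Action] = []
    if new_label is None:
        return actions  # the model abstained: leave the thread untouched

    # Remove stale category labels first, in a stable order.
    for stale in sorted(current_category_labels - {new_label}):
        actions.append(Action("remove_label", thread_id, stale))
    if new_label not in current_category_labels:
        actions.append(Action("add_label", thread_id, new_label))

    actions.append(Action("set_triage", thread_id, "in" if needs_triage else "out"))

    # Archiving is opt-in and only applies to low-signal mail outside triage.
    if allow_auto_archive and low_signal and not needs_triage:
        actions.append(Action("archive", thread_id))
    return actions
```

Because the function only returns data, the runtime can enqueue, log, or discard the actions without the model touching any state itself.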

That design keeps BM1 inside the same state model as everything else. A human label change, a server-side categorization result, and a local BM1 result can all move through the same action and projection system.

AI output becomes product state through the normal path.
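A small sketch of what "the normal path" could mean in practice: every label change carries its source, but the projection that computes thread labels never branches on it. The `LabelAction` type and `project` function are hypothetical names for this example and do not describe Banger's actual projection rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class LabelAction:
    source: str  # "human", "server", or "bm1"; kept for auditing only
    kind: str    # "add_label" or "remove_label"
    label: str


def project(labels: FrozenSet[str], actions: Iterable[LabelAction]) -> FrozenSet[str]:
    """Apply actions in order. The result depends only on kind and label."""
    result = set(labels)
    for action in actions:
        if action.kind == "add_label":
            result.add(action.label)
        elif action.kind == "remove_label":
            result.discard(action.label)
    return frozenset(result)
```

Replaying the same action log always yields the same labels, whether an action came from a person, the server, or BM1.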

Predictability beats cleverness

For email categorization, cleverness is overrated.

The model should not surprise the user with creative labels or complex reasoning. It should be boring in the best way:

classify the obvious things correctly
abstain rather than guess when uncertain
avoid inventing categories
avoid over-automation
keep low-signal mail out of the way
preserve human control

This is why a fine-tuned local categorizer is attractive. A smaller model, trained for a constrained job, can be easier to reason about than a general model prompted into acting like a categorizer.
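Two of those rules, "avoid inventing categories" and abstaining when uncertain, can be enforced outside the model with a simple gate. This is a minimal sketch. The `accept_label` name and the 0.85 default threshold are placeholders, not tuned or measured values.

```python
from typing import Optional, Set


def accept_label(
    predicted: Optional[str],
    confidence: float,
    catalog: Set[str],
    threshold: float = 0.85,  # placeholder value, not a tuned number
) -> Optional[str]:
    """Return the label only if it is in the catalog and confident enough."""
    if predicted is None or predicted not in catalog:
        return None  # never surface a label outside the workspace catalog
    if confidence < threshold:
        return None  # uncertain: abstain instead of guessing
    return predicted
```

For example, with a catalog of `{"billing", "legal"}`, a prediction of `"invoices"` is dropped no matter how confident the model is.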

Local does not mean isolated

BM1 being local does not mean it operates outside the product.

The model still needs:

user and workspace settings
the workspace label catalog
thread context
body availability state
workflow state
confidence thresholds
sync cursors
action enqueueing
observability

The local runtime is the natural home for that orchestration.

The UI should not be responsible for deciding when to run BM1. The backend should not need to see every thread to categorize it. The runtime can sit between them: close to the data, close to the sync engine, and close to the user’s device.
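One way the runtime could decide when to run BM1 is a small gating check over local thread state. Everything in this sketch is an assumption for illustration: the `ThreadState` fields, the integer cursors, and the specific rules (for instance, requiring a local body or skipping threads a person has already labeled) are not settled product behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThreadState:
    thread_id: str
    body_available: bool                  # is a local body present?
    last_message_cursor: int              # sync cursor of the newest message
    categorized_at_cursor: Optional[int]  # cursor at the last BM1 run, if any
    human_labeled: bool                   # a person already chose a category


def should_run_bm1(enabled: bool, thread: ThreadState) -> bool:
    if not enabled:
        return False  # user or workspace setting turned BM1 off
    if thread.human_labeled:
        return False  # preserve human control
    if not thread.body_available:
        return False  # wait until there is enough local context
    if thread.categorized_at_cursor is None:
        return True   # never categorized
    # Re-run only when new messages arrived since the last categorization.
    return thread.categorized_at_cursor < thread.last_message_cursor
```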

What we are not claiming yet

Because BM1 is still in progress, we are not yet making claims about:

shipping status
benchmark numbers
accuracy
a hardware support matrix
final model size
production performance

Those details should come after implementation and testing.

The architecture direction is clear, but the model still has to earn its place in the product.

Why this is worth building

The inbox gets better when categorization is fast, private, and built into the workflow.

If BM1 works the way we want, it becomes a local layer of intelligence inside Banger’s runtime. It can classify threads near the data, write structured actions into the sync engine, and help teams keep their inbox organized without sending every categorization decision through a remote model.

That is the direction we want for AI in email: focused models, local execution where it matters, and outputs that become auditable product state.

BM1 is the first step in that direction.

Written by

Tiago