BM1: Our First Local AI Model for Email Categorization
This post describes work in progress. BM1 has not shipped yet.
BM1 is our first local AI model for email categorization. It is being designed as a fine-tuned model over Gemma 4, focused on one narrow job: helping Banger classify email threads locally into useful operational categories.
The goal is not to build a general assistant. The goal is to make email organization faster, more private, cheaper to run, and more predictable.
For Banger, categorization is not a decorative AI feature. It feeds the workflow. A category can affect triage, visibility, automation, and whether a thread should be archived or reviewed. That makes the model’s output part of the product’s state, not just text in a side panel.
Why local categorization matters
Email is sensitive.
Even when a cloud model is technically allowed, sending every thread to a remote inference endpoint is not always the right product shape. It can add latency, increase cost, complicate privacy expectations, and make core inbox behavior depend on external availability.
Local categorization gives us a different set of tradeoffs:
- lower latency for common classification work
- better privacy posture for sensitive inboxes
- lower marginal inference cost
- more predictable behavior for a constrained task
- potential for offline or degraded-mode operation
- tighter integration with local sync state
BM1 is our attempt to make that local path good enough to matter.
Why categorization is the first target
Categorization is a good first local AI task because it is narrow and high leverage.
The model does not need to answer arbitrary questions. It needs to look at a thread and decide what kind of work it represents.
Examples:
- customer support
- billing
- legal
- partnership
- hiring
- product feedback
- engineering issue
- notification
- newsletter
- promotion
- spam or phishing
That classification can then feed Banger’s workflow. A thread that clearly does not need human attention can stay out of triage. A customer issue can be surfaced quickly. A suspicious message can be separated from real work.
The task is bounded, but the product impact is large.
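As a rough illustration of what "bounded" means here, the example categories can be written as a closed set with a coarse routing rule on top. This is a sketch, not BM1's actual label schema: the enum names, the `LOW_SIGNAL` and `SUSPICIOUS` groupings, and the `route` function are all assumptions made for the example.

```python
from enum import Enum


class Category(Enum):
    """Closed label set mirroring the example categories above (illustrative only)."""
    CUSTOMER_SUPPORT = "customer support"
    BILLING = "billing"
    LEGAL = "legal"
    PARTNERSHIP = "partnership"
    HIRING = "hiring"
    PRODUCT_FEEDBACK = "product feedback"
    ENGINEERING_ISSUE = "engineering issue"
    NOTIFICATION = "notification"
    NEWSLETTER = "newsletter"
    PROMOTION = "promotion"
    SPAM_OR_PHISHING = "spam or phishing"


# Assumption: which categories count as low-signal or suspicious is a
# product decision that this post does not specify.
LOW_SIGNAL = frozenset({Category.NOTIFICATION, Category.NEWSLETTER, Category.PROMOTION})
SUSPICIOUS = frozenset({Category.SPAM_OR_PHISHING})


def route(category: Category) -> str:
    """Map a category to a coarse lane: 'triage', 'background', or 'quarantine'."""
    if category in SUSPICIOUS:
        return "quarantine"
    if category in LOW_SIGNAL:
        return "background"
    return "triage"
```

Because the set is closed, an unknown string such as `Category("vip")` raises `ValueError` instead of silently creating a new category.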
Fine-tuning over Gemma 4
BM1 is planned as a fine-tuned model over Gemma 4.
That base gives us a strong local-model foundation while keeping the project focused. We do not need to invent a foundation model. We need a model adapted to Banger’s email categorization language, label behavior, and workflow expectations.
The fine-tuning target is not just “predict a label.” The useful output needs to fit the product:
- one best label when confident
- a triage decision
- an optional suggested action
- conservative behavior when uncertain
- consistent treatment of low-signal mail
- safe handling of spam and phishing-like messages
The model should be opinionated enough to be useful and conservative enough not to damage trust.
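One way to picture that output contract is a small typed record plus a parser that abstains whenever the model's output is malformed or below a confidence threshold. Everything here is hypothetical: the JSON field names, the `Categorization` type, the `ABSTAIN` fallback, and the 0.8 default threshold are placeholders, not decisions BM1 has made.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Categorization:
    label: Optional[str]              # None means "no confident label"
    needs_triage: bool
    suggested_action: Optional[str]
    confidence: float


# Hypothetical fallback: no label, and the thread stays visible in triage.
ABSTAIN = Categorization(label=None, needs_triage=True, suggested_action=None, confidence=0.0)


def parse_output(raw: str, threshold: float = 0.8) -> Categorization:
    """Turn raw model text into a Categorization, abstaining on any doubt."""
    try:
        data = json.loads(raw)
        confidence = float(data["confidence"])
        label = data["label"]
        needs_triage = data["needs_triage"]
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, wrong top-level shape, or missing/mistyped fields.
        return ABSTAIN
    if not isinstance(label, str) or not isinstance(needs_triage, bool):
        return ABSTAIN
    # Also rejects NaN, since any comparison with NaN is False.
    if not (threshold <= confidence <= 1.0):
        return ABSTAIN
    action = data.get("suggested_action")
    return Categorization(
        label=label,
        needs_triage=needs_triage,
        suggested_action=action if isinstance(action, str) else None,
        confidence=confidence,
    )
```

The design choice worth noting is that every failure mode collapses into the same conservative result, so downstream code only has to handle one "uncertain" case.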
The sync engine is the integration point
BM1 should not write directly into random UI state.
The right integration point is Banger’s sync engine. The local runtime already has normalized thread state, local bodies when available, pending action support, and deterministic projection rules. BM1 can consume the local thread context and produce structured categorization results.
Those results should become actions:
- add a category label
- remove stale category labels
- mark whether the thread belongs in triage
- optionally archive low-signal mail when allowed
That design keeps BM1 inside the same state model as everything else. A human label change, a server-side categorization result, and a local BM1 result can all move through the same action and projection system.
AI output becomes product state through the normal path.
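To make the "results become actions" idea concrete, here is a minimal sketch of a planner that turns one categorization result into a list of actions for the sync engine to enqueue. The `Action` shape, the action kind strings, and the `plan_actions` signature are invented for illustration and do not describe Banger's real action format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                     # "add_label" | "remove_label" | "set_triage" | "archive"
    thread_id: str
    value: Optional[str] = None


def plan_actions(
    thread_id: str,
    current_category_labels: list[str],
    new_label: Optional[str],
    needs_triage: bool,
    low_signal: bool,
    auto_archive_allowed: bool,
) -> list[Action]:
    """Translate one categorization result into ordered actions."""
    actions: list[Action] = []
    if new_label is None:
        # The model abstained: leave existing state untouched.
        return actions
    # Remove stale category labels, in a deterministic order.
    for stale in sorted(set(current_category_labels) - {new_label}):
        actions.append(Action("remove_label", thread_id, stale))
    if new_label not in current_category_labels:
        actions.append(Action("add_label", thread_id, new_label))
    actions.append(Action("set_triage", thread_id, "in" if needs_triage else "out"))
    # Archiving is opt-in and limited to low-signal mail outside triage.
    if low_signal and not needs_triage and auto_archive_allowed:
        actions.append(Action("archive", thread_id))
    return actions
```

For example, `plan_actions("t1", ["billing"], "newsletter", False, True, True)` yields a remove, an add, a triage update, and an archive, in that order. The planner only returns data, so the same projection rules that handle human label changes could apply it.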
Predictability beats cleverness
For email categorization, cleverness is overrated.
The model should not surprise the user with creative labels or complex reasoning. It should be boring in the best way:
- classify the obvious things correctly
- say nothing when uncertain
- avoid inventing categories
- avoid over-automation
- keep low-signal mail out of the way
- preserve human control
This is why a fine-tuned local categorizer is attractive. A smaller model, trained for a constrained job, can be easier to reason about than a general model prompted into acting like a categorizer.
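Two of the rules above, "say nothing when uncertain" and "avoid inventing categories", are simple enough to enforce in code around the model and not only through training. The sketch below assumes a hypothetical `accept_label` guard, a workspace catalog passed in as plain strings, and an arbitrary 0.85 threshold.

```python
from typing import Iterable, Optional


def accept_label(
    predicted: str,
    confidence: float,
    catalog: Iterable[str],
    threshold: float = 0.85,
) -> Optional[str]:
    """Return the catalog's own spelling of the label, or None to stay silent."""
    # Case- and whitespace-insensitive lookup against the known labels.
    known = {name.strip().lower(): name for name in catalog}
    match = known.get(predicted.strip().lower())
    if match is None:
        # The model produced a label the workspace does not have: drop it.
        return None
    if not confidence >= threshold:
        # Written this way so that NaN is rejected too.
        return None
    return match
```

With a catalog of `["Billing", "Legal", "Customer Support"]`, a prediction of `" billing "` at 0.95 returns `"Billing"`, while an invented `"urgent-vip"` at 0.99 returns `None`.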
Local does not mean isolated
BM1 being local does not mean it operates outside the product.
To be useful, it still depends on the rest of the runtime:
- user and workspace settings
- the workspace label catalog
- thread context
- body availability state
- workflow state
- confidence thresholds
- sync cursors
- action enqueueing
- observability
The local runtime is the natural home for that orchestration.
The UI should not be responsible for deciding when to run BM1. The backend should not need to see every thread to categorize it. The runtime can sit between them: close to the data, close to the sync engine, and close to the user’s device.
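A small gate function shows what "the runtime decides when to run BM1" could look like. This is only a sketch under stated assumptions: `ThreadContext`, `Settings`, the cursor fields, and the reason strings are hypothetical names, and the real runtime may weigh these inputs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadContext:
    thread_id: str
    body_available: bool
    has_human_label: bool          # a person already categorized this thread
    last_categorized_cursor: int   # sync cursor at the last BM1 run
    current_cursor: int            # latest sync cursor for the thread


@dataclass(frozen=True)
class Settings:
    local_categorization_enabled: bool
    require_body: bool = True


def should_run(ctx: ThreadContext, settings: Settings) -> tuple[bool, str]:
    """Decide whether to run the model, with a reason string for observability."""
    if not settings.local_categorization_enabled:
        return False, "disabled"
    if ctx.has_human_label:
        # Preserve human control: never override a manual label.
        return False, "human_label_present"
    if settings.require_body and not ctx.body_available:
        return False, "body_missing"
    if ctx.current_cursor <= ctx.last_categorized_cursor:
        # Nothing has changed since the last categorization.
        return False, "already_categorized"
    return True, "ok"
```

Returning a reason alongside the decision keeps skipped runs visible in logs and metrics, so "BM1 did nothing" is never ambiguous.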
What we are not claiming yet
Because BM1 is still pending, there are things we are not claiming yet:
- shipping status
- benchmark numbers
- accuracy promises
- a hardware support matrix
- a final model size
- production performance
Those details should come after implementation and testing.
The architecture direction is clear, but the model still has to earn its place in the product.
Why this is worth building
The inbox gets better when categorization is fast, private, and built into the workflow.
If BM1 works the way we want, it becomes a local layer of intelligence inside Banger’s runtime. It can classify threads near the data, write structured actions into the sync engine, and help teams keep their inbox organized without sending every categorization decision through a remote model.
That is the direction we want for AI in email: focused models, local execution where it matters, and outputs that become auditable product state.
BM1 is the first step in that direction.