Technology — SEEV | Local AI Architecture

System Architecture

SEEV is built on a strict local-only constraint. Every layer in the stack executes on your device, and network features such as web search or external providers are optional and run only when invoked.

Layered System Architecture

Presentation Layer

User-facing interface components rendered in WKWebView

Chat Interface · Sidebar Navigation · Composer Bar · Control Center · Voice Input UI · Theme Engine

Application Layer

Core business logic, content processing, and data management

Markdown Renderer · Memory System · RAG Pipeline · Chat Manager · Web Search Engine · Context Assembler · File Processor

AI Runtime Layer

On-device model inference through the bundled local runtime

Local Runtime · LFM 2.5 1.2B Flash · LFM2-VL-3B · WhisperKit · Tokenizer

Storage Layer

All persistence is local — nothing leaves your device

IndexedDB · localStorage · Cache Storage · Local File System

Layer legend: Presentation = Interface · Application = Logic · AI Runtime = AI Inference · Storage = Persistence

AI Models and Runtime

SEEV uses a bundled fast local model, an optional deeper reasoning model, and Smart Hybrid routing across them, with external providers remaining optional.

LFM 2.5 — 1.2B Flash

Fastest bundled local responses

Parameters: 1.2B · Default preset: Flash · Runtime: Local · Availability: Instant

LFM2-VL-3B — Deep Reasoning

Optional local download for deeper reasoning and stronger attachment understanding

Parameters: 3B · Multimodal input: Uploads · Use mode: Routed · Install mode: Optional

Smart Hybrid — Routed

Uses LFM 2.5 1.2B Flash for speed and LFM2-VL-3B for reasoning or uploads

Fast path: 1.2B · Reasoning path: 3B · Routing: Auto · Default availability: 1.2B now
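
The routing rule described above can be sketched in a few lines. This is a minimal illustration, not SEEV's actual router: the `Request` fields, the `needs_reasoning` flag, and the fallback to the bundled model when the 3B model is not installed are all assumptions made for the example.

```python
from dataclasses import dataclass

FAST_MODEL = "LFM 2.5 1.2B Flash"   # bundled, available by default
DEEP_MODEL = "LFM2-VL-3B"           # optional local download


@dataclass
class Request:
    prompt: str
    has_upload: bool = False       # an attachment accompanies the prompt
    needs_reasoning: bool = False  # hypothetical flag; how SEEV detects this is not documented


def route(request: Request, deep_model_installed: bool) -> str:
    """Pick a model name for one request (illustrative rule only)."""
    wants_deep = request.has_upload or request.needs_reasoning
    if wants_deep and deep_model_installed:
        return DEEP_MODEL
    # Assumed fallback: the bundled model handles everything else,
    # including deep requests when the 3B model is not installed.
    return FAST_MODEL
```

For example, `route(Request("Summarize this PDF", has_upload=True), deep_model_installed=True)` selects the 3B path, while the same request with `deep_model_installed=False` stays on the bundled 1.2B model.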

Model Inference Pipeline

The path a prompt takes through SEEV's on-device AI engine, from input to streamed output. The selected local model, saved memory, uploaded document text, and conversation history all shape the context the model sees.

1. User Input: Prompt entered via text or voice
2. Context Assembly: System prompt, saved memory, and local document context merged
3. Tokenization: Text converted to token IDs by model tokenizer
4. Local Model Runtime: Forward pass through the selected local model runtime
5. Autoregressive Decode: Token-by-token generation with sampling
6. Markdown Render: Response parsed and rendered with syntax highlighting
7. Local Persist: Conversation saved to IndexedDB
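
The context assembly step can be illustrated with a small sketch. It assumes a simple layout (system prompt, saved memory, document context, recent history, then the new prompt) and a character budget standing in for a token budget; the function name, section labels, and `max_chars` parameter are hypothetical and not taken from SEEV.

```python
from typing import List, Tuple


def assemble_context(
    system_prompt: str,
    memories: List[str],
    document_chunks: List[str],
    history: List[Tuple[str, str]],  # (role, text) pairs, oldest first
    user_prompt: str,
    max_chars: int = 4000,  # hypothetical budget; a real runtime would count tokens
) -> str:
    parts = [system_prompt]
    if memories:
        parts.append("Saved memory:\n" + "\n".join(f"- {m}" for m in memories))
    if document_chunks:
        parts.append("Document context:\n" + "\n\n".join(document_chunks))
    tail = f"user: {user_prompt}"

    # Space left for history once the fixed parts and the new prompt are placed.
    # If the fixed parts alone exceed the budget, no history is kept.
    fixed = "\n\n".join(parts + [tail])
    remaining = max_chars - len(fixed)

    # Walk history from newest to oldest, keeping turns while they fit.
    kept: List[str] = []
    for role, text in reversed(history):
        line = f"{role}: {text}"
        cost = len(line) + 2  # +2 for the blank-line separator
        if cost > remaining:
            break
        kept.append(line)
        remaining -= cost
    kept.reverse()  # restore chronological order

    return "\n\n".join(parts + kept + [tail])
```

Dropping the oldest turns first is one common design choice: it keeps the most recent exchange intact when the budget is tight.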

Stage legend: Pre/Post Processing · Local AI inference · Optional network only when invoked

Runtime

Why a Bundled Runtime

SEEV packages the interface and local inference stack together so the app can run private on-device workflows without depending on a remote model service. The result is a local-first setup with faster startup for the bundled 1.2B model and optional expansion when you install the 3B model.

Local execution — core inference stays on your machine
App-bundled stack — UI and local backend ship together
No account setup — bundled local use works without cloud onboarding
Optional expansion — install the 3B reasoning model only when you want it

Technology Stack

Every layer of SEEV is built with proven, high-performance technologies.

SwiftUI Shell

Native macOS application shell providing lightweight window management, menu bar integration, and system-level controls.

WKWebView Engine

High-performance web rendering engine for the app interface, with modern CSS and JavaScript APIs inside the macOS shell.

Local Model Paths

SEEV ships with a bundled fast model, supports an optional local 3B reasoning install, and routes between them with Smart Hybrid behavior.

WhisperKit (Local)

WhisperKit runs on-device for speech-to-text transcription. Audio is processed locally and never transmitted.

Local Data Store

Workspace state, conversations, memories, and settings stay on-device in local storage and IndexedDB, matching the app's private-first design.

Background Tasks

Parsing, search, and response rendering are handled without blocking the interface, keeping the workspace responsive while the app works locally.
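
The general pattern of moving heavy work off the interface loop can be sketched as follows. This is a language-neutral idea shown in Python with `asyncio`; the source does not describe SEEV's actual mechanism, and `parse_document` and `ui_heartbeat` are hypothetical stand-ins.

```python
import asyncio
from typing import List, Tuple


def parse_document(text: str) -> List[str]:
    """Stand-in for blocking parsing work: split text into non-empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


async def ui_heartbeat(ticks: List[int], count: int) -> None:
    """Stand-in for the interface loop; records a tick each time it gets to run."""
    for i in range(count):
        ticks.append(i)
        await asyncio.sleep(0)  # yield control back to the event loop


async def main(text: str) -> Tuple[List[str], List[int]]:
    ticks: List[int] = []
    # asyncio.to_thread runs the blocking function in a worker thread,
    # so the heartbeat coroutine keeps running on the event loop meanwhile.
    chunks, _ = await asyncio.gather(
        asyncio.to_thread(parse_document, text),
        ui_heartbeat(ticks, 3),
    )
    return chunks, ticks


chunks, ticks = asyncio.run(main("First paragraph.\n\nSecond paragraph.\n"))
```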

Marked + Highlight.js

Rich markdown rendering with syntax highlighting and copyable code blocks.

Generation Presets

Curated inference configurations for different use cases, adjustable from the Control Center.

Precision

Low temperature, high accuracy. Ideal for factual queries, code generation, and technical tasks.

temp: 0.3 · top-p: 0.85

Creative

Higher temperature for brainstorming, writing, and exploratory conversation.

temp: 0.9 · top-p: 0.95

Balanced

A middle ground for general conversations balancing accuracy and fluency.

temp: 0.6 · top-p: 0.90

Quick

Faster, shorter responses. Suitable for rapid lookups and brief Q&A.

temp: 0.4 · max: 512 tokens
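
The preset values above can be written as a small configuration table, and the effect of temperature and top-p can be shown with a generic sampling routine. This is a sketch of standard temperature scaling and nucleus (top-p) sampling, not SEEV's decoder; the `Preset` type and `sample_token` function are hypothetical, and values the presets do not state (top-p for Quick, max tokens for the others) are left as `None`.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Preset:
    temperature: float
    top_p: Optional[float] = None      # None: not stated for this preset
    max_tokens: Optional[int] = None   # None: not stated for this preset


PRESETS: Dict[str, Preset] = {
    "Precision": Preset(temperature=0.3, top_p=0.85),
    "Creative": Preset(temperature=0.9, top_p=0.95),
    "Balanced": Preset(temperature=0.6, top_p=0.90),
    "Quick": Preset(temperature=0.4, max_tokens=512),
}


def sample_token(logits: List[float], preset: Preset, rng: random.Random) -> int:
    """Return the index of one sampled token."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / preset.temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Nucleus filtering: keep the smallest set of most likely tokens
    # whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if preset.top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= preset.top_p:
                break
        order = kept

    # random.choices normalizes the weights of the surviving tokens.
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]
```

With a low temperature such as Precision's 0.3, a clearly dominant logit absorbs almost all of the probability mass, so the nucleus often shrinks to a single token and output becomes close to deterministic. Creative's higher temperature and wider nucleus leave more candidates in play.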

Performance Specifications

Engineered for efficient local inference on modern Mac hardware.

Local cache footprint: ~1.2 GB · Fast bundled model: 1.2B · Deep reasoning model: 3B · Local runtime: WASM
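
As a rough way to reason about model size on disk, weight storage scales with parameter count times bits per weight. The sketch below is plain arithmetic under that assumption. The source does not state how SEEV's models are quantized or what the ~1.2 GB cache figure includes, so none of these numbers should be read as SEEV's actual configuration.

```python
def weight_size_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


# Hypothetical bit widths, chosen only to show how size scales.
for bits in (4, 8, 16):
    print(f"1.2B parameters at {bits}-bit: {weight_size_gb(1.2e9, bits):.1f} GB")
```

For a 1.2B-parameter model this gives about 0.6 GB at 4 bits, 1.2 GB at 8 bits, and 2.4 GB at 16 bits per weight, before any tokenizer files or other assets.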

Advanced Technology.
On Your Terms.

Experience the next generation of local AI. Download SEEV and put the power of private intelligence in your hands.
