AI-powered coding assistant that runs 100% on your machine. No cloud APIs. No subscriptions. No data exfiltration. Just a single binary, local models, and your VS Code.
Every line of code you send to the cloud is a line of code someone else can read. We built an alternative.
| Feature | LocalDev Studio | GitHub Copilot | Cursor | Sourcegraph Cody |
|---|---|---|---|---|
| Data Privacy | ● 100% local | ● Cloud | ● Cloud | ● Optional local |
| Works Offline | ● Full offline | ● No | ● No | ● Partial |
| Air-Gap Compatible | ● Yes | ● No | ● No | ● No |
| Cost | $0 forever | $19/mo | $20/mo | $9/mo |
| Model Choice | ● Any Ollama model | ● Vendor-selected set | ● Limited set | ● Limited set |
| Chat Sidebar | ● Yes | ● Yes | ● Yes | ● Yes |
| Inline Completions | ● Yes (FIM) | ● Yes | ● Yes | ● Yes |
| Context Engine | ● Git + Deps + Errors | ● Open files only | ● Codebase indexing | ● Code graph |
| MCP Servers | ● Built-in (FS, Terminal) | ● No | ● No | ● No |
| Setup Time | ● 1 command | ● Extension + login | ● New IDE install | ● Extension + login |
| Vendor Lock-in | ● None | ● Microsoft | ● Cursor Inc | ● Sourcegraph |
| Downtime Risk | ● Zero — runs locally | ● Cloud outages | ● Cloud outages | ● Cloud outages |
| BCP Compatible | ● Survives vendor failure | ● Vendor-dependent | ● Vendor-dependent | ● Vendor-dependent |
| Fine-Tuning | ● QLoRA on your code | ● Not available | ● Not available | ● Not available |
| Scales to Cluster | ● Laptop to Mac Studio | ● Per-seat pricing | ● Per-seat pricing | ● Per-seat pricing |
`localdev setup` detects your OS, GPU, and VRAM. Automatically selects the right model (1.5B→32B), installs Ollama, creates config, adds to PATH, installs the VS Code extension, and runs an inference test. 8 steps, fully automated, works on Windows, macOS, and Linux.
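The model-selection step can be sketched in a few lines. The VRAM thresholds and Qwen model tags below are illustrative assumptions, not the actual selection table `localdev setup` ships:

```python
from typing import Optional

# Sketch of the hardware-to-model selection step. Thresholds and
# model tags are ASSUMPTIONS for illustration only.

def pick_model(vram_gb: Optional[float], ram_gb: float) -> str:
    """Map detected hardware to an Ollama model tag (1.5B-32B)."""
    if vram_gb is None:                  # CPU-only: size by system RAM
        return "qwen2.5-coder:7b" if ram_gb >= 16 else "qwen2.5-coder:1.5b"
    if vram_gb >= 24:                    # e.g. RTX 3090 / 4090
        return "qwen2.5-coder:32b"
    if vram_gb >= 12:                    # e.g. RTX 3060 12GB
        return "qwen2.5-coder:14b"
    if vram_gb >= 8:
        return "qwen2.5-coder:7b"
    return "qwen2.5-coder:1.5b"

print(pick_model(vram_gb=12, ram_gb=32))   # → qwen2.5-coder:14b
```

The point of the sketch: the choice is made once, locally, from detected hardware — there is no account, license check, or server round-trip in the loop.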
BM25-ranked file search, git diff injection, dependency graph traversal, and compiler error forwarding. The model sees what matters — not your entire codebase. Context window budget is optimized per-task: code generation gets more code context, explanations get more documentation.
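The BM25 ranking behind the file search is a standard lexical scheme; a minimal sketch follows. The toy corpus, tokenization, and parameters (k1=1.5, b=0.75) are illustrative — the real engine ranks whole files, not token lists:

```python
import math
from collections import Counter

# Minimal BM25 scorer. Corpus and parameters are illustrative.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    ["fn", "parse", "config"],             # both query terms, once each
    ["fn", "render", "ui"],                # neither query term
    ["parse", "json", "config", "parse"],  # "parse" appears twice
]
print(bm25_scores(["parse", "config"], docs))
```

Files that never mention the query score zero and are dropped from context entirely — that is what keeps the prompt small.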
Model Context Protocol servers for filesystem access and terminal execution — built into the daemon. The AI can read files, list directories, and run commands through a standardized JSON-RPC 2.0 interface. Custom MCP servers plug in via config.
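On the wire, an MCP tool invocation is a JSON-RPC 2.0 request to the `tools/call` method. The tool name `read_file` and its arguments below are illustrative — real names come from whatever tool list the server advertises:

```python
import json

# Shape of an MCP tool invocation: a JSON-RPC 2.0 request.
# Tool name and arguments are illustrative assumptions.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "src/main.rs"},
    },
}

payload = json.dumps(request)
print(payload)  # sent to the MCP server's transport
```

Because the envelope is standard JSON-RPC 2.0, a custom MCP server only has to handle these messages to plug into the daemon.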
Different tasks route to different model configurations. Code generation uses higher temperature and more context tokens. Explanations use lower temperature for accuracy. You can assign different Ollama models to different task types — use a fast 3B for completions, a thorough 7B for reviews.
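The per-task routing could look something like the sketch below. The model tags, temperatures, and context sizes are illustrative assumptions, not shipped defaults (`num_ctx` is Ollama's context-length option):

```python
# Hypothetical routing table; all values are illustrative assumptions.

ROUTES = {
    "completion": {"model": "qwen2.5-coder:3b", "temperature": 0.2, "num_ctx": 4096},
    "generate":   {"model": "qwen2.5-coder:7b", "temperature": 0.7, "num_ctx": 8192},
    "explain":    {"model": "qwen2.5-coder:7b", "temperature": 0.1, "num_ctx": 8192},
    "review":     {"model": "qwen2.5-coder:7b", "temperature": 0.1, "num_ctx": 8192},
}

def route(task: str) -> dict:
    """Resolve a task type to its model configuration."""
    return ROUTES.get(task, ROUTES["generate"])  # default to the generator

print(route("completion")["model"])  # fast 3B for low-latency completions
```

The design choice is latency vs. depth: inline completions fire on every keystroke and want the smallest model that is good enough, while reviews run rarely and can afford a bigger one.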
Chat sidebar with markdown rendering, syntax-highlighted code blocks (Copy + Insert buttons), elapsed timer for CPU inference, and auto-reconnect. Inline ghost-text completions with fill-in-the-middle inference. Code actions: Explain, Fix, Verify (cross-model review), Generate Tests. CodeLens integration shows actions above every function. Status bar shows connection state, model info, and generation speed.
After the initial model download, zero network traffic is required. The daemon, Ollama, and models all run on localhost. No telemetry, no usage tracking, no phone-home. Verified: with a firewall blocking all outbound traffic, everything works identically.
The "Verify" action sends your code to a second model configuration for independent review. Catches bugs, logic errors, and edge cases that the generating model missed. Like having a second pair of eyes — but both pairs are local.
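Against Ollama's local HTTP API (`POST http://localhost:11434/api/generate`), the Verify round-trip could be built like this sketch — the reviewer model tag and prompt wording are assumptions, not the shipped prompt:

```python
import json

# Sketch of the cross-model "Verify" request. Reviewer model tag and
# prompt wording are illustrative assumptions.

def build_verify_request(code: str, reviewer: str = "qwen2.5-coder:7b") -> str:
    prompt = (
        "Review the following code for bugs, logic errors, and missed "
        "edge cases. It was written by another model; be skeptical.\n\n"
        + code
    )
    return json.dumps({
        "model": reviewer,   # deliberately a different config than the generator
        "prompt": prompt,
        "stream": False,     # one complete response instead of a token stream
    })

payload = build_verify_request("def add(a, b): return a - b")
# POST payload to http://localhost:11434/api/generate
```

Routing review to a separately configured model is what makes the check independent: the reviewer does not share the generator's sampling settings, so it is less likely to repeat the same mistake.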
RTX 3060 and above. 7B model runs fully on GPU.
M1/M2/M3/M4. Unified memory makes 7B models fly.
NVIDIA or AMD GPUs. Works with any Ollama-supported hardware.
Any modern x86_64 with 8GB+ RAM. Slower but identical quality.
Cloud AI is a single point of failure. Provider goes down? Your team stops. Provider gets acquired? Terms change. Provider gets sanctioned? Access revoked. LocalDev eliminates all three risks. Your AI toolchain survives vendor bankruptcy, geopolitical sanctions, pricing changes, and API deprecations. It's the only AI coding tool you can write into a business continuity plan (BCP).
No API rate limits. No "service degraded" banners. No "usage cap reached, try again tomorrow." When the model runs on your hardware, 3 AM on a Sunday works exactly like 10 AM on a Tuesday. Your AI availability is bounded by your power supply, not someone else's infrastructure budget.
Same binary. Same config. Scale by adding hardware, not subscriptions.
| Scale | Hardware | Models |
|---|---|---|
| 1 laptop | CPU or GPU | 7B model |
| Shared workstation | RTX 4090 / A4000 | 7B–32B models |
| M2/M4 Ultra nodes | 192GB unified memory | 70B+ models |
| On-prem GPU servers | Multi-model routing | Team-wide inference |
Generic models give generic suggestions. Fine-tune on your own codebase to get completions that know your naming conventions, your architecture patterns, your internal APIs. LocalDev supports QLoRA-based fine-tuning workflows — adapt a 7B model to your codebase with as little as a single consumer GPU and a few hours of training time. The fine-tuned model stays on your infrastructure, trained on your proprietary code, producing suggestions that feel like they came from a senior developer who's read every PR.
We don't just consume open-source models — we improve them. Our fine-tuning techniques, evaluation benchmarks, and inference optimizations are contributed back to the community. Better base models mean better LocalDev for everyone. When Qwen, DeepSeek, or CodeLlama improve, you benefit automatically — just `ollama pull` the latest and your tooling gets smarter overnight.
New model released? Pull it and switch in 60 seconds. No waiting for vendor integration, no feature-flag rollouts, no "coming soon." The open-source model ecosystem moves fast — Ollama gives you access to every model the day it drops. Mistral, Qwen, DeepSeek, CodeLlama, StarCoder — swap freely, keep the one that works best for your stack.
Qwen 2.5 Coder 7B scores 80%+ on HumanEval. For autocomplete, explanation, and code review, local models handle 90% of daily developer needs. The gap with cloud models shrinks every quarter.
EU AI Act. SEC cybersecurity rules. NIST AI RMF. Organizations in defense, finance, healthcare, and government increasingly cannot send code to third-party AI providers. LocalDev is compliant by architecture.
An RTX 3060 (12GB, ~$300 used) runs 7B models at 20 tokens/sec. Apple M-series unified memory handles 7B natively. The hardware barrier is gone — the software just wasn't ready. Until now.
Anthropic's Model Context Protocol creates a standard for tool integration. LocalDev implements MCP natively — your local models can access filesystem, terminal, and custom servers through the same protocol that powers cloud AI tools.
Air-gapped networks, ITAR compliance, classified codebases. LocalDev operates with zero network traffic after setup. NIST SP 800-171 compatible by design.
SOC 2 Type II, SEC Rule 10b-5, FINRA data protection. Trading systems, risk models, and proprietary algorithms never touch third-party servers.
HIPAA, FDA 21 CFR Part 11. Code that processes PHI or operates medical devices stays local. Full audit trail, zero data exfiltration risk.
No $20/month subscription, no usage limits, no API key management. Download, setup, code. Works the same whether you're online or on an airplane.
Tell us about your setup. We'll send you the binary, a setup guide, and direct support during the beta.
We'll review your setup and send the beta package to your email within 24 hours.
In the meantime, check out the Getting Started Guide.