Sometimes you can't send code to OpenAI.
Maybe it's proprietary code under NDA. Maybe it's a security-conscious client. Maybe you just prefer keeping your work private.
Local AI models have reached the point where they're genuinely useful for development. Here's how to set up a private AI workflow.
Why Local?
Cloud AI services are convenient, but they have trade-offs:
Data leaves your machine. Your prompts, your code, your context—all sent to external servers.
Terms of service matter. Most providers say they don't train on your data. But terms change, breaches happen, trust is required.
Compliance requirements. Some industries and clients prohibit external data processing.
Internet dependency. No connection, no AI. Local runs anywhere.
For sensitive work, local models solve real problems.
The Current State
Local models have improved dramatically:
Good enough for coding. Models like CodeLlama and DeepSeek Coder handle most programming tasks competently.
Reasonable hardware requirements. A decent laptop can run useful models. You don't need a server farm.
Easy setup. Tools like Ollama make running models trivial.
They're not as capable as Claude or GPT-4, but they're often capable enough.
Ollama: The Easy Path
Ollama is the simplest way to run local models.
Installation: One command on Mac or Linux. On Windows, download and run the installer.
```bash
# Mac/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Then run a model
ollama run codellama
```
That's it. You're now running a local AI.
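If you'd rather check the install from code than from the terminal, Ollama's local REST API can list the models you've pulled. A small sketch, assuming the `requests` package is installed:

```python
import requests

# Ollama's API listens on localhost:11434; /api/tags lists installed models.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

for model in resp.json().get("models", []):
    print(model["name"])
```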
Model options:
- `codellama`: Meta's code-focused model
- `deepseek-coder`: Strong coding performance
- `mistral`: Good general-purpose model
- `llama3`: Latest Llama, versatile
Try different models for your use case. They have different strengths.
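Comparing them doesn't require anything fancy: send the same prompt to each model through the local API and read the answers side by side. A rough sketch, assuming `requests` is installed and both models have already been pulled:

```python
import requests

PROMPT = "Write a Python function to parse JSON"

# Both models must already be pulled (e.g. `ollama pull deepseek-coder`).
for model in ["codellama", "deepseek-coder"]:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"])
```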
LM Studio: The GUI Option
If you prefer a visual interface:
LM Studio provides a ChatGPT-like interface for local models. Download models from within the app. Chat naturally. No command line required.
Good for:
- Exploring what's available
- Quick experimentation
- Non-technical team members
IDE Integration
Local models can plug into your editor:
Continue (VS Code extension): Connect to Ollama or other local providers. Get Copilot-like completions from local models.
Ollama + API: Ollama exposes an OpenAI-compatible API. Many tools that work with OpenAI can point to your local instance instead.
```bash
# Ollama API runs on localhost:11434
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a Python function to parse JSON"
}'
```
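That same endpoint also speaks the OpenAI chat format, so the official `openai` Python package works against it if you override the base URL. A minimal sketch, assuming a 1.x version of the client (the API key can be any placeholder, since Ollama ignores it):

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint lives under /v1 on the same port.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

completion = client.chat.completions.create(
    model="codellama",
    messages=[{"role": "user", "content": "Write a Python function to parse JSON"}],
)
print(completion.choices[0].message.content)
```

Anything that lets you override the OpenAI base URL can be pointed at localhost the same way.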
Hardware Reality
What you need:
Minimum: 8GB RAM, any recent CPU. Can run smaller models (7B parameters).
Comfortable: 16GB RAM, modern CPU. Runs 13B models smoothly.
Ideal: 32GB+ RAM or a GPU with 8GB+ VRAM. Runs larger models at reasonable speed.
M1/M2/M3 Macs are particularly good—the unified memory architecture handles larger models well.
Performance Trade-offs
Be realistic:
Slower than cloud. Local inference takes time. Expect seconds, not milliseconds.
Less capable. The best local models are behind cloud frontier models. Complex reasoning and very long contexts suffer.
Resource-intensive. Your laptop will work hard. Battery life suffers. Fans spin.
For quick completions and straightforward tasks, local works great. For complex architectural discussions, I still reach for Claude.
A Hybrid Workflow
My approach:
Sensitive code: Local models only. Ollama with CodeLlama.
General development: Cloud AI. Claude, ChatGPT.
Quick completions: Copilot or local via Continue.
Match the tool to the sensitivity level. Not everything needs the same protection.
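Because Ollama mimics the OpenAI API, switching between local and cloud can be a one-line decision in your own scripts. The sketch below is purely illustrative: the `pick_client` helper and the model names are placeholders, not part of any real tool.

```python
from openai import OpenAI

# Local Ollama endpoint; the api_key is required by the client but ignored.
LOCAL = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")


def pick_client(sensitive: bool) -> tuple[OpenAI, str]:
    """Route sensitive work to the local model, everything else to the cloud."""
    if sensitive:
        return LOCAL, "codellama"
    # OpenAI() reads OPENAI_API_KEY from the environment.
    return OpenAI(), "gpt-4"


client, model = pick_client(sensitive=True)
reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Review this function for bugs."}],
)
print(reply.choices[0].message.content)
```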
Getting Started
- Install Ollama: Takes two minutes
- Download a model: `ollama pull codellama`
- Try it: `ollama run codellama`
- Integrate with your editor: Install the Continue extension
Start with CodeLlama for coding tasks. Experiment from there.
Related Reading
- My AI Tool Stack in 2025 — Where local models fit in my workflow.
- Free AI Tools Worth Your Time — Local models are completely free.
- Security for Solo Founders — Privacy as part of security posture.