Sometimes you can't send code to OpenAI.

Maybe it's proprietary code under NDA. Maybe it's a security-conscious client. Maybe you just prefer keeping your work private.

Local AI models have reached the point where they're genuinely useful for development. Here's how to set up a private AI workflow.

Why Local?

Cloud AI services are convenient, but they have trade-offs:

Data leaves your machine. Your prompts, your code, your context—all sent to external servers.

Terms of service matter. Most providers say they don't train on your data. But terms change, breaches happen, trust is required.

Compliance requirements. Some industries and clients prohibit external data processing.

Internet dependency. No connection, no AI. Local runs anywhere.

For sensitive work, local models solve real problems.

The Current State

Local models have improved dramatically:

Good enough for coding. Models like CodeLlama and DeepSeek Coder handle everyday programming tasks competently.

Reasonable hardware requirements. A decent laptop can run useful models. You don't need a server farm.

Easy setup. Tools like Ollama make running models trivial.

They're not as capable as Claude or GPT-4, but they're often capable enough.

Ollama: The Easy Path

Ollama is the simplest way to run local models.

Installation: One command on Mac or Linux. Download and run on Windows.

# Mac/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Then run a model
ollama run codellama

That's it. You're now running a local AI.
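
The ollama CLI also handles one-shot prompts and basic model management. A few commands worth knowing beyond the interactive chat (the prompt here is just an example):

# Ask a one-off question without opening the interactive chat
ollama run codellama 'Explain what this regex matches: ^\d{4}-\d{2}-\d{2}$'

# List the models you have downloaded
ollama list

# Delete a model you no longer need
ollama rm codellama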

Model options:

  • codellama — Meta's code-focused model
  • deepseek-coder — Strong coding performance
  • mistral — Good general-purpose model
  • llama3 — Latest Llama, versatile

Try different models for your use case. They have different strengths.
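
A quick way to compare them is to send the same prompt to two models and eyeball the results. A minimal sketch, assuming you have disk space for the downloads (each model is several gigabytes):

# Pull two candidates
ollama pull codellama
ollama pull deepseek-coder

# Run the same prompt against both and compare the answers
for model in codellama deepseek-coder; do
  echo "=== $model ==="
  ollama run "$model" 'Write a SQL query that finds duplicate emails in a users table'
done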

LM Studio: The GUI Option

If you prefer a visual interface:

LM Studio provides a ChatGPT-like interface for local models. Download models from within the app. Chat naturally. No command line required.

Good for:

  • Exploring what's available
  • Quick experimentation
  • Non-technical team members

IDE Integration

Local models can plug into your editor:

Continue (VS Code extension): Connect to Ollama or other local providers. Get Copilot-like completions from local models.

Ollama + API: Ollama exposes an OpenAI-compatible API. Many tools that work with OpenAI can point to your local instance instead.

# Ollama's native API listens on localhost:11434
# "stream": false returns a single JSON response instead of a token stream
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a Python function to parse JSON",
  "stream": false
}'
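
The OpenAI-compatible endpoints live under /v1 on the same port. Any client that lets you change the base URL can usually be pointed at them; a sketch using curl (Ollama ignores API keys, but some clients insist on one being set):

# Same server, OpenAI-style chat completions endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama",
    "messages": [
      {"role": "user", "content": "Write a Python function to parse JSON"}
    ]
  }'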

Hardware Reality

What you need:

Minimum: 8GB RAM, any recent CPU. Runs smaller models (7B parameters) in their default quantized form.

Comfortable: 16GB RAM, modern CPU. Runs 13B models smoothly.

Ideal: 32GB+ RAM or a GPU with 8GB+ VRAM. Runs larger models at reasonable speed.

M1/M2/M3 Macs are particularly good—the unified memory architecture handles larger models well.
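
If you want to see what a model actually costs in memory on your machine, load it and check. A sketch (ollama ps exists in recent Ollama releases; output columns vary by version):

# Load a model, then show running models and their memory footprint
ollama run codellama 'hello' > /dev/null
ollama ps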

Performance Trade-offs

Be realistic:

Slower than cloud. Local inference takes time, especially on CPU-only machines. Expect noticeably fewer tokens per second than the hosted APIs.

Less capable. The best local models are behind cloud frontier models. Complex reasoning and very long contexts suffer.

Resource-intensive. Your laptop will work hard. Battery life suffers. Fans spin.

For quick completions and straightforward tasks, local works great. For complex architectural discussions, I still reach for Claude.

A Hybrid Workflow

My approach:

Sensitive code: Local models only. Ollama with CodeLlama.

General development: Cloud AI. Claude, ChatGPT.

Quick completions: Copilot or local via Continue.

Match the tool to the sensitivity level. Not everything needs the same protection.

Getting Started

  1. Install Ollama: Takes two minutes
  2. Download a model: ollama pull codellama
  3. Try it: ollama run codellama
  4. Integrate with your editor: Install the Continue extension (config sketch below)

Start with CodeLlama for coding tasks. Experiment from there.
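
For step 4, Continue has to be told to use Ollama as its provider. The sketch below assumes an older Continue release that reads JSON from ~/.continue/config.json; newer releases use a YAML config instead, so treat the exact keys as an assumption and check the extension's docs:

# Point Continue at the local Ollama server (schema is version-dependent;
# back up any existing config before overwriting it)
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    {
      "title": "CodeLlama (local)",
      "provider": "ollama",
      "model": "codellama"
    }
  ]
}
EOF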