Most developers' first experience with Claude is through chat interfaces—Claude.ai or Claude Code. But when you're building a product, you need the Claude API.
The transition isn't obvious. Chat interfaces hide complexity. APIs expose it. You're suddenly responsible for error handling, rate limits, cost control, and state management.
Here's what I've learned building production features with Claude API.
Why Use the API Instead of Chat
Chat interfaces are for you. The API is for your users.
When you integrate Claude via API, you can:
- Embed AI features directly into your application
- Control the user experience completely
- Process requests programmatically at scale
- Build custom workflows that combine multiple AI calls
- Track usage and costs per user or feature
The tradeoff: you handle all the infrastructure yourself.
Getting Started: The Basics
Authentication
You'll need an API key from console.anthropic.com. Store it securely—never commit it to version control.
```typescript
// lib/claude.ts
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
})

export async function callClaude(prompt: string) {
  const message = await anthropic.messages.create({
    model: 'claude-opus-4-20250514',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  })

  // content is a union of block types; narrow to the text block
  const block = message.content[0]
  return block.type === 'text' ? block.text : ''
}
```
This is the minimum viable integration. It works, but it's not production-ready.
Production Pattern: Structured Error Handling
APIs fail, networks time out, and rate limits kick in. Your code needs to handle all of this gracefully.
```typescript
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
})

export class ClaudeError extends Error {
  constructor(
    message: string,
    public code: string,
    public statusCode?: number
  ) {
    super(message)
    this.name = 'ClaudeError'
  }
}

export async function callClaudeWithRetry(
  prompt: string,
  maxRetries = 3
): Promise<string> {
  let lastError: Error | null = null

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const message = await anthropic.messages.create({
        model: 'claude-opus-4-20250514',
        max_tokens: 1024,
        messages: [{ role: 'user', content: prompt }],
      })

      const block = message.content[0]
      return block.type === 'text' ? block.text : ''
    } catch (error) {
      lastError = error as Error

      if (error instanceof Anthropic.APIError) {
        // Rate limit: exponential backoff, then retry
        if (error.status === 429) {
          const waitTime = Math.pow(2, attempt) * 1000
          await new Promise(resolve => setTimeout(resolve, waitTime))
          continue
        }

        // Don't retry on authentication errors
        if (error.status === 401 || error.status === 403) {
          throw new ClaudeError('Authentication failed', 'AUTH_ERROR', error.status)
        }

        // Retry transient server errors after the same backoff
        if (error.status && error.status >= 500) {
          await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000))
          continue
        }

        // Don't retry on other client errors
        throw new ClaudeError(
          error.message || 'Claude API request failed',
          'API_ERROR',
          error.status
        )
      }

      // Network errors and anything else: fall through and retry
    }
  }

  throw new ClaudeError(
    `Failed after ${maxRetries} attempts: ${lastError?.message}`,
    'MAX_RETRIES_EXCEEDED'
  )
}
```
This pattern handles:
- Rate limits with exponential backoff
- Transient server errors with retries
- Authentication failures without retries
- Clear error messages for debugging
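In a caller, you can then branch on the error code. A sketch; the wrapper name and return shape here are illustrative, not part of the library:

```typescript
import { callClaudeWithRetry, ClaudeError } from '@/lib/claude'

// Hypothetical call site: map error codes to your own handling
export async function answerQuestion(question: string) {
  try {
    return { ok: true, answer: await callClaudeWithRetry(question) }
  } catch (error) {
    if (error instanceof ClaudeError && error.code === 'AUTH_ERROR') {
      // Misconfiguration: alert rather than retry
      console.error('Check your ANTHROPIC_API_KEY configuration')
    }
    return { ok: false, answer: null }
  }
}
```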
Cost Optimization
Claude API pricing is per-token: every token you send and receive costs money.
Strategy 1: Minimize System Prompts
System prompts count as input tokens on every request. Keep them concise.
Instead of:
```typescript
const systemPrompt = `You are a helpful assistant that specializes in
helping users with Excel formulas. You should be friendly and patient.
Always explain your reasoning step by step. If you're not sure about
something, say so. Format your responses in markdown. Include examples
when relevant. Be concise but thorough.`
```
Try:
```typescript
const systemPrompt = `Excel formula assistant. Explain reasoning, use markdown, provide examples.`
```
Same guidance, roughly 80% fewer tokens.
Strategy 2: Cache System Prompts
Anthropic's prompt caching feature lets you reuse system prompts across requests, with cache reads billed at roughly 10% of the normal input-token price (cache writes carry a modest one-time premium).
```typescript
const message = await anthropic.messages.create({
  model: 'claude-opus-4-20250514',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: systemPrompt,
      cache_control: { type: 'ephemeral' }
    }
  ],
  messages: [{ role: 'user', content: prompt }],
})
```
Cache entries expire five minutes after their last use, and prompts below a minimum length (1,024 tokens on most current models) aren't cacheable at all. For high-traffic features with long system prompts, this dramatically reduces costs.
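To verify the cache is actually being hit, inspect the usage block returned with every response:

```typescript
// Continuing from the request above: a cache miss reports
// cache_creation_input_tokens; a subsequent hit within the TTL reports
// cache_read_input_tokens, billed at the discounted rate.
console.log({
  cacheWrite: message.usage.cache_creation_input_tokens,
  cacheRead: message.usage.cache_read_input_tokens,
})
```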
Strategy 3: Use Haiku for Simple Tasks
Not every task needs Opus. Classification, summarization, and simple extraction work fine with Haiku at a small fraction of the cost.
```typescript
function selectModel(taskType: string): string {
  switch (taskType) {
    case 'classify':
    case 'extract':
    case 'summarize':
      return 'claude-3-5-haiku-20241022'
    case 'analyze':
    case 'generate':
    case 'reason':
      return 'claude-opus-4-20250514'
    default:
      return 'claude-sonnet-4-20250514'
  }
}
```
Streaming Responses
For user-facing features, streaming makes your app feel faster.
```typescript
export async function* streamClaude(prompt: string) {
  // messages.stream() returns a MessageStream you can iterate directly
  const stream = anthropic.messages.stream({
    model: 'claude-opus-4-20250514',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  })

  for await (const chunk of stream) {
    if (
      chunk.type === 'content_block_delta' &&
      chunk.delta.type === 'text_delta'
    ) {
      yield chunk.delta.text
    }
  }
}
```
Use this with Server-Sent Events (SSE) or WebSockets to stream to your frontend.
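For example, here's a minimal SSE endpoint. A sketch assuming a Next.js App Router project; the route path is illustrative:

```typescript
// app/api/generate/route.ts
import { streamClaude } from '@/lib/claude'

export async function POST(request: Request) {
  const { prompt } = await request.json()
  const encoder = new TextEncoder()

  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const text of streamClaude(prompt)) {
          // SSE frames are "data: <payload>\n\n"; JSON.stringify escapes newlines
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(text)}\n\n`))
        }
        controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      } finally {
        controller.close()
      }
    },
  })

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    },
  })
}
```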
Production Checklist
Before deploying Claude API features:
Security:
- [ ] API keys stored in environment variables
- [ ] Rate limiting on your endpoints
- [ ] Input validation and sanitization
- [ ] Output sanitization for user-generated content
Reliability:
- [ ] Error handling with retries
- [ ] Timeout configuration
- [ ] Circuit breaker for cascading failures
- [ ] Fallback behavior when API is down
Cost Control:
- [ ] Token usage monitoring (see the sketch after this checklist)
- [ ] Per-user rate limits
- [ ] Maximum token caps per request
- [ ] Alert thresholds for unexpected usage spikes
Monitoring:
- [ ] Log all API calls (without sensitive data)
- [ ] Track response times
- [ ] Monitor error rates
- [ ] Track costs per feature/user
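Here's a sketch covering two of those items, timeout configuration and token usage monitoring. The feature label and console.log are stand-ins for whatever metrics pipeline you actually use:

```typescript
import Anthropic from '@anthropic-ai/sdk'

// The SDK accepts a per-client request timeout and its own retry count
const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  timeout: 30_000, // 30 seconds
  maxRetries: 2,
})

export async function trackedCall(prompt: string, feature: string) {
  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  })

  // usage comes back on every response; log it per feature for cost tracking
  console.log('claude_usage', {
    feature,
    inputTokens: message.usage.input_tokens,
    outputTokens: message.usage.output_tokens,
  })

  const block = message.content[0]
  return block.type === 'text' ? block.text : ''
}
```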
Common Pitfalls
Pitfall 1: Oversized max_tokens
The Messages API requires max_tokens on every request, so the temptation is to set one generous cap everywhere. An oversized cap lets a single runaway response cost far more than expected. Set caps based on what each use case actually needs.
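For instance, a sketch of per-task output caps; the task names and numbers here are illustrative, not recommendations:

```typescript
// Illustrative per-task output caps; tune these to your own use cases
const MAX_TOKENS_BY_TASK: Record<string, number> = {
  classify: 64, // a label needs almost no output
  extract: 256,
  summarize: 512,
  generate: 2048,
}

const maxTokens = MAX_TOKENS_BY_TASK['summarize'] ?? 1024 // sensible default
```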
Pitfall 2: Sending Too Much Context
You pay for every input token. Don't send an entire document when only a few sections are relevant; extract those sections first.
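As a rough illustration, a naive keyword filter; production systems usually use embeddings-based retrieval instead:

```typescript
// A naive sketch: keep only paragraphs that mention the query terms,
// within a rough character budget (~4 characters per token).
function extractRelevant(document: string, query: string, budget = 8000): string {
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 3)
  const paragraphs = document.split(/\n\s*\n/)
  const relevant = paragraphs.filter(p => {
    const lower = p.toLowerCase()
    return terms.some(term => lower.includes(term))
  })
  return relevant.join('\n\n').slice(0, budget)
}
```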
Pitfall 3: Ignoring Rate Limits
Rate limits vary by usage tier and model, and batch jobs will hit them; check your current limits in the console. Implement queuing for batch operations.
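A minimal in-process concurrency limiter looks like this; for real batch jobs, consider a library like p-limit or a proper job queue:

```typescript
export async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0

  async function worker() {
    while (next < items.length) {
      const i = next++ // single-threaded JS makes this safe
      results[i] = await fn(items[i])
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  )
  return results
}

// e.g. at most two concurrent Claude calls:
// const summaries = await mapWithConcurrency(docs, 2, d => callClaudeWithRetry(d))
```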
Pitfall 4: Not Testing Failure Modes
Test what happens when:
- The API is down
- Requests timeout
- You hit rate limits
- The response is malformed
Your app should degrade gracefully, not crash.
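For example, a minimal fallback wrapper; the canned message is whatever makes sense for your feature:

```typescript
import { callClaudeWithRetry } from '@/lib/claude'

// Serve a static fallback instead of surfacing a 500 to the user
export async function callClaudeOrFallback(prompt: string): Promise<string> {
  try {
    return await callClaudeWithRetry(prompt)
  } catch (error) {
    console.error('Claude unavailable, serving fallback:', error)
    return 'AI suggestions are temporarily unavailable. Please try again shortly.'
  }
}
```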
Real-World Example: Content Generation
Here's a production-ready function for generating blog post ideas:
```typescript
import { callClaudeWithRetry, ClaudeError } from '@/lib/claude'

interface BlogIdea {
  title: string
  excerpt: string
  targetAudience: string
}

export async function generateBlogIdeas(
  topic: string,
  count: number = 3
): Promise<BlogIdea[]> {
  const prompt = `Generate ${count} blog post ideas about "${topic}".

For each idea, provide:
- title (compelling, specific)
- excerpt (2-3 sentences)
- targetAudience

Respond with only a JSON array, no other text.`

  try {
    const response = await callClaudeWithRetry(prompt)

    // Strip markdown fences in case the model wraps its JSON anyway
    const cleaned = response
      .replace(/^`{3}(json)?\s*/, '')
      .replace(/`{3}\s*$/, '')
      .trim()
    const ideas = JSON.parse(cleaned)

    // Validate structure
    if (!Array.isArray(ideas) || ideas.length === 0) {
      throw new Error('Invalid response format')
    }

    return ideas.slice(0, count) // Cap at requested count
  } catch (error) {
    if (error instanceof ClaudeError) {
      // Log for monitoring
      console.error('Claude API error:', {
        code: error.code,
        statusCode: error.statusCode,
        message: error.message,
      })
    }
    // Return empty array instead of crashing
    return []
  }
}
```
This function handles errors gracefully and returns predictable results.
Next Steps
Start small. Pick one feature where AI adds clear value. Build it with proper error handling and monitoring. Learn from real usage before expanding.
The Claude API is powerful, but production reliability comes from disciplined engineering around it—not just clever prompts.
Related: Claude API Series
- Claude API Prompt Patterns That Actually Work - Essential patterns for consistent, parseable API responses
- How I Built This Website with Claude Code - Using Claude for full-stack development
Official Documentation
- Claude API Documentation - Complete API reference and guides
- API Console - Manage API keys and monitor usage
- Anthropic SDK - Official TypeScript SDK on GitHub
- Prompt Caching Guide - Reduce costs with prompt caching
- Rate Limits - Understanding API rate limits
Building something with Claude API? I'd love to hear what you're working on and what challenges you're facing.