
How I Use PostgreSQL as a Complete AI Backend

Vector storage, conversation history, prompt versioning, usage analytics, and job scheduling — all in one database. Why I stopped reaching for six services and used PostgreSQL for everything.

PostgreSQL · AI · Architecture · pgvector

Every AI tutorial starts the same way: here is your vector database, here is your LLM provider, here is your caching layer, here is your observability platform. By the time you have followed the tutorial, you are managing six services to answer a question.

When I built Complai, I made a deliberate choice: PostgreSQL would be the only database. Not just for user data and application state, but for embeddings, conversation history, prompt versioning, analytics, and job scheduling. One database. One connection string. One backup strategy.

Six months later, I have not regretted it once.

The AI Backend Checklist

A production AI application needs more than just a vector store. Here is what Complai requires:

  1. Vector storage and similarity search — for RAG retrieval
  2. Conversation history — every chat session, stored and queryable
  3. Prompt versioning — track which system prompts are in production
  4. Usage analytics — tokens consumed, costs, response times
  5. Job scheduling — periodic re-embedding, cache cleanup, report generation
  6. Rate limiting state — per-user request counts and quotas

Most teams reach for a different tool for each of these. I use PostgreSQL for all of them.
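Rate limiting is the one item on this list that does not get its own section below, so here is the shape of it: a per-user counters table plus an atomic `INSERT ... ON CONFLICT DO UPDATE` upsert. The sketch below is a hypothetical in-memory model of the fixed-window decision that the SQL upsert would enforce — the names (`checkWindow`, the assumed `rate_limits(user_id, window_start, count)` table) are illustrative, not Complai's actual code:

```typescript
// In-memory model of a fixed-window rate limiter. In PostgreSQL the same
// decision runs as a single atomic upsert against a counters table.
interface WindowState {
  windowStart: number; // epoch millis when the current window opened
  count: number;       // requests seen in this window
}

function checkWindow(
  state: WindowState | undefined,
  now: number,
  windowMs: number,
  limit: number
): { allowed: boolean; next: WindowState } {
  // No prior state, or the old window has expired: open a fresh window
  if (!state || now - state.windowStart >= windowMs) {
    return { allowed: true, next: { windowStart: now, count: 1 } };
  }
  // Inside the window: allow only while under the quota
  if (state.count < limit) {
    return { allowed: true, next: { ...state, count: state.count + 1 } };
  }
  return { allowed: false, next: state };
}
```

Keeping the counter in PostgreSQL rather than Redis means the quota survives restarts and is queryable alongside everything else.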

Vector Storage with pgvector

I have written about pgvector for RAG pipelines before, so I will keep this brief. The extension adds vector columns, similarity search operators, and HNSW indexing to PostgreSQL. For knowledge bases under 5 million documents, it performs well enough that a dedicated vector database adds complexity without meaningful benefit.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536),
  source TEXT,
  chunk_index INTEGER,
  embedding_model TEXT DEFAULT 'text-embedding-3-small',
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

The key insight is that your documents table lives alongside your users table, your conversations table, and everything else. Joins are free. You can query "show me all documents that user X has accessed" without crossing service boundaries.
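Querying this table from application code is plain parameterized SQL. One detail worth knowing: pgvector accepts a vector parameter as a bracketed literal like `'[0.1,0.2,0.3]'`, which `JSON.stringify` on a number array happens to produce. The retrieval helper below is a hedged sketch — `searchDocuments` and the `Queryable` interface are illustrative names, assuming a node-postgres style client:

```typescript
// Minimal query interface matching node-postgres (Pool or Client)
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: any[] }>;
}

// pgvector accepts a vector parameter as a bracketed literal such as
// '[0.1,0.2,0.3]'; JSON.stringify on a number[] produces exactly that.
function toVectorLiteral(embedding: number[]): string {
  return JSON.stringify(embedding);
}

// Hypothetical retrieval helper: top-k documents by cosine similarity.
// <=> is pgvector's cosine distance operator, so 1 - distance = similarity.
async function searchDocuments(db: Queryable, embedding: number[], k = 5) {
  const { rows } = await db.query(
    `SELECT id, content, 1 - (embedding <=> $1) AS similarity
     FROM documents
     ORDER BY embedding <=> $1
     LIMIT $2`,
    [toVectorLiteral(embedding), k]
  );
  return rows;
}
```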

Conversation History

Storing conversation history in PostgreSQL gives you something that most chat frameworks do not: queryable history with full SQL power.

CREATE TABLE conversations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES users(id),
  title TEXT,
  model TEXT NOT NULL,
  system_prompt_id INTEGER REFERENCES prompt_versions(id),
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
  role TEXT NOT NULL CHECK (role IN ('user', 'assistant', 'system')),
  content TEXT NOT NULL,
  tokens_used INTEGER,
  latency_ms INTEGER,
  model TEXT,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at);

This schema enables queries that are awkward at best with a flat file or key-value store:

-- Average response time by model over the last 7 days
SELECT model, AVG(latency_ms) as avg_latency, COUNT(*) as total_messages
FROM messages
WHERE role = 'assistant' AND created_at > now() - INTERVAL '7 days'
GROUP BY model;

-- Users with the most conversations this month
SELECT u.email, COUNT(c.id) as conversation_count
FROM users u JOIN conversations c ON u.id = c.user_id
WHERE c.created_at > date_trunc('month', now())
GROUP BY u.email ORDER BY conversation_count DESC;

-- Find conversations where the AI refused to answer
SELECT c.id, m.content
FROM messages m JOIN conversations c ON m.conversation_id = c.id
WHERE m.role = 'assistant' AND m.content ILIKE '%I cannot%'
ORDER BY m.created_at DESC;

Every one of these queries would require a custom aggregation pipeline if your conversation history lived in Redis or a NoSQL store.
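The same table drives context assembly at request time: load the tail of the conversation that fits the model's context budget, using the per-message `tokens_used` counts already stored. A minimal sketch — `trimToBudget` is an illustrative name, not Complai's actual code:

```typescript
// Assemble the most recent messages that fit a token budget, returned
// oldest-first so the model sees the conversation in order.
interface ChatMessage {
  role: string;
  content: string;
  tokens: number; // per-message count, as in the tokens_used column
}

function trimToBudget(messages: ChatMessage[], budget: number): ChatMessage[] {
  const selected: ChatMessage[] = [];
  let used = 0;
  // Walk backwards from the newest message; stop once the budget is spent
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokens > budget) break;
    selected.unshift(messages[i]);
    used += messages[i].tokens;
  }
  return selected;
}
```

The messages themselves come from a single `SELECT ... ORDER BY created_at` on the indexed `(conversation_id, created_at)` pair, so the whole operation is one query plus a loop.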

Prompt Versioning

System prompts change. When they change, you need to know which version produced which results. I treat prompts like database migrations: versioned, immutable, and auditable.

CREATE TABLE prompt_versions (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  version INTEGER NOT NULL,
  content TEXT NOT NULL,
  model TEXT NOT NULL,
  is_active BOOLEAN DEFAULT false,
  created_by TEXT,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(name, version)
);

-- Only one active version per prompt name
CREATE UNIQUE INDEX idx_active_prompt ON prompt_versions(name) WHERE is_active = true;

async function getActivePrompt(name: string): Promise<string> {
  const result = await db.query(
    `SELECT content FROM prompt_versions WHERE name = $1 AND is_active = true`,
    [name]
  );
  
  if (result.rows.length === 0) {
    throw new Error(`No active prompt found for: ${name}`);
  }
  
  return result.rows[0].content;
}

async function activatePromptVersion(name: string, version: number): Promise<void> {
  // Run both updates on one connection: if db is a pool, BEGIN and COMMIT
  // issued through db.query() can land on different clients.
  const client = await db.connect();
  try {
    await client.query('BEGIN');
    
    // Deactivate current version
    await client.query(
      `UPDATE prompt_versions SET is_active = false WHERE name = $1 AND is_active = true`,
      [name]
    );
    
    // Activate new version
    await client.query(
      `UPDATE prompt_versions SET is_active = true WHERE name = $1 AND version = $2`,
      [name, version]
    );
    
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}

When a user reports a bad response, I can look up exactly which prompt version generated it. When I want to A/B test a new prompt, I route a percentage of traffic to the new version and compare metrics. All of this is just SQL queries — no additional tooling required.
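The A/B routing itself needs no infrastructure either: hash the user id into a stable bucket from 0 to 99 and send that slice of traffic to the candidate version. This is a hedged sketch — `bucketFor` and `routePromptVersion` are illustrative names, and FNV-1a is just one reasonable choice of cheap deterministic hash:

```typescript
// Hash a user id into a stable bucket 0-99 for traffic splitting.
// FNV-1a: cheap, deterministic, good enough for percentage routing.
function bucketFor(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// Route candidatePercent% of users to the candidate prompt version;
// the same user always lands in the same bucket, so the experience is stable.
function routePromptVersion(
  userId: string,
  stableVersion: number,
  candidateVersion: number,
  candidatePercent: number
): number {
  return bucketFor(userId) < candidatePercent ? candidateVersion : stableVersion;
}
```

Because every message already records its `system_prompt_id` via the conversation, comparing the two versions afterwards is a GROUP BY, not a new analytics pipeline.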

Usage Analytics and Cost Tracking

Every LLM call gets logged with token counts and cost estimates. This is the foundation for cost monitoring, usage-based billing, and anomaly detection.

CREATE TABLE llm_usage (
  id SERIAL PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  conversation_id UUID REFERENCES conversations(id),
  model TEXT NOT NULL,
  input_tokens INTEGER NOT NULL,
  output_tokens INTEGER NOT NULL,
  cost_cents NUMERIC(10, 4),
  latency_ms INTEGER,
  cached BOOLEAN DEFAULT false,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_usage_user_date ON llm_usage(user_id, created_at);
CREATE INDEX idx_usage_model_date ON llm_usage(model, created_at);

-- Monthly cost by user
SELECT u.email,
       SUM(l.cost_cents) / 100.0 as total_cost,
       SUM(l.input_tokens + l.output_tokens) as total_tokens
FROM llm_usage l JOIN users u ON l.user_id = u.id
WHERE l.created_at > date_trunc('month', now())
GROUP BY u.email ORDER BY total_cost DESC;

-- Cache hit rate
SELECT
  COUNT(*) FILTER (WHERE cached) * 100.0 / COUNT(*) as cache_hit_rate
FROM llm_usage
WHERE created_at > now() - INTERVAL '24 hours';

This data feeds directly into business decisions. When a user asks "how much is this costing us?", I run a query instead of building a dashboard.
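The `cost_cents` value is computed at write time from the provider's per-token pricing. A sketch of that calculation — the function name and the price table are illustrative (real pricing changes; load it from configuration, not a hardcoded map):

```typescript
// Hypothetical price table: cents per million tokens, per model.
// Values here are placeholders, not authoritative provider pricing.
const PRICE_PER_MTOK_CENTS: Record<string, { input: number; output: number }> = {
  "gpt-4o-mini": { input: 15, output: 60 },
  "gpt-4o": { input: 250, output: 1000 },
};

// Estimate the cost of one call in cents, matching the cost_cents column.
function estimateCostCents(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICE_PER_MTOK_CENTS[model];
  if (!price) throw new Error(`No pricing configured for model: ${model}`);
  // Prices are cents per million tokens
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```

Storing the computed cost rather than re-deriving it later means historical rows stay correct even after the provider changes its prices.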

Job Scheduling with pg_cron

PostgreSQL can schedule its own maintenance jobs using the pg_cron extension. This eliminates the need for an external scheduler for tasks like cache cleanup, re-embedding, and report generation.

-- Clean up expired cache entries every hour
SELECT cron.schedule('cache-cleanup', '0 * * * *',
  $$DELETE FROM llm_cache WHERE created_at < now() - INTERVAL '24 hours'$$
);

-- Generate daily usage summary at midnight
SELECT cron.schedule('daily-summary', '0 0 * * *',
  $$INSERT INTO daily_summaries (date, total_queries, total_tokens, total_cost, cache_hit_rate)
    SELECT
      CURRENT_DATE - 1,
      COUNT(*),
      SUM(input_tokens + output_tokens),
      SUM(cost_cents),
      COUNT(*) FILTER (WHERE cached) * 100.0 / NULLIF(COUNT(*), 0)
    FROM llm_usage
    WHERE created_at >= CURRENT_DATE - 1 AND created_at < CURRENT_DATE$$
);

-- Weekly re-embedding check
SELECT cron.schedule('embedding-check', '0 3 * * 0',
  $$UPDATE system_flags SET value = 'pending' WHERE key = 'reembedding_status'
    AND (SELECT AVG(top_similarity) FROM retrieval_metrics
         WHERE created_at > now() - INTERVAL '7 days') < 0.85$$
);

The best infrastructure is the infrastructure you already operate. Every additional service is another thing to monitor, backup, update, and debug at 2 AM. PostgreSQL handles all of this in one process, with one backup strategy and one monitoring endpoint.

The Limits

PostgreSQL is not the right tool for everything:

  • Real-time streaming state. If you need sub-millisecond pub/sub for WebSocket fan-out, Redis is better. I use PostgreSQL's LISTEN/NOTIFY for low-volume real-time updates, but it does not scale to thousands of concurrent WebSocket connections.
  • Massive vector datasets. Beyond 5–10 million vectors, dedicated vector databases like Pinecone or Qdrant offer better query performance and managed scaling. For most applications, you will never reach this threshold.
  • Global distribution. PostgreSQL is a single-region database by default. If you need multi-region reads with low latency, you need a distributed database or a read replica strategy.

For everything else — and "everything else" covers 90% of what a typical AI application needs — PostgreSQL is not just sufficient. It is optimal. One database means one connection pool, one backup schedule, one monitoring dashboard, and one set of credentials to manage.

The simplicity compounds. When you add a new AI feature, you add a table and a few queries. You do not evaluate a new service, provision infrastructure, set up networking, configure authentication, and add monitoring. You just write SQL.

That is the real advantage. Not performance. Not features. Simplicity.

