Six months after launching Complai's compliance search, users started complaining that the results were getting worse. Not dramatically worse — just subtly off. Queries that used to return the exact regulation they needed were now returning tangentially related documents. The system had not changed. The code was identical. The database schema was the same. So what happened?
The answer was embedding drift, and it is a problem that almost nobody talks about when they write about RAG pipelines.
What Is Embedding Drift?
When you build a RAG system, you embed your documents using a specific embedding model at a specific point in time. Those embeddings become the mathematical representation of your knowledge base. When a user searches, you embed their query with the same model and find the closest documents by vector similarity.
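That "closest documents" step is just cosine similarity over vectors. As a reference point, the metric itself is only a few lines — a generic sketch, not Complai's code:

```typescript
// Cosine similarity between two embedding vectors.
// Returns a value in [-1, 1]; higher means the texts sit closer together
// in the model's semantic space.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In practice your vector database computes this for you; the point is that the number is only meaningful when both vectors come from the same model.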
The problem starts when any part of this equation changes.
Model deprecation. OpenAI released text-embedding-3-small and text-embedding-3-large in early 2024, superseding text-embedding-ada-002. If your documents were embedded with the old model but your queries are embedded with the new model, the vector spaces do not align. Cosine similarity between vectors from different models is meaningless — you are comparing coordinates on different maps.
Model versioning. Even within the "same" model, providers sometimes update weights or architecture without changing the model name. These silent updates can shift the embedding space enough to degrade retrieval quality.
Knowledge base evolution. Your document corpus changes over time. New documents are added, old ones are updated. If you embed new documents with a newer model version while old documents retain their original embeddings, you get a split embedding space. Newer documents are in one region, older documents are in another, and queries can preferentially match one set over the other regardless of relevance.
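A cheap safeguard against all three failure modes is to store the model name next to every vector and refuse to compare across embedding spaces. A minimal sketch — the `StoredEmbedding` shape is an illustrative assumption, not Complai's schema:

```typescript
// Every stored vector carries the model that produced it.
// `StoredEmbedding` is a hypothetical shape for illustration.
interface StoredEmbedding {
  vector: number[];
  model: string; // e.g. "text-embedding-3-small"
}

// Throw before comparing vectors that live in different embedding spaces.
function assertSameSpace(query: StoredEmbedding, doc: StoredEmbedding): void {
  if (query.model !== doc.model) {
    throw new Error(
      `Embedding space mismatch: query uses ${query.model}, document uses ${doc.model}`
    );
  }
  if (query.vector.length !== doc.vector.length) {
    throw new Error(
      `Dimension mismatch: ${query.vector.length} vs ${doc.vector.length}`
    );
  }
}
```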
The Symptoms
Embedding drift does not cause your system to break. It causes it to get gradually worse. The symptoms are subtle:
- Relevance scores drop. Your average similarity scores decline over time, even though users are asking the same types of questions.
- Top-k results shift. Documents that used to rank #1 for common queries drop to #3 or #4.
- New documents outperform old ones. Recent additions consistently appear in results even when older documents are more relevant.
- User complaints increase gradually. Nobody reports a single bad result. They say the system "feels less accurate" than it used to.
I did not notice the drift at Complai for three months because I was not tracking the right metrics. The system was not broken — it was just slowly rotting.
Detection: What to Monitor
The first line of defense is monitoring. Here is what I track now:
```typescript
interface RetrievalMetrics {
  queryId: string;
  query: string;
  topSimilarity: number; // Similarity of the best match
  avgSimilarity: number; // Average similarity of top-5 results
  resultCount: number;   // How many results above threshold
  model: string;         // Which embedding model was used
  timestamp: Date;
}

async function logRetrieval(
  query: string,
  results: SearchResult[]
): Promise<void> {
  const top5 = results.slice(0, 5);
  const metrics: RetrievalMetrics = {
    queryId: crypto.randomUUID(),
    query,
    topSimilarity: results[0]?.similarity ?? 0,
    avgSimilarity:
      top5.length > 0
        ? top5.reduce((sum, r) => sum + r.similarity, 0) / top5.length
        : 0,
    resultCount: results.length,
    model: CURRENT_EMBEDDING_MODEL,
    timestamp: new Date(),
  };
  await db.query(
    `INSERT INTO retrieval_metrics (query_id, query, top_similarity, avg_similarity, result_count, model, created_at)
     VALUES ($1, $2, $3, $4, $5, $6, $7)`,
    [metrics.queryId, metrics.query, metrics.topSimilarity, metrics.avgSimilarity, metrics.resultCount, metrics.model, metrics.timestamp]
  );
}
```
The key metric is the trailing average of topSimilarity over time. If it trends downward over weeks, you have embedding drift. A sudden drop indicates a model change. A gradual decline indicates corpus evolution without re-embedding.
I run a weekly report that compares the current week's average similarity to the four-week trailing average. A decline of more than 5% triggers an alert.
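That weekly comparison can be computed from the logged metrics. A sketch of the drift check as a pure function, assuming the weekly averages have already been aggregated from the retrieval_metrics table:

```typescript
// Flag drift when the current week's average top similarity falls more than
// `threshold` (relative) below the trailing four-week average.
// `weeklyAvgs` is assumed to be ordered oldest to newest, with at least
// five entries (four baseline weeks plus the current week).
function detectDrift(
  weeklyAvgs: number[],
  threshold = 0.05 // 5% relative decline
): { drifted: boolean; decline: number } {
  if (weeklyAvgs.length < 5) {
    throw new Error("Need the current week plus a four-week baseline");
  }
  const current = weeklyAvgs[weeklyAvgs.length - 1];
  const trailing = weeklyAvgs.slice(-5, -1);
  const baseline = trailing.reduce((sum, v) => sum + v, 0) / trailing.length;
  const decline = (baseline - current) / baseline;
  return { drifted: decline > threshold, decline };
}
```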
The Re-Embedding Problem
The obvious fix for embedding drift is to re-embed everything with the current model. The less obvious part is that re-embedding is expensive, time-consuming, and disruptive.
For Complai, re-embedding our 15,000-document corpus takes about 45 minutes and costs roughly $3 in API calls. That is manageable. For enterprise applications with millions of documents, re-embedding can take days and cost hundreds of dollars.
```typescript
async function reembedAll(
  model: string,
  batchSize = 100
): Promise<{ processed: number; failed: number }> {
  let processed = 0;
  let failed = 0;
  let offset = 0;
  while (true) {
    const batch = await db.query(
      `SELECT id, content FROM documents ORDER BY id LIMIT $1 OFFSET $2`,
      [batchSize, offset]
    );
    if (batch.rows.length === 0) break;
    const embeddings = await openai.embeddings.create({
      model,
      input: batch.rows.map((r) => r.content),
    });
    for (let i = 0; i < batch.rows.length; i++) {
      try {
        await db.query(
          `UPDATE documents SET embedding = $1, embedding_model = $2, embedded_at = now() WHERE id = $3`,
          [JSON.stringify(embeddings.data[i].embedding), model, batch.rows[i].id]
        );
        processed++;
      } catch {
        failed++;
      }
    }
    offset += batchSize;
  }
  return { processed, failed };
}
```
A few things I learned the hard way:
Batch your embedding API calls. Most embedding APIs accept arrays of inputs. Sending one document at a time is 10x slower and often more expensive due to per-request overhead.
Track the model version per document. Add an embedding_model column to your documents table. This lets you identify which documents need re-embedding when you switch models, and it provides audit traceability.
Do not re-embed in place. During re-embedding, your search quality will be inconsistent — some documents have old embeddings, some have new ones. Instead, write new embeddings to a staging column, verify quality, and then swap atomically.
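The embedding_model and embedded_at columns written by reembedAll above need to exist before that function runs, and a quick grouping query shows how much of the corpus is still in the old space after a model switch. A schema sketch, assuming Postgres:

```sql
-- Columns written by reembedAll above (skip if they already exist)
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding_model TEXT;
ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedded_at TIMESTAMPTZ;

-- After a model switch: how much of the corpus is still stale?
SELECT embedding_model, count(*) AS docs, min(embedded_at) AS oldest
FROM documents
GROUP BY embedding_model;
```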
The Staging Strategy
Here is the approach I use for zero-downtime re-embedding:
```sql
-- Add a staging column
ALTER TABLE documents ADD COLUMN embedding_staging VECTOR(1536);

-- After re-embedding completes, swap atomically
BEGIN;
ALTER TABLE documents RENAME COLUMN embedding TO embedding_old;
ALTER TABLE documents RENAME COLUMN embedding_staging TO embedding;
DROP INDEX IF EXISTS documents_embedding_idx;
CREATE INDEX documents_embedding_idx ON documents USING hnsw (embedding vector_cosine_ops);
COMMIT;

-- Clean up after verifying quality
ALTER TABLE documents DROP COLUMN embedding_old;
```
This ensures that users always get consistent results — either all old embeddings or all new embeddings — never a mix.
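The "verify quality" step before the swap can be a simple gate: run a fixed set of evaluation queries against both the old and staging embeddings, and only swap if the new space scores at least as well. A sketch, assuming the score arrays come from such an evaluation run:

```typescript
// Decide whether to promote the staging embeddings. `oldScores` and
// `newScores` are assumed to be top-similarity results for the same fixed
// set of evaluation queries, run against the old and new embedding columns.
function shouldSwap(
  oldScores: number[],
  newScores: number[],
  tolerance = 0.02 // allow a small regression before blocking the swap
): boolean {
  const avg = (xs: number[]) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return avg(newScores) >= avg(oldScores) - tolerance;
}
```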
Versioning Your Embedding Space
For applications that need to maintain historical consistency — regulatory compliance being a prime example — I version the entire embedding space.
```typescript
interface EmbeddingVersion {
  version: number;
  model: string;
  dimensions: number;
  createdAt: Date;
  documentCount: number;
  isActive: boolean;
}

// Store embeddings with version metadata.
// This allows rolling back to a previous embedding space if the new one performs worse.
```
Each major re-embedding gets a version number. The active version serves production queries. If a new re-embedding degrades quality (which I have seen happen when switching models), I can roll back to the previous version by updating a single configuration value.
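The rollback itself can be modeled as a single state change: exactly one version is active at a time. An in-memory sketch of the operation — in production this would be one row update in a config table, and `VersionRecord` is a simplified stand-in for the metadata above:

```typescript
// Simplified version metadata for illustration.
interface VersionRecord {
  version: number;
  model: string;
  isActive: boolean;
}

// Activate one version and deactivate all others; used for both rollback
// and promotion. Throws rather than silently leaving no version active.
function activateVersion(
  versions: VersionRecord[],
  target: number
): VersionRecord[] {
  if (!versions.some((v) => v.version === target)) {
    throw new Error(`Unknown embedding version: ${target}`);
  }
  return versions.map((v) => ({ ...v, isActive: v.version === target }));
}
```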
When to Re-Embed
Based on my experience, re-embed in these situations:
- You switch embedding models. This is non-negotiable. Vectors from different models are incompatible.
- Your corpus grows by more than 30%. Large additions can shift the distribution enough to affect retrieval quality for older documents.
- Your monitoring shows a sustained quality decline. If average similarity scores drop 5%+ over two weeks, re-embed.
- The provider announces a model update. Even if the model name stays the same, re-embed to ensure consistency.
For Complai, I re-embed quarterly as a preventive measure, regardless of monitoring signals. The cost is trivial (under $5), and it eliminates the slow drift before users notice.
The Bigger Picture
Embedding drift is a symptom of a broader challenge: AI systems require ongoing maintenance in ways that traditional software does not. A database schema does not degrade over time. A REST API does not slowly return worse results. But an embedding space does, because it depends on external models that change independently of your application.
Treat your embeddings as a dependency that needs version management, monitoring, and periodic refresh — the same way you treat your npm packages or your database migrations. The RAG pipeline you built six months ago is not the same pipeline you are running today, even if you have not changed a line of code. The models underneath it have shifted, and your search quality has shifted with them.
Build monitoring from day one, budget for periodic re-embedding, and never assume that "it worked when we launched it" means "it still works today."