When I started building Comply Assist AI — an AI-powered regulatory compliance platform for the cannabis industry — I expected the hard part to be the AI. I was wrong. The hard parts were everything around it: ingesting messy regulatory documents, building a retrieval pipeline that actually returns relevant results, and securing payment processing in an industry that every major processor refuses to serve.
Here's what I learned building an AI SaaS from zero to MVP as a sole developer, and what I wish someone had told me before I started.
Start with the Data, Not the Model
Every AI SaaS tutorial starts with "call the OpenAI API." That's the easy part. The hard part is getting the right data to the model.
For Comply Assist AI, the data was thousands of pages of regulatory text — federal statutes, state laws, local ordinances — spread across dozens of government websites in inconsistent formats. PDFs, HTML pages, occasionally just scanned images.
The pipeline I built:
- Ingest — Scrape and parse regulatory documents into clean text
- Chunk — Split documents into semantically meaningful chunks (not arbitrary 500-token blocks)
- Embed — Generate vector embeddings for each chunk using an embedding model
- Store — Index embeddings in PostgreSQL with pgvector for semantic search
- Retrieve — At query time, find the most relevant chunks for the user's question
- Generate — Pass retrieved context + user question to the LLM
```typescript
async function queryCompliance(
  question: string,
  jurisdiction: string
): Promise<ComplianceAnswer> {
  // Step 1: Generate embedding for the question
  const questionEmbedding = await generateEmbedding(question)

  // Step 2: Find relevant regulatory chunks
  // (pgvector expects a vector literal like '[0.1,0.2,...]', so the
  // embedding array is JSON-serialized and cast to ::vector)
  const relevantDocs = await db.query(`
    SELECT content, source_url, statute_id
    FROM regulatory_chunks
    WHERE jurisdiction = $1
    ORDER BY embedding <-> $2::vector
    LIMIT 5
  `, [jurisdiction, JSON.stringify(questionEmbedding)])

  // Step 3: Generate answer with context
  const answer = await callLLM({
    system: COMPLIANCE_SYSTEM_PROMPT,
    user: buildPrompt(question, relevantDocs),
    temperature: 0.1, // Low temperature for factual accuracy
  })

  return ComplianceAnswerSchema.parse(answer)
}
```
The key insight: chunking strategy matters more than model choice. I spent weeks experimenting with different chunk sizes and overlap strategies. Splitting on paragraph boundaries with 20% overlap gave the best retrieval accuracy for regulatory text.
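That strategy can be sketched as a small function — a minimal version assuming plain-text documents where a blank line marks a paragraph boundary (the function name and character-based size estimate are illustrative; production code would also handle headings, numbered sections, and tables):

```typescript
// Split a document on paragraph boundaries, packing paragraphs into
// chunks of roughly `maxChars` characters, carrying ~20% of each chunk's
// tail forward as overlap so no regulation is cut mid-context.
function chunkByParagraph(text: string, maxChars = 2000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/) // paragraph boundary = blank line
    .map(p => p.trim())
    .filter(p => p.length > 0)

  const chunks: string[] = []
  let current: string[] = []
  let currentLen = 0

  for (const para of paragraphs) {
    if (currentLen + para.length > maxChars && current.length > 0) {
      chunks.push(current.join('\n\n'))
      // Carry trailing paragraphs forward as ~20% overlap
      const overlapTarget = maxChars * 0.2
      let overlapLen = 0
      const overlap: string[] = []
      for (let i = current.length - 1; i >= 0; i--) {
        if (overlapLen + current[i].length > overlapTarget) break
        overlap.unshift(current[i])
        overlapLen += current[i].length
      }
      current = overlap
      currentLen = overlapLen
    }
    current.push(para)
    currentLen += para.length
  }
  if (current.length > 0) chunks.push(current.join('\n\n'))
  return chunks
}
```

Splitting on paragraph boundaries keeps each chunk a coherent unit of regulatory meaning, which is what the embedding model ultimately sees.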
Prompt Engineering Is Software Engineering
I treat prompts the same way I treat code: version-controlled, tested, and reviewed. A compliance platform can't afford hallucinations — wrong answers about regulations can cost businesses their licenses.
My prompt architecture has three layers:
```typescript
const COMPLIANCE_SYSTEM_PROMPT = `You are a regulatory compliance expert.

Rules:
- Only answer based on the provided regulatory context
- If the context doesn't contain the answer, say "I don't have enough information"
- Always cite the specific statute or regulation
- Never speculate or extrapolate beyond what the text says`

function buildPrompt(question: string, docs: RegDoc[]): string {
  const context = docs
    .map(d => `[${d.statute_id}] ${d.content}`)
    .join('\n---\n')

  return `Regulatory context:\n${context}\n\nQuestion: ${question}\n\nProvide your answer with specific citations.`
}
```
The system prompt sets behavioral constraints. The user prompt provides context and the question. A third layer — output format instructions — tells the model exactly how to structure its response (JSON with answer, citations, and confidence fields).
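That third layer can be sketched as a format instruction plus a validator. The `ComplianceAnswerSchema.parse` call earlier suggests a zod schema; this hand-rolled stand-in (names and instruction text are illustrative) shows the same shape without a dependency:

```typescript
interface ComplianceAnswer {
  answer: string
  citations: string[]
  confidence: number // 0–1
}

// Appended to the prompt so the model emits parseable JSON.
const OUTPUT_FORMAT_INSTRUCTIONS = `Respond with JSON only:
{"answer": "...", "citations": ["<statute_id>", ...], "confidence": <number 0-1>}`

// Validate raw LLM output before it ever reaches a user.
function parseComplianceAnswer(raw: string): ComplianceAnswer {
  const data = JSON.parse(raw) // throws on malformed JSON
  if (typeof data.answer !== 'string') throw new Error('missing answer')
  if (
    !Array.isArray(data.citations) ||
    !data.citations.every((c: unknown) => typeof c === 'string')
  ) {
    throw new Error('missing or invalid citations')
  }
  if (typeof data.confidence !== 'number' || data.confidence < 0 || data.confidence > 1) {
    throw new Error('confidence must be a number in [0, 1]')
  }
  return { answer: data.answer, citations: data.citations, confidence: data.confidence }
}
```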
The difference between a demo and a product is error handling. A demo calls the API and shows the response. A product validates the response, handles failures gracefully, and never shows the user a raw error message.
Payment Processing in a Restricted Industry
Here's something nobody warns you about: if you're building software for the cannabis industry, Stripe won't work. Neither will PayPal, Square, or Braintree. The cannabis industry is classified as "high-risk" by virtually every mainstream payment processor.
After weeks of research, I integrated Authorize.net — one of the few established processors willing to serve cannabis-adjacent businesses. The integration was more complex than a standard Stripe setup:
- Custom compliance verification flows beyond standard KYC
- Manual underwriting process that took weeks
- Different webhook patterns and error codes
- No drop-in UI components — everything had to be built from scratch
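One concrete example of those different webhook patterns: per Authorize.net's documentation, webhook notifications carry an X-ANET-Signature header containing an HMAC-SHA512 of the raw request body, keyed with your account's Signature Key. A verification sketch (the function name is illustrative; the header format is Authorize.net's):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Verify an Authorize.net webhook: the X-ANET-Signature header has the
// form "sha512=<hex digest>", where the digest is HMAC-SHA512 of the
// raw (unparsed) request body using the account's Signature Key.
function verifyAnetWebhook(
  rawBody: string,
  signatureHeader: string,
  signatureKey: string
): boolean {
  const expected = createHmac('sha512', signatureKey)
    .update(rawBody, 'utf8')
    .digest('hex')
  const received = signatureHeader.replace(/^sha512=/i, '')
  if (received.length !== expected.length) return false
  // Constant-time comparison to avoid timing side channels
  return timingSafeEqual(
    Buffer.from(received.toLowerCase(), 'utf8'),
    Buffer.from(expected, 'utf8')
  )
}
```

Note the verification must run against the raw body bytes, before any JSON body parser touches the request.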
This is the kind of domain-specific challenge that AI tutorials never mention. Half of building a SaaS is solving problems that have nothing to do with your core technology.
What Actually Matters for an AI MVP
After building Comply Assist AI from scratch, here's what I'd prioritize if I were starting another AI SaaS tomorrow:
- Data quality over model sophistication. A well-curated dataset with GPT-4 beats a messy dataset with a fine-tuned model every time.
- Retrieval accuracy over generation quality. If you retrieve the wrong context, no amount of prompt engineering will save you. Invest in your chunking and embedding strategy.
- Latency budget. Users expect sub-3-second responses. Comply Assist AI hits that target by pre-computing embeddings, using connection pooling for PostgreSQL, and caching frequent queries.
- Confidence scoring. Every response from Comply Assist AI includes a confidence score. Low-confidence answers trigger a disclaimer. This builds trust and reduces liability.
- Source attribution. Every answer cites specific regulatory text. Users can verify the AI's claims, which is critical in a compliance context.
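The confidence-gating and attribution points can be as simple as a threshold check at display time — a sketch where the threshold, disclaimer text, and names are all illustrative:

```typescript
interface ScoredAnswer {
  answer: string
  citations: string[]
  confidence: number // 0–1, emitted by the model or derived from retrieval scores
}

const CONFIDENCE_THRESHOLD = 0.7 // illustrative cutoff

const DISCLAIMER =
  'This answer is low-confidence. Verify against the cited regulations ' +
  'or consult a compliance professional before acting on it.'

// Attach citations always, and a disclaimer to low-confidence answers.
function presentAnswer(a: ScoredAnswer): string {
  const citations =
    a.citations.length > 0 ? `\n\nSources: ${a.citations.join(', ')}` : ''
  const warning =
    a.confidence < CONFIDENCE_THRESHOLD ? `\n\nNote: ${DISCLAIMER}` : ''
  return a.answer + citations + warning
}
```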
The Reality of Solo Development
Building an AI SaaS alone means wearing every hat: backend engineer, frontend developer, data engineer, prompt engineer, compliance researcher, and payment integrations specialist. It's exhausting, but it gives you deep understanding of every layer of the stack.
The most important skill isn't any specific technology — it's the ability to scope aggressively. I launched with one jurisdiction instead of all fifty. I supported text queries before adding document upload. Each feature was the minimum viable version that solved a real user problem.
If you're planning to build an AI-powered product, start with the problem, not the technology. Build the simplest pipeline that delivers value, validate it with real users, and iterate. The AI is just a tool — the product is the experience around it.