ChatGPT can't answer "What's our refund policy?" or "When is the Smith account due?" because it was never trained on your data. It knows the internet. It doesn't know your business.
RAG changes that. And it's simpler than you think.
What is RAG? (The librarian analogy)
RAG stands for Retrieval Augmented Generation. Fancy name, simple idea.
Think of it as hiring a librarian for your AI:
- Someone asks a question
- The librarian (RAG) goes to YOUR bookshelves - your documents, policies, procedures
- Finds the most relevant pages
- Hands them to the AI
- The AI reads those specific pages and answers the question
The AI doesn't memorise your entire business. It looks things up - like a librarian, not an oracle.
What you can feed it
- Company policies and procedures
- FAQ documents
- Product catalogs and price lists
- Client onboarding guides
- Training materials
- Email templates and standard responses
- Basically any text your team references regularly
How it actually works (without the jargon)
Step 1: Your documents get chopped into small chunks - paragraphs, not whole files.
Step 2: Each chunk gets converted into a mathematical fingerprint (called an embedding).
Step 3: These fingerprints go into a database.
Step 4: When someone asks a question, the system finds the chunks with the most similar fingerprints.
Step 5: Those chunks get sent to the AI along with the question.
Step 6: The AI writes an answer based ONLY on what it found.
This is why RAG chatbots hallucinate far less - they answer from your actual documents instead of reciting from memory. Far less, not never: a badly configured bot can still invent answers, which we'll come to.
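If you're technically minded, the whole pipeline fits in a page of Python. This toy sketch uses a bag-of-words counter as a stand-in for a real embedding model (in production you'd call an embedding API and store vectors in a real database), but the shape of the six steps is the same:

```python
# Toy end-to-end RAG sketch. The bag-of-words "embedding" is a stand-in
# for a real embedding model; everything else mirrors the six steps.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text, max_words=40):
    """Step 1: split a document into paragraph-sized chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Step 2: turn text into a vector 'fingerprint' (toy word counts)."""
    return Counter(tokenize(text))

def similarity(a, b):
    """Step 4: cosine similarity between two fingerprints."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 3: the 'database' here is just a list of (chunk, fingerprint) pairs.
documents = [
    "Refunds are available within 30 days of purchase with a valid receipt.",
    "Our office opens at 9am Monday to Friday and closes at 5:30pm.",
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]

# Steps 4-5: find the chunks most similar to the question.
question = "When are refunds available?"
q = embed(question)
top = sorted(index, key=lambda pair: similarity(q, pair[1]), reverse=True)[:2]
context = "\n".join(c for c, _ in top)

# Step 6: the retrieved context plus the question is what the AI sees.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(top[0][0])  # prints the refunds chunk, not the office-hours one
```

Swap the toy `embed` for a real embedding API and the list for a vector database and you have the production architecture.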
What it costs
- Database: Free to £20/month (Supabase free tier handles most SME needs)
- AI model: £10-50/month depending on usage
- Workflow tool: Free to £20/month (n8n self-hosted is free)
- Total: £10-90/month in running costs
The setup takes 1-3 days with the right expertise.
What it can't do
- It can't reason about data it hasn't been fed
- It's only as good as your documents (garbage in, garbage out)
- It won't replace complex human judgment
- It needs updating when your policies change
Who's using this today
- Law firms: answering client queries from case files
- Accountancy practices: staff looking up HMRC guidance
- Recruitment agencies: candidates asking about job details
- Property management: tenants checking policies
These aren't futuristic experiments. They're working systems deployed right now in UK service businesses.
A worked example: the law firm internal Q&A bot
Let me make this concrete with the kind of project I actually build. A 12-person UK commercial law firm came to me with a specific problem: their associates were spending 30-45 minutes per day answering the same internal questions. "What's our engagement letter template for a SaaS client?" "Which insurers do we recommend for cyber risk?" "What's the partner's standard position on non-compete clauses?" The answers lived in Clio, in shared drives, in partner emails from 2023, and in the heads of two senior associates who were getting tired of being the human search engine.
The solution was a RAG chatbot grounded in the firm's own internal documentation. We ingested the engagement letter templates, the internal knowledge base articles, anonymised past matter notes, the risk policies, and a collection of partner memos going back three years. The documents got chunked and embedded into a Supabase vector database. The chat interface sat in Microsoft Teams, which was where the firm already worked.
The first week of use generated about 180 queries. Roughly 70% of them were resolved completely by the bot. Another 20% were partially answered and pointed the associate to the right document. The final 10% were genuinely novel questions that needed a partner's judgement anyway. The two senior associates who had been the human knowledge base got their mornings back. The firm's junior associates stopped feeling awkward about asking basic questions. The bot never judges, never gets impatient, and never forgets the answer.
Total cost: £1,950 for the initial sprint, around £60 per month in running costs (OpenAI API for embeddings and generation, Supabase for the vector store, n8n for the workflow orchestration). Payback was inside the first billing cycle for the firm.
Cost breakdown in more detail
- Vector database: £0-20/month. Supabase's free tier (pgvector) handles up to 500MB of embeddings, which is plenty for most SMEs. Paid plans start at £20/month for 8GB. Pinecone and Weaviate are alternatives if you want dedicated vector infrastructure, but Supabase is usually sufficient.
- Embedding costs: pennies to a few pounds, one-off, for most SMEs. Embeddings are cheap (around £0.02 per million tokens on OpenAI's text-embedding-3-small). A typical SME knowledge base of 500 documents costs well under £1 to embed once, then pennies per month to update.
- Generation model: £10-100/month depending on volume. The Claude 3.5 Sonnet or GPT-4o tier handles most business Q&A comfortably. Heavy usage (thousands of queries per day) pushes the cost up. Light usage (a few hundred queries per day) keeps it under £30/month.
- Workflow orchestration: £0-20/month. n8n self-hosted is free. The managed cloud version is £20/month. Zapier and Make are alternatives but tend to cost more at scale.
- Interface: £0-50/month. You can surface the bot in Slack, Teams, or a custom web widget. The Slack and Teams integrations are free on the bot side; the interface cost is whatever you already pay for those platforms.
The total running cost for a small-to-medium SME sits between £25 and £120 per month. The setup cost, if you build it properly, is 1-3 days of technical work. If you buy a done-for-you build with a warranty, it's a fixed-price sprint.
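If you want to sanity-check those numbers yourself, the arithmetic is simple. This sketch uses assumed prices and token ratios - substitute your provider's current rates:

```python
# Back-of-envelope cost estimate. Prices and the token-per-word ratio are
# assumptions for illustration - check your provider's current pricing.
GBP_PER_MILLION_EMBED_TOKENS = 0.02  # e.g. a small embedding model
GBP_PER_MILLION_GEN_TOKENS = 2.50    # e.g. a mid-tier chat model, blended in/out

def one_off_embedding_cost(num_docs, avg_words_per_doc, tokens_per_word=1.3):
    """Cost to embed the whole knowledge base once."""
    tokens = num_docs * avg_words_per_doc * tokens_per_word
    return tokens / 1_000_000 * GBP_PER_MILLION_EMBED_TOKENS

def monthly_generation_cost(queries_per_day, tokens_per_query=2_000):
    """Monthly generation cost; each query sends context + question + answer."""
    tokens = queries_per_day * 30 * tokens_per_query
    return tokens / 1_000_000 * GBP_PER_MILLION_GEN_TOKENS

# 500 documents of ~1,500 words each: embedding is pennies.
print(f"Embedding: £{one_off_embedding_cost(500, 1_500):.2f}")
# 200 queries/day at ~2k tokens each: generation dominates the running cost.
print(f"Generation: £{monthly_generation_cost(200):.2f}/month")
```

The pattern holds for almost every build: the one-off embedding cost is trivial, and the monthly generation cost scales with how many questions your team actually asks.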
The failure modes nobody warns you about
RAG chatbots fail in specific, predictable ways. Knowing the failure modes in advance lets you design around them.
Failure mode 1: Stale documents. If your bank reconciliation policy changed in January but the bot is still retrieving last year's document, it will confidently give the wrong answer. The fix is a scheduled re-ingestion pipeline that picks up document changes automatically. Without that, the bot slowly becomes a liability.
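A re-ingestion pipeline doesn't need to be clever. This minimal sketch (the folder layout and state file are hypothetical) hashes each document and reports only the ones that changed since the last run, so you re-embed just those:

```python
# Minimal scheduled re-ingestion check, assuming documents live in a local
# folder as .txt files. A content hash per file spots changes; only changed
# files get re-chunked and re-embedded. State file name is hypothetical.
import hashlib
import json
from pathlib import Path

STATE_FILE = Path("ingestion_state.json")  # remembers hashes between runs

def file_hash(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_documents(doc_dir):
    """Return paths whose content differs from the last ingestion run."""
    seen = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    changed = []
    for path in sorted(Path(doc_dir).glob("*.txt")):
        h = file_hash(path)
        if seen.get(str(path)) != h:
            changed.append(path)
            seen[str(path)] = h
    STATE_FILE.write_text(json.dumps(seen))
    return changed

# Run this on a schedule (cron, or an n8n timer); re-embed only what it returns.
```

Wire the output into the same chunk-and-embed step used at setup and the stale-document problem largely disappears.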
Failure mode 2: Conflicting sources. If two different documents contradict each other (say, two different partner memos saying different things about a clause), the bot may pick one at random without flagging the conflict. The fix is to add a confidence score and a sources-cited section so the human asking the question can see which document the answer came from and judge whether it's current.
Failure mode 3: Questions that need data, not documents. RAG is text-search in disguise. If the question is "how many clients signed up last month", the bot cannot answer that from documents - it needs database access. The fix is a hybrid system where the bot can call structured data tools alongside document search. That is a more complex build but solves the problem.
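Here's the shape of that hybrid system, reduced to a toy. The keyword check and the stub functions are illustrative only - production builds usually let the model pick a tool via function calling rather than matching keywords:

```python
# Illustrative routing between document search and structured data.
# All names are hypothetical; the stubs stand in for a SQL query and a
# vector search. The naive keyword check is for demonstration only.

def count_clients_signed_up(month):
    """Structured-data tool (stub standing in for a database query)."""
    return {"2025-01": 14}.get(month, 0)

def search_documents(question):
    """Document-search tool (stub standing in for vector retrieval)."""
    return "Retrieved policy chunks for: " + question

AGGREGATE_PHRASES = {"how many", "total", "average", "count of"}

def answer(question):
    q = question.lower()
    if any(phrase in q for phrase in AGGREGATE_PHRASES):
        return f"{count_clients_signed_up('2025-01')} clients"  # data tool
    return search_documents(question)  # default: document search

print(answer("How many clients signed up last month?"))  # → "14 clients"
print(answer("What is our refund policy?"))
```

The point is the split itself: counting questions go to a database, policy questions go to documents, and neither tool is asked to do the other's job.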
Failure mode 4: Overconfident hallucination on missing information. If you ask something the knowledge base doesn't cover, a badly configured bot will invent an answer. The fix is explicit grounding instructions in the system prompt ("answer only from the retrieved context; if the context does not contain the answer, say so"). This is a five-minute fix that most DIY builds skip.
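The grounding instruction can be as simple as this. The wording below is an example, not a canonical prompt - adapt the refusal line to your own voice:

```python
# Example grounding instruction and message assembly for a chat-completion
# style API. The prompt wording is illustrative, not canonical.

GROUNDED_SYSTEM_PROMPT = (
    "You answer staff questions using ONLY the context provided below. "
    "If the context does not contain the answer, reply exactly: "
    "\"I can't find that in the knowledge base - please ask a colleague.\" "
    "Never guess, and never use outside knowledge."
)

def build_messages(context, question):
    """Assemble the messages list sent to the chat model."""
    return [
        {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages(
    "Refunds: 30 days with a valid receipt.",
    "What's the refund window?",
)
```

Five minutes of prompt work, and the bot says "I don't know" instead of inventing a policy you never wrote.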
DIY versus done-for-you
If you have a technical co-founder or a developer on the team, you can build a basic RAG bot in a weekend. The code is straightforward. LangChain, LlamaIndex, or a raw OpenAI + pgvector setup will get you a working prototype quickly. The running costs are low. The learning is genuinely useful.
What DIY does not give you is the operational layer. The stale document pipeline, the conflict detection, the monitoring for bad answers, the escalation to a human when the bot's confidence drops, the audit trail for compliance, the integration with your existing tools. Those are the parts that take 80% of the time on a production build, and they are the parts that decide whether the bot is a reliable piece of infrastructure or an unreliable science experiment.
Done-for-you builds bundle the operational layer with the core RAG setup. The price reflects that. For a small business, the question is not whether you can build a RAG bot yourself - you probably can. The question is whether you want to spend three months debugging it into reliability instead of three weeks using a reliable version built by someone who has already hit all the failure modes.
The honest bottom line
RAG is not magic. It is a specific, useful pattern for a specific, common problem: your team needs answers that live in your documents. For that problem, RAG is the right tool, the cost is low, the technology is mature, and the failure modes are well understood.
If your problem is different (structured data lookups, predictive analytics, creative generation), RAG is not the tool you need. Be honest about what you are trying to solve before you build. The worst RAG projects I see are the ones where someone heard about RAG and retrofitted it onto a problem it doesn't fit.
For the right problem, a RAG chatbot pays for itself inside a month and keeps paying for itself every time someone on your team finds the answer they need in 10 seconds instead of 10 minutes. That is not a technology win. That is a business win.
Ready to automate?
Book a free 20-minute audit. We find the highest-ROI automation for your business in 14 days.
Get your free automation audit