RAG

RAG systems that hold up against real knowledge bases

RAG is often described as a simple pattern: embed documents, retrieve the relevant ones, pass them to the model. In practice, the pattern works well on small, clean datasets and gets considerably harder as the knowledge base grows, the documents become more varied, the access rules matter, and the answers have to be trustworthy. That's the work we focus on.

Overview

What is RAG?

RAG (Retrieval-Augmented Generation) is a technique for grounding an AI model's answers in a specific body of knowledge: your documentation, records, or internal data. Instead of relying only on training data, the system retrieves relevant context at query time and passes it to the model alongside the question.

The appeal is straightforward. RAG lets you build AI features that know your business specifically: your product, your customers, your processes, your policies. The difficulty is doing it well at real scale.

We design RAG systems around three principles:

Retrieval is the foundation

The quality of what the model says is bounded by the quality of what it's given. We treat retrieval as a first-class engineering problem — combining semantic search with keyword search, metadata filters, and re-ranking — rather than as a one-line library call.
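
As one illustration of combining search techniques, the sketch below merges a semantic ranking and a keyword ranking with reciprocal rank fusion. This is a common fusion method used here as an example, not necessarily the one a given project would use; the document IDs and the two result lists are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still stays in the candidate list for a later re-ranking step.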

Permissions belong inside retrieval

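Access controls are enforced at the retrieval layer, not as a post-filter applied to results. With a post-filter, restricted content has already been fetched and scored by the time it is removed, so any gap further down the pipeline can expose it. A RAG system that can surface documents a user shouldn't see is a compliance problem waiting to happen.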
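
A minimal sketch of the idea, assuming each indexed chunk carries the set of groups allowed to read it. The `Chunk` type, the group names, and the precomputed `score` field are all hypothetical stand-ins; in a real system the filter would normally be pushed down into the search index query itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    allowed_groups: frozenset  # groups permitted to read this chunk
    score: float               # hypothetical relevance score for one query


def retrieve(chunks, user_groups, top_k=2):
    """Filter by permission first, then rank and truncate."""
    # Restricted chunks never enter the candidate set.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


index = [
    Chunk("salary-bands", frozenset({"hr"}), 0.95),
    Chunk("onboarding", frozenset({"hr"}), 0.90),
    Chunk("refund-policy", frozenset({"support", "hr"}), 0.80),
    Chunk("shipping-faq", frozenset({"support"}), 0.60),
]

results = retrieve(index, user_groups=frozenset({"support"}))
print([c.doc_id for c in results])
```

In this example the two highest-scoring chunks are HR-only, so taking the top two and filtering afterwards would leave a support user with no results, while filtering first returns the two chunks they are allowed to see.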

Trust comes from verifiability

Every answer the system gives should be traceable back to the source documents it came from, with citations users can follow. Trust in a RAG system is earned by making the answers verifiable, not by claiming they're accurate.
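
One way to make traceability checkable is to validate citations automatically before an answer is shown. The sketch below assumes a hypothetical convention in which the model cites sources with markers such as `[S1]` and each marker maps to a retrieved passage; the marker format, source labels, and sample answer are illustrative only.

```python
import re

# Assumed citation marker format: [S1], [S2], ...
CITATION = re.compile(r"\[(S\d+)\]")


def check_citations(answer, sources):
    """Return (cited, unknown) marker lists for an answer.

    cited:   markers found in the answer, unique, in order of appearance
    unknown: cited markers that do not match any retrieved source
    """
    cited = list(dict.fromkeys(CITATION.findall(answer)))
    unknown = [marker for marker in cited if marker not in sources]
    return cited, unknown


# Hypothetical retrieved sources and model answer.
sources = {
    "S1": "refund-policy.md#eligibility",
    "S2": "shipping-faq.md#returns",
}
answer = "Refunds are available within 30 days [S1]. Return shipping is free [S3]."

cited, unknown = check_citations(answer, sources)
print(cited, unknown)
```

An answer with unknown markers, or with no markers at all, can be blocked or flagged rather than shown as if it were grounded. This checks only that citations point at retrieved material, not that the cited passage actually supports the claim.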

Audience

Who we work with

  • Product teams building knowledge-powered features where answers have to be accurate and source-verifiable.
  • Teams with large or fast-changing document repositories that basic RAG implementations struggle with.
  • Organisations in regulated or data-sensitive environments where retrieval has to respect permissions and remain auditable.
  • Teams whose first RAG prototype worked well in testing and then ran into problems once it met the real corpus.

Challenges

The problems we most often see

  • Knowledge that's messier than the prototype assumed. Real corpora include PDFs with bad layouts, scanned documents, structured data, outdated versions, duplicates, and content in multiple languages. The chunking and retrieval strategy that worked on a clean pilot often doesn't survive the real data.
  • Answers that lose trust once users check them. Hallucinations, stale information, and missing or vague citations erode confidence quickly. RAG only works as a business system if users can verify where answers came from and rely on them being current.
  • Retrieval that degrades at scale. A system that retrieves well against ten thousand documents can behave very differently at five hundred thousand or a few million. The index, the embedding model, the re-ranking strategy, and the query pipeline all need to be chosen with the actual scale in mind.
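
To make the chunking point concrete, here is a deliberately simple baseline: fixed-size word windows with overlap. The function name and the default sizes are assumptions for illustration. A splitter like this is the kind of strategy that tends to work on a clean pilot and then needs replacing with layout-aware or structure-aware chunking on messier documents.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping windows of at most `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of some duplicated text in the index.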

    Approach

    How we approach the work

    • 01 — Treat retrieval as the engineering problem. The model's answer is bounded by what retrieval gives it. Most of the engineering work goes into combining search techniques, tuning chunking, and applying metadata filters and re-ranking that match the specific corpus.
    • 02 — Build evaluation as infrastructure. Without a way to measure whether a retrieval change is actually an improvement, teams spend weeks tuning parameters and end up roughly where they started. We build the evaluation harness before the rest of the system depends on it.
    • 03 — Make permissions a retrieval-layer concern. Access rules are enforced where the retrieval happens, not bolted on afterwards. Retrofitting permissions into a RAG system that wasn't designed for them is painful, and the failure mode is too consequential to leave to chance.
    • 04 — Design answers users can verify. Every response should be grounded in specific source documents, with citations users can follow. Trust in a RAG system isn't asserted — it's earned, conversation by conversation, by giving users the ability to check.
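
The evaluation step in item 02 can be sketched as a small harness that scores any retriever against a labelled query set. The metrics shown (recall@k and mean reciprocal rank) are standard choices used here as an assumption; the queries, document IDs, and the canned retriever are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(retriever, labelled_queries, k=3):
    """Average recall@k and reciprocal rank over (query, relevant_ids) pairs."""
    if not labelled_queries:
        raise ValueError("labelled_queries must not be empty")
    recalls, ranks = [], []
    for query, relevant in labelled_queries:
        retrieved = retriever(query)
        recalls.append(recall_at_k(retrieved, relevant, k))
        ranks.append(reciprocal_rank(retrieved, relevant))
    n = len(labelled_queries)
    return {"recall@k": sum(recalls) / n, "mrr": sum(ranks) / n}


# Hypothetical labelled set and a canned retriever standing in for the real one.
labelled = [
    ("refund window", {"refund-policy"}),
    ("reset password", {"account-help", "security-faq"}),
]
canned = {
    "refund window": ["shipping-faq", "refund-policy", "pricing"],
    "reset password": ["account-help", "pricing", "billing"],
}

report = evaluate(lambda query: canned[query], labelled, k=3)
print(report)
```

Running the same labelled set before and after a change to chunking, filters, or re-ranking shows whether the change helped, instead of relying on spot checks.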

    Use Cases

    Where RAG fits

    Software companies and product teams

    In-product knowledge features where AI answers questions or generates content using your own data — documentation, customer records, internal datasets.

    Common use cases:

    • In-product assistants that help users navigate complex software
    • Documentation search and Q&A grounded in your actual docs, not the model's training
    • Personalised features that draw on the user's own records and history
    • Smart search across product content, with citations users can verify

    Operations teams in mid-sized companies

    Internal knowledge systems that help teams find and use information across large or fragmented sources — internal docs, knowledge bases, contracts, reports.

    Common use cases:

    • Internal assistants for support, sales, or operations teams
    • Q&A over policy and reference documents, with verifiable answers
    • Research and analysis tools that pull from multiple internal sources
    • Knowledge tools that respect existing user permissions

    Compliance-heavy and multi-source environments

    RAG systems where every answer needs to be traceable, permissions matter, and the corpus spans both structured and unstructured sources.

    Common use cases:

    • Policy and compliance Q&A with citations to current source documents
    • Audit-support systems that retrieve and summarise historical records
    • Research platforms combining structured data with unstructured documents
    • Knowledge systems in regulated industries, with strict access and data-residency controls

    Honesty

    When RAG might not be the right answer

    RAG gets reached for more often than it should. A few patterns where we've suggested different approaches:

    • When the underlying question is better answered by structured data. A well-designed query against a database usually beats semantic search over a report about that database.
    • When the knowledge base is small and stable. Putting relevant context directly in the prompt is often simpler, faster, and more reliable than building retrieval infrastructure.
    • When the retrieval problem is actually a documentation problem. RAG can't fix content that's out of date, duplicated, contradictory, or missing. It will surface those problems, not solve them.

    Process

    How a typical engagement runs

    Most RAG projects move through five phases. We work alongside your team throughout, with weekly check-ins so you can see progress, raise questions early, and shift priorities as the project evolves.

    1. Discovery

    We start by looking at the actual corpus, the access requirements, the query patterns, and the accuracy bar the use case needs. The output is a written plan with a realistic scope, timeline, and cost — and often an honest assessment of whether RAG is the right tool for the problem at all.

    2. Build

    We design and implement the retrieval pipeline alongside your team, including chunking, indexing, search, re-ranking, and the integration with your access controls. Evaluation infrastructure is built in from the first commit.

    3. Validation

    We test the system against real queries, real users, and real edge cases — not just synthetic benchmarks — and validate retrieval quality against your actual accuracy bar. Where the system will operate at scale, we test against representative corpus volumes.

    4. Deployment

    Production rollout with monitoring, alerting, and the answer-quality controls the use case requires. We stay close during the first weeks of live use, when the patterns of real user queries tend to differ from what was tested.

    5. Handover, and what comes next

    We hand over the system with full documentation, evaluation tooling, and a clear plan for how your team will own and evolve it. Everything we build — code, infrastructure, operational knowledge — is yours.

    From there, you have two options: take the system in-house and run it yourselves, or have us continue alongside you for monitoring, evaluation, and ongoing updates as the corpus grows or the underlying tooling evolves. Both work for us; we'll talk through the choice at the start of the engagement.

    Get in touch

    Let's talk about your project

    Most engagements begin with a short discovery phase: a few days spent looking at the knowledge bases, the access requirements, the query patterns, and what success would actually look like. The output is a written plan with a realistic scope, timeline, and cost — and an honest read on whether RAG is the right approach for what you're trying to do.

    We're glad to start the conversation, whether you have a clearly scoped project, a rough idea you're still thinking through, or a specific problem you'd like a second opinion on.