How RAG Works
Retrieval-Augmented Generation (RAG) is the technology that powers Linuity's AI document intelligence platform. It combines the power of large language models with your organisation's specific knowledge base.
Instead of relying solely on what an AI model learned during training, RAG retrieves relevant information from your documents and uses it to generate accurate, contextual answers. This page explains how the process works, from document ingestion to answer generation.
OVERVIEW
The RAG Pipeline
RAG works in three main phases: Ingestion (preparing your documents), Embedding (converting text to mathematical representations), and Retrieval (finding and using relevant information).
Phase 1: Ingestion
Documents are parsed, cleaned, and split into manageable chunks
Phase 2: Embedding
Text chunks are converted into numerical vectors that capture semantic meaning
Phase 3: Retrieval
User queries find similar vectors and relevant context is provided to the AI
PHASE 1
Document Ingestion
The first step is preparing your documents for AI processing. This involves extracting text, cleaning data, and breaking it into optimal chunks.
Document Parsing
Different file formats (PDF, Word, Excel, CAD files) are processed to extract their text content. This includes:
- Extracting text from PDFs while preserving structure
- Reading tables and data from spreadsheets
- Extracting metadata (author, date, project codes)
- OCR for scanned documents and images
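To make this concrete, here is a minimal parsing sketch using the open-source pypdf library. The file name is a hypothetical reused from the examples later on this page, and a real ingestion pipeline would add format-specific parsers, OCR, and richer metadata handling.
EXAMPLE PARSING SKETCH (PYTHON):
from pypdf import PdfReader

reader = PdfReader("Bridge_Design.pdf")  # hypothetical file, reused from the examples below

# Collect text page by page so page numbers can be attached as metadata later
pages = []
for page_number, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""  # guard against empty results on image-only pages
    pages.append({"page": page_number, "text": text})

print(f"Extracted {len(pages)} pages")
print(reader.metadata)  # document-level metadata: author, creation date, and so on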
Data Cleaning
Raw extracted text is cleaned and normalised:
- Removing formatting artifacts and special characters
- Normalising whitespace and line breaks
- Handling headers, footers, and page numbers
- Preserving important technical terminology
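A few of these steps can be sketched with simple regular expressions. This is illustrative only: the footer pattern below is an assumption, and real pipelines use more careful, format-aware rules.
EXAMPLE CLEANING SKETCH (PYTHON):
import re

def clean_text(raw: str) -> str:
    """Normalise extracted text while keeping technical terminology intact."""
    text = raw.replace("\u00ad", "")        # strip soft hyphens left behind by PDF extraction
    text = re.sub(r"^Page \d+ of \d+$", "", text, flags=re.MULTILINE)  # drop page footers (assumed pattern)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse excessive blank lines
    return text.strip()

print(clean_text("The  bridge   design\n\n\n\nPage 23 of 120\nrequires steel grade 350."))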
Text Chunking
Documents are split into smaller, semantically meaningful chunks. This is crucial because:
- Embedding models have input token limits, so chunks are typically kept to 512-1024 tokens
- Smaller chunks allow more precise retrieval
- Overlapping windows preserve context between chunks, as the example and sketch below show
EXAMPLE CHUNKING:
Chunk 1: "The bridge design requires steel grade 350..."
Chunk 2: "...grade 350 with specific welding procedures..."
Chunk 3: "...welding procedures as per AS/NZS standards..."
Note: Overlapping text maintains context between chunks
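The sketch below shows the simplest version of this idea: fixed-size chunks with an overlapping window. The sizes are illustrative, and production chunkers usually split on sentence or section boundaries rather than raw word counts.
EXAMPLE CHUNKING SKETCH (PYTHON):
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    The overlap repeats the tail of one chunk at the head of the next,
    which is what preserves context across chunk boundaries.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the document
    return chunks

sample = ("The bridge design requires steel grade 350 with specific welding "
          "procedures as per AS/NZS standards for all load-bearing joints")
for i, chunk in enumerate(chunk_text(sample, chunk_size=8, overlap=3), start=1):
    print(f"Chunk {i}: {chunk}")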
PHASE 2
Creating Embeddings
Embeddings are the magic that allows AI to understand semantic meaning. Text is converted into numerical vectors that capture the essence of the content.
What are Embeddings?
An embedding is a list of numbers (a vector) that represents the meaning of text. Similar concepts have similar vectors.
EXAMPLE:
"structural steel"
[0.23, -0.45, 0.67, 0.12, -0.89, ...]
"steel beams"
[0.25, -0.43, 0.65, 0.14, -0.87, ...]
"concrete mix"
[-0.67, 0.34, -0.12, 0.78, 0.45, ...]
Notice: "structural steel" and "steel beams" have similar numbers, while "concrete mix" is quite different
The Embedding Process
Each text chunk is passed through an embedding model (a specialised neural network):
- The model has been trained on billions of text examples
- It outputs a vector (typically 384, 768, or 1536 dimensions)
- Each dimension captures different aspects of meaning
- Semantically similar text produces geometrically close vectors
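As a sketch of this step, the open-source sentence-transformers library can produce vectors like the ones shown above. The model name here is an illustrative assumption, not necessarily what Linuity runs in production.
EXAMPLE EMBEDDING SKETCH (PYTHON):
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a common open 384-dimension model, chosen here for illustration
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["structural steel", "steel beams", "concrete mix"]
vectors = model.encode(chunks)  # one 384-dimension vector per chunk

print(vectors.shape)   # (3, 384)
print(vectors[0][:5])  # the first five numbers of the "structural steel" vector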
VECTOR SIMILARITY:
✓ High Similarity (0.95)
"bridge design" ↔ "bridge engineering"
~ Medium Similarity (0.65)
"bridge design" ↔ "structural analysis"
✗ Low Similarity (0.15)
"bridge design" ↔ "project budget"
○ No Similarity (0.02)
"bridge design" ↔ "lunch menu"
Storing in a Vector Database
Vectors are stored in a specialised database optimised for similarity search:
- Each vector is indexed for fast retrieval
- Metadata (source document, page number, date) is stored alongside each vector
- Similar vectors can be found in milliseconds
- The index scales to millions of documents
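A minimal sketch with FAISS, an open-source similarity-search library, shows the core mechanics. Production systems typically use a managed vector database, and the parallel metadata list here is a simplification of how metadata is actually stored.
EXAMPLE INDEXING SKETCH (PYTHON):
import faiss  # open-source similarity-search library, used here for illustration
import numpy as np

dimension = 384  # must match the embedding model's output size
index = faiss.IndexFlatIP(dimension)  # inner-product index

# Random stand-ins for real chunk embeddings; FAISS expects float32
vectors = np.random.rand(3, dimension).astype("float32")
faiss.normalize_L2(vectors)  # after normalisation, inner product equals cosine similarity
index.add(vectors)

# Metadata is kept alongside, keyed by each vector's position in the index
metadata = [
    {"source": "Bridge_Design.pdf", "page": 23},
    {"source": "Specifications.pdf", "page": 5},
    {"source": "Materials.xlsx", "sheet": "Sheet 1"},
]
print(index.ntotal)  # 3 vectors stored and ready for search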
PHASE 3
Retrieval & Generation
When a user asks a question, the system finds the most relevant information and uses it to generate an accurate answer.
1. User Query is Embedded
The user's question goes through the same embedding process:
User asks: "What steel grade is required for the bridge?"
↓ Embedding Model
Query Vector: [0.24, -0.44, 0.66, 0.13, -0.88, ...]
2. Similarity Search in Vector Database
The vector database finds the most similar document chunks using mathematical distance:
TOP MATCHING CHUNKS:
95% match: Bridge_Design.pdf, p.23
"The bridge design requires steel grade 350..."
87% match: Specifications.pdf, p.5
"Structural steel specifications for grade 350..."
79% match: Materials.xlsx, Sheet 1
"Grade 350 steel meets AS/NZS standards..."
3. Assembling Context for the AI
The top matching chunks are combined with the user's question and sent to the language model:
PROMPT TO AI:
System: Use the following context to answer the question.
Context:
[Bridge_Design.pdf, p.23] The bridge design requires steel grade 350...
[Specifications.pdf, p.5] Structural steel specifications for grade 350...
[Materials.xlsx] Grade 350 steel meets AS/NZS standards...
Question:
What steel grade is required for the bridge?
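Assembling this prompt is plain string formatting. A sketch, assuming the retrieval step returned each chunk with its source metadata:
EXAMPLE PROMPT-ASSEMBLY SKETCH (PYTHON):
def build_prompt(question: str, matches: list[dict]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n".join(f"[{m['source']}] {m['text']}" for m in matches)
    return (
        "System: Use the following context to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question:\n{question}"
    )

matches = [
    {"source": "Bridge_Design.pdf, p.23", "text": "The bridge design requires steel grade 350..."},
    {"source": "Specifications.pdf, p.5", "text": "Structural steel specifications for grade 350..."},
]
print(build_prompt("What steel grade is required for the bridge?", matches))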
4. AI Generates an Answer
The language model synthesises the context to provide an accurate, source-backed answer:
AI RESPONSE:
"Based on the project documentation, the bridge requires steel grade 350. This is specified in the Bridge Design document (page 23) and meets the AS/NZS standards as outlined in the Materials specifications."
Sources:
- Bridge_Design.pdf, page 23
- Specifications.pdf, page 5
- Materials.xlsx, Sheet 1
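The final call is an ordinary chat-completion request. A sketch assuming the OpenAI Python client as a stand-in; the model name is illustrative, and the actual model and provider behind the platform are not specified on this page.
EXAMPLE GENERATION SKETCH (PYTHON):
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice only
    messages=[
        {"role": "system",
         "content": "Use the following context to answer the question. Cite your sources."},
        {"role": "user",
         "content": "Context:\n"
                    "[Bridge_Design.pdf, p.23] The bridge design requires steel grade 350...\n\n"
                    "Question:\nWhat steel grade is required for the bridge?"},
    ],
)
print(response.choices[0].message.content)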
BENEFITS
Why RAG is Powerful
Factual Accuracy
Answers are grounded in your actual documents, not AI hallucinations
Always Up-to-Date
Add new documents at any time; no retraining of the AI model is required
Source Traceability
Every answer includes citations to original documents and page numbers
Domain-Specific Knowledge
Works with your firm's unique terminology, standards, and project history
See RAG in Action
Experience how Linuity uses RAG technology to transform your engineering documents into instant, accurate answers.