How RAG Works
Retrieval-Augmented Generation (RAG) is the technology that powers Linuity's AI document intelligence platform. It combines the power of large language models with your organisation's specific knowledge base.
Instead of relying solely on what an AI model learned during training, RAG retrieves relevant information from your documents and uses it to generate accurate, contextual answers. This page explains how the process works, from document ingestion to answer generation.
OVERVIEW
The RAG Pipeline
RAG works in three main phases: Ingestion (preparing your documents), Embedding (converting text to mathematical representations), and Retrieval (finding and using relevant information).
Phase 1: Ingestion
Documents are parsed, cleaned, and split into manageable chunks
Phase 2: Embedding
Text chunks are converted into numerical vectors that capture semantic meaning
Phase 3: Retrieval
User queries find similar vectors and relevant context is provided to the AI
PHASE 1
Document Ingestion
The first step is preparing your documents for AI processing. This involves extracting text, cleaning data, and breaking it into optimal chunks.
Document Parsing
Different file formats (PDF, Word, Excel, CAD files) are processed to extract their text content. This includes:
- Extracting text from PDFs while preserving structure
- Reading tables and data from spreadsheets
- Extracting metadata (author, date, project codes)
- OCR for scanned documents and images
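To make this concrete, here is a minimal parsing sketch using the open-source pypdf library. The file name is a hypothetical reused from the examples later on this page, and a real ingestion pipeline would add format-specific parsers, OCR, and richer metadata handling.
EXAMPLE PARSING SKETCH (PYTHON):
from pypdf import PdfReader

reader = PdfReader("Bridge_Design.pdf")  # hypothetical file, reused from the examples below

# Collect text page by page so page numbers can be attached as metadata later
pages = []
for page_number, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""  # guard against empty results on image-only pages
    pages.append({"page": page_number, "text": text})

print(f"Extracted {len(pages)} pages")
print(reader.metadata)  # document-level metadata: author, creation date, and so on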
Data Cleaning
Raw extracted text is cleaned and normalised:
- Removing formatting artifacts and special characters
- Normalising whitespace and line breaks
- Handling headers, footers, and page numbers
- Preserving important technical terminology
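A few of these steps can be sketched with simple regular expressions. This is illustrative only: the footer pattern below is an assumption, and real pipelines use more careful, format-aware rules.
EXAMPLE CLEANING SKETCH (PYTHON):
import re

def clean_text(raw: str) -> str:
    """Normalise extracted text while keeping technical terminology intact."""
    text = raw.replace("\u00ad", "")        # strip soft hyphens left behind by PDF extraction
    text = re.sub(r"^Page \d+ of \d+$", "", text, flags=re.MULTILINE)  # drop page footers (assumed pattern)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse excessive blank lines
    return text.strip()

print(clean_text("The  bridge   design\n\n\n\nPage 23 of 120\nrequires steel grade 350."))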
Text Chunking
Documents are split into smaller, semantically meaningful chunks. This is crucial because:
- Embedding models have input token limits, so chunks are typically kept to 512-1024 tokens
- Smaller chunks allow more precise retrieval
- Overlapping windows preserve context between chunks, as the example and sketch below show
EXAMPLE CHUNKING:
Chunk 1: "The bridge design requires steel grade 350..."
Chunk 2: "...grade 350 with specific welding procedures..."
Chunk 3: "...welding procedures as per AS/NZS standards..."
Note: Overlapping text maintains context between chunks
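The sketch below shows the simplest version of this idea: fixed-size chunks with an overlapping window. The sizes are illustrative, and production chunkers usually split on sentence or section boundaries rather than raw word counts.
EXAMPLE CHUNKING SKETCH (PYTHON):
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    The overlap repeats the tail of one chunk at the head of the next,
    which is what preserves context across chunk boundaries.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the document
    return chunks

sample = ("The bridge design requires steel grade 350 with specific welding "
          "procedures as per AS/NZS standards for all load-bearing joints")
for i, chunk in enumerate(chunk_text(sample, chunk_size=8, overlap=3), start=1):
    print(f"Chunk {i}: {chunk}")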
PHASE 2
Creating Embeddings
Embeddings are the magic that allows AI to understand semantic meaning. Text is converted into numerical vectors that capture the essence of the content.
What are Embeddings?
An embedding is a list of numbers (a vector) that represents the meaning of text. Similar concepts have similar vectors.
EXAMPLE:
"structural steel"
[0.23, -0.45, 0.67, 0.12, -0.89, ...]
"steel beams"
[0.25, -0.43, 0.65, 0.14, -0.87, ...]
"concrete mix"
[-0.67, 0.34, -0.12, 0.78, 0.45, ...]
Notice: "structural steel" and "steel beams" have similar numbers, while "concrete mix" is quite different
The Embedding Process
Each text chunk is passed through an embedding model (a specialised neural network):
- The model has been trained on billions of text examples
- It outputs a vector (typically 384, 768, or 1536 dimensions)
- Each dimension captures different aspects of meaning
- Semantically similar text produces geometrically close vectors
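As a sketch of this step, the open-source sentence-transformers library can produce vectors like the ones shown above. The model name here is an illustrative assumption, not necessarily what Linuity runs in production.
EXAMPLE EMBEDDING SKETCH (PYTHON):
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is a common open 384-dimension model, chosen here for illustration
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["structural steel", "steel beams", "concrete mix"]
vectors = model.encode(chunks)  # one 384-dimension vector per chunk

print(vectors.shape)   # (3, 384)
print(vectors[0][:5])  # the first five numbers of the "structural steel" vector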
VECTOR SIMILARITY:
✓ High Similarity (0.95)
"bridge design" ↔ "bridge engineering"
~ Medium Similarity (0.65)
"bridge design" ↔ "structural analysis"
✗ Low Similarity (0.15)
"bridge design" ↔ "project budget"
○ No Similarity (0.02)
"bridge design" ↔ "lunch menu"
Storing in a Vector Database
Vectors are stored in a specialised database optimised for similarity search:
- Each vector is indexed for fast retrieval
- Metadata (source document, page number, date) is stored alongside each vector
- Similar vectors can be found in milliseconds
- The index scales to millions of documents
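A minimal sketch with FAISS, an open-source similarity-search library, shows the core mechanics. Production systems typically use a managed vector database, and the parallel metadata list here is a simplification of how metadata is actually stored.
EXAMPLE INDEXING SKETCH (PYTHON):
import faiss  # open-source similarity-search library, used here for illustration
import numpy as np

dimension = 384  # must match the embedding model's output size
index = faiss.IndexFlatIP(dimension)  # inner-product index

# Random stand-ins for real chunk embeddings; FAISS expects float32
vectors = np.random.rand(3, dimension).astype("float32")
faiss.normalize_L2(vectors)  # after normalisation, inner product equals cosine similarity
index.add(vectors)

# Metadata is kept alongside, keyed by each vector's position in the index
metadata = [
    {"source": "Bridge_Design.pdf", "page": 23},
    {"source": "Specifications.pdf", "page": 5},
    {"source": "Materials.xlsx", "sheet": "Sheet 1"},
]
print(index.ntotal)  # 3 vectors stored and ready for search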
PHASE 3
Retrieval & Generation
When a user asks a question, the system finds the most relevant information and uses it to generate an accurate answer.
1. User Query is Embedded
The user's question goes through the same embedding process:
User asks: "What steel grade is required for the bridge?"
↓ Embedding Model
Query Vector: [0.24, -0.44, 0.66, 0.13, -0.88, ...]
2. Similarity Search in Vector Database
The vector database finds the most similar document chunks using mathematical distance:
TOP MATCHING CHUNKS:
95% match: Bridge_Design.pdf, p.23
"The bridge design requires steel grade 350..."
87% match: Specifications.pdf, p.5
"Structural steel specifications for grade 350..."
79% match: Materials.xlsx, Sheet 1
"Grade 350 steel meets AS/NZS standards..."
3. Assembling Context for the AI
The top matching chunks are combined with the user's question and sent to the language model:
PROMPT TO AI:
System: Use the following context to answer the question.
Context:
[Bridge_Design.pdf, p.23] The bridge design requires steel grade 350...
[Specifications.pdf, p.5] Structural steel specifications for grade 350...
[Materials.xlsx] Grade 350 steel meets AS/NZS standards...
Question:
What steel grade is required for the bridge?
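Assembling this prompt is plain string formatting. A sketch, assuming the retrieval step returned each chunk with its source metadata:
EXAMPLE PROMPT-ASSEMBLY SKETCH (PYTHON):
def build_prompt(question: str, matches: list[dict]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n".join(f"[{m['source']}] {m['text']}" for m in matches)
    return (
        "System: Use the following context to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question:\n{question}"
    )

matches = [
    {"source": "Bridge_Design.pdf, p.23", "text": "The bridge design requires steel grade 350..."},
    {"source": "Specifications.pdf, p.5", "text": "Structural steel specifications for grade 350..."},
]
print(build_prompt("What steel grade is required for the bridge?", matches))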
4. AI Generates an Answer
The language model synthesises the context to provide an accurate, source-backed answer:
AI RESPONSE:
"Based on the project documentation, the bridge requires steel grade 350. This is specified in the Bridge Design document (page 23) and meets the AS/NZS standards as outlined in the Materials specifications."
Sources:
- Bridge_Design.pdf, page 23
- Specifications.pdf, page 5
- Materials.xlsx, Sheet 1
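The final call is an ordinary chat-completion request. A sketch assuming the OpenAI Python client as a stand-in; the model name is illustrative, and the actual model and provider behind the platform are not specified on this page.
EXAMPLE GENERATION SKETCH (PYTHON):
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice only
    messages=[
        {"role": "system",
         "content": "Use the following context to answer the question. Cite your sources."},
        {"role": "user",
         "content": "Context:\n"
                    "[Bridge_Design.pdf, p.23] The bridge design requires steel grade 350...\n\n"
                    "Question:\nWhat steel grade is required for the bridge?"},
    ],
)
print(response.choices[0].message.content)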
BENEFITS
Why RAG is Powerful
Factual Accuracy
Answers are grounded in your actual documents, not AI hallucinations
Always Up-to-Date
Add new documents at any time; no retraining of the AI model is required
Source Traceability
Every answer includes citations to original documents and page numbers
Domain-Specific Knowledge
Works with your firm's unique terminology, standards, and project history
See RAG in Action
Experience how Linuity uses RAG technology to transform your engineering documents into instant, accurate answers.