Implementing RAG with Laravel and pgvector
A practical guide to building Retrieval-Augmented Generation systems in Laravel using PostgreSQL's pgvector extension for semantic search.
Robert Fridzema
Fullstack Developer

Large Language Models are powerful, but they hallucinate and lack knowledge of your specific data. Retrieval-Augmented Generation (RAG) solves this by giving the LLM relevant context from your own documents before generating a response. Here's how to build a RAG system in Laravel using pgvector.
What is RAG?
RAG combines two steps:
- Retrieval - Find relevant documents based on the user's query
- Generation - Use those documents as context for the LLM
User Query: "What's our refund policy?" │ ▼ ┌───────────────────┐ │ Vector Search │ ── Find similar documents └───────────────────┘ │ ▼ ┌───────────────────┐ │ Context: Found │ ── "Refunds within 30 days..." │ 3 relevant docs │ └───────────────────┘ │ ▼ ┌───────────────────┐ │ LLM Generation │ ── Generate answer using context └───────────────────┘ │ ▼ Response: "Our refund policy allows returns within 30 days..."
Why pgvector?
Vector databases are hot right now - Pinecone, Weaviate, Qdrant. But if you're already using PostgreSQL, pgvector lets you add vector search without another service:
- No additional infrastructure - Just a PostgreSQL extension
- Transactional consistency - Vectors and data in the same transaction
- Familiar tooling - Use Eloquent, migrations, backups as usual
- Good enough performance - Handles millions of vectors with proper indexing
Setup
1. Install pgvector
```bash
# PostgreSQL 16 with pgvector
docker run -d \
  --name postgres-vectors \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```
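Then point Laravel at the container. A minimal `.env` sketch matching the flags above (the official image defaults both the database name and the user to `postgres`):

```env
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=postgres
DB_USERNAME=postgres
DB_PASSWORD=secret
```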
Or add to existing PostgreSQL:
```sql
CREATE EXTENSION vector;
```
2. Laravel Migration
```php
// database/migrations/create_documents_table.php
public function up(): void
{
    // Enable pgvector extension
    DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

    Schema::create('documents', function (Blueprint $table) {
        $table->id();
        $table->string('title');
        $table->text('content');
        $table->string('source')->nullable();
        $table->timestamps();
    });

    // Add vector column (1536 dimensions for OpenAI ada-002)
    DB::statement('ALTER TABLE documents ADD COLUMN embedding vector(1536)');

    // Create index for fast similarity search
    DB::statement('CREATE INDEX documents_embedding_idx ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)');
}
```
3. Document Model
```php
// app/Models/Document.php
namespace App\Models;

use Illuminate\Database\Eloquent\Collection;
use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'source', 'embedding'];

    /**
     * Find documents similar to the given embedding
     */
    public static function similarTo(array $embedding, int $limit = 5): Collection
    {
        $vectorString = '[' . implode(',', $embedding) . ']';

        return static::select('*')
            ->selectRaw('embedding <=> ? as distance', [$vectorString])
            ->orderByRaw('embedding <=> ?', [$vectorString])
            ->limit($limit)
            ->get();
    }

    /**
     * Set the embedding from an array
     */
    public function setEmbeddingAttribute(array $value): void
    {
        $this->attributes['embedding'] = '[' . implode(',', $value) . ']';
    }
}
```
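A quick usage sketch (here `$embeddings` is assumed to be the `EmbeddingService` we build in the next section):

```php
$queryEmbedding = $embeddings->embed("What's our refund policy?");

$matches = Document::similarTo($queryEmbedding, limit: 3);

foreach ($matches as $doc) {
    // Cosine distance: 0 = identical direction, 2 = opposite
    echo "{$doc->title} (distance: {$doc->distance})\n";
}
```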
Embedding Service
We need to convert text to vectors. OpenAI's embedding API is the most common choice:
```php
// app/Services/EmbeddingService.php
namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    private string $model = 'text-embedding-ada-002';

    public function __construct(
        private string $apiKey
    ) {}

    /**
     * Get embedding for a single text
     */
    public function embed(string $text): array
    {
        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $this->prepareText($text),
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return $response->json('data.0.embedding');
    }

    /**
     * Get embeddings for multiple texts (batch)
     */
    public function embedBatch(array $texts): array
    {
        $prepared = array_map([$this, 'prepareText'], $texts);

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $prepared,
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return collect($response->json('data'))
            ->pluck('embedding')
            ->toArray();
    }

    /**
     * Prepare text for embedding (clean and truncate)
     */
    private function prepareText(string $text): string
    {
        // Remove excessive whitespace
        $text = preg_replace('/\s+/', ' ', trim($text));

        // Truncate to ~8000 tokens (rough estimate: 4 chars per token)
        return mb_substr($text, 0, 32000);
    }
}
```
Register in a service provider:
```php
// app/Providers/AppServiceProvider.php
$this->app->singleton(EmbeddingService::class, function () {
    return new EmbeddingService(config('services.openai.api_key'));
});
```
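This assumes the matching entry in config/services.php, reading the key from the environment:

```php
// config/services.php
'openai' => [
    'api_key' => env('OPENAI_API_KEY'),
],
```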
Indexing Documents
Create a command to index your documents:
```php
// app/Console/Commands/IndexDocuments.php
namespace App\Console\Commands;

use App\Models\Document;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;

class IndexDocuments extends Command
{
    protected $signature = 'documents:index {--fresh : Re-index all documents}';
    protected $description = 'Generate embeddings for documents';

    public function handle(EmbeddingService $embeddings): void
    {
        $query = Document::query();

        if (!$this->option('fresh')) {
            $query->whereNull('embedding');
        }

        $documents = $query->get();
        $this->info("Indexing {$documents->count()} documents...");

        $bar = $this->output->createProgressBar($documents->count());

        // Process in batches for efficiency
        $documents->chunk(20)->each(function ($chunk) use ($embeddings, $bar) {
            // chunk() preserves original keys; reset them so texts and vectors line up
            $chunk = $chunk->values();

            $texts = $chunk->map(fn ($doc) => $doc->title . "\n\n" . $doc->content)->toArray();
            $vectors = $embeddings->embedBatch($texts);

            foreach ($chunk as $index => $document) {
                $document->embedding = $vectors[$index];
                $document->save();
                $bar->advance();
            }
        });

        $bar->finish();
        $this->newLine();
        $this->info('Done!');
    }
}
```
RAG Service
Now combine retrieval and generation:
```php
// app/Services/RAGService.php
namespace App\Services;

use App\Models\Document;
use Illuminate\Support\Facades\Http;

class RAGService
{
    public function __construct(
        private EmbeddingService $embeddings,
        private string $openAiKey
    ) {}

    /**
     * Answer a question using RAG
     */
    public function answer(string $question, int $contextDocs = 3): array
    {
        // Step 1: Embed the question
        $questionEmbedding = $this->embeddings->embed($question);

        // Step 2: Find relevant documents
        $documents = Document::similarTo($questionEmbedding, $contextDocs);

        // Step 3: Build context
        $context = $documents->map(function ($doc) {
            return "---\nSource: {$doc->source}\n{$doc->content}\n---";
        })->join("\n\n");

        // Step 4: Generate response
        $response = $this->generate($question, $context);

        return [
            'answer' => $response,
            'sources' => $documents->map(fn ($d) => [
                'title' => $d->title,
                'source' => $d->source,
                'relevance' => 1 - $d->distance, // Convert distance to similarity
            ])->toArray(),
        ];
    }

    /**
     * Generate answer using context
     */
    private function generate(string $question, string $context): string
    {
        $systemPrompt = <<<PROMPT
        You are a helpful assistant that answers questions based on the provided context.

        Rules:
        - Only use information from the provided context
        - If the context doesn't contain the answer, say "I don't have information about that"
        - Cite sources when possible
        - Be concise and direct
        PROMPT;

        $userPrompt = <<<PROMPT
        Context:
        {$context}

        Question: {$question}

        Answer based on the context above:
        PROMPT;

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->openAiKey}",
        ])->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4-turbo-preview',
            'messages' => [
                ['role' => 'system', 'content' => $systemPrompt],
                ['role' => 'user', 'content' => $userPrompt],
            ],
            'temperature' => 0.7,
            'max_tokens' => 1000,
        ]);

        return $response->json('choices.0.message.content');
    }
}
```
API Endpoint
```php
// routes/api.php
use App\Services\RAGService;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/ask', function (Request $request, RAGService $rag) {
    $request->validate([
        'question' => 'required|string|max:1000',
    ]);

    $result = $rag->answer($request->question);

    return response()->json($result);
});
```
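One gotcha: RAGService takes the API key as a plain string, so the container can't auto-resolve it in the route closure. Register it alongside the embedding service; a sketch mirroring the earlier binding:

```php
// app/Providers/AppServiceProvider.php
$this->app->singleton(RAGService::class, function ($app) {
    return new RAGService(
        $app->make(EmbeddingService::class),
        config('services.openai.api_key'),
    );
});
```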
Advanced Techniques
Chunking Long Documents
Embedding an entire long document dilutes its meaning into a single vector and can exceed the model's token limit, so split it into overlapping chunks first:
```php
// app/Services/DocumentChunker.php
namespace App\Services;

class DocumentChunker
{
    public function chunk(string $content, int $maxTokens = 500, int $overlap = 50): array
    {
        $sentences = preg_split('/(?<=[.!?])\s+/', $content);
        $chunks = [];
        $currentChunk = [];
        $currentLength = 0;

        foreach ($sentences as $sentence) {
            $sentenceLength = $this->estimateTokens($sentence);

            if ($currentLength + $sentenceLength > $maxTokens && !empty($currentChunk)) {
                $chunks[] = implode(' ', $currentChunk);

                // Carry trailing sentences into the next chunk, up to the overlap budget
                $overlapSentences = [];
                $overlapLength = 0;

                foreach (array_reverse($currentChunk) as $previous) {
                    $previousLength = $this->estimateTokens($previous);

                    if ($overlapLength + $previousLength > $overlap) {
                        break;
                    }

                    array_unshift($overlapSentences, $previous);
                    $overlapLength += $previousLength;
                }

                $currentChunk = $overlapSentences;
                $currentLength = $overlapLength;
            }

            $currentChunk[] = $sentence;
            $currentLength += $sentenceLength;
        }

        if (!empty($currentChunk)) {
            $chunks[] = implode(' ', $currentChunk);
        }

        return $chunks;
    }

    private function estimateTokens(string $text): int
    {
        return (int) ceil(strlen($text) / 4);
    }
}
```
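One way to wire the chunker into indexing: store each chunk as its own Document row so retrieval returns focused passages instead of whole files, then let documents:index embed them as usual. A sketch:

```php
$chunker = new DocumentChunker();

foreach ($chunker->chunk($longText) as $i => $chunkText) {
    Document::create([
        'title' => $title . ' (part ' . ($i + 1) . ')',
        'content' => $chunkText,
        'source' => $source,
    ]);
}

// Embeddings stay null until the command runs:
// php artisan documents:index
```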
Hybrid Search
Combine vector similarity with keyword search:
```php
public static function hybridSearch(string $query, array $embedding, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    return static::select('*')
        ->selectRaw('embedding <=> ? as vector_distance', [$vectorString])
        ->selectRaw('ts_rank(to_tsvector(content), plainto_tsquery(?)) as text_rank', [$query])
        ->selectRaw(
            '(0.7 * (1 - (embedding <=> ?)) + 0.3 * ts_rank(to_tsvector(content), plainto_tsquery(?))) as combined_score',
            [$vectorString, $query]
        )
        ->orderByDesc('combined_score')
        ->limit($limit)
        ->get();
}
```
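One caveat: computing to_tsvector(content) per row at query time gets slow on large tables. A functional GIN index keeps the keyword side fast; note Postgres only uses it when the query expression matches, so you'd also switch the ts_rank and plainto_tsquery calls above to the two-argument form with an explicit configuration:

```sql
CREATE INDEX documents_content_fts_idx ON documents
USING gin (to_tsvector('english', content));
```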
Metadata Filtering
Filter by metadata before vector search:
```php
public static function similarWithFilters(array $embedding, array $filters, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    $query = static::select('*')
        ->selectRaw('embedding <=> ? as distance', [$vectorString]);

    // Apply filters
    if (isset($filters['source'])) {
        $query->where('source', $filters['source']);
    }

    if (isset($filters['created_after'])) {
        $query->where('created_at', '>=', $filters['created_after']);
    }

    return $query
        ->orderByRaw('embedding <=> ?', [$vectorString])
        ->limit($limit)
        ->get();
}
```
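Called like this, for example (the filter values here are made up):

```php
$docs = Document::similarWithFilters(
    $embeddings->embed('holiday schedule'),
    ['source' => 'hr-handbook', 'created_after' => now()->subYear()],
);
```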
Performance Tips
1. Use IVFFlat Index
For large datasets, IVFFlat dramatically speeds up search:
```sql
-- Adjust 'lists' based on dataset size (sqrt of row count is a good start)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
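IVFFlat only scans a subset of lists per query, which is where the speed comes from; the ivfflat.probes setting controls how many. Raising it trades latency for recall:

```sql
-- Scan more lists per query for better recall (pgvector default: 1)
SET ivfflat.probes = 10;
```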
2. Batch Embeddings
Always batch embedding requests to reduce API calls:
```php
// Instead of this:
foreach ($documents as $doc) {
    $doc->embedding = $embeddings->embed($doc->content);
}

// Do this:
$texts = $documents->pluck('content')->toArray();
$vectors = $embeddings->embedBatch($texts);

foreach ($documents as $i => $doc) {
    $doc->embedding = $vectors[$i];
}
```
3. Cache Common Queries
```php
public function answer(string $question): array
{
    $cacheKey = 'rag:' . md5($question);

    return Cache::remember($cacheKey, 3600, function () use ($question) {
        return $this->performRAG($question);
    });
}
```
4. Use HNSW for Very Large Datasets
For millions of vectors, an HNSW index offers better query speed and recall, at the cost of slower builds and more memory:
```sql
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
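HNSW also takes build and query parameters; the values below are pgvector's defaults for the build options, and hnsw.ef_search is the query-time recall/speed knob:

```sql
-- Build options shown at their defaults
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Higher ef_search = better recall, slower queries (default: 40)
SET hnsw.ef_search = 100;
```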
Evaluation
Track RAG quality by logging every query for later analysis:
```php
// Log every query for analysis
RAGQuery::create([
    'question' => $question,
    'answer' => $response,
    'sources' => $documents->pluck('id'),
    'latency_ms' => $latency,
    'user_feedback' => null, // Collect later
]);
```
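A minimal migration sketch for that RAGQuery model (the table layout is an assumption; add 'sources' => 'array' to the model's $casts so the plucked IDs serialize as JSON):

```php
Schema::create('rag_queries', function (Blueprint $table) {
    $table->id();
    $table->text('question');
    $table->text('answer');
    $table->json('sources'); // Document IDs used as context
    $table->unsignedInteger('latency_ms');
    $table->tinyInteger('user_feedback')->nullable(); // e.g. +1 / -1
    $table->timestamps();
});
```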
Conclusion
RAG with Laravel and pgvector is surprisingly straightforward. You get:
- Semantic search without a separate vector database
- Full ACID compliance with your existing data
- Familiar Laravel patterns and tooling
The key insights:
- Chunk documents appropriately - 500-1000 tokens works well
- Quality context > quantity - 3-5 relevant docs beats 20 mediocre ones
- Prompt engineering matters - Clear system prompts reduce hallucination
- Monitor and iterate - Log queries and gather feedback
Start simple, measure results, and iterate. RAG doesn't have to be complicated.
Building AI features into Laravel? Let's talk - I've shipped several AI-powered applications and I'm happy to share more.