Implementing RAG with Laravel and pgvector
A practical guide to building Retrieval-Augmented Generation systems in Laravel using PostgreSQL's pgvector extension for semantic search.
Robert Fridzema
Fullstack Developer

Large Language Models are powerful, but they hallucinate and lack knowledge of your specific data. Retrieval-Augmented Generation (RAG) solves this by giving the LLM relevant context from your own documents before generating a response. Here's how to build a RAG system in Laravel using pgvector.
What is RAG?
RAG combines two steps:
- Retrieval - Find relevant documents based on the user's query
- Generation - Use those documents as context for the LLM
User Query: "What's our refund policy?" │ ▼ ┌───────────────────┐ │ Vector Search │ ── Find similar documents └───────────────────┘ │ ▼ ┌───────────────────┐ │ Context: Found │ ── "Refunds within 30 days..." │ 3 relevant docs │ └───────────────────┘ │ ▼ ┌───────────────────┐ │ LLM Generation │ ── Generate answer using context └───────────────────┘ │ ▼ Response: "Our refund policy allows returns within 30 days..."
Why pgvector?
Vector databases are hot right now - Pinecone, Weaviate, Qdrant. But if you're already using PostgreSQL, pgvector lets you add vector search without another service:
- No additional infrastructure - Just a PostgreSQL extension
- Transactional consistency - Vectors and data in the same transaction
- Familiar tooling - Use Eloquent, migrations, backups as usual
- Good enough performance - Handles millions of vectors with proper indexing
Setup
1. Install pgvector
```bash
# PostgreSQL 16 with pgvector
docker run -d \
  --name postgres-vectors \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```
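Then point Laravel at the container. A minimal `.env` sketch matching the flags above (the official image defaults both the database name and the user to `postgres`):

```env
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=postgres
DB_USERNAME=postgres
DB_PASSWORD=secret
```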
Or add to existing PostgreSQL:
```sql
CREATE EXTENSION vector;
```
2. Laravel Migration
```php
// database/migrations/create_documents_table.php
public function up(): void
{
    // Enable pgvector extension
    DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

    Schema::create('documents', function (Blueprint $table) {
        $table->id();
        $table->string('title');
        $table->text('content');
        $table->string('source')->nullable();
        $table->timestamps();
    });

    // Add vector column (1536 dimensions for OpenAI ada-002)
    DB::statement('ALTER TABLE documents ADD COLUMN embedding vector(1536)');

    // Create index for fast similarity search
    DB::statement('CREATE INDEX documents_embedding_idx ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)');
}
```
3. Document Model
```php
// app/Models/Document.php
namespace App\Models;

use Illuminate\Database\Eloquent\Collection;
use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'source', 'embedding'];

    /**
     * Find documents similar to the given embedding
     */
    public static function similarTo(array $embedding, int $limit = 5): Collection
    {
        $vectorString = '[' . implode(',', $embedding) . ']';

        return static::select('*')
            ->selectRaw('embedding <=> ? as distance', [$vectorString])
            ->orderByRaw('embedding <=> ?', [$vectorString])
            ->limit($limit)
            ->get();
    }

    /**
     * Set the embedding from an array
     */
    public function setEmbeddingAttribute(array $value): void
    {
        $this->attributes['embedding'] = '[' . implode(',', $value) . ']';
    }
}
```
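A quick usage sketch (here `$embeddings` is assumed to be the `EmbeddingService` we build in the next section):

```php
$queryEmbedding = $embeddings->embed("What's our refund policy?");

$matches = Document::similarTo($queryEmbedding, limit: 3);

foreach ($matches as $doc) {
    // Cosine distance: 0 = identical direction, 2 = opposite
    echo "{$doc->title} (distance: {$doc->distance})\n";
}
```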
Embedding Service
We need to convert text to vectors. OpenAI's embedding API is the most common choice:
```php
// app/Services/EmbeddingService.php
namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    private string $model = 'text-embedding-ada-002';

    public function __construct(
        private string $apiKey
    ) {}

    /**
     * Get embedding for a single text
     */
    public function embed(string $text): array
    {
        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $this->prepareText($text),
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return $response->json('data.0.embedding');
    }

    /**
     * Get embeddings for multiple texts (batch)
     */
    public function embedBatch(array $texts): array
    {
        $prepared = array_map([$this, 'prepareText'], $texts);

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $prepared,
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return collect($response->json('data'))
            ->pluck('embedding')
            ->toArray();
    }

    /**
     * Prepare text for embedding (clean and truncate)
     */
    private function prepareText(string $text): string
    {
        // Remove excessive whitespace
        $text = preg_replace('/\s+/', ' ', trim($text));

        // Truncate to ~8000 tokens (rough estimate: 4 chars per token)
        return mb_substr($text, 0, 32000);
    }
}
```
Register in a service provider:
```php
// app/Providers/AppServiceProvider.php
$this->app->singleton(EmbeddingService::class, function () {
    return new EmbeddingService(config('services.openai.api_key'));
});
```
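This assumes the matching entry in config/services.php, reading the key from the environment:

```php
// config/services.php
'openai' => [
    'api_key' => env('OPENAI_API_KEY'),
],
```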
Indexing Documents
Create a command to index your documents:
```php
// app/Console/Commands/IndexDocuments.php
namespace App\Console\Commands;

use App\Models\Document;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;

class IndexDocuments extends Command
{
    protected $signature = 'documents:index {--fresh : Re-index all documents}';
    protected $description = 'Generate embeddings for documents';

    public function handle(EmbeddingService $embeddings): void
    {
        $query = Document::query();

        if (!$this->option('fresh')) {
            $query->whereNull('embedding');
        }

        $documents = $query->get();
        $this->info("Indexing {$documents->count()} documents...");

        $bar = $this->output->createProgressBar($documents->count());

        // Process in batches for efficiency
        $documents->chunk(20)->each(function ($chunk) use ($embeddings, $bar) {
            // chunk() preserves original keys; reset them so texts and vectors line up
            $chunk = $chunk->values();

            $texts = $chunk->map(fn ($doc) => $doc->title . "\n\n" . $doc->content)->toArray();
            $vectors = $embeddings->embedBatch($texts);

            foreach ($chunk as $index => $document) {
                $document->embedding = $vectors[$index];
                $document->save();
                $bar->advance();
            }
        });

        $bar->finish();
        $this->newLine();
        $this->info('Done!');
    }
}
```
RAG Service
Now combine retrieval and generation:
```php
// app/Services/RAGService.php
namespace App\Services;

use App\Models\Document;
use Illuminate\Support\Facades\Http;

class RAGService
{
    public function __construct(
        private EmbeddingService $embeddings,
        private string $openAiKey
    ) {}

    /**
     * Answer a question using RAG
     */
    public function answer(string $question, int $contextDocs = 3): array
    {
        // Step 1: Embed the question
        $questionEmbedding = $this->embeddings->embed($question);

        // Step 2: Find relevant documents
        $documents = Document::similarTo($questionEmbedding, $contextDocs);

        // Step 3: Build context
        $context = $documents->map(function ($doc) {
            return "---\nSource: {$doc->source}\n{$doc->content}\n---";
        })->join("\n\n");

        // Step 4: Generate response
        $response = $this->generate($question, $context);

        return [
            'answer' => $response,
            'sources' => $documents->map(fn ($d) => [
                'title' => $d->title,
                'source' => $d->source,
                'relevance' => 1 - $d->distance, // Convert distance to similarity
            ])->toArray(),
        ];
    }

    /**
     * Generate answer using context
     */
    private function generate(string $question, string $context): string
    {
        $systemPrompt = <<<PROMPT
        You are a helpful assistant that answers questions based on the provided context.

        Rules:
        - Only use information from the provided context
        - If the context doesn't contain the answer, say "I don't have information about that"
        - Cite sources when possible
        - Be concise and direct
        PROMPT;

        $userPrompt = <<<PROMPT
        Context:
        {$context}

        Question: {$question}

        Answer based on the context above:
        PROMPT;

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->openAiKey}",
        ])->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4-turbo-preview',
            'messages' => [
                ['role' => 'system', 'content' => $systemPrompt],
                ['role' => 'user', 'content' => $userPrompt],
            ],
            'temperature' => 0.7,
            'max_tokens' => 1000,
        ]);

        return $response->json('choices.0.message.content');
    }
}
```
API Endpoint
```php
// routes/api.php
use App\Services\RAGService;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/ask', function (Request $request, RAGService $rag) {
    $request->validate([
        'question' => 'required|string|max:1000',
    ]);

    $result = $rag->answer($request->question);

    return response()->json($result);
});
```
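One gotcha: RAGService takes the API key as a plain string, so the container can't auto-resolve it in the route closure. Register it alongside the embedding service; a sketch mirroring the earlier binding:

```php
// app/Providers/AppServiceProvider.php
$this->app->singleton(RAGService::class, function ($app) {
    return new RAGService(
        $app->make(EmbeddingService::class),
        config('services.openai.api_key'),
    );
});
```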
Advanced Techniques
Chunking Long Documents
Embedding an entire long document dilutes its meaning into a single vector and can exceed the model's token limit, so split it into overlapping chunks first:
```php
// app/Services/DocumentChunker.php
namespace App\Services;

class DocumentChunker
{
    public function chunk(string $content, int $maxTokens = 500, int $overlap = 50): array
    {
        $sentences = preg_split('/(?<=[.!?])\s+/', $content);
        $chunks = [];
        $currentChunk = [];
        $currentLength = 0;

        foreach ($sentences as $sentence) {
            $sentenceLength = $this->estimateTokens($sentence);

            if ($currentLength + $sentenceLength > $maxTokens && !empty($currentChunk)) {
                $chunks[] = implode(' ', $currentChunk);

                // Carry trailing sentences into the next chunk, up to the overlap budget
                $overlapSentences = [];
                $overlapLength = 0;

                foreach (array_reverse($currentChunk) as $previous) {
                    $previousLength = $this->estimateTokens($previous);

                    if ($overlapLength + $previousLength > $overlap) {
                        break;
                    }

                    array_unshift($overlapSentences, $previous);
                    $overlapLength += $previousLength;
                }

                $currentChunk = $overlapSentences;
                $currentLength = $overlapLength;
            }

            $currentChunk[] = $sentence;
            $currentLength += $sentenceLength;
        }

        if (!empty($currentChunk)) {
            $chunks[] = implode(' ', $currentChunk);
        }

        return $chunks;
    }

    private function estimateTokens(string $text): int
    {
        return (int) ceil(strlen($text) / 4);
    }
}
```
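One way to wire the chunker into indexing: store each chunk as its own Document row so retrieval returns focused passages instead of whole files, then let documents:index embed them as usual. A sketch:

```php
$chunker = new DocumentChunker();

foreach ($chunker->chunk($longText) as $i => $chunkText) {
    Document::create([
        'title' => $title . ' (part ' . ($i + 1) . ')',
        'content' => $chunkText,
        'source' => $source,
    ]);
}

// Embeddings stay null until the command runs:
// php artisan documents:index
```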
Hybrid Search
Combine vector similarity with keyword search:
```php
public static function hybridSearch(string $query, array $embedding, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    return static::select('*')
        ->selectRaw('embedding <=> ? as vector_distance', [$vectorString])
        ->selectRaw('ts_rank(to_tsvector(content), plainto_tsquery(?)) as text_rank', [$query])
        ->selectRaw(
            '(0.7 * (1 - (embedding <=> ?)) + 0.3 * ts_rank(to_tsvector(content), plainto_tsquery(?))) as combined_score',
            [$vectorString, $query]
        )
        ->orderByDesc('combined_score')
        ->limit($limit)
        ->get();
}
```
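One caveat: computing to_tsvector(content) per row at query time gets slow on large tables. A functional GIN index keeps the keyword side fast; note Postgres only uses it when the query expression matches, so you'd also switch the ts_rank and plainto_tsquery calls above to the two-argument form with an explicit configuration:

```sql
CREATE INDEX documents_content_fts_idx ON documents
USING gin (to_tsvector('english', content));
```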
Metadata Filtering
Filter by metadata before vector search:
```php
public static function similarWithFilters(array $embedding, array $filters, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    $query = static::select('*')
        ->selectRaw('embedding <=> ? as distance', [$vectorString]);

    // Apply filters
    if (isset($filters['source'])) {
        $query->where('source', $filters['source']);
    }

    if (isset($filters['created_after'])) {
        $query->where('created_at', '>=', $filters['created_after']);
    }

    return $query
        ->orderByRaw('embedding <=> ?', [$vectorString])
        ->limit($limit)
        ->get();
}
```
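Called like this, for example (the filter values here are made up):

```php
$docs = Document::similarWithFilters(
    $embeddings->embed('holiday schedule'),
    ['source' => 'hr-handbook', 'created_after' => now()->subYear()],
);
```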
Performance Tips
1. Use IVFFlat Index
For large datasets, IVFFlat dramatically speeds up search:
```sql
-- Adjust 'lists' based on dataset size (sqrt of row count is a good start)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
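IVFFlat only scans a subset of lists per query, which is where the speed comes from; the ivfflat.probes setting controls how many. Raising it trades latency for recall:

```sql
-- Scan more lists per query for better recall (pgvector default: 1)
SET ivfflat.probes = 10;
```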
2. Batch Embeddings
Always batch embedding requests to reduce API calls:
```php
// Instead of this:
foreach ($documents as $doc) {
    $doc->embedding = $embeddings->embed($doc->content);
}

// Do this:
$texts = $documents->pluck('content')->toArray();
$vectors = $embeddings->embedBatch($texts);

foreach ($documents as $i => $doc) {
    $doc->embedding = $vectors[$i];
}
```
3. Cache Common Queries
```php
public function answer(string $question): array
{
    $cacheKey = 'rag:' . md5($question);

    return Cache::remember($cacheKey, 3600, function () use ($question) {
        return $this->performRAG($question);
    });
}
```
4. Use HNSW for Very Large Datasets
For millions of vectors, an HNSW index offers better query speed and recall, at the cost of slower builds and more memory:
```sql
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```
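HNSW also takes build and query parameters; the values below are pgvector's defaults for the build options, and hnsw.ef_search is the query-time recall/speed knob:

```sql
-- Build options shown at their defaults
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Higher ef_search = better recall, slower queries (default: 40)
SET hnsw.ef_search = 100;
```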
Evaluation
Track RAG quality by logging every query for later analysis:
```php
// Log every query for analysis
RAGQuery::create([
    'question' => $question,
    'answer' => $response,
    'sources' => $documents->pluck('id'),
    'latency_ms' => $latency,
    'user_feedback' => null, // Collect later
]);
```
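A minimal migration sketch for that RAGQuery model (the table layout is an assumption; add 'sources' => 'array' to the model's $casts so the plucked IDs serialize as JSON):

```php
Schema::create('rag_queries', function (Blueprint $table) {
    $table->id();
    $table->text('question');
    $table->text('answer');
    $table->json('sources'); // Document IDs used as context
    $table->unsignedInteger('latency_ms');
    $table->tinyInteger('user_feedback')->nullable(); // e.g. +1 / -1
    $table->timestamps();
});
```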
Conclusion
RAG with Laravel and pgvector is surprisingly straightforward. You get:
- Semantic search without a separate vector database
- Full ACID compliance with your existing data
- Familiar Laravel patterns and tooling
The key insights:
- Chunk documents appropriately - 500-1000 tokens works well
- Quality context > quantity - 3-5 relevant docs beats 20 mediocre ones
- Prompt engineering matters - Clear system prompts reduce hallucination
- Monitor and iterate - Log queries and gather feedback
Start simple, measure results, and iterate. RAG doesn't have to be complicated.
Building AI features into Laravel? Let's talk - I've shipped several AI-powered applications and I'm happy to share more.