deep-dive · January 12, 2025 · 14 min read

Implementing RAG with Laravel and pgvector

A practical guide to building Retrieval-Augmented Generation systems in Laravel using PostgreSQL's pgvector extension for semantic search.

Robert Fridzema

Fullstack Developer


Large Language Models are powerful, but they hallucinate and lack knowledge of your specific data. Retrieval-Augmented Generation (RAG) solves this by giving the LLM relevant context from your own documents before generating a response. Here's how to build a RAG system in Laravel using pgvector.

What is RAG?

RAG combines two steps:

  1. Retrieval - Find relevant documents based on the user's query
  2. Generation - Use those documents as context for the LLM

User Query: "What's our refund policy?"
        │
        ▼
┌───────────────────┐
│  Vector Search    │ ── Find similar documents
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  Context: Found   │ ── "Refunds within 30 days..."
│  3 relevant docs  │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  LLM Generation   │ ── Generate answer using context
└───────────────────┘
        │
        ▼
Response: "Our refund policy allows returns within 30 days..."

Why pgvector?

Vector databases are hot right now - Pinecone, Weaviate, Qdrant. But if you're already using PostgreSQL, pgvector lets you add vector search without another service:

  • No additional infrastructure - Just a PostgreSQL extension
  • Transactional consistency - Vectors and data in the same transaction
  • Familiar tooling - Use Eloquent, migrations, backups as usual
  • Good enough performance - Handles millions of vectors with proper indexing

Setup

1. Install pgvector

# PostgreSQL 16 with pgvector
docker run -d \
  --name postgres-vectors \
  -e POSTGRES_PASSWORD=secret \
  -p 5432:5432 \
  pgvector/pgvector:pg16

Or, if the pgvector package is already installed on your database server, enable it in an existing database:

CREATE EXTENSION vector;

2. Laravel Migration

// database/migrations/create_documents_table.php
public function up(): void
{
    // Enable pgvector extension
    DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

    Schema::create('documents', function (Blueprint $table) {
        $table->id();
        $table->string('title');
        $table->text('content');
        $table->string('source')->nullable();
        $table->timestamps();
    });

    // Add vector column (1536 dimensions for OpenAI ada-002)
    DB::statement('ALTER TABLE documents ADD COLUMN embedding vector(1536)');

    // Create index for fast similarity search
    DB::statement('CREATE INDEX documents_embedding_idx ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)');
}
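
One caveat: this migration builds the IVFFlat index on an empty table, and IVFFlat derives its cluster centers from the rows present at build time. After the initial bulk load it's worth rebuilding the index so the clusters reflect real data; a minimal sketch:

// Run once after the initial data load (e.g. in a follow-up migration or command)
DB::statement('REINDEX INDEX documents_embedding_idx');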

3. Document Model

// app/Models/Document.php
namespace App\Models;

use Illuminate\Database\Eloquent\Collection;
use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'source', 'embedding'];

    /**
     * Find documents similar to the given embedding
     */
    public static function similarTo(array $embedding, int $limit = 5): Collection
    {
        $vectorString = '[' . implode(',', $embedding) . ']';

        return static::select('*')
            ->selectRaw('embedding <=> ? as distance', [$vectorString])
            ->orderByRaw('embedding <=> ?', [$vectorString])
            ->limit($limit)
            ->get();
    }

    /**
     * Set the embedding from an array
     */
    public function setEmbeddingAttribute(array $value): void
    {
        $this->attributes['embedding'] = '[' . implode(',', $value) . ']';
    }
}
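
A quick usage sketch; $queryEmbedding here stands in for a vector you'd get from the embedding service described below:

use App\Models\Document;

// $queryEmbedding: a 1536-element float array (hypothetical input)
$matches = Document::similarTo($queryEmbedding, 5);

foreach ($matches as $doc) {
    // Cosine distance: 0 = same direction, 2 = opposite
    echo "{$doc->title}: {$doc->distance}\n";
}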

Embedding Service

We need to convert text to vectors. OpenAI's embedding API is the most common choice:

// app/Services/EmbeddingService.php
namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    private string $model = 'text-embedding-ada-002';

    public function __construct(
        private string $apiKey
    ) {}

    /**
     * Get embedding for a single text
     */
    public function embed(string $text): array
    {
        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $this->prepareText($text),
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return $response->json('data.0.embedding');
    }

    /**
     * Get embeddings for multiple texts (batch)
     */
    public function embedBatch(array $texts): array
    {
        $prepared = array_map([$this, 'prepareText'], $texts);

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->apiKey}",
        ])->post('https://api.openai.com/v1/embeddings', [
            'model' => $this->model,
            'input' => $prepared,
        ]);

        if (!$response->successful()) {
            throw new \Exception('Embedding API failed: ' . $response->body());
        }

        return collect($response->json('data'))
            ->pluck('embedding')
            ->toArray();
    }

    /**
     * Prepare text for embedding (clean and truncate)
     */
    private function prepareText(string $text): string
    {
        // Remove excessive whitespace
        $text = preg_replace('/\s+/', ' ', trim($text));

        // Truncate to ~8000 tokens (rough estimate: 4 chars per token)
        return mb_substr($text, 0, 32000);
    }
}

Register in a service provider:

// app/Providers/AppServiceProvider.php
$this->app->singleton(EmbeddingService::class, function () {
    return new EmbeddingService(config('services.openai.api_key'));
});
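
The config('services.openai.api_key') call assumes a matching entry in config/services.php, reading the key from your .env:

// config/services.php
'openai' => [
    'api_key' => env('OPENAI_API_KEY'),
],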

Indexing Documents

Create a command to index your documents:

// app/Console/Commands/IndexDocuments.php
namespace App\Console\Commands;

use App\Models\Document;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;

class IndexDocuments extends Command
{
    protected $signature = 'documents:index {--fresh : Re-index all documents}';
    protected $description = 'Generate embeddings for documents';

    public function handle(EmbeddingService $embeddings): void
    {
        $query = Document::query();

        if (!$this->option('fresh')) {
            $query->whereNull('embedding');
        }

        $documents = $query->get();

        $this->info("Indexing {$documents->count()} documents...");

        $bar = $this->output->createProgressBar($documents->count());

        // Process in batches for efficiency. Note: chunk() preserves the
        // original collection keys, so re-key each chunk from 0 to keep
        // documents aligned with the 0-indexed $vectors array.
        $documents->chunk(20)->each(function ($chunk) use ($embeddings, $bar) {
            $chunk = $chunk->values();

            $texts = $chunk->map(fn ($doc) => $doc->title . "\n\n" . $doc->content)->toArray();

            $vectors = $embeddings->embedBatch($texts);

            foreach ($chunk as $index => $document) {
                $document->embedding = $vectors[$index];
                $document->save();
                $bar->advance();
            }
        });

        $bar->finish();
        $this->newLine();
        $this->info('Done!');
    }
}
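
If your content changes regularly, the command can be scheduled so new rows get embedded automatically. A sketch assuming Laravel 11's routes/console.php (on older versions, use the console kernel's schedule() method):

// routes/console.php
use Illuminate\Support\Facades\Schedule;

Schedule::command('documents:index')->hourly();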

RAG Service

Now combine retrieval and generation:

// app/Services/RAGService.php
namespace App\Services;

use App\Models\Document;
use Illuminate\Support\Facades\Http;

class RAGService
{
    public function __construct(
        private EmbeddingService $embeddings,
        private string $openAiKey
    ) {}

    /**
     * Answer a question using RAG
     */
    public function answer(string $question, int $contextDocs = 3): array
    {
        // Step 1: Embed the question
        $questionEmbedding = $this->embeddings->embed($question);

        // Step 2: Find relevant documents
        $documents = Document::similarTo($questionEmbedding, $contextDocs);

        // Step 3: Build context
        $context = $documents->map(function ($doc) {
            return "---\nSource: {$doc->source}\n{$doc->content}\n---";
        })->join("\n\n");

        // Step 4: Generate response
        $response = $this->generate($question, $context);

        return [
            'answer' => $response,
            'sources' => $documents->map(fn ($d) => [
                'title' => $d->title,
                'source' => $d->source,
                'relevance' => 1 - $d->distance, // Convert distance to similarity
            ])->toArray(),
        ];
    }

    /**
     * Generate answer using context
     */
    private function generate(string $question, string $context): string
    {
        $systemPrompt = <<<PROMPT
You are a helpful assistant that answers questions based on the provided context.
Rules:
- Only use information from the provided context
- If the context doesn't contain the answer, say "I don't have information about that"
- Cite sources when possible
- Be concise and direct
PROMPT;

        $userPrompt = <<<PROMPT
Context:
{$context}

Question: {$question}

Answer based on the context above:
PROMPT;

        $response = Http::withHeaders([
            'Authorization' => "Bearer {$this->openAiKey}",
        ])->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-4-turbo-preview',
            'messages' => [
                ['role' => 'system', 'content' => $systemPrompt],
                ['role' => 'user', 'content' => $userPrompt],
            ],
            'temperature' => 0.7,
            'max_tokens' => 1000,
        ]);

        return $response->json('choices.0.message.content');
    }
}
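
Because RAGService takes the API key as a constructor argument, it needs a container binding too; a sketch mirroring the EmbeddingService registration earlier:

// app/Providers/AppServiceProvider.php
$this->app->singleton(RAGService::class, function ($app) {
    return new RAGService(
        $app->make(EmbeddingService::class),
        config('services.openai.api_key'),
    );
});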

API Endpoint

// routes/api.php
use App\Services\RAGService;
use Illuminate\Http\Request;

Route::post('/ask', function (Request $request, RAGService $rag) {
    $request->validate([
        'question' => 'required|string|max:1000',
    ]);

    $result = $rag->answer($request->question);

    return response()->json($result);
});
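
Each request here triggers two OpenAI calls (one embedding, one completion), so some rate limiting is prudent. Laravel's built-in throttle middleware is a minimal guard; the limits below are arbitrary:

Route::post('/ask', function (Request $request, RAGService $rag) {
    $request->validate(['question' => 'required|string|max:1000']);

    return response()->json($rag->answer($request->question));
})->middleware('throttle:10,1'); // at most 10 questions per minute per client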

Advanced Techniques

Chunking Long Documents

Large documents need to be split into chunks:

// app/Services/DocumentChunker.php
namespace App\Services;

class DocumentChunker
{
    public function chunk(string $content, int $maxTokens = 500, int $overlapTokens = 50): array
    {
        $sentences = preg_split('/(?<=[.!?])\s+/', $content);
        $chunks = [];
        $currentChunk = [];
        $currentLength = 0;

        foreach ($sentences as $sentence) {
            $sentenceLength = $this->estimateTokens($sentence);

            if ($currentLength + $sentenceLength > $maxTokens && !empty($currentChunk)) {
                $chunks[] = implode(' ', $currentChunk);

                // Carry trailing sentences forward until the overlap budget is spent
                $overlap = [];
                $overlapLength = 0;
                foreach (array_reverse($currentChunk) as $previous) {
                    $previousLength = $this->estimateTokens($previous);
                    if ($overlapLength + $previousLength > $overlapTokens) {
                        break;
                    }
                    array_unshift($overlap, $previous);
                    $overlapLength += $previousLength;
                }

                $currentChunk = $overlap;
                $currentLength = $overlapLength;
            }

            $currentChunk[] = $sentence;
            $currentLength += $sentenceLength;
        }

        if (!empty($currentChunk)) {
            $chunks[] = implode(' ', $currentChunk);
        }

        return $chunks;
    }

    private function estimateTokens(string $text): int
    {
        return (int) ceil(strlen($text) / 4);
    }
}
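
Chunks are then stored as separate Document rows so each gets its own embedding. A usage sketch; $longText, $title, and $source are hypothetical inputs:

$chunker = new DocumentChunker();

foreach ($chunker->chunk($longText) as $i => $chunk) {
    Document::create([
        'title'   => "{$title} (part " . ($i + 1) . ")", // naming is a choice, not a requirement
        'content' => $chunk,
        'source'  => $source,
    ]);
}
// Then embed the new rows: php artisan documents:index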

Hybrid Search

Combine vector similarity with keyword search:

public static function hybridSearch(string $query, array $embedding, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    return static::select('*')
        ->selectRaw('embedding <=> ? as vector_distance', [$vectorString])
        ->selectRaw('ts_rank(to_tsvector(content), plainto_tsquery(?)) as text_rank', [$query])
        ->selectRaw('(0.7 * (1 - (embedding <=> ?)) + 0.3 * ts_rank(to_tsvector(content), plainto_tsquery(?))) as combined_score',
            [$vectorString, $query])
        ->orderByDesc('combined_score')
        ->limit($limit)
        ->get();
}
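
One caveat if you adopt this: the inline to_tsvector(content) re-parses every row, so on a large table you'll want a GIN expression index. Postgres requires the immutable two-argument form for the index, and the query must use the same expression for the planner to pick it up, so you'd switch the hybridSearch SQL to to_tsvector('english', content) as well. A sketch:

// In a migration: full-text index on content (expression must match the query)
DB::statement(
    "CREATE INDEX documents_content_fts_idx
     ON documents USING gin (to_tsvector('english', content))"
);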

Metadata Filtering

Filter by metadata before vector search:

public static function similarWithFilters(array $embedding, array $filters, int $limit = 5): Collection
{
    $vectorString = '[' . implode(',', $embedding) . ']';

    $query = static::select('*')
        ->selectRaw('embedding <=> ? as distance', [$vectorString]);

    // Apply filters
    if (isset($filters['source'])) {
        $query->where('source', $filters['source']);
    }

    if (isset($filters['created_after'])) {
        $query->where('created_at', '>=', $filters['created_after']);
    }

    return $query
        ->orderByRaw('embedding <=> ?', [$vectorString])
        ->limit($limit)
        ->get();
}

Performance Tips

1. Use IVFFlat Index

For large datasets, an IVFFlat index dramatically speeds up search. Build it after the table contains data, since IVFFlat derives its cluster centers from the rows present at build time:

-- pgvector's guidance for 'lists': rows / 1000 up to ~1M rows, sqrt(rows) beyond that
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
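
IVFFlat only scans a subset of lists at query time; pgvector's ivfflat.probes setting (default 1) controls how many, trading speed for recall. It can be raised per session before running similarity queries:

// More probes = better recall, slower queries; tune per workload
DB::statement('SET ivfflat.probes = 10');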

2. Batch Embeddings

Always batch embedding requests to reduce API calls:

// Instead of this:
foreach ($documents as $doc) {
    $doc->embedding = $embeddings->embed($doc->content);
}

// Do this:
$texts = $documents->pluck('content')->toArray();
$vectors = $embeddings->embedBatch($texts);
foreach ($documents as $i => $doc) {
    $doc->embedding = $vectors[$i];
}

3. Cache Common Queries

public function answer(string $question): array
{
    $cacheKey = 'rag:' . md5($question);

    return Cache::remember($cacheKey, 3600, function () use ($question) {
        return $this->performRAG($question);
    });
}

4. Use HNSW for Very Large Datasets

For millions of vectors, an HNSW index (added in pgvector 0.5.0) typically gives better query speed and recall than IVFFlat, at the cost of slower builds and higher memory use:

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
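
Build and query behavior are tunable: m and ef_construction are pgvector's HNSW build parameters (the values below are its defaults), and hnsw.ef_search raises query-time recall. A sketch:

// In a migration: tune build-time graph parameters
DB::statement(
    'CREATE INDEX documents_embedding_hnsw_idx
     ON documents USING hnsw (embedding vector_cosine_ops)
     WITH (m = 16, ef_construction = 64)'
);

// Per session: a larger candidate list improves recall, slows queries (default 40)
DB::statement('SET hnsw.ef_search = 100');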

Evaluation

Track RAG quality by logging every query for later analysis (RAGQuery here is a model and table you'd add for this purpose):

// Log every query for analysis
RAGQuery::create([
    'question' => $question,
    'answer' => $response,
    'sources' => $documents->pluck('id'),
    'latency_ms' => $latency,
    'user_feedback' => null, // Collect later
]);
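
Once feedback accumulates, even simple aggregates show whether changes help. A sketch assuming user_feedback is stored as 1 (helpful) or 0 (not helpful):

$stats = RAGQuery::whereNotNull('user_feedback')
    ->selectRaw('avg(user_feedback) as helpful_rate, avg(latency_ms) as avg_latency_ms')
    ->first();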

Conclusion

RAG with Laravel and pgvector is surprisingly straightforward. You get:

  • Semantic search without a separate vector database
  • Full ACID compliance with your existing data
  • Familiar Laravel patterns and tooling

The key insights:

  1. Chunk documents appropriately - 500-1000 tokens works well
  2. Quality context > quantity - 3-5 relevant docs beats 20 mediocre ones
  3. Prompt engineering matters - Clear system prompts reduce hallucination
  4. Monitor and iterate - Log queries and gather feedback

Start simple, measure results, and iterate. RAG doesn't have to be complicated.


Building AI features into Laravel? Let's talk - I've shipped several AI-powered applications and I'm happy to share more.

#Laravel #AI #pgvector #OpenAI #RAG