Postgres is All You Need: Building a RAG System Without the Hype

Vector databases are all the rage, promising powerful semantic search capabilities for modern AI applications. But what if you could achieve similar results using a tool many developers already know and trust? This post explores how to build a robust, production-ready, and lightweight Retrieval-Augmented Generation (RAG) system using PostgreSQL enhanced with the pgvector extension.

A companion repository demonstrating these concepts is available at: https://github.com/dannybellion/postgres-is-all-you-need

Why Choose PostgreSQL for RAG?

While specialized vector databases offer advanced features, leveraging PostgreSQL for your RAG needs provides compelling advantages, especially if Postgres is already part of your stack and you value simplicity and a lightweight approach:

  • Simplicity: Work within a familiar database system, leveraging existing SQL knowledge and the mature PostgreSQL ecosystem. Avoid adding another complex system to your stack.
  • Lightweight Integration: pgvector is an extension, not a separate database. Integrate vector search directly into your existing PostgreSQL setup with minimal overhead.
  • Cost-Effectiveness: Potentially avoid extra costs and infrastructure complexity associated with dedicated vector database services.
  • Unified Data: Consolidate your application data, documents, and vector embeddings within a single database, simplifying architecture and data management.
  • Powerful Querying: Seamlessly combine traditional SQL filtering (e.g., by date, user, metadata) with vector-based semantic search for highly relevant results, all within one query language.
  • Production Ready: Build on PostgreSQL’s rock-solid foundation of reliability, scalability, ACID compliance, backup/restore tools, and extensive operational tooling. It’s a mature technology ready for demanding workloads.

The RAG Pipeline with PostgreSQL + pgvector

Building a RAG system generally involves several key steps. Here’s how you can implement them using PostgreSQL and pgvector, focusing on a simple and lightweight philosophy:

  1. Document Preparation:

    • First, gather your source documents (text files, PDFs, database records, etc.).
    • Break these documents into smaller, meaningful chunks. This is vital both for staying within embedding model context limits and for retrieval relevance. For a lightweight approach that avoids pulling in large frameworks like LangChain or LlamaIndex, a focused library like chonkie can handle this efficiently (see the sketch below).
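    • For illustration, here is a minimal chunking sketch assuming chonkie's TokenChunker interface (the parameter values are illustrative, and names may differ between versions, so check the library's docs):
      from chonkie import TokenChunker

      # Illustrative settings; tune chunk size and overlap for your corpus.
      chunker = TokenChunker(chunk_size=512, chunk_overlap=50)

      document_text = "...your raw document text..."
      chunks = chunker.chunk(document_text)
      texts = [chunk.text for chunk in chunks]  # plain strings, ready to embed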
  2. Embedding Generation:

    • Convert each text chunk into a numerical vector (embedding) using an embedding model (e.g., from OpenAI, Hugging Face, Cohere). These vectors capture the semantic essence of the text.
    • This often involves calling a model provider’s API. Here’s a basic example using OpenAI (error handling omitted):
      from openai import OpenAI

      client = OpenAI()

      def get_embedding(text, model="text-embedding-3-small"):
          # Newlines can hurt embedding quality with some models, so flatten them.
          text = text.replace("\n", " ")
          return client.embeddings.create(input=[text], model=model).data[0].embedding
      
  3. Storing Vectors in PostgreSQL:

    • Ensure the pgvector extension is installed on your PostgreSQL server and enabled in the target database:
      CREATE EXTENSION IF NOT EXISTS vector;
      
    • Create a table to store your text chunks and their corresponding embeddings. The key is adding a vector type column:
      CREATE TABLE items (
          id bigserial PRIMARY KEY,
          content text,
          embedding vector(1536) -- Dimension depends on your embedding model
      );
      
    • Insert your text chunks and their generated embeddings into this table, as sketched below.
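    • A sketch of that insert step, assuming psycopg2, the get_embedding helper above, and the texts list from the chunking step (pgvector happens to parse a vector literal formatted like a Python list):
      import psycopg2

      conn = psycopg2.connect("dbname=mydb")  # adjust connection details

      with conn, conn.cursor() as cur:
          for text in texts:
              embedding = get_embedding(text)
              # str(embedding) yields '[0.1, 0.2, ...]', which pgvector accepts;
              # the pgvector-python package offers native adapters if you prefer.
              cur.execute(
                  "INSERT INTO items (content, embedding) VALUES (%s, %s)",
                  (text, str(embedding)),
              )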
  4. Creating Indexes for Efficient Search:

    • To ensure fast lookups in a production environment, create an index on the vector column; without one, every query performs an exact sequential scan. pgvector supports approximate nearest-neighbor index types such as IVFFlat and HNSW (Hierarchical Navigable Small World).
      -- Example using IVFFlat (build after loading data; tune lists for dataset size)
      CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
      WITH (lists = 100);
      -- Or using HNSW (better speed/recall trade-off, but slower builds, more memory)
      -- CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
      
  5. Performing Semantic Search:

    • Generate an embedding for the user’s query using the same model.
    • Use pgvector’s distance operators (<=> for cosine distance, <-> for L2 distance, <#> for negative inner product) within a standard SQL query to find the most similar stored vectors. Use the operator that matches your index’s operator class (vector_cosine_ops pairs with <=>).
    • A typical query to find the top 5 most similar items:
      SELECT id, content
      FROM items
      ORDER BY embedding <=> %(query_embedding)s -- Use parameter binding!
      LIMIT 5;
      
      (Note: Always use parameterized queries in production code to prevent SQL injection.)
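    • Putting the search step together in Python (a sketch assuming psycopg2 and the get_embedding helper above; the search function name is hypothetical):
      def search(query, conn, top_k=5):
          """Embed the query, then return the top_k most similar chunks."""
          query_embedding = get_embedding(query)
          with conn.cursor() as cur:
              cur.execute(
                  """
                  SELECT id, content
                  FROM items
                  ORDER BY embedding <=> %(query_embedding)s::vector
                  LIMIT %(top_k)s
                  """,
                  {"query_embedding": str(query_embedding), "top_k": top_k},
              )
              return cur.fetchall()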
  6. Generating the Response:

    • Take the text retrieved by the search (the content column).
    • Provide this retrieved context, along with the original user query, to a Large Language Model (LLM).
    • The LLM uses the context to generate a relevant answer. To maintain a lightweight and simple approach, you can structure this LLM call using a library like PydanticAI, which leverages Pydantic models for clear input/output definition without the overhead of larger agentic frameworks (see the sketch below).
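    • A minimal sketch of that final call, assuming PydanticAI's Agent interface (attribute names have shifted between versions, so treat this as illustrative and check the current docs):
      from pydantic_ai import Agent

      agent = Agent(
          "openai:gpt-4o-mini",  # any model string PydanticAI supports
          system_prompt="Answer the question using only the provided context.",
      )

      question = "What does the document say about indexing?"
      results = search(question, conn)  # from the search sketch above
      context = "\n\n".join(content for _id, content in results)

      result = agent.run_sync(f"Context:\n{context}\n\nQuestion: {question}")
      print(result.output)  # the output attribute name may vary by version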

Trying it Out

Want to see this simple, lightweight, production-focused approach in action? The companion repository (https://github.com/dannybellion/postgres-is-all-you-need) provides a practical implementation. Clone it, follow the setup in the README.md, and experiment with loading and querying documents.

Conclusion

While dedicated vector databases certainly have their merits, don’t overlook the capabilities of PostgreSQL enhanced with pgvector. For many applications, especially those already using PostgreSQL, it offers a powerful, integrated, lightweight, and production-ready path to building sophisticated RAG systems. By combining familiar SQL with vector search and choosing focused libraries for tasks like chunking and LLM interaction, you can create efficient and maintainable AI-driven features without unnecessary complexity.