Postgres is All You Need: Building a RAG System Without the Hype
Vector databases are all the rage, promising powerful semantic search
capabilities for modern AI applications. But what if you could achieve
similar results using a tool many developers already know and trust?
This post explores how to build a robust, production-ready, and
lightweight Retrieval-Augmented Generation (RAG) system using
PostgreSQL enhanced with the pgvector extension.
A companion repository demonstrating these concepts is available at: https://github.com/dannybellion/postgres-is-all-you-need
Why Choose PostgreSQL for RAG?
While specialized vector databases offer advanced features, leveraging PostgreSQL for your RAG needs provides compelling advantages, especially if Postgres is already part of your stack and you value simplicity and a lightweight approach:
- Simplicity: Work within a familiar database system, leveraging existing SQL knowledge and the mature PostgreSQL ecosystem. Avoid adding another complex system to your stack.
- Lightweight Integration: pgvector is an extension, not a separate database. Integrate vector search directly into your existing PostgreSQL setup with minimal overhead.
- Cost-Effectiveness: Potentially avoid the extra costs and infrastructure complexity of dedicated vector database services.
- Unified Data: Consolidate your application data, documents, and vector embeddings within a single database, simplifying architecture and data management.
- Powerful Querying: Seamlessly combine traditional SQL filtering (e.g., by date, user, metadata) with vector-based semantic search for highly relevant results, all within one query language.
- Production Ready: Build on PostgreSQL’s rock-solid foundation of reliability, scalability, ACID compliance, backup/restore tools, and extensive operational tooling. It’s a mature technology ready for demanding workloads.
The RAG Pipeline with PostgreSQL + pgvector
Building a RAG system generally involves several key steps. Here’s how
you can implement them using PostgreSQL and pgvector, focusing on a
simple and lightweight philosophy:
1. Document Preparation:
   - First, gather your source documents (text files, PDFs, database records, etc.).
   - Break these documents into smaller, meaningful chunks. This is vital for staying within embedding model context limits and for retrieval relevance. For a lightweight approach without pulling in large frameworks like Langchain or LlamaIndex, a focused library like chonkie can handle this efficiently, as sketched below.
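   As a rough sketch (chonkie's class names and parameters here are assumed from its docs; verify against the version you install), chunking might look like this:

   ```python
   # Minimal chunking sketch using chonkie (API assumed; check chonkie's docs).
   from chonkie import TokenChunker

   # chunk_size/chunk_overlap values are illustrative, not tuned recommendations
   chunker = TokenChunker(chunk_size=512, chunk_overlap=50)

   with open("document.txt") as f:
       text = f.read()

   chunks = chunker.chunk(text)  # returns chunk objects exposing .text
   for chunk in chunks:
       print(repr(chunk.text[:60]))
   ```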
2. Embedding Generation:
   - Convert each text chunk into a numerical vector (embedding) using an embedding model (e.g., from OpenAI, Hugging Face, or Cohere). These vectors capture the semantic essence of the text.
   - This often involves calling a model provider’s API. Here’s a basic example using OpenAI (error handling omitted):
   ```python
   from openai import OpenAI

   client = OpenAI()

   def get_embedding(text, model="text-embedding-3-small"):
       # Replacing newlines can improve embedding quality for some models
       text = text.replace("\n", " ")
       return client.embeddings.create(input=[text], model=model).data[0].embedding
   ```
3. Storing Vectors in PostgreSQL:
   - Ensure the pgvector extension is installed in your production PostgreSQL database:

   ```sql
   CREATE EXTENSION IF NOT EXISTS vector;
   ```

   - Create a table to store your text chunks and their corresponding embeddings. The key is adding a vector type column:

   ```sql
   CREATE TABLE items (
       id bigserial PRIMARY KEY,
       content text,
       embedding vector(1536)  -- Dimension depends on your embedding model
   );
   ```

   - Insert your text chunks and their generated embeddings into this table, as sketched below.
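   A minimal insertion sketch, assuming psycopg 3 with the pgvector-python adapter (the connection string is a placeholder; chunks and get_embedding come from the earlier snippets):

   ```python
   # Sketch: inserting chunks and their embeddings (psycopg 3 + pgvector-python).
   import numpy as np
   import psycopg
   from pgvector.psycopg import register_vector

   with psycopg.connect("dbname=rag") as conn:
       register_vector(conn)  # registers the vector type with psycopg
       with conn.cursor() as cur:
           for chunk in chunks:
               cur.execute(
                   "INSERT INTO items (content, embedding) VALUES (%s, %s)",
                   (chunk.text, np.array(get_embedding(chunk.text))),
               )
       # the connection context manager commits on successful exit
   ```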
4. Creating Indexes for Efficient Search:
   - To ensure fast lookups in a production environment, create an approximate nearest neighbor index on the vector column. pgvector supports IVFFlat and HNSW (Hierarchical Navigable Small World) indexes:

   ```sql
   -- Example using IVFFlat (adjust lists based on dataset size)
   CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

   -- Or using HNSW (often better recall, especially for high-dimensional data)
   -- CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
   ```
5. Performing Semantic Search:
   - Generate an embedding for the user’s query using the same model.
   - Use pgvector’s distance operators (<=> for cosine distance, <-> for L2 distance, <#> for negative inner product) within a standard SQL query to find the most similar stored vectors.
   - A typical query to find the top 5 most similar items (always use parameterized queries in production code to prevent SQL injection):

   ```sql
   SELECT id, content
   FROM items
   ORDER BY embedding <=> %(query_embedding)s  -- Use parameter binding!
   LIMIT 5;
   ```
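   Wired up in Python, retrieval might look like this (a sketch reusing the connection and helpers above; register_vector must already have been called on the connection):

   ```python
   # Sketch: top-k semantic search with parameter binding (psycopg 3).
   import numpy as np

   def search(conn, query: str, k: int = 5):
       query_embedding = np.array(get_embedding(query))
       with conn.cursor() as cur:
           cur.execute(
               """
               SELECT id, content
               FROM items
               ORDER BY embedding <=> %(query_embedding)s
               LIMIT %(k)s
               """,
               {"query_embedding": query_embedding, "k": k},
           )
           return cur.fetchall()
   ```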
6. Generating the Response:
   - Take the text content retrieved from the search results.
   - Provide this retrieved context, along with the original user query, to a Large Language Model (LLM).
   - The LLM uses the context to generate a relevant answer. To maintain a lightweight and simple approach, you can structure this LLM call using a library like PydanticAI, which leverages Pydantic models for clear input/output definition without the overhead of larger agentic frameworks. A sketch follows this list.
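As a rough illustration (PydanticAI's Agent and run_sync come from its documented API, but the library evolves quickly; treat the details, including the model name, as assumptions to verify):

```python
# Sketch: generating an answer from retrieved context with PydanticAI.
from pydantic_ai import Agent

agent = Agent(
    "openai:gpt-4o-mini",  # model name is an assumption; use whatever you prefer
    system_prompt="Answer the question using only the provided context.",
)

def answer(question: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    result = agent.run_sync(f"Context:\n{context}\n\nQuestion: {question}")
    return result.output  # named .data in older PydanticAI releases
```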
Trying it Out
Want to see this simple, lightweight, production-focused approach in
action? The companion repository
(https://github.com/dannybellion/postgres-is-all-you-need)
provides a practical implementation. Clone it, follow the setup in the
README.md, and experiment with loading and querying documents.
Conclusion
While dedicated vector databases certainly have their merits, don’t
overlook the capabilities of PostgreSQL enhanced with pgvector. For many
applications, especially those already using PostgreSQL, it offers a
powerful, integrated, lightweight, and production-ready path to
building sophisticated RAG systems. By combining familiar SQL with vector
search and choosing focused libraries for tasks like chunking and LLM
interaction, you can create efficient and maintainable AI-driven features
without unnecessary complexity.