Full-text search: when PostgreSQL is enough and when it isn't.

Adding search to an application usually starts with the simplest thing: a LIKE query. Then you discover LIKE '%term%' can’t use an index and scans the entire table. Then you learn about PostgreSQL’s full-text search. Then you realize you need fuzzy matching, facets, and sub-100ms response times. The question is: where does PostgreSQL’s full-text capability end?

PostgreSQL full-text search

PostgreSQL stores documents as tsvector — a list of normalized lexemes (stemmed words) with their positions. Queries are tsquery — a boolean expression of lexemes.

-- The simple approach: generate tsvector on the fly
SELECT title
FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'database & performance');

-- Better: store the tsvector in a generated column with a GIN index
ALTER TABLE articles
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
  ) STORED;

CREATE INDEX articles_search_idx ON articles USING GIN(search_vector);

With the GIN index, full-text queries are fast even on large tables. The @@ operator checks if a document matches a query; ts_rank() provides relevance scoring.

SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

What works well in PostgreSQL:

Boolean queries (AND, OR, NOT)
Prefix matching (web:* matches “web”, “webapp”, “website”)
Phrase search (phraseto_tsquery)
Relevance ranking with ts_rank
Highlighting matches with ts_headline
Simple faceting with GROUP BY on indexed columns

Where PostgreSQL falls short

Fuzzy matching / typo tolerance: to_tsquery('english', 'databse') returns no results. PostgreSQL full-text search operates on exact lexemes after stemming. You can combine it with pg_trgm for trigram similarity:

SELECT title
FROM articles
WHERE similarity(title, 'databse') > 0.3
   OR search_vector @@ to_tsquery('english', 'database');

But this gets complicated fast, and the trigram index is larger and slower than a GIN index for full-text.

Relevance tuning: PostgreSQL’s ts_rank is limited. You can’t easily boost certain fields, implement BM25 scoring, or tune relevance with custom signals like click-through rate.

Faceted search: Getting counts by category, tag, or date range while also filtering by them requires complex queries. PostgreSQL can do it, but performance degrades as the complexity grows.

Scale: GIN indexes are update-expensive. High write rates with concurrent search queries can cause contention. A dedicated search engine has write and search paths designed independently.

When to bring in a dedicated search engine

Elasticsearch / OpenSearch: Best-in-class for complex relevance tuning, large-scale aggregations, and autocomplete. Heavy operationally: you’re managing a JVM cluster with significant memory requirements. Appropriate for applications where search is a core product feature.

Typesense: A modern Elasticsearch alternative with built-in typo tolerance, easier operations (single binary, no JVM), and a simpler API. Good choice if you need fuzzy search and faceting without Elasticsearch’s operational overhead.

// Typesense: typo-tolerant search with facets
const results = await client.collections('articles').documents().search({
  q: 'databse performanc',   // typos handled automatically
  query_by: 'title,body',
  facet_by: 'category,tags',
  sort_by: '_text_match:desc',
  num_typos: 2,
});

Meilisearch: Similar to Typesense, optimized for developer experience and instant search UIs.

The practical decision

Use PostgreSQL full-text search when:

Your data is already in PostgreSQL and search is a secondary feature
You need simple keyword search without typo tolerance
You want to avoid another infrastructure component
Your dataset is under a few million rows

Move to a dedicated search engine when:

Typo tolerance is required
You need complex faceting and filtering
Search performance is business-critical
You’re doing autocomplete over large datasets

Many applications use both: PostgreSQL for the primary data store and application queries, with a search engine synced via a CDC (Change Data Capture) pipeline or background job for search-specific queries.