PostgreSQL – TSVECTOR and TSQUERY: Supporting full-text search, vectors, and queries

Full-text search is an invaluable feature for many applications, allowing users to search documents and databases based on complex strings of text, rather than simple matching queries. In PostgreSQL, full-text search capabilities are robust and efficient, thanks to two primary PostgreSQL types: TSVECTOR and TSQUERY. These types allow you to index, query, and rank text according to its linguistic relevance to a search query, providing comprehensive search functionality directly within the SQL environment.

## Understanding TSVECTOR

TSVECTOR stands for “text search vector.” It is a data type in PostgreSQL designed to represent document content in a way optimized for text search. A TSVECTOR value is essentially a sorted list of distinct lexemes, which are words that have been normalized to simplify matching, typically lowercased, and stripped of suffixes. Each lexeme can optionally have weights and positions associated with it, which can be used to rank search results according to relevance.

### How to Create TSVECTORs

You can create a TSVECTOR in PostgreSQL using the to_tsvector function. This function takes a text string and an optional configuration parameter that specifies the language dictionary to be used for lexeme normalization.


SELECT to_tsvector('english', 'A fat cat sat on a mat and ate a fat rat.');

Output:


'theat' 'rat' 'mat' 'fat':1,8 'cat' 'sat' 'on' 'a' 'and' 'ate'

As shown, common words (stop words) like “a” and “on” are omitted in the output since they carry less meaning in full-text searches and can clutter the vector unnecessary.

### Indexing TSVECTOR

For performance reasons, especially with large datasets, it is recommended to index TSVECTOR data. GIN (Generalized Inverted Index) indexes are often used because they efficiently handle queries that test for the presence of list items within indexed items.


CREATE INDEX document_idx ON documents USING GIN (content);

Here, `content` is a column of type TSVECTOR. This index can substantially speed up full-text search queries on the `documents` table.

## Understanding TSQUERY

TSQUERY, or “text search query,” represents a query that can be run against a TSVECTOR. Like TSVECTOR, TSQUERY allows for sophisticated matching operations, including logical and proximity operations between lexemes.

### Constructing TSQUERY Expressions

Queries in PostgreSQL’s full-text search can include simple word queries, multiple words, and even complex phrases with logical operators. The to_tsquery function helps in constructing these TSQUERY objects:


SELECT to_tsquery('english', 'fat & rat');

Output:


'fat' & 'rat'

This query matches documents that contain both “fat” and “rat”. Logical operators like & (AND), | (OR), and ! (NOT) can be used to build more nuanced queries.

### Phrases and Proximity Searches in TSQUERY

Proximity searches allow you to search for words that occur close to each other in the text. The `<->` operator is used to find words that are adjacent, and you can specify the maximum distance between words using the `` operator, where N is a number. For instance:


SELECT to_tsquery('fat <-> rat');

Output:


'fat' <-> 'rat'

This query matches documents where “fat” and “rat” appear next to each other.

## Using TSVECTOR and TSQUERY Together

To perform a full-text search, you use a TSVECTOR and a TSQUERY together with the @@ operator, which returns true if the TSQUERY matches the TSVECTOR.


SELECT 'A fat cat sat on a mat and ate a fat rat.'::tsvector @@ 'fat & rat'::tsquery AS match;

Output:


true

This example demonstrates the document matching the query because it contains both “fat” and “rat”.

## Best Practices and Tips

### Weights and Rankings

TSVECTOR supports assigning weights (A, B, C, D) to lexemes, which can influence the ranking of search results:


SELECT setweight(to_tsvector('simple', 'important Word'), 'A') || 
       setweight(to_tsvector('simple', 'less important word'), 'D') as weighted_vector;

Weights are useful when certain parts of a document are more valuable than others, such as titles or headings compared to body text.

### Combining Full-Text Search with Other PostgreSQL Features

Combining full-text search with other PostgreSQL functionalities, like JSONB operations or geographic data queries, can provide powerful query capabilities. For instance, you could do a full-text search on user comments stored as JSONB while filtering results based on geographic proximity.

In summary, TSVECTOR and TSQUERY are powerful tools provided by PostgreSQL to implement efficient and effective full-text search within your applications. By understanding how to create, manipulate, and index these objects, and how they interact through search queries, you can build sophisticated text-search capabilities directly in your database.

About Editorial Team

Our Editorial Team is made up of tech enthusiasts who are highly skilled in Apache Spark, PySpark, and Machine Learning. They are also proficient in Python, Pandas, R, Hive, PostgreSQL, Snowflake, and Databricks. They aren't just experts; they are passionate teachers. They are dedicated to making complex data concepts easy to understand through engaging and simple tutorials with examples.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top