1.2. FTS Overview

Text search operators in database existed for years. PostgreSQL has ~,~*, LIKE, ILIKE operators for textual datatypes, but they lack many essential properties required for modern information system:

The improvements to the FTS came from the idea to preprocess document at index time to save time later, at a search stage. Preprocessing includes:

PostgreSQL is an extendable database, so it's natural to introduce a new data types (Section 1.2.1) tsvector for storing preprocessed document and tsquery for textual queries. Also, full-text search operator (FTS) @@ is defined for these data types (Section 1.2.2). FTS operator can be accelerated using indices (Section 2.7).

1.2.1. Tsquery and tsvector


tsvector is a data type, which represents document, and optimized for FTS. In simple phrase, tsvector is a sorted list of lexemes, so even without index support full text search should performs better than standard ~,LIKE operators.

=# select 'a fat cat sat on a mat and ate a fat rat'::tsvector;
 'a' 'on' 'and' 'ate' 'cat' 'fat' 'mat' 'rat' 'sat'

Notice, that space is also lexeme !

=# select 'space ''    '' is a lexeme'::tsvector;
 'a' 'is' '    ' 'space' 'lexeme'

Each lexeme, optionally, could have positional information, which used for proximity ranking.

=# select 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

Each position of a lexeme can be labeled by one of 'A','B','C','D', where 'D' is default. These labels can be used to indicate group membership of lexeme with different importance or rank, for example, reflect document structure. Actually, labels are just a way to differentiate lexemes. Actual values will be assigned at search time and used for calculation of document rank. This is very convenient to control and tune search machine.

Concatenation operator - tsvector || tsvector "constructs" document from several parts. The order is important if tsvector contains positional information. Of course, using SQL join operator, it is possible to "build" document using different tables.

=# select 'fat:1 cat:2'::tsvector || 'fat:1 rat:2'::tsvector;
 'cat':2 'fat':1,3 'rat':4
=# select 'fat:1 rat:2'::tsvector || 'fat:1 cat:2'::tsvector;
 'cat':4 'fat':1,3 'rat':2


Tsquery is a data type for textual queries with support of boolean operators - & (AND), | (OR), parenthesis. Tsquery consists of lexemes (optionally labeled by letter[s]) with boolean operators between.

=# select 'fat & cat'::tsquery;
 'fat' & 'cat'
=# select 'fat:ab & cat'::tsquery;
 'fat':AB & 'cat'

Labels could be used to restrict search region, which allows to develop different search engines using the same full text index.

tsqueries could be concatenated using && (AND-ed) and || (OR-ed) operators.

test=# select 'a & b'::tsquery && 'c|d'::tsquery;
 'a' & 'b' & ( 'c' | 'd' )
test=# select 'a & b'::tsquery || 'c|d'::tsquery;
 'a' & 'b' | ( 'c' | 'd' )

1.2.2. FTS operator

FTS in PostgreSQL provides operator @@ for the two data types - tsquery and tsvector, which represents, correspondingly, document and query. Also, FTS operator has support of TEXT,VARCHAR data types, which allows to setup simple full-text search, but without ranking support.

tsvector @@ tsquery 
tsquery  @@ tsvector
text|varchar @@ text|tsquery

Full text search operator @@ returns TRUE if tsvector contains tsquery.

=# select 'cat & rat':: tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;
=# select 'fat & cow':: tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::tsvector;