Full-Text Search in PostgreSQL: A Gentle Introduction
Prev	Fast Backward	Chapter 2. FTS Operators and Functions	Fast Forward	Next

2.2. Vector Operations

to_tsvector( [configuration,] document TEXT) RETURNS TSVECTOR

Parses a document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document in lexicographic order.

strip(vector TSVECTOR) RETURNS TSVECTOR

Return a vector which lists the same lexemes as the given vector, but which lacks any information about where in the document each lexeme appeared. While the returned vector is thus useless for relevance ranking, it will usually be much smaller.

setweight(vector TSVECTOR, letter) RETURNS TSVECTOR

This function returns a copy of the input vector in which every location has been labeled with either the letter 'A', 'B', or 'C', or the default label 'D' (which is the default with which new vectors are created, and as such is usually not displayed). These labels are retained when vectors are concatenated, allowing words from different parts of a document to be weighted differently by ranking functions.

vector1 || vector2 concat(vector1 TSVECTOR, vector2 TSVECTOR) RETURNS TSVECTOR

Returns a vector which combines the lexemes and position information in the two vectors given as arguments. Position weight labels (described in the previous paragraph) are retained intact during the concatenation. This has at least two uses. First, if some sections of your document need be parsed with different configurations than others, you can parse them separately and concatenate the resulting vectors into one. Second, you can weight words from some sections of you document more heavily than those from others by: parsing the sections into separate vectors; assigning the vectors different position labels with the setweight() function; concatenating them into a single vector; and then providing a weights argument to the rank() function that assigns different weights to positions with different labels.

length(vector TSVECTOR) RETURNS INT4

Returns the number of lexemes stored in the vector.

text::TSVECTOR RETURNS TSVECTOR

Directly casting text to a tsvector allows you to directly inject lexemes into a vector, with whatever positions and position weights you choose to specify. The text should be formatted like the vector would be printed by the output of a SELECT.

tsearch(vector_column_name[, (my_filter_name | text_column_name1) [...] ], text_column_nameN)

tsearch() trigger used to automatically update vector_column_name, my_filter_name is the function name to preprocess text_column_name. There are can be many functions and text columns specified in tsearch() trigger. The following rule used: function applied to all subsequent text columns until next function occurs. Example, function dropatsymbol replaces all entries of @ sign by space.

CREATE FUNCTION dropatsymbol(text) RETURNS text 
AS 'select replace($1, ''@'', '' '');'
LANGUAGE SQL;

CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT 
ON tblMessages FOR EACH ROW EXECUTE PROCEDURE 
tsearch(tsvector_column,dropatsymbol, strMessage);

stat(sqlquery text [, weight text ]) RETURNS SETOF statinfo

Here statinfo is a type, defined as

CREATE TYPE statinfo as (word text, ndoc int4, nentry int4);

and sqlquery is a query, which returns column tsvector. This returns statistics (the number of documents ndoc and total number nentry of word in the collection) about column vectortsvector. Useful to check how good is your configuration and to find stop-words candidates.For example, find top 10 most frequent words:

=# select * from stat('select vector from apod') order by ndoc desc, nentry desc,word limit 10;

Optionally, one can specify weight to obtain statistics about words with specific weight.

=# select * from stat('select vector from apod','a') order by ndoc desc, nentry desc,word limit 10;

TSVECTOR < TSVECTOR TSVECTOR <= TSVECTOR TSVECTOR = TSVECTOR TSVECTOR >= TSVECTOR TSVECTOR > TSVECTOR

All btree operations defined for tsvector type. tsvectors compares with each other using lexicographical order.

Prev	Home	Next
FTS operator	Up	Query Operations