|Full-Text Search in PostgreSQL: A Gentle Introduction|
|Prev||Fast Backward||Chapter 2. FTS Operators and Functions||Fast Forward||Next|
to_tsvector( [configuration,] document TEXT) RETURNS TSVECTOR
Parses a document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document in lexicographic order.
strip(vector TSVECTOR) RETURNS TSVECTOR
Return a vector which lists the same lexemes as the given vector, but which lacks any information about where in the document each lexeme appeared. While the returned vector is thus useless for relevance ranking, it will usually be much smaller.
setweight(vector TSVECTOR, letter) RETURNS TSVECTOR
This function returns a copy of the input vector in which every location has been labeled with either the letter 'A', 'B', or 'C', or the default label 'D' (which is the default with which new vectors are created, and as such is usually not displayed). These labels are retained when vectors are concatenated, allowing words from different parts of a document to be weighted differently by ranking functions.
vector1 || vector2 concat(vector1 TSVECTOR, vector2 TSVECTOR) RETURNS TSVECTOR
Returns a vector which combines the lexemes and position information
in the two vectors given as arguments. Position weight labels (described in
the previous paragraph) are retained intact during the concatenation.
This has at least two uses. First, if some sections of your document need be
parsed with different configurations than others, you can parse them
separately and concatenate the resulting vectors into one.
Second, you can weight words from some sections of you document more
heavily than those from others by: parsing the sections into separate vectors;
assigning the vectors different position labels with the
setweight() function; concatenating them into a
single vector; and then providing a weights argument to the
rank() function that assigns different weights to
positions with different labels.
length(vector TSVECTOR) RETURNS INT4
Returns the number of lexemes stored in the vector.
text::TSVECTOR RETURNS TSVECTOR
Directly casting text to a tsvector allows you to directly inject lexemes into a vector, with whatever positions and position weights you choose to specify. The text should be formatted like the vector would be printed by the output of a SELECT.
tsearch(vector_column_name[, (my_filter_name | text_column_name1) [...] ], text_column_nameN)
tsearch() trigger used to automatically update
my_filter_name is the function name to preprocess
There are can be many functions and text columns specified in
The following rule used: function applied to all subsequent text columns
until next function occurs.
dropatsymbol replaces all entries of
@ sign by space.
CREATE FUNCTION dropatsymbol(text) RETURNS text AS 'select replace($1, ''@'', '' '');' LANGUAGE SQL; CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages FOR EACH ROW EXECUTE PROCEDURE tsearch(tsvector_column,dropatsymbol, strMessage);
stat(sqlquery text [, weight text ]) RETURNS SETOF statinfo
Here statinfo is a type, defined as
CREATE TYPE statinfo as (word text, ndoc int4, nentry int4);
and sqlquery is a query, which returns column tsvector. This returns statistics (the number of documents ndoc and total number nentry of word in the collection) about column vectortsvector. Useful to check how good is your configuration and to find stop-words candidates.For example, find top 10 most frequent words:
=# select * from stat('select vector from apod') order by ndoc desc, nentry desc,word limit 10;
Optionally, one can specify weight to obtain statistics about words with specific weight.
=# select * from stat('select vector from apod','a') order by ndoc desc, nentry desc,word limit 10;
TSVECTOR < TSVECTOR TSVECTOR <= TSVECTOR TSVECTOR = TSVECTOR TSVECTOR >= TSVECTOR TSVECTOR > TSVECTOR
All btree operations defined for tsvector type. tsvectors compares with each other using lexicographical order.