Appendix B. FTS Parser Example

The SQL command CREATE FULLTEXT PARSER creates a parser for full-text search. In this example we implement a simple parser that recognizes space-delimited words and has only two token types (3, word, Word; 12, blank, Space symbols). These identifiers were chosen for compatibility with the default headline() function, since we do not implement our own version.

To implement a parser, one needs to write at least four functions (see CREATE FULLTEXT PARSER).

START = start_function

Initializes the parser. Arguments are a pointer to the text to be parsed and its length.

Returns a pointer to the parser's internal structure. Note that this structure should be allocated with malloc, or with palloc in TopMemoryContext. We name it ParserState.
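As an illustration of the START step, here is a minimal sketch that builds without PostgreSQL headers. The field names of ParserState and the function name start_sketch are assumptions made for this example, not taken from the contrib module, and malloc is used in place of palloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout of the parser's private state. */
typedef struct
{
    const char *buffer;   /* text being parsed (not copied) */
    int         len;      /* length of the text in bytes */
    int         pos;      /* offset of the next unread byte */
} ParserState;

/*
 * Sketch of the START step: allocate the state and remember the input.
 * Returns NULL if the allocation fails.
 */
ParserState *
start_sketch(const char *text, int len)
{
    ParserState *st;

    assert(text != NULL && len >= 0);

    st = malloc(sizeof(ParserState));
    if (st == NULL)
        return NULL;

    st->buffer = text;
    st->len = len;
    st->pos = 0;
    return st;
}
```

The state only keeps a pointer into the caller's text, so the text must stay valid until parsing is finished.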

GETTOKEN = gettoken_function

Returns the next token. Arguments are (ParserState *), (char **), (int *).

This function is called repeatedly until it returns token type 0.

END = end_function

A void function that is called after parsing is finished. It must free the resources allocated for the parser (ParserState). The argument is (ParserState *).
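The GETTOKEN loop and the END cleanup described above could be sketched as follows. This is a standalone approximation, not the contrib module's code: the names gettoken_sketch, end_sketch and count_words_sketch and the ParserState fields are hypothetical, the token pointer is declared const char ** rather than char **, and only the space character is treated as a delimiter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOK_WORD   3    /* token ids as listed in the text */
#define TOK_BLANK  12

/* Hypothetical layout of the parser's private state. */
typedef struct
{
    const char *buffer;   /* text being parsed */
    int         len;      /* length of the text in bytes */
    int         pos;      /* offset of the next unread byte */
} ParserState;

/*
 * Sketch of GETTOKEN: report the next run of spaces or non-spaces.
 * Returns the token type, or 0 once the input is exhausted.
 */
int
gettoken_sketch(ParserState *st, const char **token, int *toklen)
{
    int start = st->pos;
    int type;

    if (st->pos >= st->len)
    {
        *token = NULL;
        *toklen = 0;
        return 0;           /* tells the caller to stop */
    }

    *token = st->buffer + start;
    if (st->buffer[st->pos] == ' ')
    {
        type = TOK_BLANK;
        while (st->pos < st->len && st->buffer[st->pos] == ' ')
            st->pos++;
    }
    else
    {
        type = TOK_WORD;
        while (st->pos < st->len && st->buffer[st->pos] != ' ')
            st->pos++;
    }
    *toklen = st->pos - start;
    return type;
}

/* Sketch of END: release the state allocated when parsing started. */
void
end_sketch(ParserState *st)
{
    free(st);
}

/* Drives the parser the way a caller would; returns the number of words. */
int
count_words_sketch(const char *text)
{
    ParserState *st = malloc(sizeof(ParserState));
    const char  *tok;
    int          toklen;
    int          type;
    int          words = 0;

    assert(st != NULL);
    st->buffer = text;
    st->len = (int) strlen(text);
    st->pos = 0;

    /* Keep asking for tokens until type 0 signals the end of input. */
    while ((type = gettoken_sketch(st, &tok, &toklen)) != 0)
        if (type == TOK_WORD)
            words++;

    end_sketch(st);
    return words;
}
```

Tokens are returned as a pointer into the original text plus a length, so nothing is copied and the tokens are not null-terminated.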

LEXTYPES = lextypes_function

Returns an array containing the id, alias, and description of each token type of our parser. See LexDescr in src/include/utils/ts_public.h.
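To make the shape of that array concrete, the sketch below builds the table for our two token types. TokenDescrSketch is a stand-in defined only for this example; the real field names and layout are those of LexDescr in the header mentioned above. Ending the array with an entry whose id is 0 is an assumption of this sketch, as are the function names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for LexDescr; the field names are this sketch's own. */
typedef struct
{
    int         id;      /* token type id */
    const char *alias;   /* short name */
    const char *descr;   /* human-readable description */
} TokenDescrSketch;

/*
 * Sketch of LEXTYPES: return a freshly allocated table describing the
 * two token types. The caller frees it. Returns NULL on failure.
 */
TokenDescrSketch *
lextypes_sketch(void)
{
    TokenDescrSketch *d = malloc(3 * sizeof(TokenDescrSketch));

    if (d == NULL)
        return NULL;

    d[0].id = 3;
    d[0].alias = "word";
    d[0].descr = "Word";

    d[1].id = 12;
    d[1].alias = "blank";
    d[1].descr = "Space symbols";

    /* End marker (assumed convention for this sketch). */
    d[2].id = 0;
    d[2].alias = NULL;
    d[2].descr = NULL;

    return d;
}

/* Looks up the alias for a token id; returns NULL if the id is unknown. */
const char *
alias_for_sketch(const TokenDescrSketch *table, int id)
{
    int i;

    assert(table != NULL);
    for (i = 0; table[i].id != 0; i++)
        if (table[i].id == id)
            return table[i].alias;
    return NULL;
}
```

With this table, alias_for_sketch(t, 3) yields "word" and alias_for_sketch(t, 12) yields "blank", matching the tokid values shown in the parse() output below.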

The source code of our test parser, organized as a contrib module, is available in the next section.

Testing:

=# SELECT * FROM parse('testparser','That''s my first own parser');
 tokid | token
-------+--------
     3 | That's
    12 |
     3 | my
    12 |
     3 | first
    12 |
     3 | own
    12 |
     3 | parser
=# SELECT to_tsvector('testcfg','That''s my first own parser');
                   to_tsvector
-------------------------------------------------
 'my':2 'own':4 'first':3 'parser':5 'that''s':1
=# SELECT headline('testcfg','Supernovae stars are the brightest phenomena in galaxies', to_tsquery('testcfg', 'star'));
                            headline
-----------------------------------------------------------------
 Supernovae <b>stars</b> are the brightest phenomena in galaxies