2009-08-12

Algebra for full-text queries

We introduce the phrase operator ?[n], or phrase conjunction operator. It is similar to the logical conjunction operator (AND, &), but it preserves the order of its operands (it is non-commutative) and constrains the distance between them (<= n).

The logical conjunction operator (AND, &) is associative, commutative, distributive, and idempotent. In set theory, the intersection operator is an example of a logical conjunction operator.
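
The contrast between AND and the phrase operator can be illustrated with a toy model. The sketch below is Python, not PostgreSQL code. It assumes a document is a map from lexeme to the set of positions where it occurs, and that 'x ?[n] y' holds when some occurrence of y follows an occurrence of x by at least 1 and at most n positions. The lower bound of 1 and the helper names and_match and phrase_match are assumptions made for illustration only.

```python
# Toy model: a document maps each lexeme to the set of positions where it occurs.
doc = {"a": {1}, "b": {2}}

def and_match(doc, x, y):
    """AND only asks whether both lexemes occur somewhere in the document."""
    return bool(doc.get(x)) and bool(doc.get(y))

def phrase_match(doc, x, y, n=1):
    """Assumed reading of 'x ?[n] y': y occurs 1..n positions after x."""
    return any(0 < py - px <= n
               for px in doc.get(x, ())
               for py in doc.get(y, ()))

# AND is commutative: swapping the operands does not change the result.
print(and_match(doc, "a", "b"), and_match(doc, "b", "a"))        # True True

# The phrase operator is order-sensitive: 'a ? b' matches, 'b ? a' does not.
print(phrase_match(doc, "a", "b"), phrase_match(doc, "b", "a"))  # True False
```

In this model the distance bound n is the only extra parameter: with n=2, positions 1 and 3 still match, while positions 1 and 4 do not.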

  • The ? operator is non-commutative, so 'A ? B' ≠ 'B ? A'
  • The ? operator is not associative: it is parsed as left-associative and evaluated from left to right.
=# select '1 ? 2 ? 3'::tsquery = '(1 ? 2) ? 3'::tsquery;
 ?column?
----------
 t

but

=# select '1 ? 2 ? 3'::tsquery = '1 ?  (2 ? 3)'::tsquery;
 ?column?
----------
 f

The function *phraseto_tsquery()* makes it easy to construct phrase queries:

=# select phraseto_tsquery('1 2 3');
  phraseto_tsquery
---------------------
 ( '1' ? '2' ) ? '3'
  • The ? operator distributes across OR and AND:
=# select '1 ? ( 2 | 3)'::tsquery = '( 1 ? 2 ) | ( 1 ? 3 )'::tsquery;
 ?column?
----------
 t
=# select '1 ? ( 2 & 3)'::tsquery = '( 1 ? 2 ) & ( 1 ? 3 )'::tsquery;
 ?column?
----------
 t

'1 ? ( 2 & 3)'::tsquery may look like a problem, but consider the situation where a dictionary returns two lexemes for one word: in the tsvector both lexemes will have the same position.

=# select '1:1 2:2 3:2'::tsvector  @@ '1 ? ( 2 & 3)'::tsquery;
 ?column?
----------
 t
  • The ? operator is non-idempotent, i.e. 'A ? A' ≠ 'A' (unlike AND, where A & A ≡ A)
=# select '1 ? 1'::tsquery;
  tsquery
-----------
 '1' ? '1'
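
The distribution over AND and the non-idempotency described above can be checked against a toy tsvector. This is a Python sketch rather than PostgreSQL code. It assumes that 'x ?[n] y' means "y occurs 1 to n positions after x" (the lower bound is an assumption), and the helper name follows is hypothetical.

```python
# Toy tsvector '1:1 2:2 3:2': lexemes '2' and '3' share position 2,
# as happens when a dictionary returns two lexemes for one word.
vec = {"1": {1}, "2": {2}, "3": {2}}

def follows(vec, x, y, n=1):
    """Assumed reading of 'x ?[n] y': y occurs 1..n positions after x."""
    return any(0 < py - px <= n
               for px in vec.get(x, ())
               for py in vec.get(y, ()))

# '1 ? (2 & 3)' evaluated in its distributed form '(1 ? 2) & (1 ? 3)'.
print(follows(vec, "1", "2") and follows(vec, "1", "3"))  # True

# Non-idempotency under this model: '1 ? 1' needs two adjacent
# occurrences of '1', so it is not equivalent to the query '1'.
print(follows({"1": {1}}, "1", "1"))     # False
print(follows({"1": {1, 2}}, "1", "1"))  # True
```

The first result agrees with the '1:1 2:2 3:2' example above: both '2' and '3' sit directly after '1', so both distributed phrases match.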

Compound word

=# CREATE TEXT SEARCH DICTIONARY nb_no_ispell ( TEMPLATE = ispell,
DictFile = nb_no, AffFile = nb_no );
=# select ts_lexize('nb_no_ispell', 'telefonsvarer');
          ts_lexize
------------------------------
 {telefonsvarer,telefon,svar}
=# CREATE TEXT SEARCH CONFIGURATION public.no ( COPY=pg_catalog.norwegian);
=# ALTER TEXT SEARCH CONFIGURATION  no ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,word,
hword, hword_part WITH nb_no_ispell, norwegian_stem;

=# select to_tsquery('no','telefonsvarer & device');
                     to_tsquery
----------------------------------------------------
 ( 'telefonsvarer' | 'telefon' & 'svar' ) & 'devic'
=# select to_tsvector('no','telefonsvarer  device');
                   to_tsvector
--------------------------------------------------
 'devic':2 'svar':1 'telefon':1 'telefonsvarer':1

Now, see how phraseto_tsquery works:

=# select phraseto_tsquery('no','telefonsvarer device');
                              phraseto_tsquery
----------------------------------------------------------------------------
 'telefonsvarer' ? 'devic' | ( 'telefon' ? 'devic' ) & ( 'svar' ? 'devic' )

Casting produces the same result:

=# select '(telefonsvarer | telefon & svar ) ? devic'::tsquery;
                                  tsquery
----------------------------------------------------------------------------
 'telefonsvarer' ? 'devic' | ( 'telefon' ? 'devic' ) & ( 'svar' ? 'devic' )
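
The shape of this result follows from distributing ? across | and &. The Python sketch below reproduces the rewrite on a toy query tree. It is not PostgreSQL's implementation: the tuple representation and the function name distribute are hypothetical and chosen only for illustration.

```python
# Query trees: a lexeme is a str; an operator node is a tuple (op, left, right).
def distribute(q):
    """Push '?' below '&' and '|' so that '?' never sits above them."""
    if isinstance(q, str):
        return q
    op, left, right = q[0], distribute(q[1]), distribute(q[2])
    if op == "?":
        # (a OP b) ? r  ->  (a ? r) OP (b ? r)
        if not isinstance(left, str) and left[0] in ("&", "|"):
            return (left[0],
                    distribute(("?", left[1], right)),
                    distribute(("?", left[2], right)))
        # l ? (a OP b)  ->  (l ? a) OP (l ? b)
        if not isinstance(right, str) and right[0] in ("&", "|"):
            return (right[0],
                    distribute(("?", left, right[1])),
                    distribute(("?", left, right[2])))
    return (op, left, right)

# '(telefonsvarer | telefon & svar) ? devic'
compound = ("|", "telefonsvarer", ("&", "telefon", "svar"))
result = distribute(("?", compound, "devic"))

# 'telefonsvarer' ? 'devic' | ( 'telefon' ? 'devic' ) & ( 'svar' ? 'devic' )
expected = ("|",
            ("?", "telefonsvarer", "devic"),
            ("&", ("?", "telefon", "devic"), ("?", "svar", "devic")))
print(result == expected)  # True
```

Applying the same function to a left-nested phrase such as ("?", ("?", compound, "devic"), "ok") pushes each additional ? into every branch in the same way.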

A more complex phrase:

=# select phraseto_tsquery('no','telefonsvarer device ok');
                                              phraseto_tsquery
-------------------------------------------------------------------------------------------------------------
 ( 'telefonsvarer' ? 'devic' ) ? 'ok' | ( ( 'telefon' ? 'devic' ) ? 'ok' ) & ( ( 'svar' ? 'devic' ) ? 'ok' )

=# select '(telefonsvarer |  telefon & svar ) ? devic ? ok'::tsquery;
                                                   tsquery
-------------------------------------------------------------------------------------------------------------
 ( 'telefonsvarer' ? 'devic' ) ? 'ok' | ( ( 'telefon' ? 'devic' ) ? 'ok' ) & ( ( 'svar' ? 'devic' ) ? 'ok' )
