tprewriter

tprewriter - sophisticated query rewriter for tsearch2

This module was designed specifically for the travelpost project. Design notes on the full-text search are available in tp search en.

FUNCTIONS:

select tp_rewrite (query::tsquery, keyword::tsquery)
select tp_rewrite (ARRAY[query::tsquery, test.keyword] ) from test
select tp_rewrite (query::tsquery,'select test.keyword from test'::text);
        
        - Here, column test.keyword is of type tsquery.

        - Rewrites the original query by changing '&' to '|'.
          Terms that form a complex (multi-word) keyword stay joined
          by '&', which preserves the keyword and minimizes noise.

example: 

Rewrite query 'new & york & hotel' and preserve 'new & york'
regression=# select tp_rewrite(to_tsquery('new&york&hotel'),to_tsquery('new&york'));
        tp_rewrite        
--------------------------
 'hotel' | 'york' & 'new'


regression=# select tp_rewrite( ARRAY['new&york&hotel'::tsquery, test.keyword] )
 from test;
        tp_rewrite        
--------------------------
 'hotel' | 'york' & 'new'

regression=# select tp_rewrite( 'new&york&hotel'::tsquery,'select test.keyword from test'::text);
        tp_rewrite        
--------------------------
 'hotel' | 'york' & 'new'
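
The rewrite rule shown in these examples can be modeled in a few lines of Python. This is only a toy sketch, assuming a flat query whose terms are all joined by '&' and a single keyword; tp_rewrite_model and show are hypothetical names and are not part of the module.

```python
def tp_rewrite_model(query_terms, keyword_terms):
    """Toy model of tp_rewrite for a flat AND-query and one keyword.

    query_terms   -- terms of a query such as 'new & york & hotel'
    keyword_terms -- terms of a keyword such as 'new & york'
    Returns a list of AND-groups that are OR-ed together, or None when
    the keyword is not fully contained in the query.
    """
    query = list(query_terms)
    wanted = set(keyword_terms)
    keyword = [t for t in query if t in wanted]
    if len(keyword) != len(wanted):
        return None  # keyword is not fully contained in the query
    rest = [t for t in query if t not in wanted]
    # Each leftover term becomes its own OR-branch; the keyword stays AND-ed.
    return [[t] for t in rest] + [keyword]


def show(groups):
    """Render OR-ed AND-groups in tsquery-like notation."""
    return " | ".join(" & ".join(g) for g in groups)


print(show(tp_rewrite_model(["new", "york", "hotel"], ["new", "york"])))
# prints: hotel | new & york
```

The order of operands differs from the psql output above; only the grouping is meant to match.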


Note:

        tp_rewrite supports multiple keywords, but the result depends on the
        order of the keywords.

        Intersecting keywords:
        'new & york & city' - original query,
        'new & york', 'new & city'  - keywords,
        then tp_rewrite could produce two variants:
                'new & york | city' 
                'new | york & city' 

        Embedded keywords:
        'new & york & city & hotel'  - original query,
        'new & york & city', 'new & city'  - keywords,
        then tp_rewrite could produce two variants:
                'new & york & city | hotel'
                'new & city | york | hotel'
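
The order dependence in the embedded case can be reproduced with a small Python sketch. It assumes that keywords are applied in the given order and that a keyword is preserved only if none of its terms was already claimed by an earlier keyword; this greedy rule is an assumption chosen to match the two variants above, and the module's actual algorithm may differ. rewrite_with_keywords and show are hypothetical names.

```python
def rewrite_with_keywords(query_terms, keywords):
    """Toy model: apply keywords in the given order to a flat AND-query.

    A keyword is kept as an AND-group only if all of its terms are still
    unclaimed by an earlier keyword (an assumption, not the module's code).
    """
    remaining = list(query_terms)
    groups = []
    for keyword in keywords:
        if all(term in remaining for term in keyword):
            groups.append(list(keyword))
            remaining = [t for t in remaining if t not in keyword]
    # Unclaimed terms become single-term OR-branches.
    return groups + [[t] for t in remaining]


def show(groups):
    """Render OR-ed AND-groups in tsquery-like notation."""
    return " | ".join(" & ".join(g) for g in groups)


query = ["new", "york", "city", "hotel"]
long_kw, short_kw = ["new", "york", "city"], ["new", "city"]

print(show(rewrite_with_keywords(query, [long_kw, short_kw])))
# prints: new & york & city | hotel
print(show(rewrite_with_keywords(query, [short_kw, long_kw])))
# prints: new & city | york | hotel
```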

---------------------------------------------------------------------------

select tp_substitute(query::tsquery,target::tsquery,sample::tsquery)
select tp_substitute(ARRAY[query::tsquery,target::tsquery,sample::tsquery]) from test;
select tp_substitute(query::tsquery, 'select test.target,test.sample from test');

        - Here, columns test.target and test.sample are of type tsquery.
        - Replaces the part of the original query that matches target
          with sample.
                
example:

Expand city name by its aliases.

regression=# select tp_substitute('new & york & hotel','new & york',
'nyc|gotham|big& apple');
                    tp_substitute                     
------------------------------------------------------
 'hotel' & ( 'apple' & 'big' | ( 'nyc' | 'gotham' ) )


Note:

        tp_substitute doesn't support embedded substitutions;
        only the first matching target is expanded. For example, if we
        have aliases (in the following order!):
        'new & york & city' ->  'new & york | nyc' and 
        'new & york'    -> 'gotham', 
        then query 'new & york & city & hotel'  
        will be '(new & york | nyc) & hotel'
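
The first-match behaviour can be sketched in Python as follows. This is a toy model over a flat '&' query, not the module's implementation; tp_substitute_model is a hypothetical name, and returning None when no target matches is an assumption of this sketch.

```python
def tp_substitute_model(query_terms, aliases):
    """Toy model: replace the first matching target in a flat AND-query.

    aliases is an ordered list of (target_terms, sample_text) pairs.
    Only the first target fully contained in the query is expanded.
    """
    for target, sample in aliases:
        if all(term in query_terms for term in target):
            rest = [t for t in query_terms if t not in target]
            return " & ".join(["(" + sample + ")"] + rest)
    return None  # no target matched (assumption of this sketch)


aliases = [
    (["new", "york", "city"], "new & york | nyc"),
    (["new", "york"], "gotham"),
]
print(tp_substitute_model(["new", "york", "city", "hotel"], aliases))
# prints: (new & york | nyc) & hotel
```

Because the loop returns at the first match, the 'new & york' inside the expanded sample is never replaced by 'gotham'.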

---------------------------------------------------------------------------
select tp_rewrite_substitute(query::tsquery,target::tsquery,sample::tsquery)
select tp_rewrite_substitute(ARRAY[query::tsquery,target::tsquery,sample::tsquery]) from test;
select tp_rewrite_substitute(query::tsquery,'select test.target,test.sample from test');

        - Applies tp_substitute and then tp_rewrite, in that order.

example:

Expand city name by its aliases and preserve integrity of city name.

regression=# select tp_rewrite_substitute('new & york & hotel','new & york',
'nyc|gotham|big& apple');
                tp_rewrite_substitute                 
------------------------------------------------------
 'hotel' | ( 'apple' & 'big' | ( 'nyc' | 'gotham' ) )

regression=# select tp_rewrite_substitute(ARRAY['new & york & hotel', test.target, test.sample]) from test;
                   tp_rewrite_substitute                   
-----------------------------------------------------------
 'hotel' | ( 'nyc' | ( 'appl' & 'big' | 'york' & 'new' ) )


regression=# select tp_rewrite_substitute('new & york & hotel','select test.target, test.sample from test');
                   tp_rewrite_substitute                   
-----------------------------------------------------------
 'hotel' | ( 'nyc' | ( 'appl' & 'big' | 'york' & 'new' ) )


OPERATORS:

tsquery @ tsquery - TRUE if the right argument *might* be contained in the left argument
tsquery ~ tsquery - TRUE if the left argument *might* be contained in the right argument

        - These operators can be used to speed up the tp_* functions by
          filtering out non-candidate tuples from the table with samples.
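
One way to read the word *might* is as a cheap test on terms only. The Python sketch below models that reading: it compares the sets of operand words and ignores the operator structure, so it can answer TRUE for pairs that a full check would reject. This interpretation and the names terms, might_contain and might_be_contained are assumptions, not the module's code.

```python
import re


def terms(tsquery_text):
    """Extract the set of operand words from a tsquery-like string."""
    return set(re.findall(r"\w+", tsquery_text))


def might_contain(left, right):
    """Model of 'left @ right': every term of right also occurs in left."""
    return terms(right) <= terms(left)


def might_be_contained(left, right):
    """Model of 'left ~ right': every term of left also occurs in right."""
    return terms(left) <= terms(right)


print(might_contain("new & york & hotel", "new & york"))       # prints: True
print(might_be_contained("new & york", "new & york & hotel"))  # prints: True
print(might_contain("new & york & hotel", "big & apple"))      # prints: False
```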

example:

Process only those tuples whose test.target is contained in the query.

regression=# select tp_rewrite_substitute(ARRAY[query, test.target, test.sample]) from test, 
             to_tsquery('new & york & hotel') as query where  
             query @ test.target;

                   tp_rewrite_substitute                   
-----------------------------------------------------------
 'hotel' | ( 'nyc' | ( 'appl' & 'big' | 'york' & 'new' ) )

Note:

It's safe to use coalesce(tp_rewrite(ARRAY[...]),query), because 
tp_rewrite(ARRAY[...]) returns NULL if query @ test.target fails (nothing
found), so coalesce falls back to the original query.


INDEX SUPPORT:

It's possible to create an index to speed up the @ and ~ operators.

example:

create index sample_idx on test using gist(sample gist_tp_tsquery_ops);


Documentation TODO:

1 Added the function
        int4 numnode( TSQUERY )
 It returns the total number of operands and operators (simply put, the
number of nodes in the tree representation of the query).

2 Operators < <= = <> >= >  (note that less-than/greater-than relations
between queries are an odd and abstract notion; nothing of real use can be
derived from them except fast lookup of an exact match. The only guarantee
is that such a comparison is unambiguous.)
3 BTree index on tsquery
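
What numnode counts can be illustrated with a small Python sketch that works on the textual form of a query. It assumes every term and every operator ('&', '|', '!') is one node and that parentheses are not nodes; the real function works on the parsed tsquery, so normalization could change the count. numnode_model is a hypothetical name.

```python
import re


def numnode_model(tsquery_text):
    """Count operand and operator tokens in a tsquery-like string."""
    tokens = re.findall(r"\w+|[&|!]", tsquery_text)
    return len(tokens)


print(numnode_model("new & york & hotel"))  # prints: 5
print(numnode_model("(fat & rat) | cat"))   # prints: 5
```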