2007-04-26

Testing GIN on TEXT

GIN index creation turned out to be pretty slow :) Tested on CVS HEAD with an unmodified postgresql.conf (I tried increasing maintenance_work_mem to 64MB, but the results stayed the same).

words=# \d w
           Table "public.w"
 Column |       Type       | Modifiers 
--------+------------------+-----------
 word   | text             | 
 f1     | double precision | 
 f2     | double precision | 
words=# show tsearch_conf_name;
 tsearch_conf_name 
-------------------
 public.test
words=# \dF+ test
Configuration "public.test"
Parser name: "pg_catalog.default"
Locale: 'C'
    Token     |   Dictionaries    
--------------+-------------------
 hword        | pg_catalog.simple
 lhword       | pg_catalog.simple
 lpart_hword  | pg_catalog.simple
 lword        | pg_catalog.simple
 nlhword      | pg_catalog.simple
 nlpart_hword | pg_catalog.simple
 nlword       | pg_catalog.simple
 part_hword   | pg_catalog.simple
 word         | pg_catalog.simple
words=# create index gin_idx on w using gin(word);
CREATE INDEX
Time: 147324.213 ms
words=# create index bt_idx on w(word);
CREATE INDEX
Time: 26284.108 ms

It looks like the simple dictionary is the cause, since with the default configuration the build is faster:

words=# show tsearch_conf_name;
    tsearch_conf_name    
-------------------------
 pg_catalog.russian_utf8
(1 row)

Time: 1.117 ms
words=# create index gin_idx on w using gin(word);
CREATE INDEX
Time: 70558.607 ms

This configuration uses the snowball stemmer. Index creation time is better (about 70.6 s instead of 147.3 s), but it is still about 2.7 times longer than for the plain btree index (26.3 s). The table and index sizes are:

words=# select pg_relation_size('w');
 pg_relation_size 
------------------
         44990464
words=# select pg_relation_size('bt_idx');
 pg_relation_size 
------------------
         23420928
words=# select pg_relation_size('gin_idx');
 pg_relation_size 
------------------
         32841728
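
For convenience, the ratios behind these numbers can be recomputed with a few lines of Python. This is only a sketch: the figures are copied from the psql output above, the dictionary keys are labels made up for this example, and it assumes the reported gin_idx size belongs to the most recently built index (the one created under pg_catalog.russian_utf8), which the output does not state explicitly.

```python
# Build times in milliseconds, copied from the "Time:" lines above.
# The keys are labels chosen for this example, not PostgreSQL names.
build_ms = {
    "gin (public.test, simple dictionary)": 147324.213,
    "gin (pg_catalog.russian_utf8)": 70558.607,
    "btree": 26284.108,
}

# Sizes in bytes, copied from the pg_relation_size() output above.
size_bytes = {
    "w": 44990464,
    "bt_idx": 23420928,
    "gin_idx": 32841728,  # assumed to be the russian_utf8 build
}

# How many times slower each build is than the btree build.
slowdown_vs_btree = {
    name: ms / build_ms["btree"] for name, ms in build_ms.items()
}

# Size of each index relative to the table, plus GIN relative to btree.
size_vs_table = {
    name: size / size_bytes["w"]
    for name, size in size_bytes.items()
    if name != "w"
}
gin_vs_btree_size = size_bytes["gin_idx"] / size_bytes["bt_idx"]

for name, factor in slowdown_vs_btree.items():
    print(f"{name}: {factor:.2f}x btree build time")
for name, frac in size_vs_table.items():
    print(f"{name}: {frac:.0%} of table size")
print(f"gin_idx / bt_idx size: {gin_vs_btree_size:.2f}")
```

So the GIN build is roughly 5.6 times slower than btree with the simple dictionary and roughly 2.7 times slower with the snowball stemmer, while the GIN index comes out about 1.4 times the size of the btree index.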