GinTest

Test results of Generalized Inverted Index

Gin stands for generalized inverted index (not a drink, but genie). See Gin for more information.

Testing of gin index

PostgreSQL 8.2dev, standard postgresql.conf. Notice, gin uses maintenance_work_mem to accumulate postings in memory during index creation. Default value is 16Mb, we used 64Mb.

Quick results: time to create index (bulk) is about the same for intarray and gin, while search using gin is about an order of magnitude faster than gist__intbig_ops !

Test data for table tt(a integer[]) were generated using RANDOM function for three variables:

  • MAXLEN - maximal array length
  • MAXVAL - cardinality of integer numbers
  • MAXNUM - the number of rows

Testing integer[]

100 500 100000 (gin,64Mb)

  • real 0m14.340s
  • table size:26312704, index size=33923072
=# explain analyze select * from tt where a @> '{2}';
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tt  (cost=6.35..349.75 rows=100 width=226) (actual time=6.529..42.784 rows=9371 loops=1)
   Recheck Cond: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gin_idx  (cost=0.00..6.35 rows=100 width=0) (actual time=4.798..4.798 rows=9371 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 52.566 ms
(5 rows)
# select count(*) from tt where a @> '{2}';
 count
-------
  9371
(1 row)

100 50000 100000(gin,16Mb)

  • real 3m9.144s
  • table size:26361856, index size=286359552
  • time to insert 10 rows vary from 0.2s to 21s

100 50000 100000(gin,64Mb)

  • real 1m11.144s
  • table size:26361856, index size=286359552
  • time to insert 10 rows vary from 0.4s to 10s

100 500 100000 (intarray,gist__intbig_ops)

  • real 0m17.220s
  • table size:26312704, index size=33775616
=# explain analyze select * from tt where a @> '{2}';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tt  (cost=6.35..349.75 rows=100 width=227) (actual time=99.690..447.405 rows=9371 loops=1)
   Filter: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on int_idx  (cost=0.00..6.35 rows=100 width=0) (actual time=97.742..97.742 rows=9371 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 457.207 ms
(5 rows)

Time: 458.190 ms

Testing text[]

300,000 documents (108Mb) loaded for about 44s and indexed (bulk mode) - 62s(16Mb) and 66s(64Mb). Index size is 55Mb. Search is very fast:

# explain analyze select count(*) from aa where a @> '{oil}';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1067.10..1067.11 rows=1 width=0) (actual time=304.686..304.687 rows=1 loops=1)
   ->  Bitmap Heap Scan on aa  (cost=9.05..1066.35 rows=300 width=0) (actual time=19.771..283.364 rows=19203 loops=1)
         Recheck Cond: (a @> '{oil}'::text[])
         ->  Bitmap Index Scan on txt_idx  (cost=0.00..9.05 rows=300 width=0) (actual time=11.705..11.705 rows=19203 loops=1)
               Index Cond: (a @> '{oil}'::text[])
 Total runtime: 304.855 ms
(6 rows)

Testing tsearch2

postgres=# \d titles
              Table "public.titles"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 date      | timestamp with time zone |
 title     | text                     |
 fts_index | tsvector                 |

postgres=# \timing
Timing is on.
postgres=# create index fts_idx on titles using gin(fts_index);
CREATE INDEX
Time: 236752.569 ms