GinTestUpdate

Testing GIN update performance

For integer arrays GIN is 10x slower than GiST on update. It's about 10x faster than GiST on select. Updating of 100 rows in 100,000 takes < 0.5s for GIN index, which is quite reasonable speed.

Test environment

Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM.

PostgreSQL 8.2.4

  • shared_buffers = 256MB
  • work_mem = 8MB
  • maintenance_work_mem = 64MB
  • effective_cache_size = 256MB

Test table tt(id integer, a int[]) with GIN index created

test=# \d tt 
       Table "public.tt"
 Column |   Type    | Modifiers 
--------+-----------+-----------
 id     | integer   | 
 a      | integer[] | 
Indexes:
    "gin_idx" gin (a)

For creating GiST index

create index gist_idx on tt using gist(a gist__intbig_ops);

Load data

Example of populating a table - 10000 rows, with integer arrays cardinality 500 and maximal array length of 100.

sh ./test-id.sh  -l 100 -n 10000 -v 500 | psql test -c "copy tt from stdin with delimiter as '|'"

Example of update SQL - update first 300 rows

update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) ) 
where id < 300;

Results

Update array column by random array of length 200, vary #rows ( All timings in ms !)

Low cardinality (500)

-l 100 -n 10000 -v 500

Create index:

  • GIN 2804.709
  • GiST 1458.526
  • #updates, GIN, GiST
  • 10 - 28.916, 21.273
  • 100 - 453.513, 28.504
  • 1000 - 2824.042, 159.474
  • 1000 - 27909.337, 2092.837

-l 100 -n 100000 -v 500

Create index:

  • GIN 7669.875
  • GiST 1458.526
  • 10 - 94.637, 54.870
  • 100 - 473.569, 69.470
  • 1000 - 2407.273, 243.742
  • 10000 - 30856.287, 2155.403

Big cardinality (5000)

-l 100 -n 10000 -v 5000

Create index:

  • GIN 2303.915
  • GiST 1167.952
  • #updates, GIN, GiST
  • 10 - 40.031, 17.197
  • 100 - 194.322, 26.901
  • 1000 - 2265.735, 172.825
  • 1000 - 32416.623, 2054.121

-l 100 -n 100000 -v 5000

Create index (real):

  • GIN 9746.955
  • GiST 13600.256
  • 10 - 158.897, 47.338
  • 100 - 250.516, 65.378
  • 1000 - 2871.439, 425.907
  • 10000 - 31537.869, 2538.468

Very Big cardinality (100000)

Simulate typical news archive - 100,000 documents with length < 500 words and dictionary size 100,000 words.

-l 500 -n 100000 -v 100000

Create index:

  • GIN 146765.140
  • GiST 12721.421
  • #updates, GIN, GiST
  • 10 - 160.569, 72.390
  • 100 - 543.285, 83.026
  • 1000 - 3702.067, 284.039
  • 1000 - 32571.510, 2969.296

Search performance:

explain analyze select * from tt where a @> '{2}';

GiST:

 Bitmap Heap Scan on tt  (cost=29.27..405.87 rows=100 width=979) (actual time=68.714..6311.757 rows=678 loops=1)
   Filter: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gist_idx  (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 6312.807 ms

GIN:

 Bitmap Heap Scan on tt  (cost=150.50..527.11 rows=100 width=982) (actual time=0.610..11.277 rows=678 loops=1)
   Recheck Cond: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gin_idx  (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 11.892 ms

GiST ia a way slower than GIN index, because GiST found 10774 documents where only 678 documents are right answer and remaining are false drops !

Support script

Скрипт test-id.sh for generating test load. It's not optimized and could be horribly slow !

#!/bin/sh

# default values
# array length
MAXLEN=4
# number of rows
MAXNUM=10
# cardinality
MAXVAL=100

while getopts l:n:v: opt
do
        case "$opt" in
                l) MAXLEN="$OPTARG";;
                n) MAXNUM="$OPTARG";;
                v) MAXVAL="$OPTARG";;
        esac
done

for ((i=0;i<$MAXNUM;i++)) do
 len=$(( $RANDOM % $MAXLEN))+1
 for ((j=0;j<$len;j++)) do
         if [ $j == '0' ]; then
                val=$(( $RANDOM % $MAXVAL))
         else
                val=$val,$(( $RANDOM % $MAXVAL))
         fi
 done
 echo $i\|{$val}
done