Difference between revision 4 and current revision
No diff available.For integer arrays GIN is 10x slower than GiST on update. It's about 10x faster than GiST on select. Updating of 100 rows in 100,000 takes < 0.5s for GIN index, which is quite reasonable speed.
Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM.
PostgreSQL 8.2.4
Test table tt(id integer, a int[]) with GIN index created
test=# \d tt Table "public.tt" Column | Type | Modifiers --------+-----------+----------- id | integer | a | integer[] | Indexes: "gin_idx" gin (a)
For creating GiST index
create index gist_idx on tt using gist(a gist__intbig_ops);
Example of populating a table - 10000 rows, with integer arrays cardinality 500 and maximal array length of 100.
sh ./test-id.sh -l 100 -n 10000 -v 500 | psql test -c "copy tt from stdin with delimiter as '|'"
Example of update SQL - update first 300 rows
update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) ) where id < 300;
Update array column by random array of length 200, vary #rows ( All timings in ms !)
-l 100 -n 10000 -v 500
Create index:
-l 100 -n 100000 -v 500
Create index:
-l 100 -n 10000 -v 5000
Create index:
-l 100 -n 100000 -v 5000
Create index (real):
Simulate typical news archive - 100,000 documents with length < 500 words and dictionary size 100,000 words.
-l 500 -n 100000 -v 100000
Create index:
Search performance:
explain analyze select * from tt where a @> '{2}';
GiST:
Bitmap Heap Scan on tt (cost=29.27..405.87 rows=100 width=979) (actual time=68.714..6311.757 rows=678 loops=1) Filter: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gist_idx (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 6312.807 ms
GIN:
Bitmap Heap Scan on tt (cost=150.50..527.11 rows=100 width=982) (actual time=0.610..11.277 rows=678 loops=1) Recheck Cond: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gin_idx (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 11.892 ms
GiST ia a way slower than GIN index, because GiST found 10774 documents where only 678 documents are right answer and remaining are false drops !
Скрипт test-id.sh for generating test load. It's not optimized and could be horribly slow !
#!/bin/sh # default values # array length MAXLEN=4 # number of rows MAXNUM=10 # cardinality MAXVAL=100 while getopts l:n:v: opt do case "$opt" in l) MAXLEN="$OPTARG";; n) MAXNUM="$OPTARG";; v) MAXVAL="$OPTARG";; esac done for ((i=0;i<$MAXNUM;i++)) do len=$(( $RANDOM % $MAXLEN))+1 for ((j=0;j<$len;j++)) do if [ $j == '0' ]; then val=$(( $RANDOM % $MAXVAL)) else val=$val,$(( $RANDOM % $MAXVAL)) fi done echo $i\|{$val} done