Introduction to GiST
We wrote a paper GiST programming tutorial,
Hellerstein et al. [HNP95] introduced an index structure, called a
Generalized Search Tree (GiST), which is a generalized form of an
The GiST provides a possibility to create custom data types
with indexed access methods and extensible set of queries
for specific domain experts not a database one.
GiST was implemented in an early version of PostgreSQL by
J. Hellerstein and P.Aoki, more details is available from
"The GiST Indexing Project" (http://gist.cs.berkeley.edu/)
at Berkeley. As an "university" project it has a limited
number of features and was in rare use. Since 7.1 version
of PostgreSQL the GiST was taken up by Oleg Bartunov and
Teodor Sigaev, see "GiST development" (http://www.sai.msu.su/~postgres/gist)
for information about current development.
Current implementation of GiST supports:
There are several contrib modules developed using GiST and distributed with
PostgreSQL. For example:
- Variable length keys
- Composite keys (multi-key)
- provides NULL-safe interface to GiST core
GiST is used in several public projects, like PosGIS project,
which adds support for geographic objects to the PostgreSQL,
OpenFTS - Open Source Full Text Search engine,
which provides online indexing of data and relevance ranking for database searching.
- rtree_gist, btree_gist - GiST implementation of R-Tree and B-Tree
- intarray - index support for one-dimensional array of int4's
- tsearch - a searchable (full text) data type with indexed access
- ltree - data types, indexed access methods and queries for data organized as a tree-like structures
- cube - data type, representing multidimensional cubes
GiST is a height-balanced tree structure with tree nodes containing
(p, ptr) pairs, where
p is a predicate
(attribute),which is used as a
search key, and
ptr is the pointer to data for a leaf node or
a pointer to another tree node for a non-leaf node.
GiST has the following properties:
- Every node contains between
max index entries unless it is the root
- For each index entry
(p,ptr) in a leaf node,
p is true when
instantiated with the values from the indicated tuple
p holds for the tuple).
- For each index entry
(p, ptr) in a non-leaf node,
p is true when instantiated with the values of any tuple
- The root has at least two children unless it is a leaf
- All leaves appear on the same level
In principle, the keys of a GiST may be any arbitrary predicates that hold for
each datum below the key. In practice, the keys come from a user-defined object
class, which provides a set of six methods required by the GiST:
- given an entry
E = (p, ptr), and a query predicate
p ^ q can be guaranteed unsatisfiable, and true otherwise.
- Union(P ):
- given a set P of entries
(p1,ptr1), ... (pn,ptrn), returns
r that holds for all tuples stored below
- given an entry
E = (p, ptr), returns an entry
pp is a compressed representation of
- given a compressed representation
E = (pp, ptr), returns an
(r, ptr) where
r is a decompressed representation
- given two entries
E1 = (p1, ptr1), E2 = (p2, ptr2), returns
a domain specific penalty for inserting
E2 into the subtree
E1, which is used to aid the splitting process of the
- PickSplit(P ):
- given a set
into two sets of entries
P2, each of size
kM , where
is the minimum fill factor.
The three methods of GiST provide algorithms for search, insertion and
can be used to search any dataset with any query predicate by traversing
as much of the tree as necessary to satisfy the query.
The general search algorithm is controlled by
the user-specified Consistent method, which returns 'TRUE'
if the predicate is satisfiable and 'FALSE' in vice versa.
The same Consistent method applies to both index node and
leaf node. But usually the satisfiable conditions are different for the index
node and leaf node, no matter if it is an exact matching search or a range/window
query. So further checking is needed after leaf entries are fetched according to
the algorithm. Another restriction is that only one type of query can be
conducted in one program because function Consistent is used in the search instead
of a pointer to a function.
- guarantees that the GiST remains balanced.
User defined key method Penalty is used for choosing a subtree to
insert; method PickSplit is used for the node splitting algorithm; method Union is
used for propagating changes upward to maintain the tree properties.
- determines a leaf with specified key and removes
(p, ptr) and, if required, maintains the
balance of the tree using method Union (as in INSERT).
- [Gut84] Antonin Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD International Conference on Management of Data,
pages 47-54, 1984.
[HNP95] J. M. Hellerstein, J. F. Naughton, and Avi Pfeffer. Generalized search
trees for database systems. In Proceedings of the 21st International Conference
on Very Large Data Bases, Zurich, Switzerland, 1995.