Update: We wrote a paper GiST programming tutorial, in Russian.
Hellerstein et al. [HNP95] introduced an index structure, called a Generalized Search Tree (GiST), which is a generalized form of an R-tree [Gut84]. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries for specific domain experts not a database one.
GiST was implemented in an early version of PostgreSQL by J. Hellerstein and P.Aoki, more details is available from "The GiST Indexing Project" (http://gist.cs.berkeley.edu/) at Berkeley. As an "university" project it has a limited number of features and was in rare use. Since 7.1 version of PostgreSQL the GiST was taken up by Oleg Bartunov and Teodor Sigaev, see "GiST development" (http://www.sai.msu.su/~postgres/gist) for information about current development.
Current implementation of GiST supports:
(p, ptr) pairs, where p is a predicate
(attribute),which is used as a
search key, and ptr is the pointer to data for a leaf node or
a pointer to another tree node for a non-leaf node.
GiST has the following properties:
min and max index entries unless it is the root
(p,ptr) in a leaf node, p is true when
instantiated with the values from the indicated tuple
(i.e., p holds for the tuple).
(p, ptr) in a non-leaf node,
p is true when instantiated with the values of any tuple
reachable from ptr.
In principle, the keys of a GiST may be any arbitrary predicates that hold for each datum below the key. In practice, the keys come from a user-defined object class, which provides a set of six methods required by the GiST:
E = (p, ptr), and a query predicate q, returns
false if p ^ q can be guaranteed unsatisfiable, and true otherwise.
(p1,ptr1), ... (pn,ptrn), returns
some predicate r that holds for all tuples stored below ptr1
through ptrn.
E = (p, ptr), returns an entry (pp, ptr)
where pp is a compressed representation of p.
E = (pp, ptr), returns an
entry (r, ptr) where r is a decompressed representation
of pp.
E1 = (p1, ptr1), E2 = (p2, ptr2), returns
a domain specific penalty for inserting E2 into the subtree
rooted at E1, which is used to aid the splitting process of the
insertion operation.
P of M+1 entries (p, ptr),
splits P
into two sets of entries P1 and P2, each of size kM , where k
is the minimum fill factor.
The three methods of GiST provide algorithms for search, insertion and deletion operations:
(p, ptr) and, if required, maintains the
balance of the tree using method Union (as in INSERT).
Literature: