GiST home |
Introduction |
GiST interface (Russian) |
Readings

Search PostgreSQL resources | OpenFTS | PostgreSQL site

Search PostgreSQL resources | OpenFTS | PostgreSQL site

Update: We wrote a paper GiST programming tutorial, in Russian.

Hellerstein et al. [HNP95] introduced an index structure, called a Generalized Search Tree (GiST), which is a generalized form of an R-tree [Gut84]. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries for specific domain experts not a database one.

GiST was implemented in an early version of PostgreSQL by J. Hellerstein and P.Aoki, more details is available from "The GiST Indexing Project" (http://gist.cs.berkeley.edu/) at Berkeley. As an "university" project it has a limited number of features and was in rare use. Since 7.1 version of PostgreSQL the GiST was taken up by Oleg Bartunov and Teodor Sigaev, see "GiST development" (http://www.sai.msu.su/~postgres/gist) for information about current development.

Current implementation of GiST supports:

- Variable length keys
- Composite keys (multi-key)
- provides NULL-safe interface to GiST core

- rtree_gist, btree_gist - GiST implementation of R-Tree and B-Tree
- intarray - index support for one-dimensional array of int4's
- tsearch - a searchable (full text) data type with indexed access
- ltree - data types, indexed access methods and queries for data organized as a tree-like structures
- cube - data type, representing multidimensional cubes

`(p, ptr)`

pairs, where `p`

is a predicate
(attribute),which is used as a
search key, and `ptr`

is the pointer to data for a leaf node or
a pointer to another tree node for a non-leaf node.
GiST has the following properties:

- Every node contains between
`min`

and`max`

index entries unless it is the root - For each index entry
`(p,ptr)`

in a leaf node,`p`

is true when instantiated with the values from the indicated tuple (i.e.,`p`

holds for the tuple). - For each index entry
`(p, ptr)`

in a non-leaf node,`p`

is true when instantiated with the values of any tuple reachable from`ptr`

. - The root has at least two children unless it is a leaf
- All leaves appear on the same level

In principle, the keys of a GiST may be any arbitrary predicates that hold for each datum below the key. In practice, the keys come from a user-defined object class, which provides a set of six methods required by the GiST:

**Consistent(E,q):**- given an entry
`E = (p, ptr)`

, and a query predicate`q`

, returns false if`p ^ q`

can be guaranteed unsatisfiable, and true otherwise. **Union(P ):**- given a set P of entries
`(p1,ptr1), ... (pn,ptrn)`

, returns some predicate`r`

that holds for all tuples stored below`ptr1`

through`ptrn`

. **Compress(E):**- given an entry
`E = (p, ptr)`

, returns an entry`(pp, ptr)`

where`pp`

is a compressed representation of`p`

. **Decompress(E):**- given a compressed representation
`E = (pp, ptr)`

, returns an entry`(r, ptr)`

where`r`

is a decompressed representation of`pp`

. **Penalty(E1,E2):**- given two entries
`E1 = (p1, ptr1), E2 = (p2, ptr2)`

, returns a domain specific penalty for inserting`E2`

into the subtree rooted at`E1`

, which is used to aid the splitting process of the insertion operation. **PickSplit(P ):**- given a set
`P`

of`M+1`

entries`(p, ptr)`

, splits`P`

into two sets of entries`P1`

and`P2`

, each of size`kM`

, where`k`

is the minimum fill factor.

The three methods of GiST provide algorithms for search, insertion and deletion operations:

**SEARCH**-
can be used to search any dataset with any query predicate by traversing
as much of the tree as necessary to satisfy the query.

The general search algorithm is controlled by the user-specified**Consistent**method, which returns 'TRUE' if the predicate is satisfiable and 'FALSE' in vice versa. The same**Consistent**method applies to both index node and leaf node. But usually the satisfiable conditions are different for the index node and leaf node, no matter if it is an exact matching search or a range/window query. So further checking is needed after leaf entries are fetched according to the algorithm. Another restriction is that only one type of query can be conducted in one program because function Consistent is used in the search instead of a pointer to a function. **INSERT**- guarantees that the GiST remains balanced.
User defined key method
**Penalty**is used for choosing a subtree to insert; method**PickSplit**is used for the node splitting algorithm; method**Union**is used for propagating changes upward to maintain the tree properties. **DELETE**- determines a leaf with specified key and removes
entry
`(p, ptr)`

and, if required, maintains the balance of the tree using method**Union**(as in**INSERT**).

Literature:

- [Gut84] Antonin Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD International Conference on Management of Data, pages 47-54, 1984.
- [HNP95] J. M. Hellerstein, J. F. Naughton, and Avi Pfeffer. Generalized search trees for database systems. In Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, 1995.