intarray tutorial

Intarray tutorials by Achilleus Mantzios

Arrays in postgresql are sometimes a good way to store vector data.
In almost all aspects these type of data can indeed be modeled by
the traditional relational way.
So why do we need arrays?
While a large deal of debate has been discussed over the matter, it is still
true that whenever a relatively small set of numbered values is to be stored,
"arrays" is the natural answer.

Imagine for instance that we have to store polynomial formulas and later
do computations on them (find the value of the polyonym, find its first product, etc...).
We could choose a relational aproach, and have a table representing polyonyms,
e.g.
CREATE TABLE polyonym (
id integer PRIMARY KEY,
degree integer not null
)
and a child table
CREATE TABLE polyonym_factors (
pid integer REFERENCES polyonym(id),
position integer,
factor integer
)
ALTER TABLE polyonym_factors ADD CONSTRAINT pf_posfac UNIQUE (pid,position)

Using the arrays aproach we could simply have
CREATE TABLE polyonym (
id integer PRIMARY KEY,
formula integer[]
)
which is both more intuitive and far more efficient.

The intarray package provides a handful set of functions, operators,
and indexing capabilities to integer arrays.

Another useful use of integer arrays is when implementing a tree structure
following the genealogical approach, i.e. for each node store the path
from this node to the root of the tree.
Integer arrays along with the intarray package is a powerful tool to
implement this.

Lets suppose we model our tree data like this:

CREATE TABLE tree(
id integer PRIMARY KEY,
name text not null,
parents integer[]
)

We then create an index on the parents column:

CREATE INDEX tree_parents on tree using gist (parents gist__int_ops);

Now we can do quite interesting things with our tree:

Lets insert the root:

INSERT INTO tree (id,name,parents) VALUES (1,'root',null);

The root does not a have any parents.
And having the id of the root lets insert its first child:

INSERT INTO tree (id,name,parents) VALUES (2,'kid1',intset(1));

The intset function converts an integer to a single element array.
Now lets add a second child.

INSERT INTO tree (id,name,parents) VALUES (3,'kid2',intset(1));

Now having the id of the first child 'kid1' (id=2) and its parents (='{1}'),
lets insert a child for it.

INSERT INTO tree (id,name,parents) VALUES (4,'kid1.1',intset(2)+'{1}'::int4[]);

The + operator concatenates 2 arrays.

and a second child of 'kid1',

INSERT INTO tree (id,name,parents) VALUES (5,'kid1.2',intset(2)+'{1}'::int4[]);

Since the '+' operator returns null, a good way to add nodes in a root/not root agnostic way is to use the coalesce function:
e.g.
INSERT INTO TREE (id,name,parents) VALUES (<id val>,<name>,intset(<parent's id>)+coalesce(<parent's parents>,'{}'::int4[]))

Also note that intset(null) returns null as well.

So to find the first immediate ancenstor of any node we simply access parents[1],
and to find the last ancenstor before the root we simply access parents[n-1],
where n=the length of the path.
To find the root we access parents[n].

The icount function gives us the degree, .i.e the number of elements of the array.
So in the previous example, n=icount(parents), if parents is null (we are at the root),
then icount returns null.
So icount gives us the "depth" of the node.

Upto now our tree is like this:

                       (1,root,null)
                       /          \
                      /            \
                     /              \
                   (2,kid1,'{1}')   (3,kid2,'{1}')
                   /   \
                  /     \
                 /       \
  (4,kid1.1,'{2,1}')   (5,kid1.2,'{2,1}')

To find the immediate children of node id=1 (root) we do:

SELECT * from tree where intset(1) ~ parents and icount(parents)=1.

Generally to find the immediate children of a node with id=<id> and depth=<n> we do,
SELECT * from tree where intset(<id>) ~ parents and icount(parents) = <n>+1;

To find all children of a node we do:
SELECT * from tree where intset(<id>) ~ parents.