Zen: sncat design

SAI SN catalogue design

Development version of catalogue scheme
sncat scheme - database diagram
sncat.sql, snref.sql, snproc.sql
sncat2sntab.pl - convert old catalog to table sn
leda documentation - parameters description.

Convertations Issues

How to distinguish unknown optdisc and default (ccd) ? See 1885A example.
type of sn ??? Ideally, we want to encode information about common type, subtype, pecularity and uncertainty of each part.
date ??? We have incomplete date (only month is specified), which is no way to use in db, but we want to preserve somehow this info and use date data type.
cross-identification with leda catalogue. This is not difficult using q3c package. See ledasn table in leda database for results.

select m.pgc,m.objname,m.al2000,m.de2000,s.id,s.name,s.ra,s.dec into ledasn 
from leda.m000 as m, sn.sn as s where
q3c_join(m.al2000,m.de2000,s.ra,s.dec,0.05);

It took about 90 seconds to join 2,000,000 galaxies with 3348 sne to produce about 20,000 results (yes, 0.05 is too much !) on very modest linux box: PIII 1Gz, 256Mb RAM, PostgreSQL 8.1.2, q3c 1.1

q3c doesn't supports ellipses yet, so for now I tried circle (should correct for positional angle !)

select m.pgc,m.objname,m.al2000,m.de2000,s.id,s.name,s.ra,s.dec into ledasn 
from leda.m000 as m, sn.sn as s 
where q3c_join(m.al2000,m.de2000,s.ra,s.dec,2*10^(m.logd25-1)/60.);

use coalesce() function to specify default joining diameter (if logd25 is null).

Comments

Well, I finally decided to abandon galaxy table from our catalogue and to use hyperleda as a reference data for galaxies. This will greatly simplify maintenance of our catalog and make it more transparent. But, I see a problem with support of several reference tables for galaxies. For example, hyperleda is ok for close galaxies, but modern sn surveys look much further and we certainly need other catalogs with faint galaxies. I need to figure out how to modify SQL scheme. Ideally, we want be able to:

add (online) new reference catalogs
specify priority of catalogs in case of overlapping data (for example, we could have our local hand-made table for galaxies and assign it high priority)

Questions

1. How to implement history of changes on catalog ?: Ideally, we need separate history table which stores information about who made the change, when it was occured, what was changed and old value.

CREATE TABLE history (
  id bigint PRIMARY KEY,
  who text,
  when timestamp without time zone,
  what text,
  old  ????
);

For now, I use triggers to store modification datetime in mod_date column.

Maintain compiled catalogs

The problem:It's needed to maintain catalog, which contains data collected from different catalogs (primary). We want to find out the best database schema which supports synchronization of data in such (secondary) catalog with primary catalogs.

t0.sql, diagram - sql schema and diagram for example database (galaxy has 3 properties - ra,dec,z)

1. Fully normalized schema

It's possible to store every property of a galaxy as a separate row, so we could maintain separate table with references to the source catalog and record id in source catalog.

CREATE TABLE cats (
	id integer PRIMARY KEY,
	name text
);

-- a list of properties of galaxy
CREATE TABLE properties (
	id integer PRIMARY KEY, 
	name text
);

CREATE TABLE galaxy (
	id integer PRIMARY KEY,
	name text
);

CREATE TABLE galaxy_property (
	gid integer REFERENCES galaxy(id),
	pid integer REFERENCES properties(id),
	value	float,
	UNIQUE(gid,pid)
);

CREATE TABLE map (
	gid integer REFERENCES galaxy(id),	-- galaxy id
	pid integer REFERENCES properties(id),  -- property id
	cid integer REFERENCES cats(id),	-- catalog id
	cidid integer,  			-- id in catalog cid
	UNIQUE(gid,pid,cid)
);

t=# select * from galaxy;
 id | name 
----+------
  1 | N224
t=# select * from galaxy_property where gid=1;
 gid | pid | value 
-----+-----+-------
   1 |   1 |  0.74
   1 |   2 |  41.3
   1 |   3 |     0
t=# select * from properties;
 id | name 
----+------
  1 | ra
  2 | dec
  3 | z
t=# select * from cats;      
 id | name  
----+-------
  1 | LEDA
  2 | NED
  3 | SDSS
  4 | OTHER
(4 rows)

Benefits:

Very flexible
- catalogs could be added online,
- only known properties could be stored for a galaxy

Drawbacks:

Each galaxy requires many rows in galaxy_property and map tables ( as much as the number of properties)
select queries are very cumbersome
There is a problem with storing values of different types in galaxy_property (we use float in this example)

2. Partially normalized schema

Instead of maintaining separate row for each attribute, we use custom composite type, which encapsulates knowledge about value and source catalog where this value comes from. See table galaxy_properties instead of galaxy_property'.

CREATE TYPE val_cid as ( value float, cid integer);
CREATE TABLE galaxy_properties (
        gid integer REFERENCES galaxy(id),
        ra  val_cid,
        dec val_cid,
        z   val_cid
);

Notice, composite type designated as BLOB on picture!

t=# select * from galaxy_properties;
 gid |    ra    |   dec    |   z   
-----+----------+----------+-------
   1 | (0.74,1) | (41.3,1) | (0,3)
(1 row)

For composite type we should use special syntax to access specific component.

t=# select * from galaxy_properties where (z).cid = 3;
 gid |    ra    |   dec    |   z   
-----+----------+----------+-------
   1 | (0.74,1) | (41.3,1) | (0,3)
(1 row)

Benefits:

It's flexible and catalogs could be added online
One row per galaxy
No problem with data type, we could create many composite types

Drawbacks:

Require creating of custom type
select queries are very cumbersome

Example query - identify source (source catalog, record id in source catalog) of redshift of galaxy with gid = 1.

t=# select g.id, g.name as galaxy, c.name as catalog,m.cidid 
from galaxy_properties gp, cats c, map m,galaxy g,properties p
where  (gp.z).cid = c.id and gp.gid = 1 and m.gid = gp.gid and m.cid = c.id 
and g.id = gp.gid and m.pid=p.id and p.name='z';
 id | galaxy | catalog | cidid 
----+--------+---------+-------
  1 | N224   | SDSS    |   178
(1 row)

3. Basic schema

What if we combine tables galaxy and galaxy_properties ? Then, if assume, that (gid,pid) uniquely connected with (cid,cidid), we could a bit easier identified the source of a specific column.

Uniqueness means, that for given property of given galaxy we have onle one record in source catalog !

t=# select g.gid, g.name as galaxy, c.name as catalog,m.cidid
from galaxies g, cats c, map m, properties p 
where g.gid = 1 and m.gid = g.gid and m.cid = c.id  and m.pid=p.id and p.name='z';
 gid | galaxy | catalog | cidid 
-----+--------+---------+-------
   1 | N224   | SDSS    |   178
(1 row)

sncat design

Projects

Links