XMLCatalog

This page describes the XML format I propose for loading new data into CAS, and the technology used to create the Java class that reads such XML files.

The W3C XML schema of the proposed format is available here -- http://vo.astronet.ru/cgi-bin/viewvc.cgi/cas_code/trunk/schema/cas1.xsd?root=cas&view=markup

An example of a file conforming to this schema:

<catalog name="usnob1">
<description>The USNO-B1.0 Catalog</description>
<info>The USNO-B1.0 is a catalog that presents positions, proper motions,
    magnitudes in various optical passbands, and star/galaxy estimators
    for 1,045,913,669 objects derived from 3,648,832,040 separate
    observations. The data were taken from scans of 7,435 Schmidt plates
    taken from various sky surveys during the last 50 years.

    The catalog is expected to be complete down to V=21; the estimated
    accuracies are 0.2arcsec for the positions at J2000, 0.3mag in up to 5
    colors, and 85% accuracy for distinguishing stars from non-stellar
    objects.
</info>
<property_list>
        <property name="Author" value="Monet D.G."/>
</property_list>
<table name="main">
<property_list>
        <property name="ra_main" value="radeg"/>
                <property name="dec_main" value="dedeg"/>
</property_list>
<description>The main table in USNO-B1</description>
<column name="USNO_B1_0" datatype="char" unit="" ucd="ID_MAIN">
<description>Designation of the object</description>
<info>The USNO-B1.0 is arranged in zones of 0.1deg in Declination,
with objects ordered by Right Ascension in each zone.
The USNO-B1.0 is made of the zone number (from 0000 in South Pole
to 1799 in North Pole), followed by a sequential number.
</info>
</column>
<column name="Tycho_2" datatype="char" unit="">
<description>Designation in the Tycho-2 Catalog</description>
</column>
<column name="RAdeg" datatype="double" unit="deg" ucd="POS_EQ_RA_MAIN">
<description>Right Ascension at Eq=J2000, Ep=J2000</description>
<info>The proper motion was applied to compute the RAdeg and DEdeg
values. Please note that the large uncertainties in the proper motions
mean that the RAdeg and DEdeg are less accurate than the mean errors
e_RAdeg and e_DEdeg which apply to the position at the mean Epoch.
</info>
</column>
<column name="DEdeg" datatype="double" unit="deg" ucd="POS_EQ_DEC_MAIN">
<description>Declination at Eq=J2000, Ep=J2000 </description>
<info>The proper motion was applied to compute the RAdeg and DEdeg
values. Please note that the large uncertainties in the proper motions
mean that the RAdeg and DEdeg are less accurate than the mean errors
e_RAdeg and e_DEdeg which apply to the position at the mean Epoch.
</info>
</column>
<column name="e_RAdeg" datatype="short" unit="mas">
<description>Mean error on RAdeg*cos(DEdeg) at Epoch</description>
</column>
<column name="e_DEdeg" datatype="short" unit="mas">
<description>Mean error on DEdeg at Epoch</description>
</column>
<column name="Epoch" datatype="float" unit="yr" ucd="TIME_EPOCH">
<description>Mean epoch of observation</description>
<info>The proper motion was applied to compute the RAdeg and DEdeg
values. Please note that the large uncertainties in the proper motions
mean that the RAdeg and DEdeg are less accurate than the mean errors
e_RAdeg and e_DEdeg which apply to the position at the mean Epoch.
</info>
</column>
<column name="pmRA" datatype="int" unit="mas/yr" ucd="POS_EQ_PMRA">
<description>Proper motion in RA (relative to YS4.0)</description>
</column>
<column name="pmDE" datatype="int" unit="mas/yr" ucd="POS_EQ_PMDEC">
<description>Proper motion in DE (relative to YS4.0)</description>
</column>
<column name="muPr" datatype="short" unit="0.1">
<description>Total Proper Motion probability</description>
<info>For Tycho-2 stars, the Total Proper Motion probability is
not given, and the number of detections Ndet is set to zero.
For other stars, Ndet is 2 or more.
</info>
</column>
<column name="e_pmRA" datatype="short" unit="mas/yr">
<description>Mean error on pmRA</description>
</column>
<column name="e_pmDE" datatype="short" unit="mas/yr">
<description>Mean error on pmDE</description>
</column>
<column name="fit_RA" datatype="short" unit="100mas">
<description>Mean error on RA fit</description>
</column>
<column name="fit_DE" datatype="short" unit="100mas">
<description>Mean error on DE fit</description>
</column>
<column name="Ndet" datatype="short" unit="">
<description>Number of detections</description>
<info>For Tycho-2 stars, the Total Proper Motion probability is
not given, and the number of detections Ndet is set to zero.
For other stars, Ndet is 2 or more.
</info>
</column>
<column name="Flags" datatype="char" unit="">
<description>Flags on object</description>
<info>. denotes the absence of any flag
M = Existence in a proper motion catalog,
s = object on a diffraction spike
Y = Correlation with YS4.0 catalog (Monet, in prep.)
</info>
</column>
<column name="B1mag" datatype="float" unit="mag" ucd="PHOT_PHG_B">
<description>First blue magnitude</description>
</column>
<column name="B1C" datatype="short" unit="">
<description>source of photometric calibration</description>
<info>the photometric calibration is represented by a number:
0 = bright photometric standards on the plate
1 = faint photometric standard on the plate
2 = faint photometric standard one plate away (on overlap plate)
3 = faint photometric standard two plate away (on overlap of overlap)
etc
</info>
</column>
<column name="B1S" datatype="short" unit="">
<description>Survey number</description>
</column>
<column name="B1f" datatype="short" unit="">
<description>Field number in survey</description>
</column>
<column name="B1s_g" datatype="short" unit="">
<description>Star-galaxy separation</description>
<info>The star/galaxy separation is a measure of the similarity
of the point-spread function to a stellar profile:
 0 means quite dissimilar -- i.e. a non-stellar object
11 means quite similar -- i.e. a stellar object
</info>
</column>
<column name="B1xi" datatype="float" unit="arcsec">
<description>Residual in X direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="B1eta" datatype="float" unit="arcsec">
<description>Residual in Y direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="R1mag" datatype="float" unit="mag" ucd="PHOT_PHG_R">
<description>First red magnitude</description>
</column>
<column name="R1C" datatype="short" unit="">
<description>source of photometric calibration</description>
<info>the photometric calibration is represented by a number:
0 = bright photometric standards on the plate
1 = faint photometric standard on the plate
2 = faint photometric standard one plate away (on overlap plate)
3 = faint photometric standard two plate away (on overlap of overlap)
etc
</info>
</column>
<column name="R1S" datatype="short" unit="">
<description>Survey number</description>
</column>
<column name="R1f" datatype="short" unit="">
<description>Field number in survey</description>
</column>
<column name="R1s_g" datatype="short" unit="">
<description>Star-galaxy separation</description>
<info>The star/galaxy separation is a measure of the similarity
of the point-spread function to a stellar profile:
 0 means quite dissimilar -- i.e. a non-stellar object
11 means quite similar -- i.e. a stellar object
</info>
</column>
<column name="R1xi" datatype="float" unit="arcsec">
<description>Residual in X direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="R1eta" datatype="float" unit="arcsec">
<description>Residual in Y direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="B2mag" datatype="float" unit="mag" ucd="PHOT_PHG_B">
<description>Second blue magnitude</description>
</column>
<column name="B2C" datatype="short" unit="">
<description>source of photometric calibration</description>
<info>the photometric calibration is represented by a number:
0 = bright photometric standards on the plate
1 = faint photometric standard on the plate
2 = faint photometric standard one plate away (on overlap plate)
3 = faint photometric standard two plate away (on overlap of overlap)
etc
</info>
</column>
<column name="B2S" datatype="short" unit="">
<description>Survey number</description>
</column>
<column name="B2f" datatype="short" unit="">
<description>Field number in survey</description>
</column>
<column name="B2s_g" datatype="short" unit="">
<description>Star-galaxy separation</description>
<info>The star/galaxy separation is a measure of the similarity
of the point-spread function to a stellar profile:
 0 means quite dissimilar -- i.e. a non-stellar object
11 means quite similar -- i.e. a stellar object
</info>
</column>
<column name="B2xi" datatype="float" unit="arcsec">
<description>Residual in X direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="B2eta" datatype="float" unit="arcsec">
<description>Residual in Y direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="R2mag" datatype="float" unit="mag" ucd="PHOT_PHG_R">
<description>Second red magnitude</description>
</column>
<column name="R2C" datatype="short" unit="">
<description>source of photometric calibration</description>
<info>the photometric calibration is represented by a number:
0 = bright photometric standards on the plate
1 = faint photometric standard on the plate
2 = faint photometric standard one plate away (on overlap plate)
3 = faint photometric standard two plate away (on overlap of overlap)
etc
</info>
</column>
<column name="R2S" datatype="short" unit="">
<description>Survey number</description>
</column>
<column name="R2f" datatype="short" unit="">
<description>Field number in survey</description>
</column>
<column name="R2s_g" datatype="short" unit="">
<description>Star-galaxy separation</description>
<info>The star/galaxy separation is a measure of the similarity
of the point-spread function to a stellar profile:
 0 means quite dissimilar -- i.e. a non-stellar object
11 means quite similar -- i.e. a stellar object
</info>
</column>
<column name="R2xi" datatype="float" unit="arcsec">
<description>Residual in X direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="R2eta" datatype="float" unit="arcsec">
<description>Residual in Y direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="Imag" datatype="float" unit="mag" ucd="PHOT_PHG_I">
<description>Infrared (N) magnitude</description>
</column>
<column name="IC" datatype="short" unit="">
<description>source of photometric calibration</description>
<info>the photometric calibration is represented by a number:
0 = bright photometric standards on the plate
1 = faint photometric standard on the plate
2 = faint photometric standard one plate away (on overlap plate)
3 = faint photometric standard two plate away (on overlap of overlap)
etc
</info>
</column>
<column name="_IS" datatype="short" unit="">
<description>Survey number</description>
</column>
<column name="_If" datatype="short" unit="">
<description>Field number in survey</description>
</column>
<column name="Is_g" datatype="short" unit="">
<description>Star-galaxy separation</description>
<info>The star/galaxy separation is a measure of the similarity
of the point-spread function to a stellar profile:
 0 means quite dissimilar -- i.e. a non-stellar object
11 means quite similar -- i.e. a stellar object
</info>
</column>
<column name="Ixi" datatype="float" unit="arcsec">
<description>Residual in X direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<column name="Ieta" datatype="float" unit="arcsec">
<description>Residual in Y direction</description>
<info>Distance, along the x- and y- direction, of the object
position compared to the mean epoch.
</info>
</column>
<data>
<externaldata format="fixed-width">
        <property_list>
                <property name="fields" value="1 12 14 25 27 36 37 46 48 50 52 54 56 61 63 68 70 75 77 77 79 81 83 85 87 87 89 89 91 91 93 95 98 102 104 104 106 106 108 110 112 113 115 120 121 126 129 133 135 135 137 137 139 141 143 144 146 151 152 157 160 164 166 166 168 168 170 172 174 175 177 182 183 188 191 195 197 197 199 199 201 203 205 206 208 213 214 219 222 226 228 228 230 230 232 234 236 237 239 244 245 250">
                </property>
        </property_list>
<!--    <source uri="file:///home/math/vo_work/cas/demo_data/out.sam"/>-->
        <source uri="file:///tmp/usno_fifo"/>

</externaldata>
</data>
</table>
</catalog>
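
The fields property of the fixed-width example above appears to hold 51 start/end pairs, one per column element. A minimal Java sketch of slicing one record under that reading follows. The pairs are assumed to be 1-based and inclusive; the class FixedWidthFields and the sample record are made up for illustration and are not part of CAS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CAS: slices one fixed-width record
// using a "fields" property value such as "1 12 14 25 27 36".
class FixedWidthFields {

    // Assumption: the value is a flat list of 1-based, inclusive
    // (start, end) character positions, two numbers per column.
    static List<String> split(String fields, String record) {
        String[] tokens = fields.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of field positions");
        }
        List<String> values = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            int start = Integer.parseInt(tokens[i]);
            int end = Integer.parseInt(tokens[i + 1]);
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("bad field range " + start + "-" + end);
            }
            // Records shorter than the range give a truncated or empty value.
            int from = Math.min(start - 1, record.length());
            int to = Math.min(end, record.length());
            values.add(record.substring(from, to).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up record: three fields in columns 1-12, 14-25 and 27-36.
        String record = "0000-0000001 1234-00567-1 123.456789";
        System.out.println(split("1 12 14 25 27 36", record));
        // prints [0000-0000001, 1234-00567-1, 123.456789]
    }
}
```

Each value is trimmed, so padded numeric fields come out ready to be parsed according to the datatype of the matching column.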

A second example, a catalog with two tables:

<?xml version="1.0" encoding="utf-8" ?>
<catalog name="cat1">
<info>This is my first CAS catalog</info>
<property_list>
        <property name="something" value="SSSSSSSSS"/>
        <property name="something different :)" value="1 4 5 6 5"/>
</property_list>
        <table name="gal1">
                <info>The first table in cat1</info>
                <description>Something long..........</description>
                <column name="name" ucd="name_MAIN" datatype="char"/>  
                <column name="RA" ucd="POS_EQ_RA_MAIN" datatype="double" unit="deg"/>
                <column name="Dec" ucd="POS_EQ_DEC_MAIN" datatype="double" unit="deg"/>
                <column name="RAErr" ucd="ERROR" datatype="double" unit="deg"/>
                <column name="DecErr" ucd="ERROR" datatype="double" unit="deg"/>
                <column name="Epoch" ucd="TIME_EPOCH" datatype="double" unit="yr"/>
                <column name="Fmag" ucd="PHOT_PHG_R" datatype="double" unit="mag"/>
                <column name="FmagErr" ucd="ERROR" datatype="double" unit="mag"/>
                <column name="Jmag" ucd="PHOT_PHG_B" datatype="double" unit="mag"/>
                <column name="JmagErr" ucd="ERROR" datatype="double" unit="mag"/>
                <column name="Vmag" ucd="PHOT_PHG_V" datatype="double" unit="mag"/>
                <column name="VmagErr" ucd="ERROR" datatype="double" unit="mag"/>
                <column name="Nmag" ucd="PHOT_PHG_N" datatype="double" unit="mag"/>
                <column name="NmagErr" ucd="ERROR" datatype="double" unit="mag"/>
                <column name="classification" ucd="CLASS_OBJECT" datatype="char"/>
                <column name="semiMajor" ucd="EXTENSION_RAD" datatype="double"/>
                <column name="eccentricity" ucd="PHYS_ECCENTRICITY" datatype="double"/>
                <column name="positionAngle" ucd="POS_POS-ANG" datatype="double" unit="deg"/>
                <column name="status" ucd="CODE_QUALITY" datatype="double"/>
                <externaldata format="fixed-width" encoding="gzip">
                     <property_list>
                         <property name="something" value="SSSSSSSSS"/>
                         <property name="widths" value="5 10 12 4 14 12 8 10 9 10 11 8 3 1"/>
                     </property_list>
                     <source uri="http://xxx.xxx/xxx.gz"/>
                     <source uri="ftp://yyy.yyy/yy.gz"/>
                     <source uri="file:///tmp/something.gz"/>
                </externaldata>
        </table>

        <!-- Second table in the catalogue -->
        
        <table name="heasarc_abell_9001">
                <description>Abell Clusters</description>
                <column name="unique_id" datatype="int"  ucd="ID_MAIN">
                        <description>Integer key</description>
                </column>      
                <column name="name" datatype="char" >
                        <description>Abell Catalog Number (s = Supplemental Cluster Catalog)</description>
                </column>
                <column name="ra" datatype="double" unit="degree" ucd="POS_EQ_RA_MAIN">
                        <description> Right Ascension (hh mm.m)</description>
                </column>
                <column name="dec" datatype="double" unit="degree" ucd="POS_EQ_DEC_MAIN">
                        <description>Declination (dd mm)</description>
                </column>
                <column name="_count" datatype="int" >
                        <description>Number of Cluster Members</description>
                </column>
                <column name="bmtype" datatype="char"  >
                        <description>Bautz-Morgan Type, :=mean of obs, ?=quest</description>
                </column>
                <column name="redshift" datatype="float">
                        <description>Redshift</description>
                </column>
                <column name="rich" datatype="long" >
                        <description>Abell Richness Class (Abell 1958)</description>
                </column>
                <column name="dist" datatype="long">
                        <description>Abell Distance Class (Abell 1958)</description>
                </column>
                <column name="vmag" datatype="float"  >
                        <description>V Magnitude for 10th Ranked Member, Corrected for Extinction</description>
                </column>
                <column name="Search_Offset" datatype="double"  >
                        <description>Something unknown :)</description>
                </column>               
                <data>
                        <externaldata format="delimited"  encoding="none">
                                <property_list>
                                        <property name="delimiter" value=","/>
                                </property_list>
                                <source uri="file:///home/math/vo_work/cas/demo_data/xx.csv_long"/>
                        </externaldata>
                </data> 

        </table>
</catalog>  
          

I decided to use an XML schema as the basis for inserting data into CAS because there are technologies that convert an XML schema into Java classes; these classes parse XML files conforming to the schema and provide a clear interface for accessing their data. One such technology is JAXB (Java Architecture for XML Binding), built by Sun and part of JWSDP (Java Web Services Developer Pack).
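
To give a feel for what reading such a file from Java involves, here is a minimal sketch. It does not use classes generated from the schema; it walks the document with the DOM parser that ships with the JDK (javax.xml.parsers). The class CatalogReader and its output format are invented for this illustration, and it only assumes the catalog, table and column elements and the name, datatype and unit attributes seen in the examples above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reader for illustration; not the generated CAS classes.
class CatalogReader {

    // Returns one line per column: "catalog.table.column datatype [unit]".
    static List<String> describe(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse catalog description", e);
        }
        Element catalog = doc.getDocumentElement();
        List<String> lines = new ArrayList<>();
        for (Element table : children(catalog, "table")) {
            for (Element column : children(table, "column")) {
                String unit = column.getAttribute("unit"); // "" when the attribute is absent
                lines.add(catalog.getAttribute("name") + "." + table.getAttribute("name") + "."
                        + column.getAttribute("name") + " " + column.getAttribute("datatype")
                        + (unit.isEmpty() ? "" : " [" + unit + "]"));
            }
        }
        return lines;
    }

    // Collects the direct child elements of parent that have the given tag name.
    static List<Element> children(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<catalog name=\"usnob1\"><table name=\"main\">"
                + "<column name=\"RAdeg\" datatype=\"double\" unit=\"deg\"/>"
                + "<column name=\"Flags\" datatype=\"char\" unit=\"\"/>"
                + "</table></catalog>";
        describe(xml).forEach(System.out::println);
    }
}
```

Classes generated from the schema would replace the string-based attribute lookups with typed accessors, which is the main attraction of the binding approach.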


A separate question that matters for this format is its relation to VOTable. I decided not to use VOTable as the format for loading input catalogues, because the VOTable data model differs somewhat from what we need for loading new catalogues. Still, I think VOTables can easily be used to describe input catalogues.

I see a rather clean mapping between the VOTable <resource> tag and our <catalog> tag. The VOTable <table> and <field> tags correspond to our <table> and <column> tags. So currently I see that our method of loading new catalogues should either provide a wrapper around a VOTable parser that exposes the interface of our XML format, or do an XSLT transformation…
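
The XSLT option could look roughly like the following sketch, which runs a small stylesheet through the JDK's javax.xml.transform API. Everything here is an assumption made for illustration: the class name VOTableToCatalog, a namespace-free VOTable with a single RESOURCE, and copying the name, datatype, unit and ucd attributes through unchanged. Data rows, PARAM elements and the VOTable namespace are not handled.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical converter for illustration; not part of CAS.
class VOTableToCatalog {

    // Maps RESOURCE -> catalog, TABLE -> table, FIELD -> column.
    // Assumes no XML namespace and exactly one RESOURCE per VOTABLE.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/VOTABLE'><xsl:apply-templates select='RESOURCE'/></xsl:template>"
      + "<xsl:template match='RESOURCE'>"
      + "<catalog name='{@name}'><xsl:apply-templates select='TABLE'/></catalog>"
      + "</xsl:template>"
      + "<xsl:template match='TABLE'>"
      + "<table name='{@name}'><xsl:apply-templates select='FIELD'/></table>"
      + "</xsl:template>"
      + "<xsl:template match='FIELD'>"
      + "<column name='{@name}' datatype='{@datatype}'>"
      + "<xsl:copy-of select='@unit|@ucd'/>"
      + "<xsl:if test='DESCRIPTION'>"
      + "<description><xsl:value-of select='DESCRIPTION'/></description>"
      + "</xsl:if>"
      + "</column>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to a VOTable document given as a string.
    static String transform(String votable) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(votable)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("cannot transform VOTable", e);
        }
    }

    public static void main(String[] args) {
        String votable = "<VOTABLE><RESOURCE name='usnob1'><TABLE name='main'>"
                + "<FIELD name='RAdeg' datatype='double' unit='deg' ucd='POS_EQ_RA_MAIN'>"
                + "<DESCRIPTION>Right Ascension</DESCRIPTION></FIELD>"
                + "</TABLE></RESOURCE></VOTABLE>";
        System.out.println(transform(votable));
    }
}
```

The wrapper option would instead keep the VOTable parser and adapt its objects to the interface of our format; the stylesheet route has the advantage that the loader itself only ever sees one input format.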