ISTP Metadata Guidelines: Standard Attributes

Bare Bones

Global and Variable Attributes are now separated out from extra (recommended or optional) attributes. These Bare Bones attributes are just enough to make the data correctly and independently usable by someone not connected with the instrument team. This chapter, together with How to Structure Data in CDF, constitutes the ISTP standard. Usage of this standard increases the likelihood of useful comparisons between data sets.



Global Attributes

Global-scope-attributes are used to provide information about the data set as an entity. Note that CDF attributes are case-sensitive and must exactly follow what is shown here.

The Bare Bones Global Attributes are listed here with example values. These are just enough to make the data correctly and independently usable by someone not connected with the instrument team, and hence make a good archive product.
     ATTRIBUTE                              EXAMPLE VALUE
--------------------------------------------------------------------

     "Project"                     { "ISTP>International " - 
                                     "Solar-Terrestrial Physics" }.
     "Source_name"                 { "GEOTAIL>Geomagnetic Tail" }.
     "Discipline"                  { "Space Physics>Magnetospheric Science" }.
     "Data_type"                   { "K0>Key Parameter" }.
     "Descriptor"                  { "EPI>Energetic Particles" -
                                     " and Ion Composition" }.
     "Data_version"                { "1" }.
     "Logical_file_id"             { "GE_K0_EPI_19920908_V01" }.
     "PI_name"                     { "D. Williams" }.
     "PI_affiliation"              { "JHU/APL" }.
     "TEXT"                        { "reference to journal article" }.
--------------------------------------------------------------------
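
As a minimal sketch of how the Bare Bones list above might be checked before archiving, the Python below tests a set of global attributes for missing or empty entries. The mapping of attribute name to a list of entry strings is a hypothetical in-memory representation chosen for illustration, not the interface of any CDF library, and the function name is likewise an assumption.

```python
# Bare Bones global attribute names, copied from the table above.
REQUIRED_GLOBAL_ATTRIBUTES = (
    "Project", "Source_name", "Discipline", "Data_type", "Descriptor",
    "Data_version", "Logical_file_id", "PI_name", "PI_affiliation", "TEXT",
)


def missing_global_attributes(attrs):
    """Return the Bare Bones names that are absent or empty in attrs.

    attrs is a hypothetical mapping of attribute name to a list of entry
    strings. The lookup is case-sensitive, as CDF attribute names are.
    """
    return [name for name in REQUIRED_GLOBAL_ATTRIBUTES
            if not any(entry.strip() for entry in attrs.get(name, []))]


example = {
    "Project": ["ISTP>International Solar-Terrestrial Physics"],
    "Source_name": ["GEOTAIL>Geomagnetic Tail"],
    "Discipline": ["Space Physics>Magnetospheric Science"],
    "Data_type": ["K0>Key Parameter"],
    "Descriptor": ["EPI>Energetic Particles and Ion Composition"],
    "Data_version": ["1"],
    "Logical_file_id": ["GE_K0_EPI_19920908_V01"],
    "PI_name": ["D. Williams"],
    "PI_affiliation": ["JHU/APL"],
    "text": ["reference to journal article"],  # wrong case on purpose
}

print(missing_global_attributes(example))  # ['TEXT']
```

The lower-case "text" entry is reported as missing because attribute names must match exactly.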

The Global Attributes needed for the CDAWeb software are listed next with example values. These should be included if your data is to be made available through CDAWeb. See below for the full set of defined Global Attributes.

     ATTRIBUTE                              EXAMPLE VALUE
--------------------------------------------------------------------

     "Instrument_type"             { "Magnetic Fields (space)" }.
     "Mission_group"               { "Geotail" }.
     "Logical_source"              { "GE_K0_EPI" }.
     "Logical_source_description"  { "Geotail Magnetic Field Key Parameters" }.
--------------------------------------------------------------------



Global Attribute Definitions

Acknowledgement --- recommended

Text string at the PI's disposal, giving the acknowledgement expected when the data are cited.

ADID_ref --- recommended

This attribute stores the control authority identifier associated with the detached SFDU label. If no control authority identifier has been assigned, then the identifier associated with the ISTP Guidelines (NSSD0241) or with CDF (NSSD0110) can be used.

Data_type --- required

This attribute identifies the data type of the CDF data set. Both a long name and a short name are given. For ISTP exchangeable data products the values are "Kn>Key Parameter" for approximately minute averaged survey data, and "Hn>High Resolution data" for certified data of higher resolution than Key Parameters. Here n can run from 0 to 9 to allow for more than one kind of data product. For Cluster/CSDS this can either be "SP>Summary Parameter" or "PP>Prime Parameter". Other possible data types may be defined in the future. If any of these data sets are modified or used to produce derived products, the data type should be, e.g., "Mn>Modified Data n", where n is from 0 to 9.
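
Several attributes in this chapter (Data_type, Descriptor, Source_name, Project) carry a short name and a long name joined by ">". A minimal sketch of splitting such a value is given here; the function name is hypothetical, and splitting on the first ">" only is an assumption that the short name never contains that character.

```python
def split_short_long(value):
    """Split an ISTP 'short>long' attribute value into (short, long)."""
    # partition splits on the first '>' only, so a '>' inside the long
    # name (not expected, but possible) would be kept in the long name.
    short, sep, long_name = value.partition(">")
    if not sep or not short.strip() or not long_name.strip():
        raise ValueError("expected 'short>long', got %r" % (value,))
    return short.strip(), long_name.strip()


print(split_short_long("K0>Key Parameter"))
print(split_short_long("EPI>Energetic Particles and Ion Composition"))
```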

Data_version --- required

This attribute identifies the version of a particular CDF data file for a given date, e.g., the file GE_K0_MGF_19920923_V01 is the first version of data for 1992 September 23. Each time this particular data file is reproduced, for recalibration or other reasons, the Data_version is incremented by 1. Data_version always starts at "1".

Descriptor --- required

This attribute identifies the name of the instrument or sensor that collected the data. Both a long name and a short name are given. An example for ISTP is "EPI>Energetic Particles and Ion Composition". The short name should be limited to from 2 to 4 characters for consistency with ISTP. This attribute should be single valued.

Discipline --- required

This attribute describes both the science discipline and subdiscipline. The list for space physics is:

Generated_by --- recommended

This attribute allows for the generating data center/group to be identified.

Generation_date --- recommended

Date stamps the creation of the file using the syntax yyyymmdd, e.g., "19920923". This is distinct from the date in "Validate" below, which records the times of later validation processes.
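
For illustration only, the yyyymmdd syntax can be produced with the Python standard library as follows; the function name is an assumption.

```python
from datetime import date


def generation_date(day):
    """Format a date in the yyyymmdd syntax used by Generation_date."""
    return day.strftime("%Y%m%d")


print(generation_date(date(1992, 9, 23)))  # 19920923
```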

Instrument_type --- recommended

This attribute is used as a single value to facilitate making choices of instrument type through CDAWeb.

Logical_file_id --- required

This attribute stores the name of the CDF file using the ISTP naming convention (source_name / data_type / descriptor / date / data_version), e.g., GE_K0_MGF_19920923_V01. This attribute is required (1) to allow storage of the full name on IBM PCs, and (2) to avoid loss of the original source in the case of accidental (or intentional) renaming. For CDFs created on the ISTP CDHF, the correct Logical_file_id will be filled in by an ICSS support routine.
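
The naming convention can be sketched in Python as below. The function names are hypothetical. Two details are inferred from the example GE_K0_MGF_19920923_V01 rather than stated by the convention, and are therefore assumptions: the version is zero-padded to two digits, and no component contains an underscore.

```python
import re

# Pattern inferred from the example file name; the assumption is that the
# components themselves contain only letters and digits.
_FILE_ID = re.compile(
    r"^(?P<source>[A-Za-z0-9]+)_(?P<data_type>[A-Za-z0-9]+)_"
    r"(?P<descriptor>[A-Za-z0-9]+)_(?P<date>\d{8})_V(?P<version>\d+)$"
)


def build_logical_file_id(source, data_type, descriptor, date, version):
    """Join the short names, the yyyymmdd date and the version number."""
    return "%s_%s_%s_%s_V%02d" % (source, data_type, descriptor, date, version)


def parse_logical_file_id(file_id):
    """Split a Logical_file_id back into its five components."""
    match = _FILE_ID.match(file_id)
    if match is None:
        raise ValueError("not an ISTP-style file id: %r" % (file_id,))
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


print(build_logical_file_id("GE", "K0", "MGF", "19920923", 1))
print(parse_logical_file_id("GE_K0_MGF_19920923_V01"))
```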

Logical_source --- recommended

This attribute carries source_name, data_type, and descriptor information. Used by CDAWeb.

Logical_source_description --- recommended

This attribute writes out the full words associated with the abbreviated Logical_source above, e.g., "Geotail Magnetic Field Key Parameters". Used by CDAWeb.

Mission_group --- recommended

This attribute is used as a single value to facilitate making choices of source through CDAWeb. Valid values include (but are not restricted to) :

MODS --- recommended

This attribute is an SPDF standard global-scope-attribute which is used to denote the history of modifications made to the CDF data set. The MODS attribute should contain a description of all significant changes to the data set. This attribute is not directly tied to Data_version, but each version produced will contain the relevant modifications. This attribute can have as many entries as necessary to contain the desired information.

Parents --- optional

This attribute lists the parent CDF(s) for files of derived and merged data sets. Subsequent entry values are used for multiple parents. The syntax for a CDF parent would be, e.g., "CDF>logical_file_id".

PI_affiliation --- required

This attribute value should include a recognizable abbreviation.

PI_name --- required

This attribute value should include first initial and last name.

Project --- required

This attribute identifies the name of the project and indicates ownership. For ISTP missions and investigations, the value used is "ISTP>International Solar-Terrestrial Physics". For the Cluster mission, the value is "STSP Cluster>Solar Terrestrial Science Programmes, Cluster". Other acceptable values are "IACG>Inter-Agency Consultative Group", "CDAWxx>Coordinated Data Analysis Workshop xx", "SPDS>Space Physics Data System", and "NSSDC>National Space Science Data Center Archived Data". Others may be defined in future. This attribute can be multi-valued if the data has been supplied to more than one project.

Rules_of_use --- recommended

Text containing information on, e.g., citability and PI access restrictions. This may point to a World Wide Web page specifying the rules of use.

Skeleton_version --- optional

This is a text attribute containing the skeleton file version number. This is a required attribute for Cluster, but for IACG purposes it exists if experimenters want to track it.

Software_version --- optional

This is a required attribute for Cluster, but for IACG purposes it exists if experimenters want to track it.

Source_name --- required

This attribute identifies the mission or investigation that contains the sensors. For ISTP, this is the mission name for spacecraft missions or the investigation name for ground-based or theory investigations. Both a long name and a short name are provided. This attribute should be single valued. Examples:

TEXT --- required

This attribute is an SPDF standard global-scope-attribute which is a text description of the experiment whose data is included in the CDF. A reference to a journal article(s) or to a World Wide Web page describing the experiment is essential, and constitutes the minimum requirement. A written description of the data set is also desirable. This attribute can have as many entries as necessary to contain the desired information.

Time_resolution --- recommended

specifies time resolution of the file, e.g., "3 seconds".

TITLE --- optional

This attribute is an SPDF standard global-scope-attribute which is a title for the data set, e.g., "Geotail EPIC Key Parameters".

Validate --- optional

Details to be specified. This attribute is written by software for automatic validation of features such as the structure of the CDF file on a simple pass/fail criterion. The software will test that all expected attributes are present and, where possible, have reasonable values. The syntax is likely to be of the form "test>result>where-done>date". It is not the same as data validation.



Variable Attributes

Variable-scope-attributes are linked with each individual variable, and provide additional information about each variable. A standard set of these attributes is very important, for this is where the information can be stored in a commonly defined manner. Note that CDF attributes are case-sensitive and must exactly follow what is shown here. The variable attributes can be listed in any order.

See below for Bare Bones Attributes. Each variable needs adequate information about structure and for interpretation, for listing, and for plotting.

Structure.

These are all Bare Bones Attributes. First we define each variable as either data, support_data, or metadata as discussed in How to Structure Data in CDF, and put this information in the variable attribute VAR_TYPE. If the variable is time varying it needs a DEPEND_0 attribute defined with the value "Epoch". If the variable is 1-dimensional it needs a DEPEND_1 attribute defined with the value being the name of the variable on which it depends. If the variable is 2-dimensional it needs both DEPEND_1 and DEPEND_2 attributes defined.

Interpretation.

The Bare Bones label attributes that must be defined for all data variables and for most support_data variables are: CATDESC, FIELDNAM, LABLAXIS (LABL_PTR_1), UNITS (UNIT_PTR). There are extra attributes used to provide additional information: DICT_KEY, and VAR_NOTES.

Listing and plotting.

The Bare Bones attributes used specifically in the listing or display of data are: VALIDMIN/VALIDMAX, FILLVAL, and FORMAT. There are extra attributes used to provide additional information: DISPLAY_TYPE, SCALETYP, and AVG_TYPE.
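
As a rough sketch, the Bare Bones variable attributes named in the paragraphs above can be checked for a single data variable as follows. The dictionary of attributes is a hypothetical in-memory stand-in for a real CDF variable, the function name is an assumption, and the pointer alternatives (LABL_PTR_1, UNIT_PTR, FORM_PTR) are accepted in place of their direct counterparts as described in the attribute definitions.

```python
# Each entry is a tuple of alternatives; at least one must be present.
BARE_BONES_VARIABLE_ATTRIBUTES = (
    ("VAR_TYPE",), ("CATDESC",), ("FIELDNAM",),
    ("LABLAXIS", "LABL_PTR_1"), ("UNITS", "UNIT_PTR"),
    ("VALIDMIN",), ("VALIDMAX",), ("FILLVAL",),
    ("FORMAT", "FORM_PTR"),
)


def missing_variable_attributes(attrs):
    """Return the Bare Bones attributes (or alternatives) absent from attrs."""
    return ["/".join(alternatives)
            for alternatives in BARE_BONES_VARIABLE_ATTRIBUTES
            if not any(name in attrs for name in alternatives)]


# Attributes of a scalar density variable, with FORMAT left out on purpose.
density = {
    "VAR_TYPE": "data",
    "DEPEND_0": "Epoch",
    "CATDESC": "Proton number density determined from a moment calculation, scalar",
    "FIELDNAM": "Proton No. Density",
    "LABLAXIS": "Np",
    "UNITS": "no/cc",
    "VALIDMIN": 0.0,
    "VALIDMAX": 50.0,
    "FILLVAL": -1.0e31,
}

print(missing_variable_attributes(density))  # ['FORMAT/FORM_PTR']
```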

Bare Bones Examples: see Table 1 for scalar data, Table 2 for vector data, and Table 3 for 2D data, with example values in each case.



Table 1

        Variable Attributes for Scalar "Density"

Attribute Name       Data Type     Example of Attribute Value 

VAR_TYPE             CDF_CHAR      data  
DEPEND_0             CDF_CHAR      "Epoch"  

CATDESC              CDF_CHAR      Proton number density determined 
                                   from a moment calculation, scalar   
FIELDNAM             CDF_CHAR      Proton No. Density 
LABLAXIS             CDF_CHAR      Np 
UNITS                CDF_CHAR      no/cc  

VALIDMIN             CDF_REAL4     0.0  
VALIDMAX             CDF_REAL4     50.0  
FILLVAL              CDF_REAL4     -1.0E31   
FORMAT               CDF_CHAR      F6.2  


                     "Attached Variables"  

                          "Epoch"  
                   VAR_TYPE = support_data 



Table 2

          Variable Attributes for Vector "Magnetic Field"
  
Attribute Name           Data Type          Attribute Value  

VAR_TYPE                 CDF_CHAR           data  
DEPEND_0                 CDF_CHAR          "Epoch"  
DEPEND_1                 CDF_CHAR          "cartesian"  

CATDESC                  CDF_CHAR           Magnetic Field Vector, 
                                            GSE cartesian coordinates  
FIELDNAM                 CDF_CHAR           Magnetic Field Vector   
LABL_PTR_1               CDF_CHAR           "label_b" 
UNITS                    CDF_CHAR           nT  

VALIDMIN                 CDF_REAL4         -10000.0  
VALIDMAX                 CDF_REAL4          10000.0  
FILLVAL                  CDF_REAL4         -1.0E31   
FORMAT                   CDF_CHAR           F8.3  


                       "Attached Variables" 

                              "Epoch"  
                       VAR_TYPE = support_data 

                            "cartesian"  
                       VAR_TYPE = support_data     
                          [1] = "x" 
                          [2] = "y" 
                          [3] = "z"  

                             "label_b"  
                        VAR_TYPE = metadata     
                          [1] = "Bx GSE" 
                          [2] = "By GSE" 
                          [3] = "Bz GSE"
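
The way a pointer attribute such as LABL_PTR_1 is followed to its attached variable can be sketched as below. The nested dictionary is a hypothetical in-memory model of the CDF in Table 2, the variable name "B_field" is invented for the example, and the function is not part of any CDF library.

```python
# Hypothetical in-memory model of the variables shown in Table 2.
cdf = {
    "B_field": {
        "attrs": {
            "VAR_TYPE": "data",
            "DEPEND_0": "Epoch",
            "DEPEND_1": "cartesian",
            "LABL_PTR_1": "label_b",
            "UNITS": "nT",
        },
    },
    "cartesian": {"attrs": {"VAR_TYPE": "support_data"},
                  "values": ["x", "y", "z"]},
    "label_b": {"attrs": {"VAR_TYPE": "metadata"},
                "values": ["Bx GSE", "By GSE", "Bz GSE"]},
}


def axis_labels(cdf, name):
    """Return the labels for a variable, following LABL_PTR_1 if present."""
    attrs = cdf[name]["attrs"]
    if "LABL_PTR_1" in attrs:
        # The attribute value is the name of a variable in the same CDF.
        return list(cdf[attrs["LABL_PTR_1"]]["values"])
    return [attrs["LABLAXIS"]]


print(axis_labels(cdf, "B_field"))  # ['Bx GSE', 'By GSE', 'Bz GSE']
```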



Table 3

              Variable Attributes for 2D "Flux"

Attribute Name        Data Type         Attribute Value

VAR_TYPE              CDF_CHAR          data  
DEPEND_0              CDF_CHAR          "Epoch"  
DEPEND_1              CDF_CHAR          "Energy"  
DEPEND_2              CDF_CHAR          "Pitch_angle" 

CATDESC               CDF_CHAR          Electron Flux at 8 energies 
                                        5-1361 keV and 5 pitch  
                                        angles 30-150 deg  
FIELDNAM              CDF_CHAR          Electron Flux  
LABL_PTR_1            CDF_CHAR          "energy_Flux"  
LABL_PTR_2            CDF_CHAR          "pitch_Flux"  
UNITS                 CDF_CHAR          no./cm**2-s  

VALIDMIN              CDF_REAL4         0.0  
VALIDMAX              CDF_REAL4         1.0e11  
FILLVAL               CDF_REAL4         -1.0E31   
FORMAT                CDF_CHAR          E11.3  
 

                      "Attached Variables"  

       "Epoch" 
VAR_TYPE = support_data

       "Energy"
VAR_TYPE = support_data
       [1] = 8.0
       [2] = 25.0
       [3] = 54.0
       [4] = 120.0
       [5] = 210.0
       [6] = 380.0
       [7] = 760.0
       [8] = 1220.0

    "Pitch_angle"
VAR_TYPE = support_data
       [1] = 30.0
       [2] = 60.0
       [3] = 90.0
       [4] = 120.0
       [5] = 150.0

    "energy_Flux"
VAR_TYPE = metadata
       [1] = "e- Flux 8keV"
       [2] = "e- Flux 25keV"
       [3] = "e- Flux 54keV"
       [4] = "e- Flux 120keV"
       [5] = "e- Flux 210keV"
       [6] = "e- Flux 380keV"
       [7] = "e- Flux 0.76MeV"
       [8] = "e- Flux 1.22MeV"

     "pitch_Flux"
VAR_TYPE = metadata
       [1] = "e- Flux 30deg"
       [2] = "e- Flux 60deg"
       [3] = "e- Flux 90deg"
       [4] = "e- Flux 120deg"
       [5] = "e- Flux 150deg"



Variable Attribute Definitions

AVG_TYPE --- optional

sets up useful default conditions: different techniques appropriate to averaging different types of data. If this attribute is not present, standard average, i.e., simple arithmetic mean, is assumed. The value of this attribute can be used with application software. The valid options are listed below.

CATDESC --- required

(catalog description) is an approximately 80-character string which is a textual description of the variable and includes a description of what the variable depends on. This information needs to be complete enough that users can select variables of interest based only on this value. (see CDAWeb www-based interface via URL https://cdaweb.gsfc.nasa.gov/space/). Examples :

DELTA_PLUS_VAR and DELTA_MINUS_VAR --- optional

are included to point to a variable (or variables) which stores the uncertainty in (or range of) the original variable's value. The uncertainty (or range) is stored as a (+/-) on the value of the original variable. For many variables in ISTP, the original variable will be at the center of the interval so that only one value (or one set of values) of uncertainty (or range) will need to be defined. In this case, DELTA_PLUS_VAR, and DELTA_MINUS_VAR will point to the same variable. See Particles (space) for an example. The value of the attribute must be a variable in the same CDF data set.

DEPEND_0 --- required for time-varying variables

explicitly ties a data variable to the time variable on which it depends. All variables which change with time must have a DEPEND_0 attribute defined. The value of DEPEND_0 is "Epoch", the time ordering parameter for ISTP. Different time resolution data can be supported in a single CDF data set by defining the variables Epoch, Epoch_1, Epoch_2, etc., each representing a different time resolution. These are "attached" appropriately to the variables in the CDF data set via the attribute DEPEND_0. The value of the attribute must be a variable in the same CDF data set. See also How to Structure Data in CDF.

DEPEND_1, DEPEND_2, etc --- required for dimensional variables

All variables which have dimensionality (separately from time, which is considered here as the zeroth dimension) must have DEPEND attributes defined. The number of DEPEND attributes must match the dimensionality of the variable, i.e., a one-dimensional variable must have a DEPEND_1, and a two-dimensional variable must have both a DEPEND_1 and a DEPEND_2 attribute. The value of the attribute must be a variable in the same CDF data set. See also How to Structure Data in CDF, for example Particles (space).
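
The two DEPEND rules (one attribute per dimension, each naming a variable in the same CDF data set) can be sketched as a small check. The mapping of variable name to attribute dictionary and the function name are hypothetical, chosen only for this illustration.

```python
def check_depends(variables, name, num_dims, time_varying=True):
    """List DEPEND problems for one variable.

    variables maps each variable name to its attribute dictionary and
    stands in for a real CDF data set.
    """
    attrs = variables[name]
    needed = ["DEPEND_%d" % i for i in range(1, num_dims + 1)]
    if time_varying:
        needed.insert(0, "DEPEND_0")
    problems = []
    for attr in needed:
        target = attrs.get(attr)
        if target is None:
            problems.append("missing %s" % attr)
        elif target not in variables:
            problems.append("%s names unknown variable %s" % (attr, target))
    return problems


variables = {
    # A two-dimensional flux variable with DEPEND_2 left out on purpose.
    "Flux": {"DEPEND_0": "Epoch", "DEPEND_1": "Energy"},
    "Epoch": {},
    "Energy": {},
    "Pitch_angle": {},
}

print(check_depends(variables, "Flux", num_dims=2))  # ['missing DEPEND_2']
```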

DERIVN --- Cluster required for derived variables

A text string identifying the derivation of the variable, possibly including a function/algorithm name or journal reference. Most derived variables will not be unique, and this information is essential if the product is to be compared/validated elsewhere.

DICT_KEY --- optional

comes from a data dictionary keyword list and describes the variable to which it is attached. The ISTP standard dictionary keyword list is described in ISTP Dictionary Keywords.

DISPLAY_TYPE --- optional

tells automated software what type of plot to make and what associated variables in the CDF are required in order to do so. Some valid values are listed below:

FIELDNAM --- required

holds a character string (up to 30 characters) which describes the variable. It can be used to label a plot either above or below the axis, or can be used as a data listing heading. Therefore, consideration should be given to the use of upper and lower case letters where the appearance of the output plot or data listing heading will be affected.

FILLVAL --- required

is the number inserted in the CDF in place of data values that are known to be bad or missing. Fill data are always non-valid data. The ISTP standard fill values are listed below. Fill values are automatically supplied in the ISTP CDHF ICSS environment (ICSS_KP_FILL_VALUES.INC) for key parameters produced at the CDHF. For key parameters produced outside of the CDHF, the values below should be used.
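
A minimal sketch of how application software might skip fill data when averaging is given here. It uses the value -1.0E31 that appears as FILLVAL for CDF_REAL4 variables in Tables 1 to 3. The function name and the tolerance-based comparison are assumptions; the tolerance is there because a value stored in single precision may not compare exactly equal to a double precision constant.

```python
import math

FILLVAL = -1.0e31  # FILLVAL used for the CDF_REAL4 examples in Tables 1 to 3


def mean_ignoring_fill(values, fillval=FILLVAL):
    """Arithmetic mean of the values that are not fill; None if all are fill."""
    good = [v for v in values
            if not math.isclose(v, fillval, rel_tol=1e-6)]
    if not good:
        return None
    return sum(good) / len(good)


print(mean_ignoring_fill([1.0, FILLVAL, 3.0]))  # 2.0
```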

FORMAT --- required if not using FORM_PTR

is the output format used when extracting data values out to a file or screen (using CDFlist). The magnitude and the number of significant figures needed should be carefully considered. A good check is to consider it with respect to the values of VALIDMIN and VALIDMAX attributes. The output should be in Fortran format.
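
The suggested check of FORMAT against VALIDMIN and VALIDMAX can be sketched as follows. This is an approximation under stated assumptions: only the simple descriptors Fw.d, Ew.d and Iw are handled, the function names are hypothetical, and Python's printf-style output is used as a stand-in for Fortran output (the two differ in detail, for example in how the E descriptor normalizes the mantissa).

```python
import re

_DESCRIPTOR = re.compile(r"^([FEI])(\d+)(?:\.(\d+))?$", re.IGNORECASE)


def fortran_to_printf(fmt):
    """Translate Fw.d, Ew.d or Iw into (printf-style spec, field width)."""
    match = _DESCRIPTOR.match(fmt.strip())
    if match is None:
        raise ValueError("unsupported FORMAT value: %r" % (fmt,))
    kind = match.group(1).upper()
    width = int(match.group(2))
    precision = match.group(3)
    if kind == "I":
        return "%%%dd" % width, width
    if precision is None:
        raise ValueError("%s descriptor needs a precision: %r" % (kind, fmt))
    letter = "f" if kind == "F" else "E"
    return "%%%d.%s%s" % (width, precision, letter), width


def fits_format(fmt, value):
    """True if the formatted value does not overflow the field width."""
    spec, width = fortran_to_printf(fmt)
    return len(spec % value) <= width


# FORMAT, VALIDMIN and VALIDMAX taken from Table 1.
print(fits_format("F6.2", 0.0), fits_format("F6.2", 50.0))
```

A value that is wider than the field, such as 12345.0 with F6.2, makes fits_format return False.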

FORM_PTR --- required if not using FORMAT

has as its value a variable which stores the character strings (up to 20 characters per character string) representing the desired output format for the original variable. FORM_PTR is used instead of FORMAT. The value of the attribute must be a variable in the same CDF data set.

LABLAXIS --- required if not using LABL_PTR_1

should be a short character string (approximately 10 characters, but preferably 6 characters - more only if absolutely required for clarity) which can be used to label a y-axis for a plot or to provide a heading for a data listing.

LABL_PTR_1, LABL_PTR_2, etc. --- required if not using LABLAXIS

is used to label a dimensional variable when one value of LABLAXIS is not sufficient to describe the variable or to label all the axes. LABL_PTR_i is used instead of LABLAXIS, where i can take on any value from 1 to n, and n is the total number of dimensions of the original variable. The value of LABL_PTR_1 is a variable which will contain the short character strings which describe the first dimension of the original variable. The actual labels should be short, as described above for LABLAXIS. The value of the attribute must be a variable in the same CDF data set. See also How to Structure Data in CDF, for example Magnetic Fields (space).

MONOTON --- optional

indicates whether the variable is monotonically increasing or monotonically decreasing. Use of MONOTON is strongly recommended for the Epoch time variable, and can significantly increase the performance speed on retrieval of data. Valid values: INCREASE, DECREASE.
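
One way to decide which MONOTON value, if any, applies to a variable is sketched below. The function name is hypothetical, and treating only strictly increasing or strictly decreasing sequences as monotonic is an assumption made for this illustration.

```python
def monoton_value(values):
    """Return 'INCREASE', 'DECREASE', or None if neither applies."""
    pairs = list(zip(values, values[1:]))
    if not pairs:
        return None  # fewer than two records: nothing to compare
    if all(a < b for a, b in pairs):
        return "INCREASE"
    if all(a > b for a, b in pairs):
        return "DECREASE"
    return None


print(monoton_value([10.0, 70.0, 130.0]))  # INCREASE
```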

OFFSET_0 --- optional

is used as a way to carry multiple time resolutions or multiple time tags offset from each other in a file, while maintaining only one time that is the record ordering parameter. The variable which holds the time offset(s) is the value of the attribute. The value of the attribute must be a variable in the same CDF data set.

SCALEMIN and SCALEMAX --- optional

are values which can be based on the actual values of data found in the CDF data set or on the probable uses of the data, e.g., plotting multiple files at the same scale. Visualization software can use these attributes as defaults for plotting. The values must match the data type of the variable.

SCALETYP --- recommended for non-linear scales if not using SCAL_PTR

indicates whether the variable should have a linear or a log scale as a default. If this attribute is not present, linear scale is assumed.

SCAL_PTR --- recommended for non-linear scales if not using SCALETYP

is used for dimensional variables when one value of SCALETYP is not sufficient. SCAL_PTR is used instead of SCALETYP, and will point to a variable which will be of the same dimensionality as the original variable. The allowed values are linear and log. The value of the attribute must be a variable in the same CDF data set.

sig_digits --- Cluster recommended

This attribute provides the number of significant digits or other measure of data accuracy in a TBD manner. It is to allow compression software to optimise the number of digits to retain, and users to assess the accuracy of products. This operation is subject to the deliberations of the `network traffic report' Task Group, DS-CFC-TN-0001, on compression algorithms and implementation. Restrictions on data compression may also influence the format and choice of data type used by the CDF generation software.

SI_conversion --- Cluster recommended

The conversion factor to SI units. This is the factor by which the variable must be multiplied in order to convert it to generic SI units. It will contain two text fields separated by the delimiter >. The first is the conversion factor and the second is the standard unit that it converts to. For example, the magnetic field for FGM will be in nT, and to convert to Tesla the value of SI_conversion will be "1.0e-9>Tesla", since 1 nT is 1.0e-9 Tesla. Because the attribute is stored as text, software must parse it to extract the numeric value.
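
Extracting the factor and unit in software can be sketched as follows. The function name is hypothetical, and the example value "1.0e3>m" (a quantity stored in km) is invented for the illustration rather than taken from any Cluster product.

```python
def parse_si_conversion(value):
    """Split a 'factor>unit' string into (float factor, unit name)."""
    factor_text, sep, unit = value.partition(">")
    if not sep:
        raise ValueError("expected 'factor>unit', got %r" % (value,))
    return float(factor_text), unit.strip()


# Hypothetical example: a distance stored in km, converted to metres.
factor, unit = parse_si_conversion("1.0e3>m")
print(12.5 * factor, unit)  # 12500.0 m
```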

UNITS --- required if not using UNIT_PTR

is a character string (no more than 20 characters, but preferably 6 characters) representing the units of the variable, e.g., nT for magnetic field. If the standard abbreviation used is short then the units value can be added to a data listing heading or plot label. Use a blank character, rather than "None" or "unitless", for variables that have no units (e.g., a ratio or a direction cosine).

UNIT_PTR --- required if not using UNITS

has as its value a variable which stores the character strings (up to 20 characters per character string) representing the units of the original variable, which can be added to a data listing heading or plot label. Use a blank character, rather than "None" or "unitless", for variables that have no units (e.g., a ratio or a direction cosine). If this attribute is used, then UNITS is not used. The value of the attribute must be a variable in the same CDF data set.

VALIDMIN and VALIDMAX --- required

hold values which are, respectively, the minimum and maximum values for a particular variable that are expected over the lifetime of the mission. The values must match the data type of the variable.
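
A simple range check against VALIDMIN and VALIDMAX might look like the sketch below. The function name is hypothetical, and skipping fill values before the range test is an assumption made for the example, on the grounds that fill data are already known to be non-valid.

```python
import math


def out_of_range_indices(values, validmin, validmax, fillval):
    """Indices of non-fill values that lie outside [validmin, validmax]."""
    bad = []
    for index, value in enumerate(values):
        if math.isclose(value, fillval, rel_tol=1e-6):
            continue  # fill data are reported separately, not range-checked
        if not (validmin <= value <= validmax):
            bad.append(index)
    return bad


# VALIDMIN, VALIDMAX and FILLVAL taken from Table 1.
print(out_of_range_indices([4.2, -1.0e31, 73.5, 12.0], 0.0, 50.0, -1.0e31))  # [2]
```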

VAR_NOTES --- optional

holds ancillary information about the variable and can be any length.

VAR_TYPE --- required

identifies a variable as either data, support_data, or metadata, as discussed in How to Structure Data in CDF.

V_PARENT --- optional for use with derived variables

identifies the "attached" variable which stores the parent variable(s) of a derived variable. The "attached" variable can be dimensional and sized to hold as many parents as necessary. The syntax of each entry would be: logical_file_id>variable_name.
