Registering Data Products with SPASE Metadata Descriptions

1. Introduction to the SPASE and SPDF Relationship

The Space Physics Archive Search and Extract (SPASE) metadata or information model consists of a set of terms and values, along with the relationships between them, that describe all the resources in a heliophysics data environment. It is the result of many years of effort by an international collaboration to unify and continuously improve the descriptions of space and solar physics data and models. The intent of this metadata model is to provide the means to describe heliophysics resources, most importantly scientifically useful data products, in a uniform way so that they may be easily registered, found, accessed, and used. The SPASE metadata model was created in 2005 to describe datasets externally, enabling searches for data across many repositories.

The Space Physics Data Facility (SPDF) has participated in the SPASE effort from its inception. Earlier, the International Solar-Terrestrial Program (ISTP) coordinated space and ground missions and recognized the need for standardized metadata that describes datasets and how to use them. As a result, SPDF coordinated the ISTP metadata guidelines used internally in self-describing scientific data formats, initially Common Data Format (CDF) and now netCDF as well. This standard metadata has enabled the development of generic science analysis and display tools, such as Coordinated Data Analysis Web (CDAWeb) and Autoplot.org.

When a new dataset in CDF is added to the SPDF archive, the SPASE Metadata Working Team (SMWT) works with the SPDF team to create a SPASE description. The SPASE Active Data Archive Product Tracking (ADAPT) software is used to translate the ISTP metadata to SPASE metadata and to add further SPASE metadata. The SMWT then creates the SPASE landing page at SPASE Metadata and registers a DOI for the dataset if it is a NASA product. (Note: The SMWT can also create SPASE descriptions and mint DOIs for non-NASA products upon request.)

The SPDF collaborates with the SPASE team through meetings, working groups, and events.

2. SPASE Details

The SPASE metadata model can be used to describe several resource types. A key resource type is Numerical Data. This type of resource typically consists of a set of files that contain values of one or more physical variables (or measured parameters) and that differ from each other only in the time span they cover. Fully describing a Numerical Data resource requires other types of resources, namely Observatory, Instrument, Person, and Repository, whose names are self-explanatory and each of which has its own set of attributes. Often, Numerical Data are presented in prepared images or graphical displays (e.g., gif or jpeg), and such presentations are referred to as Display Data resources. The other data-related resource types are:

· Catalog, which can be a list of events

· Collection, which can be for the data resource description

· Annotation, which enables expert comments on data products

· Granule, which describes individual files within another resource (i.e., Numerical Data, Display Data, or Catalog)

Other types of resources include:

· Document which can contain narratives or supporting information

· Service that provides software to use data resources

· Repository for storage locations

· Software for representations of digital data

· Registry for metadata collections

Resource descriptions and the links in them are intended to make the resource useful to scientific users.

The SPASE metadata model is used to uniformly document and describe all electronically accessible heliophysics and space weather resources (data, models, tools, and documents). The purposes of adopting a standard metadata model are (1) to simplify the development and implementation of data services for searching and accessing the resources and (2) to enable the understanding of resources to allow independent use of the resources by the Heliophysics community.

The SPASE group is an international community of scientists, specialists, information engineers, and system designers who are endeavoring to create standards and services to enable the open exchange of Heliophysics data. Within this community, there are a variety of archives, data centers, virtual observatories, and other data-related resources acquiring, holding, and distributing data. The SPASE group has simplified the search for data through the development of the SPASE metadata model as a common language to describe datasets in the various archives.

SPASE Resource IDs registered in the SPASE registry (hosted on GitHub) are organized by Naming Authority; for example, NASA is the Naming Authority for all resources produced by NASA missions. SPASE records, when combined with their associated Digital Object Identifiers (DOIs), become the DOI landing pages at SPASE Metadata, although DOIs registered by other entities may point to landing pages other than the SPASE records.
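As an illustration of how Resource IDs group under a Naming Authority, the following minimal Python sketch splits an ID into its parts. It assumes the commonly seen form spase://NamingAuthority/ResourceType/path, which is not spelled out in this document; the function name parse_resource_id and the example ID are hypothetical and do not refer to a registered resource.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResourceId:
    """Parts of a SPASE Resource ID (layout assumed, see lead-in)."""
    naming_authority: str
    resource_type: str
    path: Tuple[str, ...]


def parse_resource_id(resource_id: str) -> ResourceId:
    """Split an ID of the assumed form spase://Authority/Type/path/..."""
    scheme = "spase://"
    if not resource_id.startswith(scheme):
        raise ValueError(f"not a spase:// identifier: {resource_id!r}")
    parts = resource_id[len(scheme):].split("/")
    # Require an authority, a resource type, and at least one path segment,
    # none of them empty.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"incomplete Resource ID: {resource_id!r}")
    return ResourceId(parts[0], parts[1], tuple(parts[2:]))


if __name__ == "__main__":
    # Hypothetical ID used only for illustration.
    example = "spase://NASA/NumericalData/ExampleMission/ExampleInstrument/PT1M"
    print(parse_resource_id(example))
```

Grouping parsed IDs by their naming_authority field reproduces the per-authority organization described above.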

The SMWT operates under the NASA Heliophysics Digital Resource Library (HDRL) with the primary objective of implementing the SPASE metadata model to uniformly document and describe all electronically accessible heliophysics and space weather resources (data, models, tools, and documents). In accordance with the NASA Heliophysics Science Data Management Policy, the SMWT also assists and works with resource providers in creating, maintaining, and registering the SPASE metadata descriptions of their digital resources. The Heliophysics Science Data Management Policy states that all NASA-sponsored data products must be described in SPASE.

Some virtual observatories use SPASE metadata for specialized searching across heliophysics resources. The Helio.Data and Heliophysics Data Portal (HDP) are the primary search engines for SPASE records and provide a web service API. Other search interfaces depend on SPASE metadata, including the Heliophysics Digital Observatory (HDO) and the Virtual Wave Observatory (VWO).

There are two series of bi-weekly SPASE online meetings. The SPASE Group meeting discusses the SPASE model, enhancements, and changes. The SMWT meeting discusses the practical implementation of the SPASE model to describe digital resources, as well as the SPASE and DOI registries. It also provides feedback on SPASE model issues to the SPASE Group for consideration and resolution.

3. SPASE Tools

There are reference implementations and several tools for working with the SPASE metadata and SPASE framework. They include:

Viewer: A browser-based viewer of SPASE descriptions in XML. Drag and drop files into the page to view an HTML-formatted (i.e., landing-page) version of the SPASE XML files.
Validator: Determines compliance with the SPASE metadata model.
Editor: Creates and edits SPASE XML descriptions.
Resource Tools: Collections of tools and applications for working with resource descriptions, provided as a set of command-line applications to generate, validate, referentially check, use, and organize resource descriptions written in SPASE XML.
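
To give a sense of what these tools operate on, the following minimal Python sketch reads a small SPASE XML description and pulls out a few identifying fields using only the standard library. It is not the Viewer or the Validator and does not check compliance with the metadata model. The sample record, its version string, its Resource ID, and the function name summarize are invented for illustration, and the namespace URI is assumed to be the one used by the SPASE XML Schema.

```python
import xml.etree.ElementTree as ET

# Assumed SPASE XML namespace; confirm against the schema version in use.
SPASE_NS = "http://www.spase-group.org/data/schema"
NS = {"sp": SPASE_NS}

# Invented, heavily abbreviated record; real descriptions carry many more elements.
SAMPLE = """<Spase xmlns="http://www.spase-group.org/data/schema">
  <Version>2.4.0</Version>
  <NumericalData>
    <ResourceID>spase://NASA/NumericalData/ExampleMission/ExampleInstrument/PT1M</ResourceID>
    <ResourceHeader>
      <ResourceName>Example 1-minute data</ResourceName>
    </ResourceHeader>
  </NumericalData>
</Spase>"""


def summarize(xml_text: str) -> dict:
    """Return the version, resource type, ID, and name of the first resource."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SPASE_NS}}}Spase":
        raise ValueError("root element is not <Spase> in the expected namespace")
    for child in root:
        # Strip the "{namespace}" prefix that ElementTree puts on tag names.
        local_name = child.tag.split("}", 1)[-1]
        if local_name == "Version":
            continue
        return {
            "version": root.findtext("sp:Version", namespaces=NS),
            "resource_type": local_name,
            "resource_id": child.findtext("sp:ResourceID", namespaces=NS),
            "resource_name": child.findtext(
                "sp:ResourceHeader/sp:ResourceName", namespaces=NS
            ),
        }
    raise ValueError("no resource element found under <Spase>")


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A check like this only confirms that a file is readable and carries basic identifying fields; compliance with the metadata model is what the Validator determines.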

4. SPASE Documents and Tutorials

The SPASE documentation is a valuable set of references for understanding, using, and creating SPASE resource descriptions and services. SPASE uses a controlled vocabulary to specify its metadata model. The available documentation includes:

Metadata Model: The Metadata Model defines the specification for how to create resource descriptions. The SPASE metadata model can be expressed in many forms, including human-readable model diagrams and the machine-readable XML Schema.

XML Schema: The XML Schema documents define the metadata model for use in XML documents.

More information about SPASE, including the tutorials and data model, can be found at the SPASE tutorials home page.