Richard Bogart (Stanford University), Robert Hanisch (Space Telescope Science Institute), Joseph King (National Space Science Data Center), Roger Pyle (Bartol Research Institute/University of Delaware), David Sibeck (The Johns Hopkins University Applied Physics Laboratory), Ray Walker (University of California Los Angeles), David Winningham (Southwest Research Institute)
This report describes the system architecture and organization of a data service to manage the data associated with the Sun-Earth Connection theme within the context of the NASA Space Science Data System. We propose a distributed service in which the data sets are managed at a number of sites by scientists actively involved in data analysis. These data providers will be grouped under three Service Groups: Solar Physics, Terrestrial Environment Imagery, and In Situ Space Physics. The data providers will also be organized into three cross-cutting user views: The Sun as a Star, The Sun in Space, and The Earth in Space. The organization of the proposed data service will consist of a "thin" management layer that will provide a unified budget and reporting path, and a governing council that will ensure interoperability and sensitivity to interdisciplinary needs. The overall budget for the fully organized and operational data service is estimated at $5 million per year.
Scientific discoveries provide the true return from NASA's investment in space exploration. To maximize this return, researchers must be able to exploit the rich database of observations from NASA's past, present, and future missions. Successful data use requires easy location of and access to active and historical archives, as well as the information and tools necessary to interpret the observations. Functions designed to provide common search and access tools across the range of space science are being incorporated within the NASA Space Science Data System (SSDS). An essential component of the SSDS will be a system designed to manage the abundance of data associated with the Sun-Earth Connection (SEC) theme, with an emphasis on enabling correlative studies both within the theme and within the SSDS. This report recommends a structure for such a service and describes the relation of that service to broader SSDS activities.
It is both natural (in view of limited budgets) and desirable (for cross-discipline compatibility) that the nascent Sun-Earth Connection Data Service (SECDS) will make use of, and build upon, the tools and organizational structure developed by the more mature astrophysics data environment, the Planetary Data System (PDS), and other existing systems. These systems have demonstrated that there are many benefits to organizing the community to a greater degree than has been the case within the Sun-Earth Connection theme. Their use of common data systems and formats greatly facilitates the delivery of data. Nevertheless, it is clear that any successful data system must evolve from existing SEC functions and services, rather than being imposed upon the research community. It is also a central tenet of the SSDS that data are best managed by scientists actively engaged in their analysis.
Several problems in the SEC data environment seriously impede scientific research. These problems include: 1) publicly inaccessible data sets; 2) data sets that are only available in formats which are ineffective for scientific analysis; 3) data set documentation that does not support independent data use; 4) the excessively wide range of data formats in use; and 5) the difficulty of locating data in today's distributed data environment. While some important data sets and services are currently conveniently accessible to the SEC community from a number of sites, we believe the establishment of an SECDS is important. The SECDS will be established to: 1) ensure accessibility to a broad suite of data sets; 2) promote interoperable search and use of data and other services across multiple SEC disciplines and sites; 3) improve the interface through which mission data flow to public-access sites; and 4) promote interoperability across the entire space science domain through participation in and adherence to the standards of SSDS.
These recommendations are an outgrowth of the Community Wide Workshop on NASA's Space Physics Data System held at Rice University in June 1993, and are guided by the recommendations of the Task Group on Science Data Management. These recommendations are also responsive to NASA's Science Information Services Study Team preliminary report. They are a direct result of the efforts of the SSDS Technical Working Group (SSDS TWG) to increase interoperability between the various space science disciplines.
The SECDS is also designed to assist scientists in more diverse fields who need to analyze or correlate data arising from research in different traditional disciplines. It must assume responsibility for all data sets of scientific interest resulting from NASA missions and investigations within the Sun-Earth Connection Theme and the former Space Physics Division, including data from such projects as the International Solar Terrestrial Physics Program (SOHO, Polar, Wind, Equator-S, Geotail), Yohkoh, TRACE, Ulysses, ACE, FAST, TIMED, SMM, UARS, KPVT, IMP8, and Voyager 1 & 2 missions. The SECDS should also strive to provide full access to and interoperability with other relevant data archives, including both non-NASA space missions and ground-based observatories. In cases where such data are important to Office of Space Science (OSS) research and are at risk of becoming inaccessible, SECDS should endeavor to preserve, curate, and archive the data in accessible form.
The SECDS will function as an integral component of the SSDS with the aim of developing a level of connection and interoperability useful to scientists involved in cross-disciplinary research. Particular attention should be paid to coordination with closely related disciplines in other fields, such as stellar activity, stellar winds, asteroseismology, particle physics, planetary magnetospheres and atmospheres, and cosmic ray sources.
The scientific user view of SECDS reflects scientific interests that may lie in accessing cross-disciplinary data sets. This suggests that a user interface presenting three cross-cutting thematic categories would be useful. We propose that the thematic categories The Sun as a Star, The Sun in Space, and The Earth in Space be implemented at the SECDS coordination level and overseen by the Project Scientist and the Science Members of the Coordinating Council. Each category will consist of a mapping of relevant data sets from areas covered by the three Service Groups (see sections 3 and 4 for details concerning these entities).
As seen in Figure 1, each Service Group is responsible for providing a portion of each thematic view of the data. The exact mechanism by which each thematic view is presented will be determined by the Service Groups.
The overall system architecture of the SECDS is designed to accomplish two goals: 1) to provide users with rapid access to well-documented Sun-Earth Connection data and 2) to provide efficient management of the SECDS. To this end, the SECDS will consist of three levels that represent different management, user service, and data provider responsibilities (see Figure 2).
The first level in the SECDS architecture will consist of a "thin" Management Office, an Advisory Committee, and a Coordinating Council. These three bodies are described in detail in section 4.0.
The next level of the SECDS will be composed of service groups organized by scientific discipline or data type. We propose that there be three groups at this level that are data-oriented and reflect similar collection procedures and data structures. These groups will be organized around Solar Physics, Terrestrial Environment Imagery (auroral and energetic neutral imaging, for example), and In Situ Space Physics (e.g., interplanetary measurements and magnetospheric, ionospheric, and atmospheric data sets). The responsibilities of these three Service Groups are given in section 3.1.
The third level in this system architecture will consist of a dynamically evolving set of Data Providers that will accomplish specified tasks including data set management and the development of software tools. These Providers may supply data to one or more Service Groups, depending on their function. There may also be Support Providers at this level, which will be chartered to provide specified software tools or support functions to a particular Service Group. The Providers' responsibilities are discussed in section 3.2.
An integral part of the data identification and acquisition process will be the pre- and post-launch interactions between the Level II Service Groups and spaceflight project personnel. These groups will work to ensure the orderly and timely flow of well-documented and standardized data into SECDS archives. To accomplish this task, scientific and technical experts within the Level II Service Groups will interface with project personnel starting early in the project data management planning phases. Data identification and acquisition topics that will be discussed include 1) a Project Data Management Plan (PDMP), 2) relevant standards and guidelines for the preparation and archiving of data and supporting material, and 3) tools and services available through SECDS and elsewhere that will be useful in data preparation and archiving and also for reaching the project's science objectives.
PDMPs will be reviewed by Service Group personnel for adherence to SECDS guidelines and standards. These personnel will also review (and arrange for reviews of) data and supporting material to be archived as those products are first created and judged ready for archiving by Data Providers (project or PI level), and will iterate with Data Providers as needed to ensure that the products are correct, complete, comprehensible, and standardized. Some current or older projects may be too budget-constrained to provide the best organized and annotated still-reversible data set and supporting material. In these cases, Service Group personnel will consult with project-level and/or instrument PI personnel to define a data preparation/archiving activity that represents an affordable effort and yields data products with the best benefit-to-cost ratio in light of anticipated future demand. Data prepared for archiving must be documented sufficiently to support independent use.
Service Group and project personnel will explore the feasibility and cost effectiveness of providing public access to SECDS-adherent data and supporting material from project facilities while those facilities exist. In many cases, Service Group personnel will also work directly with instrument Principal Investigators (PIs) concerning the archiving and public accessibility of data sets and supporting material created at PI sites rather than at central project facilities. Such PI-specific efforts (data products, supporting materials, schedules and pathways for archiving, etc.) should also be addressed in the PDMP.
In interactions with both project-level and PI-level personnel, a key role of Service Group personnel will be to explain the available tools and services. These tools and services will aid in satisfying requirements and may be useful for project and PI data management and analysis.
Level II Service Groups identify potential Level III Data Providers, work with them to develop mutually agreeable data formats and media, and sponsor (either funded or unfunded) the preparation by the Providers of data sets in these formats and media, along with documentation sufficient to interpret the observations. The organization and requirements of the Level II Service Groups must be sufficiently flexible to be responsive both to the needs of new missions and to evolving trends in information technology as they affect the scientific community.
Service Groups validate data sets and accompanying metadata as they are produced and ingested.
Rapid access to well-documented data (including the results of relevant models) is the ultimate goal of the SECDS. With advice from the community and in response to user requests, the SECDS will provide for the most rapid possible access to the digital data within the constraint of available resources. The locations of the data repositories and the means of access may vary with both data set and time, in response to community needs.
A distributed, dynamically updated, on-line, searchable catalog of data sets, metadata, software, and relevant models will be an essential feature of the SECDS. The catalog must be consistent with those maintained by the Planetary and Astrophysics communities (i.e., for integration into the SSDS) and must be compatible with the search engines that these communities employ. The catalogs will be automatically updated by the Service Groups and Data Providers using distributed database technologies, reflecting newly available resources and changes in the status of existing resources. Service Groups, in consultation with each other, the SECDS management, and SSDS representatives, will identify keywords suitable for describing SECDS resources to members of both the SECDS and broader SSDS communities. In particular, these catalogs must be developed in parallel with the 'search' and 'browse' functions described below, so as to provide a comprehensive view of the data holdings in the system. Each catalog entry must provide a short but complete description including keywords, information concerning the time period covered, and pointers to contact persons and bibliographies concerning the resource. Finally, the catalogs (or subsections of them) must be downloadable by interested users or service providers.
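The catalog requirements above can be illustrated with a minimal sketch of a single catalog entry and a downloadable export. This report does not specify a schema or a serialization format, so the class name `CatalogEntry`, its field names, the function `export_catalog`, and the choice of JSON are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List
import json


@dataclass
class CatalogEntry:
    """One hypothetical catalog record; all field names are illustrative."""
    resource_id: str
    description: str            # short but complete description
    keywords: List[str]         # agreed keywords describing the resource
    start: date                 # first day of the time period covered
    end: date                   # last day of the time period covered
    contacts: List[str] = field(default_factory=list)      # contact persons
    bibliography: List[str] = field(default_factory=list)  # related references


def export_catalog(entries: List[CatalogEntry]) -> str:
    """Serialize entries to JSON so a catalog, or a subsection, can be downloaded."""
    records = []
    for entry in entries:
        record = asdict(entry)
        # JSON has no date type, so dates are written as ISO 8601 strings.
        record["start"] = entry.start.isoformat()
        record["end"] = entry.end.isoformat()
        records.append(record)
    return json.dumps(records, indent=2)


# A made-up entry, used only to show the shape of the output.
example = CatalogEntry(
    resource_id="example-resource",
    description="Hypothetical one-minute magnetic field averages.",
    keywords=["magnetic field", "in situ"],
    start=date(1995, 1, 1),
    end=date(1995, 12, 31),
)
catalog_json = export_catalog([example])
```

A subsection of the catalog could be exported the same way by passing only the entries that match a given keyword or time period.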
Service Groups, and others, will provide search interfaces which serve as primary entry points for potential data users. By querying the search interfaces with appropriate keywords, users obtain lists of all links to resources matching the query. Possible search criteria include, but are not limited to, time interval, spacecraft, instrument/data type, and region of space. It must be possible to use the interface iteratively to isolate the data set of interest. Lower level search engines must be able to forward unsuccessful or incomplete requests to other elements of the distributed search facilities transparently. The search interface should also provide information concerning available tools, value-added products, contact points, and bibliographies for selected data sets.
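One way to picture the transparent forwarding of unsuccessful requests is the sketch below, in which a search node that finds nothing locally passes the same query on to its peers. The classes `Resource` and `SearchNode`, the node and spacecraft names, and the stop-at-first-hit forwarding rule are hypothetical; the report does not prescribe a protocol, and the forwarding of incomplete (partially answered) requests is not modeled here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one searchable data set."""
    name: str
    spacecraft: str
    data_type: str
    start: date
    end: date


class SearchNode:
    """Hypothetical search facility holding resources and knowing some peers."""

    def __init__(self, name: str, holdings: List[Resource]):
        self.name = name
        self.holdings = list(holdings)
        self.peers: List["SearchNode"] = []

    def _search_local(self, spacecraft, data_type, start, end) -> List[Resource]:
        hits = []
        for r in self.holdings:
            if spacecraft is not None and r.spacecraft != spacecraft:
                continue
            if data_type is not None and r.data_type != data_type:
                continue
            # Keep resources whose coverage overlaps the requested interval.
            if start is not None and r.end < start:
                continue
            if end is not None and r.start > end:
                continue
            hits.append(r)
        return hits

    def search(self, spacecraft=None, data_type=None, start=None, end=None,
               _visited: Optional[Set[str]] = None) -> List[Resource]:
        visited = set() if _visited is None else _visited
        visited.add(self.name)
        hits = self._search_local(spacecraft, data_type, start, end)
        if hits:
            return hits
        # Nothing found locally: forward the same query to peers not yet asked.
        for peer in self.peers:
            if peer.name not in visited:
                hits = peer.search(spacecraft, data_type, start, end, visited)
                if hits:
                    return hits
        return []


# Two made-up nodes that list each other as peers.
solar = SearchNode("solar", [
    Resource("img-a", "SC-1", "image", date(1996, 1, 1), date(1996, 12, 31)),
])
in_situ = SearchNode("in_situ", [
    Resource("mag-b", "SC-2", "magnetometer", date(1995, 1, 1), date(1997, 12, 31)),
])
solar.peers.append(in_situ)
in_situ.peers.append(solar)

# Iterative use: start broad, then narrow by time interval.
broad = solar.search(data_type="magnetometer")
narrow = solar.search(data_type="magnetometer",
                      start=date(1996, 6, 1), end=date(1996, 6, 30))
```

The user queries only the `solar` node, yet receives the matching resource held by `in_situ`. The set of visited node names keeps a query from circulating indefinitely when nodes reference each other.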
Service Groups may produce value-added products and tools such as browse (low resolution) parameters and on-line plotting routines suitable for both standard and interdisciplinary scientific research, or they may commission Data Providers to undertake these tasks. The browser should be able to view both the entire data set and selected short intervals when appropriate. These tools should be applicable across the browseable data sets held by the SECDS. Service Groups must ensure that efforts to develop tools do not duplicate those already being used in the community. Tools should be written as system-independent software to facilitate maximum portability. Such software products must be registered in the searchable catalog.
Service Groups will identify general-purpose tools and disseminate knowledge about them to the community and to the other Service Groups. Under some circumstances, with the approval of the Coordinating Council, they may establish Support Providers to develop general-purpose software.
Service Groups will develop and/or apply standards for user interfaces, directories, data formats, documentation, node connectivity, and distribution and archive media. They will set expectations for Data Providers and develop criteria for evaluating their success. Proposed standards should be considered and approved by the Coordinating Council.
The Level II Service Groups will be primarily responsible for facilitating data access for the scientific community. This will involve encouraging and facilitating adherence to data preparation and archiving standards and schedules. SECDS Service Groups will periodically report to the Coordinating Council (and, through it, to NASA Headquarters) on the level of compliance with expected standards and schedules by spaceflight projects and other NASA-funded Data Providers. Ultimately, it is NASA's responsibility to provide incentives for compliance for both NASA and non-NASA official Data Providers.
Service Groups maintain links to each other and a distributed system of multiple access points to the SECDS.
Service Groups may keep records of user names and data set requests in order to establish usage rates for metric reporting, contact users concerning improvements in service or updates to databases, and identify needs for improvements in service.
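As a rough illustration of how such records could yield usage rates, the sketch below counts requests and distinct users per data set. The log layout (pairs of user name and data set identifier), the function name `usage_rates`, and the sample values are assumptions, not part of any SECDS specification.

```python
from collections import Counter


def usage_rates(request_log):
    """Summarize a log of (user_name, data_set_id) pairs per data set."""
    log = list(request_log)
    requests = Counter(data_set for _, data_set in log)
    users = {}
    for user, data_set in log:
        users.setdefault(data_set, set()).add(user)
    return {
        data_set: {"requests": count, "distinct_users": len(users[data_set])}
        for data_set, count in requests.items()
    }


# Made-up request records, used only to show the shape of the summary.
sample_log = [
    ("user_a", "data-set-1"),
    ("user_b", "data-set-1"),
    ("user_a", "data-set-1"),
    ("user_a", "data-set-2"),
]
summary = usage_rates(sample_log)
```

The same per-user records would also give the list of users to contact when a data set they requested is updated.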
All Data Providers will be required to hand the data off to the Service Group prior to the conclusion of the Data Provider's activities. Upon selection each Provider will prepare a data transfer plan. The data must adhere to SECDS standards for data documentation and be on archivable media. The Service Group will make sure that the data transfer plan adheres to SECDS standards and that the data are in a form that the Service Group can make accessible to the community after the relationship with the Provider formally ends. The Service Group has the responsibility for data access after the Data Provider's activities end. This does not necessarily mean that the Service Group will provide access to the data directly. Its job is to make sure the data are accessible.
Not all Data Providers need to receive funding from NASA or the SECDS. Other US government agencies or foreign entities may sponsor the formation of Data Providers. If these Data Providers subscribe to SECDS and SSDS data policies, they may associate with Service Groups to become part of the SECDS.
The SECDS Project Scientist, an individual cognizant and experienced in data management and use, will have overall management responsibility for the SECDS. The SECDS Project Scientist will work to facilitate communication between the scientific community and the SECDS, will represent SECDS at scientific and project meetings, and will chair the Coordinating Council. This will be a full-time position. The Project Scientist should be an active research scientist in a field related to the Sun-Earth Connection. It is anticipated that the Project Scientist will spend ~50% time on SECDS management and facilitation activities and ~50% on scientific research.
The Project Manager will have the day-to-day management responsibility for the SECDS. Under the direction of the Project Scientist and with the assistance of the Service Groups, the Project Manager will prepare the annual budget for the project. He or she will oversee contract negotiations with the Service Groups and Data Providers. The Project Manager will be responsible for coordinating the activities of the separate SECDS bodies and for the day-to-day interaction with NASA Headquarters. He or she will complete and submit project financial reports and will coordinate the submission of financial reports from the Groups or Providers. This will be a full-time position.
A half-time clerical position will be provided to support the Project Scientist and Project Manager.
The Coordinating Council will consist of the Project Scientist (Chair), the Chief Scientists of the Service Groups, the Project Manager, and three members of the scientific community. These community members will represent the scientific themes (The Sun as a Star, The Sun in Space and The Earth in Space) in Sun-Earth Connection research. They will have the important task of assuring that the needs of their disciplines are met by SECDS. They will be appointed by the Project Scientist and will be compensated for their work on the Coordinating Council. The Council will be responsible for setting SECDS policy within the overall guidelines of NASA and SSDS policy. They will make financial decisions based on the budget submitted by the Project Manager. They will also set priorities for data ingestion (mission activities and data restoration projects) and for development of standards and tools for data management and access. The Coordinating Council must approve Data Provider funding. It is anticipated that the Coordinating Council will meet 3-4 times per year.
The SECDS Advisory Committee will provide high-level oversight of the activities of the Data Service. It will regularly review the performance of SECDS and report to the Program Manager, Mr. J. Bredekamp. This committee will consist of members of the scientific community appointed through the Program Manager. It will meet twice per year.
The Chief Scientists of the SECDS Service Groups will participate in the overall management of SECDS through their membership on the SECDS Coordinating Council. They will define and manage the ongoing activities of the Groups. In addition, the Group scientists will be responsible for the performance of the associated Data Providers. They will define upcoming work and prepare budgets for the Management Office. The Group scientists will prepare bimonthly reports of the Groups' activities and accomplishments and deliver them to the Management Office. It is anticipated that each of the Service Groups will establish its own advisory structure.
The Service Groups and Data Providers (many of which are expected to reside in universities and other institutions outside NASA) will be competitively selected. An initial SECDS RFP will be issued specifying that three Service Groups be selected (Solar Physics, Terrestrial Environment Imagery, and In Situ Space Physics). The responding proposals can specify Data Providers needed to acquire the data, models, and necessary software. Individual proposals may be for the Service Groups only or for combinations of Service Group and Data Providers. The participants will be selected by using the standard NASA peer review process. They will be recompeted every 5 years. The specific relationship between the Service Groups and Data Providers will be determined by individual contracts.
Service Groups may solicit proposals from time to time for specified purposes. They will also annually review unsolicited proposals for Data Providers supporting specific data sets or tools. The SECDS Coordinating Council will have final authority to approve or disapprove the Data Providers.
Responsibility of the SECDS to the broader Space Science community and the SSDS is ensured via the oversight of the ISSOMOWG and SSDS TWG.
Given the breadth of activities and scope of data holdings associated with the SEC theme, and based on a comparison with the other NASA OSS data systems, the full cost of SECDS activities is estimated to be approximately $5 million per year. This includes the costs associated with data services currently funded through active missions. An initial funding level of $1.5 million/year should suffice for the organization and operations of the SECDS Service Groups and Data Providers and the Coordinating Council, ramping up to the $5 million/year level as Data Providers are incorporated into the system.