ftp://nssdcftp.gsfc.nasa.gov/spacecraft_data/imp/imp8/mag/320ms_ascii/doc/imp8_mag_320ms_proc.txt

On the 2008-2009 reprocessing of IMP 8 0.32s data

Joe King and Natalia Papitashvili

In early 2008, Adam Szabo provided us with a 1973-2000 IMP 8 magnetic field data set having 320ms resolution. We were asked to produce a clean-as-reasonably-possible copy of the data. This documentation file describes the reorganization of the data provided, the “cleaning” of this reorganized data set, and the resultant data set available from CDAWeb and, in daily ASCII files, from this directory.

This file describes in sequence:

A. Reorganization of input data for subsequent cleaning
B. Removal of data from intervals of sparse data (“desparsing”)
C. Removal of anomalous data (of various categories of anomalies)
   C.1 Spikes of one to a few points
   C.2 Points with |Bi| larger than sensor range constraints
   C.3 Isolated 20.48s "square waves"
   C.4 Extended intervals of contiguous 20.48s "square waves"
D. Removal of spacecraft spin-frequency power (“despinning”)
E. Removal of intervals of significant difference from prior, credible 15.36s data
F. Removal of manually found anomalous data that survived A-E
G. The newly available data of this directory

The amounts of data removed and despun are indicated in the following, as are pointers to lists of points or intervals removed for various causes. The IMP 8 MAG team expects to retain the input data set used in this exercise.

-------------------------------------
A. Data reorganization

The input data set had 6-hour files with 320ms records having magnetic field Cartesian components and azimuthal and elevation angles in payload, GSE and GSM coordinates, plus time tags and spacecraft position information.
We created an intermediate data set that (a) retains only time tags and GSE field components, (b) eliminates 320ms records whose time tags were earlier than those of previously handled data (i.e., we removed out-of-sequence data), and (c) has 1-day output files. Out-of-sequence data arose because occasionally two telemetry ground stations captured the same segment of data, and such overlapping data were carried forward to Principal Investigator teams. Thus we assume no unique data were eliminated by this process.

The reorganized versions of the input data for 1978 and 1996 are visible at http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html

--------------------------------------
B. Sparse data removal

We want to remove sparse data because they may be unreliable, are difficult to test reliably for spikes, and are typically of only marginal scientific utility.

In brief, and with EWM25C meaning “each of whose minutes has >25% coverage,”

(a) we ignore minutes with <25% coverage (i.e., with <47 320ms data points),
(b) we retain EWM25C intervals of >= 10 minutes duration,
(c) we delete EWM25C intervals of <5 minutes duration,
(d) we retain or delete 5-9 minute EWM25C intervals according to proximity to neighboring EWM25C intervals, as described below.

In further detail, at the start of the data set, or after a data gap (one or more minutes having <25% coverage), find the first minute having >25% coverage. Say it starts at time T0. Test successive minutes starting at T0+60s, T0+120s, etc. Let the first such minute tested that does not have >25% coverage start at T1. Then the time interval T0-T1 is made up of (T1-T0)/60s 1-min subintervals, each of which has >25% coverage. Keep this EWM25C interval if its duration >= 10 minutes.

A data gap starts at T1. Find the first following minute having >25% coverage. Let it start at T2. Again test 1-min intervals starting at T2+60s, T2+120s, etc. Let T3 be the start of the first minute after T2 that does not have >25% coverage.
Keep the T2-T3 EWM25C interval if its duration >= 10 mins. Let T3-T4 be the next data gap, and T4-T5 the next EWM25C interval. Keep the T4-T5 interval if its duration >= 10 mins.

Consider next the conditions under which to retain EWM25C intervals of less than 10 mins duration. If the interval's duration is <5 mins, delete it. If an EWM25C interval having a duration between 5 and 9 minutes is separated from one or the other of its neighboring EWM25C intervals (i.e., the EWM25C interval before it or the one after it) by a gap of less than 5 minutes, and that neighboring EWM25C interval has at least 5 minutes duration, the interval should be retained. Any EWM25C interval having duration <10 mins and separated from both its neighbors by >= 5 minutes should be deleted.

This process of desparsing led to the elimination of ~4% of the 320ms input data points, distributed more or less uniformly across 1973-2000.

------------------------------------
C. Anomalous data removal

These steps remove four types of anomalous data:

C1. single-point or multi-point (<~10 points) spikes
C2. points with unrealistic |Bi| values
C3. “square waves” of duration ~20.48s typically occurring in Bx and By
C4. intervals of duration minutes to tens of minutes or longer that have multiple, contiguous 20.48s “square waves”

Anomalous data of types C1-C3 were automatically deleted. Intervals of type C4, occurring only a few times per year, were checked manually with the help of FTPBrowser plots.

In more detail, the algorithms used to identify these data anomalies are as follows.

C1. For finding and eliminating single or multi-point spikes, we read the first point in an input daily data file. The time tag for this point becomes the time base for all subsequent reads for the day. We work with 1-min averages and standard deviations whose first points are offset from the initial point’s time tag by integral numbers of minutes (“integral minutes” for secs 0-60, 60-120, etc.)
or are offset from that first point by odd numbers of 30-sec intervals (“offset minutes” for secs 30-90, 90-150, etc.).

We read data for seconds 0-60 (“relative to the time base” is implicit here and in the following). Owing to the preceding “desparsing” activity, this first minute should have at least 47 data points. For each field component, we compute a mean <B> and standard deviation sigma. (We suppress subscripts denoting x,y,z components; points are taken as spikes if any one or more components satisfy the spike requirements defined in the following.) Each point, Bj, in the first 30 sec of the first minute is tested against the spike condition |Bj-<B>|>k*sigma, using the <B> and sigma values of the first minute (seconds 0-60). We have used k = 2 for cases when points are tested once (as for the first 30 sec as just discussed) and k = 3.5 when points are tested twice (as discussed in the following).

To test the points in the next 30 sec, we read the next not-yet-read 30 sec of data (i.e., for seconds 60-90 past the time tag of the first point). We create minute averages and standard deviations for the offset minute spanning seconds 30-90 after determining that such a minute has at least 6 points. (Note that the possibility of this minute having <6 points is not ruled out by our desparsing activity, which only addresses “integral minutes” and not “offset minutes.”) Then each point Bj in the interval 30s-60s is tested against the |Bj-<B>|>k*sigma condition twice, once with the <B> and sigma for the minute defined by seconds 0-60 and once with those for the minute defined by seconds 30-90. A point must satisfy both conditions to be declared a spike.

If an offset minute with fewer than 6 points is encountered, the points in the first half of this minute, if any, are tested only against the means and sigmas of the integral minute whose second half is the same as the first half of the offset minute. The program then seeks the next integral minute retained by the desparsing process and the procedure is repeated.
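The C1 test for the first integral minute after a gap can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program actually used: the function and constant names are hypothetical, the population standard deviation is assumed (the text does not say which estimator was used), and only one field component is handled. Points in seconds 0-30 are tested once with k = 2; points in seconds 30-60 are tested twice with k = 3.5 when the offset minute (seconds 30-90) has at least 6 points, and once with k = 2 otherwise.

```python
from statistics import mean, pstdev

K_SINGLE = 2.0          # k when a point is tested against one minute only
K_DOUBLE = 3.5          # k when a point is tested against two minutes
MIN_OFFSET_POINTS = 6   # minimum points needed to use an offset minute


def window_stats(values):
    """Return (mean, sigma) for one minute of one field component.

    Assumption: population standard deviation; the source does not specify.
    """
    return mean(values), pstdev(values)


def exceeds(value, stats, k):
    """Spike condition |Bj - <B>| > k * sigma."""
    avg, sigma = stats
    return abs(value - avg) > k * sigma


def flag_spikes_first_minute(times, values):
    """Return indices of spikes among points in seconds 0-60.

    times  : seconds relative to the day's time base
    values : one field component (e.g. Bx), same length as times
    Assumes the integral minute (seconds 0-60) is not empty.
    """
    pts = list(zip(times, values))
    integral = [v for t, v in pts if 0 <= t < 60]
    offset = [v for t, v in pts if 30 <= t < 90]
    integral_stats = window_stats(integral)
    offset_ok = len(offset) >= MIN_OFFSET_POINTS
    offset_stats = window_stats(offset) if offset_ok else None

    spikes = []
    for i, (t, v) in enumerate(pts):
        if 0 <= t < 30 or (30 <= t < 60 and not offset_ok):
            # Tested once, against the integral minute only.
            if exceeds(v, integral_stats, K_SINGLE):
                spikes.append(i)
        elif 30 <= t < 60:
            # Tested twice; both conditions must hold.
            if exceeds(v, integral_stats, K_DOUBLE) and exceeds(v, offset_stats, K_DOUBLE):
                spikes.append(i)
    return spikes
```

In a full implementation the same double test would be applied to each successive 30-sec block, and a point would be deleted if any of its three components were flagged.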
An example of a one-point spike removed by this process may be seen at time 1978/009.39540497 at http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html.

C2. On July 11, 1975, the IMP 8 magnetometer was frozen into a +/- 36 nT range for each of the orthogonal sensors. Some points of the input data set had |Bi|>36 nT. Making allowance for possible zero-level offsets and for GSE Bx and By vector components not being instantaneously aligned with the sensors in the spacecraft spin plane, we deleted all points having |Bi|>38.5 nT, for i = x or y or z. An example of a deleted point is in the original 1978 data at day 46.06976500.

C3. See-sun direction was determined on IMP 8 every ~20.48s. Occasionally there would be an error in this determination for single 20.48s intervals that would lead to erroneous Bx and By values. These isolated ~20.48 sec “square waves” are found as follows. Differences in Bx and By between successive 320ms points are examined. If a pair of points is found where the magnitudes of the Bx and By changes are greater than 1.2 nT and 6.5 nT respectively, the time tag is noted. If another such change occurs (with opposite signs) between 19 and 21 sec later, we take the intervening data to be a square wave and we delete it. Two examples of isolated "square waves" may be seen at 02:24 and 20:48 of January 3, 1996 at http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html.

Elimination of spikes (item C1), of points with |Bi| > 38.5 nT (item C2), and of the isolated 20.48s square waves (item C3) combined led to the removal of about 0.3% of all the input data.

C4. Occasionally there occurred in the input data extended intervals (many minutes to tens of minutes to, on 28 occasions, whole days) that seem to consist of contiguous and anomalous square waves in Bx and By. To find these, we searched for pairs of 320ms points in which Bx and By changed by at least 4 nT. We noted the times.
We then asked whether the time spacing between such pairs of points was between 20.40 and 20.50 sec. If it was, we retained the time tags of the transitions. If we accumulated >4 such transition pairs separated by 20.4-20.5 sec within 5 min, we wrote the interval start and stop times into a temporary file. We then examined such intervals manually with the help of FTPBrowser’s plotting capability to make retain-or-delete judgements. The full list of 119 days having such intervals deleted is given at ftp://omniweb.gsfc.nasa.gov/imp8/mag/320ms_ascii/cleaned/doc/concat_sqwvs.txt. Approximately 0.15% of the input data were deleted in this step. Note that over half the affected days occur in the year 1978. Data for the individual 1978 and 1996 intervals may be seen graphically in the reorganized original data set using http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html.

----------------------------------------
D. Despinning

In the input 320ms data set used for our processing, there was significant spin modulation in the GSE Bx and By components, which were measured in a plane approximately normal to the IMP 8 spin vector. The IMP spin period was about 2.6 sec. The plot at http://ftpbrowser.gsfc.nasa.gov/imp_gif/gif/x_1996003_0608.gif shows an example of anomalous power in Bx at the IMP spin frequency and its harmonic. This section describes our efforts to eliminate or minimize such spin effects.

We assess 1-hour spans of GSE Bx and By data after having eliminated the various categories of anomalous data discussed above. We look for anomalous power at/near the spacecraft spin frequency (near 22-23 rpm and slightly variable) or its second harmonic. If we find such power at either frequency, we use a notch filter to remove the power from both frequency intervals. This is done for Bx and By independently.

First, we perform Fast Fourier Transforms on the input data. We take the small band 0.36-0.39 Hz as the spin frequency, and 0.74-0.77 Hz as its harmonic.
We determine the mean power in each of these bands, and, for each band, we take the mean power in the contiguous bands ("sidebands"). That is, we determine one mean power across the bands 0.30-0.33 Hz and 0.42-0.45 Hz and a second mean power across the bands 0.69-0.72 Hz and 0.80-0.83 Hz. We also determine the standard deviations in these two sideband mean power determinations.

We then ask if the mean power in either of the central bands (spin frequency or its lowest harmonic) is more than one sideband standard deviation greater than the mean power of the corresponding sidebands. If not, we accept the hour of data with no further processing (for Bx and By separately, although it would be highly unusual for one component but not both to have anomalous spin effects). If so, we proceed as follows.

We develop notch filters in IDL (6.4) to eliminate power in the bands 0.34-0.40 Hz and 0.72-0.79 Hz. Parameters mb1 and mb2 in the IDL program notch_filter (res,f1,f2,mb1,mb2) have been assigned the values 50 and 200, respectively, in our runs. In convolving the notch filters with the input data sets, we actually address the hour of interest plus the 10-min span on either side of the hour of interest. For the interval cited above, the post-filtering intensity-time plot and its FFT are visible at http://ftpbrowser.gsfc.nasa.gov/imp_gif/gif_adam/x_1996003_0608.gif.

Of all the 1973-2000 hours addressed, we had to remove spin frequencies from ~75% of them. No cases were found where only one of the two spin plane components needed notch filtering.

ALERT: In coming out of an IMP data gap, there are sometimes anomalous periods, of durations up to 20.48s, when Bx and By exhibit unrealistically large-amplitude, spin-period oscillations, followed by reasonable-looking data. Such periods were not caught by the anomalous data finders C1-C4 discussed above.
Performing notch filtering in the presence of data gaps and anomalous Bx and By behavior typically removed the prior spin-period oscillations (although it left Bx and By with anomalously large values for the interval of the prior oscillations), but introduced smaller-amplitude spin-period oscillations prior to the data gap and immediately after the interval of the initial large-amplitude oscillations. These intervals of artificially introduced spin-period oscillations have durations on the order of a minute. Some but not all of these small-amplitude periods were manually detected (see section F below) and deleted. Thus, the potential data user is alerted to the likelihood that any spin-period oscillations encountered in the 320ms data during the minute before or after a data gap are spurious.

An example of this is data for hour 9 of January 14, 1978. One sees in the original data [http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html] a 6-minute data gap followed immediately by a 9-sec interval with Bx and By spin-period oscillations of amplitude +/- 30 nT. Now, if we look at the notch-filtered and cleaned data at http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_cln.html, we see that the anomalously large-amplitude data of the 9-sec interval have been removed by the steps of Section F below, but, because we designated too narrow an interval for deletion, the minute or so before and after the gap in the cleaned data have spurious small-amplitude (+/- 1 nT) spin-period oscillations.

------------------------------------
E. Removal of data inconsistent with prior 15.36s data

This section gives details of the editing of the reprocessed 320ms data to accommodate occasional significant differences between the reprocessed data and the long-available 15.36s data from the IMP 8 magnetometer.
After "desparsing," despiking and despinning the 320ms data, and eliminating isolated 20.48s "square waves" and extended concatenations of such intervals, as discussed in the foregoing, we compared daily plots of cleaned 320ms data with daily plots of old 15.36s data. We saw many intervals, of durations tens of minutes to hours, having differing profiles between new and old Bx(GSE) and By(GSE) data, with the new data frequently being the more suspicious. At this point, the old data were quite spiky, but this did not mask the temporally extended old-new profile differences. An example of the type of discrepancy referenced may be seen at the URL cited two paragraphs below, for July 22, 1978, hours 7-9.

We attribute these differences to the fact that the long-available 15.36 sec data were generated from 320 msec data, but these original 320 msec data were not recorded at that time due to storage limitations back in the 1970s and 80s. The IMP 8 level 0 data are analogue, and their modern reprocessing could not access all the required housekeeping data for all time periods, resulting in an increased number of incorrect time intervals.

We made new 15.36s averages from the new 0.32s data and made a set of data files containing both old and new Bx, By, Bz values. These old-new-merged data can be viewed at http://ftpbrowser.gsfc.nasa.gov/mag_merge_15s.html. Virtually always for Bz, and most of the time for Bx and By, there was such agreement that the colored traces of old and new overlaid each other. To see when there were both old and new data, and when there were only new or old data but not both, we plotted in the Bz panel Bz(new) and Bz(old) - 3.0 nT.

We then determined to make judgements preferring old or new data in 30-minute increments. To aid this process, we made a new tool that would make daily new-old plots for any day having at least one 30-min interval where our threshold for new-old disagreement was exceeded.
The tool "paused at" each such 30-min interval and invited us to make a new-old preference judgement. The tool then captured the start points of all 30-min intervals for which the old data were chosen as being the more credible.

The measure of new-old disagreement level for a 30-min interval was the average value, DBA, of

   DB = SQRT{[Bx(old)-Bx(new)]**2 + [By(old)-By(new)]**2}

wherein the two largest DB values were excluded from the 30-min means to minimize effects of spikes in the old data.

In our operational runs, we chose DBA >= 3.0 nT as marking a suspicious 30-min interval to be checked. This meant that we made new-old preference judgements for 9,944 30-min intervals of the 286,000 30-min intervals available. The number of 30-min intervals judged to have new data inferior to old data was 7208, or about 72% of the intervals evaluated. (The 7208 represents ~1.5% of the ~27 year data set.) For each of these 30-min intervals we deleted all 320ms data from the data set previously cleaned for the issues enumerated above.

Performing this judgement process on a semi-automated basis with a 30-min granularity, rather than considering each interval of significant old-new differences and making manual determinations of exact start and end times of the discrepancies, means that some good 320ms data will have been eliminated, and some suspicious 320ms data will have been retained.

The list of 7208 30-min intervals deleted is available at ftp://nssdcftp.gsfc.nasa.gov/spacecraft_data/imp/imp8/mag/320ms_ascii/doc/disagree.txt. A small list of 14 intervals where both the old and new data were judged bad, and where the 320ms data were deleted, is available at ftp://nssdcftp.gsfc.nasa.gov/spacecraft_data/imp/imp8/mag/320ms_ascii/doc/bad_old_new.txt

In most cases, it was fairly clear whether the new or old data were to be preferred.
Typically, new and old data were coincident in time, and then at one point one trace would diverge discontinuously from the second trace; the latter would show no such discontinuity. The trace with the continuity was judged to be preferred. In most but not all such cases, the preferred trace was the old data. In some cases, it was less obvious whether new or old was to be preferred. While the default was to prefer the new data in such cases, there was a broad spectrum of differences between new and old. The new-old preferences were made by one person (JK) over a period of about two weeks, which hopefully yielded a mostly, albeit imperfectly, uniform approach to new-old judgements.

We have retained the old-new-merged 15.36s data set. As discussed elsewhere, we have created a new 15.36s data set from the final 320ms data set, with inclusion of 30-min increments of old data (separately despiked) for those intervals where the old data were judged to be credible and preferable to the new data.

----------------------------------
F. Removal of residual anomalous points

Browsing data that passed through all the edits discussed above revealed a set of about 600 residual anomalous intervals, almost all of which were of durations <2 minutes, although one or two multi-hour intervals not previously found were also noted. We deleted these intervals, which in aggregate represented about 0.02% of the data. The list of these intervals is given at ftp://nssdcftp.gsfc.nasa.gov/spacecraft_data/imp/imp8/mag/320ms_ascii/doc/residual.txt

As noted at the end of Section D above, some of the points removed at this step were introduced by the notch filtering performed to remove anomalous spacecraft-spin-period power in the presence of data gaps and anomalous large-amplitude, spin-period Bx and By oscillations lasting up to 20.48 sec immediately following some gaps.
The anomalous large-amplitude, post-gap data will show in the plots of http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_org.html, but the smaller-amplitude anomalies introduced by the notch filtering will not.

------------------------------------
G. The new 320ms data set

The cleaned 320ms data set is available with graphical display capability at http://ftpbrowser.gsfc.nasa.gov/imp_mag_320ms_cln.html and at http://cdaweb.gsfc.nasa.gov/

It is also available for ftp download in daily ASCII files at ftp://nssdcftp.gsfc.nasa.gov/spacecraft_data/imp/imp8/mag/320ms_ascii/ and in daily CDFs from ftp://cdaweb.gsfc.nasa.gov/pub/istp/imp8/mag_320ms

The format of the ASCII records is simply:

   Word                     Format
   Year                     I4
   Fractional day of year   F14.8
   Bx (GSE, nT)             F8.2
   By (GSE, nT)             F8.2
   Bz (GSE, nT)             F8.2
   |B| (nT)                 F8.2
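
A record in this format can be read with a few lines of Python. The sketch below is illustrative only: the names MagRecord and parse_record are hypothetical, and it assumes the six fields are packed in fixed-width columns exactly as the format codes imply (4 + 14 + 4x8 = 50 characters) with no extra separators. If the actual files differ, a whitespace split may be more appropriate.

```python
from typing import NamedTuple


class MagRecord(NamedTuple):
    year: int
    day_of_year: float   # fractional day of year
    bx: float            # GSE, nT
    by: float            # GSE, nT
    bz: float            # GSE, nT
    b_mag: float         # |B|, nT


# Column slices implied by I4, F14.8, F8.2, F8.2, F8.2, F8.2
# (assumption: fields are contiguous, with no separators between them).
_SLICES = [(0, 4), (4, 18), (18, 26), (26, 34), (34, 42), (42, 50)]


def parse_record(line: str) -> MagRecord:
    """Parse one fixed-width ASCII record into a MagRecord."""
    fields = [line[a:b] for a, b in _SLICES]
    return MagRecord(int(fields[0]), *(float(f) for f in fields[1:]))


# Usage with a made-up record (not taken from the data set):
sample = "%4d%14.8f%8.2f%8.2f%8.2f%8.2f" % (1978, 12.5, 1.5, -2.25, 3.0, 4.04)
record = parse_record(sample)
```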