Open Data Interface - PM 2 - ESTEC 2009-06-04

Place:	ESTEC
Time:	10:00 - 17:50

Attendees:	Hugh Evans	ESA/ESTEC
	Daniel Heynderickx	DHC
	Peter Wintoft	IRF

The current status of the project was presented by Peter and Daniel. The following items were discussed:

The URD.
The technical note describing the ODI database.
The ODI administrator user guide.
The structure of the ODI database and how tables are created and populated.
The procedure to ingest data, both from local files and over the internet.
A set of command line tools to query the ODI database.
The SAAPS Data Plotter Tool now accessing the ODI database.
The SEDAT software accessing the ODI database.
The SPENVIS software accessing the ODI database.

The structure of the ODI database is essentially fixed, and code exists to set up, create, and populate the database. A subset of the tools to query the database exists, as does a subset of the datasets that shall be ingested. The SAAPS, SEDAT, and SPENVIS software have been demonstrated to work with the ODI database.

According to the original schedule, the complete ODI system should have been delivered at this meeting. However, the delivery has been delayed, and a new delivery date of 1st September 2009 was suggested. The maintenance phase then starts and runs for 6 months until 1st March 2010.

The following items shall be addressed before the software delivery on 1st Sep. 2009:

The administrator user guide:

Describe the datasets that are delivered with ODI in an appendix.
Include a datasets_demo.txt file that contains a set of publicly available datasets.
Section 2.1: Add wget to the prerequisites.
Section 3: Remove the subversion export alternative for downloading the ODI software. Use only a tar-file put on the IRF-Lund server.
Explain that ">>" denotes the Unix shell prompt.
Give the detailed ODI table description in an appendix.
Describe the syntax of datasets.txt.
Expand Section 5 to include non-CDF datasets.
Add a point 6 to Section 5 to describe the parser routine.
Describe, in an appendix, mandatory fields that must exist in the skeleton table.
Describe the MySQL LOCAL_INFILE option.
Describe more clearly the different user levels. Give examples of users.
$ODI_USER_3 should be changed to $ODI_ADMIN, etc.

Write a user manual for an end user:

The end user will only pull data out of the ODI system.
Describe interfaces for different languages (PHP, IDL, Java, Matlab).
The manual shall detail how to connect to the database and use the data. The SEDAT IDL data and meta structures can be used as a template.

The documentation for the SAAPS/SEDAT/SPENVIS interfaces should go into separate documents.

The ODI software:

Change the environment variable $ODI_DATA to $ODI_RAWDATA.
Include the file path in datasets.txt instead of using platform/instrument.
populate_all.sh shall be renamed to populate.sh.
populate.sh shall take zero or more input arguments:

With no arguments all datasets in datasets.txt are parsed.
With one or more arguments only the named datasets are parsed.
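
The argument handling described for populate.sh can be sketched as follows. This is only an illustration of the selection logic, written in Python rather than shell; the function names are hypothetical, and the assumption that the dataset name is the first whitespace-separated field of each datasets.txt line is not taken from the minutes (the datasets.txt syntax is still to be documented). Lines with a # in the first column are treated as commented out.

```python
def read_dataset_names(lines):
    """Collect dataset names from datasets.txt-style lines.

    Assumption (hypothetical layout): the dataset name is the first
    whitespace-separated field. Blank lines and lines with '#' in the
    first column are skipped.
    """
    names = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.append(line.split()[0])
    return names


def select_datasets(available, requested):
    """Return the datasets to parse.

    With no requested names, all available datasets are returned.
    With one or more names, only those are returned, in file order.
    """
    if not requested:
        return list(available)
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError("unknown dataset(s): " + ", ".join(unknown))
    return [name for name in available if name in requested]
```

In a real script the requested names would come from the command line arguments; rejecting unknown names is a design choice of this sketch, not something decided at the meeting.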

Remove the "_0" for scalar from column names in the dataset_* tables.
The ODI_DATA directory structure should be added to the data repository, with one sample file per dataset.
The datasets_demo.txt has been replaced by commenting out all datasets except the demo sets (by putting a # in the first column).
The path directory should be added to datasets.txt as a new column. Download commands (such as wget) should be added as well.
Action Item on Hugh: check the format specifications for the SEDAT get_data routine. The perl script can use the following formats for output: epoch in F21.4, real/double in E21.15, integer in I.
The parameters for connecting to the ODI database should be added to the SEDAT parameters file (sedat.pm?). Hugh will send Daniel a copy of the currently used parameter file.
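
As an aid for the format check in the action item above, the three output formats can be approximated with printf-style formatting. This is a sketch in Python, not the perl script itself; the mapping of F21.4, E21.15 and I onto "%21.4f", "%21.15E" and "%d" is an assumption, and the function names are hypothetical.

```python
def format_epoch(value):
    # Approximates F21.4: fixed point, minimum width 21, 4 decimals.
    return "%21.4f" % value


def format_real(value):
    # Approximates E21.15: exponent notation, minimum width 21,
    # 15 digits after the decimal point.
    return "%21.15E" % value


def format_integer(value):
    # Approximates I: integer with no explicit field width.
    return "%d" % value
```

One point that may be worth including in the check: in this printf-style rendering, a positive value in the E21.15 approximation fills exactly 21 characters, so a negative value needs 22 and the field grows beyond the stated width. Whether the SEDAT get_data routine tolerates this would need to be verified.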

Administration:

Action Item (not specified for whom): check if the software is supposed to be open source (probably not).
For now, the archive should remain under the closed section restricted to team members. Later, a registration page should be set up for distribution purposes.
Complete software delivery shall take place 1st September 2009.
Maintenance phase runs from 1st Sep. 2009 - 1st March 2010.
Update the Gantt chart to reflect the latest planning.

The date of the next meeting was not decided, but one possibility is to hold a progress meeting at ESWW (16-20 Nov. 2009).

On Friday (2009-06-05), Hugh and Daniel had a closer look at the issues for the SEDAT and SPENVIS implementations. Here are some points resulting from the day's work (once again, my thanks to Hugh for his time and his assistance):

The SPENVIS interface currently uses IDL DataMiner to dynamically construct HTML templates for and from database queries. This will be replaced by PHP routines which will run stand-alone, i.e. not embedded in the HTML templates. In this way, the implementation will not depend on an IDL DataMiner licence.
The SPENVIS interface will be simplified to extract data from one dataset only, i.e. data merging will no longer be performed. The data will be output in the time resolution in which they are stored in the database. The maximum number of points that can be retrieved will be set in a SPENVIS parameter.
The new SPENVIS database interface will be limited to producing an output file in SPENVIS csv format, i.e. no plot routines will be developed under the ODI contract.
The SEDAT IDL extensions for ODI were tested on the datasets currently ingested in ODI. Most of the required functionality is implemented. Support for multi-dimensional variables needs to be added. In order to test this, Hugh will export the ISEE2 data from SEDAT into a CDF file which can then be ingested in the database.
The final structure and calling sequence of the new routines were discussed. The SEDAT perl script generating the IDL command file needs to be extended in order to provide a new parameter in the cmn structure. Daniel will rework the routines to take all this into account for the next release.
A performance issue was identified with the current implementation of the SEDAT IDL DataMiner calls to the database. Currently, the GetField method is used, as this allows the specification of NULL replacement values. However, it turns out that this slows down the data retrieval. An alternative approach is to use GetRecord, which is significantly faster, but this does not allow for FILLVAL specification (NULL floats are returned as 0.0). A possible solution would be to store the FILLVALs directly in the database records instead of NULLs as is currently done. This is a less elegant approach, as NULL values are an inherent feature of SQL. The issue needs to be resolved soon.
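
The trade-off in the last point can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the MySQL database, with hypothetical table and column names and an assumed fill value of -1.0e31; it does not involve IDL DataMiner. It shows one reason why storing FILLVALs is less elegant: SQL aggregates skip NULLs automatically, whereas stored fill values must be filtered out explicitly in every query.

```python
import sqlite3

FILLVAL = -1.0e31  # assumed fill value, for illustration only

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE with_null (flux REAL)")  # hypothetical tables
conn.execute("CREATE TABLE with_fill (flux REAL)")

samples = [1.0, None, 3.0]  # None marks a missing measurement
conn.executemany("INSERT INTO with_null VALUES (?)",
                 [(v,) for v in samples])
conn.executemany("INSERT INTO with_fill VALUES (?)",
                 [(FILLVAL if v is None else v,) for v in samples])

# NULLs are ignored by AVG, so the mean covers the valid samples only.
avg_null = conn.execute(
    "SELECT AVG(flux) FROM with_null").fetchone()[0]

# A stored fill value is an ordinary number and distorts the mean ...
avg_fill_naive = conn.execute(
    "SELECT AVG(flux) FROM with_fill").fetchone()[0]

# ... unless every query excludes it explicitly.
avg_fill_filtered = conn.execute(
    "SELECT AVG(flux) FROM with_fill WHERE flux <> ?",
    (FILLVAL,)).fetchone()[0]

print(avg_null, avg_fill_naive, avg_fill_filtered)
```

The sketch says nothing about the retrieval speed of either option; that would have to be measured on the actual ODI database.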

Swedish Institute of Space Physics, Peter Wintoft,