Tuesday, 17 May 2016

A Better PDS Parser

The Improved Parser

This (source tarball via Google Drive) is a slightly better PDS parser than we've made previously.

Like most parsers it's actually fairly uninteresting. There are, however, a number of oddities in the format which mean the logic would be a little convoluted as a simple regexp parser. Specifically, the field separator (\r\n) can appear in several places without indicating a new field.
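To illustrate the separator problem, here is a minimal sketch of one way to cope with it. It is not the code in the tarball. It assumes the label is made of `KEY = VALUE` statements terminated by an `END` line, and that a value is only finished once its double quotes and brackets are balanced. The names `parse_label` and `is_complete` are made up for this example.

```python
def is_complete(value):
    """True once the value has no open quote or bracket (assumed rule)."""
    in_quote = False
    depth = 0
    for ch in value:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch in "({":
                depth += 1
            elif ch in ")}":
                depth -= 1
    return bool(value) and not in_quote and depth <= 0


def parse_label(text):
    """Return a list of (key, value) pairs, joining values that span lines."""
    fields = []
    pending = None  # (key, partial value) while a value is still open
    for raw in text.split("\r\n"):
        line = raw.strip()
        if pending is not None:
            # This \r\n did not start a new field: keep extending the value.
            key, value = pending
            value = (value + " " + line).strip()
        elif line == "END":
            break
        elif line.startswith("/*") or "=" not in line:
            continue  # comment, blank line, or something we don't handle
        else:
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
        if is_complete(value):
            fields.append((key, value))
            pending = None
        else:
            pending = (key, value)
    return fields
```

The point is only that the parser has to carry state from one line to the next, which is what makes a single regexp awkward.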

However, using this parser we can parse several new files as well as Rosetta. These include the Venus Express VMC files, for which we previously used the VICAR header, and the Mars Express files, which leads us to the more interesting problem of parsing Mars Express files.

Mars Express Files

These are 16bpp greyscale images. One of the interesting things in the Mars Express data, however, is the presence of prefix data on each image line.

The details of the binary prefix are in the document HRSC_LABEL_HEADER.PDF from the data archive documentation directory. The prefix is a 68 byte header which sits in front of every line in the image data.

This has several fields, but the main thing to consider initially is the leading 12 bytes: an 8 byte double followed by a 4 byte float. These provide the Ephemeris time and Exposure time for each line, where the Ephemeris time is the time the line was sampled at and the Exposure time is the time in milliseconds it took to acquire the line.
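As a rough sketch, pulling those two values out of a prefix could look like the following. The 68 byte size and the double/float layout come from the description above. The byte order is an assumption (little-endian by default here) and should be checked against the label, and the function name `read_line_times` is just for illustration.

```python
import struct

PREFIX_SIZE = 68  # bytes of prefix in front of each image line


def read_line_times(prefix, byte_order="<"):
    """Return (ephemeris_time, exposure_ms) from the first 12 prefix bytes.

    byte_order is "<" (little-endian) or ">" (big-endian); which one the
    files use is an assumption to verify, not something stated here.
    """
    if len(prefix) < PREFIX_SIZE:
        raise ValueError("prefix is shorter than %d bytes" % PREFIX_SIZE)
    # "d" = 8 byte double, "f" = 4 byte float, no padding between them
    # because an explicit byte order selects struct's standard sizes.
    return struct.unpack_from(byte_order + "df", prefix, 0)
```

The remaining 56 bytes of the prefix are ignored by this sketch.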

In theory, then, we should find that the start (Ephemeris) time of each line is the start time of the previous line plus the exposure time. Using this information we can spot gaps in the incoming record, since the Ephemeris time will jump unexpectedly.

It's complicated by the fact that the exposure time is not constant over a file. The exposure time changes over the course of a capture, so across particularly large gaps the calculated number of missing lines will be off.

Also in a couple of files the gap isn't exactly the exposure time - for example H0165_0068_P12.IMG has a consistent double spacing (this could be me screwing up, of course).

Ideally we would make multiple passes, building up a list of capture times and intervals and using it to fill in the gaps. For now, however, we just flag a gap wherever 3 or more exposure times are skipped between lines.
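The simple single-pass check might be sketched like this. It assumes the Ephemeris time is in seconds (the exposure time is in milliseconds, as above) and reads "3 or more exposure times" as the elapsed time being at least three times the previous line's exposure. `find_gaps` and its `threshold` parameter are hypothetical names, and the missing-line count is only an estimate since the exposure time can change inside the gap.

```python
def find_gaps(lines, threshold=3.0):
    """Flag gaps in a list of (ephemeris_time_s, exposure_ms) tuples.

    Returns a list of (index, estimated_missing_lines), where index is the
    first line after the gap.
    """
    gaps = []
    for i in range(1, len(lines)):
        prev_time, prev_exposure_ms = lines[i - 1]
        step = prev_exposure_ms / 1000.0  # assumed: ephemeris time in seconds
        if step <= 0:
            continue  # cannot judge a gap without a usable exposure time
        elapsed = lines[i][0] - prev_time
        if elapsed >= threshold * step:
            gaps.append((i, round(elapsed / step) - 1))
    return gaps
```

With a threshold of 3, a file with consistent double spacing between lines is not flagged, which matches the behaviour we want for now.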

Be warned that Mars Express files can be very large, e.g. HRSC H0165_0000_ND2.IMG has a resolution of 5176x256600 and a size of ~1GB.
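Given sizes like that, it is probably worth reading one line at a time rather than loading the whole image. A minimal sketch, assuming the stream is already positioned at the start of the image data and that each record is simply the 68 byte prefix followed by the 16bpp pixels with no padding (both assumptions, and `iter_lines` is a made-up name):

```python
def iter_lines(stream, samples, bytes_per_sample=2, prefix_size=68):
    """Yield (prefix, pixels) byte strings, one image line at a time.

    stream is any binary file-like object; samples is the image width.
    """
    record_size = prefix_size + samples * bytes_per_sample
    while True:
        record = stream.read(record_size)
        if len(record) < record_size:
            break  # end of file, or a truncated final record
        yield record[:prefix_size], record[prefix_size:]
```

Only one record is held in memory at a time, so the size of the file does not matter much.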