The Rosetta mission
made headlines last year with its rendezvous with comet
67P/Churyumov-Gerasimenko and the deployment of the Philae lander. Some of the raw data
is now becoming available from ESA, so let's take a look at it.
Imaging Systems
There are three
imaging systems on Rosetta:
The Navcam is a
1024*1024 pixel 16bpp instrument. Initially we're going to ignore
this, since the data is distributed in a slightly different format
than the primary (OSIRIS) instruments.
The two main imaging
systems are really a single system: the “Optical, Spectroscopic,
and Infrared Remote Imaging System” (OSIRIS). The OSIRIS cameras
produce images at a maximum resolution of 2048x2048 pixels at
16bpp.
OSIRIS actually has
two front-ends: a Narrow Angle Camera Unit, and a Wide Angle Camera
Unit. These two units have different optical designs and filter sets,
but both feed into a common electronics back end. From our
perspective, however, their data is distributed as separate, if similar, record
sets, so we can treat them as independent instruments.
The Planetary Society has a good background article on the instrument:
The Data Sets
These are available
from ESA via FTP.
The main project page is:
And the FTP site is
However the NASA PDS Small Bodies node has a more
convenient archive, since it offers tar.gz archives of complete data
sets – these contain the same data as the ESA site in an easily
downloadable form.
The main site is
And scroll down to the section "67P/C-G Prelanding (Orbiter only)" for the sets.
One minor gotcha is that the file names and paths have been converted to lowercase, whereas the original references all use uppercase, but that's not a big deal.
The data sets have a
common naming scheme similar to:
ro-c-osinac-2-prl-67pchuryumov-m07-v1.0/
In this name
“osinac” tells us this is the OSIRIS Narrow Angle Camera, and this
would be “osiwac” for the Wide Angle Camera. The “-2” indicates this is
the basic uncalibrated data. A “-3” here indicates
calibrated records, but these are more complex to process, so we'll stick with the "-2" sets for now.
In the PDS Small Bodies links the "-2" sets are referred to as the EDR sets, and the "-3" sets as the CDR sets (Experiment Data Record and Calibrated Data Record respectively).
Finally the
remainder of the name tells us the target and data volume, and each
volume covers a range of dates for data acquisition, with a summary
of the details in the top level file AAREADME.TXT. In general the
“later” the volume the more interesting it is (i.e. closer to the
comet).
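As a rough sketch of how the naming scheme could be picked apart in code, the snippet below splits a data set name on '-' and pulls out the instrument and processing level. The DatasetName struct and ParseDatasetName function are names made up for this illustration, and it assumes every set follows the same layout as the example name above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: the parts of a data set name we care about
struct DatasetName {
    std::string instrument;  // "osinac" or "osiwac"
    int level = 0;           // 2 = uncalibrated (EDR), 3 = calibrated (CDR)
    bool valid = false;
};

// Split a name such as "ro-c-osinac-2-prl-..." on '-' and pick out the
// instrument (third part) and the level (fourth part, a single digit).
// Assumes the layout of the example name; anything else is marked invalid.
DatasetName ParseDatasetName(const std::string& name)
{
    std::vector<std::string> parts;
    std::stringstream ss(name);
    std::string part;
    while (std::getline(ss, part, '-'))
        parts.push_back(part);

    DatasetName out;
    if (parts.size() < 4 || parts[3].size() != 1 ||
        parts[3][0] < '0' || parts[3][0] > '9')
        return out;

    out.instrument = parts[2];
    out.level = parts[3][0] - '0';
    out.valid = (out.instrument == "osinac" || out.instrument == "osiwac");
    return out;
}
```

Calling ParseDatasetName on the example name above yields the instrument "osinac" and level 2, which is enough to filter a directory of downloaded sets down to the uncalibrated ones.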
What's in them
Within each data set
the main data is held in the “data/” subdirectory, and is
further grouped in the directory tree by date.
So for the m07-v1.0
volume the data is held in:
ro-c-osinac-2-prl-67pchuryumov-m07-v1.0/data/2014_09/
These files are
“.img” files – they're PDS3 data records, where the name
encodes the date and type of file acquisition. The complete details
are in the “catalog/dataset.cat” file.
We went over a lot of the details
of PDS3 as part of the Venus Express parser. In
this case, however, the PDS header itself is more important: we need to
parse it correctly to understand the data layout of the images.
However – let's be
hacky.
Quickly Picking Apart the Files
We can quickly
extract the image data as follows:
The PDS header
contains top-level file information, including two key
fields: RECORD_BYTES and ^IMAGE.
The RECORD_BYTES
value tells you how big each record is, and the ^IMAGE tells you which one
contains the start of the image data – so this quickly tells you
that the offset to the start of image data in the file is:
(^IMAGE-1) * RECORD_BYTES
Since RECORD_BYTES is essentially always "512" in these files, you really only need the value of ^IMAGE.
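The offset calculation is small enough to write as a single function. This is a minimal sketch: ImageDataOffset is a name invented here, and it assumes ^IMAGE is a bare record number (counted from 1) as in these files, rather than any other pointer form PDS3 allows.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the image data within a PDS3 file.
// imagePointer is the ^IMAGE value (a 1-based record number) and
// recordBytes is RECORD_BYTES; both are assumed to be parsed already.
std::int64_t ImageDataOffset(std::int64_t imagePointer, std::int64_t recordBytes)
{
    assert(imagePointer >= 1 && recordBytes > 0);  // records are counted from 1
    return (imagePointer - 1) * recordBytes;
}
```

For instance, an ^IMAGE of 42 with 512 byte records gives an offset of 20992 bytes, and an ^IMAGE of 1 means the image starts at the very beginning of the file.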
However you still
need a couple of additional pieces of information to extract the
data. In particular, you need to know the resolution of the image
within the file.
This information is
held inside the section of the PDS header that runs between
OBJECT = IMAGE
and
END_OBJECT
And from this
section of data we need to extract:
LINES: The number of vertical lines in the image data
LINE_SAMPLES: The horizontal resolution
SAMPLE_BITS: The bpp
SAMPLE_TYPE: How it's formatted
FIRST_LINE: The first row (Y value) in the image data
FIRST_LINE_SAMPLE: The first X co-ordinate in the image data
However we can be
really hacky here, and assume that everything starts in the top-left
corner (starting X & Y of 1) and that the samples are all raw
16bpp unsigned LSB values – this will work for (almost) all of the
images in the level 2 datasets.
This still leaves us
needing to get LINES and LINE_SAMPLES from the IMAGE section of the
header – we can pull these using our existing PDS parser, although
we have to be careful to make sure we pull the IMAGE version of these
numbers – some files contain additional objects which mean the
header will have different LINES/LINE_SAMPLES values for the
different objects.
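To illustrate the point about only taking the IMAGE version of these numbers, here is a minimal sketch of a header scan that collects KEY = VALUE pairs only between "OBJECT = IMAGE" and the matching END_OBJECT. This is not the Venus Express parser mentioned above; Trim, ReadImageObject and FieldAsInt are hypothetical names, and the sketch assumes one KEY = VALUE pair per line and no objects nested inside IMAGE.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Trim leading and trailing whitespace (including '\r' from CRLF line endings)
static std::string Trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Collect the KEY = VALUE pairs inside the IMAGE object only, so that
// LINES / LINE_SAMPLES belonging to other objects are ignored.
std::map<std::string, std::string> ReadImageObject(const std::string& header)
{
    std::map<std::string, std::string> fields;
    std::istringstream in(header);
    std::string line;
    bool inImage = false;
    while (std::getline(in, line))
    {
        const auto eq = line.find('=');
        const std::string key = Trim(line.substr(0, eq));
        const std::string value =
            (eq == std::string::npos) ? "" : Trim(line.substr(eq + 1));
        if (key == "OBJECT" && value == "IMAGE")
            inImage = true;
        else if (key == "END_OBJECT" && inImage)
            break;  // end of the IMAGE object, nothing more to read
        else if (inImage && !key.empty())
            fields[key] = value;
    }
    return fields;
}

// Look up a numeric field; returns -1 if it is missing or not a number
int FieldAsInt(const std::map<std::string, std::string>& fields,
               const std::string& key)
{
    const auto it = fields.find(key);
    if (it == fields.end())
        return -1;
    try { return std::stoi(it->second); }
    catch (const std::exception&) { return -1; }
}
```

With the header text loaded into a string, FieldAsInt(ReadImageObject(header), "LINES") gives the image height, and the same call with "LINE_SAMPLES" gives the width.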
However here we can be even hackier and sort the files based on size: 2048*2048 images will be about 8M, and 1024*1024 will be about 2M. We can ignore anything smaller, and assume the image sizes (and therefore data available) in those two groupings.
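The size-based shortcut might look something like the sketch below. GuessImageEdge is a made-up name, and the thresholds assume square 16bpp images with only a small header in front of them, so a file padded out by other large objects would fool it.

```cpp
#include <cstdint>

// Hypothetical helper: guess the image edge length in pixels from the file
// size alone. Returns 0 for files too small to hold a 1024x1024 image.
int GuessImageEdge(std::uint64_t fileSize)
{
    const std::uint64_t kLarge = 2048ull * 2048ull * 2ull;  // 8388608 bytes
    const std::uint64_t kSmall = 1024ull * 1024ull * 2ull;  // 2097152 bytes
    if (fileSize >= kLarge)
        return 2048;
    if (fileSize >= kSmall)
        return 1024;
    return 0;  // smaller images are ignored in this hack
}
```

The caller can then use the returned edge length as both width and height, and skip any file for which the function returns 0.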
So, an example: the
file N20140806T062000104ID20F28.IMG
Parsing the header
(eliminating white-space to make this easier to read) we see:
RECORD_BYTES = 512
^IMAGE = 42
We also get the
IMAGE section:
OBJECT = IMAGE
INTERCHANGE_FORMAT = BINARY
LINES = 2048
LINE_SAMPLES = 2048
FIRST_LINE = 1
FIRST_LINE_SAMPLE = 1
SAMPLE_TYPE = LSB_UNSIGNED_INTEGER
SAMPLE_BITS = 16
BANDS = 1
SAMPLE_DISPLAY_DIRECTION = RIGHT
LINE_DISPLAY_DIRECTION = DOWN
END_OBJECT = IMAGE
So the
image data starts at (42-1)*512 = 20992 bytes. The image data size is 8388608 bytes (2048*2048*2), and our
assumptions about the format (512 byte records, 16bpp LSB image data, starting at 1,1) are good.
Reading the file we
simply seek to the correct location, and load the data with code
like:
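    // offptr is the image data offset, data_sz the image size in bytes
    QFile fi(filename);
    QByteArray imd;
    // Only read if the file opened and the seek to the image data succeeded;
    // imd stays empty otherwise
    if (fi.open(QIODevice::ReadOnly) && fi.seek(offptr))
        imd = fi.read(data_sz);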
And then just load
up the 16bpp values from the file data with code like:
void RawImage::SetImageData(QByteArray im, int width, int height)
{
    _data.clear(); // Drop any samples left over from a previous image
    _data.reserve(width*height);
    _width = width;
    _height = height;
    int cursor = 0;
    for (int y=0; y < height; y++)
    {
        for (int x=0; x < width; x++)
        {
            // Stop before reading past the end of a short buffer
            if (cursor + 1 >= im.size())
            {
                return;
            }
            // Samples are 16 bit LSB first: low byte, then high byte
            int vl = (unsigned char) im.at(cursor);
            int vh = (unsigned char) im.at(cursor+1);
            int v = (vh << 8) | vl;
            _data.append(v);
            cursor += 2;
        }
    }
}
Where “_data” is
just a QList<int>.
From this we can drop the values into an
OpenCV image structure:
void RawImage::RegenerateImage()
{
    int x,y;
    int cursor = 0;
    uint16_t val; // Unsigned, so values above 32767 are kept intact
    // Start from zeros so pixels missing from a short file are black
    _im16 = cv::Mat::zeros(_height, _width, CV_16UC1);
    for (y=0; y < _height; y++)
        for (x=0; x < _width; x++)
        {
            if (cursor >= _data.length())
                return; // Sanity bound check
            val = static_cast<uint16_t>(_data.at(cursor++));
            _im16.at<uint16_t>(cv::Point(x,y)) = val;
        }
}
And at this point we
can correct our levels, save out a tiff, etc as we've done with
previous image sets.
So running this over
the data we have and viewing the resulting .tiff files in an image
browser (XFCE's ristretto
in this screenshot) gets us:
And that's looking
enough like the comet we expect that we can't have screwed up that
badly...
(just scrolling
through the image sets after processing and seeing the slow rotation of 67P here is
actually sort of awesome...)
We can also do a
similar processing sequence on Wide Angle images for similar
results...
One major step we haven't taken is image rotation: The WAC
images should be flipped vertically, and the NAC images both
vertically and horizontally to get the correct alignment – we could
do this when building the OpenCV image by messing around with the X &
Y stepping.
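As a sketch of what messing around with the X & Y stepping could look like, the function below copies a row-major 16bpp buffer while reading the source rows and/or columns in reverse. FlipImage is a hypothetical name and it works on a plain std::vector rather than the RawImage class, so treat it as an illustration of the indexing rather than a drop-in change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy a row-major image, reversing the Y and/or X stepping of the source.
std::vector<std::uint16_t> FlipImage(const std::vector<std::uint16_t>& src,
                                     int width, int height,
                                     bool flipVertical, bool flipHorizontal)
{
    assert(width >= 0 && height >= 0);
    assert(src.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<std::uint16_t> dst(src.size());
    for (int y = 0; y < height; y++)
    {
        // Source row: counted from the bottom when flipping vertically
        const int sy = flipVertical ? (height - 1 - y) : y;
        for (int x = 0; x < width; x++)
        {
            // Source column: counted from the right when flipping horizontally
            const int sx = flipHorizontal ? (width - 1 - x) : x;
            dst[static_cast<std::size_t>(y) * width + x] =
                src[static_cast<std::size_t>(sy) * width + sx];
        }
    }
    return dst;
}
```

Going by the description above, a WAC image would use flipVertical only and a NAC image would set both flags. Once the data is in an OpenCV image, cv::flip should give the same result without a hand-written loop.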
Also we're fairly clearly messing up the processing of smaller images, and we don't cope with all the options we could encounter in the PDS data set, so we will be messing up some content.
But we've got a whole list of missing features, and
really need to build a “proper” PDS parser anyway. Given this is a ten minute hack on top of the code we already had for Venus Express, let's call it good enough for now.