The Rosetta mission made headlines last year with its rendezvous with comet 67P/Churyumov-Gerasimenko and the deployment of the Philae lander. Some of the raw data is now becoming available from ESA, so let's take a look at it.
There are three imaging systems on Rosetta:
- Navcam: The navigation camera
- OSIRIS Wide Angle Camera
- OSIRIS Narrow Angle Camera
The Navcam is a 1024*1024 pixel 16bpp instrument. Initially we're going to ignore this, since the data is distributed in a slightly different format than the primary (OSIRIS) instruments.
The two main imaging systems are really a single system: the “Optical, Spectroscopic, and Infrared Remote Imaging System” (OSIRIS). OSIRIS produces images at a maximum resolution of 2048x2048 pixels at 16bpp.
OSIRIS actually has two front-ends: a Narrow Angle Camera Unit, and a Wide Angle Camera Unit. These two units have different optical designs and filter sets, but then feed into a common electronics back end. However from our perspective these are distributed as separate, if similar, record sets and we can treat them as independent instruments.
The Planetary Society has a good background article on the instrument:
The Data Sets
These are available from ESA via FTP,
The main project page is:
And the FTP site is
However the NASA PDS Small Bodies node has a more convenient archive, since it offers tar.gz archives of complete data sets – these contain the same data as the ESA site in an easily downloadable form.
The main site is
And scroll down to the section "67P/C-G Prelanding (Orbiter only)" for the sets.
One minor gotcha is that the file names and paths have been converted to lowercase, whereas the original references all use uppercase, but that's not a big deal.
The data sets have a common naming scheme similar to:
In this name “osinac” tells us this is the OSIRIS Narrow Angle Camera; it would be “osiwac” for the Wide Angle Camera. The “-2” indicates this is the basic uncalibrated data. A “-3” here indicates calibrated records, but these are more complex to process, so we'll stick with the "-2" sets for now.
In the PDS Small Bodies links the "-2" sets are referred to as the EDR sets, and the "-3" sets as the CDR sets (Experiment Data Record and Calibrated Data Record respectively).
Finally the remainder of the name tells us the target and data volume, and each volume covers a range of dates for data acquisition, with a summary of the details in the top level file AAREADME.TXT. In general the “later” the volume the more interesting it is (i.e. closer to the comet).
What's in them
Within each data set the main data is held in the “data/” subdirectory, and is further grouped in the directory tree by date.
So for the m07-v1.0 volume the data is held in:
These files are “.img” files – they're PDS3 data records, where the name encodes the date and type of file acquisition. The complete details are in the “catalog/dataset.cat” file.
A lot of the details of PDS3 we went over as part of the Venus Express parser, however in this case the PDS header itself is more important – we need to parse it correctly to understand the data layout of the images.
However – let's be hacky.
Quickly Picking Apart the Files
We can quickly extract the image data as follows:
The PDS header contains top-level file information, including two key fields: RECORD_BYTES and ^IMAGE.
The RECORD_BYTES value tells you how big each record is, and the ^IMAGE tells you which one contains the start of the image data – so this quickly tells you that the offset to the start of image data in the file is:
(^IMAGE-1) * RECORD_BYTES
Since RECORD_BYTES is essentially always 512 in these files, you really only need the value of ^IMAGE.
However you still need a couple of additional pieces of information to extract the data. In particular, you need to know the resolution of the image within the file.
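As a quick sanity check, the offset calculation can be sketched in a few lines of C++. The function name image_offset is made up for this example and is not part of the existing parser; the only thing it assumes is that ^IMAGE is a 1-based record number.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the image data inside a PDS3 file.
// Record pointers such as ^IMAGE are 1-based, hence the "- 1".
// (image_offset is a hypothetical helper name used for illustration.)
std::int64_t image_offset(std::int64_t record_bytes, std::int64_t image_record)
{
    assert(record_bytes > 0 && image_record >= 1);
    return (image_record - 1) * record_bytes;
}
```

With RECORD_BYTES = 512 and ^IMAGE = 42 this gives 41 * 512 = 20992 bytes.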
This information is held inside the section of the PDS header that runs between
OBJECT = IMAGE
END_OBJECT = IMAGE
And from this section of data we need to extract:
- LINES: The number of vertical lines in the image data
- LINE_SAMPLES: The horizontal resolution
- SAMPLE_BITS: The bpp
- SAMPLE_TYPE: how it's formatted.
- FIRST_LINE: The first row (Y value) in the image data
- FIRST_LINE_SAMPLE: The first X co-ordinate in the image data
However we can be really hacky here, and assume that everything starts in the top corner (starting X & Y of 1) and that the samples are all raw 16bpp unsigned LSB values. This will work for (almost) all of the images in the level 2 datasets.
This still leaves us needing to get LINES and LINE_SAMPLES from the IMAGE section of the header – we can pull these using our existing PDS parser, although we have to be careful to make sure we pull the IMAGE version of these numbers – some files contain additional objects which mean the header will have different LINES/LINE_SAMPLES values for the different objects.
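To illustrate the "only take the IMAGE version" point, here is a minimal sketch of a line-based scan that keeps only the KEY = VALUE pairs found between OBJECT = IMAGE and END_OBJECT = IMAGE. It is a stand-in, not the existing PDS parser: the names trim and image_keywords are invented for this example, and it assumes one keyword per line with no objects nested inside IMAGE.

```cpp
#include <map>
#include <sstream>
#include <string>

// Strip leading/trailing whitespace (including '\r' from CRLF line endings).
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Collect KEY = VALUE pairs that appear inside the IMAGE object only.
// Keywords with the same name in other objects are ignored.
// Assumption: one keyword per line, nothing nested inside IMAGE.
std::map<std::string, std::string> image_keywords(const std::string& header)
{
    std::map<std::string, std::string> out;
    std::istringstream in(header);
    std::string line;
    bool in_image = false;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a KEY = VALUE line
        const std::string key = trim(line.substr(0, eq));
        const std::string value = trim(line.substr(eq + 1));
        if (key == "OBJECT" && value == "IMAGE") { in_image = true; continue; }
        if (key == "END_OBJECT" && value == "IMAGE") { in_image = false; continue; }
        if (in_image) out[key] = value;
    }
    return out;
}
```

The values come back as strings, so LINES and LINE_SAMPLES still need converting (for example with std::stoi) before use.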
However here we can be even hackier and sort the files based on size: 2048*2048 images will be about 8MB, and 1024*1024 will be about 2MB. We can ignore anything smaller, and assume the image size (and therefore the data available) for each of those two groupings.
So, an example, the file N20140806T062000104ID20F28.IMG:
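The size-based shortcut might look something like the sketch below. The function name guess_image_side is hypothetical, and the thresholds assume each file holds a header plus exactly one square 16bpp image, so a file can never be smaller than its pixel data.

```cpp
#include <cstdint>

// Guess the side length of a square 16bpp image from the file size alone.
// Returns 2048 or 1024, or 0 when the file is too small to hold either.
// Assumption: the file is a header followed by one square image.
// (guess_image_side is a hypothetical helper name.)
int guess_image_side(std::uint64_t file_size)
{
    const std::uint64_t bytes_2048 = 2048ull * 2048ull * 2ull;  // 8388608
    const std::uint64_t bytes_1024 = 1024ull * 1024ull * 2ull;  // 2097152
    if (file_size >= bytes_2048) return 2048;
    if (file_size >= bytes_1024) return 1024;
    return 0;  // smaller image: skip it in this quick hack
}
```

Anything that comes back as 0 is simply skipped, which matches the "ignore anything smaller" approach.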
Parsing the header (eliminating white-space to make this easier to read) then we see
RECORD_BYTES = 512
^IMAGE = 42
Also we get the IMAGE section
OBJECT = IMAGE
INTERCHANGE_FORMAT = BINARY
LINES = 2048
LINE_SAMPLES = 2048
FIRST_LINE = 1
FIRST_LINE_SAMPLE = 1
SAMPLE_TYPE = LSB_UNSIGNED_INTEGER
SAMPLE_BITS = 16
BANDS = 1
SAMPLE_DISPLAY_DIRECTION = RIGHT
LINE_DISPLAY_DIRECTION = DOWN
END_OBJECT = IMAGE
So this has the image data start at (42-1)*512 = 20992 bytes. The image data size is 8388608 bytes (2048*2048*2), and our assumptions about the format (512 byte records, 16bpp LSB image data, starting at 1,1) are good.
Reading the file we simply seek to the correct location, and load the data with code like:
fi.seek(offset); // offset = (^IMAGE-1) * RECORD_BYTES
imd = fi.read(data_sz);
And then just load up the 16bpp values from the file data with code like:
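For anyone following along without Qt, the same seek-and-read step can be sketched with the standard library. This is not the code used here, just an equivalent under stated assumptions; read_block is a made-up name, and it returns an empty vector when the file is missing or too short.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read `size` bytes starting at byte `offset` of the file at `path`.
// Returns an empty vector if the file cannot be opened or is too short.
// (read_block is a hypothetical helper name.)
std::vector<std::uint8_t> read_block(const std::string& path,
                                     std::uint64_t offset, std::size_t size)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    std::vector<std::uint8_t> buf(size);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(size));
    // gcount() reports how many bytes the last read actually delivered.
    if (static_cast<std::size_t>(in.gcount()) != size) return {};
    return buf;
}
```

For the example file that would be a call along the lines of read_block(path, 20992, 8388608).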
void RawImage::SetImageData(QByteArray im, int width, int height)
{
    _width = width;
    _height = height;
    _data.clear();
    int cursor = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (cursor + 1 >= im.size())
                return; // Sanity bound check
            // Samples are 16 bit, least significant byte first
            int vh = (unsigned char) im.at(cursor + 1);
            int vl = (unsigned char) im.at(cursor);
            int v = (vh << 8) | vl;
            _data.append(v);
            cursor += 2;
        }
    }
}
Where “_data” is just a QList<int>.
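The byte-pair decoding does not depend on Qt at all. A minimal standard-library sketch of the same idea follows; decode_le16 is an invented name, and it stops early if the buffer holds fewer than width * height samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode 16-bit little-endian (LSB first) unsigned samples from raw bytes.
// At most width * height samples are produced; a short buffer yields fewer.
// (decode_le16 is a hypothetical helper name.)
std::vector<std::uint16_t> decode_le16(const std::vector<std::uint8_t>& bytes,
                                       int width, int height)
{
    assert(width >= 0 && height >= 0);
    const std::size_t wanted =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    const std::size_t count = std::min(wanted, bytes.size() / 2);
    std::vector<std::uint16_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned lo = bytes[2 * i];      // low byte comes first
        const unsigned hi = bytes[2 * i + 1];  // high byte second
        out.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    return out;
}
```

Casting each byte to an unsigned value before shifting matters for the same reason as the (unsigned char) casts above: a signed char with the top bit set would otherwise sign-extend and corrupt the result.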
From this we can drop the values into an OpenCV image structure:
cursor = 0;
_im16 = cv::Mat(_height, _width, CV_16UC1);
for (y = 0; y < _height; y++) {
    for (x = 0; x < _width; x++) {
        if (cursor >= _data.length())
            return; // Sanity bound check
        val = _data.at(cursor++);
        _im16.at<uint16_t>(cv::Point(x, y)) = val;
    }
}
And at this point we can correct our levels, save out a tiff, etc as we've done with previous image sets.
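The exact level correction isn't spelled out here, so as an illustration only, the sketch below does a plain linear min/max stretch to the full 16-bit range. That choice of method is an assumption for this example (a percentile clip or gamma curve would be equally plausible), and stretch_levels is a made-up name.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Linearly stretch pixel values so the darkest pixel maps to 0 and the
// brightest to 65535. Assumption: a simple min/max stretch is wanted.
// (stretch_levels is a hypothetical helper name.)
void stretch_levels(std::vector<std::uint16_t>& px)
{
    if (px.empty()) return;
    const auto mm = std::minmax_element(px.begin(), px.end());
    const std::uint32_t lo = *mm.first;
    const std::uint32_t hi = *mm.second;
    if (hi == lo) return;  // flat image, nothing to stretch
    for (auto& v : px) {
        // (v - lo) <= 65535, so the product fits in 32 bits.
        v = static_cast<std::uint16_t>((v - lo) * 65535u / (hi - lo));
    }
}
```

A min/max stretch is sensitive to a single hot pixel, which is one reason a real pipeline might prefer clipping at percentiles instead.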
So running this over the data we have and viewing the resulting .tiff files in an image browser (XFCE's ristretto in this screenshot) gets us:
And that's looking enough like the comet we expect that we can't have screwed up that badly...
(just scrolling through the image sets after processing and seeing the slow rotation of 67P here is actually sort of awesome...)
We can also do a similar processing sequence on Wide Angle images for similar results...
One major step we haven't taken is image orientation: the WAC images should be flipped vertically, and the NAC images both vertically and horizontally, to get the correct alignment. We could do this when building the OpenCV image by messing around with the X & Y stepping.
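A sketch of what "messing around with the X & Y stepping" could look like on a plain row-major buffer is below. It isn't wired into the code above; flip_image and its parameters are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return a copy of a row-major image, optionally mirrored top-to-bottom
// (flip_vertical) and/or left-to-right (flip_horizontal).
// (flip_image is a hypothetical helper name.)
std::vector<std::uint16_t> flip_image(const std::vector<std::uint16_t>& src,
                                      std::size_t width, std::size_t height,
                                      bool flip_vertical, bool flip_horizontal)
{
    assert(src.size() >= width * height);
    std::vector<std::uint16_t> dst(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        const std::size_t sy = flip_vertical ? height - 1 - y : y;
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t sx = flip_horizontal ? width - 1 - x : x;
            dst[y * width + x] = src[sy * width + sx];
        }
    }
    return dst;
}
```

Going by the description above, WAC images would use flip_vertical only and NAC images both flags. Once the data is in a cv::Mat, OpenCV's cv::flip should achieve the same result.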
Also we're fairly clearly messing up the processing of smaller images, and we don't cope with all the options we could encounter in the PDS data set, so we will be messing up some content.
But we've got a whole list of missing features, and really need to build a “proper” PDS parser anyway, so given this is a ten minute hack on top of the code we already had for Venus Express then let's call it good enough for now.
We can keep up to date with new releases and data sets from the PDS Small Bodies node at http://pdssbn.astro.umd.edu/index.shtml