Saturday, 26 September 2015

Loading the images

Overview

The core abstraction here is a view, represented by openMVG::sfm::View.

This represents a single image, and contains a reference to the image path as well as the camera information associated with it.

The top level SfM_Data object holds a list of View objects, openMVG::sfm::Views, which is a Hash Map indexed through the IndexT Id's of each View.

We also need to consider the camera intrinsic structure mentioned earlier, openMVG::cameras::IntrinsicBase. The SfM_Data object also holds an openMVG::sfm::Intrinsics list - another hash map, this time of camera intrinsics indexed by Id.

In theory we could share the intrinsics, since several of the images will come from a single camera (and the reference code does this), however it's simpler just to have an intrinsic structure per picture.

Using unique cameras per image does have a fairly major drawback though: during the bundle adjustment OpenMVG will refine the camera intrinsics for us, and dump these in the output. If we were sharing a camera model, these distortion parameters would converge to a single model for a given camera; since we're using separate intrinsics, the distortion parameters can differ slightly from view to view even when they shouldn't.

This problem isn't enough to break things completely, but it does mean our code will be sub-optimal here. For now let's run with it though, and fix it later.

The Image Load Process

Create a view, i.e. an instance of openMVG::sfm::View

This simply takes the basic View constructor parameters: the image name, the intrinsic and pose IDs associated with the view, and the width and height of the image. Note that the Pose is the camera extrinsic: it's basically a 3D transform, but we don't need to get involved with it right now.

For the IDs we simply use the current view count to provide a unique integer value. We can use a common ID for the view, intrinsic & pose because they're stored in different lists. As a result we wind up with the view constructor:

openMVG::sfm::View v(which, views.size(), views.size(), views.size(), width, height);

We also create an Intrinsic, allocating and filling in an instance derived from openMVG::cameras::IntrinsicBase.

For our example we're using the class: openMVG::cameras::Pinhole_Intrinsic_Radial_K3 and leaving the distortion parameters as 0.

We add the intrinsic & view to the top level hash maps of SfM_Data, using the IDs from the view object we created earlier and the array accessor notation:

  intrinsics[v.id_intrinsic] = intrinsic;
  views[v.id_view] = std::make_shared<openMVG::sfm::View>(v);

Putting this all together we get a final MWE code layout of:

void ImageList::loadImage(std::string which)
{
  double width;
  double height;
  double image_focal;
  double focal;
  double ppx;
  double ppy;

  openMVG::sfm::Views& views = _sfm_data.views;
  openMVG::sfm::Intrinsics& intrinsics = _sfm_data.intrinsics;
  std::shared_ptr<openMVG::cameras::IntrinsicBase> intrinsic(nullptr);

  // Skip anything that isn't a recognised image format
  if (openMVG::image::GetFormat(which.c_str()) == openMVG::image::Unknown)
    return;

  // Pull the image dimensions and focal length out of the exif data
  std::unique_ptr<openMVG::exif::Exif_IO_EasyExif> exifReader(new openMVG::exif::Exif_IO_EasyExif());
  if (!exifReader->open(which))
    return;

  image_focal = static_cast<double>(exifReader->getFocal());
  width = static_cast<double>(exifReader->getWidth());
  height = static_cast<double>(exifReader->getHeight());

  // Approximate the principal point as the image centre
  ppx = width / 2.0;
  ppy = height / 2.0;

  // No usable focal length in the exif data
  if (image_focal == 0)
    return;

  printf("Image %s: %f x %f, Focal Length %f\n", which.c_str(), width, height, image_focal);

  // Convert the exif focal length (mm) into pixels using the sensor width (mm)
  const double ccdw = 5.75;
  focal = std::max(width, height) * image_focal / ccdw;

  // View, intrinsic and pose all share an ID; the current view count is unique enough
  openMVG::sfm::View v(which, views.size(), views.size(), views.size(), width, height);

  intrinsic = std::make_shared<openMVG::cameras::Pinhole_Intrinsic_Radial_K3>(width, height, focal, ppx, ppy, 0.0, 0.0, 0.0);
  intrinsics[v.id_intrinsic] = intrinsic;
  views[v.id_view] = std::make_shared<openMVG::sfm::View>(v);
}
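
For completeness, a hypothetical driver for this function might look like the following. This is a sketch only: the "ImageList.h" header and the class layout are my own, not something OpenMVG provides, and the image paths just come from the command line:

  #include "ImageList.h"  // hypothetical header holding the ImageList class above

  int main(int argc, char** argv)
  {
    ImageList images;
    // Any path that isn't a readable image with exif focal data is silently skipped
    for (int i = 1; i < argc; ++i)
      images.loadImage(argv[i]);
    return 0;
  }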

Processing: Image Processing

At this stage we load the image file information, including the width and height as well as the focal information embedded in the camera images.

This is also where we decide how to handle the camera model, which is used to compensate for image distortion introduced by the camera itself; OpenMVG offers several different types of model to choose from.

Reading Image Parameters

OpenMVG provides a simple mechanism for extracting exif data, which can then be used to calculate focal length. Since we have to use the exif information to get a reliable focal length value anyway, we may as well use the exif reader to extract the width and height information at the same time.

Therefore our minimum working example to extract information from the image file is:

  std::unique_ptr<openMVG::exif::Exif_IO_EasyExif> exifReader(new openMVG::exif::Exif_IO_EasyExif());
  if (!exifReader->open(which))
    return;
  image_focal = static_cast<double>(exifReader->getFocal());
  width = static_cast<double>(exifReader->getWidth());
  height = static_cast<double>(exifReader->getHeight());

However the focal length we have retrieved is in mm, and the value we actually want is in pixels, so we convert it using the sensor width (also in mm):

  const double ccdw = 5.75;
  focal = std::max (width, height) * image_focal / ccdw;

where ccdw is the CCD sensor width of the camera, which can be extracted from a camera information database, or hardcoded for a given camera.  It can also be retrieved from the exif image information in some cases.

The OpenMVG sample code uses the database lookup to pick up the information based on the camera name, but I'm hardcoding the numbers for my camera's CCD here.
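
As a worked example (with made-up numbers - substitute your own camera's values): a 4608 x 3456 image shot at a 4.5 mm exif focal length on that 5.75 mm wide sensor gives a pixel focal length of roughly 4608 * 4.5 / 5.75 ≈ 3606:

  #include <algorithm>

  // Illustrative values only - not taken from any particular camera.
  double exampleFocalInPixels()
  {
    const double width = 4608, height = 3456;  // image size in pixels
    const double image_focal = 4.5;            // exif focal length, mm
    const double ccdw = 5.75;                  // sensor width, mm
    return std::max(width, height) * image_focal / ccdw;  // ~3606 pixels
  }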

There is some more background on the derivation of the focal length here: http://phototour.cs.washington.edu/focal.html


Note that OpenMVG provides simple methods to read the image data directly, and we can extract the width & height information from that process instead, i.e.:

  openMVG::image::Image<openMVG::image::RGBColor> _image_data;
  if (openMVG::image::ReadImage(which.c_str(), &_image_data))
  {
    width = _image_data.Width();
    height = _image_data.Height();
  }

The example code does this in a few places. However this load is redundant in our case, since we need to run the exif parsing for focal lengths. While we will want to load the pixel data eventually, doing it now would simply incur a double load, and the load call itself is actually quite slow.

Filling in the camera details

The camera information associated with an image is held in a camera Intrinsics structure.

The intrinsic structure stores the optical model, covering such parameters as the camera focal length, the principal point (ideal image centre) and the associated image distortion parameters, etc.

There is also a camera Extrinsics model, referred to in OpenMVG as a Pose (of type openMVG::geometry::Pose3), which covers the camera position and rotation; however the details of this will actually be handled by OpenMVG itself.

The structure openMVG::cameras::IntrinsicBase is used as a base type when building lists, and from it are derived two commonly used classes, openMVG::cameras::Pinhole_Intrinsic_Radial_K1 and openMVG::cameras::Pinhole_Intrinsic_Radial_K3.

The difference between these two classes is the number of parameters supplied (one for K1 and three for K3) which are used to remove image distortion.

The full documentation on this is held here:
http://openmvg.readthedocs.org/en/latest/openMVG/cameras/cameras/#pinhole-camera-model

To keep things simple we initially use a K3 model, but leave the distortion coefficients as 0.

So, to build the intrinsic model for an image:

  intrinsic = std::make_shared<openMVG::cameras::Pinhole_Intrinsic_Radial_K3> (
        width, height, focal,
        ppx, ppy,
        0.0, 0.0, 0.0);
and now we can pass the intrinsic into the SfM container objects.
How to do that next...

Overview of the processing pipeline

The sequence of operations

The basic sequence we follow is:
  • Examine the reference images
    • Extract data on the camera in the process
    • We have to select how we'll handle the camera model here
  • Process the images for feature points – these are points we can track across pictures
    • We have to pick one of the available feature detection algorithms here
  • Process the image point sets, filtering out “outlier” points (i.e. matches which turn out not to be consistent between images)
    • We have to select a filtering mechanism, and set thresholds for discarding outliers
  • Incremental SFM, performing Bundle Adjustment
    • Pick an initial pair of images
    • Run the processing
    • Output the point cloud

While the details of the underlying Incremental SFM & filtering algorithms are interesting, and actually rather clever, it's not essential to understand them completely in order to use the library. From a hacky programmer's point of view there are a couple of fairly simple APIs and abstractions that can be plugged in to perform the reconstruction work, and a handful of parameters to tune.
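
To make the shape of that concrete, here's a rough sketch of the driver we're working towards. None of these function names come from OpenMVG - they're placeholders of my own for the stages listed above, each of which is covered in a later post:

  #include <string>
  #include <vector>

  // Placeholder stages - our own names, not OpenMVG API calls.
  void loadImages(const std::vector<std::string>& paths) { /* exif, camera model, intrinsics */ }
  void detectFeatures()  { /* feature point detection */ }
  void matchAndFilter()  { /* putative matching + outlier filtering */ }
  void reconstruct()     { /* incremental SfM + bundle adjustment, point cloud output */ }

  int main(int argc, char** argv)
  {
    const std::vector<std::string> paths(argv + 1, argv + argc);
    loadImages(paths);
    detectFeatures();
    matchAndFilter();
    reconstruct();
    return 0;
  }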

Top Level Data Structures & Concepts

The top level abstraction is that of an SFM container, openMVG::sfm::SfM_Data.

This class contains all of the data and configuration that is used to generate the output as a set of 3D points. We're going to be operating on a single instance of this throughout.

One other key point is the use of reference IDs to track things like camera information, image information, etc. Each ID (of type IndexT) is basically a 32 bit integer, and a number of the internal data structures are maps which associate an ID value with a given instance.

The codebase also makes extensive use of C++11's shared_ptr and unique_ptr smart pointer types to handle automatic object destruction, and as a consequence so will our application.
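
As a quick illustration of both points (a sketch only - the exact container behind the Views typedef varies between OpenMVG versions, but it behaves like a map from IndexT to a shared_ptr, and the sfm_data.hpp header path is as found in my install), walking the view list looks like:

  #include <cstdio>
  #include <openMVG/sfm/sfm_data.hpp>

  void dumpViews(const openMVG::sfm::SfM_Data& sfm_data)
  {
    // Each entry maps an IndexT id to a std::shared_ptr<openMVG::sfm::View>
    for (const auto& entry : sfm_data.views)
    {
      const openMVG::IndexT id = entry.first;
      const openMVG::sfm::View& v = *entry.second;
      printf("view %u: intrinsic id %u, pose id %u\n", id, v.id_intrinsic, v.id_pose);
    }
  }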

Finally, the objects all contain serialisation methods, which use the cereal library to save and reload information; much of the standard usage involves saving state to disk and reloading it to recover the processing so far. This minimal application mostly avoids that, but feature detection is slow enough that we will want to save and reload its results.
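
For reference, a minimal checkpoint/restore sketch using the Save and Load helpers (declared in sfm_data_io.hpp in the versions I've looked at - double check the header and the ESfM_Data flags against your install):

  #include <string>
  #include <openMVG/sfm/sfm_data.hpp>
  #include <openMVG/sfm/sfm_data_io.hpp>

  // Write the whole container (views, intrinsics, poses, structure) to disk via cereal.
  bool checkpoint(const openMVG::sfm::SfM_Data& sfm_data, const std::string& path)
  {
    return openMVG::sfm::Save(sfm_data, path, openMVG::sfm::ESfM_Data(openMVG::sfm::ALL));
  }

  // Reload a previously saved container rather than recomputing it.
  bool restore(openMVG::sfm::SfM_Data& sfm_data, const std::string& path)
  {
    return openMVG::sfm::Load(sfm_data, path, openMVG::sfm::ESfM_Data(openMVG::sfm::ALL));
  }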

Sunday, 20 September 2015

The Basic Build

Install OpenMVG

The openMVG library contains and rebuilds almost all of the dependencies required during a source install, so simply clone it using:

git clone --recursive https://github.com/openMVG/openMVG.git 
cd openMVG 
git submodule init
git submodule update
cd .. 
Then build it in a separate directory with:

mkdir openMVG_Build
cd openMVG_Build/ 
cmake \ 
  -DCMAKE_BUILD_TYPE=DEBUG \
  -DOpenMVG_BUILD_TESTS=ON \
  -DopenMVG_BUILD_EXAMPLES=ON \
  -DCMAKE_INSTALL_PREFIX:STRING="/usr/local/openMVG_install" \
  -G "CodeBlocks - Unix Makefiles" . ../openMVG/src/ 

make clean 
make -j4 
sudo make install 
Note the custom install directory – this can be changed fairly easily, since the application will use the supplied cmake files in the install to locate the headers & libraries.

Adding the Library to an Application

This is all driven through cmake, with the target CMakeLists.txt containing the following fragment to locate the openMVG libraries and headers.

set(OpenMVG_DIR "/usr/local/openMVG_install/share/openMVG/cmake") 
set(CMAKE_BUILD_TYPE "Debug") 
ADD_DEFINITIONS(
    -std=c++11
)
It's actually slightly painful to use anything other than cmake here; the library include and link paths are rather verbose.
Note that C++11 is required for the smart pointers used by the openMVG library.
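
For reference, a fuller (but still minimal) CMakeLists.txt sketch is below. The OPENMVG_INCLUDE_DIRS / OPENMVG_LIBRARIES variables are what the installed OpenMVGConfig.cmake provided for me - check the files under share/openMVG/cmake if your version differs - and the project and source file names are just placeholders:

  cmake_minimum_required(VERSION 2.8)
  project(sfm_example)

  set(OpenMVG_DIR "/usr/local/openMVG_install/share/openMVG/cmake")
  set(CMAKE_BUILD_TYPE "Debug")
  add_definitions(-std=c++11)

  find_package(OpenMVG REQUIRED)
  include_directories(${OPENMVG_INCLUDE_DIRS})

  add_executable(sfm_example main.cpp ImageList.cpp)
  target_link_libraries(sfm_example ${OPENMVG_LIBRARIES})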

Using the OpenMVG Library

The next few posts look at an Incremental Structure from Motion (SfM) pipeline, where we use the openMVG library to produce a point cloud from some input images, i.e. to generate a 3D model.

Using the openMVG library simplifies the construction process immensely, and hides some of the algorithm details from us.

As a library it's not perfect, and it can be annoyingly verbose at times, but it beats doing it from scratch. There are a few sample applications in the OpenMVG tree, on which most of this work is based, but no good overview of how to tie all the pieces together when using the library directly.

The Very Top Level

We're processing a set of 2D images, and the output will be a 3D model of the objects in the images, as a point cloud.

So we start with a set of photographs:


And generate a point cloud:

Using an incremental pipeline means that the reconstruction starts with a model built from two initial images, and then we add new images and 3D points to generate the complete model.

The process is described at: http://imagine.enpc.fr/~marletr/publi/ACCV-2012-Moulon-et-al.pdf.

This approach is based around feature detection within the images and Bundle Adjustment (https://en.wikipedia.org/wiki/Bundle_adjustment) to synthesise the 3D data. The “a contrario model estimation” means that we're going to use the input data itself to identify and remove outlying data samples; however, this processing will be hidden within the library itself.

Using an incremental pipeline is slightly simpler from the perspective of a library client. An alternative approach is Global SfM, which processes all the images simultaneously: it is potentially more accurate, since sequential processing can accumulate errors, and the global version is more efficient and simpler to parallelise. But the library usage in the sequential case is easier, so let's start with that.