Saturday, 26 September 2015

Overview of the processing pipeline

The sequence of operations

The basic sequence we follow is (see the skeleton sketch after this list):
  • Examine the reference images
    • Extract data on the camera in the process
    • We have to select how we'll handle the camera model here
  • Process the images for feature points – these are points we can track across pictures
    • We have to pick one of the available feature detection algorithms here
  • Process the image point sets, filtering out “outlier” points (i.e. those which don't match between images)
    • We have to select a filtering mechanism, and set thresholds for discarding outliers
  • Incremental SFM, performing Bundle Adjustment
    • Pick an initial pair of images
    • Run the processing
    • Output the point cloud
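
To make the sequence concrete, here is a minimal skeleton showing where each stage's results end up in the SfM_Data container described in the next section. The stages themselves are only comments here; the header paths and the Save call follow openMVG's layout, but treat the exact include paths as an assumption for the version you have installed.

#include "openMVG/sfm/sfm_data.hpp"
#include "openMVG/sfm/sfm_data_io.hpp"

using namespace openMVG;
using namespace openMVG::sfm;

int main()
{
  SfM_Data sfm_data;
  sfm_data.s_root_path = "images"; // hypothetical directory holding the reference images

  // 1. Examine the reference images: populate sfm_data.views and
  //    sfm_data.intrinsics (this is where the camera model is chosen).

  // 2. Feature detection: compute feature points and descriptors per view
  //    (these live outside SfM_Data, and are usually cached on disk).

  // 3. Matching and outlier filtering: build pairwise matches and discard
  //    correspondences that fail the chosen geometric filter.

  // 4. Incremental SFM with bundle adjustment: fills in sfm_data.poses and
  //    sfm_data.structure (the 3D point cloud).

  // 5. Output: persist the result (format is picked from the file extension).
  return Save(sfm_data, "sfm_data.json", ESfM_Data(ALL)) ? 0 : 1;
}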

While the details of the underlying Incremental SFM and filtering algorithms are interesting, and actually rather clever, it's not essential to understand them completely in order to use the library. From a hacky programmer's point of view there are a couple of fairly simple APIs and abstractions that can be plugged in to perform the reconstruction work, and a handful of parameters to tune.

Top Level Data Structures & Concepts

The top level abstraction is that of an SFM container, openMVG::sfm::SfM_Data.

This class contains all of the data and configuration used to generate the output as a set of 3D points. We're going to be operating on a single instance of it throughout.
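
As a small illustration of reading results back out of that container, the sketch below walks the reconstructed 3D points. The member name structure and the Landmark type (a 3D position X plus its image observations) match openMVG's SfM_Data declaration, though the include path should be checked against your version.

#include <iostream>
#include "openMVG/sfm/sfm_data.hpp"

// Print every reconstructed 3D point held in the container.
void dump_points(const openMVG::sfm::SfM_Data & sfm_data)
{
  // sfm_data.structure maps a landmark ID to a Landmark (3D point + observations).
  for (const auto & landmark : sfm_data.structure)
  {
    const openMVG::Vec3 & X = landmark.second.X;
    std::cout << X(0) << " " << X(1) << " " << X(2) << "\n";
  }
}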

One other key point is the use of reference IDs to track things like camera information, image information, etc. Each ID (of type IndexT) is basically a 32-bit integer, and a number of the internal data structures are maps which associate an ID value with a given instance.
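
The view list is exactly that kind of ID-keyed map: a View is created carrying the IDs of the intrinsic and pose it refers to, then stored under its own view ID. A sketch, assuming the View constructor takes (image path, view ID, intrinsic ID, pose ID, width, height); the filename and dimensions here are made up.

#include <memory>
#include "openMVG/sfm/sfm_data.hpp"

using namespace openMVG;
using namespace openMVG::sfm;

void add_example_view(SfM_Data & sfm_data)
{
  const IndexT view_id = 0, intrinsic_id = 0, pose_id = 0;

  // The view is stored under its own ID, and carries the IDs of the
  // intrinsic (camera model) and pose it is associated with.
  sfm_data.views[view_id] = std::make_shared<View>(
    "img_0000.jpg",                 // hypothetical image filename
    view_id, intrinsic_id, pose_id,
    1920, 1080);                    // hypothetical image dimensions in pixels
}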

The codebase also makes extensive use of C++11's shared_ptr and unique_ptr smart pointer types to handle automatic object destruction, and as a consequence so will our application.
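
The intrinsics map shows the pattern: a camera model object is allocated once into a shared_ptr, and ownership is shared between the SfM_Data container and whatever else holds the pointer. A sketch, assuming a plain pinhole model whose constructor takes (width, height, focal length in pixels, principal point x, principal point y); the numbers are placeholders and the header path should be checked against your openMVG version.

#include <memory>
#include "openMVG/cameras/Camera_Pinhole.hpp"
#include "openMVG/sfm/sfm_data.hpp"

using namespace openMVG;
using namespace openMVG::cameras;
using namespace openMVG::sfm;

void add_example_intrinsic(SfM_Data & sfm_data)
{
  // Shared ownership: the container holds one reference, and any code that
  // keeps the pointer around holds another.
  std::shared_ptr<IntrinsicBase> intrinsic =
    std::make_shared<Pinhole_Intrinsic>(1920, 1080, 1500.0, 960.0, 540.0);

  sfm_data.intrinsics[0] = intrinsic; // keyed by an IndexT of our choosing
}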

Finally, the objects all provide serialisation methods which use the cereal library to save and reload their contents, and many of the standard use cases involve writing state to disk and reloading it to recover the state of processing. This minimal application mostly avoids that, but feature detection is slow enough that we do want to cache and reload its results.
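
For the SfM_Data container itself, openMVG provides Save and Load helpers built on that cereal serialisation, which are enough to checkpoint and resume a run. A sketch; the filename is arbitrary and the format (JSON, binary, XML) is chosen from the extension. The feature data produced by the detection step gets a similar save-to-disk treatment in our application.

#include "openMVG/sfm/sfm_data.hpp"
#include "openMVG/sfm/sfm_data_io.hpp"

using namespace openMVG::sfm;

bool checkpoint_and_restore(const SfM_Data & sfm_data)
{
  // Write out everything we have so far.
  if (!Save(sfm_data, "sfm_data.bin", ESfM_Data(ALL)))
    return false;

  // Later, reload just the parts we need (here: views and intrinsics)
  // rather than recomputing them.
  SfM_Data reloaded;
  return Load(reloaded, "sfm_data.bin", ESfM_Data(VIEWS | INTRINSICS));
}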