Tuesday 2 July 2013

Aligning Point Clouds

Point clouds are joined together once the Kinect depth map data has been converted into an absolute [x, y, z] coordinate system, as described in an earlier.  At the core of this process is the Iterative Closest Point (ICP) algorithm.  The ICP process finds an affine transformation between two point clouds that maximise their overlapping regions (thus minimises the distance between the two point clouds in space).

It should be noted that this post will only provide a textual overview of the process, which will be expanded in future articles.  The web also has a good number of resources and examples of this process that can be found by searching for Iterative Closest Point.

The following definitions describe the terms to be used:
  • Frame Data (F): Perspective-corrected depth map data from the Kinect device
  • World Data (W): An accumulation of point data from one or many frames defined in world space
  • Frame Transformation (T): A 4x4 affine transformation matrix that transforms frame data into world space defined by rotation and translation about the 3 axes (affording 6 independent variables).

The basic ICP algorithm proceeds as follows and is per frame of data:
  1. For each pixel in the frame data, apply the current frame transformation and find the closest point in the world data
  2. Minimise the distance between the pairs by refining the frame transformation
  3. Repeat back to step 1 while the sum of errors over all pairs is reducing

Initially the frame transformation is the identify matrix and the world data is the first frame from the Kinect device.  The closest point also takes into account the point normals which are calculated using neighbouring points.

The minimisation step is an inner iterative process that makes use of the Jacobian of the transformation matrix; the Jacobian relates small changes in the independent variables with changes in the positions of the frame data transformed into world space.  The aim of ICP is thus to minimise the general equation (F*T)-W over all frame and world pairs (although in practice, this equation becomes slightly more complex when normal point-to-plane distances are taken into account.  Furthermore, the error function used to measure the correctness of the transformation matrix is typically a sum of square errors).

The inner iteration that involves the Jacobian is due to the manner in which the problem is linearised and thus at each step, a new approximation to the solution is given which might be bettered using the new values from the previous step and so on.  The inner iterations occur until the error between the pairs is no longer decreased.

The outer iteration of the above algorithm uses the revised frame transformation to determine new (and hopefully better) pairings.  The outer loop continues while the resulting sum of distances between pairs is decreasing.

Assuming this process finds a transformation that generates a low error, the frame data can be transformed into world space and merged with the world environment.  The next frame of data is processed, using the last known frame transformation, and so on, building the virtual environment by piecing together the data from the individual frames.  This will be discussed further in a later post.

An example of the tracking process is given below:

In this video, the first frame is captured and taken to be the world data; updates to the world data are not undertaken.  Each frame of the video represents a new frame from the Kinect device.  The cyan colour dots indicate that a pairing between the frame data and world data has been found and lies within a given threshold.  The red dots indicate that the closest point match between frame and world data lies outside the given threshold.  The video shows the position of the frame data once it has been transformed into world space and hence the more stable the features, the better the tracking (if just the original frame data were viewed, the scene would be seen to pan and shake as the camera is moved).

No comments:

Post a Comment