Thursday 4 July 2013

Merging Point Clouds

Once the frame data has been aligned with the world space using the iterative closest point (ICP) algorithm, it can be merged to create the larger environment.  In this process we maintain the concept of point clouds as opposed to creating surfaces.  There are three components to merging the dataset:
  1. Refine existing world points
  2. Add new frame points
  3. Remove erroneous world points

Merging point clouds proceeds by considering only a subset of the world points: those that fall within the camera's view frustum when transformed by the frame transformation.  In the discussion that follows, this subset will be referred to simply as the world points.
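As a concrete illustration, the following is a minimal C++ sketch of that frustum test, assuming the frame transformation is a rigid camera-to-world pose.  The field-of-view and depth limits are assumptions roughly matching the Kinect's published specifications rather than values taken from this project.

#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct Pose {                        // frame transformation (camera -> world)
    float r[3][3];                   // rotation
    Vec3  t;                         // translation
};

// Transform a world-space point into camera space (inverse of the pose):
// c = R^T * (w - t).
static Vec3 worldToCamera(const Pose& p, const Vec3& w) {
    Vec3 d { w.x - p.t.x, w.y - p.t.y, w.z - p.t.z };
    return { p.r[0][0]*d.x + p.r[1][0]*d.y + p.r[2][0]*d.z,
             p.r[0][1]*d.x + p.r[1][1]*d.y + p.r[2][1]*d.z,
             p.r[0][2]*d.x + p.r[1][2]*d.y + p.r[2][2]*d.z };
}

// Collect indices of world points that fall inside the camera's view frustum.
std::vector<std::size_t> frustumSubset(const std::vector<Vec3>& world,
                                       const Pose& pose)
{
    const float halfFovX = 0.5f * 57.0f * 3.14159265f / 180.0f;  // ~57 deg horizontal
    const float halfFovY = 0.5f * 43.0f * 3.14159265f / 180.0f;  // ~43 deg vertical
    const float nearZ = 0.4f, farZ = 4.0f;                       // usable depth range
    std::vector<std::size_t> inside;
    for (std::size_t i = 0; i < world.size(); ++i) {
        Vec3 c = worldToCamera(pose, world[i]);
        if (c.z < nearZ || c.z > farZ) continue;                 // outside depth range
        if (std::fabs(c.x) > c.z * std::tan(halfFovX)) continue; // outside horizontal FOV
        if (std::fabs(c.y) > c.z * std::tan(halfFovY)) continue; // outside vertical FOV
        inside.push_back(i);
    }
    return inside;
}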


Refine Existing World Points
Each world point is matched against a frame point after applying the frame transformation.  The matching threshold can be stricter than the one used during ICP in order to increase world point cloud density; for example, a threshold of 1cm will provide a final resolution of 1cm, whereas a threshold of 1mm will provide much more fidelity, although it might also introduce errors due to the level of noise returned by the Kinect sensor.  More information about the level of accuracy, noise and reliability will be given in a future post.

The world points are updated using the matches, where one frame point may map to many world points.  After the existing points are updated, all world points and frame points that are involved in a match are ignored for the remaining merging processes.
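The post leaves the matching and update mechanics open, so the sketch below makes two assumptions: a brute-force nearest-neighbour search stands in for whatever spatial index the real system would use, and the update is a simple running average over all observations of a point.  The bookkeeping fields on WorldPoint also feed the pruning step described later.

#include <cstddef>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };

struct WorldPoint {
    Vec3 pos;
    int  observations = 1;    // how many frames have supported this point
    int  lastSeenFrame = 0;   // bookkeeping used later by the pruning step
};

static float dist2(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx*dx + dy*dy + dz*dz;
}

// 'frame' holds this frame's points already transformed into world space.
// One frame point may be matched by many world points, so frame points are
// flagged as used rather than consumed.
void refineWorldPoints(std::vector<WorldPoint>& world,
                       const std::vector<std::size_t>& inFrustum,
                       const std::vector<Vec3>& frame,
                       std::vector<bool>& frameMatched,  // same size as frame
                       float threshold,                  // e.g. 0.01f for 1cm
                       int frameIndex)
{
    const float t2 = threshold * threshold;
    for (std::size_t wi : inFrustum) {
        float best = std::numeric_limits<float>::max();
        std::size_t bestJ = 0;
        for (std::size_t j = 0; j < frame.size(); ++j) {
            float d = dist2(world[wi].pos, frame[j]);
            if (d < best) { best = d; bestJ = j; }
        }
        if (best > t2) continue;                   // no match within threshold
        WorldPoint& wp = world[wi];
        float n = static_cast<float>(wp.observations);
        wp.pos.x = (wp.pos.x * n + frame[bestJ].x) / (n + 1.0f);  // running average
        wp.pos.y = (wp.pos.y * n + frame[bestJ].y) / (n + 1.0f);
        wp.pos.z = (wp.pos.z * n + frame[bestJ].z) / (n + 1.0f);
        wp.observations += 1;
        wp.lastSeenFrame = frameIndex;
        frameMatched[bestJ] = true;                // exclude from "new points"
    }
}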


Add New Frame Points
The frame points that were not matched to a world point are considered new points and added to the world point dataset.
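Continuing with the types from the refinement sketch, adding the unmatched points is then a straightforward append; frameMatched is the flag array populated during refinement.

// Frame points never flagged during refinement become new world points,
// seeded with a single observation.
void addNewFramePoints(std::vector<WorldPoint>& world,
                       const std::vector<Vec3>& frame,
                       const std::vector<bool>& frameMatched,
                       int frameIndex)
{
    for (std::size_t j = 0; j < frame.size(); ++j) {
        if (frameMatched[j]) continue;   // already merged into an existing point
        WorldPoint wp;
        wp.pos = frame[j];
        wp.observations = 1;
        wp.lastSeenFrame = frameIndex;
        world.push_back(wp);
    }
}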


Remove Erroneous World Points
There is noise in the depth map that the Kinect sensor returns, so some points in the world dataset will also be erroneous and need to be pruned.  The strategy employed here is to eliminate any world points that fall within the transformed camera frustum but do not have significant support for their existence.  We don't simply remove, on every frame, all world points that lack a match with the frame points, because the frame itself could be in error.  Instead, as each world point is updated or added, we take note of when it was last seen.  If, at the end of a frame, there are world points that have not been matched and have not been seen for a given number of frames, they are removed.
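A possible sketch of that pruning pass, again reusing the types above; maxUnseenFrames is a hypothetical tuning parameter, as the post does not give the actual window used.

// Only points inside the current frustum are candidates for removal; points
// outside the view cannot be contradicted by this frame's data.
void pruneWorldPoints(std::vector<WorldPoint>& world,
                      const std::vector<std::size_t>& inFrustum,
                      int frameIndex,
                      int maxUnseenFrames)  // assumed, e.g. 30 frames (~1s at 30fps)
{
    std::vector<bool> drop(world.size(), false);
    for (std::size_t wi : inFrustum)
        if (frameIndex - world[wi].lastSeenFrame > maxUnseenFrames)
            drop[wi] = true;              // in view, but lacking support

    std::size_t keep = 0;                 // compact the surviving points in place
    for (std::size_t i = 0; i < world.size(); ++i)
        if (!drop[i]) world[keep++] = world[i];
    world.resize(keep);
}

In a full merge pass these sketches would run in order, once per incoming frame: frustumSubset, refineWorldPoints, addNewFramePoints and finally pruneWorldPoints.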


Tracking and Merging Example
The following video illustrates the process of tracking and merging point clouds.  The bottom left window is a 3D point cloud representation of the combined RGB and depth data from the Kinect sensor (this would normally be correctly coloured, but here green represents point matches into the world space).  The larger window in the centre of the screen is the compiled world space.  The green wireframe box indicates the current camera position and orientation within this world, and green again indicates the points that are paired with the individual captures from the Kinect device.
 
Underneath the larger 3D window are two "debug" outputs: the one on the left gives internal state for the steps within the matching process, and the one on the right gives the camera orientation of the current frame in terms of rotation and offset into the global space.

