While mapping depth values from the Kinect sensor to a virtual space is straightforward, a perspective correction factor needs to be taken into account, which is discussed in this post. In the following, the official Windows Kinect SDK is used and all formulas given relate to the specific values returned from the API (which may differ from those returned by unofficial SDKs). Depth data is delivered as scanlines from bottom to top.
To convert Kinect data into 3D space where one unit is equal to 1 metre (a code sketch follows the notes below):

scale=depth*PERSPECTIVE_CORRECTION;
x=(i-(DEPTH_FRAME_WIDTH/2))*scale;
y=(j-(DEPTH_FRAME_HEIGHT/2))*scale;
z=-depth/1000;
Where:
- depth is the millimetre depth value returned by the Kinect device within the depth map
- PERSPECTIVE_CORRECTION is an empirically derived constant that converts from the camera’s perspective into an orthogonal view (essentially “undoing” the natural perspective view of the camera)
- DEPTH_FRAME_WIDTH is the width of the depth map in pixels (typically 320 or 640)
- DEPTH_FRAME_HEIGHT is the height of the depth map in pixels (typically 240 or 480)
- i and j are the indices of the pixel counted from the left and from the bottom of the frame respectively
Notes:
- This formula translates the depth values onto the negative z-axis such that a value of zero is the camera position and -1.0 is 1 metre away.
- A right-handed coordinate system is used.
- The PERSPECTIVE_CORRECTION constant is fixed for a given depth map resolution: 0.00000356 for 320x240 and 0.00000178 for 640x480.
- Doubling the width and height of the depth map halves the constant.
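
To make the mapping concrete, here is a minimal C++ sketch of the conversion above for a 320x240 depth map. The function name and Point3 struct are illustrative rather than part of the Kinect SDK; only the constants and formulas come from this post.

```cpp
#include <cstdint>

// Constants for a 320x240 depth map; for 640x480 use 640, 480 and
// a PERSPECTIVE_CORRECTION of 0.00000178 (half the value below).
const int    DEPTH_FRAME_WIDTH      = 320;
const int    DEPTH_FRAME_HEIGHT     = 240;
const double PERSPECTIVE_CORRECTION = 0.00000356;

struct Point3 { double x, y, z; };

// Map pixel (i, j) and its millimetre depth value into a right-handed
// space where one unit is 1 metre and the camera looks down the
// negative z-axis from the origin.
Point3 depthToWorld(int i, int j, uint16_t depth) {
    double scale = depth * PERSPECTIVE_CORRECTION;
    Point3 p;
    p.x = (i - (DEPTH_FRAME_WIDTH  / 2)) * scale;
    p.y = (j - (DEPTH_FRAME_HEIGHT / 2)) * scale;
    p.z = -depth / 1000.0;  // millimetres to metres
    return p;
}
```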
Perspective Correction
The camera’s perspective field of view must be factored out to obtain precise [x, y, z] coordinates that can be used to correlate different snapshots of the same scene taken from different angles, since the perspective distortion varies with camera position. Figure 1 illustrates the result of mapping depth values directly to fixed [x, y] coordinates without taking perspective into account.
Figure 1a) Mapping depth values to fixed [x, y] coordinates without perspective correction: view seen from the camera

Figure 1b) Mapping depth values to fixed [x, y] coordinates without perspective correction: view of the scene from above. Note that the wall and shelves do not make right angles because the camera takes a perspective view.
By including the perspective correction, real-world right angles remain right angles in the virtual space and distances are corrected to their absolute values as illustrated in Figure 2.
Figure 2a) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view seen from the camera

Figure 2b) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view of the scene from above. Note that with the perspective correction applied, the wall and shelves make right angles and appear straight and well aligned.
The perspective correction was determined by measuring objects in the real world and comparing them to the size of their virtual counterparts without correction. These measurements were correlated against distance from the camera, resulting in the derived constants.
The formulas for determining the initial fixed [x, y] positions are given below:
x=(i-(DEPTH_FRAME_WIDTH/2))*WORLD_SCALE;
y=(j-(DEPTH_FRAME_HEIGHT/2))*WORLD_SCALE;
z=-depth*WORLD_SCALE*DEPTH_SCALE;
WORLD_SCALE is 0.01 or 0.02 for the 640x480 and 320x240 depth map resolutions respectively, and DEPTH_SCALE is 0.1. These values were selected empirically to offer a visually good representation of the real world when mapped into the virtual space.
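
For comparison, here is the same sketch style applied to this uncorrected mapping, reusing the Point3 struct and frame constants from the earlier sketch; the helper name is again illustrative.

```cpp
// Uncorrected mapping used for the initial measurements. The x and y
// positions are fixed per pixel regardless of depth, which is what
// produces the skewed geometry seen in Figure 1.
const double WORLD_SCALE = 0.02;  // 320x240; use 0.01 for 640x480
const double DEPTH_SCALE = 0.1;

Point3 depthToWorldUncorrected(int i, int j, uint16_t depth) {
    Point3 p;
    p.x = (i - (DEPTH_FRAME_WIDTH  / 2)) * WORLD_SCALE;
    p.y = (j - (DEPTH_FRAME_HEIGHT / 2)) * WORLD_SCALE;
    p.z = -depth * WORLD_SCALE * DEPTH_SCALE;
    return p;
}
```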
Using this mapping, a number of objects were placed in front of the camera and measured in both the real world and the virtual space along their x- and y-axes to provide a scale factor mapping between the two spaces. These values are given in Table 1 along with each object’s distance from the camera.
| Distance from Camera | Mean Scale Factor |
| --- | --- |
| 810mm | 0.137 |
| 1380mm | 0.245 |
| 2630mm | 0.472 |
| 3750mm | 0.666 |

Table 1: Scale factors between real and virtual objects at a specific distance
Plotting the two columns of Table 1 against each other illustrates a linear correlation, as shown in Figure 3.
Figure 3: Plotting distance from camera against mean depth scale factor for perspective correction
The gradient of the line in Figure 3 gives the perspective correction value, calculated with respect to millimetre distances as in the original set of equations and factoring in the DEPTH_SCALE and WORLD_SCALE constants from the second, uncorrected set of equations.
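
The gradient can be reproduced from the Table 1 data with an ordinary least-squares fit, as sketched below. The sketch assumes the measurements were made at the 320x240 resolution (so WORLD_SCALE = 0.02 is folded back in); the variable names are illustrative.

```cpp
#include <cstdio>

// Least-squares fit through the Table 1 measurements to recover the
// gradient of Figure 3, then fold WORLD_SCALE back in to obtain the
// PERSPECTIVE_CORRECTION constant.
int main() {
    const double distanceMm[]  = { 810.0, 1380.0, 2630.0, 3750.0 };
    const double scaleFactor[] = { 0.137, 0.245, 0.472, 0.666 };
    const int    n             = 4;
    const double WORLD_SCALE   = 0.02;  // 320x240 depth map

    // Slope of the best-fit line: sum((x-mx)(y-my)) / sum((x-mx)^2)
    double mx = 0.0, my = 0.0;
    for (int k = 0; k < n; ++k) { mx += distanceMm[k]; my += scaleFactor[k]; }
    mx /= n;
    my /= n;

    double num = 0.0, den = 0.0;
    for (int k = 0; k < n; ++k) {
        num += (distanceMm[k] - mx) * (scaleFactor[k] - my);
        den += (distanceMm[k] - mx) * (distanceMm[k] - mx);
    }
    double gradient = num / den;  // scale factor per millimetre

    printf("gradient               = %.6g per mm\n", gradient);
    printf("PERSPECTIVE_CORRECTION = %.6g\n", gradient * WORLD_SCALE);
    return 0;
}
```

This prints a constant of roughly 0.0000036, in good agreement with the 0.00000356 value quoted earlier.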