To convert Kinect data into 3D space where one unit is equal to 1 metre:
x=(i-(DEPTH_FRAME_WIDTH/2))*scale;
y=(j-(DEPTH_FRAME_HEIGHT/2))*scale;
z=-depth/1000;
- depth is the millimetre depth value returned by the Kinect device within the depth map
- PERSPECTIVE_CORRECTION is an empirically derived constant that converts from the camera’s perspective into an orthogonal view (essentially “undoing” the natural perspective view of the camera)
- DEPTH_FRAME_WIDTH is the width dimension of the depth map (typically 320 or 640)
- DEPTH_FRAME_HEIGHT is the height dimension of the depth map (typically 240 or 480)
- i and j represent the ith pixel from the left and jth pixel from the bottom of the frame
- This formula translates the depth values onto the negative z-axis such that a value of zero is the camera position and -1.0 is 1 metre away.
- A right-handed coordinate system is used.
- The PERSPECTIVE_CORRECTION constant is fixed for a given depth map resolution: 0.00000356 for a resolution of 320x240 and 0.00000178 for a resolution of 640x480.
- When the width and height of the depth map are doubled, the constant is halved.
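As a rough illustration, the conversion above can be sketched in Python. The equations use a `scale` term that the list does not define; this sketch assumes `scale = depth * PERSPECTIVE_CORRECTION`, which is an inference from the definitions rather than something stated outright. The function name `depth_pixel_to_metres` and the dictionary layout are hypothetical.

```python
# Constants quoted in the text, keyed by (width, height) of the depth map.
PERSPECTIVE_CORRECTION = {
    (320, 240): 0.00000356,
    (640, 480): 0.00000178,
}


def depth_pixel_to_metres(i, j, depth_mm, width=640, height=480):
    """Map pixel (i, j) with a millimetre depth reading to (x, y, z) in metres.

    i counts from the left edge and j from the bottom edge of the frame.
    """
    # ASSUMPTION: the per-pixel scale grows linearly with depth, i.e.
    # scale = depth * PERSPECTIVE_CORRECTION. The text does not define `scale`.
    scale = depth_mm * PERSPECTIVE_CORRECTION[(width, height)]
    x = (i - width / 2) * scale
    y = (j - height / 2) * scale
    z = -depth_mm / 1000.0  # camera sits at z = 0, scene lies along negative z
    return x, y, z


# The centre pixel at 1000 mm lands on the z-axis, 1 metre from the camera.
print(depth_pixel_to_metres(320, 240, 1000))
# The right-hand edge of the frame at the same depth, for both resolutions.
print(depth_pixel_to_metres(640, 240, 1000, 640, 480))
print(depth_pixel_to_metres(320, 120, 1000, 320, 240))
```

Under that assumption, the right-hand edge of the frame at 1000 mm comes out at roughly 0.57 m from the centre line for both resolutions, which is what the halving of the constant is meant to achieve.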
Perspective Correction
The camera’s perspective must be factored out to obtain precise [x, y, z] coordinates in space. Because perspective varies with camera position, only perspective-corrected coordinates can be used to correlate different snapshots of the same scene taken at different angles. Figure 1 illustrates the result of mapping depth values directly to fixed [x, y] coordinates without taking perspective into account.
Figure 1a) Mapping depth values to fixed [x, y] coordinates without perspective correction: view seen from the camera
Figure 2a) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view seen from the camera
The perspective correction was determined by measuring objects in the real world and comparing them to the size of their uncorrected virtual counterparts. The results were then correlated against distance from the camera, giving the derived constants.
The formulae for determining the initial fixed [x, y] positions are given below:
x=(i-(DEPTH_FRAME_WIDTH/2))*WORLD_SCALE;
y=(j-(DEPTH_FRAME_HEIGHT/2))*WORLD_SCALE;
z=-depth*WORLD_SCALE*DEPTH_SCALE;
WORLD_SCALE is 0.01 for a 640x480 depth map and 0.02 for a 320x240 depth map, and DEPTH_SCALE is 0.1. These values were selected empirically to offer a visually good representation of the real world when mapped into the virtual space.
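For comparison, the uncorrected mapping can be sketched in Python using the constants quoted above. The function name `depth_pixel_to_virtual` is hypothetical; the arithmetic follows the three equations directly.

```python
# Constants quoted in the text, keyed by (width, height) of the depth map.
WORLD_SCALE = {(640, 480): 0.01, (320, 240): 0.02}
DEPTH_SCALE = 0.1


def depth_pixel_to_virtual(i, j, depth_mm, width=640, height=480):
    """Map pixel (i, j) to uncorrected virtual-space coordinates.

    x and y depend only on the pixel position, not on depth, which is why
    this mapping does not account for the camera's perspective.
    """
    world_scale = WORLD_SCALE[(width, height)]
    x = (i - width / 2) * world_scale
    y = (j - height / 2) * world_scale
    z = -depth_mm * world_scale * DEPTH_SCALE
    return x, y, z


# The same pixel at two different depths gets the same x and y.
print(depth_pixel_to_virtual(640, 480, 1000))
print(depth_pixel_to_virtual(640, 480, 3000))
```

Only z changes between the two calls, so an object appears the same width in virtual space however far away it is. That is the distortion the perspective correction removes.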
Using this mapping, a number of objects were placed in front of the camera and measured in both the real world and the virtual space along their x- and y-axes to provide a scale factor mapping between the two spaces. These values are given in Table 1 along with each object’s distance from the camera.
Distance from Camera | Mean Scale Factor
810mm | 0.137
1380mm | 0.245
2630mm | 0.472
3750mm | 0.666
Table 1: Scale factors between real and virtual objects at a specific distance
Plotting the two columns of Table 1 against each other illustrates a linear correlation, as shown in Figure 3.
Figure 3: Plotting distance from camera against mean depth scale factor for perspective correction
The gradient of the line in Figure 3 gives the perspective correction value. It is calculated with respect to millimetre distances, as in the original set of equations, and factors in the DEPTH_SCALE and WORLD_SCALE constants from the second, uncorrected set of equations.
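The gradient can be checked against the Table 1 values with a short Python sketch. Two things here are assumptions, since the post does not say how the line was fitted: the line is forced through the origin (zero scale at zero distance), and "factoring in" the constants is read as multiplying the gradient by WORLD_SCALE. DEPTH_SCALE is not used in this sketch, and the helper name `gradient_through_origin` is hypothetical.

```python
# Table 1: (distance from camera in mm, mean scale factor)
TABLE_1 = [(810, 0.137), (1380, 0.245), (2630, 0.472), (3750, 0.666)]


def gradient_through_origin(points):
    """Least-squares slope of y = m * x, with the line forced through (0, 0)."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / sum_xx


gradient = gradient_through_origin(TABLE_1)

# ASSUMPTION: the constant for each resolution is gradient * WORLD_SCALE.
for resolution, world_scale in (("640x480", 0.01), ("320x240", 0.02)):
    print(resolution, gradient * world_scale)
```

With these assumptions, the two results come out close to the quoted constants of 0.00000178 and 0.00000356. A fit with a free intercept gives a slightly steeper line.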