Kinecting up the Past

Kinecting up the Past explores the research benefits, uses, and disruptive potential of cheap consumer-grade technology for capturing environments and artefacts in three dimensions using Microsoft's Kinect controller.


Data Accuracy (21 August 2013)

Between the default and near modes, Microsoft advertises a usable depth range of 40cm to 4 metres, with millimetre granularity along the depth axis. The process of "undoing" the perspective view of the camera essentially stretches the orthogonal plane with respect to depth – see the earlier post on <a href="http://kinectupthepast.blogspot.co.uk/2013/07/mapping-depth-data-into-virtual-space.html">Mapping Depth Data into Virtual Space</a>.

Using the empirically derived constants, a theoretical resolution for the sensor can be determined: at the closest usable distance (40cm) the x/y-plane resolution is 0.712mm, whereas at 4 metres from the sensor it drops to 7.12mm. The effective resolution degrades linearly with distance and is based on the higher capture resolution of 640x480.
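
A minimal sketch of that calculation, using the empirically derived 640x480 constant from the Mapping Depth Data into Virtual Space post (multiplying a millimetre depth by the constant gives the metres-per-pixel scale at that depth):

#include <cstdio>

// Empirically derived constant for the 640x480 depth map; multiplying a
// millimetre depth by it gives the metres-per-pixel scale at that depth.
const double PERSPECTIVE_CORRECTION = 0.00000178;

// Approximate x/y footprint of a single depth pixel, in millimetres, at a
// given distance from the sensor (in millimetres).
double pixelFootprintMm(double depthMm)
{
    return depthMm * PERSPECTIVE_CORRECTION * 1000.0;
}

int main()
{
    std::printf("40cm:     %.3f mm per pixel\n", pixelFootprintMm(400.0));   // ~0.712
    std::printf("4 metres: %.3f mm per pixel\n", pixelFootprintMm(4000.0));  // ~7.12
    return 0;
}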

The close-range resolution compares very favourably with what is typically used in the field by archaeologists; between the project partners, we treated 1mm-accuracy laser scanning equipment as the standard to compare against. Anything measured up to 56.2cm from the Kinect sensor would therefore, in theory, be equal to or better than this standard.

The linear coarsening of resolution as objects move further from the camera should also be taken into account when running the <a href="http://kinectupthepast.blogspot.co.uk/2013/07/aligning-point-clouds.html">Iterative Closest Point</a> algorithm, perhaps by favouring pairings closer to the camera. While this has not been factored into our current process, it is certainly worth investigating as a way to improve tracking accuracy. It also highlights the need to keep objects close to the camera visible while tracking and stitching for optimal results; these could later be removed from the final models.
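
A hypothetical sketch of how such a weighting could look (this is not part of our current pipeline; the pairing structure and the inverse-depth weight are assumptions purely for illustration):

#include <vector>

// Hypothetical depth-weighted error for ICP pairings: pairings nearer the
// camera (finer effective resolution) contribute more than distant ones.
struct PointPair
{
    double frameDepthMm;      // depth of the frame point, in millimetres
    double squaredDistance;   // squared distance to its matched world point
};

double weightedPairError(const std::vector<PointPair>& pairs)
{
    double total = 0.0, weightSum = 0.0;
    for (const PointPair& p : pairs)
    {
        // The per-pixel footprint grows linearly with depth, so down-weight
        // distant pairings in proportion to their depth.
        double w = 1.0 / p.frameDepthMm;
        total += w * p.squaredDistance;
        weightSum += w;
    }
    return weightSum > 0.0 ? total / weightSum : 0.0;
}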

The camera-plane resolution is only one part of the overall data-accuracy question; the next is how reliable the depth data returned from the camera is. To give an indication of this, the camera was pointed at a static scene and the depth measurements recorded and compared over time. Lighting was constant and nothing visible entered or left the scene. The scene, illustrated below, contains visible points that span from too close to the camera, through its normal range, and beyond:

[Image: snapshot of the static test scene – http://4.bp.blogspot.com/-fng8k8vk8yg/UhSP6G1KihI/AAAAAAAAAJs/1JdU4Xvc6pw/s1600/Kinect+Snapshot.bmp]

It should be noted at this stage that what follows are observations based on rudimentary experiments, by no means rigorously tested under strict scientific conditions. That said, conditions were controlled enough to provide indicative, meaningful results.

The first thing we looked at was the variation of depth for a given pixel in the depth map. Over 300 frames of the static scene (10 seconds), the minimum and maximum depths reported per pixel were extracted, along with an average pixel depth. The average pixel depth is plotted against the average variation, with outliers removed. This gives the plot below – note that the default Kinect range is used, giving an effective sensor range of 0.8m to 4m.
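
A sketch of the bookkeeping behind this experiment is given below; the flat frame buffer and the assumption that invalid pixels report a zero depth are ours, not the actual capture code:

#include <cstdint>
#include <vector>

// Accumulate per-pixel depth statistics over a run of frames of a static
// scene. After ~300 frames: (maxD - minD) gives the per-pixel variation,
// sum / validCount the average depth, and validCount / frames the
// valid-pixel percentage discussed below.
struct PixelStats
{
    uint16_t minD = 0xFFFF;
    uint16_t maxD = 0;
    uint64_t sum = 0;
    uint32_t validCount = 0;
};

void accumulate(std::vector<PixelStats>& stats, const std::vector<uint16_t>& depthFrameMm)
{
    for (std::size_t i = 0; i < depthFrameMm.size(); ++i)
    {
        uint16_t d = depthFrameMm[i];
        if (d == 0)            // assume zero means no valid depth for this pixel
            continue;
        PixelStats& s = stats[i];
        if (d < s.minD) s.minD = d;
        if (d > s.maxD) s.maxD = d;
        s.sum += d;
        s.validCount++;
    }
}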

[Plot: average pixel depth against average depth variation – http://4.bp.blogspot.com/-BBqzPtpe2Jw/UhSQDOi2kpI/AAAAAAAAAJ0/6AveBtFMgwE/s1600/average+depth+variations.jpg]

This plot demonstrates that there is noise at all depths from the camera sensor, although measurements below 1.7m appear to suffer less, as the baseline variation tends to be lower. This adds further support for using the technology to scan objects close to the camera rather than far away. Many of the high peaks can probably be attributed to the edges of objects, as our scene has few smooth gradient changes. The depth error therefore appears to be within a few millimetres at nearer distances.

In addition to the variation of depths per pixel, there are times when the sensor fluctuates between returning valid and invalid data. Constructing a histogram of the percentage of frames for which each pixel returned valid data suggests that in our example 89% of pixels remained valid throughout, leaving 11% fluctuating. A plot of this 11% (normalised) is given below (the 100%-valid bin has not been plotted so that the remaining values can be displayed on a meaningful scale).

[Plot: normalised histogram of valid-pixel percentages – http://4.bp.blogspot.com/-IZPGiBdlb3Q/UhSQLaj8bQI/AAAAAAAAAJ8/QqvwFvBpkMs/s1600/valid+pixels.jpg]

The majority of pixels without 100% valid depth values across time are either valid very rarely (below 1% of frames) or almost always (above 98%). While "missing" depth data isn't a significant problem, it is worth noting that it happens: erroneous values (such as pixels with low valid percentages) need to be pruned, and it cannot be assumed that every pixel will consistently report a depth value. Pruning algorithms therefore need to take temporal history into account.

In summary, the accuracy of the Kinect sensor appears sufficient to capture objects at 1–2mm resolution at distances of less than 1 metre from the camera. However, because of the variation in the depth data and the intermittent validity of pixel data, care needs to be taken when designing the tracking and stitching process to accommodate the level of error we are seeing.

When Tracking and Merging Goes Wrong (4 July 2013)

As mentioned in previous posts, the Kinect sensor data can be noisy, which can lead to errors during both the ICP and merging processes that potentially compound each other. The process itself is also subject to inherent algorithmic issues. This short post gives a couple of examples where the tracking drifts, and demonstrates the importance of the perspective correction factor previously discussed and of balancing the thresholds used in the various processes.

In the first video we have completely removed the perspective correction factor. Without it, as the camera moves around the scene, the tracking process does its best to stitch "skewed" frame data into world space but inevitably fails.

[Video: https://www.youtube.com/embed/Auyd-TJalUw]

The next video takes a much larger pan, but because the flat wall surfaces being tracked provide little depth variation, the tracking algorithm drifts upwards.

[Video: https://www.youtube.com/embed/rCVYRAC8fm8]

In the above videos, the top-left two windows show what the camera is seeing as a colour image and as a greyscale depth map. The bottom-left window is a 3D point-cloud representation of the combined RGB and depth data from the Kinect sensor (which would normally be correctly coloured, but the green in this window represents point matches into the world space). The larger window in the centre of the screen is the compiled world space; the green wireframe box indicates the current camera position and orientation within this world, and green points are those paired with the individual captures from the Kinect device.

Underneath the larger 3D window are "debug" outputs: the one on the left gives the internal states for the steps within the matching process, and the one on the right gives the camera orientation of the current frame in terms of rotation and offset into the global space.

Merging Point Clouds (4 July 2013)

Once the frame data has been aligned with the world space using the Iterative Closest Point algorithm, it can be merged to create the larger environment. In this process we maintain the concept of point clouds rather than creating surfaces. There are three components to merging the dataset:
1. Refine existing world points
2. Add new frame points
3. Remove erroneous world points

Merging point clouds proceeds by considering only a subset of the world points: those that fall within the camera's view frustum under the frame transformation. In the discussion that follows, this subset will be referred to simply as the world points.

Refine Existing World Points

Each world point is matched against a frame point after applying the frame transformation. The matching threshold can be stricter than in the ICP process in order to increase world point-cloud density; for example, a threshold of 1cm will give a final resolution of 1cm, whereas a threshold of 1mm will give much more fidelity, although it might also introduce errors due to the level of noise returned by the Kinect sensor. More information about the level of accuracy, noise and reliability will be given in a future post.

The world points are updated using the matches, where one frame point may map to many world points. After the existing points have been updated, all world points and frame points involved in a match are ignored for the remaining merging steps.
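
A rough sketch of this refinement step follows; the running-average update, the brute-force nearest-neighbour search and the field names are illustrative assumptions rather than the actual implementation:

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch of refining existing world points against a new frame.
// Matched points are flagged so the remaining merge steps (adding new points,
// pruning) can ignore them; several world points may match the same frame point.
struct Vec3 { float x, y, z; };

struct WorldPoint
{
    Vec3 position;
    int  observations = 1;          // frames that have contributed to this point
    bool matchedThisFrame = false;
};

void refineWorldPoints(std::vector<WorldPoint>& worldPoints,
                       const std::vector<Vec3>& framePointsInWorldSpace,
                       std::vector<bool>& frameMatched,
                       float threshold /* e.g. 0.001f for 1mm */)
{
    for (WorldPoint& wp : worldPoints)
    {
        // Find the closest frame point within the threshold (brute force for clarity).
        int best = -1;
        float bestDist = threshold;
        for (std::size_t i = 0; i < framePointsInWorldSpace.size(); ++i)
        {
            const Vec3& fp = framePointsInWorldSpace[i];
            float dx = fp.x - wp.position.x, dy = fp.y - wp.position.y, dz = fp.z - wp.position.z;
            float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (dist <= bestDist) { bestDist = dist; best = static_cast<int>(i); }
        }
        if (best < 0) continue;      // no frame point within the threshold

        // Blend the matched frame point into the world point (assumed running average).
        const Vec3& fp = framePointsInWorldSpace[best];
        float w = 1.0f / static_cast<float>(wp.observations + 1);
        wp.position.x += (fp.x - wp.position.x) * w;
        wp.position.y += (fp.y - wp.position.y) * w;
        wp.position.z += (fp.z - wp.position.z) * w;
        wp.observations++;
        wp.matchedThisFrame = true;
        frameMatched[best] = true;   // matched frame points will not be added as new points
    }
}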

Add New Frame Points

The frame points that were not matched to a world point are considered new points and are added to the world point dataset.

Remove Erroneous World Points

There is noise in the depth map returned by the Kinect sensor, so some points in the world dataset will also be erroneous and need to be pruned. The strategy employed here is to eliminate any world points that fall within the transformed camera frustum but do not have significant support for their existence. We do not simply remove, every frame, all world points that have no match with the frame points, as the frame itself could be in error. Instead, as each world point is updated or added, we take note of when it was last seen. If, at the end of a frame, there are world points that have not been matched and have not been seen for a given number of frames, they are removed.
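
A short sketch of this "last seen" rule (the frame-count threshold and the bookkeeping fields are illustrative assumptions):

#include <cstddef>
#include <vector>

// Drop world points that sit inside the current camera frustum but have not
// been matched for more than MAX_UNSEEN_FRAMES frames. The threshold of 30
// frames (roughly a second of capture) is purely illustrative.
const int MAX_UNSEEN_FRAMES = 30;

struct WorldPoint
{
    float x, y, z;
    bool  inFrustum;       // falls within the transformed camera frustum this frame
    int   lastSeenFrame;   // frame index when this point was last matched or added
};

void pruneWorldPoints(std::vector<WorldPoint>& worldPoints, int currentFrame)
{
    std::size_t kept = 0;
    for (std::size_t i = 0; i < worldPoints.size(); ++i)
    {
        const WorldPoint& p = worldPoints[i];
        bool stale = p.inFrustum && (currentFrame - p.lastSeenFrame > MAX_UNSEEN_FRAMES);
        if (!stale)
            worldPoints[kept++] = p;
    }
    worldPoints.resize(kept);
}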

Tracking and Merging Example

The following video illustrates the process of tracking and merging point clouds. The bottom-left window is a 3D point-cloud representation of the combined RGB and depth data from the Kinect sensor (which would normally be correctly coloured, but the green in this window represents point matches into the world space). The larger window in the centre of the screen is the compiled world space; the green wireframe box indicates the current camera position and orientation within this world, and green points are those paired with the individual captures from the Kinect device.

Underneath the larger 3D window are "debug" outputs: the one on the left gives the internal states for the steps within the matching process, and the one on the right gives the camera orientation of the current frame in terms of rotation and offset into the global space.

[Video: https://www.youtube.com/embed/Mg1afz9iTl4]

Aligning Point Clouds (2 July 2013)

Point clouds are joined together once the Kinect depth-map data has been converted into an absolute [x, y, z] coordinate system, as described in an earlier post. At the core of this process is the Iterative Closest Point (ICP) algorithm. ICP finds an affine transformation between two point clouds that maximises their overlapping regions (and thus minimises the distance between the two point clouds in space).

(Note: this post provides only a textual overview of the process, which will be expanded in future articles. The web also has a good number of resources and examples that can be found by searching for Iterative Closest Point.)

The following definitions describe the terms used:

- Frame Data (F): perspective-corrected depth-map data from the Kinect device
- World Data (W): an accumulation of point data from one or many frames, defined in world space
- Frame Transformation (T): a 4x4 affine transformation matrix that transforms frame data into world space, defined by rotation and translation about the 3 axes (affording 6 independent variables)

The basic ICP algorithm proceeds as follows and is applied per frame of data:

1. For each pixel in the frame data, apply the current frame transformation and find the closest point in the world data
2. Minimise the distance between the pairs by refining the frame transformation
3. Repeat from step 1 while the sum of errors over all pairs is decreasing

Initially the frame transformation is the identity matrix and the world data is the first frame from the Kinect device. The closest-point search also takes into account the point normals, which are calculated using neighbouring points.

The minimisation step is an inner iterative process that makes use of the Jacobian of the transformation matrix; the Jacobian relates small changes in the independent variables to changes in the positions of the frame data transformed into world space. The aim of ICP is thus to minimise (F*T) - W over all frame and world pairs, although in practice this expression becomes slightly more complex when point-to-plane distances along the normals are taken into account, and the error function used to measure the correctness of the transformation matrix is typically a sum of squared errors.

The inner iteration involving the Jacobian arises from the way the problem is linearised: each step produces a new approximation to the solution, which can often be improved by linearising again around the values from the previous step, and so on. The inner iterations continue until the error between the pairs no longer decreases.

The outer iteration of the above algorithm uses the revised frame transformation to determine new (and hopefully better) pairings. The outer loop continues while the resulting sum of distances between pairs is decreasing.
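
To make the loop structure concrete, here is a deliberately simplified sketch that estimates a translation only; a full implementation refines the rotation as well, via the Jacobian-based inner minimisation and point-to-plane distances described above:

#include <limits>
#include <vector>

struct Vec3 { double x, y, z; };

static double sqDist(const Vec3& a, const Vec3& b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Translation-only ICP: pair each transformed frame point with its closest
// world point, refine the translation to minimise the paired distances, and
// repeat while the summed error keeps falling.
Vec3 icpTranslation(const std::vector<Vec3>& frame, const std::vector<Vec3>& world)
{
    Vec3 t = { 0.0, 0.0, 0.0 };     // current estimate of the frame transformation
    Vec3 bestT = t;
    double bestError = std::numeric_limits<double>::max();

    if (frame.empty() || world.empty())
        return t;

    while (true)
    {
        // Step 1: transform the frame by the current estimate and pair each
        // frame point with its closest world point (brute force for clarity).
        double error = 0.0;
        Vec3 offsetSum = { 0.0, 0.0, 0.0 };
        for (const Vec3& f : frame)
        {
            Vec3 p = { f.x + t.x, f.y + t.y, f.z + t.z };
            double bestD = std::numeric_limits<double>::max();
            Vec3 closest = world.front();
            for (const Vec3& w : world)
            {
                double d = sqDist(p, w);
                if (d < bestD) { bestD = d; closest = w; }
            }
            error += bestD;
            offsetSum.x += closest.x - p.x;
            offsetSum.y += closest.y - p.y;
            offsetSum.z += closest.z - p.z;
        }

        // Step 3: stop once the summed pair error no longer decreases.
        if (error >= bestError)
            return bestT;
        bestError = error;
        bestT = t;

        // Step 2: for a translation-only model the minimising update is the mean
        // offset between paired points; the full 6-DoF problem needs the
        // Jacobian-based inner iteration instead.
        t.x += offsetSum.x / frame.size();
        t.y += offsetSum.y / frame.size();
        t.z += offsetSum.z / frame.size();
    }
}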

Assuming this process finds a transformation that produces a low error, the frame data can be transformed into world space and merged with the world environment. The next frame of data is then processed, using the last known frame transformation as the starting point, and so on, building the virtual environment by piecing together the data from the individual frames. This will be discussed further in a later post.

An example of the tracking process is given below:

[Video: https://www.youtube.com/v/knQolYGF_PI]

In this video, the first frame is captured and taken to be the world data; updates to the world data are not undertaken. Each frame of the video represents a new frame from the Kinect device. The cyan dots indicate that a pairing between the frame data and world data has been found and lies within a given threshold; the red dots indicate that the closest-point match between frame and world data lies outside that threshold. The video shows the position of the frame data once it has been transformed into world space, hence the more stable the features, the better the tracking (if just the original frame data were viewed, the scene would be seen to pan and shake as the camera is moved).

Mapping Depth Data into Virtual Space (2 July 2013)

While mapping depth values from the Kinect sensor into a virtual space is straightforward, a perspective correction factor needs to be taken into account, which is discussed in this post. In what follows, the official Windows Kinect SDK is used and all formulae given relate to the specific values returned by that API (which can differ from those returned by unofficial SDKs). Depth data is delivered as scanlines from bottom to top.

To convert Kinect data into 3D space where one unit is equal to 1 metre:

scale = depth * PERSPECTIVE_CORRECTION;
x = (i - (DEPTH_FRAME_WIDTH / 2)) * scale;
y = (j - (DEPTH_FRAME_HEIGHT / 2)) * scale;
z = -depth / 1000;

Where:

- depth is the millimetre depth value returned by the Kinect device within the depth map
- PERSPECTIVE_CORRECTION is an empirically derived constant that converts from the camera's perspective into an orthogonal view (essentially "undoing" the natural perspective view of the camera)
- DEPTH_FRAME_WIDTH is the width of the depth map (typically 320 or 640)
- DEPTH_FRAME_HEIGHT is the height of the depth map (typically 240 or 480)
- i and j represent the i-th pixel from the left and the j-th pixel from the bottom of the frame

Notes:

- This mapping places depth values on the negative z-axis, such that a value of zero is the camera position and -1.0 is 1 metre away.
- A right-handed coordinate system is used.
- The PERSPECTIVE_CORRECTION constant is fixed for a given depth-map resolution: 0.00000356 for 320x240 and 0.00000178 for 640x480.
- When the width and height of the depth map are doubled, the constant is halved.
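
A self-contained sketch of the mapping for the 640x480 stream (the flat vector input and the assumption that invalid pixels report zero depth are ours):

#include <cstdint>
#include <vector>

// Converts a raw millimetre depth map into [x, y, z] points where one unit
// equals one metre, the camera sits at the origin and depth extends along the
// negative z-axis (right-handed coordinates). Constants are as given above.
const int    DEPTH_FRAME_WIDTH      = 640;
const int    DEPTH_FRAME_HEIGHT     = 480;
const double PERSPECTIVE_CORRECTION = 0.00000178;   // 0.00000356 for 320x240

struct Point3 { double x, y, z; };

// depthMm must hold DEPTH_FRAME_WIDTH * DEPTH_FRAME_HEIGHT values, scanlines
// ordered bottom to top as delivered by the SDK.
std::vector<Point3> depthMapToPoints(const std::vector<uint16_t>& depthMm)
{
    std::vector<Point3> points;
    points.reserve(depthMm.size());
    for (int j = 0; j < DEPTH_FRAME_HEIGHT; ++j)        // j-th pixel from the bottom
    {
        for (int i = 0; i < DEPTH_FRAME_WIDTH; ++i)     // i-th pixel from the left
        {
            double depth = depthMm[j * DEPTH_FRAME_WIDTH + i];
            if (depth <= 0.0) continue;                  // skip pixels with no depth reading

            double scale = depth * PERSPECTIVE_CORRECTION;
            Point3 p;
            p.x = (i - (DEPTH_FRAME_WIDTH  / 2)) * scale;
            p.y = (j - (DEPTH_FRAME_HEIGHT / 2)) * scale;
            p.z = -depth / 1000.0;
            points.push_back(p);
        }
    }
    return points;
}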

Perspective Correction

The camera's perspective field of view needs to be factored out to obtain precise [x, y, z] coordinates that can be used to correlate different snapshots of the same scene taken from different angles, since the perspective varies with camera position. Figure 1 illustrates the result of mapping depth values directly to fixed [x, y] coordinates without taking perspective into account.

[Figure 1a) Mapping depth values to fixed [x, y] coordinates without perspective correction: view seen from the camera – http://4.bp.blogspot.com/-zWfG9-nBkao/UdKRz6QNgGI/AAAAAAAAAHU/EJAb04AxODk/s623/Without+Correction+-+Front.jpg]

[Figure 1b) Mapping depth values to fixed [x, y] coordinates without perspective correction: view of the scene from above – note that the wall and shelves do not make right angles because the camera takes a perspective view – http://3.bp.blogspot.com/-cvkRewk2Me8/UdKR0D7OYmI/AAAAAAAAAHo/nC0ZHavU7wI/s624/Without+Correction+-+Top.jpg]

By including the perspective correction, real-world right angles remain right angles in the virtual space and distances are corrected to their absolute values, as illustrated in Figure 2.

[Figure 2a) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view seen from the camera – http://2.bp.blogspot.com/-IiEGLws5hN0/UdKRzuXkDDI/AAAAAAAAAHQ/KNUATFekg9A/s621/With+Correction+-+Front.jpg]

[Figure 2b) Mapping depth values to absolute [x, y, z] coordinates using perspective correction: view of the scene from above – note that the wall and shelves make right angles and appear straight and well aligned – http://1.bp.blogspot.com/--tzSw43gvpI/UdKRzt-kqyI/AAAAAAAAAHY/B4zrGo6lSIk/s622/With+Correction+-+Top.jpg]

The perspective correction was determined by measuring objects in the real world and comparing them to the size of their virtual counterparts without correction. This was correlated against distance from the camera, resulting in the derived constants. The formulae for determining the initial fixed [x, y] positions are given below:

x = (i - (DEPTH_FRAME_WIDTH / 2)) * WORLD_SCALE;
y = (j - (DEPTH_FRAME_HEIGHT / 2)) * WORLD_SCALE;
z = -depth * WORLD_SCALE * DEPTH_SCALE;

WORLD_SCALE is 0.01 or 0.02 for 640x480 and 320x240 depth-map resolutions respectively, and DEPTH_SCALE is 0.1. These values were selected empirically to offer a visually good representation of the real world when mapped into the virtual space.

Using this mapping, a number of objects were placed in front of the camera and measured, in both the real world and the virtual space, along their x- and y-axes to provide a scale factor between the two spaces. These values are given in Table 1 along with each object's distance from the camera.

Distance from Camera    Mean Scale Factor
810mm                   0.137
1380mm                  0.245
2630mm                  0.472
3750mm                  0.666

Table 1: Scale factors between real and virtual objects at a specific distance

Plotting the two columns of Table 1 against each other illustrates a linear correlation, as shown in Figure 3.

[Figure 3: Distance from camera plotted against mean depth scale factor for perspective correction – http://1.bp.blogspot.com/-VL8eLmXxKRM/UdKWl_I5xrI/AAAAAAAAAH4/5mDV0IeFYDM/s620/Calibrations.jpg]

The gradient of the line in Figure 3 gives the perspective correction value, calculated with respect to millimetre distances as in the original set of equations and factoring in the DEPTH_SCALE and WORLD_SCALE constants from the second, uncorrected set of equations.
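
As a sanity check, the constant can be reproduced from the Table 1 figures with a simple least-squares fit; folding in WORLD_SCALE (0.01 for 640x480) converts the gradient into the millimetre-based constant used in the first set of equations. The zero intercept is an assumption made purely to keep the fit to a single parameter:

#include <cstdio>

// Fit a least-squares line through the origin for the (distance, mean scale
// factor) measurements of Table 1, then fold in WORLD_SCALE to recover the
// PERSPECTIVE_CORRECTION constant for millimetre depths.
int main()
{
    const double distanceMm[]  = { 810.0, 1380.0, 2630.0, 3750.0 };   // Table 1
    const double scaleFactor[] = { 0.137, 0.245,  0.472,  0.666  };
    const double WORLD_SCALE   = 0.01;

    // Least-squares gradient through the origin: sum(x*y) / sum(x*x).
    double sumXY = 0.0, sumXX = 0.0;
    for (int i = 0; i < 4; ++i)
    {
        sumXY += distanceMm[i] * scaleFactor[i];
        sumXX += distanceMm[i] * distanceMm[i];
    }
    double gradient = sumXY / sumXX;                       // ~0.000178 per mm
    double perspectiveCorrection = gradient * WORLD_SCALE;  // ~0.00000178

    std::printf("gradient = %.6g, PERSPECTIVE_CORRECTION = %.6g\n",
                gradient, perspectiveCorrection);
    return 0;
}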

Introductions... (23 October 2012)

Welcome to the blog for the <a href="http://www.shef.ac.uk/hri/projects/projectpages/kinecting">Kinecting up the Past</a> project at the <a href="http://www.shef.ac.uk/hri">Humanities Research Institute</a>, <a href="http://www.shef.ac.uk/">The University of Sheffield</a>.

Kinecting up the Past is funded by <a href="http://www.jisc.ac.uk/">JISC</a> and partners with <a href="http://www.creswell-crags.org.uk/">Creswell Crags</a> and <a href="http://tparchaeology.co.uk/">Trent & Peak Archaeology</a>. The aim is to make Microsoft's <a href="http://www.microsoft.com/en-us/kinectforwindows/">Kinect</a> device as simple to use as a digital camera for capturing 3-dimensional objects and environments, making 3D capture far more accessible, easier to use, and cheaper for those without specialist skill sets.

The project team includes:

- The Humanities Research Institute at the University of Sheffield: one of the UK's leading centres for the study and use of digital technology within the arts and humanities. As a major research facility within the University, it currently comprises 21 active research projects and nine staff. The HRI has expertise in all aspects of this project, having been involved in the conception, management and delivery of digital humanities research projects since its establishment in 1992.
- Creswell Heritage Trust: established in 1991 to manage the internationally significant Ice Age site of Creswell Crags. The mission to inscribe Creswell Crags as a World Heritage Site has resulted in £14 million of infrastructure investment over the last twenty years, including a new Museum and Education Centre. The Crags are now on the Government's UK World Heritage Tentative List. Ian Wall, the Director of the Trust, who will oversee the Trust's involvement in this project, has more than twenty years of experience in heritage management and interpretation.
- Trent & Peak Archaeology: part of the Archaeological Trust family, Trent & Peak has been practising archaeology in the Trent Valley, Peak District and surrounding areas for over 45 years, developing strong expertise in 3D digital capture. Dr David Strange-Walker is the main contact for this project at Trent & Peak and is the Project Manager on the Nottingham Caves Survey, an innovative project to survey hundreds of sandstone caves to high archaeological standards while simultaneously increasing public awareness and understanding of the heritage through social media and smartphone delivery.