Accuracy of Autodesk 123D Catch?
In April 2011, Autodesk provided access to a free and simple-to-use package for creating 3-D
meshes from user-supplied digital imagery (a process traditionally known as
"photogrammetry"). Originally known as Project PhotoFly,
this has evolved into the current package, "Autodesk 123D Catch",
but how accurate is it? This brief report outlines an initial approach adopted to
answer this question.
The beta version of 123D Catch can be freely downloaded and is currently free to use for
non-commercial purposes. It simply requires the user to supply a minimum of
three images of an object, which are then uploaded to a server for processing, presumably
involving PMVS methods (Patch-based Multi-view Stereo).
No restrictions appear to be placed upon camera type or focus setting,
so some form of camera calibration is being conducted. Of course, the photographs cannot all
be taken from the same location, and consistent, natural lighting conditions generate optimum results.
The processing strategy appears similar to the Microsoft Photosynth initiative, in that minimal user
input is required other than providing the images. However, 123D Catch appears
to offer additional benefits, particularly in visualisation and in the options to
extract data for subsequent use. Both these packages represent a significant
development for those interested in photogrammetry. However, a key question is
just how accurate are the data generated by a wholly image-based approach that
uses no external object control constraints and includes fully automated camera
calibration?
For this latest test, the 16 original images were uploaded to the 123D Catch server and
restituted successfully and automatically within just 15 minutes.
Originally, this stage had required 4 days of
data processing, demanding a high level of user interaction and experience.
Visually, the mesh looked superb and indeed a 3-D fly-through visualisation
suggested a far wider area of successful measurement than had been achieved previously.
The control points were then measured individually in 123D Catch as
"reference points" and the model was scaled using a known distance
and the "define reference distance" command. The model and measured
data were then exported as an Autodesk FBX file, an ASCII file containing the
measured control, mesh vertices and a variety of other data of relevance to the
camera calibration/restitution.
| Point    | X (m)  | Y (m)  | Z (m)  |
| 101      | -0.013 | -0.030 | -0.005 |
| 102      | -0.004 | -0.017 |  0.000 |
| 105      | -0.016 | -0.003 | -0.005 |
| 106      |  0.003 |  0.009 |  0.004 |
| 107      |  0.000 |  0.018 | -0.003 |
| 109      |  0.002 |  0.006 | -0.001 |
| 110      |  0.008 |  0.006 |  0.007 |
| 111      |  0.004 |  0.009 | -0.002 |
| 112      |  0.021 |  0.012 |  0.000 |
| 114      |  0.009 |  0.001 | -0.001 |
| 115      |  0.017 |  0.006 |  0.005 |
| 116      |  0.013 | -0.002 |  0.004 |
| 117      |  0.003 | -0.003 |  0.000 |
| 118      |  0.009 | -0.005 |  0.007 |
| 119      |  0.003 | -0.005 |  0.000 |
| 120      | -0.009 | -0.004 | -0.001 |
| 122      | -0.011 | -0.002 |  0.000 |
| 125      | -0.016 | -0.003 | -0.004 |
| 126      | -0.024 |  0.007 | -0.005 |
| Std.Dev. |  0.012 |  0.011 |  0.004 |
The 123D Catch control
points and original control coordinates were then used in a 3-D similarity
transformation to determine the best-fit transform between the two
coordinate systems. Seven parameters were estimated: 3 translations,
3 rotations and 1 scale. The residuals derived from this least-squares estimation
are presented in Table 1.
Table 1 123D Catch residuals following best-fit 3-D similarity transformation
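A seven-parameter fit of this kind can be sketched in a few lines. The example below is only an illustration. It assumes the closed-form SVD solution (Umeyama's method) rather than whatever software was actually used for this report. The function names and the synthetic points are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares scale, rotation and translation so that dst ~ scale * R @ src + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    # Cross-covariance between the centred point sets
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    src_var = (src_c ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / src_var
    t = dst_mean - scale * R @ src_mean
    return scale, R, t

def transform_residuals(src, dst, scale, R, t):
    """Per-point residuals (dst minus transformed src) in the dst coordinate system."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    return dst - (scale * src @ R.T + t)

# Synthetic check: 19 random points pushed through a known transform
rng = np.random.default_rng(0)
model_pts = rng.uniform(-5.0, 5.0, size=(19, 3))
angle = np.radians(20.0)
true_rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
true_scale = 2.5
true_translation = np.array([10.0, -4.0, 2.0])
control_pts = true_scale * model_pts @ true_rotation.T + true_translation

scale, rotation, translation = fit_similarity(model_pts, control_pts)
res = transform_residuals(model_pts, control_pts, scale, rotation, translation)
print("recovered scale:", scale)
print("residual std dev per axis:", res.std(axis=0, ddof=1))
```

With real data, `model_pts` would hold the control points measured in 123D Catch and `control_pts` the surveyed coordinates. The per-axis spread of the residuals then plays the role of the Std.Dev. row in Table 1.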
As the overall
standard deviations suggest, the fit to the original control is 12 mm, 11
mm and 4 mm in X, Y and Z respectively. Although such accuracy is comparatively low
(1:600) compared to normal stereo close range photogrammetry (1:1,000-1:10,000),
the results are certainly acceptable for many applications, particularly when considering that:
the restitution includes camera calibration for each photo; the whole task was
fully automated; and 123D Catch reduces the resolution of each original image
to just 3 megapixels. Finally, the process is solely image based and no control
constraints have been applied other than an approximate scale factor.
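The quoted standard deviations can be checked directly against the residuals listed in Table 1. The short sketch below assumes that the tabulated residuals are in metres and that a sample standard deviation (n - 1 divisor) was used. Neither is stated explicitly in the report, and the variable and function names are illustrative only.

```python
from statistics import stdev

# Residuals from Table 1: (point id, X, Y, Z), assumed to be in metres
table1 = [
    (101, -0.013, -0.030, -0.005), (102, -0.004, -0.017, 0.000),
    (105, -0.016, -0.003, -0.005), (106, 0.003, 0.009, 0.004),
    (107, 0.000, 0.018, -0.003), (109, 0.002, 0.006, -0.001),
    (110, 0.008, 0.006, 0.007), (111, 0.004, 0.009, -0.002),
    (112, 0.021, 0.012, 0.000), (114, 0.009, 0.001, -0.001),
    (115, 0.017, 0.006, 0.005), (116, 0.013, -0.002, 0.004),
    (117, 0.003, -0.003, 0.000), (118, 0.009, -0.005, 0.007),
    (119, 0.003, -0.005, 0.000), (120, -0.009, -0.004, -0.001),
    (122, -0.011, -0.002, 0.000), (125, -0.016, -0.003, -0.004),
    (126, -0.024, 0.007, -0.005),
]

def column_std_mm(rows, column):
    """Sample standard deviation of one residual column, converted to millimetres."""
    return stdev([row[column] for row in rows]) * 1000.0

std_x, std_y, std_z = (column_std_mm(table1, c) for c in (1, 2, 3))
print(f"{std_x:.0f} mm, {std_y:.0f} mm, {std_z:.0f} mm")  # prints: 12 mm, 11 mm, 4 mm
```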
During the original
data processing conducted in 2004, a self-calibrating bundle adjustment (Erdas Imagine/LPS/in-house software) had been used to
derive a set of parameters to model the focal length, principal point offset
and radial lens distortion, which were assumed to be stable for all frames. In
the least-squares adjustment for this original restitution, the overall
residual fit to the control was 3.5 mm, 1.7 mm and 3.4 mm in X, Y and Z respectively.
Clearly this earlier estimation achieved a higher accuracy (1:1,600) than Autodesk
123D Catch could manage, but a significantly greater effort had been required!
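To illustrate what such calibration parameters describe, the sketch below projects a point through a simple pinhole camera with a principal point offset and a polynomial radial distortion term. It is a simplified, assumed model for illustration, not the formulation used by Erdas Imagine/LPS or the in-house software. The function name, the sign convention and the example values are all hypothetical.

```python
def project_to_image(point_cam, focal_length, principal_point, k1, k2=0.0):
    """Project a 3-D point given in camera coordinates onto the image plane.

    Image coordinates are returned in the same units as focal_length.
    """
    X, Y, Z = point_cam
    # Ideal pinhole projection
    x = focal_length * X / Z
    y = focal_length * Y / Z
    # Radial distortion scales the point along its radius from the principal point
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xp, yp = principal_point
    return xp + x * radial, yp + y * radial

# The same object point with and without radial distortion (illustrative values)
ideal = project_to_image((1.0, 2.0, 10.0), 50.0, (0.1, -0.2), k1=0.0)
distorted = project_to_image((1.0, 2.0, 10.0), 50.0, (0.1, -0.2), k1=1e-4)
print(ideal, distorted)
```

In a self-calibrating bundle adjustment these parameters are estimated together with the camera positions and orientations. Holding them fixed for all frames, as in the 2004 processing, greatly reduces the number of unknowns.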
Figure 2 Residual fit following 3-D similarity transformation
Examining the
residuals graphically and in three dimensions was revealing (Figure 2); the viewpoint
is similar to the camera position adopted for Figure 1. Note also that
viewing with standard red/green stereo glasses enhances the
three-dimensional effect! Figure 2 demonstrates a clear systematic pattern in
which residuals are highest towards the edge and middle, but are in opposing
directions to the approximate camera axes. This could be explained in two ways.
First, the accuracy of the estimated focal lengths for each frame could be
questioned, the inaccurate estimates creating the "push/pull" effect
so graphically represented. Alternatively, the systematic pattern could be
accounted for by considering classical principles associated with vertical
aerial photography used for mapping. Although a series of stereo pairs were captured,
they were effectively in the form of a classical aerial "strip", in
which the normal end lap simply varied sequentially. This is an inherently weak
geometry, one that is recognized, tolerated and accepted because it is usually
managed and minimised through the use of a series of ground control points. Such
control would constrain each image individually, forcing it to fit the known
object space. Without such a control constraint, any strip would have a
tendency to wobble as small systematic errors make their presence known. Indeed,
the authors have seen and modelled this type of effect
before (Fryer et al., 1994). This earlier study revealed that a measurement
error introduced into the centre of the block will propagate to the
geometrically weaker periphery, as can be seen repeated for the Emu cave
examined here. Autodesk 123D Catch is wholly image-based, and provides no
opportunity to constrain individual frames in the manner required for this
particular configuration.
The simple solution
would have been to strengthen the image block by including additional frames
which capture larger areas of the cave from different positions. This would
no doubt have prevented the wavering/drifting effect so graphically represented
in Figure 2, but unfortunately such imagery wasn't acquired at the time.
This brief test has
provided an assessment of the accuracy of Autodesk's 123D Catch. Although not
at the level of accuracy routinely achieved in normal terrestrial
photogrammetry using control, the accuracy achieved was certainly useful for
many applications. Moreover, if more imagery had been acquired and included to
provide a stronger configuration, accuracies would certainly have been improved.
1st December 2011
Jim Chandler's homepage: http://www-staff.lboro.ac.uk/~cvjhc/index.htm
ISPRS working group V6,"close range morphological measurement for the Earth sciences": http://isprsv6.lboro.ac.uk/
Chandler, J.H., Fryer, J.G. and Kniest, H.T., 2005. Non-invasive
3D recording of aboriginal rock art using cost effective digital photogrammetry,
Rock Art Research, 22(2): 119-130.
Chandler, J.H., Bryan, P. and Fryer, J.G., 2007. The development and application of a
simple rock-art recording methodology based on consumer grade digital cameras,
The Photogrammetric Record, 22(117): 10-21.
Chandler, J.H. and Fryer, J.G., 2005. Recording aboriginal rock
art using cheap digital cameras and digital photogrammetry, CIPA XX International
Symposium, XX, International Cooperation to Save the World's Cultural
Heritage, Torino, pp. 193-8, ISSN 1682 1777.
Chandler, J.H. and Bryan, P., 2007. Cost-effective rock-art recording in a production
environment: is there a wider message?, CIPA XXI
International Symposium, AntiCIPAting the Future
of the Cultural Past, Athens, [CD-ROM], ISSN 1682 1777.
Fryer, J.G., Chandler, J.H. and Cooper, M.A.R., 1994. On
the Accuracy of Heighting from Aerial Photographs and
Maps: Implications to Process Modellers, Earth
Surface Processes and Landforms, 19: 577-583, ISSN 0197-9337.