Accuracy of AutoDesk 123D Catch?

Aboriginal cave re-measurement using digital photogrammetry - Jim Chandler and John Fryer

In April 2011, Autodesk provided access to a free and simple-to-use package for creating 3-D meshes from user-supplied digital imagery (traditionally known as "photogrammetry"!). Originally known as Project PhotoFly, this has evolved into the current package, "Autodesk 123D Catch", but how accurate is it? This brief report outlines an initial approach adopted to answer this question.

Autodesk 123D Catch

The beta version of 123D Catch can be freely downloaded and is currently free to use for non-commercial purposes. It simply requires the user to supply a minimum of three images of an object, which are then uploaded to a server for processing, presumably involving PMVS (Patch-based Multi-view Stereo) methods. No restrictions appear to be placed upon camera type or focus setting, so some form of camera calibration is being conducted. Of course, the photographs cannot all be taken from the same location, and consistent, natural lighting conditions generate optimum results. The processing strategy appears similar to the Microsoft Photosynth initiative, in that minimal user input is required other than providing the images. However, 123D Catch appears to offer increased benefits, particularly in the visualisation and in the options to extract data for subsequent use. Both packages represent a significant development for those interested in photogrammetry. A key question, however, is just how accurate are the data generated by a wholly image-based approach that uses no external object control constraints and includes fully automated camera calibration.

Aboriginal cave re-measurement

To answer this question, a past project conducted to record an Aboriginal cave site was reprocessed. The cave (Figure 1) is approximately 9 m in length and is of interest because of engraved features which resemble an emu foot, a token for the local Aboriginal community that lived in this area of the Blue Mountains, New South Wales, Australia. This site had been recorded in 2004 using a series of overlapping stereo-pairs acquired using a six megapixel DCS 460 digital camera, equipped with a 24 mm lens. Of significance for this latest study was the original inclusion of 20 3-D control points, which were established using a reflectorless total station. These data were used to conduct the original photogrammetric survey, extracting high resolution DEMs/orthophotos and a fly-through visualisation. Further details concerning this earlier work are described more fully in a series of papers (Chandler et al., 2005; 2007; Chandler and Fryer, 2005; Chandler and Bryan, 2007).

For this latest test, the 16 original images were uploaded to the 123D Catch server and restituted successfully and automatically within just 15 min. Originally, this stage had required 4 days of data processing, with a high level of user interaction and experience. Visually, the mesh looked superb, and indeed a 3-D fly-through visualisation suggested a far wider area of successful measurement than had been achieved previously. The control points were then measured individually in 123D Catch as "reference points" and the model was scaled using a known distance and the "define reference distance" command. The model and measured data were then exported as an Autodesk FBX file; this ASCII file contains the measured control, mesh vertices and a variety of other data relevant to the camera calibration and restitution.
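The scaling step amounts to multiplying every model coordinate by the ratio of the known distance to the same distance measured in the unscaled model. The following is a minimal Python sketch of that arithmetic only; the function name and the sample coordinates are hypothetical and are not taken from 123D Catch or its FBX export.

```python
import math


def scale_model(points, ref_a, ref_b, known_distance):
    """Scale model coordinates so that |ref_a - ref_b| equals known_distance.

    points: list of (x, y, z) tuples in arbitrary model units.
    ref_a, ref_b: the two reference points, in the same model units.
    known_distance: true separation of the reference points (e.g. metres).
    Returns the scale factor and the scaled points.
    """
    model_distance = math.dist(ref_a, ref_b)
    if model_distance == 0:
        raise ValueError("reference points must be distinct")
    factor = known_distance / model_distance
    return factor, [tuple(factor * c for c in p) for p in points]


# Hypothetical example: two reference points that are 2.0 model units apart
# and known to be 1.5 m apart on the object.
factor, scaled = scale_model(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 4.0, 0.0)],
    ref_a=(0.0, 0.0, 0.0),
    ref_b=(2.0, 0.0, 0.0),
    known_distance=1.5,
)
```

A single distance fixes scale only; it says nothing about position or orientation, which is why a full similarity transformation against control is needed for the accuracy assessment.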

Accuracy assessment

 

The 123D Catch control points and original control coordinates were then used in a 3-D similarity transformation to determine the optimum rigid body transform between the two coordinate systems. Seven parameters were estimated: 3 translation, 3 rotation and 1 scale. The residuals derived from this least-squares estimation are presented in Table 1.

Table 1 123D Catch residuals following best fit 3-D similarity transformation

Point        X (m)     Y (m)     Z (m)
101         -0.013    -0.030    -0.005
102         -0.004    -0.017     0.000
105         -0.016    -0.003    -0.005
106          0.003     0.009     0.004
107          0.000     0.018    -0.003
109          0.002     0.006    -0.001
110          0.008     0.006     0.007
111          0.004     0.009    -0.002
112          0.021     0.012     0.000
114          0.009     0.001    -0.001
115          0.017     0.006     0.005
116          0.013    -0.002     0.004
117          0.003    -0.003     0.000
118          0.009    -0.005     0.007
119          0.003    -0.005     0.000
120         -0.009    -0.004    -0.001
122         -0.011    -0.002     0.000
125         -0.016    -0.003    -0.004
126         -0.024     0.007    -0.005
Std. dev.    0.012     0.011     0.004
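For readers wishing to repeat this kind of check, a seven-parameter fit can be sketched in a few lines of Python with NumPy. This is not the software used for the results reported here; it is a minimal illustration that uses the closed-form SVD solution (in the style of Umeyama and Horn) rather than an iterative linearised adjustment, and it assumes equally weighted points. The function name `fit_similarity` and the synthetic test data are hypothetical.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares 7-parameter fit: dst ~ scale * R @ src + t.

    src, dst: (n, 3) arrays of corresponding points (n >= 3, not collinear).
    Returns (scale, R, t, residuals), with residuals = dst - transformed src.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_s, dst - mu_d

    # Cross-covariance of the centred point sets and its SVD.
    cov = b.T @ a / len(src)
    u, s, vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0
    rot = u @ np.diag(d) @ vt

    var_s = (a ** 2).sum() / len(src)      # variance of the source points
    scale = (s * d).sum() / var_s
    trans = mu_d - scale * rot @ mu_s
    residuals = dst - (scale * src @ rot.T + trans)
    return scale, rot, trans, residuals


# Synthetic check (hypothetical data): apply a known transform, then recover it.
rng = np.random.default_rng(0)
model = rng.uniform(-5.0, 5.0, size=(19, 3))
angle = np.radians(30.0)
true_rot = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
true_trans = np.array([100.0, 200.0, 50.0])
control = 2.5 * model @ true_rot.T + true_trans

scale, rot, trans, residuals = fit_similarity(model, control)
```

With real data, `model` would hold the control points as measured in 123D Catch and `control` the total station coordinates; the per-axis spread of `residuals` is then the quantity summarised at the foot of Table 1.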

As the overall standard deviations suggest, the fit to the original control is just 12 mm, 11 mm and 4 mm in X, Y and Z respectively. Although such accuracy is comparatively low (1:600) compared to normal stereo close range photogrammetry (1:1,000-1:10,000), the results are certainly acceptable for many applications, particularly when considering that: the restitution includes camera calibration for each photo; the whole task was fully automated; and 123D Catch reduces the resolution of each original image to just 3 megapixels. Finally, the process is solely image based and no control constraints have been applied other than an approximate scale factor.
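The summary row of Table 1 can be checked directly from the listed residuals. The short Python sketch below assumes the sample standard deviation (n - 1 denominator); that choice is an assumption, although with these 19 points it reproduces the 12 mm, 11 mm and 4 mm figures quoted above.

```python
from statistics import stdev

# Residuals from Table 1 in metres: (point, X, Y, Z).
RESIDUALS = [
    (101, -0.013, -0.030, -0.005),
    (102, -0.004, -0.017, 0.000),
    (105, -0.016, -0.003, -0.005),
    (106, 0.003, 0.009, 0.004),
    (107, 0.000, 0.018, -0.003),
    (109, 0.002, 0.006, -0.001),
    (110, 0.008, 0.006, 0.007),
    (111, 0.004, 0.009, -0.002),
    (112, 0.021, 0.012, 0.000),
    (114, 0.009, 0.001, -0.001),
    (115, 0.017, 0.006, 0.005),
    (116, 0.013, -0.002, 0.004),
    (117, 0.003, -0.003, 0.000),
    (118, 0.009, -0.005, 0.007),
    (119, 0.003, -0.005, 0.000),
    (120, -0.009, -0.004, -0.001),
    (122, -0.011, -0.002, 0.000),
    (125, -0.016, -0.003, -0.004),
    (126, -0.024, 0.007, -0.005),
]


def axis_std(rows):
    """Sample standard deviation per axis, rounded to the nearest millimetre."""
    columns = list(zip(*rows))[1:]  # drop the point-number column
    return tuple(round(stdev(col), 3) for col in columns)


print(axis_std(RESIDUALS))  # prints (0.012, 0.011, 0.004)
```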

During the original data processing conducted in 2004, a self-calibrating bundle adjustment (Erdas Imagine/LPS/in-house software) had been used to derive a set of parameters to model the focal length, principal point offset and radial lens distortion, which was assumed to be stable for all frames. In the least-squares adjustment for this original restitution, the overall residual fit to the control was 3.5 mm, 1.7 mm and 3.4 mm in X, Y and Z respectively. Clearly this earlier estimation achieved a higher accuracy (1:1,600) than Autodesk 123D Catch could manage, but a significantly greater effort had been required!

Figure 2 Residual fit following 3D similarity transformation

Examining the residuals graphically and in three dimensions was revealing (Figure 2), the viewpoint being similar to the camera position adopted for Figure 1. Note also that viewing using standard red/green stereo glasses enhances the three-dimensional effect! Figure 2 demonstrates a clear systematic pattern in which residuals are highest towards the edge and middle, but point in opposing directions along the approximate camera axes.

This could be explained in two ways. First, the accuracy of the estimated focal lengths for each frame could be questioned, the inaccurate estimates creating the "push/pull" effect so graphically represented. Alternatively, the systematic pattern could be accounted for by considering classical principles associated with vertical aerial photography used for mapping. Although a series of stereo pairs were captured, they were effectively in the form of a classical aerial "strip", in which the normal end lap simply varied sequentially. This is an inherently weak geometry, one that is recognised, tolerated and accepted because it is usually managed and minimised through the use of a series of ground control points. Such control would constrain each image individually, forcing it to fit the known object space. Without such a control constraint, any strip would have a tendency to wobble as small systematic errors make their presence known. Indeed, the authors have seen and modelled this type of effect before (Fryer et al., 1994). This earlier study revealed that a measurement error introduced into the centre of the block will propagate to the geometrically weaker periphery, as can be seen repeated for the emu cave examined here. Autodesk 123D Catch is wholly image based and provides no opportunity to constrain individual frames in the manner required for this particular configuration.

The simple solution would have been to strengthen the image block by including additional frames which capture larger areas of the cave from different positions. This would no doubt have prevented the wavering/drifting effect so graphically represented in Figure 2, but unfortunately such imagery was not acquired at the time.

Conclusion

This brief test has provided an assessment of the accuracy of Autodesk's 123D Catch. Although not at the level of accuracy routinely achieved in normal terrestrial photogrammetry using control, the accuracy achieved was certainly useful for many applications. Moreover, had more imagery been acquired and included to provide a stronger configuration, accuracies would certainly have been improved.


1st December 2011

Jim Chandler's homepage: http://www-staff.lboro.ac.uk/~cvjhc/index.htm

ISPRS working group V6, "Close range morphological measurement for the Earth sciences": http://isprsv6.lboro.ac.uk/

References

Chandler, J.H., Fryer, J.G. and Kniest, H.T., 2005. Non-invasive 3D recording of aboriginal rock art using cost effective digital photogrammetry, Rock Art Research, 22(2): 119-130.

Chandler, J.H., Bryan, P. and Fryer, J.G., 2007. The development and application of a simple rock-art recording methodology based on consumer grade digital cameras, The Photogrammetric Record, 22(117): 10-21.

Chandler, J.H. and Fryer, J.G., 2005. Recording aboriginal rock art using cheap digital cameras and digital photogrammetry, CIPA XX International Symposium, XX, International Cooperation to save the World's Cultural Heritage, Torino, pp. 193-8, ISSN 1682 1777.

Chandler, J.H. and Bryan, P., 2007. Cost-effective rock-art recording in a production environment: is there a wider message?, CIPA XXI International Symposium, AntCIPAting the future of the Cultural past, Athens, [CD-ROM], ISSN 1682 1777.

Fryer, J.G., Chandler, J.H. and Cooper, M.A.R., 1994. On the Accuracy of Heighting from Aerial Photographs and Maps: Implications to Process Modellers, Earth Surface Processes and Landforms, 19: 577-583, ISSN 0197-9337.