Development of the components of a low cost, distributed facial virtual conferencing system

Panago ., Soterios (2000) Development of the components of a low cost, distributed facial virtual conferencing system. Masters thesis, Rhodes University.




This thesis investigates the development of a low cost, component based facial virtual conferencing system. The design is decomposed into an encoding phase and a decoding phase, which communicate with each other via a network connection. The encoding phase is composed of three components: model acquisition (which handles avatar generation), pose estimation and expression analysis. Audio is not considered part of the encoding and decoding process, and as such is not evaluated. The model acquisition component is implemented using a visual hull reconstruction algorithm that is able to reconstruct real-world objects using only sets of images of the object as input. The object to be reconstructed is assumed to lie in a bounding volume of voxels. The reconstruction process involves the following stages: - Space carving for basic shape extraction; - Isosurface extraction to remove voxels not part of the surface of the reconstruction; - Mesh connection to generate a closed, connected polyhedral mesh; - Texture generation. Texturing is achieved by Gouraud shading the reconstruction with a vertex colour map; - Mesh decimation to simplify the object. The original algorithm has complexity O(n), but suffers from an inability to reconstruct concave surfaces that do not form part of the visual hull of the object. A novel extension to this algorithm based on Normalised Cross Correlation (NCC) is proposed to overcome this problem. An extension to speed up traditional NCC evaluations is proposed which reduces the NCC search space from a 2D search problem down to a single evaluation. Pose estimation and expression analysis are performed by tracking six fiducial points on the face of a subject. A tracking algorithm is developed that uses Normalised Cross Correlation to facilitate robust tracking that is invariant to changing lighting conditions, rotations and scaling. Pose estimation involves the recovery of the head position and orientation through the tracking of the triangle formed by the subject’s eyebrows and nose tip. A rule-based evaluation of points that are tracked around the subject’s mouth forms the basis of the expression analysis. A user assisted feedback loop and caching mechanism is used to overcome tracking errors due to fast motion or occlusions. The NCC tracker is shown to achieve a tracking performance of 10 fps when tracking the six fiducial points. The decoding phase is divided into 3 tasks, namely: avatar movement, expression generation and expression management. Avatar movement is implemented using the base VR system. Expression generation is facilitated using a Vertex Interpolation Deformation method. A weighting system is proposed for expression management. Its function is to gradually transform from one expression to the next. The use of the vertex interpolation method allows real-time deformations of the avatar representation, achieving 16 fps when applied to a model consisting of 7500 vertices. An Expression Parameter Lookup Table (EPLT) facilitates an independent mapping between the two phases. It defines a list of generic expressions that are known to the system and associates an Expression ID with each one. For each generic expression, it relates the expression analysis rules for any subject with the expression generation parameters for any avatar model. The result is that facial expression replication between any subject and avatar combination can be performed by transferring only the Expression ID from the encoder application to the decoder application. The ideas developed in the thesis are demonstrated in an implementation using the CoRgi Virtual Reality system. It is shown that the virtual-conferencing application based on this design requires only a bandwidth of 2 Kbps.

Item Type:Thesis (Masters)
Uncontrolled Keywords:Virtual reality, Computer conferencing, Three dimensional graphics, Reconstruction, Scene Analysis Tracking, Shape, Time-Varying Imagery, Videoconferencing, Coding and Information Theory
Subjects:Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:Faculty > Faculty of Science > Computer Science
ID Code:2295
Deposited By: Mrs Carol Perold
Deposited On:01 Dec 2011 09:24
Last Modified:06 Jan 2012 16:22
7 full-text download(s) since 01 Dec 2011 09:24
7 full-text download(s) in the past 12 months
More statistics...

Repository Staff Only: item control page