Optimal Visual Representation Engineering and Learning for Computer Vision

Abstract

Estimating the optimal representation from sensor data has been one of the most challenging problems in computer vision research. Given a particular task, an optimal representation should contain exactly the information needed to answer queries related to the task. Specifically, such a representation should be a sufficient statistic of the data that is invariant to nuisance factors that affect the data but are irrelevant to the task. Among all sufficient statistics, we seek the minimal one, which costs the least in terms of complexity; in terms of invariance, we seek the maximal, so that nuisances do not affect inference at test time. In the first part of the dissertation, we show that it is possible to build such an optimal local descriptor, namely a minimal sufficient statistic of the data that is maximally invariant to certain nuisance variables, for the problem of establishing feature correspondence. Given only a single image, the nuisance group is quite restricted, since a single view does not afford the ability to distinguish the intrinsic properties of the scene from the extrinsic ones. This restriction is lifted once multiple views of the same underlying scene become available, and we propose a theoretical framework for computing an optimal multiple-view local representation with the domain deformations induced by viewpoint change marginalized out. In the second part, we investigate the nuisance-management ability of deep neural networks in the context of image classification and show that an explicit sampling-based marginalization technique can improve their performance significantly, in line with the principle developed in the first part. Finally, we build a real-time system that estimates a visual-inertial-semantic representation of the 3D scene from both imaging and inertial measurements; evidence from the two sensor modalities is causally aggregated into the final estimate in a Bayesian filtering framework. The geometric and semantic properties of the scene do not depend on the pose and motion of the camera and are persistent over time.
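The "explicit sampling-based marginalization" mentioned for the second part can be illustrated with a minimal sketch: instead of classifying a single input, one averages the classifier's posterior over inputs with nuisance transformations sampled and applied. Everything below is hypothetical for illustration, in particular the toy classifier and the choice of translation as the nuisance; the dissertation's actual networks and nuisance groups are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_classifier(x):
    # Stand-in for a trained network: maps a 1-D signal to a
    # softmax posterior over three hypothetical classes.
    logits = np.array([x.mean(), x.std(), -x.mean()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sample_nuisance(x):
    # Hypothetical nuisance sampler: a random circular translation,
    # standing in for domain deformations such as viewpoint change.
    return np.roll(x, rng.integers(-3, 4))

def marginalized_posterior(x, n_samples=32):
    # Monte-Carlo marginalization over the nuisance group:
    # average the per-sample posteriors rather than trusting
    # a single (nuisance-corrupted) observation.
    samples = [toy_classifier(sample_nuisance(x)) for _ in range(n_samples)]
    return np.mean(samples, axis=0)

x = rng.normal(size=16)
p = marginalized_posterior(x)
print(p)  # an averaged posterior over the three classes
```

Since each per-sample output is a proper distribution, their average is one as well, so the marginalized posterior can be used directly for inference at test time.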
