Sampling Algorithms to Handle Nuisances in Large-Scale Recognition

Abstract

Convolutional neural networks (CNNs) have risen to be the de facto paragon for detecting the presence of objects in a scene, as portrayed by an image. CNNs are described as being "approximately invariant" to nuisance transformations such as planar translation, both by virtue of their convolutional architecture and by virtue of their approximation properties which, given sufficient parameters and training data, could in principle yield discriminants that are insensitive to nuisance transformations of the data. The fact that contemporary deep convolutional architectures appear very effective at classifying images as containing a given object regardless of its position, scale, and aspect ratio in large-scale benchmarks suggests that the network can effectively manage such nuisance variability. We conduct an empirical study and show that, contrary to popular belief, at the current level of complexity of convolutional architectures and at the scale of the data sets used to train them, CNNs are not very effective at marginalizing nuisance variability. This finding leaves researchers with a choice: invest more effort in designing models that are less sensitive to nuisances, or design better region-proposal algorithms that predict where the objects of interest lie and center the model on those regions.

In this thesis we take steps in both directions. First, we introduce DSP-CNN, which deploys domain-size pooling to make the network scale invariant at the level of the convolutional operator. Second, motivated by our empirical analysis, we propose novel sampling and pruning techniques for region-proposal schemes that improve end-to-end performance in large-scale classification, detection, and wide-baseline correspondence to state-of-the-art levels. Additionally, since a proposal algorithm involves the design of a classifier whose results are fed to another classifier, it seems natural to leverage the latter to design the former. Thus, we introduce a method that leverages filters learned in the lower layers of CNNs to design a binary boosting classifier for generating class-agnostic proposals. Finally, we extend sampling over time by designing a temporal hard-attention layer, trained with reinforcement learning, with application to person re-identification in video sequences.
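To give a rough sense of the idea behind domain-size pooling at the convolutional-operator level, the sketch below averages the responses of a single learned filter bank over several domain sizes, implemented here by resampling the input. This is a minimal, hypothetical PyTorch sketch, not the thesis's actual implementation; the module name DSPConv2d, the domain_scales parameter, and the bilinear-resampling choice are all illustrative assumptions.

    # Hypothetical sketch: domain-size pooling applied to one conv layer.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DSPConv2d(nn.Module):
        """Average the responses of one learned filter bank over several
        domain sizes, approximating scale invariance at the level of the
        convolutional operator (names and defaults are illustrative)."""

        def __init__(self, in_channels, out_channels, kernel_size=3,
                     domain_scales=(1.0, 1.5, 2.0)):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                                  padding=kernel_size // 2)
            self.domain_scales = domain_scales

        def forward(self, x):
            h, w = x.shape[-2:]
            responses = []
            for s in self.domain_scales:
                # Resample the input so the fixed kernel covers a domain
                # of size s relative to the original image ...
                xs = F.interpolate(x, scale_factor=1.0 / s,
                                   mode='bilinear', align_corners=False)
                # ... and map the response back to a common resolution.
                r = F.interpolate(self.conv(xs), size=(h, w),
                                  mode='bilinear', align_corners=False)
                responses.append(r)
            # Domain-size pooling: average across domain sizes rather
            # than selecting a single best scale.
            return torch.stack(responses, dim=0).mean(dim=0)

    # Usage: a drop-in replacement for a standard nn.Conv2d layer.
    layer = DSPConv2d(in_channels=3, out_channels=16)
    y = layer(torch.randn(1, 3, 64, 64))   # -> shape (1, 16, 64, 64)

Averaging responses across domain sizes, rather than picking one best scale, is what distinguishes pooling over domain sizes from conventional scale selection; how the thesis realizes this inside the network may differ from the resampling shortcut used above.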



Similar books and articles

Why Build a Virtual Brain? Large-Scale Neural Simulations as Jump Start for Cognitive Computing. Matteo Colombo - 2016 - Journal of Experimental and Theoretical Artificial Intelligence.
Why Build a Virtual Brain? Large-Scale Neural Simulations as Test-bed for Artificial Computing Systems. Matteo Colombo - 2015 - In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings & P. P. Maglio (eds.), Proceedings of the 37th Annual Conference of the Cognitive Science Society. Cognitive Science Society. pp. 429-434.
Explaining Large-Scale Historical Change. Daniel Little - 2000 - Philosophy of the Social Sciences 30 (1):89-112.
Nonrobustness in One-Sample Z and t Tests: A Large-Scale Sampling Study. James V. Bradley - 1980 - Bulletin of the Psychonomic Society 15 (1):29-32.
Quantum Computing. Amit Hagar & Michael Cuffaro - 2019 - Stanford Encyclopedia of Philosophy.
