Publication Type
PhD Dissertation
Version
publishedVersion
Publication Date
7-2020
Abstract
Despite significant progress, most existing detectors are designed to detect objects in academic contexts and give little consideration to real-world scenarios. In real-world applications, the scale variance of objects can be significantly higher than in academic contexts. In addition, existing methods are designed to achieve localization with relatively low precision, whereas real-world scenarios demand more precise localization. Finally, existing methods are optimized with huge amounts of annotated data, but in certain real-world scenarios only a few samples are available. In this dissertation, we aim to explore novel techniques to address these research challenges and make object detection algorithms practical for real-world applications.
The first problem is scale-invariant detection. Detecting objects at multiple scales is covered in existing detection benchmarks. However, in real-world applications the scale variance of objects is extremely high and thus requires more discriminative features. Face detection is a suitable benchmark for evaluating scale-invariant detection because faces appear at vastly different scales. In this dissertation, we propose a novel framework, "Feature Agglomeration Networks" (FAN), to build a new single-stage face detector. A novel feature agglomeration block is proposed to enhance low-level feature representation, and the model is optimized in a hierarchical manner. FAN achieved state-of-the-art results on real-world face detection benchmarks with real-time inference speed.
The second problem is high-quality detection. This challenge requires detectors to predict more precise localization. In this dissertation, we propose two novel detection frameworks for high-quality detection: "Bidirectional Pyramid Networks" (BPN) and "KPNet". In BPN, a Bidirectional Feature Pyramid structure is proposed for robust feature representations, and a Cascade Anchor Refinement is proposed to gradually refine the quality of pre-designed anchors. To eliminate the initial anchor design step in BPN, KPNet is proposed, which automatically learns to optimize a dynamic set of high-quality keypoints without heuristic anchor design. Both BPN and KPNet show significant improvement over existing methods on the MSCOCO dataset, especially in high-quality detection settings.
The third problem is few-shot detection, where only a few training samples are available. Inspired by the principle of meta-learning methods, we propose two novel meta-learning based few-shot detectors: "Meta-RCNN" and "Meta Contrastive Detector" (MCD). Meta-RCNN learns a binary object detector in an episodic learning paradigm on the training data with a class-aware attention module, and it can be meta-optimized end to end. Building on Meta-RCNN, MCD follows the principle of contrastive learning to enhance the feature representation for few-shot detection, and a new hard negative sampling strategy is proposed to address the imbalance of training samples. We demonstrate the effectiveness of Meta-RCNN and MCD in few-shot detection on the Pascal VOC dataset and obtain promising results.
The proposed techniques address the problems discussed and show significant improvement in real-world utility.
Keywords
Deep Learning, Deep Convolutional Neural Networks, Object Detection
Degree Awarded
PhD in Information Systems
Discipline
Databases and Information Systems | Data Storage Systems
Supervisor(s)
HOI, Chu Hong
First Page
1
Last Page
195
Publisher
Singapore Management University
City or Country
Singapore
Citation
WU, Xiongwei. Deep learning for real-world object detection. (2020). 1-195.
Available at: https://ink.library.smu.edu.sg/etd_coll/300
Copyright Owner and License
Author
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.