Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

5-2024

Abstract

Cyber-physical systems (CPS) and applications have fundamentally changed the way people and processes interact with the physical world, ushering in the fourth industrial revolution. Supported by a variety of sensors, hardware platforms, artificial intelligence and machine learning models, and systems frameworks, CPS applications aim to automate and ease the burden of repetitive, laborious, or unsafe tasks borne by humans. Machine visual perception, encompassing tasks such as object detection, object tracking, and activity analysis, is a key technical enabler of such CPS applications. Efficient execution of such machine perception tasks on resource-constrained edge devices, especially ensuring both high fidelity and high processing throughput, remains a formidable challenge. This is due to the continuing increase in the resolution of sensor streams (e.g., video input streams generated by 4K/8K cameras and high-volume event streams generated by emerging neuromorphic event cameras) and to the computational complexity of the Deep Neural Network (DNN) models that underpin such perception capabilities, which together overwhelm edge platforms and degrade machine perception efficiency. The challenge is even more severe when a perception pipeline operating on a single edge device must process multiple concurrent video streams for accurate sense-making of the physical world. Given the insufficiency of available computational resources, a natural question arises: can parts of the perception task be prioritized (and executed preferentially) to achieve the highest task fidelity while adhering to the resource budget? This thesis introduces the paradigm of Canvas-based Processing and Criticality Awareness to tackle the challenge of multi-sensor machine perception pipelines on resource-constrained platforms. The proposed paradigm guides perception pipelines and systems on "what" to pay attention to in the sensing field, and "when", across multiple camera streams, thereby significantly increasing both perception fidelity under computational constraints and achievable system throughput on a single edge device. By decoupling stimuli/regions of interest, both spatially and temporally, from their original video streams, such a perception pipeline can "pick and choose" which stimuli to prioritize for preferential DNN inference over time, thereby reducing the total computational load. The thesis explores how such prioritized and selective processing, across multiple RGB and event sensor streams, must be designed to support both non-streaming and streaming perception tasks. Through multiple strategies for fine-tuning such a perception pipeline to real-world deployment characteristics, such as bandwidth-constrained wireless networks, variable workloads at the edge, and spatial overlap between cameras, this thesis demonstrates that it is possible to achieve multiplicative gains in processing throughput, with no cost to DNN task accuracy, across multiple concurrent RGB and event camera streams at the resource-constrained edge. The proposed techniques are especially applicable to real-time multi-sensor machine perception tasks such as drone-based surveillance and multi-camera traffic analysis.
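For concreteness, the sketch below illustrates one plausible reading of the canvas-based idea summarized above: regions of interest (ROIs) extracted from several concurrent camera streams are ranked by a criticality score and packed onto a single fixed-size canvas, so that one batched DNN inference pass covers the most critical stimuli across all streams. All names (`score_criticality`, `pack_canvas`) and the simple shelf-packing heuristic are hypothetical stand-ins for illustration only, not the dissertation's actual algorithms.

```python
# Illustrative sketch of canvas-based, criticality-aware ROI packing.
# The criticality score and shelf-packing heuristic are assumptions,
# not the scheduling strategies developed in the thesis.
import numpy as np

def score_criticality(roi):
    """Toy criticality score: larger ROIs rank higher. The thesis
    uses richer, task-aware notions of criticality."""
    h, w = roi.shape[:2]
    return h * w

def pack_canvas(rois, canvas_hw=(640, 640)):
    """Shelf-pack the most critical ROIs onto one canvas so a single
    DNN inference covers stimuli drawn from many camera streams."""
    canvas = np.zeros((*canvas_hw, 3), dtype=np.uint8)
    placements = []          # (stream_id, x, y, w, h) to map results back
    x = y = shelf_h = 0
    # Most critical ROIs are placed first; leftovers wait for a later canvas.
    for stream_id, roi in sorted(rois, key=lambda r: -score_criticality(r[1])):
        h, w = roi.shape[:2]
        if x + w > canvas_hw[1]:      # current shelf full: start a new one
            x, y = 0, y + shelf_h
            shelf_h = 0
        if y + h > canvas_hw[0]:      # canvas full: defer remaining ROIs
            break
        canvas[y:y + h, x:x + w] = roi
        placements.append((stream_id, x, y, w, h))
        x += w
        shelf_h = max(shelf_h, h)
    return canvas, placements

# Usage: ROIs from three concurrent streams share one inference-sized canvas.
rng = np.random.default_rng(0)
rois = [(s, rng.integers(0, 255, size=(rng.integers(40, 160),
                                       rng.integers(40, 160), 3),
                         dtype=np.uint8))
        for s in range(3) for _ in range(4)]
canvas, placements = pack_canvas(rois)
print(f"packed {len(placements)} ROIs onto one {canvas.shape[:2]} canvas")
```

After inference on the packed canvas, the recorded placements would let detections be translated back to each originating stream's coordinate frame; how ROIs are selected, deferred, and revisited over time is precisely the criticality-aware scheduling question the thesis addresses.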

Keywords

Visual Machine Perception, Edge Computing, Canvas-based Processing, Criticality-Aware Processing, Edge Optimization, RGB Cameras, Neuromorphic Cameras, Event Cameras, Canvas-based Scheduling, Multi-Camera Systems

Degree Awarded

PhD in Computer Science

Discipline

Computer Sciences

Supervisor(s)

MISRA, Archan

First Page

1

Last Page

204

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author
