Publication Type
Journal Article
Version
acceptedVersion
Publication Date
9-2016
Abstract
In large-scale distributed file systems, efficient metadata operations are critical because most file operations must first interact with metadata servers. In existing distributed hash table (DHT) based metadata management systems, the lookup service can become a performance bottleneck due to its significant CPU overhead. Our investigations showed that the lookup service could reduce system throughput by up to 70% and increase system latency by a factor of up to 8 compared to ideal scenarios. In this paper, we present MetaFlow, a scalable metadata lookup service that uses software-defined networking (SDN) techniques to distribute the lookup workload over network components. MetaFlow tackles the lookup bottleneck by leveraging a B-tree, constructed over the physical topology, to manage flow tables for SDN-enabled switches. As a result, metadata requests can be forwarded to the appropriate servers using only switches. Extensive performance evaluations in both simulations and a testbed showed that MetaFlow increases system throughput by a factor of up to 3.2 and reduces system latency by a factor of up to 5 compared to DHT-based systems. We also deployed MetaFlow in a distributed file system and demonstrated significant performance improvement.
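The forwarding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the key-space size, the SHA-1 based key derivation, the even range split, and all names (`Node`, `build_tree`, `route`, `metadata_key`, the `mds-*` and `switch-*` labels) are assumptions made for illustration. Each simulated switch holds a sorted list of range boundaries, in the manner of B-tree separators, and forwards a request to the child whose range contains the metadata key, so a request reaches its server without consulting a separate lookup service.

```python
import hashlib
from bisect import bisect_right
from dataclasses import dataclass, field

KEY_SPACE = 2 ** 16  # assumed size of the metadata key space


@dataclass
class Node:
    """A switch (has children) or a metadata server (no children)."""
    name: str
    # Sorted lower bounds of the key range owned by each child,
    # playing the role of B-tree separators / flow-table match rules.
    bounds: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def next_hop(self, key):
        # Pick the child whose range [bound, next bound) contains the key.
        return self.children[bisect_right(self.bounds, key) - 1]


def metadata_key(path):
    """Hypothetical key derivation: first two bytes of SHA-1 of the path."""
    return int.from_bytes(hashlib.sha1(path.encode()).digest()[:2], "big")


def build_tree(servers, fanout):
    """Build the range tree bottom-up, grouping `fanout` nodes per switch."""
    step = KEY_SPACE // len(servers)  # assumed even split of the key space
    level = [(i * step, Node(s)) for i, s in enumerate(servers)]
    depth = 0
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            switch = Node(f"switch-{depth}-{i // fanout}",
                          bounds=[low for low, _ in group],
                          children=[node for _, node in group])
            parents.append((group[0][0], switch))
        level = parents
        depth += 1
    return level[0][1]


def route(root, key):
    """Return the names of the nodes a request for `key` passes through."""
    path, node = [root.name], root
    while node.children:
        node = node.next_hop(key)
        path.append(node.name)
    return path


root = build_tree(["mds-0", "mds-1", "mds-2", "mds-3"], fanout=2)
print(route(root, metadata_key("/home/user/file.txt")))
```

In this sketch the number of hops grows with the height of the tree rather than with the number of servers, which is the property that lets switches alone resolve where a request should go.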
Keywords
Metadata Management, Software-Defined Networking, B-tree, Big Data
Discipline
Databases and Information Systems | Data Storage Systems | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
IEEE Transactions on Big Data
Volume
4
Issue
2
First Page
203
Last Page
216
Identifier
10.1109/TBDATA.2016.2612241
Publisher
Institute of Electrical and Electronics Engineers
Citation
SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; and XIE, Haiyong.
MetaFlow: a scalable metadata lookup service for distributed file systems in data centers. (2016). IEEE Transactions on Big Data. 4, (2), 203-216.
Available at: https://ink.library.smu.edu.sg/sis_research/4767
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/TBDATA.2016.2612241
Included in
Databases and Information Systems Commons, Data Storage Systems Commons, Software Engineering Commons