Publication Type
Working Paper
Version
publishedVersion
Publication Date
6-2010
Abstract
This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack of local disk storage on the compute nodes. This design limitation severely impedes the analysis of large data sets using cloud computing technologies.
Keywords
taxi fleet management, GPS data, cloud computing, Apache Hadoop
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Information Systems and Management
First Page
1
Last Page
11
Publisher
SMU School of Information Systems Technical Paper Series
City or Country
Singapore
Embargo Period
3-28-2022
Citation
KOH, Alvin Jun Yong; NGUYEN, Xuan Khoa; and WOODARD, C. Jason. Using Hadoop and Cassandra for taxi data analytics: A feasibility study. (2010). 1-11.
Available at: https://ink.library.smu.edu.sg/sis_research/7045
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.460.7828