Publication Type
Journal Article
Version
acceptedVersion
Publication Date
9-2025
Abstract
Deploying Large Language Models (LLMs) on edge devices presents significant challenges due to computational constraints, memory limitations, inference speed, and energy consumption. Model quantization has emerged as a key technique to enable efficient LLM inference by reducing model size and computational overhead. In this study, we conduct a comprehensive analysis of 28 quantized LLMs from the Ollama library, which by default applies Post-Training Quantization (PTQ) and weight-only quantization, deployed on an edge device (Raspberry Pi 4 with 4GB RAM). We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Models are benchmarked on five standardized datasets (CommonsenseQA, BIG-Bench Hard, TruthfulQA, GSM8K, and HumanEval), and we employ a high-resolution, hardware-based energy measurement tool to capture real-world power consumption. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy across quantization settings, highlighting configurations that optimize LLM deployment for resource-constrained environments. By integrating hardware-level energy profiling with LLM benchmarking, this study provides actionable insights for sustainable AI, bridging a critical gap in existing research on energy-aware LLM deployment.
Discipline
Software Engineering
Research Areas
Cybersecurity; Software and Cyber-Physical Systems
Areas of Excellence
Sustainability
Publication
ACM Transactions on Internet of Things
First Page
1
Last Page
34
ISSN
2691-1914
Identifier
10.1145/3767742
Publisher
Association for Computing Machinery (ACM)
Citation
HUSOM, Erik Johanne; GOKNIL, Arda; ASTEKIN, Merve; SHAR, Lwin Khin; KASEN, Andre; SEN, Sagar; MITHASSEL, Benedikt Andreas; and SOYLU, Ahmet.
Sustainable LLM inference for edge AI: Evaluating quantized LLMs for energy efficiency, output accuracy, and inference latency. (2025). ACM Transactions on Internet of Things. 1-34.
Available at: https://ink.library.smu.edu.sg/sis_research/10489
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3767742