A Serverless Quantization-as-a-Service Model to Run Compression Jobs for Edge Intelligence
De Novi, Danny (first author): Software
Dell'Acqua, Pierluigi (second author): Methodology
Carnevale, Lorenzo: Conceptualization
Fazio, Maria (penultimate author): Validation
Villari, Massimo (last author): Supervision
2025-01-01
Abstract
The rise of edge computing demands efficient compression strategies for deploying Machine Learning (ML) models on resource-constrained devices. As Artificial Intelligence (AI) shifts from the cloud to the edge, optimizing models across heterogeneous layers becomes crucial. Quantization reduces numerical precision, improving model size, inference speed, and energy efficiency, all key for edge deployments. However, its complexity limits accessibility. To address this, we propose Quantization-as-a-Service (QaaS), a serverless framework that automates model quantization for both cloud and edge environments. Built on OpenFaaS and Kubernetes, QaaS enables on-demand execution with dynamic resource orchestration, implementing Layer 5 of Edge Intelligence (EI). Our evaluation compares quantization performance on edge devices, in terms of CPU usage and execution time, when performed as a service versus locally. Results demonstrate that deploying quantization workflows using Function-as-a-Service (FaaS) not only maintains computational efficiency but also reduces CPU consumption compared to standalone execution, showcasing the potential of serverless solutions in EI.
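The abstract describes quantization as reducing numerical precision to shrink model size. The record does not include the paper's implementation, but the core numerical idea can be sketched as symmetric per-tensor int8 quantization; the function names below (`quantize_int8`, `dequantize`) are illustrative, not taken from the QaaS framework.

```python
# Illustrative sketch: symmetric per-tensor int8 quantization.
# Float weights are mapped to integers in [-127, 127] via a single
# scale factor, trading precision for a ~4x smaller representation.

def quantize_int8(weights):
    """Map a list of floats to int8 values plus a shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # one scale for the whole tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most one quantization step (`scale`), which is the precision/size trade-off the abstract refers to.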
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


