
A Serverless Quantization-as-a-Service Model to Run Compression Jobs for Edge Intelligence

De Novi, Danny (first author): Software;
Dell'Acqua, Pierluigi (second author): Methodology;
Carnevale, Lorenzo: Conceptualization;
Fazio, Maria (penultimate author): Validation;
Villari, Massimo (last author): Supervision
2025-01-01

Abstract

Edge computing's rise demands efficient compression strategies for deploying Machine Learning (ML) models on resource-constrained devices. As Artificial Intelligence (AI) shifts from cloud to edge, optimizing models across heterogeneous layers is crucial. Quantization reduces numerical precision, shrinking model size and improving inference speed and energy efficiency, all key for edge deployments. However, its complexity limits accessibility. To address this, we propose Quantization-as-a-Service (QaaS), a serverless framework that automates model quantization for both cloud and edge environments. Built on OpenFaaS and Kubernetes, QaaS enables on-demand execution with dynamic resource orchestration, implementing Layer 5 of Edge Intelligence (EI). Our evaluation compares quantization performance on edge devices in terms of CPU usage and execution time when performed as a service versus locally. Results demonstrate that deploying quantization workflows using Function-as-a-Service (FaaS) not only maintains computational efficiency but also reduces CPU consumption compared to standalone execution, showcasing the potential of serverless solutions in EI.
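To illustrate the post-training quantization technique the abstract refers to, the following is a minimal sketch of affine int8 quantization in pure Python. This is a generic, illustrative example of the general scheme, not the paper's QaaS implementation; all function names here are hypothetical.

```python
def quantize_int8(weights):
    """Affine (asymmetric) quantization of float weights to int8.

    Maps the observed float range [min, max] onto the integer range
    [-128, 127] via a scale and zero-point, the standard post-training
    quantization scheme that trades precision for size and speed.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # avoid zero scale for constant tensors
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 codes."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, 0.0, 0.3, 1.2]
q, s, zp = quantize_int8(weights)
approx = dequantize(q, s, zp)
# each recovered value is within one quantization step (scale) of the original
```

Each 32-bit float weight is replaced by an 8-bit integer plus two shared constants (scale, zero-point), roughly a 4x size reduction; the reconstruction error is bounded by the quantization step, which is the trade-off the abstract summarizes.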
2025
English
2025 IEEE Symposium on Computers and Communications (ISCC)
Institute of Electrical and Electronics Engineers Inc.
New York
UNITED STATES OF AMERICA
ELECTRONIC
no
1
7
7
30th IEEE Symposium on Computers and Communications, ISCC 2025
ita
2025
International
Edge Intelligence; FaaS; Quantization; Serverless
no
none
De Novi, Danny; Dell'Acqua, Pierluigi; Carnevale, Lorenzo; Fazio, Maria; Villari, Massimo
5
14.d Contribution in Conference Proceedings::14.d.3 Full papers in conference proceedings
273
info:eu-repo/semantics/conferenceObject
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3353929
Warning: the data shown have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available