CLOSER 2026 Abstracts

Area 1 - Cloud AI

Full Papers

Paper Nr:	34
Title:	Adaptive Compression of Clinical Signals: A Hybrid TinyML-Federated Learning Approach in Edge-Fog Environments
Authors:	João Medeiros Dinis, Jean Schmith, Diego Luis Kreutz, Guilherme Galante, Dalvan Griebler and Rodrigo da Rosa Righi
Abstract:	Continuous monitoring of vital signs with wearable devices (IoT) promises to reduce costs and extend remote care but faces network, hardware, and privacy constraints. Current approaches adopt fixed compression or cloud-centric pipelines and, while useful, degrade the rate–fidelity trade-off or require the transit of sensitive data. In view of this, this work proposes the COLA – Compressive Layers model, an architecture that integrates Edge and Fog layers to optimize clinical data flows while ensuring data privacy. The model introduces a novel per-window adaptive compression with TinyML at the edge (lightweight autoencoder + hysteresis-controlled quantization + zlib) and coordination via Federated Learning in the Fog, maintaining native HL7 FHIR interoperability. The pipeline publishes telemetry and compressed payloads via MQTT and materializes resources in the HAPI FHIR standard. Finally, the proposed objective was achieved: adaptive compression with TinyAE+zlib at the edge, interoperable and auditable, which increases the effective Compression Ratio (CR_effective) to 16.67× while maintaining signal quality and respecting the computational budget with a relative cost of 1.08 ms/CR (processing time per compression unit). The Edge→Fog→FHIR design proved suitable for continuous operation and open to federated evolution.
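As a rough illustration of the quantize-then-zlib stage and of how a compression ratio can be computed, the sketch below compresses one synthetic window. The function name, the window length, and the 8-bit uniform quantizer are assumptions made for illustration and are not taken from the paper; the autoencoder and the hysteresis control are omitted.

```python
import math
import struct
import zlib


def compression_ratio(window, levels=256):
    """Quantize one window uniformly, zlib-compress it, return raw/compressed size."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat window
    # Map each sample to an integer code in [0, levels - 1] (one byte per sample).
    codes = bytes(round((x - lo) / span * (levels - 1)) for x in window)
    raw = struct.pack(f"{len(window)}f", *window)  # float32 baseline
    # Simplification: the few bytes needed to transmit lo and span are ignored.
    return len(raw) / len(zlib.compress(codes, 9))


# Hypothetical window: 1024 samples of a slow sine wave.
window = [math.sin(2 * math.pi * i / 256) for i in range(1024)]
cr = compression_ratio(window)
```

The ratio obtained this way depends entirely on the signal and the quantizer, so it says nothing about the 16.67× figure reported in the abstract.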

Paper Nr:	43
Title:	MoE-Pipe: A Pipelined MoE Model Loading Framework for Reducing the Cold Start Delays in Serverless Inference
Authors:	Zhan-Wei Wu, Chih-Tai Tsai, Sao-Hsuan Lin, Yi-Syuan Ke and Jerry Chou
Abstract:	The Mixture-of-Experts (MoE) architecture enables the scaling of Large Language Models (LLMs) with high computational efficiency by sparsely activating a fraction of the model’s weights during inference. However, the substantial memory footprint of MoE models presents a critical challenge in serverless environments, leading to prohibitive cold-start latencies due to the extensive time required for model loading. To mitigate this bottleneck, we propose MoE-Pipe, a framework that significantly reduces cold-start latency by leveraging the inherent sparse activation property of MoE models. MoE-Pipe introduces a methodology to overlap the loading of expert weights with the ongoing model computation. Instead of a sequential load-then-compute process, our approach initiates inference and weight transfer simultaneously, orchestrated through a multi-level strategy of CPU-GPU, inter-GPU, and intra-GPU overlapping. Our experiments demonstrate that MoE-Pipe successfully hides approximately 65% of the loading time within the computation phase, achieving up to 1.85× improvement in end-to-end latency and 2× improvement in time-to-first-token latency. To facilitate further research and reproducibility, the source code of MoE-Pipe is publicly available at https://github.com/johnson684/MoE-Pipe.

Short Papers

Paper Nr:	41
Title:	HybridServe: Stall-Free Distributed Disaggregated LLM Inference with Hybrid KVCache Buffering
Authors:	Yi-Syuan Ke, Zhan-Wei Wu, Chih-Tai Tsai, Sao-Hsuan Lin and Jerry Chou
Abstract:	Disaggregated LLM inference, which separates compute-bound prefill and memory-bound decoding, is essential for maximizing cloud resource efficiency and ensuring responsive services. However, in distributed cloud environments, this paradigm suffers from critical system-level inhibitors. We analyze the dominant communication patterns, Push-mode and Pull-mode, and identify two fundamental bottlenecks: saturated VRAM buffers that stall prefill computation, leading to resource underutilization, and significant cross-node communication overhead. We propose HybridServe, a novel system featuring a hybrid KVCache buffering strategy. HybridServe utilizes the decode worker’s high-capacity CPU memory as an elastic secondary buffer to eliminate stalls and leverages decode-side memory locality to minimize inter-node latency. Our evaluation demonstrates that HybridServe effectively mitigates buffer constraints, achieving up to 1.76× TTFT speedup and 1.57× throughput improvement over the baseline disaggregated push/pull modes.

Paper Nr:	42
Title:	DARO: An Auction-Based Multi-Agent Reinforcement Learning Framework for Task Scheduling in the Cloud Continuum
Authors:	Syed Mafooq Ul Hassan, Marios Touloupou, Jacopo Castellini, Pablo Strasser, Alexandros Kalousis and Herodotos Herodotou
Abstract:	The increasing heterogeneity and dynamism of cloud–edge-IoT infrastructures demand scalable, intelligent scheduling strategies that go beyond traditional centralized approaches. This paper presents DARO, a distributed, asynchronous scheduling framework that leverages multi-agent reinforcement learning (MARL) to enable autonomous, cooperative task allocation across the cloud continuum. DARO integrates natively with Kubernetes, introducing a decentralized, auction-based mechanism in which node-local agents submit bids for incoming tasks based on partial observations and a learned policy. The agents are trained using a value decomposition method (QMIX) under the centralized training-decentralized execution paradigm. We formalize the problem as a decentralized partially observable Markov decision process (Dec-POMDP), design a multi-factor reward function to guide learning, and implement the system within a high-fidelity Kubernetes simulator. The experimental evaluation demonstrates that the agents effectively learn balanced and resource-efficient task placement strategies. DARO demonstrates strong potential to serve as a robust scheduling layer for dynamic and large-scale distributed environments.

Paper Nr:	44
Title:	NIKA: Optimal KV Cache Transfer for Minimizing the Latency of Disaggregated LLM Inference
Authors:	Chih Tai Tsai, Zhan-Wei Wu, Yi-Syuan Ke, Sao-Hsuan Lin and Jerry Chou
Abstract:	In disaggregated Large Language Model (LLM) inference systems, the transfer of the KV cache between the prefill and decode stages often becomes a significant latency bottleneck, increasing the Time to First Token (TTFT). In cloud environments, where inference components are often distributed across nodes to optimize resource costs, this issue is further exacerbated by limited inter-node network bandwidth. This paper proposes an optimal KV cache transfer method that mitigates this bottleneck by utilizing the residual capacity of the decode GPU to minimize TTFT without sacrificing throughput. We introduce an analytical performance model to determine the optimal ratio of KV cache to transfer versus recompute, based on profiled compute resources and network conditions. Experimental results demonstrate that our approach achieves a reduction of up to 54% in TTFT. Importantly, these latency gains are achieved without compromising system throughput, ensuring high resource utilization in disaggregated inference environments.
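One simplified way to picture the transfer-versus-recompute trade-off: if a fraction r of the KV cache is sent over the network while the remaining 1 - r is recomputed on the decode GPU in parallel, the slower of the two sets the delay, and the delay is smallest where both finish together. This is an illustrative assumption and not the paper's analytical model; every name and number below is hypothetical.

```python
def optimal_transfer_fraction(full_transfer_s, full_recompute_s):
    """Fraction r of the KV cache to transfer so that the transfer
    (r * full_transfer_s) and the parallel recompute
    ((1 - r) * full_recompute_s) finish at the same time."""
    return full_recompute_s / (full_transfer_s + full_recompute_s)


def delay(r, full_transfer_s, full_recompute_s):
    # Both activities run concurrently; the slower one sets the delay.
    return max(r * full_transfer_s, (1 - r) * full_recompute_s)


# Hypothetical profile: 0.8 s to transfer everything, 1.2 s to recompute everything.
r = optimal_transfer_fraction(0.8, 1.2)   # 0.6
best = delay(r, 0.8, 1.2)                 # about 0.48 s, below either extreme
```

Under these assumed numbers the balanced split beats both transferring everything (0.8 s) and recomputing everything (1.2 s).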

Paper Nr:	12
Title:	HERS: A Gender-Specific Logistic Regression Model for Stress Recognition Using Vital Signs
Authors:	Eduarda Pinheiro, Fausto Neri da Silva Vanin, Cristiano André da Costa and Rodrigo da Rosa Righi
Abstract:	The analysis of health-related data enables continuous progress toward new discoveries in this field. Consequently, there arises a need to understand the particularities of each gender in order to avoid misleading generalizations. Previous studies have aimed to optimize the collection of vital signs, especially in hospital settings, and to develop low-cost devices that could be employed in contexts where such a need existed. Other research has sought to classify stress and predict the vital signs of the population, but these studies adopted a more generalist approach, without ensuring that gender-specific characteristics were taken into account. In this regard, the present work introduces the HERS model, which stands out from others by focusing on the analysis of vital signs for predicting stress in the female population. It also compares the correlation between men’s and women’s vital signs, with the aim of highlighting possible differences between the sexes. The model’s results proved encouraging in this respect and reinforce the importance of deepening the understanding of gender-specific particularities. In addition, the model was able to satisfactorily predict cases of high stress, with a classification error rate of 3.6%. Body temperature was identified as the most influential vital sign, contributing negatively to the probability of high stress in women.

Paper Nr:	49
Title:	Predictive Modeling of Cloud Resources Using EHO-ANN Based Ensemble Technique
Authors:	Manik Chandra Pandey and Pradeep Singh Rawat
Abstract:	Efficient Virtual Machine (VM) provisioning in a cloud computing environment requires accurate resource prediction to optimize performance and minimize costs. This study presents an ensemble approach that integrates elephant herding optimization (EHO) with artificial neural networks (ANN) to improve the predictive accuracy for CPU, memory, and network bandwidth utilization. The proposed EHO-ANN ensemble model for the prediction of cloud resources has been evaluated using the Google cluster data set. The experimental results of the proposed EHO-ANN model are compared against conventional ensemble approaches (standard ANN, GA-ANN, PSO-ANN, STO-ANN and SHO-ANN). The comparison is performed for CPU utilization, memory consumption, network bandwidth, SLA violation and total energy usage using the performance metrics MAE, MSE, RMSE and MAPE. The results show that EHO-ANN outperforms conventional hybrid models in both accuracy and computational efficiency, and that it improves resource provisioning strategies in the cloud computing environment.

Area 2 - Cloud Computing Paradigms and Practices

Full Papers

Paper Nr:	31
Title:	Learning-Based Optimization of IoT Data Replication in Cloud Environments
Authors:	Younes Jahandideh, Ziad Kobti and Ning Zhang
Abstract:	The rapid growth of the Internet of Things (IoT) has introduced significant challenges in managing large volumes of data across cloud infrastructures. While data replication improves availability and reduces access latency, determining optimal replica placement is an NP-hard problem due to dynamic workloads, network heterogeneity, and resource constraints. Traditional metaheuristic approaches, such as the Harmony Search Algorithm (HSA), provide partial solutions but often lack adaptability and scalability. Although Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have been applied to cloud computing tasks, their adoption for dynamic, data-aware replication remains limited. This paper presents a learning-based framework that applies RL and DRL to optimize data replication across distributed mini-clouds by learning adaptive placement strategies through environmental interaction. The framework incorporates system-specific parameters, including gateway delays, access frequency, and cloud capacities, to support adaptive decision-making. Q-Learning is used for smaller-scale environments, while Deep Q-Networks (DQNs) address larger and more complex scenarios via neural function approximation. Simulation results demonstrate that RL and DRL outperform traditional algorithms, including HSA, Forest Optimization Algorithm (FOA), Genetic Algorithm (GA), and Random Algorithm (RA), across cost efficiency, response time, energy consumption, and user satisfaction, confirming the effectiveness of learning-based replica placement for scalable IoT-cloud infrastructures.

Paper Nr:	40
Title:	SATORU: Proactive Length-Aware Scheduling for High-Throughput Batch LLM Serving
Authors:	Gideon Levi and Jerry Chou
Abstract:	Large language models (LLMs) are increasingly used for throughput-oriented workloads such as bulk summarization and translation, where a cluster of GPU instances must make scheduling decisions within each instance and routing decisions across instances. Current serving systems make these decisions reactively based on instantaneous signals such as free memory or queue length. As a result, they often respond only after memory pressure has already built up, leading to request preemption and wasted decode tokens; they can also imbalance work across instances and miss prefix-reuse opportunities. We present SATORU, a proactive length-aware framework that predicts request output lengths to estimate future memory growth and remaining decoding work to maximize throughput. Within an instance, SATORU performs a lightweight lookahead simulation of the key-value (KV) cache, the per-request memory state maintained during generation, to reduce avoidable preemption and wasted decode tokens. Across instances, SATORU routes requests by predicted remaining work while co-locating requests that share prompt prefixes to improve reuse. Built on a vLLM-based stack, SATORU improves single-instance net token throughput by 13.5% on average (up to 26.9%) and achieves 3.3× average cluster-wide net token throughput (up to 4.3×) over standard baselines, with under 6% scheduling overhead.

Short Papers

Paper Nr:	51
Title:	GPU-Aware Scheduling and Dynamic Resource Orchestration in Heterogeneous Edge-Cloud Environments
Authors:	Angelo Marchese, Damiano Samperi and Orazio Tomarchio
Abstract:	The expansion of cloud services and the increasing demand for hardware-accelerated applications have led to a significant surge in GPU resource requirements. To reduce cloud latency and bandwidth consumption, offloading these workloads to the network edge is essential. However, orchestrating GPU-intensive tasks in heterogeneous edge environments remains a significant challenge, as standard container orchestrators like Kubernetes exhibit a native blindness toward hardware accelerators. In this paper, we propose a custom scheduler that extends the default Kubernetes scheduler behavior through different plugins designed for heterogeneous environments. By integrating GPU status and architectural awareness, our scheduler optimizes resource utilization, as demonstrated through experimental evaluation with a microservices-based workload.

Paper Nr:	52
Title:	TrustEdge: Failure-Aware Orchestration for Edge Application Provisioning
Authors:	Marcos Paulo Konzen, Paulo Silas Severo de Souza, Fábio Diniz Rossi and Júlio Carlos Balzano de Mattos
Abstract:	Edge computing enables ultra-low-latency services, yet edge servers fail far more often than data center hosts, making reliable application provisioning a major challenge. Existing approaches treat failure prediction, service placement, and provisioning efficiency as independent problems, limiting simultaneous gains in availability and SLA compliance. This paper presents TrustEdge, a failure-aware orchestration architecture that tightly couples conditional Weibull-based failure prediction with multi-horizon triggers, a hierarchical multi-criteria placement policy, and failure-aware peer-to-peer container-layer provisioning within a unified orchestration loop. Experiments with Weibull-based failure traces and real Docker Hub images show that TrustEdge reduces perceived downtime by up to 66.9% and SLA violations by up to 93.7% over Kubernetes baselines. Results further show that prediction alone reduces downtime by 41.8% but yields only 1.9% SLA improvement, confirming that predictive knowledge must be embedded in reliability-aware orchestration to translate forecast precision into operational QoS gains.
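For readers unfamiliar with conditional Weibull prediction, the standard quantity is the probability that a server fails within a horizon h given that it has survived to age t, that is 1 - S(t + h) / S(t) with S(t) = exp(-(t / scale)^shape). The sketch below evaluates it for several horizons; the parameter values and horizons are hypothetical and are not taken from the paper.

```python
import math


def conditional_failure_prob(age, horizon, shape, scale):
    """P(failure in (age, age + horizon] | survived to age) for a Weibull lifetime."""
    def survival(t):
        return math.exp(-((t / scale) ** shape))
    return 1.0 - survival(age + horizon) / survival(age)


# Hypothetical server: shape > 1 models wear-out, so risk grows with age.
horizons = (1.0, 6.0, 24.0)  # hours; illustrative multi-horizon triggers
young = [conditional_failure_prob(10.0, h, shape=1.5, scale=500.0) for h in horizons]
old = [conditional_failure_prob(400.0, h, shape=1.5, scale=500.0) for h in horizons]
```

With shape equal to 1 the distribution is memoryless and the result no longer depends on age, which is a quick sanity check for any implementation.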

Paper Nr:	18
Title:	A Survey on Cloud’s Comparative Impacts on Large Enterprises and SMEs in the US: Opportunities, Challenges, and Impacts
Authors:	Berkay Kaplan
Abstract:	The cloud is transformative, but it has different effects across businesses. For instance, the IT department’s status and goals can differ dramatically across business sizes, from SMEs to large companies. Previous research separately examined these effects in SMEs and large companies, but, to the best of our knowledge, it left a critical gap in comparative analysis between the two. This paper fills that gap and provides a structured, side-by-side comparative analysis of cloud adoption in SMEs and large enterprises in the US, identifying where their strategic motivations, constraints, and outcomes diverge by synthesizing industry reports, including the 2023-2024 Flexera “State of the Cloud” reports and the 2024 Cloud Industry Forum research, with established academic literature. This paper revealed that all companies share concerns about costs and security, but their strategic realities diverge sharply by size. For SMEs, the cloud is a great equalizer, providing access to the same advanced technologies as their larger competitors, but unpredictable costs and excessive reliance on the CEO’s decisions are issues they encounter. Large enterprises can prefer private cloud but struggle with migrating large legacy systems, managing complex multi-cloud environments, and needing highly specialized personnel.

Paper Nr:	47
Title:	A Data-Driven Approach to Creating Cloudlet Datasets for Cloud Task Scheduling
Authors:	Ons Mejouli, Andrea Kő and Tibor Kovács
Abstract:	Efficient task scheduling and resource allocation remain a central challenge in cloud computing, particularly as workloads continue to grow in scale, heterogeneity, and operational complexity. Simulation frameworks such as CloudSim are widely used to study scheduling and resource management approaches. However, existing approaches often rely either on purely synthetic workloads with limited realism or on real-world workload traces that are proprietary, outdated, or difficult to reuse and adapt. This paper addresses this limitation by proposing a methodology for constructing synthetic cloudlet datasets for CloudSim, based on workload models derived from real-world execution traces collected via Microsoft Fabric within a commercial cloud environment. The collected traces capture actual workloads and are used to extract key workload characteristics rather than being replayed directly. The resulting synthetic datasets model a combination of recurring and one-off tasks, reflecting operational patterns commonly observed in practice, where scheduling and resource allocation decisions must balance performance, utilization, and downstream objectives such as energy efficiency. Given the difficulty of accurately estimating resource requirements prior to execution, particular emphasis is placed on recurring tasks whose resource usage can be observed across executions and used to parameterize synthetic workload generation. We argue that this approach enables more realistic, reproducible, and practically relevant evaluation of scheduling and resource allocation approaches in cloud environments. The main contributions of this position paper are twofold: (i) a framework for creating realistic synthetic cloudlet datasets for CloudSim derived from Microsoft Fabric execution traces, and (ii) the formulation of empirically grounded design requirements for constructing reproducible and commercially relevant cloudlet datasets.

Paper Nr:	60
Title:	Performance Benchmarking for Data Stream Processing on Constrained Edge Devices
Authors:	Margherita Codemo and Claus Pahl
Abstract:	The adoption of Internet of Things (IoT) systems has created a demand for real-time data processing closer to data sources. Edge computing addresses latency and bandwidth limitations of cloud-centric architectures, but constrained hardware raises questions about the feasibility of deploying distributed streaming platforms at the edge. This investigation explores the performance limits of a containerized Apache Kafka–based data streaming architecture deployed on a Raspberry Pi and compares it with a standard x86 desktop environment. A hypothesis-driven experimental evaluation measured a range of concerns: throughput, latency percentiles, CPU utilization, classification accuracy, and system stability under increasing load and stress conditions. Results show that the Raspberry Pi can sustain stable real-time processing up to defined throughput thresholds. Kafka itself does not constitute the primary bottleneck; instead, CPU saturation and data processing limitations define system limits. We demonstrate the limits of lightweight streaming architectures on constrained edge devices and provide practical benchmarking data for real-time IoT deployment planning.

Area 3 - Cloud Security and Privacy

Full Papers

Paper Nr:	14
Title:	Orama++: Extending Serverless Benchmarking with Tiobe and Halstead Metrics for Improved Performance Prediction
Authors:	Leonardo Rebouças de Carvalho, Geraldo Pereira Rocha Filho and Aleteia Araujo
Abstract:	The unpredictability of execution time and cost in Function-as-a-Service (FaaS) environments remains a critical challenge for serverless development. This paper introduces Orama++, an evolution of the Orama benchmarking framework that incorporates a robust Machine Learning-based performance prediction system, focusing on forecasting execution time and operational costs. The study investigates the impact of different static code complexity metrics (Tiobe, Halstead, and their combinations) on the predictive power of neural network models. The effectiveness of Fully Connected Neural Networks (Dense), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BLSTM) architectures was evaluated. The findings demonstrate that, across all evaluated scenarios, BLSTM models consistently outperformed both the Dense and LSTM architectures. In terms of accuracy, measured by the coefficient of determination (R²), the notable results obtained with the BLSTM model were: the input scenario with Tiobe metrics achieved an average R² of 86%; the combined scenario reached 91%; and the scenario using Halstead metrics demonstrated the best accuracy, with 92%. Due to its superior prediction quality, the BLSTM model trained with Halstead metrics was selected and designated as the new core performance predictor for the Orama++ framework. These results validate Orama++ as an essential tool for resource and cost predictability, enabling informed decisions regarding FaaS function optimization and deployment.

Short Papers

Paper Nr:	13
Title:	The Emergence of the Self-Governing Cloud: A Synthesis of Generative AI, Autonomous Agents, and Zero-Trust Architectures
Authors:	Shreeyash Patil, Kapil Tajane and Rahul Pitale
Abstract:	Modern cloud infrastructure has grown too complex for traditional automation paradigms. This position paper argues that the industry is on the cusp of a paradigm shift toward the “Self-Governing Cloud,” driven by three synergistic pillars: Generative AI for intent-to-code synthesis, autonomous multi-agent systems for real-time operational management, and Zero-Trust Architecture as the foundational security fabric. Synthesizing literature across Infrastructure as Code (IaC) security, AI-driven cloud operations, and Zero-Trust enforcement, we propose a novel four-plane architectural framework for autonomous cloud governance. We outline the key grand challenges, including agent alignment and algorithmic explainability, and define a concrete research agenda for the next generation of cloud ecosystems that are self-creating, self-governing, and self-securing by design.

Paper Nr:	27
Title:	Self-Hosted High-Availability IAM System: A Reproducible Architecture
Authors:	Shahadat Hossain, Nuno G. Rodrigues and Rui P. Lopes
Abstract:	Enterprise identity and access management (IAM) systems require high availability (HA) to prevent authentication failures that compromise security and disrupt operations. Organizations face critical deployment challenges: cloud-managed IAM services incur $500–2000/month costs with elevated latencies, while vendor-specific HA solutions create lock-in that is unsuitable for regulatory compliance and multi-cloud strategies. Existing self-hosted approaches lack validated HA architectures with automated failover and empirical performance validation. This paper presents a fully open-source, self-hosted HA architecture for Keycloak, integrating HAProxy load balancing, PostgreSQL with Patroni for automated failover, and comprehensive observability through Prometheus, Grafana, and Loki. Rigorous validation using the official Keycloak Gatling benchmark (n=10 trials) demonstrates sustained throughput of 180 req/s with sub-100ms latency and 0% error rate, scaling to 362 req/s at saturation. Production-realistic failover testing under sustained load (n=3 trials, 181,500 requests) confirms 0% error rate with sub-40ms mean latency and sub-30s recovery with zero data loss. Complete infrastructure-as-code artifacts (Ansible playbooks, Docker Compose files, and Grafana dashboards) enable reproducible one-command deployment on commodity hardware in under 10 minutes, achieving 61% cost reduction versus AWS-managed equivalents. The validated architecture addresses a critical gap in production-grade Keycloak deployment guidance for mid-scale enterprises requiring infrastructure autonomy, regulatory compliance, or cost optimization.

Paper Nr:	30
Title:	A Multi-Metric Scoring Framework for Resilient Microservice Orchestration in Kubernetes
Authors:	Sahraoui Zakaria, Khlidj Leila Manel, Allegue Abdelhamid and Ezziane Abir
Abstract:	Today’s large-scale client–server applications demand scalable and resilient architectures that exceed the capabilities of monolithic designs. While microservices and Kubernetes-based orchestration improve scalability and automation, maintaining high availability (HA) in distributed deployments still poses challenges. This work proposes an enhanced orchestration framework that integrates metric-based scheduling, multi-criteria availability assessment, and class-based redundancy. By extending Kubernetes with reliability-aware plugins, the framework optimises pod placement and proactive fault recovery. Monte Carlo simulations with 5,000 runs demonstrate that multi-node redundancy achieves 99.89% availability (2.96 nines) compared to 85.42% for monolithic architectures, a 99.2% reduction in downtime. These results provide quantitative evidence for availability-oriented microservices orchestration design.
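The relationship between the availability figures quoted above can be checked with a few lines of arithmetic. The helper names below are illustrative; the only inputs are the two availability values stated in the abstract.

```python
import math


def nines(availability):
    """Number of nines, i.e. -log10 of the unavailability."""
    return -math.log10(1.0 - availability)


def downtime_reduction(baseline, improved):
    """Relative reduction in unavailability when moving from baseline to improved."""
    return 1.0 - (1.0 - improved) / (1.0 - baseline)


n = nines(0.9989)                          # about 2.96
cut = downtime_reduction(0.8542, 0.9989)   # about 0.992
```

Both values agree with the 2.96 nines and the 99.2% downtime reduction stated in the abstract.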

Paper Nr:	56
Title:	Toward Federated Evaluation-as-a-Service for LLMs: A Preliminary Engineering Blueprint
Authors:	Giulia Biagioni, Alex Montoya Franco, Julia García Fernández, Edwin Harmsma and Stephan Raaijmakers
Abstract:	Large Language Model (LLM) evaluation has rapidly evolved alongside the progression of LLMs. However, while LLMs and their respective platforms are increasingly becoming hubs that integrate a wide range of features and domains, the LLM evaluation landscape remains fragmented as research artifacts and operational evaluation platforms deliver domain-specific functionalities in isolation. Consequently, we look at service federation as a compelling approach to offer LLM Evaluation-as-a-Service (EaaS) across platform boundaries. In this article, we examine a representative set of operational LLM evaluation platforms using a structured set of operational criteria, including deployment models, API availability, openness, LLM-as-a-judge independence, self-hosting capabilities, data protection claims, and degree of vendor dependence. In particular, we focus on the distinction between vendor-independent and vendor-bound platforms, which directly constrains the composition of cross-platform evaluation workflows. To move toward practical platform interoperability, we introduce a preliminary engineering blueprint aimed at supporting federated evaluation workflows across heterogeneous EaaS providers. Inspired by semantic federation initiatives such as Gaia-X and International Data Spaces, we outline the conceptual abstractions, semantic representations, compatibility validation mechanisms, and orchestration components that would be required to enable interoperable, multi-provider evaluation pipelines. Rather than proposing a finalized and standardized semantic-based framework for interoperability, this work formulates a preliminary position on how federation of LLM evaluation services could be approached and identifies concrete architectural implications for future technical development.

Paper Nr:	26
Title:	High Availability Database Architectures in Containerized Environments: A Systematic Review
Authors:	Imane Jerrari and Ismail Assayad
Abstract:	Kubernetes is now deployed by 96% of surveyed organisations, yet selecting the right high-availability (HA) database architecture for containerised environments remains poorly guided by existing literature. This paper addresses that gap through a PRISMA-compliant systematic literature review (Moher et al., 2009) of 41 peer-reviewed publications (2016–2025) covering 127 distinct experimental configurations. We build a seven-dimension classification framework—derived through open and axial coding of the full corpus—that organises five architectural patterns along axes of performance, persistence, availability, and operational complexity. Industrial validation across three production deployments confirms 30–50% resource optimisation while sustaining 99.9–99.99% availability targets. A practitioner decision framework achieves 87% pattern-recommendation accuracy (13/15 scenarios; 95% CI: 62–96% via Wilson score interval), compared to a 20% random-selection baseline across five equally probable patterns, representing a 4.35× improvement over chance. Six concrete research gaps are identified and mapped to investigative directions for the community.
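The Wilson score interval quoted above can be reproduced from the stated counts (13 correct out of 15 scenarios). The sketch below uses the textbook formula with z = 1.96; the function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(13, 15)   # roughly (0.62, 0.96)
lift = (13 / 15) / (1 / 5)            # observed accuracy relative to the 20% baseline
```

The unrounded ratio is about 4.33; the 4.35× in the abstract appears to follow from dividing the rounded 87% by 20%.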

Paper Nr:	37
Title:	Automation of Secure and Compliant Infrastructure Orchestration Utilizing Terraform on AWS
Authors:	Anusha Singamaneni, Ranjith Bhaskaran, Cristina Hava Muntean and Shaguna Gupta
Abstract:	Secure, compliant cloud provisioning is difficult with manual configuration, where misconfigurations, inconsistent security, and limited auditability often arise. Infrastructure-as-Code (IaC) solves this by defining infrastructure declaratively and enabling repeatable, version-controlled deployments. This paper presents an AWS-focused Terraform approach embedding security-by-design controls into automated provisioning, including least-privilege IAM, network segmentation with public and private VPC subnets, bastion-based administrative access, controlled outbound connectivity via a NAT gateway, and centralized logging and encryption baselines. The implementation is evaluated through a comparative study against manual provisioning using the AWS Management Console. Results show Terraform reduces provisioning time by over 75% across complex networking and access-control scenarios, while improving reliability by increasing success rates from 74% to 96%. Connectivity validation confirms that public resources route traffic through the Internet Gateway, private instances access outbound connectivity only via the NAT gateway, and administrative access to private resources is restricted to the bastion host. Security validation confirms consistent enforcement of baseline controls such as IAM least privilege, subnet isolation, restricted Secure Shell (SSH) ingress, centralized logging, and encryption at rest. These findings demonstrate that Terraform-based Infrastructure-as-Code can simultaneously improve operational efficiency and strengthen security and compliance consistency, offering a practical foundation for repeatable and audit-ready cloud infrastructure deployments, particularly for environments with limited operational overhead.
Download

Area 4 - Services, Platforms and Applications

Full Papers

Paper Nr:	15
Title:	CertiCheck-FTA: Interpretable Fraud Detection in Medical Certificates Using Multimodal LLMs
Authors:	Douglas Silva de Moura, Sandro José Rigo, Henrique Cota de Freitas and Rodrigo da Rosa Righi
Abstract:	Fraud detection in medical certificates in Brazil is a complex problem, hindered by the lack of document standardization and poor digitalization quality. Traditional approaches fail to extract data from cluttered layouts or produce binary “fraud/non-fraud” classifications, without a clear diagnosis for the analyst. To overcome these limitations, this work proposes CertiCheck-FTA, a hybrid model that dissociates text detection from data extraction, delegating the latter to a multimodal Large Language Model (LLM). The main contributions are: (i) a hybrid architecture that integrates multimodal LLMs and Fault Tree Analysis (FTA); and (ii) the generation of a multi-level and interpretable diagnosis (XAI) that points out the causes of fraud suspicion. The model was evaluated through a functional prototype on a set of 40 documents, achieving 96% success in correctly identifying risk in simulated fraud scenarios. CertiCheck-FTA positions itself as a robust decision support tool for non-technical analysts, with potential application in Human Resources and document forensics, increasing efficiency and transparency in validation.
Download

Paper Nr:	35
Title:	MoST: Measuring of Sustainable Throughput Method for Large Language Models as a Service
Authors:	Matthew Bwye, Carlos Müller, José A. Parejo and Antonio Ruiz-Cortés
Abstract:	Large Language Models as a Service (LLMaaS) have gained popularity due to their utility and accessibility, yet guaranteeing expected performance can be challenging when they are deployed on a static infrastructure. Given capacity elasticity restrictions, LLMaaS owners must rely on demand-regulation measures to prevent users from saturating the service. While per-second or per-minute time interval restrictions are effectively managed via the well-studied Maximum Instantaneous Throughput (MIT), defining hourly or daily regulations requires knowledge of performance degradation thresholds, explained through Maximum Sustainable Throughput (MST). However, finding the MST of an LLMaaS cannot be done with currently known methods due to the stochastic nature of LLM response times and the high variability of the workload involved in inference. To address this gap, we propose Measuring of Sustainable Throughput (MoST), a novel methodology that utilizes statistical testing to identify the precise threshold where response times significantly diverge, signaling performance degradation. We validate MoST using a modified version of the fmperf benchmarking tool on three distinct models: DeepSeek 7B, LLaMA 8B, and Gemma 7B. Our results demonstrate that MoST effectively determines sustainable quotas without relying on volatile metrics often used in the literature, allowing LLMaaS providers to optimize the rate and quota of their service and reach agreed-upon performance levels even on static infrastructures.
Download

Paper Nr:	63
Title:	Optimizing Global Federated Learning: A Serverless Hierarchical Approach with Region-Aware Placement
Authors:	Matheus Marotti Pereira and Lúcia Maria de A. Drummond
Abstract:	Scaling Federated Learning (FL) to globally distributed clients introduces significant communication latency and operational complexity. Traditional architectures often rely on persistent infrastructure, incurring high costs for idle resources. We propose a Serverless Hierarchical Federated Learning architecture orchestrated via managed cloud workflows to achieve a fully pay-per-use model. Our design eliminates idle infrastructure costs while reducing the management burden of persistent orchestration. Furthermore, we introduce a region-aware optimization method to strategically place the global parameter server by balancing network latency, egress costs, and computation pricing. Evaluation using standard benchmarks (MNIST, CIFAR-10) and a world-scale geospatial case study (So2Sat), conducted through real cloud deployments and calibrated simulations, demonstrates that our hierarchical approach reduces infrastructure costs by over 93% compared to non-hierarchical (flat) baselines. Moreover, optimized region selection achieved a total reduction of 40.6% in training time relative to the flat baseline, outperforming intuitive geographic placement by an additional 11.6%.
Download

Paper Nr:	64
Title:	EES-CND: Collaborative Neural Decision-Making for Drift-Aware Fault-Tolerant Edge-Cloud Service Placement
Authors:	Mohammadsadeq G. Herabad, Javid Taheri, Bestoun S. Ahmed and Calin Curescu
Abstract:	The edge-cloud paradigm improves service delivery by orchestrating resources across edge nodes and cloud data centres. These environments consist of heterogeneous, interconnected computing nodes that cooperate to deliver continuous services. However, their scale and complexity increase vulnerability to failures from hardware malfunctions, software defects, and dynamic operating conditions. These failures can disrupt system configurations and service execution, leading to reduced reliability, performance degradation, and violations of service-level objectives. Ensuring service execution requires adaptive service placement strategies across edge-cloud resources. This study introduces a fault-tolerant service placement approach (Enhanced Evolution Strategy for Collaborative Neural Decision-making, EES-CND) for edge-cloud environments. The method employs collaborative decision-making, wherein multiple lightweight neural networks jointly infer redeployment strategies during failure events. To address the system dynamics and mitigate performance drift, adaptive models are updated online using an enhanced evolution strategy. Extensive simulations show that EES-CND effectively handles performance drift and significantly outperforms existing methods in service recovery time, response time, and reliability, achieving a 44.8% reduction in fault-tolerance cost compared to standalone models.
Download

Short Papers

Paper Nr:	11
Title:	NEWS2-Based Edge-Fog Architecture for Continuous Home Care Monitoring with Low-Latency Clinical Prioritization
Authors:	Sander Reis, Mateus Roveda, Daniel Lopes Ferreira and Rodrigo da Rosa Righi
Abstract:	The growing demand for continuous care, driven by an aging population and chronic diseases, faces technological limitations and reliance on in-person visits. Traditional models based on cloud computing and complex AI are often impractical in scenarios with limited infrastructure. As an alternative, the Edge Healthcare (EHC) model proposes a distributed architecture with local processing. Its two main contributions are: (i) the decentralization of clinical analysis through the autonomous execution of the NEWS2 score directly on edge devices, enabling real-time detection without cloud dependency; and (ii) an intelligent prioritization mechanism at the Fog layer, which uses lightweight fuzzy logic to organize care based on clinical risk and waiting time. By keeping critical decisions close to the patient and reducing dependence on centralized processing, EHC supports low-latency monitoring in constrained environments. In simulated scenarios, the system correctly detected clinical risk at the Edge layer and prioritized emergency cases at the Fog layer, with average processing times of 9.12 ms and 60.6 ms, respectively.
Download

Paper Nr:	17
Title:	Low-Code Quantum Algorithm Modeling and Execution for Hybrid Cloud Environments
Authors:	Lavinia Stiliadou, Johanna Barzen, Fabian Bühler, Daniel Georg and Sharon-Naemi Stiliadou
Abstract:	The complexity of quantum algorithms, as well as their integration with classical components, poses barriers to the broader adoption of quantum computing. Although quantum programming frameworks offer capabilities such as quantum circuit construction and error mitigation, they often require deep domain expertise and advanced programming skills, limiting accessibility for non-specialists. Meanwhile, low-code development environments lower the entry barrier for programming by enabling users to visually assemble blocks, modular units that represent specific operations, rather than requiring extensive manual coding. However, existing classical low-code platforms are typically domain-specific and lack the flexibility needed to support diverse quantum applications and the usage of quantum computers. Attempts to bring these two approaches together by combining the power of quantum programming frameworks with the ease of low-code development have so far led to quantum low-code solutions that are mostly closed-source or tightly bound to a single vendor’s ecosystem. To address these limitations, we identify essential blocks for modeling quantum algorithms and introduce an open-source, provider-agnostic low-code platform that enables their integration and execution across hybrid cloud environments. By automating the transformation of visual models into executable workflows, the platform enables orchestration between classical services and quantum computers. To demonstrate the practical feasibility of our approach, we provide a prototypical implementation and showcase it for a use case from the optimization domain.
Download

Paper Nr:	19
Title:	ModuCloudEval: A Unified Modular Framework for Automated Evaluation of Cloud Systems and Services
Authors:	Sharmina Iasmin and Raju Shrestha
Abstract:	Evaluating the performance, reliability, and scalability of cloud systems remains challenging due to heterogeneous infrastructures and fragmented tooling. This paper presents ModuCloudEval, a modular and extensible framework for automated performance evaluation of cloud databases, load balancers, and microservices applications. The framework unifies offline benchmarking tools (Sysbench, Apache Benchmark, Redis Benchmark) with online monitoring solutions (K6, Prometheus, Grafana), enabling consistent assessment in both pre-deployment and live environments. Its plugin-oriented architecture supports independent subsystem evaluation and straightforward extensibility to new workloads, while YAML-driven automation and secure credential handling ensure reproducibility and scalability. In addition to presenting the architecture, this paper provides a comparative experimental evaluation across multiple cloud providers, including OpenStack, AWS, and Azure, and across representative subsystems such as databases and load balancers. These experiments demonstrate how the framework enables systematic cross-cloud comparison and configuration-aware performance analysis. Furthermore, the framework facilitates reproducible experimentation and enables future integration of cost, energy, and security evaluation modules, which are increasingly relevant for sustainable and trustworthy cloud deployments. Overall, ModuCloudEval addresses key limitations of existing evaluation approaches and provides a practical, automated foundation for reproducible cloud performance analysis.
Download

Paper Nr:	23
Title:	Time to Go Green: Carbon-Aware Request Scheduling in Software Services
Authors:	Ornela Danushi, Jacopo Soldani, Stefano Forti and Antonio Brogi
Abstract:	The growing environmental impact of the ICT sector calls for sustainable computing, including software services designed to minimise their carbon footprint, without compromising on quality of service. In particular, software services should be carbon-aware by design – leveraging alternative configurations to reduce emissions while featuring the required quality of service. In this work, we present a carbon-aware approach to realise dynamically adaptive software services. We propose an optimisation scheme that enables serving incoming requests with different implemented versions of software services, by minimising carbon emissions, combining service adaptability and time-shifting. Finally, to demonstrate the feasibility of our proposal, we introduce CarbonShift, an open-source prototype implementation.
Download

Paper Nr:	28
Title:	Infrastructure as Code: A Rule Catalog for Incident Self-Healing
Authors:	Niyazi Gökberk Gündüz, Florian Hofer and Claus Pahl
Abstract:	Infrastructure-as-Code (IaC) allows organizations to push dozens of software changes every day. However, human operators of that software remain a bottleneck when incidents must be addressed during software operation. Incident post-mortems show detection latencies of tens of minutes and manual recovery times of hours. The research question is therefore whether we can define an autonomous, rule-based self-healing system that addresses the auditability and safety guarantees in modern DevOps and incident management. We focus here on a rule catalog for self-healing that forms the core of the classical MAPE-K autonomic loop. The heart of the system is the reusable catalog of 20 event–condition–action (ECA) rules for incident management.
Download

Paper Nr:	29
Title:	An Open-Source Reference Architecture for Infrastructure-as-Code Self-Healing
Authors:	Niyazi Gökberk Gündüz, Florian Hofer and Claus Pahl
Abstract:	Enterprises that manage their cloud systems through Infrastructure-as-Code (IaC) often push dozens of changes every day. Human operators remain the bottleneck: incident post-mortems show detection latencies of tens of minutes and manual recoveries stretching into hours. A central research question is therefore whether an autonomous controller can reliably self-heal in a modern DevOps environment. We present a self-healing controller architecture that instantiates the classical MAPE-K autonomic loop with Monitoring runtime metrics, Analyzing them against declarative rules, Planning a response, and Executing an idempotent remediation, while recording every decision in a Knowledge store for post-incident forensics. This study defines a reference architecture for an end-to-end, rule-driven self-healing approach for an IaC pipeline that fully builds on open technologies, without the need for specialized hardware or proprietary AIOps platforms.
Download

Paper Nr:	36
Title:	ElasticHub: A Cost-Efficient JupyterHub Platform via Automated Scaling with Kubernetes on Hybrid Cloud
Authors:	Ryutaro Matsumoto, Kohei Taniguchi, Tomonori Hayami, Keichi Takahashi and Susumu Date
Abstract:	JupyterHub is widely used in educational and research institutions to provide Jupyter Notebook as a Platform-as-a-Service. However, when JupyterHub is deployed on on-premise resources, users may experience long waiting times if resource demand exceeds capacity. Although leveraging public cloud resources can mitigate this issue, cloud usage costs are significantly higher than on-premise costs. To reduce the cloud cost while maintaining responsiveness, Jupyter Notebook servers running on cloud nodes should be migrated to on-premise nodes as soon as on-premise resources become available. During migration, preserving the session state is essential to ensure a seamless user experience. This paper presents ElasticHub, a cost-efficient JupyterHub platform that enables automated scaling across on-premise and cloud resources. ElasticHub dynamically scales cloud nodes and transparently migrates session states to minimize user waiting times and reduce cloud resource usage. We evaluated ElasticHub through simulation and found that it reduces user waiting time compared with an on-premise-only deployment, while enabling earlier release of cloud nodes through migration.
Download

Area 5 - Cloud Computing Enabling Technology

Full Papers

Paper Nr:	33
Title:	Improving Cost-Performance Efficiency of Scientific Cloud Workflows through GPU Sharing
Authors:	Matheus M. Costa, Tiago Ferreto, Cesar A. F. De Rose, Odej Kao, Philippe Navaux and Arthur Lorenzon
Abstract:	Accelerator-based computing has become essential for deep learning and scientific simulations. However, in cloud environments, GPUs are often underutilized when applications execute in isolation, leading to high execution costs and inefficient use of hardware resources. One approach to mitigate this underutilization is GPU sharing, where multiple applications are allowed to execute concurrently on the same physical GPU. The central challenge lies in determining which applications can be co-executed and how to manage their concurrent execution without incurring unacceptable performance degradation. In this scenario, we propose GPU-CoLoC, an offline methodology that uses Integer Linear Programming (ILP) to optimize GPU sharing. This approach selects application co-execution groups to balance performance degradation against the costs of accelerator usage. The methodology was validated using eighteen applications on NVIDIA HGX H100 and AMD MI300X instances. Experimental results show that optimal co-location decisions can reduce cloud execution costs by up to 36% compared to sequential execution.
Download

Short Papers

Paper Nr:	24
Title:	Evaluating Scalability Using Open Table File Formats in Cloud Lakehouse Architectures with TPC-DS
Authors:	Italo V. P. Guimaraes and Aleteia Araujo
Abstract:	Cloud lakehouse systems unify the flexibility of data lakes with the performance of data warehouses over object storage, yet practitioners lack empirical guidance for selecting among major implementations (Apache Iceberg, Delta Lake, Apache Hudi) and file formats (Parquet, ORC, Avro). This paper presents a systematic TPC-DS performance evaluation executing all 99 decision support queries across multiple scale factors with comprehensive horizontal scalability analysis using varying worker configurations on S3-compatible cloud-native infrastructure. Our results reveal that lakehouse and file format selection significantly impacts query performance, with columnar formats substantially outperforming row-oriented alternatives. Furthermore, scalability analysis uncovers negative scaling behavior at larger data volumes, where adding workers degrades performance due to memory pressure, challenging conventional horizontal scaling assumptions. We provide a practitioner-oriented decision framework with evidence-based recommendations for format selection, cluster sizing, and cost optimization in cloud deployments.
Download

Paper Nr:	25
Title:	Zero-Touch Selection of Virtual Network Embedding Algorithms Using Multi-Objective Decision Trees
Authors:	Luis Antonio Momm Duarte and Guilherme Piêgas Koslovski
Abstract:	The Virtual Network Embedding (VNE) problem addresses the NP-hard challenge of mapping virtual network requests onto shared physical infrastructures while optimizing competing objectives. Since no single algorithm performs best in all scenarios, this work proposes multi-objective decision trees to automatically select the most appropriate VNE algorithm for each request based on network state and request characteristics. Three specialized models, each optimized for a specific objective (acceptance rate, revenue-cost ratio, and average revenue), allow adaptation to operational priorities at runtime. Experiments on three datacenter topologies with 8,000 requests show top-3 accuracy between 79.1% and 85.9%, with an improvement of +32.9 percentage points in acceptance rate over the fixed baseline. The approach provides interpretability through explicit decision paths and inference latency below 1 ms, aligned with zero-touch administration requirements.
Download

Paper Nr:	32
Title:	Flow-Aware, Machine Learning-Driven Mapping for Serverless Software Defined Networks
Authors:	Abdulaziz Alhindi and Karim Djemame
Abstract:	Energy efficiency has become a critical challenge in Software-Defined Networking (SDN), particularly as modern controllers increasingly rely on computationally intensive software components. In parallel, serverless computing has appeared as a lightweight, event-driven execution model that offers fine-grained resource utilisation and reduced operational overhead. While recent studies have explored integrating serverless platforms into SDN architectures, they largely focus on architectural disaggregation and overlook energy-aware resource provisioning decisions. This paper proposes a Machine Learning (ML)-driven, flow-aware architecture for energy-aware resource provisioning in serverless SDN environments. The proposed approach predicts per-flow energy consumption, processing time, and CPU usage using machine learning models trained on flow-level features. These predictions are used to dynamically map incoming data flows to the most energy-efficient execution nodes. To support accurate prediction, we introduce a methodology for fine-grained measurement of per-flow energy consumption and processing time in serverless SDN environments. The architecture is evaluated using an SDN–serverless testbed built with ONOS and Knative, leveraging real SDN traffic traces. Experimental results demonstrate that the proposed solution effectively reduces energy consumption compared to baseline deployments while maintaining service quality, highlighting the benefits of flow-aware, ML-based resource provisioning in serverless SDN systems.
Download

Paper Nr:	39
Title:	Why Large Language Models Struggle with Cloud Instance Selection
Authors:	Matheus Machado, Matheus de M. Costa, Marcelo C. Luizelli, Fábio D. Rossi and Arthur F. Lorenzon
Abstract:	Cloud instance selection for parallel and AI workloads is increasingly difficult due to architectural heterogeneity, non-linear scalability, and performance–cost trade-offs. Large language models (LLMs) are emerging as decision-support tools for this task, but their ability to reason about execution behavior remains poorly validated. We present an empirical evaluation of eight state-of-the-art LLMs across 103 CPU and 12 GPU cloud instances from major providers. Using 18 parallel CPU workloads and 4 MLPerf inference models, we compare LLM-selected instances against an Oracle baseline obtained through exhaustive design-space exploration. Results show substantial performance deficits, frequent mismatch with workload-specific architectural constraints, and a consistent bias toward static hardware descriptors such as core count, GPU count, and hourly price. For CPU workloads, the best-performing model reaches only 65.5% of Oracle performance on average, while others remain below 40%. Under the joint performance–cost objective, the strongest model is still about 2× worse than Oracle, and even an optimistic aggregation of all model suggestions remains 25% above the optimal trade-off. For GPU workloads, recommendations collapse to a single high-end configuration, yielding smaller but persistent gaps relative to Oracle.
Download