CLOSER 2025 Abstracts


Area 1 - Cloud Computing Fundamentals

Full Papers
Paper Nr: 43
Title:

LLM-Based Adaptive Digital Twin Allocation for Microservice Workloads

Authors:

Pedro Henrique Sachete Garcia, Ester de Souza Oribes, Ivan Mangini Lopes Junior, Braulio Marques de Souza, Angelo Nery Vieira Crestani, Arthur Francisco Lorenzon, Marcelo Caggiani Luizelli, Paulo Silas Severo de Souza and Fábio Diniz Rossi

Abstract: Efficient resource allocation in programmable datacenters is a critical challenge due to the diverse and dynamic nature of workloads in cloud-native environments. Traditional methods often fall short in addressing the complexities of modern datacenters, such as inter-service dependencies, latency constraints, and optimal resource utilization. This paper introduces the Dynamic Intelligent Resource Allocation with Large Language Models and Digital Twins (DIRA-LDT) framework, a cutting-edge solution that combines real-time monitoring capabilities of Digital Twins with the predictive and reasoning strengths of Large Language Models (LLMs). DIRA-LDT systematically optimizes resource management by achieving high allocation accuracy, minimizing communication latency, and maximizing bandwidth utilization. By leveraging detailed real-time insights and intelligent decision-making, the framework ensures balanced resource distribution across the datacenter while meeting stringent performance requirements. Among the key results, DIRA-LDT achieves an allocation accuracy of 98.5%, an average latency reduction to 5.3 ms, and a bandwidth utilization of 82.4%, significantly outperforming heuristic-based, statistical, machine learning, and reinforcement learning approaches.
Download

Paper Nr: 54
Title:

SLO and Cost-Driven Container Autoscaling on Kubernetes Clusters

Authors:

Angelo Marchese and Orazio Tomarchio

Abstract: Modern web services must meet critical non-functional requirements such as availability, responsiveness, scalability, and reliability, which are formalized through Service Level Agreements (SLAs). These agreements specify Service Level Objectives (SLOs), which define performance targets like uptime, latency, and throughput, essential for ensuring consistent service quality. Failure to meet SLOs can result in penalties and reputational damage. Service providers also face the challenge of avoiding over-provisioning resources, as this leads to unnecessary costs and inefficient resource use. To address this, autoscaling mechanisms dynamically adjust the number of service replicas to match user demand. However, traditional autoscaling solutions typically rely on low-level metrics (e.g., CPU or memory usage), making it difficult for providers to optimize both SLOs and infrastructure costs. This paper proposes an enhanced autoscaling methodology for containerized workloads in Kubernetes clusters, integrating SLOs with a cost-driven autoscaling policy. This approach overcomes the limitations of conventional autoscaling by making more efficient decisions that balance service-level requirements with operational costs, offering a comprehensive solution for managing containerized applications and their infrastructure in Kubernetes environments. The results, obtained by evaluating a prototype of our system in a testbed environment, show significant advantages over the vanilla Kubernetes Horizontal Pod Autoscaler.
Download

Paper Nr: 57
Title:

Performance Analysis of mdx II: A Next-Generation Cloud Platform for Cross-Disciplinary Data Science Research

Authors:

Keichi Takahashi, Tomonori Hayami, Yu Mukaizono, Yuki Teramae and Susumu Date

Abstract: mdx II is an Infrastructure-as-a-Service (IaaS) cloud platform designed to accelerate data science research and foster cross-disciplinary collaborations among universities and research institutions in Japan. Unlike traditional high-performance computing systems, mdx II leverages OpenStack to provide customizable and isolated computing environments consisting of virtual machines, virtual networks, and advanced storage. This paper presents a comprehensive performance evaluation of mdx II, including a comparison to Amazon Web Services (AWS). We evaluated the performance of a 16-vCPU VM from multiple aspects including floating-point computing performance, memory throughput, network throughput, file system and object storage performance, and real-world application performance. Compared to an AWS 16-vCPU instance, the results indicated that mdx II outperforms AWS in many aspects and demonstrated that mdx II holds significant promise for high-performance data analytics (HPDA) workloads. We also evaluated the virtualization overhead using a 224-vCPU VM occupying an entire host. The results suggested that the virtualization overhead is minimal for compute-intensive benchmarks, while memory-intensive benchmarks experienced larger overheads. These findings are expected to help users of mdx II to obtain high performance for their data science workloads and offer insights to the designers of future data-centric cloud platforms.
Download

Short Papers
Paper Nr: 12
Title:

Data Orchestration Platform for AI Workflows Execution Across Computing Continuum

Authors:

Gabriel Ioan Arcas and Tudor Cioara

Abstract: Cloud AI technologies have emerged to exploit the vast amount of data produced by digitized activities. However, despite these advancements, they still face challenges in several areas, including data processing, achieving fast response times, and reducing latency. This paper proposes a data orchestration platform for AI workflows, considering the computing continuum setup. The edge layer of the platform focuses on immediate data collection, the fog layer provides intermediate processing, and the cloud layer manages long-term storage and complex data analysis. The orchestration platform incorporates the Lambda Architecture principles for flexibility in managing batch processing and real-time data streams, enabling effective management of large data volumes for AI workflows. The platform was used to manage an AI workflow dealing with the prediction of household energy consumption, showcasing how each layer supports different stages of the machine learning pipeline. The results are promising: the models are trained, validated, and deployed effectively, with reduced latency and use of computational resources.
Download

Paper Nr: 28
Title:

Literature Review on Cloud-Based Service-Oriented Architecture for IEC 61499 Distributed Control Systems

Authors:

Tomás Torres, Gil Gonçalves and Rui Pinto

Abstract: The 4th Industrial Revolution has driven innovations in integrating Information Technologies (IT) with Operations Technologies (OT). This integration is essential for developing Cyber-Physical Production Systems (CPPS), which enhance distributed automation and optimize industrial production processes. The IEC 61499 standard facilitates this integration through its modularity, reusability, and interoperability, making it crucial for distributed control and system automation. Despite its advantages, IEC 61499’s application is predominantly at the Edge layer, limiting its functionality in higher layers such as the Cloud. To address this gap, Cloud-based Service-Oriented Architectures (SoA) have emerged as a key study area, offering modular, reusable, and scalable services extending beyond the Edge layer. This paper presents a comprehensive literature review focused on expanding the IEC 61499 applications to reconfigure CPPS, by integrating Cloud layer services through SoA. The review highlights advancements, challenges, and future directions in achieving greater modularity, interoperability, scalability, and abstraction within distributed control systems. By synthesizing current research, this work provides insights into the potential enhancements of CPPS using a Cloud-based SoA approach.
Download

Paper Nr: 44
Title:

Advancing Serverless Workflow Efficiency: Integrating Functional Programming Constructs and DAG-Based Execution

Authors:

Nimród Földvári and Florin Crăciun

Abstract: Serverless computing, also known as the Function-as-a-Service (FaaS) paradigm, has become a cornerstone of modern cloud-based applications, enabling developers to build and execute workflows by composing serverless functions. However, current serverless platforms are limited by constrained orchestration and function composition capabilities, which reduce their expressiveness and performance. To address these limitations, this paper introduces three enhancements to Apache OpenWhisk: native support for currying, continuation, and Directed Acyclic Graphs (DAGs). These enhancements are designed to improve the expressiveness of serverless workflows, reduce latency, and enable partial execution of dynamic workflows as soon as data becomes available.
Download

Paper Nr: 17
Title:

OTRA: A Risk Management Ontology for Transparent Service Level Agreements in Federated Cloud Environment

Authors:

Giulia Biagioni, Resul Serkan Keskin, Lawrence Cook, Edwin Harmsma and Erik Langius

Abstract: Cloud Federation, a model in which multiple cloud providers collaborate to offer interconnected services, has become increasingly prominent in cloud computing. In this environment, Service Level Agreements (SLAs) play a critical role by formalising service expectations and fostering transparency between providers and consumers. Although existing SLA ontologies provide a semantic foundation for standardised SLA management, they lack components for detailed risk management and representation within SLAs. This paper presents our work on developing OTRA - Ontology for Transparent Agreement - specifically designed for risk management within SLAs in federated cloud environments. By integrating structured risk identification, prevention, and mitigation strategies, our ontology extends the Gaia-X Core Ontology. We validate our ontology through a controlled simulation that demonstrates its effectiveness in structuring SLA risk information, offering a foundation for transparent, accountable service agreements.
Download

Paper Nr: 21
Title:

Job Generator for Evaluating and Comparing Scheduling Algorithms for Modern GPU Clouds

Authors:

Michal Konopa, Jan Fesl and Ladislav Beránek

Abstract: The steep technological and performance advances in GPU cards have led to their increasing use in data centers in recent years, especially for machine learning jobs. However, high hardware performance alone does not guarantee (sub)optimal utilization of computing resources, especially when the cost associated with power consumption also needs to be increasingly considered. As a consequence, various job scheduling algorithms have been and are being developed to optimize power consumption in data centers with respect to defined constraints. Unfortunately, there is still no widely used, parametrizable dataset that serves as a de facto standard for simulating scheduling algorithms and thereby comparing their performance against each other. The goal of this paper is to describe a simple job set generator designed for modern GPU architectures and to introduce the newly created dataset suitable for evaluating scheduling algorithms.
Download

Paper Nr: 22
Title:

Optimization of Food Inputs in Restaurants in Metropolitan Lima Through Prediction and Monitoring Based on Machine Learning

Authors:

Marcos Olivos, Alexandre Motta and Pedro Castaneda

Abstract: This work presents the development of a web-based monitoring and prediction system designed to optimize food supply in restaurants in Metropolitan Lima, addressing challenges such as efficient inventory management and food waste reduction. The solution employs six Machine Learning models (Random Forest, Gradient Boosting, Ridge Regression, Lasso Regression, Linear SVR, and Neural Network), evaluated using error metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Among the models, Gradient Boosting demonstrated the best performance, with an MSE of 0.0032, an RMSE of 0.057, and an MAE of 0.027, outperforming the others in terms of accuracy, including Neural Network and Random Forest, which also offered competitive results. While the approach was developed in the specific context of Metropolitan Lima, the applied methods and obtained results can be adapted to other urban markets with similar dynamics, demonstrating broader applicability. This system not only promotes more efficient and sustainable inventory planning, but also contributes to the economic growth of restaurants by optimizing resources and improving their profitability in a highly competitive environment.
Download

Paper Nr: 26
Title:

A Framework for Evaluating Integration Testing Criteria in Serverless Applications

Authors:

Stefan Winzinger and Guido Wirtz

Abstract: Serverless applications are based on Function-as-a-Service (FaaS) platforms, where serverless functions interact with other cloud-specific services. The integration of these components is crucial for the application’s functionality and must be adequately tested. Testing criteria can help here by supporting developers in evaluating test suites and identifying missing test cases. This paper presents a framework for evaluating integration testing criteria in serverless applications by defining the processes needed to evaluate such criteria. The defined process is demonstrated for selected control- and data-flow criteria and applied to several serverless applications, showing the feasibility of the approach.
Download

Paper Nr: 51
Title:

A Framework for Real-Time Monitoring of Power Consumption of Distributed Calculation on Computational Cluster

Authors:

Adam Krechowicz

Abstract: This paper proposes a framework for real-time monitoring of the power consumption of distributed calculations on the nodes of a cluster. The framework allows visualizing and analyzing the provided results based on context information about the performed calculation. The first part of the framework is devoted to monitoring power consumption during the execution of machine learning algorithms and the performance of NoSQL storage. The second part is dedicated to testing a distributed data store, the Scalable Distributed Two-Layered Data Structure (SD2DS). The results show that the framework can be used in the development of a management system that could schedule computations to take full advantage of renewable energy.
Download

Area 2 - Cloud Security and Privacy

Full Papers
Paper Nr: 56
Title:

Security-Aware Allocation of Replicated Data in Distributed Storage Systems

Authors:

Sabrina De Capitani di Vimercati, Sara Foresti, Giovanni Livraga, Pierangela Samarati and Mauro Tedesco

Abstract: Distributed storage systems offer scalable and cost-effective solutions for managing large data collections. A critical factor for the adoption of these systems is the allocation of data (possibly including replicas) to storage nodes in a way that satisfies operational and security requirements while ensuring economic effectiveness. Appropriate data and replica management provides significant benefits, ranging from enhanced fault tolerance and improved data availability to reduced latency and optimized workload distribution. A suboptimal placement of data and replicas can instead lead to excessive costs, security risks, and performance bottlenecks. This paper proposes a novel model that permits data owners to specify complex data and replica allocation constraints in a user-friendly manner, together with an approach for computing optimal data allocations (satisfying operational and security requirements while minimizing costs) in distributed storage environments. Our work aims to improve the reliability, security, and cost-effectiveness of distributed storage systems.
Download

Short Papers
Paper Nr: 32
Title:

EMERALD: Evidence Management for Continuous Certification as a Service in the Cloud

Authors:

Christian Banse, Björn Fanta, Juncal Alonso and Cristina Martinez

Abstract: The conspicuous lack of cloud-specific security certifications, in addition to the existing market fragmentation, hinders transparency and accountability in the provision and usage of European cloud services. Both issues ultimately reflect on the level of customers’ trust in and adoption of cloud services. The upcoming demand for continuous certification has not yet been definitively addressed, and it remains unclear how the level ’high’ of the European Cybersecurity Certification Scheme for Cloud Services (EUCS) can be technologically achieved. The introduction of AI in cloud services is raising the complexity of certification even further. This paper presents the EMERALD Certification-as-a-Service (CaaS) concept for continuous certification of harmonized cybersecurity schemes, like the EUCS. EMERALD CaaS aims to provide agile and lean re-certification to consumers that adhere to a defined level of security and trust in a uniform way across heterogeneous environments consisting of combinations of different resources (Cloud, Edge, IoT). Initial findings suggest that EMERALD will significantly contribute to continuous certification, helping providers and users of cloud services maintain regulatory compliance with the latest and upcoming security schemes.
Download

Paper Nr: 33
Title:

Protecting Privacy in Federated Time Series Analysis: A Pragmatic Technology Review for Application Developers

Authors:

Daniel Bachlechner, Ruben Hetfleisch, Stephan Krenn, Thomas Lorünser and Michael Rader

Abstract: The federated analysis of sensitive time series has huge potential in various domains, such as healthcare or manufacturing. Yet, to fully unlock this potential, requirements imposed by various stakeholders must be fulfilled, regarding, e.g., efficiency or trust assumptions. While many of these requirements can be addressed by deploying advanced secure computation paradigms such as fully homomorphic encryption, certain aspects require an integration with additional privacy-preserving technologies. In this work, we perform a qualitative requirements elicitation based on selected real-world use cases. We match the derived requirements categories against the features and guarantees provided by available technologies. For each technology, we additionally perform a maturity assessment, including the state of standardization and availability on the market. Furthermore, we provide a decision tree supporting application developers in identifying the most promising candidate technologies as a starting point for further investigation. Finally, existing gaps are identified, highlighting research potential to advance the field.
Download

Area 3 - Edge Cloud and Fog Computing

Full Papers
Paper Nr: 27
Title:

Optimization of Cloud-Native Application Execution over the Edge-Cloud Continuum Enabled by DVFS

Authors:

Georgios Kontos, Polyzois Soumplis and Emmanouel Varvarigos

Abstract: Microservice-based application architecture, despite its many merits, including enhanced flexibility, scalability, and robustness, adds significant complexity to the application’s orchestration process. Complex execution paths emerge during runtime as demands traverse the application’s graph within an edge-cloud topology. In this work, we leverage Dynamic Voltage and Frequency Scaling (DVFS) combined with the application’s structure, represented as a Directed Acyclic Graph (DAG), to determine the optimal configuration for each service. Our goal is to perform assignments that optimize the weighted combination of the application’s execution time (i.e., the resulting critical path’s length) and the total energy consumption, subject to node capacity and power constraints, the communication limits of the microservices, and the different frequency levels of the processing units. The problem is initially modeled as a Mixed Integer Linear Program (MILP). To tackle its complexity, we split the problem into two closely related subproblems. The first is addressed by a genetic algorithm, while a best-fit heuristic algorithm obtains the final solution, leveraging the genetic algorithm’s decisions. Extensive simulations demonstrate the efficiency of the proposed mechanism by contrasting its results with two baseline policies, while highlighting the inherent trade-offs between performance and energy consumption.
Download

Short Papers
Paper Nr: 25
Title:

WFQ-Based SLA-Aware Edge Applications Provisioning

Authors:

Pedro Henrique Sachete Garcia, Arthur Francisco Lorenzon, Marcelo Caggiani Luizelli, Paulo Silas Severo de Souza and Fábio Diniz Rossi

Abstract: Provisioning delays can severely degrade the performance of real-time applications, especially in critical sectors such as healthcare, smart cities, and autonomous vehicles, where fast and reliable responses are essential. Although existing flow scheduling techniques manage data traffic effectively, they often fail to account for high-level metrics such as Service-Level Agreements (SLAs), leading to suboptimal prioritization of critical applications. This paper introduces a novel algorithm designed to optimize network bandwidth allocation for edge applications by incorporating SLA-based metrics. Evaluations demonstrate that our proposal outperforms conventional scheduling techniques. The results show that, under varying infrastructure usage levels, our proposal consistently reduces provisioning times and minimizes delay violations for edge applications.
Download

Area 4 - Cloud Computing Platforms and Applications

Full Papers
Paper Nr: 23
Title:

Operations Patterns for Hybrid Quantum Applications

Authors:

Martin Beisel, Johanna Barzen, Frank Leymann and Benjamin Weder

Abstract: With the rapidly improving capabilities of today’s quantum devices, the development of high-quality quantum applications that can be utilized to solve practically relevant problems becomes necessary. Quantum software engineering is an emerging paradigm aiming to improve the quality, reusability, and maintainability of quantum applications. However, while various concepts have been presented for developing and operating quantum applications, there is a lack of structured documentation supporting developers and operations personnel in applying best practices. To facilitate the development of quantum applications, a pattern language for the quantum computing domain has been introduced. It documents proven solutions to commonly recurring problems during the quantum software development lifecycle in an abstract manner. However, it does not contain patterns for the operation of hybrid quantum applications. To bridge this gap, in this work, we introduce novel patterns focusing on packaging, testing, executing, and observing hybrid quantum applications.
Download

Short Papers
Paper Nr: 37
Title:

A Cost-Benefit Analysis of Additive Manufacturing as a Service

Authors:

Igor Ivkić, Tobias Buhmann and Burkhard List

Abstract: The global manufacturing landscape is undergoing a fundamental shift from resource-intensive mass production to sustainable, localised manufacturing. This paper presents a comprehensive analysis of a Cloud Crafting Platform that enables Manufacturing as a Service (MaaS) through additive manufacturing technologies. The platform connects web shops with local three-dimensional (3D) printing facilities, allowing customers to purchase products that are manufactured on-demand in their vicinity. We present the platform’s Service-Oriented Architecture (SOA), deployment on the Microsoft Azure cloud, and integration with three different 3D printer models in a testbed environment. A detailed cost-benefit analysis demonstrates the economic viability of the approach, which generates significant profit margins. The platform implements a weighted profit-sharing model that fairly compensates all stakeholders based on their investment and operational responsibilities. Our results show that on-demand, localised manufacturing through MaaS is not only technically feasible but also economically viable, while reducing environmental impact through shortened supply chains and elimination of inventory waste. The platform’s extensible architecture allows for future integration of additional manufacturing technologies beyond 3D printing.
Download

Paper Nr: 13
Title:

Framework for Decentralized Data Strategies in Virtual Banking: Navigating Scalability, Innovation, and Regulatory Challenges in Thailand

Authors:

Worapol Alex Pongpech and Pasd Putthapipat

Abstract: In the rapidly advancing realm of virtual banking, a robust data strategy is crucial for competitiveness and meeting growing customer demands. In 2025, the Bank of Thailand will issue three virtual banking licenses, marking a pivotal shift in the financial landscape. This paper outlines key components of a virtual banking data strategy, focusing on real-time service delivery, innovative financial products, enhanced customer support, and strong data governance. This research offers strategic insights into navigating these complexities and driving successful digital transformation in the banking sector.
Download

Paper Nr: 36
Title:

The Evolution of Cloud Computing Towards a Vendor Agnostic Market Place Using the SKY CONTROL Framework

Authors:

Henry-Norbert Cocos, Christian Baun and Martin Kappes

Abstract: Multi-cloud environments offer benefits like vendor diversification and resilience but pose challenges such as increased management complexity, lack of cost transparency, and compliance. This concept paper introduces SKY CONTROL, a vendor-agnostic framework for small and medium-sized enterprises (SMEs). SKY CONTROL integrates cost control and risk management into multi-cloud setups, providing static and dynamic resource analyses, a cost calculator, and risk assessment tools. By leveraging the Sky Computing paradigm, SKY CONTROL simplifies resource orchestration and enhances security. This novel framework is the first implementation of the innovative Sky Computing concept. It aims to improve cost efficiency, regulatory compliance, and strategic IT planning for SMEs, offering a unified approach to managing hybrid infrastructures.
Download

Paper Nr: 42
Title:

Towards Optimizing Cost and Performance for Parallel Workloads in Cloud Computing

Authors:

William Maas, Fábio Diniz Rossi, Marcelo C. Luizelli, Philippe O. A. Navaux and Arthur F. Lorenzon

Abstract: The growing popularity of data-intensive applications in cloud computing necessitates a cost-effective approach to harnessing distributed processing capabilities. However, the wide variety of instance types and configurations available can lead to substantial costs if not selected based on the parallel workload requirements, such as CPU and memory usage and thread scalability. This situation underscores the need for scalable and economical infrastructure that effectively balances parallel workloads’ performance and expenses. To tackle this issue, this paper comprehensively analyzes performance, costs, and trade-offs across 18 parallel workloads utilizing 52 high-performance computing (HPC) optimized instances from three leading cloud providers. Our findings reveal that no single instance type can simultaneously offer the best performance and the lowest costs across all workloads. Instances that excel in performance do not always provide the best cost efficiency, while the most affordable options often struggle to deliver adequate performance. Moreover, we demonstrate that by customizing instance selection to meet the specific needs of each workload, users can achieve up to 81.2% higher performance and reduce costs by 95.5% compared to using a single instance type for every workload.
Download

Paper Nr: 48
Title:

A Concept for Accelerating Long-Term Prototype Testing Using Anomaly Detection and Digital Twins

Authors:

Vincent Nebel, Pia Goßrau-Lenau, Harshvardhan Agarwal and Dirk Werth

Abstract: Developing mechanical components, especially complex assemblies like pumps, is a resource- and time-intensive process. Testing pump prototypes for long-term durability is critical to ensure error-free operation of the final product. Prototypes undergo material and operational tests to determine their expected lifespan, focusing on defects caused by material degradation and water contamination. Long-term tests, lasting months, are necessary to simulate real-world conditions, but limited test bench capacities create bottlenecks, restricting material experimentation. Moreover, monitoring the internal state of pumps during tests is challenging. Undetected defects can worsen or trigger secondary issues, complicating the root cause analysis, which provides valuable information for further product improvements. To address these challenges, a digital twin that integrates geometry and material data, simulations, and sensor measurements was developed. This twin is used as a data source for machine learning-based anomaly detection, allowing tests to be stopped sooner and preventing further damage when the first signs of a defect are detected. A modular serverless architecture is used to host the model inference on the cloud, improving resource usage and scalability as well as reducing operational costs.
Download

Area 5 - Cloud Computing Enabling Technology

Full Papers
Paper Nr: 16
Title:

Performance and Usability Implications of Multiplatform and WebAssembly Containers

Authors:

Sangeeta Kakati and Mats Brorsson

Abstract: Docker and WebAssembly (Wasm) are two pivotal technologies in modern software development, each offering unique strengths in portability and performance. The rise of Wasm, particularly in conjunction with container runtimes, highlights its potential to enhance efficiency in diverse application stacks. However, a notable gap remains in understanding how Wasm containers perform relative to traditional multi-platform containers across multiple architectures and workloads, especially when optimizations are employed. In this paper, we aim to empirically assess the performance and usability implications of native multi-platform containers versus Wasm containers under optimized configurations. We focus on critical metrics including startup time, pull time (both fresh and cached), and image sizes for three distinct workloads and architectures: AMD64 (AWS bare metal) and two embedded boards, an Nvidia Jetson Nano (ARM64) and a Starfive VisionFive2 (RISCV64). To address these objectives, we conducted a series of experiments using Docker and containerd with multi-platform built images for native containers and Wasmtime as the WebAssembly runtime within containerd/Docker’s ecosystem. Our findings show that while native containers achieve slightly faster startup times, Wasm containers excel in agility, maintaining image sizes of approximately 27.0% of their native counterparts and achieving pull-time reductions of up to 25% across all three architectures using containerd. With continued optimizations, Wasm has the potential to emerge as a viable choice in environments that demand both reduced image size and cross-platform portability. It will not replace the current container paradigm soon; rather, it will be integrated into this framework and complement containers instead of replacing them.
Download

Paper Nr: 40
Title:

Energy-Aware Node Selection for Cloud-Based Parallel Workloads with Machine Learning and Infrastructure as Code

Authors:

Denis B. Citadin, Fábio Diniz Rossi, Marcelo C. Luizelli, Philippe O. A. Navaux and Arthur F. Lorenzon

Abstract: Cloud computing has become essential for executing high-performance computing (HPC) workloads due to its on-demand resource provisioning and customization advantages. However, energy efficiency challenges persist, as performance gains from thread-level parallelism (TLP) often come with increased energy consumption. To address the challenging task of balancing performance and energy consumption, we propose SmartNodeTuner, a framework that leverages artificial intelligence and Infrastructure as Code (IaC) to optimize performance-energy trade-offs in cloud environments and provide seamless infrastructure management. SmartNodeTuner is split into two main modules: a BuiltModel Engine, which leverages an artificial neural network (ANN) model trained to predict optimal TLP and node configurations; and an AutoDeploy Engine, which uses IaC with Terraform to automate deployment and resource allocation, reducing manual effort and ensuring efficient infrastructure management. Using ten well-known parallel workloads, we validate SmartNodeTuner on a private cloud cluster with diverse architectures. It achieves a 38.2% improvement in the Energy-Delay Product (EDP) compared to Kubernetes’ default scheduler and consistently predicts near-optimal configurations. Our results also demonstrate significant energy savings with negligible performance degradation, highlighting SmartNodeTuner’s effectiveness in optimizing resource use in heterogeneous cloud environments.
Download

Short Papers
Paper Nr: 24
Title:

Infrastructure as Code: Technology Review and Research Challenges

Authors:

Claus Pahl, Niyazi Gokberk Gunduz, Övgüm Can Sezen, Ali Ghamgosar and Nabil El Ioini

Abstract: As automation in software operations continues to grow, the quality of infrastructure management for application software becomes increasingly important. Infrastructure as Code (IaC) refers to a systematic, technology-supported approach to managing deployment infrastructure for software applications. Sample contexts include general software automation, as well as cloud, edge, and various software-defined networking applications. DevOps (development and operations) practices, already applied in the IaC context, need to be extended to cover the whole IaC life cycle, from code generation to dynamic, automated control. The ultimate objective ranges from IaC generation to full self-adaptation of IaC code in an automated setting. We review available IaC technologies based on a comprehensive comparison framework to capture the state of the art, and we introduce an IaC-specific DevOps process. This serves as a basis to identify open research challenges, with a discussion of defect categories at the centre of the process.
Download

Paper Nr: 29
Title:

Idempotency in Service Mesh: For Resiliency of Fog-Native Applications in Multi-Domain Edge-to-Cloud Ecosystems

Authors:

Matthew Whitaker, Bruno Volckaert and Mays Al-Naday

Abstract: Resilient operation of cloud-native applications is a critical requirement for service continuity and for fostering trust in the cloud paradigm. So far, service meshes have offered resiliency against a subset of failures, but they fall short of achieving idempotency for HTTP POST requests; in fact, their current resiliency measures may escalate the impact of a POST request failure. Moreover, the tight control over failures within central clouds is being challenged by the growing distribution of applications across heterogeneous clouds, namely the move towards a fog-native application paradigm. This renders achieving both idempotency and request satisfaction for POST microservices a non-trivial challenge. To address it, we propose a novel two-pattern resiliency solution: Idempotency and Completer. The first is an idempotency management system that enables safe retries following transient network or infrastructure failures, while the second is a FaaS-based completer system that enables automated resolution of microservice functional failures through systematic integration and application of developer-defined error solvers. The proposed solution has been implemented as a fog-native service and integrated with the Consul service mesh as an example. The solution is evaluated experimentally, and results show considerable improvement in user satisfaction, including a 100% request completion rate. The results further illustrate the scalability of the solution and its benefit in closing the current gap in service mesh systems.
Download

Paper Nr: 50
Title:

Hybrid Root Cause Analysis for Partially Observable Microservices Based on Architecture Profiling

Authors:

Isidora Erakovic and Claus Pahl

Abstract: Managing and diagnosing faults in microservices architectures is a challenge. Solutions such as anomaly detection and root cause analysis (RCA) can help, as anomalies often indicate underlying problems that can lead to system failures. This investigation provides an integrated solution that extracts microservice architecture knowledge, detects anomalies, and identifies their root causes. Our approach combines latency thresholds with other techniques to learn the normal behavior of the system and detect deviations that point to faults. Once deviations are identified, a hybrid RCA method is applied that integrates empirical data analysis with an understanding of the system's architecture to accurately trace the root causes of these anomalies. The solution was validated using trace log data from an Internet Service Provider's (ISP) microservices system.
Download

Paper Nr: 20
Title:

Anomaly Detection for Partially Observable Container Systems Based on Architecture Profiling

Authors:

Isidora Erakovic and Claus Pahl

Abstract: Managing and diagnosing faults in microservices architectures is a challenge, especially in a service provider environment that hosts third-party services. Solutions such as anomaly detection can help, as anomalies often indicate underlying problems that can lead to system failures. We develop an integrated solution that extracts microservice architecture knowledge and detects anomalies, using the architecture knowledge to provide context for these anomalies. Our approach combines latency thresholds with the temporal distribution of latency anomalies to determine the normal behavior of a system and detect deviations that point to faults. The proposed solution was validated using data from an Internet Service Provider's microservices system. We were able to identify critical components as key points of failure during fault conditions, and the combined use of architecture mining and anomaly detection enabled us to analyse anomalies in depth.
Download