Publications

Authors:Somnath Mazumdar, Daniel Seybold, Kyriakos Kritikos, Yiannis Verginadis

Published:Journal of Big Data 2019

Abstract:Currently, the data to be explored and exploited by computing systems increases at an exponential rate. The massive amount of data, or so-called “Big Data”, puts pressure on existing technologies to provide scalable, fast and efficient support. Recent applications and the current user support from multi-domain computing have assisted in migrating from data-centric to knowledge-centric computing. However, it remains a challenge to optimally store and place or migrate such huge data sets across data centers (DCs). In particular, due to the frequent change of application and DC behaviour (i.e., resources or latencies), data access or usage patterns need to be analyzed as well. Primarily, the main objective is to find a better data storage location that improves the overall data placement cost as well as the application performance (such as throughput). In this survey paper, we provide a state-of-the-art overview of Cloud-centric Big Data placement together with the data storage methodologies. It is an attempt to highlight the actual correlation between these two in terms of better supporting Big Data management. Our focus is on management aspects which are seen under the prism of non-functional properties. In the end, readers can appreciate the deep analysis of respective technologies related to the management of Big Data and be guided towards their selection in the context of satisfying their non-functional application requirements. Furthermore, challenges are highlighted, marking the current gaps in Big Data management and the way it needs to evolve in the near future.

Authors:Amir Taherkordi, Feroz Zahid, Yiannis Verginadis, Geir Horn

Published:IEEE Access 6: 74120-74150 (2018)

Abstract:Cloud computing has been recognized as the de facto utility computing standard for hosting and delivering services over the Internet. Cloud platforms are being rapidly adopted by business owners and end-users thanks to their many benefits over traditional computing models, such as cost savings, scalability, unlimited storage, anytime-anywhere access, better security, and high fault-tolerance capability. However, despite the fact that clouds offer huge opportunities and services to the industry, the landscape of cloud computing research is evolving for several reasons, such as emerging data-intensive applications, multicloud deployment models, and stricter non-functional requirements on cloud-based services. In this paper, we develop a comprehensive taxonomy of the main cloud computing research areas, discuss state-of-the-art approaches for each area and the associated sub-areas, and highlight the challenges and future directions per research area. The survey framework presented in this paper provides useful insights and an outlook for cloud computing research and development, allows a broader understanding of the design challenges of cloud computing, and sheds light on the future of this fast-growing utility computing paradigm.

Authors:Dipesh Pradhan, Feroz Zahid

Published:http://voyager.ce.fit.ac.jp/conf/aina/2019/

Abstract:Geographically-distributed application deployments are critical for a variety of cloud applications, such as those employed in the Internet-of-Things (IoT), edge computing, and multimedia. However, selecting appropriate cloud data centers for the applications, from a large number of available locations, is a difficult task. Users need to consider several different aspects in the data center selection, such as inter-data center network performance, data transfer costs, and the application requirements with respect to the network performance.

This paper proposes a data center clustering mechanism to group befitting cloud data centers together in order to automate the data center selection task as governed by the application needs. Employing our clustering mechanism, we present four different types of clustering schemes, with different importance given to available bandwidth, latency, and cloud costs between pairs of data centers. The proposed clustering schemes are evaluated using a large number of data centers from two major public clouds, Amazon Web Services and Google Cloud Platform. The results, based on a comprehensive empirical evaluation of the quality of the obtained clusters, show that the proposed clustering schemes are very effective in optimizing data center selection as per the application requirements.
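
To make the idea concrete, the following minimal Python sketch (not the paper's implementation) groups data centers by a weighted pairwise distance built from latency, bandwidth, and transfer cost; all data center names, measurements, weights, and thresholds are invented examples, and the paper's four schemes correspond to different weightings of such factors.

```python
# Illustrative sketch (not the paper's implementation) of grouping data centers by a
# weighted pairwise distance combining latency, bandwidth, and transfer cost.
# All data center names, measurements, weights, and thresholds are invented examples.
import itertools

latency_ms = {("eu-west", "eu-north"): 18, ("eu-west", "us-east"): 85, ("eu-north", "us-east"): 95}
bandwidth_mbps = {("eu-west", "eu-north"): 900, ("eu-west", "us-east"): 250, ("eu-north", "us-east"): 220}
cost_per_gb = {("eu-west", "eu-north"): 0.02, ("eu-west", "us-east"): 0.09, ("eu-north", "us-east"): 0.09}

def distance(a, b, w_lat=0.5, w_bw=0.3, w_cost=0.2):
    """Weighted distance: lower latency, higher bandwidth and lower cost mean 'closer'."""
    get = lambda table: table.get((a, b), table.get((b, a)))
    return (w_lat * get(latency_ms) / 100.0
            + w_bw * (1.0 - get(bandwidth_mbps) / 1000.0)
            + w_cost * get(cost_per_gb) / 0.10)

def cluster(dcs, threshold=0.5):
    """Greedy single-link clustering: merge clusters whose closest pair is below the threshold."""
    clusters = [{dc} for dc in dcs]
    merged = True
    while merged:
        merged = False
        for a, b in itertools.combinations(clusters, 2):
            if min(distance(x, y) for x in a for y in b) < threshold:
                clusters.remove(a)
                clusters.remove(b)
                clusters.append(a | b)
                merged = True
                break
    return clusters

print(cluster(["eu-west", "eu-north", "us-east"]))
```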

Authors:Katarzyna Materka, Geir Horn, Tomasz Przeździęk, Paweł Skrzypek

Published:http://voyager.ce.fit.ac.jp/conf/aina/2019/

Abstract:Cost savings is one of the main motivations for deploying commercial applications in the Cloud. These savings are more pronounced for applications with varying computational needs, like Computational Intelligence (CI) applications. However, continuously deploying, adapting, and decommissioning the provided Cloud resources manually is challenging, and autonomous deployment support is necessary. This paper discusses the specific challenges of CI applications and provides calculations showing that dynamic use of Cloud resources results in significant cost benefits for CI applications.
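
The abstract refers to cost calculations; the sketch below illustrates the underlying arithmetic with made-up prices and a hypothetical hourly demand profile, comparing static peak provisioning against dynamic provisioning that follows the CI workload.

```python
# A minimal, hypothetical cost comparison in the spirit of the paper's argument:
# a CI workload with strongly varying computational needs is cheaper to run on
# dynamically provisioned cloud resources than on statically sized ones.
# All prices, instance counts, and hours below are made-up example values.

PRICE_PER_VM_HOUR = 0.40  # hypothetical on-demand price

# Hourly demand profile for one day: a few heavy training/optimisation bursts,
# little work the rest of the time (number of VMs actually needed per hour).
demand = [1, 1, 1, 1, 2, 8, 16, 16, 16, 4, 2, 2, 1, 1, 12, 12, 2, 1, 1, 1, 1, 1, 1, 1]

static_vms = max(demand)                        # sized for the peak, runs all day
static_cost = static_vms * len(demand) * PRICE_PER_VM_HOUR
dynamic_cost = sum(demand) * PRICE_PER_VM_HOUR  # pay only for what each hour needs

print(f"static peak provisioning: {static_cost:.2f} USD/day")
print(f"dynamic provisioning:     {dynamic_cost:.2f} USD/day")
print(f"savings: {100 * (1 - dynamic_cost / static_cost):.0f}%")
```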

Authors:Thomas Dreibholz, Paweł Skrzypek, Geir Horn, Tomasz Przeździęk, Kasia Materka, Feroz Zahid, Nicoly Mohebi, Yannis Verginadis

Published:http://voyager.ce.fit.ac.jp/conf/aina/2019/

Abstract:Research undertakings in cloud computing often involve designing new algorithms, techniques, and solutions that require large-scale cloud deployments for comprehensive evaluation. Simulations are a powerful and cost-effective tool for testing, evaluation, and repeated experimentation with new cloud algorithms. Unfortunately, even though cloud federation and hybrid cloud simulations are explored in the literature, Cross-Cloud simulations are still largely an unsupported feature in most popular cloud simulation frameworks.
In this paper, we present a Cross-Cloud simulation framework, which makes it possible to test scheduling and reasoning algorithms on Cross-Cloud deployments with arbitrary workloads. The support of Cross-Cloud simulations, where individual application components are allowed to be deployed on different cloud platforms, can be a valuable asset in selecting an appropriate mixture of cloud services for the applications. We also implement a Cross-Cloud aware reasoner using our Cross-Cloud simulation framework. Simulations using both simple applications and complex multi-stage workflows show that the Cross-Cloud aware reasoner can substantially reduce cloud usage costs for most multi-component cloud applications.
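
As a rough illustration of why a Cross-Cloud aware reasoner can reduce costs, the following hypothetical Python sketch (not the framework or reasoner from the paper) picks the cheapest feasible offer per component instead of forcing the whole application onto a single provider; provider names and prices are invented.

```python
# Hypothetical sketch of the idea behind a Cross-Cloud aware reasoner (not the
# paper's implementation): when each application component may be placed on a
# different cloud, the cheapest fitting offer can be picked per component,
# instead of deploying the whole application on a single provider.
# Provider names and prices are invented example data.

offers = {  # price per hour of an offer that fits each component's requirements
    "frontend": {"cloud-a": 0.10, "cloud-b": 0.12},
    "worker":   {"cloud-a": 0.45, "cloud-b": 0.30},
    "database": {"cloud-a": 0.25, "cloud-b": 0.28},
}

def single_cloud_cost(provider):
    return sum(component[provider] for component in offers.values())

def cross_cloud_plan():
    return {name: min(prices, key=prices.get) for name, prices in offers.items()}

best_single = min(("cloud-a", "cloud-b"), key=single_cloud_cost)
plan = cross_cloud_plan()
cross_cost = sum(offers[c][p] for c, p in plan.items())

print(f"best single cloud ({best_single}): {single_cloud_cost(best_single):.2f}/h")
print(f"cross-cloud plan {plan}: {cross_cost:.2f}/h")
```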

Authors:Daniel Seybold, Jörg Domaschka

Published:https://icpe2019.spec.org/

Abstract:Big Data and IoT applications require highly scalable database management systems (DBMSs), preferably operated in the cloud to ensure scalability also at the resource level. As the number of existing distributed DBMSs is extensive, the selection and operation of a distributed DBMS in the cloud is a challenging task. While DBMS benchmarking is a supportive approach, existing frameworks do not cope with the runtime constraints of distributed DBMSs and the volatility of cloud environments. Hence, DBMS evaluation frameworks need to consider DBMS runtime and cloud resource constraints to enable portable and reproducible results. In this paper, we present Mowgli, a novel evaluation framework that enables the evaluation of non-functional DBMS features in correlation with DBMS runtime and cloud resource constraints. Mowgli fully automates the execution of cloud- and DBMS-agnostic evaluation scenarios, including DBMS cluster adaptations. The evaluation of Mowgli is based on two IoT-driven scenarios, comprising the DBMSs Apache Cassandra and Couchbase, nine DBMS runtime configurations, and two cloud providers with two different storage backends. Mowgli automates the execution of the resulting 102 evaluation scenarios, verifying its support for portable and reproducible DBMS evaluations. The results provide extensive insights into DBMS scalability and the impact of different cloud resources. The significance of the results is validated by the correlation with existing DBMS evaluation results.
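
The following generic sketch (not Mowgli's actual API) illustrates how such a framework can fully automate a matrix of DBMS and cloud evaluation scenarios; the dimensions and values are hypothetical placeholders and do not reproduce the paper's 102 scenarios.

```python
# Generic illustration (not Mowgli's actual interface) of fully automated execution
# of a matrix of DBMS/cloud evaluation scenarios. All dimensions and values below
# are hypothetical placeholders.
from itertools import product

dbms_choices = ["cassandra", "couchbase"]
cluster_sizes = [1, 3, 5]                 # example runtime configurations
clouds = ["provider-x", "provider-y"]
storage_backends = ["local-ssd", "remote-volume"]
workloads = ["iot-write-heavy", "iot-read-heavy"]

def run_scenario(scenario):
    """Placeholder for: provision cloud resources, deploy the DBMS cluster,
    apply the workload, collect metrics, and release the resources."""
    print("running", scenario)

for scenario in product(dbms_choices, cluster_sizes, clouds, storage_backends, workloads):
    run_scenario(dict(zip(
        ["dbms", "cluster_size", "cloud", "storage", "workload"], scenario)))
```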

Authors:Daniel Baur, F. Griesinger, Yiannis Verginadis, Vasilis Stefanidis, Ioannis Patiniotakis

Abstract:Cloud computing and its utility computing paradigm offer on-demand resources, enabling its users to seamlessly adapt applications to the current demand. With its (virtually) unlimited elasticity, managing deployed applications becomes more and more complex, raising the need for automation. Such autonomous systems need to constantly monitor and analyse the deployed workload and the underlying infrastructure, which serve as the knowledge base for deriving corrective actions like scaling. Existing monitoring solutions, however, are not designed to cope with a frequently changing topology. We propose a monitoring and event processing framework following a model-driven approach that allows users to i) express the monitoring demand by directly referencing entities of the deployment context, ii) aggregate the monitoring data using mathematical expressions, iii) trigger and process events based on the monitoring data, and finally iv) attach scalability rules to those events. We accompany the modelling language with a monitoring orchestration and distributed complex event processing framework, capable of enacting the model in a frequently changing multi-cloud infrastructure, considering cloud-specific aspects like communication costs.
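
The chain the abstract describes, from raw measurements through aggregation to events and scalability rules, can be pictured with the following hypothetical Python sketch; it is not the proposed modelling language, and all entity names and thresholds are invented.

```python
# Hypothetical sketch (not the proposed modelling language) of the chain described in
# the abstract: raw measurements -> aggregation expression -> event -> scalability rule.
from statistics import mean

# i) monitoring demand: CPU load reported per instance of a deployment entity "worker"
raw_cpu = {"worker-1": 0.91, "worker-2": 0.87, "worker-3": 0.45}

# ii) aggregation via a mathematical expression over the collected data
avg_cpu = mean(raw_cpu.values())

# iii) an event is triggered when the aggregated value crosses a threshold
event = {"name": "WorkerOverload", "value": avg_cpu} if avg_cpu > 0.7 else None

# iv) a scalability rule attached to that event derives a corrective action
if event is not None:
    action = {"action": "scale-out", "entity": "worker", "add_instances": 1}
    print(event, "->", action)
```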

Authors:Kyriakos Kritikos, Geir Horn

Abstract:Cloud computing is a paradigm that has revolutionized the way service-based applications are developed and provisioned due to the main benefits that it introduces, including more flexible pricing and resource management. The most widely used kind of cloud service is Infrastructure-as-a-Service (IaaS). In this kind of service, infrastructure in the form of a VM is offered, over which users can create a suitable environment for provisioning their application components. By following the micro-service paradigm, not just one but multiple cloud services are required to provision an application. This leads to the need to solve an optimisation problem for selecting the right IaaS services according to the user requirements. The current techniques employed to solve this problem are either exhaustive, and thus not scalable, or adopt heuristics, sacrificing optimality for a reduced solving time. In this respect, this paper proposes a novel technique which involves modelling the optimisation problem in a different form than the most common one. In particular, this form enables the use of exhaustive techniques, like constraint programming (CP), such that an optimal solution is delivered in a much more scalable manner. The main benefits of this technique are highlighted through an experimental evaluation against a classical CP-based exhaustive approach.
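
As a minimal illustration of IaaS selection posed as a constraint optimisation problem, the sketch below uses Google OR-Tools CP-SAT as a stand-in solver; the paper's own problem formulation differs, and all offers, requirements, and prices are hypothetical.

```python
# Minimal sketch of IaaS service selection as a constraint optimisation problem,
# using OR-Tools CP-SAT as a stand-in solver (the paper's formulation differs).
# Components, offers, requirements, and prices are hypothetical examples.
from ortools.sat.python import cp_model

components = {  # minimum requirements per application component
    "api":    {"cpu": 2, "ram": 4},
    "worker": {"cpu": 8, "ram": 16},
}
offers = [  # candidate IaaS offers: (name, vCPUs, RAM in GB, price per hour in cents)
    ("small", 2, 4, 5), ("medium", 4, 8, 9), ("large", 8, 16, 17), ("xlarge", 16, 32, 33),
]

model = cp_model.CpModel()
choice = {}  # choice[c, o] == 1 iff component c runs on offer o
for c, req in components.items():
    vars_for_c = []
    for o, (name, cpu, ram, price) in enumerate(offers):
        v = model.NewBoolVar(f"{c}_{name}")
        if cpu < req["cpu"] or ram < req["ram"]:
            model.Add(v == 0)  # offer does not satisfy the component's requirements
        choice[c, o] = v
        vars_for_c.append(v)
    model.Add(sum(vars_for_c) == 1)  # exactly one offer per component

model.Minimize(sum(choice[c, o] * offers[o][3]
                   for c in components for o in range(len(offers))))

solver = cp_model.CpSolver()
if solver.Solve(model) == cp_model.OPTIMAL:
    for c in components:
        picked = next(offers[o][0] for o in range(len(offers)) if solver.Value(choice[c, o]))
        print(c, "->", picked)
    print("total cost:", solver.ObjectiveValue(), "cents/hour")
```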

Author:Daniel Seybold

Published:https://sites.google.com/view/cbdp18/

Abstract:Containers have emerged as a cloud resource offering. While the advantages of containers, such as easing application deployment, orchestration and adaptation, work well for stateless applications, the feasibility of containerizing stateful applications, such as database management systems (DBMSs), still remains unclear due to potential performance overhead. The myriad of container operation models and storage backends further raises the complexity of operating a containerized DBMS. Here, we present an extensible evaluation methodology to identify the performance overhead of a containerized DBMS by combining three operational models and two storage backends. For each combination, a memory-bound and a disk-bound workload are applied. The results show a clear performance overhead for a containerized DBMS on top of virtual machines (VMs) compared to physical resources. Further, a containerized DBMS on top of VMs with different storage backends results in a tolerable performance overhead. Building upon these baseline results, we derive a set of open evaluation challenges for containerized DBMSs.
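
The kind of overhead comparison such a methodology produces can be illustrated with the following sketch, which computes the relative throughput loss of each operational model against a physical-resource baseline; all throughput numbers and model names are invented examples, not results from the paper.

```python
# Illustrative calculation (invented numbers, not the paper's results) of the overhead
# comparison the evaluation methodology produces: relative throughput loss of each
# operational model against the physical-resource baseline, per workload.
baseline = {"memory-bound": 42_000, "disk-bound": 9_500}   # ops/s on physical resources
measured = {  # hypothetical ops/s per operational model
    "container-on-physical": {"memory-bound": 40_300, "disk-bound": 9_100},
    "vm":                    {"memory-bound": 37_800, "disk-bound": 8_000},
    "container-on-vm":       {"memory-bound": 35_900, "disk-bound": 7_400},
}

for op_model, results in measured.items():
    for workload, ops in results.items():
        overhead = 100 * (1 - ops / baseline[workload])
        print(f"{op_model:22s} {workload:12s} overhead: {overhead:5.1f}%")
```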

Author:Kyriakos Kritikos

Conference:CLOSER 2019 (9th International Conference on Cloud Computing and Service Science)

Abstract:Currently, there is a move towards adopting multi-clouds due to their main benefits, including vendor lock-in avoidance and optimal application realisation via different cloud services. However, such multi-cloud applications face a new challenge related to the dynamicity and uncertainty that even a single cloud environment exhibits. As such, they cannot deliver a suitable service level to their customers, resulting in SLA penalty costs and a reduced reputation for the application provider. To this end, we have previously proposed a cross-level and multi-cloud application adaptation architecture. Towards realising this architecture, this paper proposes two extensions of the CAMEL language allowing the specification of advanced adaptation rules and histories. Such extensions not only enable cross-level application adaptation by executing adaptation workflows but also allow such adaptation to progress so as to address the evolution of both the application and the exploited cloud services.

Authors:Feroz Zahid, Amir Taherkordi, Ernst Gunnar Gran, Tor Skeie, Bjørn Dag Johnsen

Published:IEEE Transactions on Parallel and Distributed Systems (Volume: 29, Issue: 12, Dec. 1 2018)

Abstract:Clouds offer flexible and economically attractive compute and storage solutions for enterprises. However, the effectiveness of cloud computing for high-performance computing (HPC) systems still remains questionable. When clouds are deployed on lossless interconnection networks, like InfiniBand (IB), challenges related to load-balancing, low-overhead virtualization, and performance isolation hinder full potential utilization of the underlying interconnect. Moreover, cloud data centers incorporate a highly dynamic environment rendering static network reconfigurations, typically used in IB systems, infeasible. In this paper, we present a framework for a self-adaptive network architecture for HPC clouds based on lossless interconnection networks, demonstrated by means of our implemented IB prototype. Our solution, based on a feedback control and optimization loop, enables the lossless HPC network to dynamically adapt to the varying traffic patterns, current resource availability, and workload distributions, in accordance with the service provider-defined policies. Furthermore, we present IBAdapt, a simplified rule-based language for the service providers to specify adaptation strategies used by the framework. Our developed self-adaptive IB network prototype is demonstrated using state-of-the-art industry software. The results obtained on a test cluster demonstrate the feasibility and effectiveness of the framework when it comes to improving Quality-of-Service compliance in HPC clouds.
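
The feedback control and optimisation loop mentioned in the abstract can be pictured with the schematic Python sketch below; it is neither the IBAdapt language nor the IB prototype, and the metrics, policy, and thresholds are invented.

```python
# Schematic sketch of a feedback control loop as described in the abstract
# (monitor -> check provider policy -> reconfigure). This is not IBAdapt or the
# IB prototype; all metric names and thresholds are invented examples.
import time

def monitor_network():
    """Placeholder: collect the current traffic pattern and resource availability."""
    return {"max_link_utilisation": 0.93, "tenant_isolation_violations": 0}

def violates_policy(state, policy):
    return state["max_link_utilisation"] > policy["max_link_utilisation"]

def reconfigure(state):
    """Placeholder: compute and install a new routing/reconfiguration plan."""
    print("triggering network reconfiguration for state", state)

policy = {"max_link_utilisation": 0.85}  # provider-defined adaptation strategy

def control_loop(iterations=3, interval_s=1):
    for _ in range(iterations):
        state = monitor_network()
        if violates_policy(state, policy):
            reconfigure(state)
        time.sleep(interval_s)

control_loop()
```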

Author:Daniel Seybold

Published:2018 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC)

Abstract:With the rapid rise of the cloud computing paradigm, the manual maintenance and provisioning of the technological layers behind it, both in their hardware and virtualized form, has become cumbersome and error-prone. This has opened up the need for automated capacity planning strategies in heterogeneous cloud computing environments. However, even with mechanisms to fully accommodate customers and fulfill service-level agreements, providers often tend to over-provision their hardware and virtual resources. A proliferation of unused capacity leads to higher energy costs and, correspondingly, higher prices for cloud technology services. Capacity planning algorithms rely on data collected from the utilized resources. Yet, the amount of data aggregated through the monitoring of hardware and virtual instances does not allow for manual supervision, much less manual data analysis, correlation, or anomaly detection. Current data science advancements enable efficient automation, scheduling and provisioning of cloud computing resources based on supervised and unsupervised machine learning techniques. In this work, we present the current state of the art in monitoring, storage, analysis and adaptation approaches for the data produced by cloud computing environments, to enable proactive, dynamic resource provisioning.

Author:Kyriakos Kritikos

Published:2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)

Abstract:Nowadays, data are being produced at a very fast pace. This leads to the generation of big data that need to be properly managed, especially due to the increased complexity that their size introduces. Such data are usually subject to further processing to obtain added-value knowledge out of them. Current systems seem to focus on performing this processing more optimally, while neglecting that data placement can have a tremendous effect on the processing performance. In this respect, big data placement algorithms have already been proposed. However, most of them are either suggested in isolation from the big data processing system or are not dynamic enough to deal with required big data placement changes at runtime. As such, this paper proposes a novel, dynamic big data placement algorithm which can find the best placement solution by considering multiple optimisation objectives and solving the big data placement problem more precisely than the state of the art. Further, a novel suggestion for optimally combining such an algorithm with a big data application management system is proposed, so as to be able to address big data placement, processing and resource management issues in conjunction. Respective experimental evaluation results showcase the efficiency of our algorithm in producing optimal big data placement solutions.
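
A toy sketch of multi-objective data placement follows; it is not the paper's algorithm, but it shows the basic idea of scoring candidate data centers against several objectives and picking the best weighted trade-off, with invented metrics and weights.

```python
# Toy sketch of multi-objective data placement (not the paper's algorithm): score
# candidate data centers for a data set against several objectives and pick the best
# weighted trade-off. All metrics and weights are invented example values.
candidates = {  # per data center: normalised cost, access latency, migration penalty
    "dc-eu":   {"cost": 0.30, "latency": 0.20, "migration": 0.10},
    "dc-us":   {"cost": 0.25, "latency": 0.60, "migration": 0.50},
    "dc-asia": {"cost": 0.20, "latency": 0.80, "migration": 0.70},
}
weights = {"cost": 0.4, "latency": 0.4, "migration": 0.2}  # lower score is better

def score(metrics):
    return sum(weights[objective] * value for objective, value in metrics.items())

best = min(candidates, key=lambda dc: score(candidates[dc]))
print({dc: round(score(m), 3) for dc, m in candidates.items()}, "->", best)
```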

Authors:Yiannis Verginadis, Vasilis Stefanidis, Ioannis Patiniotakis, Gregoris Mentzas

Published:http://esocc2018.disco.unimib.it/

Abstract:In the last few years, the generation of vast amounts of heterogeneous data with different velocity and veracity, and the requirement to process them, have significantly challenged the computational capacity and efficiency of modern infrastructural resources. The propagation of Big Data among different processing and storage architectures has amplified the need for adequate and cost-efficient infrastructures to host them. An overabundance of cloud service offerings is currently available and is being rapidly adopted by small and medium enterprises based on its many benefits over traditional computing models. However, at the same time the Big Data computing requirements pose new research challenges that question the adoption of single cloud provider resources. Nowadays, we discuss the emerging data-intensive applications that necessitate the wide adoption of multicloud deployment models, in order to use all the advantages of cloud computing. Adequate distributed monitoring mechanisms are a key tool for managing such multicloud applications and guaranteeing their quality of service, even in extreme scenarios of workload fluctuations. In this work, we discuss a distributed complex event processing architecture that automatically follows the big data application deployment in order to efficiently monitor its health status and detect reconfiguration opportunities. This proposal is examined against an illustrative scenario and is preliminarily evaluated to reveal its performance.
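
A minimal complex-event-processing style sketch of the monitoring idea is shown below; it is not the proposed architecture, but it illustrates how a sliding window over metric events can detect a sustained threshold breach and signal a reconfiguration opportunity, using invented values.

```python
# Minimal complex-event-processing style sketch (not the proposed architecture):
# a sliding window over per-component metric events detects a sustained threshold
# breach and emits a reconfiguration-opportunity event. All values are invented.
from collections import deque

WINDOW = 5          # number of most recent measurements considered
THRESHOLD = 0.80    # CPU utilisation above which a breach is counted

window = deque(maxlen=WINDOW)

def on_metric_event(component, cpu):
    window.append(cpu)
    if len(window) == WINDOW and all(value > THRESHOLD for value in window):
        print(f"reconfiguration opportunity: {component} sustained CPU > {THRESHOLD}")

for sample in [0.70, 0.82, 0.85, 0.88, 0.91, 0.93]:
    on_metric_event("stream-processor", sample)
```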

Author:Yiannis Verginadis

Published:http://inista.org/

Abstract:Cloud computing has been recognized as the most prominent way of hosting and delivering services over the Internet. A plethora of cloud service offerings is currently available and is being rapidly adopted by small and medium enterprises as well as larger organisations based on their many advantages over traditional computing models. However, at the same time the computing requirements of modern cloud applications have increased exponentially due to the big data available for processing. Nowadays, we discuss the emerging data-intensive applications that necessitate the wide adoption of multi-cloud deployment models, in order to use all the advantages of cloud computing without any restrictions with respect to who is providing infrastructural services. In this paper, we discuss a Metadata Schema for data-aware multi-cloud computing which aspires to form the appropriate background vocabulary to aid big data-aware application deployment for distributed and loosely coupled multi-cloud applications.
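
Purely as a hypothetical illustration (not the actual Metadata Schema of the paper), the snippet below shows the kind of data-aware annotations such a vocabulary could capture and how simple deployment constraints might be derived from them.

```python
# Hypothetical example of data-aware annotations (not the paper's Metadata Schema):
# properties of a data set that a multi-cloud deployment reasoner may consult.
dataset_metadata = {
    "name": "sensor-readings",
    "size_gb": 1200,
    "velocity": "streaming",          # arrival pattern of new data
    "sensitivity": "personal-data",   # drives location / jurisdiction constraints
    "allowed_regions": ["EU"],
    "access_pattern": "write-heavy",
    "co_located_components": ["ingestion-service"],  # components that should be near the data
}

def placement_constraints(meta):
    """Derive simple deployment constraints from the metadata annotations."""
    constraints = []
    if meta.get("allowed_regions"):
        constraints.append(("region-in", meta["allowed_regions"]))
    if meta.get("co_located_components"):
        constraints.append(("co-locate-with", meta["co_located_components"]))
    return constraints

print(placement_constraints(dataset_metadata))
```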

Authors:Geir Horn, Paweł Skrzypek

Published:2018 32nd International Conference on Advanced Information Networking and Applications Workshops (WAINA)

Abstract:Cross-Cloud deployment of applications allows for many additional benefits, like using the best Cloud provider for a given application component, increasing reliability owing to the diversification of Cloud providers, and providing additional elasticity and capacity. On the other hand, in practical applications it is currently very difficult to properly plan and optimise the architecture of an application for cross-Cloud deployment. Different Cloud providers use different types of infrastructure, making direct comparisons difficult. Additionally, the requirements of the application could change over time and according to the application’s execution context, workload, users, and many other aspects. This paper presents the fundamentals of the MELODIC solution, based on a high-level model of the application and dynamic, Cloud provider-agnostic, optimised deployment and reconfiguration of the application.

Authors:Daniel Baur, Daniel Seybold, Frank Griesinger, Hynek Masata, Jörg Domaschka

Published: http://2018.middleware-conference.org/index.php/call-for-doctoral-symposium-papers//

Abstract: Orchestrating workloads in a multi-cloud environment is a challenging task, as one needs to overcome vendor lock-in and select a matching offer from a large and heterogeneous market. Yet, existing cloud management tools rely on provider-dependent models and manual selection, making runtime changes to the selection in case of provider failures impossible.
We propose a provider-agnostic, workload-centric approach to multi-cloud orchestration relying on a constraint language that allows automatic selection and runtime management of cloud resources, overcoming, e.g., provider failures.
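
The idea can be sketched as follows; this is not the proposed constraint language, but a hypothetical Python illustration of stating provider-agnostic requirements, automatically selecting a matching offer, and re-running the selection at runtime when the chosen provider fails.

```python
# Hypothetical sketch (not the proposed constraint language) of the idea: a workload
# states provider-agnostic constraints, a matching offer is selected automatically,
# and the selection is re-run at runtime when the chosen provider fails.
offers = [  # invented market of cloud offers
    {"provider": "provider-a", "cores": 4, "ram_gb": 16, "region": "EU", "price": 0.19},
    {"provider": "provider-b", "cores": 4, "ram_gb": 16, "region": "EU", "price": 0.17},
    {"provider": "provider-c", "cores": 8, "ram_gb": 32, "region": "US", "price": 0.30},
]
constraints = {"min_cores": 4, "min_ram_gb": 16, "region": "EU"}  # workload requirements

def select(offers, constraints, excluded_providers=()):
    matching = [o for o in offers
                if o["cores"] >= constraints["min_cores"]
                and o["ram_gb"] >= constraints["min_ram_gb"]
                and o["region"] == constraints["region"]
                and o["provider"] not in excluded_providers]
    return min(matching, key=lambda o: o["price"]) if matching else None

current = select(offers, constraints)
print("initial selection:", current)

# runtime management: the chosen provider fails, so selection is repeated without it
current = select(offers, constraints, excluded_providers={current["provider"]})
print("after provider failure:", current)
```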

Author:Daniel Seybold

Published:http://2017.middleware-conference.org/

Abstract:The selection and operation of a distributed database management system (DDBMS) in the cloud is a challenging task, as supportive evaluation frameworks lack orchestrated evaluation scenarios, hindering comparable and reproducible evaluations on heterogeneous cloud resources. We propose a novel evaluation approach that supports orchestrated evaluation scenarios for scalability, elasticity and availability by exploiting cloud resources. We highlight the challenges in evaluating DDBMSs in the cloud and introduce a cloud-centric framework for orchestrated DDBMS evaluation, enabling reproducible evaluations and significant rating indices.