Open Access

Energy efficiency in big data complex systems: a comprehensive survey of modern energy saving techniques

Complex Adaptive Systems Modeling 2015, 3:6

DOI: 10.1186/s40294-015-0012-5

Received: 7 July 2015

Accepted: 3 December 2015

Published: 15 December 2015

Abstract

The growing need for computation and processing has led to the rise of data centers, which usually comprise hundreds of thousands of servers and other components. Such complicated arrangements are best understood as complex systems. Complex systems pervade our society as combinations of many entities, e.g., the immune system, the human brain and ecosystems. The entities adapt and interact through nonlinear interactions, and the interaction between the components of a complex system is carried out in a distributed fashion. Big data infrastructure, which comprises thousands of machines, is also considered a form of complex adaptive system, with many entities and components interacting nonlinearly with each other. The development of such complex systems raises certain challenges. Apart from management, energy is the most pressing of these and is the core topic of this research. This paper surveys the state of the art in modern tools, techniques, architectures and algorithms that have been proposed and deployed to achieve energy efficiency in big data over the period 2007–2015. We group existing approaches aimed at achieving energy efficiency in the complex paradigm of big data. With this categorization, we aim to provide an easy and concise view of the underlying model adopted by each approach in the context of big data.

Keywords

Energy efficiency; Big data; Complex system; Data centers; Adoption; Immune system

Introduction

Owing to advances in computer technology, computer systems have become widespread and complex. Such a complex arrangement of systems results in a complex adaptive system (CAS). A huge increase in the scale and complexity of systems has been observed over the last few years (Niazi and Laghari 2012). CASs exist in different forms. Such systems are managed with efficient algorithms and controlled by different computational methods (Batool and Niazit 2015). Big data is one example of such complex systems. A CAS comprises a large number of entities and components that must interact and adapt in order to perform certain operations. The nature of big data is more or less the same; in other words, big data is one form of complex system. Most CASs are tied to nature and society, e.g., the immune system, the human brain (neuron structure), ecosystems and human societies.

Big data provides numerous services and infrastructures to companies and has opened new research directions in computer science. Most current cloud computing applications use distributed computing with varying degrees of connectivity and interaction. Big data provides computation and efficient processing to millions of users, at a level of complexity comparable to that of a CAS (Habiba et al. 2014). Apart from the complexity of the CAS, achieving energy efficiency in cloud computing and big data is a global challenge. Researchers have proposed many methods to reduce power consumption in cloud and big data infrastructure. Most of these solutions propose powering off unused components; others focus on the optimal distribution of data among different components (Negru et al. 2013). Cloud computing provides numerous services to users but poses certain challenges because of its complex nature. The devices used in the cloud are so numerous that such a system is even more complex than the structure of the human brain. Beyond complexity, clouds also face challenges such as the security and privacy of data. Big data allows users to host, access and process data at any time. The volume of data grows enormously day by day, and the era of big data has undoubtedly arrived. Big data requires different management techniques to help communities (e.g., users) perform their tasks quickly and efficiently. CAS helps in modeling user behavior, which in turn helps cloud providers manage users efficiently. To build an energy efficient cloud infrastructure, we must understand the interactions between the power-consuming components of the complex system, so that energy requirements can be met while estimating the trade-off between power and performance.

The volume of data is increasing at an astonishing speed: 90 % of the available data was created in just the last 2 years (Habiba et al. 2014). Facebook alone processes nearly 500 terabytes of data daily (Kumar et al. 2014). The Large Hadron Collider (LHC) computing grid also generates vast amounts of data. Dozens of petabytes of data are produced daily, and their dissemination, transmission and processing consume a huge amount of energy (Shuja et al. 2012). However, these data generators do not address how energy can be saved and used wisely to meet this ever-increasing demand for data. GreenHadoop contributes to energy efficiency by using solar energy (Menon 2012), but it hits a bottleneck when the weather is cloudy for many days. Hadoop also uses techniques such as MapReduce, which deals with how effectively a query is answered but is not concerned with energy efficiency.

Big data helps companies solve business problems with ease. It uses hardware, software, algorithms and many related techniques to perform the desired functions, and relies on standardized approaches to help users perform their tasks. Big data helps users by ensuring that the desired data is always accessible. However, the systems, servers, components and subsystems that serve users consume an enormous amount of energy. While big data serves users with its unique features, it also faces a variety of challenges. With respect to the volume of data, the first issue is data storage and the second, also a major concern, is the privacy or integrity of the data. Users might be affected by viruses, Trojan horses and hackers. Another feature of big data is that information and data are always accessible to the user; in practice, however, data may become inaccessible because of a poor network connection. Alongside all these issues, energy is another important concern that needs to be addressed. Owing to technology trends and the growth of wired, wireless and mobile device networks, energy consumption has increased considerably. This increase has led to a huge demand for tools and techniques that can manage the growing demand for energy.

As the volume of data increases, more resources are required to hold it, and more energy is required to keep those resources stable (Ekanayake et al. 2008). However, no existing technique can efficiently address all energy consumption issues. Researchers and scientists are developing different techniques that aim to minimize energy consumption in big data. Energy consumption is a particular concern in cloud computing data centers, where thousands of computers, servers, routers, switches and bridges operate and consume thousands of kilowatts of power. Stakeholders of cloud computing are looking for energy efficient algorithms that reduce the cost of energy (Goiri 2012).

Although many surveys on energy efficiency in big data exist, the existing research does not provide thorough insight into energy efficiency in the context of big data and CASs. Our unique contribution is to present energy utilization methods, techniques and algorithms for CAS. In this paper, we provide a comprehensive evaluation of existing techniques in the form of tables (i.e., Tables 2, 3, 4, 5, 6, 7), and we extend the existing taxonomy of hardware based energy efficiency techniques, as shown in Fig. 5. We estimate energy consumption per server class for the year 2007 onward in Table 2. We provide a component based taxonomy of energy efficient techniques in Table 1. We examine big data in the context of complex adaptive systems and give an overview of the variety of services offered by cloud providers and the challenges they face. We further identify hardware and software based techniques and approaches used to meet the energy demands of the cloud, and outline the techniques and their characteristics in tabular form. We survey the latest literature over the period 2007–2015. Finally, we present our findings on the technique that, despite some limitations, we consider comparatively the best for energy efficiency.

Table 1 Energy efficiency techniques, component wise taxonomy

| Name of components | Techniques |
| --- | --- |
| Processor | Speed Step; Power now; Cool & Quiet; Demand based Switching; DVS (Dynamic voltage scaling, i.e., EDP, AGGREE, A2E etc.) |
| Disks | Solid states disks |
| System | Low power Mode of operation; Hardware level (CPU sleep/doze/busy, LCD on/off); Software Level (Time Driven Sampling, Energy Driven sampling); Soft watt built upon Sim (OS) System Simulator |
| Clustered system | CVS (co-ordinate voltage Scaling) |
| Approaches | DPM (Dynamic Power Management); SPM (Static Power Management); Virtualization |
| Networking | Energy efficient and clustering; Power based query optimization; Sampling |

The remainder of the paper is organized as follows: “Background” describes the background of big data services, the key challenges of big data and an overview of energy efficient techniques. “Critical analysis of existing surveys” provides a critical analysis of existing surveys. “Energy efficient techniques” details the different techniques used in big data; in that section we also evaluate each technique against certain parameters in the context of big data. We provide our summary and findings in “Summary and findings”. Some open issues with DVFS are elaborated in “Open issues”. The paper is concluded in “Conclusion and future work”, where future directions are also elaborated.

Background

Efficient energy consumption remains a concern for researchers and experts because excessive energy consumption depletes natural resources, which in turn increases pollution and causes health hazards. According to a survey (Goiri 2012), there is a 6 % increase in CO2 emissions from the information technology (IT) sector, which is also a great hazard for human health. In recent years, organizations such as IBM, Google and Microsoft have built data centers in which thousands of machines run and consume large amounts of energy. To cope with this energy challenge, different techniques have been developed to minimize energy consumption in data centers.

Dealing with energy efficiency is necessary; otherwise, within a few years the cost of energy will exceed the cost of hardware. To address this issue, different software and hardware based techniques have been proposed and deployed in data centers (Bosilca et al. 2012). The energy consumed in a big data center is computed by determining how much energy each device consumes while it is operating. Efficient utilization of energy has drawn increasing attention from both cost and environmental perspectives (Chen et al. 2014). Operating large numbers of machines in the cloud infrastructure results in CO2 emissions. The use of the Internet, the exchange of data over it, and processing and analytical demands all result in considerable energy consumption. Therefore, a power consumption methodology and the control and balancing of power resources are necessary, along with the expandability and accessibility of big data.
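
The per-device accounting described above can be sketched as a sum of power multiplied by operating time. This is only an illustrative sketch: the `Device` type, the device names and the wattage and hour figures are hypothetical and are not measurements from the surveyed literature.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str           # hypothetical label
    power_watts: float  # assumed average draw while operating
    hours_on: float     # operating time within the accounting window


def energy_kwh(device: Device) -> float:
    """Energy of one device: power (W) x time (h), converted to kWh."""
    return device.power_watts * device.hours_on / 1000.0


def datacenter_energy_kwh(devices: list[Device]) -> float:
    """Total energy is the sum over all operating devices."""
    return sum(energy_kwh(d) for d in devices)


# Illustrative inventory (made-up numbers)
inventory = [
    Device("server-1", 250.0, 24.0),
    Device("server-2", 250.0, 12.0),
    Device("switch-1", 100.0, 24.0),
]
total = datacenter_energy_kwh(inventory)
print(f"{total:.1f} kWh")
```

In this toy inventory, halving the operating hours of one server removes its share of the total in direct proportion, which is the effect that power-off policies rely on.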

Different models have also been proposed for energy efficiency, but each runs into different bottlenecks because of service level and configuration changes (Krauth 2006). However, modern service providers have resolved this issue to a large extent. Every algorithm is believed to have some pitfalls. Resource usage, control of carbon emissions and domain-specific policies make it really challenging to build one common solution for all. Some newer techniques, e.g., virtualization and sampling, are also contributing to energy efficiency, as are MapReduce and the intelligent power saving architecture (ISPA) (Meisner et al. 2009; Moise and Carpen-Amarie 2012; Ibrahim et al. 2009; Jiang et al. 2010). Big data services are numerous and support the functioning of companies. Big data helps users perform their tasks with its unique quality of service. It supports networking services, which have helped companies develop CRM (Customer Relationship Management) and extend services to users through remote access and without time constraints.

In the following sections, we briefly present an overview of big data services, big data challenges and a critical review of different energy efficiency techniques in the context of CAS.

Big data services

Big data serves organizations in different operations, including management, which is one of the key operations performed by any company, whether large or small. Big data provides storage services; e.g., we can use provider storage to keep important data such as documents, employee details, business plans, strategies, log data, top secret documents and memory dumps. Big data providers generate profit by providing infrastructure, services and products to users. Overall, 15 % of the budget of the most popular companies goes to cloud providers. Big data poses different challenges related to services and configuration, and researchers have proposed solutions and made valuable contributions in this field. In the near future, big data will extend its services to other domains as well, e.g., security analytics, visualization development, better understanding of the threat domain, performance enhancement of engines, monitoring, managed service providers and vulnerability assessment. These are future markets too, and most companies need all of these products.

Big data provides custom software and an execution environment, which saves personal resources. Big data provides the CIA (Confidentiality, Integrity, and Availability) triad, which makes it trustworthy. Big data is always concerned with the integrity of data, which is key to building a trustworthy relationship between clients and service providers. Big data has opened forums on which everyone can express their feelings and share their experiences. Big data also maintains the integrity of individuals by allowing users to test different proofs on the stored data. All recent applications are distributed in nature and equally accessible to everyone, which is only possible through the presence of big data.

Big data allows users to debug and install services in the cloud. Big data has also helped the medical field by keeping data consistent and available to doctors at all times. Complex adaptive systems provide modeling and simulation to manage the growing user demand for services. They also provide a good basis for understanding the complex nature of different components and their functions. A few unique services provided by big data are summarized in Fig. 1.

Fig. 1 Overview of big data services

Big data provides services in all fields, including management, engineering and networking. Big data has also contributed to software development: some software is used only in clouds and is executed in the infrastructure provided by the cloud (Fig. 1). One of the unique features of big data is data storage, which is useful for maintaining data over the long term without theft or loss. Big data provides a roadmap for many operations that cannot be managed efficiently by hand. Big data provides intelligence capabilities, analytics and solutions, which are among its best services. Big data contributes to social networking as well. It helps create forums in which people provide efficient solutions to problems and work voluntarily for a cause. Big data has also opened different research directions for researchers to explore common trends and provide solutions to the associated problems. These solutions and the understanding of trends are helpful for the success of big data.

Big data challenges

The popularity of the cloud brings certain challenges, which require sound solutions if the cloud is to remain popular. Some of the common challenges in the context of big data are listed below.

Hacking

Hacking is the most common challenge of big data. The term hacking means obtaining, copying, reading or viewing confidential information of someone without authorization. Hacking is a global challenge, and many organizations spend millions of dollars to secure their infrastructure against it. Most big data services use cryptography to avoid such breaches and to deliver the desired level of functionality to users.

Unauthorized disclosure of data/information

Apart from hacking, which is carried out with sophisticated tools, the transfer of important documents and information over an insecure network needs protection during exchange. The unauthorized disclosure of information because of a weak network protocol or the misuse of a computer results in loss of reputation and trust. To secure communication and computer resources, a new field of network and computer forensics has emerged to address this global challenge of information security.

Performance bottleneck

With the popularity of cloud computing, one of the key challenges faced by cloud providers is to provide services as intended by users. Managing services in line with user demands is not an easy task. One of the key demands of users is better accessibility to services at any time. A situation in which the system does not perform as expected is referred to as a performance bottleneck. Performance bottlenecks usually arise from an increase in the number of users, many requests to a single server or heavy processing carried out at the server. However, cloud providers have managed this challenge by deploying many dedicated servers, taking full advantage of web services, and providing replicas of information across different servers and middleware.

Scalability

Scalability refers to the state of big data and the cloud when there is an expected increase in the number of resources or users. Ensuring that the system is available at all times is also a big challenge for the cloud provider. This can only be achieved when system resources are intelligently managed with respect to the workload, so that the service remains available when millions of users are visiting a website or accessing resources from the cloud. These global challenges require changes in the system and application software, and concurrent execution of tasks.

Concurrency

Ensuring concurrent execution of processes is one of the key challenges of big data. Concurrency allows several processes to execute simultaneously using shared resources. Here, the key challenge for the cloud provider is to ensure that no deadlock occurs when several processes are executing. In recent times, concurrency can be ensured with the help of synchronization.
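
One common way to use synchronization without risking deadlock is to acquire locks in a fixed global order. The sketch below illustrates that idea with Python threads; the `Resource` class, its `units` counter and the ordering by `ident` are hypothetical choices made for illustration, not a mechanism described in the surveyed work.

```python
import threading


class Resource:
    """A shared resource guarded by its own lock (illustrative)."""

    def __init__(self, ident: int, units: int) -> None:
        self.ident = ident
        self.units = units
        self.lock = threading.Lock()


def move_units(src: Resource, dst: Resource, amount: int) -> None:
    """Move units while holding both locks, always acquired in ident order.

    Because every thread takes the lower-ident lock first, two threads can
    never each hold one lock while waiting for the other.
    """
    first, second = sorted((src, dst), key=lambda r: r.ident)
    with first.lock, second.lock:
        src.units -= amount
        dst.units += amount


def worker(src: Resource, dst: Resource, times: int) -> None:
    for _ in range(times):
        move_units(src, dst, 1)


a, b = Resource(1, 1000), Resource(2, 1000)
threads = [
    threading.Thread(target=worker, args=(a, b, 500)),
    threading.Thread(target=worker, args=(b, a, 500)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.units, b.units)
```

The two workers move units in opposite directions, which is the pattern that deadlocks when each thread locks its own source first; the fixed ordering avoids that while keeping the totals consistent.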

Fault tolerance

Apart from hacking, unauthorized disclosure of information, scalability, transparency and concurrency, fault tolerance is also a global challenge in big data. Fault tolerance means detecting failures in a predictable way and stopping malfunctioning components for a short period of time with minimum loss of data. This requires efficient management of resources and a control mechanism to properly acquire and release resources.

Availability

Availability is one of the key features of big data demanded by most users. The most common requirement is that each system must be accessible from everywhere at any time. It is mandatory for the cloud provider to make sure that at least one instance of a machine is available at all times; for example, if one machine goes down, the system should keep working with the reduced amount of resources. Critical resources must be replicated in order to provide the desired level of service. Availability is hard to achieve in the sense that each replica must hold consistent data over a specified period.

Energy efficiency

One of the potential issues, and the one addressed in this paper, is energy efficiency. Different techniques for this challenge are summarized in Fig. 2. Some of these techniques are hardware based while others are software based. Energy efficiency is critical to keeping cloud devices active and readily available. Meanwhile, in complex systems, data is retrieved through the interaction of many components and other devices. Activities in the complex system range from modeling and simulation to analysis and visualization of networks related to the cloud. Efficient modeling and simulation of the CAS reduces energy demand and increases the lifetime of the network, which in turn increases the capability of the network to handle users in an energy efficient way. Therefore, it is necessary to achieve energy efficiency in CAS with a small economic investment.

Fig. 2 Overview of energy efficiency approaches

Critical overview of energy efficient techniques

Figure 2 summarizes different solutions for energy efficiency in big data. The services and application infrastructures provided by big data correlate with power consumption, because power consumption is almost application specific (Chen et al. 2012). However, exact measurement of energy is not possible because energy use differs between applications.

Various techniques are used in data centers to ensure efficient utilization of energy. Data centers are designed with the prime objective of improved performance in terms of services and throughput with minimum energy consumption. Different experiments have been conducted on Hadoop using MapReduce, and various trade-offs have been made. In the literature, a wide range of input and output devices that support efficient utilization of energy has been studied (Ibrahim et al. 2009). However, if data is distributed across various locations, MapReduce can take a long time to respond to a query, which in turn consumes more energy. The prime objective of each technique reviewed in this paper is to provide energy efficiency without performance loss. Furthermore, we evaluate the performance of each technique with the help of certain parameters.

The main approaches are DPM (Dynamic Power Management), DVFS (Dynamic Voltage and Frequency Scaling) and virtualization; apart from these three, a few network related approaches (e.g., sampling) also support energy efficiency. A set of policies uses these approaches effectively to meet the growing demand for energy. These are not the only techniques that address this global challenge, but this paper mainly focuses on a detailed comparison of the following techniques.

DPM

DPM is also called APM (Automatic Power Manager): devices are automatically powered off depending on the load. If the load is low, some unused devices are turned off automatically. This is one of the best techniques for managing load automatically, and it is implemented in both hardware and software. The technique is mostly implemented in big data systems that maintain metrics for keeping track of devices (Niazi and Laghari 2012). However, a bottleneck arises when the metrics are not updated in real time; without intelligent algorithms, resources can be misused, which in turn maximizes energy use.
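
The load-dependent power-off behavior described above can be illustrated with a very small threshold policy. This is a hedged sketch under simplifying assumptions: the function names, the notion of a uniform `capacity_per_device` and the rule of always keeping one device on are hypothetical, and real DPM implementations use richer device metrics.

```python
import math


def devices_needed(load: float, capacity_per_device: float) -> int:
    """Smallest number of devices that can carry the load (at least one)."""
    if capacity_per_device <= 0:
        raise ValueError("capacity_per_device must be positive")
    return max(1, math.ceil(load / capacity_per_device))


def apply_dpm(powered_on: list[bool], load: float,
              capacity_per_device: float) -> list[bool]:
    """Return a new on/off state that keeps only as many devices on as needed.

    The load value stands in for the tracked metrics; if it is stale, the
    policy powers the wrong number of devices, which is the bottleneck
    noted in the text.
    """
    target = min(len(powered_on), devices_needed(load, capacity_per_device))
    return [index < target for index in range(len(powered_on))]


# Four devices, each assumed to handle 100 units of load
state = apply_dpm([True, True, True, True], load=150, capacity_per_device=100)
print(state)
```

With a load of 150 units the sketch keeps two devices on and powers the other two off; when the load rises again the same call turns devices back on.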

DVFS

DVFS is another of the best energy efficient techniques; it exploits the relation between power, voltage and frequency. The relationship is given as $P = V^{2} f$ (Dean and Ghemawat 2004), where P is the power consumed, V is the voltage and f is the frequency. By reducing the voltage or the frequency, power can be saved, which makes DVFS one of the best ways of achieving energy efficiency in big data, and it works in different states (Jiang et al. 2010). However, deciding when to change the frequency, and under what circumstances, still needs significant improvement.
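
A short worked example makes the effect of the relation between P, V and f concrete. The sketch below uses the relation exactly as stated in the text (power equal to voltage squared times frequency, with any constant factors left out), so the results are relative rather than absolute. The operating points are hypothetical.

```python
def dynamic_power(voltage: float, frequency: float) -> float:
    """Relative power from the relation P = V^2 * f used in the text."""
    return voltage ** 2 * frequency


def relative_saving(v_old: float, f_old: float,
                    v_new: float, f_new: float) -> float:
    """Fraction of power saved when moving from one (V, f) state to another."""
    p_old = dynamic_power(v_old, f_old)
    p_new = dynamic_power(v_new, f_new)
    return 1.0 - p_new / p_old


# Hypothetical states: 1.2 V at 2.0 GHz scaled down to 1.0 V at 1.5 GHz
saving = relative_saving(1.2, 2.0e9, 1.0, 1.5e9)
print(f"power saved: {saving:.1%}")

# Halving the voltage alone cuts power to a quarter under this relation
print(relative_saving(1.0, 1.0, 0.5, 1.0))
```

Because voltage enters quadratically, lowering the voltage saves more than lowering the frequency by the same proportion, which is why DVFS policies scale both together where the hardware allows it.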

Virtualization

Virtualization is another energy efficient technique, in which one physical device is shared among multiple instances of multiple operating systems. Memory and CPU are provisioned dynamically to maintain performance. Virtualization is used in most data centers to make efficient use of energy. It also provides efficient resource utilization and management facilities.
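
The energy benefit of sharing physical devices can be sketched as a placement problem: the fewer hosts the virtual machines occupy, the more hosts can stay powered off. The example below uses a simple first-fit rule, which is an illustrative choice and not a method attributed to the surveyed work; the `Host` type, the VM names and the CPU and memory figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    cpu_free: float  # assumed free CPU cores
    mem_free: float  # assumed free memory in GB
    vms: list = field(default_factory=list)


def place_first_fit(vms, hosts):
    """Place each (name, cpu, mem) VM on the first host with room for it."""
    for name, cpu, mem in vms:
        for host in hosts:
            if host.cpu_free >= cpu and host.mem_free >= mem:
                host.cpu_free -= cpu
                host.mem_free -= mem
                host.vms.append(name)
                break
        else:
            raise RuntimeError(f"no host can fit {name}")
    return hosts


hosts = [Host(8, 32), Host(8, 32), Host(8, 32)]
vms = [("vm-a", 2, 8), ("vm-b", 4, 8), ("vm-c", 2, 8), ("vm-d", 4, 16)]
place_first_fit(vms, hosts)

# Hosts without any VM are candidates for being powered down
idle_hosts = [h for h in hosts if not h.vms]
print(len(idle_hosts), "host(s) can be powered down")
```

In this toy case four virtual machines fit on two of the three hosts, leaving one host idle. Dynamic provisioning would rerun such a placement as CPU and memory demands change.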

Network approaches

In networking, several techniques have been proposed to reduce energy consumption so that growing energy needs can be met from the available resources. Network level techniques help meet the challenge of limited power (Dean and Ghemawat 2004). As networking resources involve extensive processing, techniques such as sampling are used to manage energy needs. In a wireless sensor network, the lifetime of a node depends on its energy level. Processing, clustering and routing help extend the lifetime of the nodes in the network.

Energy efficient clustering

Energy efficient clustering is one of the most important techniques and concerns the behavior of the nodes. It reduces the amount of energy required for inter- and intra-cluster communication between nodes in a clustered architecture. It implements an algorithm that saves energy. This type of architecture is mostly used in wireless sensor networks, where in-network processing and efficient routing are used to meet energy needs.

Sampling technique

Sampling is another energy efficient technique in networking. Power based query optimization is one of the most important techniques for saving energy. The increasing popularity of big data results in high power consumption because of the variety of services, which will be a challenge for researchers in the coming years. Owing to the growing energy needs of big data centers, efficient and cost-effective algorithms and techniques are needed to cope with these challenges. To be energy efficient, we need several software and hardware based techniques and different algorithms to meet the future challenges of power consumption. Energy management has several dimensions that contribute to energy efficient computing while providing users with a constant supply of energy. Proper resource management, climate protection and cost saving will help address the energy shortage as demand doubles by 2050, when our resources will not be sufficient to meet it (Chenet 2010). However, efficient policies and improvements in the above-mentioned techniques can cope with this energy deficiency. Good policy-making on power resources and development in the energy sector aim to balance energy consumption and energy saving. However, sampling and its implementation pose different challenges, such as selecting appropriate samples and creating links between those samples.

The following section gives an overview of the available energy efficient techniques; a detailed discussion of each of the above-mentioned techniques, and its evaluation against certain parameters, is given in “Energy efficient techniques”. Most of the discussion concerns technical perspectives and contributions toward energy efficiency.

Critical analysis of existing surveys

In this section, we discuss and analyze different solutions for energy efficiency in big data. These solutions exist in the form of techniques, architectures and policies. The aim is to review different techniques that seek to meet the ever-growing demand for energy, which is expected to double by 2050 (Chenet 2010). The core concept of energy efficiency techniques with respect to computing resources is expressed in Table 1.

Energy needs proper utilization. Owing to recent advances in computer technology, it has become a challenge for data providers to maintain a balance between services and energy. Most data providers, whose initial focus was on services, now face power challenges. Software based techniques have helped reduce energy needs, but the source code must still be executed on some CPU, which ultimately calls for energy efficient hardware.

The proposed solutions are deployed on different components, i.e., CPU, disks, system, clustered system and networks. These components play a vital role in energy management. Each component aims to reduce energy consumption through its configuration or operation. Apart from this taxonomy, certain approaches such as sampling also help achieve energy efficiency in complex network segments. Architectures also support the efficient management of energy. The components of a complex system likewise need proper management to contribute to energy saving with the help of the implemented architectures and policies. Big data entities, nonlinear interactions and distribution require the implementation of both hardware and software based techniques to achieve energy efficiency.

The core of this paper concerns software based, hardware based and combined hardware and software based techniques; these are explored and new solutions are identified that help achieve energy efficiency in complex systems. These techniques will help big data centers manage energy efficiently. DPM is one of the best power management techniques and comprises two other techniques, software based and load balancing techniques. DPM is essentially a load balancing technique, which balances the energy load between different components, either CPUs or other components. The CPU is the most energy consuming component of the computer, consuming 30–35 % of the total energy (Dean and Ghemawat 2004). The evaluation of different energy saving techniques is carried out with the help of certain parameters, which are adjusted according to the software and hardware configuration and operational management. DPM turns off the CPU when it is idle or non-functioning (Dean and Ghemawat 2004). However, special attention is required when deploying this approach at sensitive places. DPM techniques are concerned with the most energy consuming component, i.e., the CPU. DPM first makes the CPU energy efficient and then focuses on memory modules, because low power memory modules and CPUs together result in energy efficiency. IBM has developed certain highly energy efficient techniques (Menon 2012) based on the DPM concept.

SPM is another good energy efficient technique that balances and manages workload. The SPM approach suits mobile and handheld devices, which do not require as much energy as big data. SPM is also good for supercomputers and HPC (high performance computing). Virtualization has achieved better results by using one physical device for multiple operating systems. Virtualization is used by most data centers for event management of the whole cloud. It makes the best use of computer resources and makes hardware more productive.

We explore different techniques that contribute to energy efficiency. The categorization of these techniques is given in Fig. 3, which shows the relationship between static and dynamic power management techniques. Apart from these techniques, energy efficient chips also exist that support energy saving. Cache RAMs also support energy efficiency by letting the respective cache be powered down and powered up through a specific control register. Memory management unit (MMU) RAMs follow the same concept as the cache and likewise support energy efficiency through successive operation.
Fig. 3

Energy efficiency techniques

Initially, system designers focused on performance when developing the systems in use today, but high power consumption and electricity bills have compelled them to rethink their designs in favor of energy-efficient mechanisms (Dean and Ghemawat 2004). Performance considerations pose many challenges, and saving energy while keeping services equally accessible to everyone is one of the biggest. Most developers have considered this issue, but no complete solution has been identified. Earlier researchers developed different techniques that provide certain advantages but degrade either performance or response time. Hardware-level techniques are of primary importance: they improve energy efficiency, which results in lower bills, excellent performance and long-run benefits. Software-level techniques offer considerably better performance, yet bugs in the code and the execution environment make their deployment somewhat technical. Most data centers deploy a variety of software techniques in combination with hardware techniques to achieve energy efficiency. The energy consumption per server class (W/unit) surveyed in previous years, from 2000 to 2006, is summarized in Table 2 (Valentini et al. 2011b).
Table 2

Estimated energy consumption per server class (W/unit)

| Server class (year) | Volume | Mid-range | High-end |
|---|---|---|---|
| 2000 | 186 | 424 | 5534 |
| 2001 | 193 | 457 | 5832 |
| 2002 | 200 | 491 | 6130 |
| 2003 | 207 | 524 | 6428 |
| 2004 | 213 | 574 | 6973 |
| 2005 | 219 | 625 | 7651 |
| 2006 | 225 | 675 | 8163 |
| 2007 | 231 | 725 | 8761 |
| 2008 | 236 | 776 | 9439 |
| 2009 | 241 | 820 | 10,117 |
| 2010 | 246 | 870 | 10,837 |
| 2011 | 251 | 935 | 11,435 |
| 2012 | 255 | 985 | 12,065 |
| 2013 | 259 | 1025 | 12,663 |
| 2014 | 263 | 1075 | 13,261 |
| 2015 | 267 | 1129 | 13,859 |
| 2016 | 270 | 1179 | 14,537 |
| 2017 | 273 | 1220 | 15,157 |
| 2018 | 276 | 1270 | 15,755 |
| 2019 | 279 | 1320 | 16,353 |
| 2020 | 281 | 1380 | 16,951 |
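
The trend in Table 2 can be quantified with a little arithmetic. The sketch below copies three rows of the table and computes the growth factor and the implied compound annual growth rate per server class; the function names are illustrative and the calculation is ours, not part of the cited survey.

```python
# Three rows copied from Table 2 (W per unit); the remaining rows are omitted for brevity.
table2 = {
    2000: {"volume": 186, "mid_range": 424, "high_end": 5534},
    2010: {"volume": 246, "mid_range": 870, "high_end": 10837},
    2020: {"volume": 281, "mid_range": 1380, "high_end": 16951},
}


def growth_factor(server_class, start, end):
    """Ratio of per-unit consumption between two years of Table 2."""
    return table2[end][server_class] / table2[start][server_class]


def annual_growth_rate(server_class, start, end):
    """Compound annual growth rate implied by the two table entries."""
    return growth_factor(server_class, start, end) ** (1 / (end - start)) - 1


for cls in ("volume", "mid_range", "high_end"):
    factor = growth_factor(cls, 2000, 2020)
    rate = annual_growth_rate(cls, 2000, 2020)
    print(f"{cls}: x{factor:.2f} overall, {rate:.1%} per year")
```

Between the 2000 and 2020 rows, the volume class grows by roughly a factor of 1.5, while the mid-range and high-end classes grow roughly threefold.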

With the increasing amount of data, energy consumption is increasing every day. As Table 2 shows, energy consumption also increases with the server class. Certain techniques and approaches can reduce this power consumption. Beyond the services provided by the cloud, certain approaches need significant improvement. A power consumption tradeoff is not suitable in the big data environment. The available tools are not up to the mark: they do not simulate and model the behavior of users over a specific period of time efficiently and accurately. The tools must be capable of modeling self-organization and other complex phenomena related to human life.

Some cloud providers are unstructured, e.g., P2P systems, and require applications and tools to be developed to cater for growing energy needs. Because modern systems are unstructured, algorithms such as the self-organized power consumption approximation algorithm (SOPCA) are used to monitor the power consumption of different devices. Modern complex systems need not only to change ranges and other parameters but also to model and simulate the behavior of their entities. Some tools have been developed to handle this task, but they are very limited in scope. To obtain a better understanding and more accurate results, tools such as NetLogo and agent-based toolkits have been proposed and used by researchers to model the complexity of CAS.

One of the earliest works to apply power management at the data center level is that of Pinheiro et al. (2001). The authors proposed a technique for energy efficiency in a heterogeneous cluster of nodes serving web applications. The main contribution of this work was to concentrate the workload on a subset of the nodes and switch idle nodes off. However, the load balancing and weak implementation of SLAs result in performance degradation. Nathuji and Schwan (2007) studied power management techniques in the context of virtualized data centers. They introduced and applied a power management technique named “soft resource scaling”. However, this technique did not achieve the required results when guest operating systems were legacy or power-unaware.
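
The idea of concentrating workload and switching idle nodes off can be illustrated with a simple packing heuristic. The sketch below uses first-fit decreasing, which is a stand-in chosen for brevity and not the algorithm of Pinheiro et al.; the function names, the integer load units and the single per-node capacity are all assumptions.

```python
def concentrate_load(loads, capacity):
    """Pack loads onto as few nodes as possible (first-fit decreasing).

    loads: load of each request stream, in percent of one node's capacity.
    Returns a list of active nodes, each a list of the loads it serves.
    """
    nodes = []
    for load in sorted(loads, reverse=True):
        if load > capacity:
            raise ValueError("a single load exceeds the node capacity")
        for node in nodes:
            if sum(node) + load <= capacity:
                node.append(load)
                break
        else:
            # No active node has room, so one more node stays powered on.
            nodes.append([load])
    return nodes


def nodes_to_switch_off(total_nodes, loads, capacity):
    """Number of nodes left without work after concentration."""
    return max(total_nodes - len(concentrate_load(loads, capacity)), 0)


print(concentrate_load([50, 20, 30, 40, 10], capacity=100))
print(nodes_to_switch_off(5, [50, 20, 30, 40, 10], capacity=100))
```

Packing nodes close to full capacity is also where the performance risk mentioned above comes from: a sudden load increase leaves no headroom until a switched-off node has been restarted.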

Gupta et al. (2003) suggested putting network interfaces, links, switches and routers into sleep mode when they are idle in order to save the energy consumed by the Internet backbone and consumers. However, adopting such a technique results in communication loss if necessary components are asleep, and in additional power consumption when devices wake up. Disk design also contributes to energy efficiency. Colarelli and Grunewald (2002) presented the concept of MAID (massive arrays of idle disks), a technique that powers off disks when they are not in use. MAID is essentially an array of disks in which recently used data is written to cache disks. However, the cache disks always remain spun up while the regular disks remain idle, which in turn increases energy consumption.

Kim et al. (2012) presented a novel approach, called FREP (Fractional Replication for Energy Proportionality), for energy management in big data. FREP includes a replication strategy and basic functions that enable flexible energy management according to cloud needs, including load distribution and update consistency. However, the impact of replication on the overall storage cost of the system has not been presented. Kaushik and Bhandarkar (2010) proposed an energy-conserving, hybrid, multi-zone variant of HDFS for data-intensive processing on commodity Hadoop clusters. In a 3-month simulation run, this variant improved energy efficiency by up to 26 %. This technique has cut the power budget to $14.6 million. Different types of cloud infrastructure, including traditional cloud and high performance computing (HPC), need to be enhanced to support dynamic power demands (i.e., to adjust power automatically), which in turn creates new challenges in designing architectures, infrastructure and communications that are energy-efficient and power-aware. This concept was put forward by Bruschi et al. (2011).

A comprehensive survey of energy saving strategies in both networks and computer systems with a potential impact on the energy use of integrated systems is given by Berl et al. (2009). Beloglazov and Buyya (2012) highlight energy concerns in system design and in the development of performance- and energy-efficient applications. They explain how the goal of computer system design has shifted to power and energy concerns. The authors carried out a detailed survey of power consumption problems, hardware- and firmware-level techniques, the contribution of the operating system to energy efficiency, data-center-level techniques, and the importance of virtualization in data centers for achieving energy efficiency. The survey also explains power consumption at different levels of a computing system in terms of electricity bills, power budget and CO2 emissions.

Dynamic voltage and frequency scaling (DVFS) offers a great reduction in energy consumption in cloud infrastructure by changing voltage and frequency according to the workload. Its implementation in the cloud has reduced power consumption significantly. Most clouds have implemented this technique, which operates at the level of the CPU, the most energy-consuming component. Being adaptive and efficient, DVFS has attracted a lot of attention from the research community. Complex adaptive system modeling and simulation are used to communicate clearly the facts about the nature of complex systems. The interaction and coordination of entities help in understanding the behavior of complex systems. To manage and meet energy needs in complex systems, several approaches have been proposed and used by cloud providers. One is the intelligent self-organizing power-saving architecture (ISPA), which intelligently identifies suitable idle computers and lets the system shut down or hibernate automatically based on a uniform, rule-based, company-wide policy. This architecture results in minimal performance loss compared with other techniques. The hardware- and software-based techniques are described in detail in the next section.

Energy efficient techniques

To achieve energy efficiency in big data, researchers have proposed many techniques, algorithms and architectures. These techniques are categorized in Fig. 3 and are elaborated in the coming sections. Mixed use of these techniques has helped cloud providers manage services and infrastructure efficiently. The techniques have attracted considerable attention from the research community and have helped in achieving energy efficiency.

These techniques manage the growing need for energy, and meeting that need is expected to become quite challenging in the coming years if significant improvements are not made in this area. Energy-efficient techniques have helped not only in reducing energy consumption but also in delivering services with ease. They achieve energy efficiency in a cost-effective way. These techniques are categorized into SPM and DPM techniques.

SPM techniques

SPM is basically a hardware-level technique and mainly focuses on transistor and process technology. Its main focus is the design of the system: energy considerations should be of prime importance when the hardware is initially designed. SPM techniques are mostly hardware based and focus on CPU design and architecture. At the CPU, they are further subdivided into cycle level and instruction level. These techniques involve transitions from low to high energy consumption depending on the load.

Figure 4 elaborates the CPU-level energy-efficient techniques. At the cycle level, components are activated or deactivated according to the demands of the processes, so that energy is consumed only where it is needed. This is the best technique because only the required components are activated. Instruction-level techniques determine energy consumption at the level of individual instructions, and the cost of each instruction is characterized. During the execution of an instruction, several parameters such as frequency and voltage are involved. Apart from the instruction and cycle levels, the circuit-level approach also helps in achieving energy efficiency. The configuration of the circuits determines the flow of current and the transition from low to high energy flow. All three dimensions help in balancing energy demands in the CPU, which is the most energy-consuming component in big data systems.
Fig. 4

Energy efficiency techniques (CPU level)

From a system perspective, the disk is a large consumer of energy, accounting for about 34 % of consumption, but conventional disks can be replaced with solid state disks, which consume less energy. Special attention to energy consumption is required during chip design, and energy-related technologies must be applied, because energy-aware circuits can reduce this share from 34 % to roughly 15 or 20 %. User mode is another state that consumes a large amount of energy: when the system executes instructions in user mode it consumes a large amount of energy, whereas kernel mode needs less, at almost 15 %. Invocation of kernel mode results in 10 % of energy consumption.

A major concern is that whenever the CPU is not doing useful work, it must execute at least one process, called the idle process; during its execution the CPU consumes 5 % of the energy, which is overhead. No efficient technique has been developed so far that can minimize this loss. However, different techniques have been developed that let the processor sleep when it is not doing useful work. This not only saves energy but also improves the performance of the computer system.

Hardware level techniques

Just like software techniques, there exists a combination of hardware techniques, which depend on the hardware configuration. These techniques offer many benefits because they are energy efficient. Their taxonomy in the context of big data is given in Fig. 5, which elaborates the existing hardware approaches to energy efficiency. There are two main approaches: dynamic performance scaling (DPS) and dynamic component deactivation (DCD). Each is further divided on a component basis, and each has features that contribute to the energy efficiency of big data.
Fig. 5

Hardware based energy efficient techniques

Design constraints

We selected different parameters for evaluation of each technique such as:
  • Performance

  • Goal

  • Cost (in terms of money and in terms of manpower)

  • Complexity

  • Response time

  • Energy efficiency

  • User awareness

  • Workload management

  • Platform dependency

  • Management

  • Maintenance

These parameters were selected for the following reasons. Complexity refers to the complex nature of big data and its heterogeneous elements (objects and agents); for example, implementing any energy-efficient technique will certainly influence the behavior of other agents, which in turn affects other parts of the system, e.g., security or resilience. Energy efficiency means how much energy is saved relative to the workload, and how energy is managed in relation to context switching and the transitioning of the system between active and sleep states. Each technique has the goal of reducing energy consumption and increasing throughput. User awareness and management refer to the adoption of the technique in relation to infrastructure size, supply companies and stockholders’ interests. If the level of user awareness is high and management is up to the mark, this helps in the adoption of common practices for energy use and particular market energy frameworks. The response time of each technique is task specific; for example, reading ten lines from the same document consumes less energy than reading ten lines from ten different documents.

Cost is a major factor, and we calculate two types of cost: one in terms of manpower (e.g., lines of code, language and chips) and the other in terms of money (e.g., installation, execution, testing, hardware resources and configuration costs). Each technique has different costs associated with it depending on its complexity and operations. Maintenance refers to the support provided after a technique is implemented and to the testing of the technique on different hardware resources. Workload is another determinant for each technique, and workload management is mandatory; it means efficient operation of the cloud under critical circumstances. However, the hardware design, the materials used to build the hardware, and the techniques, tools and execution environment used to build the software all affect operation and energy consumption. Hardware built with performance in mind has a higher energy consumption rate than hardware built with energy in mind.

Dynamic component deactivation (DCD)

This technique is highly energy efficient: it disables unused components on the basis of predefined rules. However, predicting when to activate and deactivate components itself causes considerable energy consumption, and no efficient algorithm exists that determines the idle interval accurately. Such transitions also degrade system performance, but with the help of historical data the system can be moved between states effectively (Valentini et al. 2011a; Koomey 2007). The performance evaluation of DCD is given in Table 3. The design parameters for each technique are selected on the basis of its complexity and power saving mechanism, and each technique is evaluated on the basis of a few common parameters.
Table 3

Performance evaluation of DCD

| Parameter(s) | Evaluation |
|---|---|
| Performance | Overall satisfactory in a small enterprise. Can be improved when the next transition is already known, or when a system model determines the transition interval so as to avoid the overhead of activation and deactivation |
| Goal | To achieve maximum energy efficiency and minimize energy consumption |
| Cost (in terms of manpower) | In terms of manpower, these techniques are hard to develop and require more effort; advanced techniques and the use of the latest technology make them more costly |
| Switching cost | Whenever switching is done, it not only degrades performance but also increases energy consumption |
| Maintenance | Maintenance of DCD is difficult; delays, activation, deactivation and interval prediction in particular make it very expensive and hard to manage |
| Complexity | The complexity of the technique increases with the non-linear interactions and diverse nature of complex systems. If the interactions between objects and agents are high, complexity increases |
| Response time | These approaches do not have the best response time: if a component is deactivated and urgent activation is required, it might take a very long time, which degrades performance |
| Cost (in terms of money) | These are hard to build and very expensive as well. When logic is burnt onto chips or other components, it becomes more costly |
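
The switching cost listed in Table 3 is often reasoned about through a break-even time: the shortest idle period for which deactivating a component saves energy at all. The sketch below is a minimal illustration of that standard idea rather than a method taken from the survey; the parameter names and the example figures for a hypothetical disk are assumptions.

```python
def break_even_time(transition_energy_j, transition_time_s, idle_power_w, sleep_power_w):
    """Shortest idle period (s) for which deactivating a component saves energy.

    Staying on costs idle_power_w * T. Deactivating costs the transition energy
    plus sleep_power_w for the rest of the period. Equating the two gives T.
    """
    if idle_power_w <= sleep_power_w:
        raise ValueError("sleep state must draw less power than the idle state")
    t = (transition_energy_j - sleep_power_w * transition_time_s) / (idle_power_w - sleep_power_w)
    # The idle period can never be shorter than the transition itself.
    return max(t, transition_time_s)


def should_deactivate(predicted_idle_s, **component):
    """Deactivate only when the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_time(**component)


# Hypothetical component figures, chosen only to make the arithmetic easy to follow.
disk = dict(transition_energy_j=11.0, transition_time_s=2.0, idle_power_w=4.0, sleep_power_w=1.0)
print(break_even_time(**disk))
print(should_deactivate(5.0, **disk), should_deactivate(2.5, **disk))
```

With these figures the break-even time is 3 s, so a predicted idle period of 5 s justifies deactivation and one of 2.5 s does not. The difficulty noted in the text is that the idle period has to be predicted before it happens.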

Figure 5 summarizes the hardware techniques that support energy efficiency. Hardware support is key to achieving energy efficiency through algorithms, policies and software approaches. Hardware is properly evaluated and tested by reputable companies before deployment so that energy efficiency is achieved effectively. Companies that invest in hardware to cope with the energy issue benefit more than those that invest in software. Different software and hardware techniques and their implementations produce the desired results. Recent advances are remarkable: they have enhanced the popularity of big data and deliver services to the intended users in a cost-effective way. The performance evaluation of DCD is given in Table 3, with a few important parameters used to assess its performance.

DCD is further divided into predictive and stochastic techniques, which contribute to energy efficiency. In predictive techniques, the decision on when to activate and deactivate system components is made on the basis of a prediction. Different policies exist that rely on the correlation between active and inactive states. Energy is consumed whenever components are woken up or put to sleep, which also causes performance overhead and serious drawbacks. Predictive wakeup and predictive shutdown provide the best solution to this problem. However, these mechanisms raise certain issues related to the intelligence they must implement.

Predictive shutdown policies address the issue of inactivity. In predictive shutdown, historical data is used to predict the next idle period. These approaches involve decision making and depend heavily on the actual energy utilization and on the strength of the correlation between previous and next events. History-based predictors, which work on predictions, are energy efficient but not as safe as timeouts (Benini et al. 2000). Moreover, predictions are not helpful in many situations. Predictive wakeup techniques aim to reduce the energy consumed on activation, since most components require a lot of energy at wakeup. The transition from the active to the inactive state is computed on the basis of previous records and sometimes on the requirements of the user (Albers 2010). In these techniques, energy consumption is high, but the performance overhead on wakeup is minimal. The performance evaluation of fixed timeout, predictive shutdown and predictive wakeup is given in Table 4. The accuracy of such techniques is determined in terms of complexity, performance, maintenance, cost and energy efficiency.
Table 4

Performance evaluation of fixed timeout, predictive shutdown and predictive wakeup

| Parameter(s)/techniques | Fixed timeout | Predictive shutdown | Predictive wakeup |
|---|---|---|---|
| Performance | Fixed timeout gives poor performance and increases performance overhead, which is not acceptable in any infrastructure | Predictive shutdown provides good performance | Predictive wakeup provides good performance |
| Energy efficiency | Saves less energy and sometimes leads to energy wastage at the beginning of idle periods | Increases energy consumption during transitions | Increases energy consumption during transitions in repeated wakeup calls |
| Goal | To increase energy efficiency and decrease energy consumption | To increase energy efficiency and decrease energy consumption | To increase energy efficiency and decrease energy consumption |
| Intervals prediction | A wrong choice of threshold value sometimes halts performance | The next transition may not occur as predicted | The next transition may not occur as predicted |
| Maintenance | Maintenance is quite difficult, especially in terms of setting time spans and the threshold value | Such a system is difficult to implement when predictions are not properly monitored; a carefully designed system model is required | Such a system is difficult to implement when predictions are not properly monitored; a carefully designed system model is required |
| Complexity | Fixed timeouts are more complex even when interactions are high | Relatively easy, but predictions sometimes lead to performance overhead and halt interactions | Relatively easy, but predictions sometimes lead to performance overhead |
| Cost (in terms of money) | Development and implementation increase cost in terms of both manpower and money | Development and implementation increase cost in terms of both manpower and money | Development and implementation increase cost in terms of both manpower and money |
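
The tradeoff summarized in Table 4 can be made concrete with a toy simulation. The sketch below compares a fixed timeout with a last-value history predictor over a list of idle period lengths; the energy model, the predictor and all parameter values are simplifying assumptions, not the policies evaluated in the cited works.

```python
def fixed_timeout_energy(idle_periods, timeout, p_idle, p_sleep, e_transition):
    """Energy spent over the idle periods when sleeping only after `timeout` seconds."""
    energy = 0.0
    for t in idle_periods:
        if t > timeout:
            # Energy is wasted at idle power until the timeout expires.
            energy += p_idle * timeout + e_transition + p_sleep * (t - timeout)
        else:
            energy += p_idle * t
    return energy


def predictive_shutdown_energy(idle_periods, threshold, p_idle, p_sleep, e_transition):
    """Sleep immediately when the previous idle period was longer than `threshold`."""
    energy = 0.0
    predicted = 0.0  # last-value predictor: the next idle period equals the last one
    for t in idle_periods:
        if predicted > threshold:
            energy += e_transition + p_sleep * t
        else:
            energy += p_idle * t
        predicted = t
    return energy


params = dict(p_idle=2, p_sleep=0, e_transition=4)
steady = [10] * 6            # strongly correlated idle periods
alternating = [10, 1, 10, 1]  # weakly correlated idle periods
for name, trace in (("steady", steady), ("alternating", alternating)):
    print(name,
          fixed_timeout_energy(trace, 2, **params),
          predictive_shutdown_energy(trace, 2, **params))
```

On the steady trace the predictor spends less energy than the timeout because it skips the wait at the start of each idle period. On the alternating trace every prediction is wrong and the predictor spends more, which is the sense in which history-based predictors are less safe than timeouts.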

The above section explored the concepts related to SPM and its sub-techniques. All of these techniques are static. To deal with the problem of intelligently determining idle components, adaptive techniques have been developed. Predicting the next transition is inefficient when the workload is not known in advance. Several practical techniques that focus mainly on energy efficiency have been discussed in the literature (Srivastava et al. 1996). SPM considers the architecture of the RAMs, the CPU and related components. It is specially designed to control the internal structure of the CPU, including circuits, chips, and the structure of buses and ports. SPM uses intelligent approaches to determine the transitions and sequences of inactive and active states.

Dynamic performance scaling

This technique is used in different hardware components for energy efficiency. Each component dynamically adjusts its performance in proportion to its power consumption. Instead of deactivating a resource, it gradually decreases the supply when the resource is not in use. This concept underlies dynamic voltage and frequency scaling. It is considered one of the best energy efficiency techniques and produces better results than others. It helps maintain system performance and ensures that energy is consumed in a balanced way, because each component maintains stable energy states.

Dynamic voltage and frequency scaling

DVFS contributes well to energy efficiency, especially in the cloud environment. CPU frequencies need proper adjustment, but frequency adjustment requires voltage scaling as well; both parameters need to be adjusted together to contribute to energy efficiency. Sometimes an increase in voltage causes an increase in temperature, which in turn increases energy consumption. DVFS reduces the number of instructions that the CPU can issue in a given interval of time, which reduces performance. This in turn increases the performance overhead, especially for CPU-bound processes. Researchers and designers have been exploring this issue for several years but have been unable to provide an optimal solution. The general formula relating power P to capacitance C, supply voltage V and clock frequency f, with details of the related parameters, is given in (Dean and Ghemawat 2004) as:
$$P = C \cdot V^{2} \cdot f \tag{1}$$
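
Equation (1) can be evaluated directly to see why voltage and frequency are scaled together. The sketch below assumes that C is the switched capacitance in farads, V the supply voltage in volts and f the clock frequency in hertz; the numeric values are illustrative only and do not describe any particular processor.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power P = C * V^2 * f from Eq. (1), in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def task_energy(cycles, capacitance_f, voltage_v, frequency_hz):
    """Energy (J) to execute a fixed number of cycles at one operating point."""
    runtime_s = cycles / frequency_hz
    return dynamic_power(capacitance_f, voltage_v, frequency_hz) * runtime_s


C = 1e-9  # assumed switched capacitance
high = dict(voltage_v=1.2, frequency_hz=2e9)
low = dict(voltage_v=0.9, frequency_hz=1e9)

print(dynamic_power(C, **high), dynamic_power(C, **low))
print(task_energy(4e9, C, **high), task_energy(4e9, C, **low))
```

Under this model, halving the frequency alone halves power but doubles runtime, so the energy per task is unchanged. The saving comes from the lower voltage that the lower frequency permits, and the task then takes longer, which is the performance overhead the text describes for CPU-bound processes.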

DVFS looks straightforward, but its implementation is not so easy. The structure of real systems imposes certain technicalities on DVFS. Producing the desired frequency to meet application performance is also tricky. Moreover, it is not certain whether the power consumed by the processor is quadratic, linear or otherwise non-linear in the supplied voltage (Chen et al. 2012). Several approaches that reduce energy consumption have been practiced. They can be categorized as interval based, intra-task based and inter-task based (Hwang and Wu 2000). The interval-based technique is the same as the adaptive technique: it predicts the CPU cycles, and transitioning is done in various orders.
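
An interval-based policy can be sketched as follows: the busy cycles observed in the last interval serve as the prediction for the next one, and the lowest available frequency that covers the prediction is selected. The function name, the `headroom` parameter and the frequency levels are hypothetical, and the predictor is deliberately the simplest possible one.

```python
def next_frequency(busy_cycles_last_interval, interval_s, levels_hz, headroom=1.0):
    """Pick the lowest frequency that could have executed the last interval's work.

    The last interval is used as the prediction for the next one. `headroom`
    greater than 1.0 leaves a safety margin for workload increases.
    """
    required_hz = busy_cycles_last_interval / interval_s * headroom
    for level in sorted(levels_hz):
        if level >= required_hz:
            return level
    # Demand exceeds every level, so run at the maximum frequency.
    return max(levels_hz)


LEVELS = [800_000_000, 1_600_000_000, 2_400_000_000]  # hypothetical operating points
print(next_frequency(9_000_000, 0.01, LEVELS))   # about 0.9 GHz of demand
print(next_frequency(30_000_000, 0.01, LEVELS))  # demand above the highest level
print(next_frequency(0, 0.01, LEVELS))           # idle interval
```

A policy of this kind reacts one interval late, so a burst arriving after a quiet interval runs at a low frequency until the next decision point.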

The inter-task approach dynamically distinguishes between processes on the basis of their execution time and assigns them different CPU speeds (Hwang and Wu 2000; Douglis et al. 1995). However, this can cause an issue when different scheduling algorithms are applied, because the execution time under the round robin (RR) scheduling algorithm differs from that under the first come first served (FCFS) algorithm. Voltage and frequency can best be adjusted if the workload is known in advance or is constant throughout the execution. In comparison with the inter-task approach, the intra-task approach provides fine-grained information about the structure of programs and tunes the processor voltage and frequency within tasks effectively (Buttazzo 2002; Andrew et al. 2010; Wiessel and Bellosa 2002). The performance evaluation of DVFS is provided in Table 5.
Table 5

Performance evaluation of DVFS

| Parameter(s) | Evaluation |
|---|---|
| Performance | DVFS provides good performance; it reduces the number of instructions the processor issues in a given interval of time, which results in power reduction |
| Energy efficiency | Provides good energy efficiency if the workload is known, or when tasks are divided and assigned different frequencies. Even so, it is better than static power management |
| V and P relationship | As the equation suggests, the relationship between power and voltage is quadratic. However, it might not always be quadratic; it is sometimes linear and sometimes nonlinear, depending on interactions |
| Complexity | The DVFS architecture is quite complex, and the structure of the system sometimes increases its complexity further |
| Cost (in terms of money) | Implementing the logic on chip requires huge effort. Because of the technicalities involved, it is costly |
| Maintenance | CPU frequencies need to be adjusted at each instruction, so it is hard to operate, and improvement and enhancement are not always easy |
| Response time | RT involves non-linearity, but execution is fast, so it provides a better response time |
| | Sometimes program execution is independent of the CPU; I/O-bound processes are executed without CPU involvement |
| Platform effect | Energy savings vary from platform to platform and from infrastructure to infrastructure. They also vary with different hardware and software architectures |
| Timeslots management | If timeslots are defined without considering a model such as round robin, problems like starvation can occur: some processes may exceed their execution time and others might not get a fair share of CPU time |
| User awareness | User awareness is required to execute programs or to implement DVFS at compiler level. A program developed without paying any attention to energy may lead to bad results |
| Cost (in terms of manpower) | A lot of effort is required to develop an energy-efficient system, which demands considerable programming skill and attention |
| Workload | When the workload is known, DVFS works well; when it is unknown, the response is non-trivial |
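
The inter-task approach, in which each task receives its own CPU speed, can be sketched for the case where the workload is known in advance. The task tuples, the frequency levels and the use of a per-task deadline are assumptions made for illustration; the cited works do not prescribe this exact interface.

```python
def assign_speeds(tasks, levels_hz):
    """Assign each task the lowest frequency that still meets its deadline.

    tasks: iterable of (name, cycles, deadline_s) with cycles known in advance.
    Returns {name: frequency_hz}, or None for a task that no level can satisfy.
    """
    levels = sorted(levels_hz)
    plan = {}
    for name, cycles, deadline_s in tasks:
        plan[name] = next((f for f in levels if cycles / f <= deadline_s), None)
    return plan


LEVELS = [800_000_000, 1_600_000_000, 2_400_000_000]  # hypothetical operating points
tasks = [
    ("a", 800_000_000, 1.0),
    ("b", 1_600_000_000, 1.0),
    ("c", 4_800_000_000, 1.0),
]
print(assign_speeds(tasks, LEVELS))
```

The sketch treats each task in isolation. Under a real scheduler such as RR or FCFS a task also waits for others, so its effective deadline shrinks, which is the scheduling issue the text points out.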

DVFS saves energy through its efficient energy scheduling method. It saves energy whenever the peak performance of a component is not required. It also adjusts CPU cycles when the CPU is not doing useful work, e.g., while data is being read from disk. DVFS scheduling is one of the best techniques contributing to energy efficiency. DVFS uses A2E, which distinguishes it from all other available energy efficiency techniques. It scales voltage and frequency up and down so well that performance is not hindered. DVFS uses a simple method to save energy that is effective enough to keep servers on all the time. However, it may not be a suitable option for most data-intensive solutions, because these applications mostly perform read/write operations. It competes with all other available energy saving techniques while requiring minimal performance compromises. It is adaptive and its scheduling is done at runtime, which is the key to its success. This is the reason DVFS is mostly used by the leading big data companies (Lee and Sakurai 2000). Dynamic voltage and frequency scaling is deployed in many data centers to meet energy needs. Devices need to be built with a service-oriented and energy-oriented architecture. The performance evaluation of DVFS is provided in Table 5.

Powernap

Meisner et al. (2009) proposed a way of reducing active power consumption in data centers based on fast switching between active and low-power states. The goal of this technique is the same as that of the others, i.e., to minimize energy consumption. Each of the other techniques aims at a partial reduction of energy rather than the energy-proportional computing proposed by Barroso and Holzle (2007). In Powernap, the focus is not only on the two states (sleep and active) but also on CPU switching, workload management, etc. In other techniques, the major concern is fast transition between states, with very low power consumption in the sleep state and very high energy consumption in the active state.

Powernap was tested on different systems; their transitions were determined and comparisons were made in different states. The conclusions rest on the observation that if the switching time is less than or equal to 10 ms, power savings are approximately linear and exceed those of DVFS. However, in the ideal situation the transition time is 300 ms. The desired requirements are hard to meet, but if a mechanism for achieving the transition time is available, average server power can be reduced by 74 % (Lee and Sakurai 2000). The performance evaluation of Powernap is provided in Table 6, where we assess it on the basis of parameters such as complexity and cost.
Table 6

Performance evaluation of Powernap

| Parameter(s) | Powernap |
| --- | --- |
| Performance | Powernap gives poor performance and increases performance overhead, which is not acceptable in any infrastructure. However, the ideal situation is never met in a server environment |
| Energy efficiency | It saves less energy and sometimes wastes energy at the beginning of idle periods. Powernap does not prevent this waste, which is a problem for existing data centers and big data today |
| Goal | The goal of Powernap is to increase energy efficiency and decrease energy consumption |
| Intervals prediction | Sometimes a wrong choice of threshold value halts performance. It is quite difficult to predict the next interval and the transition time |
| Maintenance | Maintenance is quite difficult, especially in terms of setting and adjusting the transition time |
| Complexity | Powernap is very complex |
| Cost (in terms of money) | The development and implementation of these techniques increase cost in terms of both manpower and money. Chips are costly, and implementation and prototype development are hard. Testing is also difficult |
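The dependence of Powernap savings on transition time, discussed above, can be illustrated with a small model. This is only a sketch under stated assumptions: the function name, the power values and the rule that both transitions are billed at active power are hypothetical and are not taken from Meisner et al.

```python
def average_power_w(busy_s, idle_periods_s, p_active_w, p_idle_w,
                    p_nap_w=None, transition_s=0.0):
    """Average power over a trace of busy time plus a list of idle periods.

    Without napping (p_nap_w is None) the server draws p_idle_w while idle.
    With napping, every idle period longer than two transitions is spent in
    the nap state, and the entry and exit transitions are billed at active
    power (a simplifying assumption of this sketch).
    """
    energy_j = busy_s * p_active_w
    overhead_s = 2 * transition_s
    for idle_s in idle_periods_s:
        if p_nap_w is None or idle_s <= overhead_s:
            # Too short to nap, or napping disabled: stay in the idle state.
            energy_j += idle_s * p_idle_w
        else:
            # Pay for both transitions, then nap for the rest of the period.
            energy_j += overhead_s * p_active_w + (idle_s - overhead_s) * p_nap_w
    return energy_j / (busy_s + sum(idle_periods_s))
```

With illustrative values of 300 W active, 180 W idle and 10 W nap power, 10 s of work and ten idle periods of 1 s each, a 10 ms transition lowers the average power from 240 W to about 158 W, whereas a 300 ms transition makes napping slightly worse than idling (242 W). This is why the transition time matters so much for this technique.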

Virtualization

Virtualization is another economical technique, which allows multiple operating systems and applications to run in layers on a single piece of hardware. Resources can be split into a number of logical pieces called virtual machines. Each virtual machine hosts its own operating system and presents a view of its allocated physical resources, ensuring performance and failure isolation between virtual machines that share a single physical machine. Virtualization provides a variety of benefits; for example, it lets us run multiple operating systems on a single machine, which saves the cost of buying additional physical hardware.

Virtualization provides considerable support for energy efficiency, which is a major concern in today’s data centers. Virtual machines contribute to delivering big data services optimally, and proper management yields numerous benefits. According to recent studies, the virtual machine manager behaves like a power-aware operating system: without distinguishing between virtual machines, it monitors overall system performance and ensures the correct application of DVFS or any DCD technique to all system components.

Another way to achieve energy efficiency is to leverage the power management policies of the guest operating systems, together with application-level knowledge, and to map the power management requests of the different virtual machines onto changes in the hardware power state, or to enforce power limits in an effective manner. Different hypervisors are available, such as Xen, VMware and KVM, which provide virtualization with a variety of benefits. Xen provides considerable support for C-states (CPU sleep states) (Wei et al. 2009). Xen also determines CPU utilization periodically, selects the appropriate P-state and issues a command to change the hardware power state.

This transition helps in saving energy. Xen provides four governors related to power saving:
  • Ondemand: chooses the best P-state according to current resource requirements.

  • Userspace: sets the CPU frequency specified by the user.

  • Performance: sets the highest available clock frequency.

  • Powersave: sets the lowest available clock frequency.
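The behavior of the four governors listed above can be sketched as a single selection function. This is a simplified illustration and not Xen's actual implementation: the function name, its parameters and the ondemand rule (pick the lowest frequency that covers current demand, with utilization measured relative to the highest frequency) are assumptions made for the example.

```python
def select_frequency(governor, available_mhz, utilization=0.0, user_mhz=None):
    """Return the clock frequency (MHz) a governor would pick.

    utilization is the fraction (0.0 to 1.0) of the highest frequency's
    capacity currently in use; it is only consulted by "ondemand".
    """
    freqs = sorted(available_mhz)
    if governor == "performance":
        return freqs[-1]              # always the highest frequency
    if governor == "powersave":
        return freqs[0]               # always the lowest frequency
    if governor == "userspace":
        if user_mhz not in freqs:
            raise ValueError("user-specified frequency is not available")
        return user_mhz               # whatever the user asked for
    if governor == "ondemand":
        demand_mhz = utilization * freqs[-1]
        for f in freqs:               # lowest frequency that covers demand
            if f >= demand_mhz:
                return f
        return freqs[-1]
    raise ValueError(f"unknown governor: {governor}")
```

For example, with available frequencies of 800, 1600 and 2400 MHz, `select_frequency("ondemand", [800, 1600, 2400], utilization=0.5)` selects 1600 MHz, the lowest step that still covers a demand of 1200 MHz.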

Apart from governors, Xen also supports C-states well. When the CPU is not doing any useful work, Xen lets it enter a C-state, and when activity is required it switches back to the active state. Xen has further benefits, such as offline and online migration of VMs, which can help save a significant amount of energy when combined with VM consolidation algorithms. Like the DCD technique explained previously, Xen also has performance overheads. VMware also contributes to energy efficiency by supporting basic system-level power management using dynamic voltage and frequency scaling. The system dynamically monitors CPU utilization and periodically applies the appropriate ACPI P-states (Wei et al. 2009). In a pool of servers where virtualization is implemented, power is reduced by dynamically switching off spare servers (VMware Inc 2009). A performance evaluation of virtualization is provided in Table 7.
Table 7

Performance evaluation of virtualization

| Parameter(s) | Evaluation |
| --- | --- |
| Complexity | VM implementation is very complex; the functionality and implementation require a lot of effort and logic. Complexity can be assessed in terms of LOC (lines of code) and the interactions of components with one another |
| Management | Management of virtual machines is also difficult. Concurrent execution of processes needs isolation, which in turn is difficult to manage. Administrative techniques are required |
| Goal | The goal of this technique is to develop an energy-efficient approach that minimizes energy consumption. The approach gives very good results to some extent, but significant improvements are still required |
| Energy efficiency | Provides good energy efficiency, but transitions between C-states and the active state sometimes cost performance |
| Cost (in terms of money) | Virtualization does well on this factor because it lets us run multiple operating systems on a single machine, which in turn minimizes cost |
| Cost (in terms of man power) | As this is a complex technique, considerable cost is incurred for hypervisors, VMs and their management |
| Maintenance | Because the technique is complex, maintenance is not easy. Different hypervisors have been developed over time, and each provides facilities to upgrade, install and manage new hypervisors. Xen, VMware and KVM are the most widely used virtualization platforms and are good for power management as well |
| User awareness | User awareness is required at each step to gain the energy benefits effectively |

It can be inferred that virtualization has helped in achieving energy efficiency. Experimental results suggest that a prototype can enforce power limits across energy-aware and energy-unaware guests. Devices with different power states, processors dedicated to hardware-assisted virtualization, and multi-core architectures have been used in data centers (Friedman 2009).

We believe that these areas require further investigation in order to develop more energy-efficient algorithms. Among all the techniques available, virtualization provides the best energy efficiency if it is properly designed and implemented, and it can yield long-run benefits that reduce effort in terms of both manpower and money.

Summary and findings

Energy efficiency in big data remains challenging because of the variety of services that big data must process and the continued demand for high-performance computing. The main approaches to energy efficiency in complex systems are: (a) software-based energy-efficient techniques, (b) hardware-based energy-efficient techniques, and (c) energy-efficient architectures, algorithms and policies. The main objective of this paper is to examine the hardware- and software-based techniques used in big data complex adaptive systems. These techniques are deployed to achieve energy efficiency in the context of big data. Energy efficiency is an important requirement of current and future complex adaptive systems. Each of the above-mentioned techniques has certain advantages and disadvantages. The two main categories of energy-efficient techniques are SPM and DPM. SPM has better energy efficiency but degraded performance, whereas DPM has shown better results from both the energy and the performance perspectives. However, the choice of technique depends on the organization’s scale of economy and its operations. If performance is important for an organization’s operations, it will likely choose DPM. If, however, the tasks are simple, runtime response is not needed and the workload is manageable, the organization will likely adopt SPM techniques. Virtualization is mostly used in data centers to increase hardware utilization and to manage resources efficiently. Parallel data processing and techniques such as MapReduce contribute substantially to providing services efficiently. On the other hand, storing data at multiple locations poses the challenge of keeping it available at all times.

In this research, we found DVFS to be the best technique across the parameters considered, and it is implemented by most data centers. According to researchers, it is one of the most promising techniques for the future and will contribute well to energy efficiency with the help of the multi-component approach, which uses DVFS to reduce the power consumption of the CPU during the communication, execution and computation phases. However, changing trends bring newer techniques such as virtualization, which also contribute to efficient cloud management. Software and hardware support is also required to implement a specific technique. In the last few years, researchers have also been working on power-scalable hardware and software components to reduce growing energy needs while limiting performance loss.

The intelligent self-organizing power-saving architecture (ISPA) helps solve the problem of identifying idle components, i.e., which computers should be shut down in order to save power. The architecture automatically determines suitable idle computers for shutdown or hibernation, based on rules that depend on organization policy. By selectively and intelligently tracking computers and turning them off or hibernating them, the scheme helps achieve productivity at lower cost. This architecture is helpful when energy consumption is high and resources must be used efficiently without performance loss. All contributions of our study are summarized in Table 8.
Table 8

Energy efficiency in big data techniques and architecture

| Techniques | Complexity | Energy efficiency | Management | User awareness | Maintenance | Cost (in terms of money) | Goal | Cost (in terms of man efforts) | Response time (per server) | Statistical implications | Advantage | Limitation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPM | Very complex | Satisfactory | Easy | Good | Easy | Very much | Average | Very much | Average | Implemented | Less energy efficient | CPU and RAM specific |
| DCD | Very complex | Poor | Hard | Good | Easy | Very much | Average | Average | 60 % | Implemented | Less energy efficient | Component specific |
| Fixed time out | Very complex | Poor | Easy | Average | Easy | Very much | Satisfactory | Average | 40 % | Implemented | Average energy efficient | Inefficient |
| Predictive wakeup | Very complex | Average | Hard | Good | Easy | Very much | Average | Average | 60 % | Intervals selected statistically | Average energy efficient | Prediction is required |
| Predictive shutdown | Very complex | Average | Hard | Good | Easy | Very much | Average | Very much | 60 % | Intervals selected statistically | Average energy efficient | Prediction is required |
| DVFS | Very complex | Good | Hard | Good | Hard | Average | Achieved | Very much | 80 % | Frequency adjustments | Best energy efficient | Frequency adjustment |
| Powernap | Complex | Good | Hard | Good | Easy | Average | Achieved | Average | 60 % | Switching between states | Average energy efficient | Switching cost |
| Virtualization | Complex | Good | Hard | Good | Easy | Average | Achieved | Average | 70 % | Isolation is concerned | Average energy efficient | Isolation is required |
| Sampling | Easy | Good | Hard | Good | Easy | Average | Achieved | Average | 72 % | Implemented | Average energy efficient | Sample selection |
| Clustering | Complex | Satisfactory | Hard | Average | Easy | Average | Achieved | Average | 70 % | Nodes are grouped | Average energy efficient | Minimum path selection |
| ISPA | Complex | Excellent | Relatively easy | Good | Easy | Average | Achieved | Average | 90 % | Implemented | Maximum energy efficiency | Works on specified rules |
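The rule-based selection of idle computers attributed to ISPA in the summary above could look roughly like the following sketch. The `Machine` record, the thresholds and the action names are all hypothetical; the actual rules depend on organization policy and are not specified in this survey.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    idle_minutes: float      # time since last user or job activity
    cpu_utilization: float   # fraction between 0.0 and 1.0
    is_critical: bool = False


def plan_power_actions(machines, hibernate_after_min=30.0,
                       shutdown_after_min=120.0, busy_cpu=0.05):
    """Map each machine name to 'keep_on', 'hibernate' or 'shutdown'.

    Hypothetical policy: critical or busy machines stay on; machines idle
    beyond the shutdown threshold are shut down; machines idle beyond the
    hibernate threshold are hibernated.
    """
    actions = {}
    for m in machines:
        if (m.is_critical or m.cpu_utilization > busy_cpu
                or m.idle_minutes < hibernate_after_min):
            actions[m.name] = "keep_on"
        elif m.idle_minutes >= shutdown_after_min:
            actions[m.name] = "shutdown"
        else:
            actions[m.name] = "hibernate"
    return actions
```

Keeping the policy in plain parameters reflects the point made above that the rules are organization specific: a different site would change the thresholds or add rules without touching the tracking logic.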

Open issues

Recent research has revealed that energy-efficient techniques can significantly improve energy saving in cloud computing, big data and CASs. Most importantly, DVFS achieves better energy savings than the other existing techniques. Implementing DVFS not only helps save energy but also improves energy utilization at different levels of the network infrastructure. However, it is important to design and evaluate different test environments to enhance the utility of DVFS in different settings. A few common challenges of DVFS are listed below.

Lack in software controls

DVFS is mostly implemented in software, but some software cannot regulate voltage properly because of hardware dependence. As a result, DVFS fails to save energy. Some software requires hardware changes to regulate overvoltage and undervoltage. DVFS can also fail to achieve energy efficiency because of differing component configurations. To work across different software, DVFS needs voltage modifications that adjust energy in RAM, motherboards, chips, etc.

Formula modification

Because different materials are used in hardware, energy saving varies from component to component. To achieve maximum energy efficiency, it is desirable that most components use complementary metal-oxide-semiconductor (CMOS) technology. As most chips are not made entirely of CMOS, Eq. 1 gives different results. It is therefore desirable to replace C in Eq. 1 with a component-specific constant that captures the component’s capacitance, voltage flows, etc.
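The suggestion can be illustrated with a short sketch. It assumes that Eq. 1 has the usual CMOS dynamic-power form, P = C · V² · f; the per-component constants below are invented placeholders, not measured values.

```python
def dynamic_power_w(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power under the assumed form P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical component-specific constants replacing the single C.
COMPONENT_CONSTANT_F = {
    "cpu": 1.0e-9,
    "ram": 0.4e-9,
}


def component_power_w(component, voltage_v, frequency_hz):
    """Same relation, but with a constant looked up per component."""
    constant = COMPONENT_CONSTANT_F[component]
    return dynamic_power_w(constant, voltage_v, frequency_hz)
```

Under this form the voltage enters quadratically, so halving the voltage at a fixed frequency cuts dynamic power to one quarter, while a component with a smaller constant draws proportionally less at the same operating point.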

No detection of short burst

Sometimes CPU-bound processes need only a very short CPU burst to complete execution. During such bursts, CPU frequencies are not adjusted accordingly, so a significant amount of energy is wasted and the bursts go undetected even by DVFS. Short-burst detection and proper load management therefore need further investigation.

Energy efficiency in multi user-cases

The existing literature mainly focuses on wired networks and is limited to servers. In the multi-user case, different users may have different platforms and operating systems. This diversity needs further exploration with respect to energy saving; the exact measures and energy savings in the multi-user case are still unknown.

Frequency performance mismatch

DVFS is based on frequencies that are adjusted according to workload. However, how accurately frequencies match workloads is still unknown, since wrong burst sizes and frequency adjustments often hinder system performance. An intelligent, real-time decision process for altering frequencies is still lacking.

Workload mismatch

In DVFS, frequencies are altered on the basis of workload. A strong statistical analysis of frequencies against workload is necessary to achieve the required energy-efficiency objectives. For this, either the workload must be known in advance or adjustments must be made at runtime. How to allocate resources and frequencies correctly according to workload is not yet clearly known.

Inaccurate frequency measures for CPU bound processes

In DVFS implementations, a change in frequency sometimes results in performance degradation. Changing frequencies for I/O-bound processes retains maximum performance, whereas doing the same for CPU-bound processes causes performance loss. To obtain the maximum benefit from DVFS, modifications are required in the prediction models; variation in DVFS policies and the prediction models still need further investigation. Based on preliminary results and studies, we consider DVFS a better energy-efficiency technique that also allows the power consumption of different components to be estimated quickly. However, power-aware job scheduling, workload management, accurate selection of bursts and component-based policy making can further enhance DVFS performance.
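The different sensitivity of I/O-bound and CPU-bound processes to frequency changes can be seen in a simple runtime model. The model is an assumption made for illustration: run time is taken as CPU cycles divided by frequency plus a fixed I/O wait that does not depend on clock speed. The function names are hypothetical.

```python
def run_time_s(cpu_cycles, io_wait_s, frequency_hz):
    """Run time = compute time at the given frequency + fixed I/O wait."""
    return cpu_cycles / frequency_hz + io_wait_s


def slowdown(cpu_cycles, io_wait_s, high_hz, low_hz):
    """Factor by which run time grows when dropping from high_hz to low_hz."""
    return (run_time_s(cpu_cycles, io_wait_s, low_hz)
            / run_time_s(cpu_cycles, io_wait_s, high_hz))
```

Halving the clock from 2 GHz to 1 GHz doubles the run time of a purely CPU-bound job of 2e9 cycles (slowdown 2.0), but a job of 2e8 cycles with 0.9 s of I/O wait slows by only a factor of about 1.1. A policy that lowers the frequency only for the second kind of process therefore saves energy with little performance loss.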

Conclusion and future work

In this paper, we surveyed different techniques for enhancing the energy efficiency of big data, covering contributions from recent research in this domain. We critically analyzed software- and hardware-based techniques used for processing, communication, computation and service provision in big data. We evaluated the performance of each technique and drew the core comparison presented in Table 8. In the cloud, a significant amount of energy can be saved using integrated techniques. When designing applications for big data, energy must be a prime concern. Some cloud computing applications consume more energy than traditional applications. ISPA helps reduce energy needs by transitioning between low- and high-energy states in complex systems. This approach helps meet the growing energy needs of big data.

We critically investigated various techniques and found that DVFS is the best among them, although some performance issues are still associated with it. In future, for example, process-based energy efficiency could be explored as an alternative to shutting down the processor: if a process is I/O bound, the energy supplied to the CPU is decreased, and if it is CPU bound, the CPU is given the energy required for processing. Most importantly, a benchmark needs to be set for saving energy in big data, considering future trends and needs. Our broad conclusion is that energy efficiency must be given priority and that energy-efficient techniques must be implemented to meet this global energy challenge.

In future, algorithms based on artificial intelligence can be developed to optimize energy efficiency in big data. DVFS can also be optimized by adding specific parameters to enhance performance. Substitute versions of DVFS can be devised that suit data-intensive applications with minimal performance loss. The concept of processing-specific energy consumption can also be explored (i.e., applications involving complex processing require more energy). Complex adaptive systems draw on most real-life situations and have gained considerable attention from the research community. In future, key trends related to complex adaptive systems in big data, together with their essential features and challenges, need to be explored further. We aim to explore different architectures of ISPA. These architectures will be analyzed to see how they fulfill the high energy demands of the corporate sector without disrupting user services or degrading user experience. In addition, special attention is needed to integrating ISPA with other techniques to meet energy demand in big data.

Declarations

Authors’ contributions

MS designed the study. AM carried out the studies and performed the analysis. AM drafted the manuscript and MS wrote the conclusion. Both authors edited the paper individually and together. Both authors read and approved the final manuscript.

Acknowledgements

This research was supported by COMSATS Institute of Information Technology Islamabad, Pakistan.

Competing interests

The authors declare that they have no competing interests.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
Department of Computer Science, COMSATS Institute of Information Technology

References

  1. (2009) How VMware virtualization right size IT infrastructure to reduce power consumption
  2. Albers S (2010) Energy efficient algorithms. Commun ACM 53(5):86–96
  3. Andrew LL, Lin M, Wieman A (2010) Optimality fairness and robustness in speed scaling design. In: Proceedings of ACM international Conference on measurement and modeling of international computer System (SIGMETRICS 2010), New York, USA
  4. Barroso LA, Holzle U (2007) The case for energy-proportional computing. In: IEEE computer pp 33–7
  5. Batool K, Niazit MA (2015) Self-organized power consumption approximation in the internet of things. In: International Conference on Consumer Electronics (ICCE), 2015 IEEE, 9–12 Jan. 2015
  6. Beloglazov A, Buyya R (2012) Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr Comput Pract Exper 24(13):1397–1420
  7. Benini L, Bogliolo A, Micheli GD (2000) A survey of design techniques for system-level dynamic power management. IEEE Trans Very Large Scale Integration (VLSI) Syst 8(3):299–316
  8. Berl A, Gelenbe E, Girolamo M, Giuliani G, Meer H, Dang M, Pentikousis K (2009) Energy-Efficient Cloud computing. In: Oxford University Press on behalf of The British Computer Society
  9. Bosilca G, Bouteiller A, Danalis A, Herault T, Lemarinier P, Dongarra J (2012) DAGuE: a generic distributed DAG engine for High Performance Computing. Parall Comp 3751
  10. Bruschi J, Rumsey P, Anliker R, Chu L, Gregson S (2011) Best Practices Guide for Energy-Efficient Data Center Design by Rumsey Engineers under contract to the National Renewable Energy Laboratory
  11. Buttazzo G (2002) Scalable Application for energy aware processors. Embed Softw pp 153–165
  12. Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile Netw Appl 19:171–209
  13. Chen Y, Alspaugh S, Borthakur D, Katz R (2012) Energy efficiency for large-scale map reduce workloads with significant interactive analysis. In: Proceedings of the 7th ACM European conference on Computer Systems, ser. EuroSys’12, 2012, pp. 43–56
  14. Chenet Y (2010) To compress or not to compress-compute vs. IO tradeoffs for map reduce energy efficiency. In: Proceedings of the first ACM SIGCOMM workshop on Green networking, ACM, 2010, pp 23–28
  15. Colarelli D, Grunewald D (2002) Massive arrays of idle disks for storage archives. In: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, ser. Supercomputing’02. Los Alamitos, CA, USA: IEEE Computer Society Press, 2002, pp 1–11
  16. Dean J, Ghemawat S (2004) Map reduce: simplified data processing on large clusters. In: Proceedings of the 6th conference on Symposium on Operating Systems Design and Implementation, 2004, pp. 10–10
  17. Douglis F, Krishnan P, Bershad B (1995) Adaptive disk spin down policies for mobile computers. Comp Syst 8(4):381–413
  18. Ekanayake J, Pallickara S, Fox G (2008) Map reduce for data intensive scientific analyses. In: Proceedings of the 2008 Fourth IEEE International Conference on eScience, ser. ESCIENCE’8-8-2008, pp 277–284
  19. Friedman E (2009) SQL/Map Reduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. Proceedings VLDB Endowment 2(2):1402–1413
  20. Goiri In (2012) GreenHadoop: leveraging green energy in data processing frameworks. In: Proceedings of the 7th ACM European conference on Computer Systems, ser. EuroSys’12, 2012, pp 57–70
  21. Gupta M, Singh S (2003) Greening of the internet. In: Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM, 2003, New York, NY, USA, pp 19–26
  22. Habiba U, Masood R, Shibliand M, Niazi MA (2014) Cloud identity management security issues and solutions: a taxonomy. Comp Adap Syst Model
  23. Hwang CH, Wu AC (2000) A predictive system shutdown method for energy saving of event-driven computation. ACM Trans Design Automat Elect Syst (TODAES) 5(2):241
  24. Ibrahim S et al. (2009) Evaluating map reduce on virtual machines: the Hadoop case. In: Proceedings of the 1st International Conference on Cloud Computing, ser. CloudCom’09, 2009, pp 519–528
  25. Jiang D, Ooi BC, Shi L, Wu S (2010) The performance of map reduce: an in-depth study. Proc VLDB Endow 3(1–2):472–483
  26. Kaushik RT, Bhandarkar M (2010) Greenhdfs: towards an energy-conserving, storage-efficient, hybrid hadoop compute cluster. In: Proceedings of the 2010 international conference on Power aware computing and systems, ser. HotPower’10. Berkeley, CA, USA: USENIX Association, 2010, pp 1–9
  27. Kim J, Rotem D (2012) Frep: energy proportionality for disk storage using replication. J Parallel Distrib Comp 72(8):960–974
  28. Koomey JG (2007) Estimating Total power consumption by server in the US and the world, Analyst Press, Oakland
  29. Krauth W (2006) Statistical mechanics: algorithms and computations. Oxford University Press, USA
  30. Kumar R, Gupta N, Charu S, Jain K, Jangir SK (2014) Open source solution for cloud computing platform using OpenStack. In: IEEE International Conference on Consumer Electronics (ICCE), 2014, 9-12-2014
  31. Lee S, Sakurai T (2000) Runtime voltage hopping for low power real time Systems. In: Proceedings of the 37th Annual design Automation conference, Los Angeles, CA, USA, pp 806–809
  32. Meisner D, Gold BT, Wenisch TF (2009) Powernap: eliminating Server idle Power. ACM SIGPLAN Notices 44(3):205–216
  33. Meisner D, Gold BT, Thomas F (2009) Powernap: eliminating server idle power. In: Proceedings of ASPLOS09, Washington USA
  34. Menon A (2012) Big data i-e Facebook. In: Proceedings of the 2012 workshop on Management of big data systems, ser. MBDS’12, 2012, pp 31–32
  35. Moise D, Carpen-Amarie A (2012) Map reduce applications in the cloud: a cost evaluation of computation and storage. Data Manag Cloud Grid P2P Syst 7450:37–48
  36. Nathuji R, Schwan K (2007) Virtual power: coordinated power management in virtualized enterprise systems. ACM SIGOPS Operating Systems Review, pp 265–278
  37. Negru C, Pop F, Cristea V, Bessisy N, Li J (2013) Energy efficient cloud storage service: key issues and challenges. In: Fourth International Conference on Emerging Intelligent Data and Web Technologies
  38. Niazi MA, Laghari S (2012) An intelligent self-organizing power-saving architecture: an agent-based approach. In: Fourth International Conference on Computational Intelligence, Modelling and Simulation, 2012
  39. Pinheiro E, Balanchine R, Carrera EV, Heath T (2001) Load balancing and unbalancing for power and performance in cluster-based systems. In: Proceedings of the Workshop on Compilers and Operating Systems for Low Power, 2001, pp 182–195
  40. Shuja J, Madani SA, Hayat K (2012) Energy-efficient data centers. Computing 94(12):973–994
  41. Srivastava MB, Chandrakasan AP, Brodersen RW (1996) Predictive system shut down and other techniques for energy efficient programmable computation. IEEE Trans Very Large Scale Integ (VLSI) Syst 4(1):42–55
  42. VMware Inc (2009) vSphere resource management Guide
  43. Valentini GL, Lassonde W, Khan SU, Min-Allah N, Madani SA, Li J, Zhang L, Wang L, Ghani N, Kolodziej J, Li H, Zomaya AY, Xu CZ, Balaji P, Vishnu A, Pinel F, Pecero JE, Kliazovich D, Bouvry P (2011a) An overview of energy efficiency techniques in cluster computing systems. Cluster Computing 16(1):3–15
  44. Valentini GL, Lassonde W, Khan SU, Li J, Zhang L (2011b) NDSU-CIIT Green Computing and Communications Laboratory, Department of Electrical and Computer Engineering. North Dakota State University, Fargo, ND 58108-6050, USA
  45. Wei G, Liu J, Xu J, Lu G, Yu K, Tian K (2009) The on-going evolutions of power management in xen. Intel Corporation tech Rep 2009
  46. Wiessel A, Bellosa F (2002) Process Cruise Control: event driven clock scaling for dynamic power management. In: Proceedings of 2002 international conference on compilers architecture and synthesis for Embedded Systems, Grenoble, France, p 246

Copyright

© Majeed and Shah. 2015