Energy efficiency in big data complex systems: a comprehensive survey of modern energy saving techniques

Majeed, Abdul; Shah, Munam Ali

doi:10.1186/s40294-015-0012-5

Review
Open access
Published: 15 December 2015

Abdul Majeed¹ &
Munam Ali Shah¹

Complex Adaptive Systems Modeling volume 3, Article number: 6 (2015)

Abstract

The growing need for computation and processing has led to the emergence of data centers. These data centers usually comprise hundreds of thousands of servers and other components. This complicated arrangement of systems has led to the adoption of complex systems. Complex systems prevail in our society as combinations of many entities, e.g., the immune system, the human brain and ecosystems. The entities adapt and interact through nonlinear interactions, and the interaction between the components of a complex system is carried out in a distributed fashion. Big data, which runs on thousands of machines, is also considered a form of complex adaptive system: it comprises a large number of entities and components that interact nonlinearly with each other. The development of such complex systems raises certain challenges. Apart from management, energy is the most pressing one, and it is the core topic of this research. This paper surveys the state of the art in modern tools, techniques, architectures and algorithms that have been proposed and deployed to achieve energy efficiency in big data over the period 2007–2015. We group existing approaches aimed at achieving energy efficiency in the complex paradigm of big data. With this categorization, we aim to provide an easy and concise view of the underlying model adopted by each approach in the context of big data.

Introduction

Due to advances in computer technology, computer systems have become widespread and complex. This complex arrangement of systems results in a complex adaptive system (CAS). A huge increase in the scale and complexity of such systems has been observed over the last few years (Niazi and Laghari 2012). CASs exist in different forms. Such systems are managed with efficient algorithms and controlled by different computational methods (Batool and Niazi 2015). Big data is one example of such complex systems. A CAS comprises a large number of entities and components, which must interact and adapt to perform certain operations. The nature of big data is more or less the same; in other words, big data is one form of complex system. Most CASs relate to our society and environment, e.g., the immune system, the human brain (neuron structure), ecosystems and human societies.

Big data provides numerous services and infrastructures to companies and has opened new research directions in computer science. Most current cloud computing applications use distributed computing with varying degrees of connectivity and interaction. Big data provides computation and efficient processing to millions of users, at a level of complexity comparable to that of a CAS (Habiba et al. 2014). Apart from the complexity of the CAS, achieving energy efficiency in cloud computing and big data is a global challenge. Researchers have proposed many methods to reduce power consumption in cloud and big data infrastructure. Most of these solutions propose powering off unused components; others focus on the optimal distribution of data among different components (Negru et al. 2013). Cloud computing provides numerous services to users but poses certain challenges because of its complex nature. The devices used in the cloud are so numerous that the resulting system is even more complex than the structure of the human brain. Apart from complexity, clouds also face challenges such as the security and privacy of data. Big data allows users to host, access and process data at any time. The volume of data is growing enormously day by day, and the era of big data has undoubtedly arrived. Big data requires different management techniques to help communities (e.g., users) perform their tasks quickly and efficiently. CAS helps in modeling user behavior, which helps cloud providers manage users efficiently. To build an energy efficient cloud infrastructure, we must understand the interaction between the power-consuming components of the complex system, so that energy requirements can be met while estimating the trade-off between power and performance.

The volume of data is increasing at an amazing speed: 90 % of the data available today was created in just the last 2 years (Habiba et al. 2014). Facebook alone processes nearly 500 terabytes of data daily (Kumar et al. 2014). The Large Hadron Collider (LHC) computing grid also contributes to vast data generation. Dozens of petabytes of data are produced daily, and their dissemination, transmission and processing consume a huge amount of energy (Shuja et al. 2012). However, these data generators do not address how energy will be saved and used wisely to meet this ever increasing demand for data. GreenHadoop contributes toward energy efficiency by using solar energy (Menon 2012). However, it hits a bottleneck when the weather is cloudy for many days. Hadoop also uses techniques such as MapReduce, which deals with how effectively a query is answered but has no concern for energy efficiency.

Big data helps companies solve business problems with ease. It uses hardware, software, algorithms and many related techniques to perform the desired functions, and it relies on standardized approaches to help users perform their tasks. Big data aims to ensure that the desired data is always accessible. However, the systems, servers, components and subsystems that serve users consume an enormous amount of energy. While big data serves users with its unique features, it also faces a variety of challenges. With respect to the volume of data, the first issue is data storage, and the second is the privacy and integrity of the data. Users might be affected by viruses, Trojan horses and hackers. Another feature of big data is that information is always accessible to the user; on the other hand, users can encounter situations in which data is not accessible because of a poor network connection. Alongside all these issues, energy is another important concern that needs to be addressed. With the growth of technology and of wired, wireless and mobile device networks, energy consumption has increased considerably. This increase has led to a huge demand for tools and techniques that can manage the growing demand for energy.

As the volume of data increases, more resources are required to hold it, and more energy is required to keep those resources running (Ekanayake et al. 2008). However, no single technique efficiently addresses all energy consumption issues. Researchers are developing different techniques that aim to minimize energy consumption in big data. Energy consumption is a particular concern in cloud computing data centers, where thousands of computers, servers, routers, switches and bridges operate and consume thousands of kilowatts of power. Stakeholders in cloud computing are looking for energy efficient algorithms that reduce the cost of energy (Goiri 2012).

Although many surveys on energy efficiency in big data exist, the existing research does not provide thorough insight into energy efficiency in the context of big data and CASs. Our contribution is to present energy utilization methods, techniques and algorithms for CAS. In this paper, we provide a comprehensive evaluation of existing techniques in tabular form (Tables 2, 3, 4, 5, 6, 7), and we extend the existing taxonomy of hardware based energy efficiency techniques, as shown in Fig. 5. We estimate energy consumption per server class for the year 2007 onward in Table 2, and we provide a component based taxonomy of energy efficient techniques in Table 1. We examine big data in the context of complex adaptive systems and give an overview of the variety of services offered by cloud providers and the challenges they face. We further identify hardware and software based techniques and approaches for meeting the energy demands of the cloud, and we outline their characteristics in tabular form. We survey the literature over the period 2007–2015. Finally, we present our findings on one technique for energy efficiency that, despite some limitations, compares favorably with the others.

Table 1 Energy efficiency techniques, component wise taxonomy

The remainder of the paper is organized as follows: “Background” describes the background of the big data services, key challenges of big data and overview of the energy efficient techniques. “Critical analysis of existing surveys” provides the critical analysis of existing surveys. “Energy efficient techniques” details different techniques used in big data. We also provide the evaluation of each technique against certain parameters in the context of big data in this section. We provide our summary and findings in “Summary and findings”. Some open issues with DVFS are elaborated in “Open issues”. The paper is concluded in “Conclusion and future work” where the future directions are also elaborated.

Background

Efficient energy consumption remains a concern for researchers and practitioners because excessive energy consumption depletes natural resources, which in turn increases pollution and causes health hazards. According to a survey (Goiri 2012), there is a 6 % increase in CO₂ emissions from the information technology (IT) sector, which is also a great hazard to human health. In recent years, organizations such as IBM, Google and Microsoft have built data centers in which thousands of machines run and consume large amounts of energy. To cope with this energy challenge, different techniques have been developed that minimize energy consumption in data centers.

Dealing with energy efficiency is necessary; otherwise, within a few years the cost of energy will exceed the cost of hardware. To deal with this issue, different software and hardware based techniques have been proposed and deployed in data centers (Bosilca et al. 2012). The energy consumed in a big data center is computed by determining how much energy each device consumes while it is operating. Efficient utilization of energy has drawn growing attention from both cost and environmental perspectives (Chen et al. 2014). When many machines operate in the cloud infrastructure, the result is CO₂ emission. The use of the Internet, the exchange of data over it, and processing and analytical demands all result in substantial energy consumption. Therefore, a power consumption methodology, together with control and checks and balances on power resources, is necessary alongside the expandability and accessibility of big data.
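The per-device accounting described above can be illustrated with a minimal sketch. We assume that each device draws a constant average power while it is on; the `Device` class, the device names and the wattages are hypothetical and are not taken from the surveyed literature.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    power_watts: float  # assumed average draw while operating
    hours_on: float     # operating time within the accounting period


def energy_kwh(devices):
    """Total energy in kWh: sum over devices of power (W) x time (h) / 1000."""
    return sum(d.power_watts * d.hours_on for d in devices) / 1000.0


# Hypothetical one-day inventory; all figures are illustrative only.
inventory = [
    Device("server", 250.0, 24.0),
    Device("switch", 100.0, 24.0),
    Device("disk-array", 150.0, 12.0),
]
total = energy_kwh(inventory)  # (6000 + 2400 + 1800) / 1000 kWh
```

In practice the power draw of a device varies with load, so a real accounting would integrate measured power over time rather than multiply an average by the operating hours.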

Different models have also been proposed for energy efficiency, but each encounters different bottlenecks because of service level and configuration changes (Krauth 2006). Modern service providers have resolved this issue to a large extent. Nevertheless, every algorithm has some pitfalls. Resource usage, control of carbon emissions and domain specific policies make it really challenging to build one common solution for all. Techniques such as virtualization and sampling also contribute to energy efficiency, as do MapReduce and the intelligent power saving architecture (ISPA) (Meisner et al. 2009; Moise and Carpen-Amarie 2012; Ibrahim et al. 2009; Jiang et al. 2010). Big data services are numerous and support the functioning of companies. Big data helps users perform their tasks with its unique quality of service. Big data supports networking services, which have helped companies develop CRM (Customer Relationship Management) and extend services to users through remote access and without time constraints.

In the following sections, we briefly present an overview of big data services and big data challenges, and a critical review of different energy efficiency techniques in the context of CAS.

Big data services

Big data serves organizations in different operations, including management, which is one of the key operations performed by any company, whether large or small. Big data provides storage services; for example, a provider's storage can hold important data such as documents, employee details, business plans, strategies, log data, top secret documents and memory dumps. Big data providers generate profit by providing infrastructure, services and products to users. Overall, 15 % of the budget of the most popular companies goes to cloud providers. Big data poses different challenges related to services and configuration, and researchers have proposed valuable contributions in this field. In the near future, big data will extend its services to other domains as well, e.g., security analytics, visualization, better understanding of the threat domain, performance enhancement of engines, monitoring, managed service provision and vulnerability assessment. These are future markets, and most companies need all of these products.

Big data provides custom software and an execution environment, which saves personal resources. Big data provides the CIA (Confidentiality, Integrity, and Availability) triad, which makes it trustworthy. Big data is always concerned with the integrity of data, which is key to building a trustworthy relationship between clients and service providers. Big data has opened forums in which everyone can express their views and share their experiences. Big data also maintains an individual's integrity by allowing users to test different proofs on the stored data. All recent applications are distributed in nature and equally accessible to everyone, which is only possible because of big data.

Big data allows users to install and debug services in the cloud. Big data has also helped the medical field by keeping data consistent and available to doctors at all times. Complex adaptive systems provide modeling and simulation to manage growing user demand for services, and they provide a good means of understanding the complex nature of different components and their functions. A few unique services provided by big data are summarized in Fig. 1.

Big data provides services in all fields, including management, engineering and networking. Big data has also contributed to software development: some software is used only in clouds and is executed on infrastructure provided by the cloud (Fig. 1). One of the unique features of big data is data storage, which is useful for maintaining data over the long term without theft or loss. Big data provides a roadmap for many operations that cannot be managed efficiently by hand. Big data provides intelligence capabilities, analytics and solutions, which are among its best services. Big data contributes to social networking as well: it helps create forums in which people provide solutions to problems and work voluntarily for a cause. Big data has also opened research directions for researchers to explore common trends and provide solutions to the associated problems. These solutions and the understanding of such trends are helpful for the success of big data.

Big data challenges

The popularity of the cloud brings certain challenges, which require sound solutions if that popularity is to continue. Some of the common challenges in the context of big data are listed below.

Hacking

Hacking is the most common challenge for big data. The term hacking means obtaining, copying, reading or viewing someone's confidential information without authorization. Hacking is a global challenge, and many organizations spend millions of dollars to secure their infrastructure against it. Most big data services use cryptography to avoid such breaches and to deliver the desired level of functionality to users.

Unauthorized disclosure of data/information

Apart from hacking, which is carried out with sophisticated tools, important documents and information transferred over an insecure network need protection during exchange. The unauthorized disclosure of information because of a weak network protocol or misuse of a computer results in loss of reputation and trust. To secure communication and computer resources, the field of network and computer forensics has emerged to address this global challenge of information security.

Performance bottleneck

With the popularity of cloud computing, one of the key challenges faced by cloud providers is to deliver services as users intend. Matching services to user demand is not an easy task. One of the key user demands is better accessibility to services at any time. When a system does not perform as expected, this is referred to as a performance bottleneck. Performance bottlenecks usually arise from growth in the number of users, too many requests to a single server or heavy processing on a server. Cloud providers have managed this challenge by deploying many dedicated servers, taking full advantage of web services, and replicating information across different servers and middleware.

Scalability

Scalability refers to the ability of big data and the cloud to handle an expected increase in the number of resources or users. It is also a big challenge for cloud providers to ensure that the system is available at all times. This can only be achieved when system resources are managed intelligently according to workload, so that the service remains available when millions of users are visiting a website or accessing resources from the cloud. These global challenges require changes in systems, application software and the concurrent execution of tasks.

Concurrency

Ensuring concurrent execution of processes is one of the key challenges of big data. Concurrency allows several processes to execute simultaneously using shared resources. Here, the key challenge for the cloud provider is to ensure that no deadlock occurs when several processes are executing. In recent times, concurrency can be ensured with the help of synchronization.

Fault tolerance

Apart from hacking, unauthorized disclosure of information, scalability, transparency and concurrency, fault tolerance is also a global challenge in big data. Fault tolerance involves detecting failures in a predictable way and stopping malfunctioning components for a short period of time with minimum loss of data. This requires efficient resource management and a control mechanism to acquire and release resources properly.

Availability

Availability is one of the key features of big data demanded by most users. The most common requirement is that each system must be accessible from everywhere at any time. It is mandatory for the cloud provider to make sure that at least one instance of a machine is available at all times; for example, if one machine goes down, the system should continue to work with reduced resources. Critical resources must be replicated in order to provide the desired level of service. Availability is hard to achieve in the sense that each replica must hold consistent data over a specified period.

Energy efficiency

One of the potential issues addressed in this paper is energy efficiency. Different techniques for this challenge are summarized in Fig. 2. Some of these techniques are hardware based, while others are software based. Energy efficiency is critical to keeping the cloud's devices active and readily available. In complex systems, data is retrieved through the interaction of many components and other devices. Activities in a complex system range from modeling and simulation to analysis and visualization of cloud-related networks. Efficient modeling and simulation of the CAS reduces energy demand and increases the lifetime of the network, which in turn increases the network's capability to serve users in an energy efficient way. Therefore, it is necessary to achieve energy efficiency in CAS with a small economic investment.

Critical overview of energy efficient techniques

Figure 2 summarizes different solutions for energy efficiency in big data. The services and application infrastructures provided by big data correlate with power consumption, because power consumption is almost always application specific (Chen et al. 2012). However, exact energy measurements are not possible because energy use differs from one application to another.

Various techniques are used in data centers to ensure efficient utilization of energy. Data centers are designed with the prime objective of improved performance in terms of services and throughput with minimum energy consumption. Different experiments have been conducted on Hadoop using MapReduce, and various trade-offs have been made. A wide range of input and output devices that support efficient utilization of energy has been studied in the literature (Ibrahim et al. 2009). However, if data is distributed across various locations, MapReduce can take a long time to respond to a query, which in turn consumes more energy. The prime objective of each technique reviewed in this paper is to provide energy efficiency without performance loss. Furthermore, we evaluate the performance of each technique with the help of certain parameters.

DPM (dynamic power management), DVFS (dynamic voltage and frequency scaling) and virtualization are the three main techniques; apart from these, a few network related approaches (e.g., sampling) also support energy efficiency. A set of policies uses these approaches to meet the growing demand for energy. These are not the only techniques that address this global challenge, but this paper mainly focuses on a detailed comparison of the following techniques.

DPM

DPM is also called APM (Automatic Power Manager); devices are automatically powered off depending on load. If the load is low, unused devices are turned off automatically. This is one of the best techniques for managing load automatically, and it is implemented in both hardware and software. The technique is mostly implemented in big data systems that keep metrics for tracking devices (Niazi and Laghari 2012). However, a bottleneck arises when the metrics are not updated in real time, and if intelligent algorithms are not implemented, resources can be misused, which in turn increases energy use.
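A minimal sketch of such a load-driven power-off policy is given here. It is not the method of any surveyed system: the `DeviceMetric` record, the load threshold and the freshness limit are all assumptions chosen for illustration. The freshness check reflects the bottleneck noted above, in that a device whose metric is stale is left on rather than switched off on outdated information.

```python
from dataclasses import dataclass


@dataclass
class DeviceMetric:
    name: str
    load: float         # utilisation in [0, 1]
    age_seconds: float  # time since this metric was last refreshed


def devices_to_power_off(metrics, load_threshold=0.05, max_age_seconds=10.0):
    """Return the names of devices that look idle according to fresh metrics.

    Devices with stale metrics are never selected, because their recorded
    load may no longer reflect reality.
    """
    return [
        m.name
        for m in metrics
        if m.age_seconds <= max_age_seconds and m.load < load_threshold
    ]


# Hypothetical snapshot of three devices.
snapshot = [
    DeviceMetric("node-a", load=0.00, age_seconds=2.0),    # idle, fresh
    DeviceMetric("node-b", load=0.60, age_seconds=1.0),    # busy
    DeviceMetric("node-c", load=0.01, age_seconds=120.0),  # idle but stale
]
idle = devices_to_power_off(snapshot)  # only "node-a" qualifies
```

A production policy would also weigh the energy and latency cost of waking a device again, which this sketch ignores.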

DVFS

DVFS is another effective energy efficient technique, based on the relation between power, voltage and frequency. This relationship is given as P = V² · f (Dean and Ghemawat 2004), where P is the power consumed, V is the voltage and f is the frequency. By reducing the voltage or the frequency, power can be saved, which makes DVFS an effective way of achieving energy efficiency in big data; it works in different states (Jiang et al. 2010). However, deciding when to change the frequency, and under what circumstances, still needs significant improvement.
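The effect of this relation can be shown with a short worked sketch. The operating points in `P_STATES` are hypothetical, power is expressed in relative units, and the `capacitance` factor is an assumption that generalizes the relation to the common form P = C · V² · f; with C = 1 it reduces to the form given above. The selection rule (lowest state that meets the required frequency) is one simple policy among many and is not claimed to be the one used by any surveyed system.

```python
def dynamic_power(voltage, frequency, capacitance=1.0):
    """Relative dynamic power: P = C * V^2 * f (C = 1 gives P = V^2 * f)."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical operating points (volts, GHz), ordered from slowest to fastest.
P_STATES = [(0.8, 1.0), (1.0, 2.0), (1.2, 3.0)]


def pick_state(required_ghz):
    """Choose the slowest state whose frequency still meets the demand."""
    for voltage, freq in P_STATES:
        if freq >= required_ghz:
            return voltage, freq
    return P_STATES[-1]  # demand exceeds every state: run at the fastest


low = dynamic_power(*pick_state(0.9))  # power at the slowest state
high = dynamic_power(*P_STATES[-1])    # power at the fastest state
savings = 1 - low / high               # fraction of power saved
```

Because power falls with the square of the voltage, running a light workload at the slowest of these illustrative states uses roughly 15 % of the power of the fastest state.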

Virtualization

Virtualization is another energy efficient technique, in which one physical device is shared by multiple instances of multiple operating systems. Memory and CPU are provisioned dynamically in virtualization to maintain performance. Virtualization is used in most data centers to make efficient use of energy. It also provides efficient resource utilization and management facilities.
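One way in which sharing physical devices saves energy is consolidation: placing virtual machines (VMs) on as few hosts as possible so that the remaining hosts can be powered down. The sketch below uses the first-fit decreasing heuristic as a stand-in; this choice, the function name `consolidate` and the VM demands are our assumptions for illustration and are not drawn from the surveyed works.

```python
def consolidate(vm_demands, host_capacity):
    """First-fit decreasing: place each VM on the first host with room.

    Returns a list of hosts, each a list of the VM demands placed on it.
    """
    hosts = []
    for demand in sorted(vm_demands, reverse=True):
        if demand > host_capacity:
            raise ValueError("VM demand exceeds the capacity of a single host")
        for host in hosts:
            if sum(host) + demand <= host_capacity:
                host.append(demand)
                break
        else:
            hosts.append([demand])  # no existing host fits: power on another
    return hosts


# Hypothetical CPU demands (in cores) of six VMs, packed onto 8-core hosts.
placement = consolidate([4, 3, 2, 2, 1, 1], host_capacity=8)
# placement == [[4, 3, 1], [2, 2, 1]]: two hosts instead of six
```

The heuristic considers a single resource only; a realistic placement would also account for memory, migration cost and service level agreements.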

Network approaches

In networking, several techniques have been proposed to reduce energy consumption so that growing energy needs can be met from available resources. Techniques at the network level help to meet the challenges of limited power (Dean and Ghemawat 2004). As networking resources involve extensive processing, techniques such as sampling are used to manage energy needs. In a wireless sensor network, the lifetime of a node depends on its energy level. Processing, clustering and routing help to extend the lifetime of the nodes in the network.

Energy efficient clustering

Energy efficient clustering is one of the most important techniques, and it concerns the behavior of nodes. It reduces the amount of energy required for inter-cluster and intra-cluster communication between nodes in a clustered architecture. It implements an algorithm that saves energy. This type of architecture is mostly used in wireless sensor networks, where in-network processing and efficient routing are used to meet energy needs.
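The saving from clustering can be illustrated with a small sketch under strong simplifying assumptions: the cost of one message is taken to be proportional to the squared distance between sender and receiver, and the cluster head is assumed to aggregate all member readings into a single message. The node coordinates, the sink position and the choice of head are hypothetical.

```python
def tx_cost(a, b):
    """Relative cost of one message: squared distance (assumed radio model)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def direct_cost(nodes, sink):
    """Every node transmits its own message straight to the sink."""
    return sum(tx_cost(n, sink) for n in nodes)


def clustered_cost(nodes, head, sink):
    """Members send to the head; the head forwards one aggregated message."""
    intra = sum(tx_cost(n, head) for n in nodes if n != head)
    return intra + tx_cost(head, sink)


# Hypothetical layout: four nodes near the origin and a distant sink.
nodes = [(0, 0), (1, 0), (0, 1), (1, 1)]
sink = (10, 10)
head = (1, 1)  # the member closest to the sink acts as cluster head

without_clusters = direct_cost(nodes, sink)       # 200 + 181 + 181 + 162
with_clusters = clustered_cost(nodes, head, sink)  # 2 + 1 + 1 + 162
```

Under these assumptions the clustered layout costs 166 units against 724 for direct transmission. The head bears most of the cost, which is why practical schemes rotate the head role among nodes.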

Sampling technique

Sampling is another energy efficient technique in networking. Power based query optimization is one of the most important techniques for saving energy for different purposes. The increasing popularity of big data results in high power consumption because of the variety of services, which will be a challenge for researchers in the coming years. Given the growing energy needs of big data centers, efficient and cost effective algorithms and techniques are needed to cope with these challenges. To be energy efficient, we need several software and hardware based techniques and different algorithms to meet future challenges of power consumption. Energy management has several dimensions that contribute to energy efficient computing while providing users with a constant supply of energy. Proper resource management, climate protection and cost savings will help address the energy shortage, as energy demand is expected to double by 2050, when our resources will not be sufficient to meet it (Chenet 2010). However, efficient policies and improvements to the above-mentioned techniques can cope with this challenge. Good policy-making about power resources and development in the energy sector aim to balance energy consumption and energy saving. However, sampling and its implementation pose different challenges, such as selecting appropriate samples and creating links between those samples.

The following section gives an overview of available energy efficient techniques; a detailed discussion of each of the above-mentioned techniques, and an evaluation of each against certain parameters, is given in “Energy efficient techniques”. Most of the discussion concerns technical perspectives and contributions toward energy efficiency.

Critical analysis of existing surveys

In this section, we discuss and analyze different solutions for energy efficiency in big data. These solutions exist in the form of techniques, architectures and policies. The aim is to review different techniques that seek to meet the ever growing demand for energy. The current demand for energy is expected to double by 2050 (Chenet 2010). The core concept of energy efficiency techniques with respect to computing resources is expressed in Table 1.

Energy needs proper utilization. With recent advances in computer technology, it has become a challenge for data providers to maintain a balance between services and energy. Most data providers, whose initial focus was on services, have run into power challenges. Software based techniques have helped reduce energy needs, but their source code must still execute on a CPU, which ultimately requires energy efficient hardware.

The proposed solutions are deployed on different components, i.e., CPUs, disks, systems, clustered systems and networks. These components play a vital role in energy management, and each aims to reduce energy consumption through its configuration or operation. Apart from this taxonomy, certain approaches such as sampling also help achieve energy efficiency in complex network segments. Architectures also support the efficient management of energy. The components of a complex system also need proper management to contribute to energy saving with the help of the implemented architectures and policies. Big data entities, nonlinear interactions and distribution require both hardware and software based techniques to achieve energy efficiency.

The core of this paper is based on software, hardware and combined hardware and software techniques; these are explored, and new solutions are identified that achieve energy efficiency in complex systems. These techniques will help big data centers manage energy efficiency. DPM is one of the best power management techniques, and it comprises two other techniques: software and load balancing. DPM is essentially a load balancing technique, which balances the energy load between different components, either CPUs or other components. The CPU is the most energy consuming component of a computer, accounting for 30–35 % of total energy (Dean and Ghemawat 2004). The evaluation of different energy saving techniques is carried out with the help of certain parameters, which are adjusted according to the software and hardware configuration and operational management. DPM turns off the CPU when it is idle or non-functioning (Dean and Ghemawat 2004). However, special attention is required when deploying this approach in sensitive places. DPM techniques are concerned with the most energy consuming component, i.e., the CPU. DPM first makes the CPU energy efficient and then focuses on memory modules, because low power memory modules and CPUs together result in energy efficiency. IBM has developed certain highly energy efficient techniques based on the DPM concept (Menon 2012).

SPM (static power management) is an energy efficient technique that balances and manages workload. The SPM approach is good for mobile and handheld devices, which do not require as much energy as big data. SPM is also good for supercomputers and HPC (high performance computing). Virtualization achieves better results by using one physical device for multiple operating systems. It is used by most data centers for event management of the whole cloud. Virtualization makes the best use of computer resources and makes hardware more productive.

We explore different techniques that contribute to energy efficiency. Their categorization is given in Fig. 3, which shows the relationship between static and dynamic power management techniques. Apart from these techniques, energy efficient chips also exist that support energy saving. Cache RAMs support energy efficiency by allowing the respective cache to be powered down and up through a specific control register. Memory management unit (MMU) RAMs follow the same concept as caches and also support energy efficiency through successive operation.

Initially, system designers focused on performance when they developed the systems in use today, but high power consumption and electricity bills compelled them to rethink energy efficient mechanisms (Dean and Ghemawat 2004). Performance considerations pose many challenges, and saving energy while keeping services equally accessible to everyone is one of the biggest. Most developers have considered this issue, but no complete solution has been identified. Early researchers developed different techniques that provide certain advantages but degrade either performance or response time. Hardware level techniques are of primary importance: they improve energy efficiency, which results in lower bills, excellent performance and long-run benefits. Software level techniques perform considerably better, yet bugs in the code and execution environment make deployment somewhat difficult. Most data centers deploy a variety of software techniques in combination with hardware techniques to achieve energy efficiency. Energy consumption per server class (W/unit), as surveyed from 2000 to 2006, is summarized in Table 2 (Valentini et al. 2011b).

Table 2 Estimated energy consumption per server class

As the amount of data grows, energy consumption increases every day. Energy consumption also increases with server class, as depicted in Table 2. The use of certain techniques and approaches can reduce that power consumption. Beyond the services provided by the cloud, certain approaches need significant improvement. A power consumption tradeoff is not suitable in the big data environment. Tool development is not up to the mark, and existing tools therefore do not simulate and model the behavior of users over a specific period of time efficiently and accurately. The tools must be capable of modeling self-organization and other complex phenomena related to human life.

Some cloud providers are unstructured, i.e., P2P systems, which require applications and the development of different tools to cater to growing energy needs. Modern systems are unstructured, and therefore algorithms such as the self-organized power consumption approximation algorithm (SOPCA) are used to monitor the power consumption of different devices. Modern complex systems need not only to change ranges and other parameters but also to model and simulate the behavior of their entities. Some tools have been developed to handle this task, but they are very limited in scope. To gain a better understanding and more accurate results, tools such as NetLogo and agent-based toolkits have been proposed and used by researchers to model the complexity of CAS.

One of the earliest works applying power management at the data center level is by Pinheiro et al. (2001). The authors proposed a technique for energy efficiency in a heterogeneous cluster of nodes serving web applications. The main contribution of this work was concentrating the workload on fewer nodes and switching idle nodes off. However, load balancing and weak implementation of SLAs result in performance degradation. Nathuji and Schwan (2007) studied power management techniques in the context of virtualized data centers. The authors introduced and applied a power management technique named “soft resource scaling”. However, the adoption and implementation of this technique did not achieve the required result because the guest operating systems were legacy or power unaware.

Gupta et al. (2003) suggested putting network interfaces, links, switches and routers into sleep modes when they are idle in order to save the energy consumed by the Internet backbone and consumers. However, adopting such a technique results in communication loss if necessary components are in sleep mode, and in additional power consumption when devices wake up. Disk design also contributes to energy efficiency. Colarelli and Grunewald (2002) presented the concept of MAID (massive arrays of idle disks), a technique that powers off disks when they are not in use. MAID is basically an array of disks in which recently used data is written to cache disks. However, these cache disks always remain spun up while the regular disks remain idle, which in turn increases energy consumption.

Kim et al. (2012) presented a novel approach, called FREP (Fractional Replication for Energy Proportionality), for energy management in big data. FREP includes a replication strategy and basic functions to enable flexible energy management according to cloud needs, including load distribution and update consistency. However, the impact of replication on the overall storage cost of the system has not been presented. Kaushik and Bhandarkar (2010) proposed an energy-conserving, hybrid, multi-zone variant of HDFS for data-intensive processing on commodity Hadoop clusters. In a 3-month simulation run, this variant improved energy efficiency by up to 26 %. This technique has cut the power budget to $14.6 million. Different types of cloud infrastructure, including traditional cloud and high performance computing (HPC), need to be enhanced to support dynamic power demands (i.e., to adjust power automatically), which in turn creates new challenges in designing energy-efficient and power-aware architectures, infrastructures and communications. This concept was given by Bruschi et al. (2011).

A comprehensive survey of energy-saving strategies in both network and computer systems, with potential impact on the energy use of integrated systems, is given by Berl et al. (2009). Beloglazov and Buyya (2012) highlight energy concerns in system design and in the development of performance- and energy-efficient applications. They explain how the goal of computer system design has shifted toward power and energy concerns. The authors carried out a detailed survey of power consumption problems, hardware- and firmware-level techniques, the contribution of the operating system to energy efficiency, data center level techniques, and the importance of virtualization in data centers. The survey also explains power consumption at different levels of a computing system in terms of electricity bills, power budget and CO₂ emissions.

DVFS offers a great reduction in energy consumption in cloud infrastructure by changing voltage and frequency according to workload. Implementing this technique in the cloud has reduced power consumption significantly. Most clouds have implemented it; it operates at the level of the CPU, the most energy-consuming component. DVFS has attracted a lot of attention from the research community for being adaptive and efficient. Complex adaptive system modeling and simulation are used to communicate clearly the facts about the nature of complex systems. The interaction and coordination of entities help in understanding the behavior of complex systems. To manage and meet energy needs in complex systems, several approaches have been proposed and used by cloud providers. One is the intelligent self-organizing power-saving architecture (ISPA), which identifies suitable idle computers intelligently and lets the system shut them down or hibernate them automatically based on a uniform, rule-based, company-wide policy. This architecture results in minimal performance loss compared to other techniques. The hardware- and software-based techniques are described in detail in the next section.

Energy efficient techniques

To achieve energy efficiency in big data, researchers have proposed many techniques, algorithms and architectures. These techniques are categorized in Fig. 3 and are elaborated in the coming sections. Combined use of these techniques has helped cloud providers manage services and infrastructure efficiently. They have attracted considerable attention from the research community and have helped in achieving energy efficiency.

These techniques manage the growing need for energy, which is expected to become quite challenging in the coming years if significant improvements are not made in this area. Energy-efficient techniques have helped not only in reducing energy consumption but also in delivering services with ease, and they achieve energy efficiency in a cost-effective way. They are categorized into SPM and DPM techniques.

SPM techniques

SPM is basically a hardware-level technique that focuses mainly on transistor and process technology. Its main focus is the design of the system; for example, hardware should be designed from the outset with energy considerations in mind. SPM techniques are mostly hardware based and focus on CPU design and architecture. At the CPU, they are further subdivided into cycle level and instruction level. These techniques involve transitions from low to high energy consumption depending on the load.

Figure 4 elaborates the CPU-level energy-efficient techniques. At the cycle level, components are activated or deactivated according to the demands of processes. This is the best technique because only the required components are activated. Instruction-level analysis determines the energy consumption of each instruction and characterizes its cost at certain levels. Several parameters, such as frequency and voltage, are associated with the execution of an instruction. Apart from the instruction and cycle levels, the circuit-level approach also helps in achieving energy efficiency. The configuration of the circuits determines the flow of current and the transition from low to high energy flow. All three dimensions help in balancing energy demands in the CPU, which is the most energy-consuming component in big data systems.

From a system perspective, the disk is a large consumer of energy, accounting for about 34 %, but conventional disks can be replaced with solid state disks, which consume less energy. Chip design requires special attention to energy consumption, and energy-related technologies must be applied, because energy-aware circuits can reduce this share from 34 % to roughly 15 or 20 %. Another large consumer of energy is user mode: when the system executes instructions in user mode it consumes a large amount of energy, while kernel mode needs less, almost 15 %. Invocation of kernel mode results in 10 % of energy consumption.

Another major concern is that whenever the CPU is not doing useful work, it must execute at least one process, called the idle process. During the execution of that process the CPU consumes 5 % of the energy, which is overhead. No efficient technique has been developed so far that can minimize this loss. However, different techniques have been developed that let the processor sleep when it is not doing useful work. This not only saves energy but also improves the performance of the computer system.

Hardware level techniques

As with software techniques, there is a combination of hardware techniques, which depends on the hardware configuration. These techniques offer many energy-efficiency benefits. Their taxonomy in the context of big data is given in Fig. 5, which elaborates the existing hardware approaches for energy efficiency. There are two main approaches: dynamic performance scaling (DPS) and dynamic component deactivation (DCD). The further division of these two techniques is component based, and each has features that contribute to the energy efficiency of big data.

Design constraints

We selected different parameters for evaluation of each technique such as:

Performance
Goal
Cost (in terms of money and manpower)
Complexity
Response time
Energy efficiency
User awareness
Workload management
Platform dependency
Management
Maintenance

These parameters were selected for the following reasons. Complexity reflects the complex nature of big data and its heterogeneous elements (objects and agents); for example, implementing any energy-efficient technique will certainly influence the behavior of other agents, which in turn affects other parts of the system, e.g., security or resilience. Energy efficiency means how much energy is saved relative to the workload, and how energy is managed during context switching and transitions of the system between active and sleep states. Each technique has the goal of reducing energy consumption and increasing throughput. User awareness and management refer to the adoption of the technique in relation to infrastructure size, supply companies and stockholders' interests. If the level of user awareness is high and management is up to the mark, this helps in the adoption of common practices for energy use and of particular market energy frameworks. The response time of each technique is task specific; for example, reading ten lines from the same document consumes less energy than reading ten lines from ten different documents.

Cost is a major factor, and we consider two types: cost in terms of manpower (e.g., lines of code, language and chips) and cost in terms of money (e.g., installation, execution, testing, hardware resources and configuration). Each technique has different costs associated with it, depending on its complexity and operations. Maintenance refers to the support provided after a technique is implemented and to its testing on different hardware resources. Workload is another determinant of each technique, and workload management is mandatory: it means efficient operation of the cloud under critical circumstances. However, hardware design, the materials used to build the hardware, and the techniques, tools and execution environment used to build the software all affect operation and energy consumption. Hardware built with performance in mind has a higher energy consumption rate than hardware built with energy in mind.

Dynamic component deactivation (DCD)

This technique is the most energy efficient; it disables unused components on the basis of predefined rules. However, predicting activation and deactivation itself consumes a lot of energy, and no efficient algorithm exists to determine the interval accurately. Such transitions also degrade system performance, but with the help of historical data, transitions between states can be carried out effectively (Valentini et al. 2011a; Koomey 2007). The performance evaluation of DCD is expressed in Table 3. The design parameters for each technique are selected on the basis of complexity and power-saving mechanism, and each technique is evaluated on a few common parameters.

Table 3 Performance evaluation of DCD

Figure 5 summarizes the hardware techniques that support energy efficiency. Hardware support is key to achieving energy efficiency through algorithms, policies and software approaches. Hardware is evaluated and tested by reputable companies before deployment so that it achieves energy efficiency effectively. Companies that invest in hardware to cope with the energy issue benefit more than those that invest in software. Different software and hardware techniques and their implementations produce the desired results. Recent advancements are remarkable: they have enhanced the popularity of big data and deliver services to the intended users in a cost-effective way. The performance evaluation of the technique is expressed in Table 3, with a few important parameters used to assess its performance.

DCD is further divided into predictive and stochastic techniques, both of which contribute toward energy efficiency. In predictive techniques, the decision of when to activate and deactivate system components is made on the basis of a prediction. Different policies exist that capture the correlation between active and inactive states. Energy is consumed whenever components wake up or go to sleep, which also adds performance overhead and can cause serious drawbacks. Predictive wakeup and predictive shutdown address this problem. However, certain issues remain regarding the intelligence implemented in these mechanisms.

Predictive shutdown policies address the issue of inactivity. In predictive shutdown, historical data is used to predict the length of the next idle period. These approaches involve decision making and depend heavily on the actual utilization of energy and on the strength of the correlation between previous and next events. History-based predictors are energy efficient, but because they work on predictions they are not as safe as timeouts (Benini et al. 2000), and predictions are not helpful in many situations. Predictive wakeup techniques aim to reduce the energy consumed on activation, since most components require a lot of energy at wakeup. The transition from the active to the inactive state is computed on the basis of previous records, and sometimes on the requirements of the user (Albers 2010). In these techniques energy consumption is high, but the performance overhead on wakeup is minimal. The performance evaluation of fixed timeout, predictive shutdown and predictive wakeup is expressed in Table 4. The accuracy of such techniques is determined in terms of complexity, performance, maintenance, cost and energy efficiency.
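The difference between a fixed timeout and a history-based predictive shutdown can be sketched as follows. This is a minimal illustration, not a policy taken from the cited works: the class names, the moving-average predictor and the `break_even` threshold (the idle length above which a shutdown is assumed to pay off) are all assumptions introduced here.

```python
from collections import deque


class FixedTimeoutPolicy:
    """Shut a component down once it has been idle for `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout

    def should_shutdown(self, idle_so_far: float) -> bool:
        return idle_so_far >= self.timeout


class PredictiveShutdownPolicy:
    """Shut down at the start of an idle period if history predicts a long one.

    The predictor is a plain moving average of recent idle periods
    (an assumption of this sketch, chosen for simplicity).
    """

    def __init__(self, break_even: float, history_len: int = 4):
        self.break_even = break_even              # hypothetical pay-off threshold
        self.history = deque(maxlen=history_len)  # most recent idle lengths

    def record_idle_period(self, length: float) -> None:
        self.history.append(length)

    def predicted_idle(self) -> float:
        if not self.history:
            return 0.0  # no history yet: predict no idleness, stay on
        return sum(self.history) / len(self.history)

    def should_shutdown(self) -> bool:
        return self.predicted_idle() > self.break_even


# Illustrative values only.
policy = PredictiveShutdownPolicy(break_even=2.0)
for idle in (5.0, 6.0, 4.0):
    policy.record_idle_period(idle)
decision = policy.should_shutdown()
timeout_decision = FixedTimeoutPolicy(3.0).should_shutdown(idle_so_far=2.9)
```

The timeout policy wastes energy while it waits for the timeout to expire but never acts on a wrong guess, whereas the predictive policy can shut down immediately at the cost of occasional mispredictions.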

Table 4 Performance evaluation of Fixed time out, Predictive shutdown and Predictive wakeup

The above section explored the concepts related to SPM and its sub-techniques. All these techniques are static. To deal with the problem of intelligently determining idle components, adaptive techniques have been developed. Predicting the next transition is inefficient when the workload is not known in advance. Several practical techniques that focus mainly on energy efficiency have been discussed in the literature (Srivastava et al. 1996). SPM considers the architecture of the RAMs, the CPU and related components. SPM is specially designed to control the internal structure of the CPU, including circuits, chips and the structure of buses and ports. SPM uses intelligent approaches to determine the transitions and sequences of inactive and active states.

Dynamic performance scaling

This technique is applied to different hardware components for energy efficiency. Each component dynamically adjusts its performance in proportion to power consumption. Instead of deactivating a component, it gradually decreases the supply when resources are not being used. This concept leads to dynamic voltage and frequency scaling, discussed next. It is considered to be the best energy efficiency technique and produces better results than the others. It helps maintain system performance and ensures that energy is consumed in a balanced way, because each component maintains stable energy states.

Dynamic voltage and frequency scaling

DVFS contributes well to energy efficiency, especially in the cloud environment. CPU frequencies need proper adjustment, and frequency adjustment requires voltage scaling as well; both parameters must be adjusted together to contribute to energy efficiency. An increase in voltage sometimes causes an increase in temperature, which in turn increases energy consumption. DVFS reduces the number of instructions that the CPU can issue in a given period of time, which reduces performance. This in turn increases performance overhead, especially for CPU-bound processes. Researchers and designers have explored this issue for several years but have been unable to provide an optimal solution. The general formula relating power to voltage and frequency is given in (Dean and Ghemawat 2004) as:

$$P = C \cdot V^{2} \cdot f \qquad (1)$$
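To make the effect of Eq. 1 concrete, the following sketch evaluates it at two operating points, reading P as dynamic power, C as the switched capacitance, V as the supply voltage and f as the clock frequency. The numeric values are hypothetical and chosen only for illustration; they do not describe any particular processor.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Dynamic power in watts according to Eq. 1: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points (illustrative values only).
C = 1e-9                                # assumed switched capacitance, farads
high = dynamic_power(C, 1.2, 2.0e9)     # 1.2 V at 2.0 GHz
low = dynamic_power(C, 0.9, 1.0e9)      # 0.9 V at 1.0 GHz
ratio = low / high

print(f"high: {high:.2f} W, low: {low:.2f} W, ratio: {ratio:.3f}")
```

Because voltage enters quadratically, halving the frequency while lowering the voltage from 1.2 V to 0.9 V cuts dynamic power to roughly 28 % of its original value in this example, which is more than the halving that frequency scaling alone would give.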

DVFS looks straightforward, but its implementation is not easy. The structure of real systems imposes certain technicalities on DVFS, and producing the desired frequency to meet application performance is also tricky. Moreover, it is not certain whether the power consumed by the processor is quadratic, linear or otherwise non-linear in the supplied voltage (Chen et al. 2012). Several approaches that reduce energy consumption have been practiced. They can be categorized as interval based, intra-task and inter-task (Hwang and Wu 2000). The interval-based technique is similar to the adaptive technique: it predicts CPU cycles, and transitions are made in various orders.

The inter-task approach dynamically distinguishes between processes based on their execution time and assigns each a different CPU speed (Hwang and Wu 2000; Douglis et al. 1995). However, this can cause problems when different scheduling algorithms are applied, because the execution time under round robin (RR) scheduling differs from that under first come first served (FCFS). Voltage and frequency can be adjusted best if the workload is known in advance or is constant throughout execution. In comparison with the inter-task approach, the intra-task approach provides fine-grained information about the structure of programs and tunes the processor voltage and frequency within tasks effectively (Buttazzo 2002; Andrew et al. 2010; Wiessel and Bellosa 2002). The performance evaluation of DVFS is provided in Table 5.
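A minimal sketch of inter-task speed assignment is given below. It assumes that each task comes with a known cycle count and a deadline, and that the processor offers a small set of discrete frequency levels; neither the deadline model nor the function name `pick_frequency` comes from the cited works.

```python
from typing import Sequence


def pick_frequency(cycles: float, deadline_s: float,
                   frequencies_hz: Sequence[float]) -> float:
    """Return the lowest frequency that finishes `cycles` within `deadline_s`.

    Falls back to the highest available frequency when no level meets
    the deadline (the task is then simply run as fast as possible).
    """
    for f in sorted(frequencies_hz):
        if cycles / f <= deadline_s:
            return f
    return max(frequencies_hz)


# Hypothetical frequency levels and tasks (illustrative values only).
LEVELS_HZ = [0.8e9, 1.6e9, 2.4e9]
short_task = pick_frequency(cycles=4e8, deadline_s=1.0, frequencies_hz=LEVELS_HZ)
long_task = pick_frequency(cycles=2e9, deadline_s=1.0, frequencies_hz=LEVELS_HZ)
```

Here the short task runs at the lowest level while the long task needs the highest one. The sketch also shows why the approach depends on knowing the workload in advance: without a reliable cycle count, the chosen level is a guess.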

Table 5 Performance evaluation of DVFS

DVFS saves energy through its efficient energy scheduling method. It saves energy when the peak performance of a component is not required. It also adjusts CPU cycles when the CPU is not doing useful work, e.g., while reading data from disk. DVFS scheduling is one of the best techniques contributing to energy efficiency. DVFS uses A2E, which distinguishes it from all other techniques available for energy efficiency. It scales voltage and frequency up and down so well that performance is not hindered. DVFS uses a simple method to save energy, and the savings are high enough to keep servers on all the time. However, it may not be a suitable option for most data-intensive solutions, because these applications mostly perform read/write operations. It competes with all other available energy-saving techniques with minimal performance compromise. It is adaptive and schedules at runtime, which is a key to its success. This is why DVFS is mostly used by the leading big data companies (Lee and Sakurai 2000). Dynamic voltage and frequency scaling is deployed in many data centers to meet energy needs. Devices need to be built with a service-oriented and energy-oriented architecture. The performance evaluation of DVFS is provided in Table 5.

Powernap

Meisner et al. (2009) proposed an approach to power consumption in data centers based on fast switching between active and low-power states. The goal of this technique is the same as that of the others, i.e., to minimize energy consumption. Most techniques aim at a partial reduction of energy rather than the energy-proportional computing proposed by Barroso and Holzle (2007). In Powernap, the focus is not only on the two states (sleep and active) but also on CPU switching, workload management, etc. In other techniques, the major concern is fast transition between states, i.e., very low power consumption in sleep states and very high energy consumption in active states.

In the evaluation of Powernap, the implementation was tested on different systems; their transitions were measured and comparisons were made across states. The conclusions rest on the assumption that if the switching time is less than or equal to 10 ms, power savings are approximately smooth and linear and exceed those of DVFS. However, in an ideal situation the transition time is 300 ms. The desired requirements are hard to meet, but if a mechanism achieving the required transition time is found, average server power can be reduced by 74 % (Lee and Sakurai 2000). The performance evaluation of Powernap is provided in Table 6, where we compare it on the basis of parameters such as complexity and cost.
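Why the transition time matters so much can be illustrated with a toy model. The sketch below is not the model used by the Powernap authors: it simply assumes that the server naps during every idle period and that the time spent entering and leaving the nap state is charged at active power. All wattages and durations are hypothetical.

```python
def average_power(utilization: float, active_w: float, nap_w: float,
                  transition_s: float, mean_idle_s: float) -> float:
    """Average power of a server that naps whenever it is idle.

    Assumption of this sketch: out of every idle period of length
    `mean_idle_s`, `transition_s` seconds are lost to state transitions
    at active power, and the remainder is spent at nap power.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    if mean_idle_s <= 0.0:
        raise ValueError("mean_idle_s must be positive")
    lost_share = min(transition_s, mean_idle_s) / mean_idle_s
    awake = utilization + (1.0 - utilization) * lost_share
    return awake * active_w + (1.0 - awake) * nap_w


# Hypothetical server: 300 W active, 10 W napping, 20 % utilized,
# idle periods of 1 s on average.
fast = average_power(0.2, 300.0, 10.0, transition_s=0.01, mean_idle_s=1.0)
slow = average_power(0.2, 300.0, 10.0, transition_s=0.3, mean_idle_s=1.0)
```

Under these assumptions a 10 ms transition keeps average power close to the ideal value of a server that sleeps for all of its idle time, while a 300 ms transition roughly doubles it, which is consistent with the qualitative claim that savings depend on very short switching times.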

Table 6 Performance evaluation of Powernap

Virtualization

Virtualization is another economical technique; it allows multiple operating systems and applications to run in layers on a single piece of hardware. Resources can be split into a number of logical pieces called virtual machines. Each virtual machine hosts an individual operating system and presents a view of the physical resources allocated to it, ensuring performance and failure isolation between the virtual machines that share a single physical machine. Virtualization provides a variety of benefits; for example, running multiple operating systems on a single machine saves the cost of buying additional physical hardware.

Virtualization provides a lot of support for energy efficiency, which is a major concern in today’s data centers. Virtual machines contribute to delivering big data services optimally, and proper management brings numerous benefits. According to recent studies, the virtual machine manager behaves like a power-aware operating system that makes no distinction between virtual machines: it monitors overall system performance and ensures the correct application of DVFS or any DCD technique to all system components.

Another way to achieve energy efficiency is to leverage the power management policies and application-level knowledge of the operating systems, which govern the power management of the different virtual machines. Energy efficiency is then achieved effectively through changes in the hardware power state or through enforced power limits. Different virtual machine monitors are available, such as Xen, VMware and KVM, which provide virtualization with a variety of benefits. Xen provides a lot of support for C-states (CPU sleeping states) (Wei et al. 2009). Xen determines CPU utilization periodically, then determines the appropriate P-state and issues a command to change the hardware power state.

This transition helps in saving energy. Xen provides four governors related to power saving which are:

On demand: chooses the best P-state according to current resource requirements.
User space: sets the CPU frequency specified by the user.
Performance: sets the highest available clock frequency.
Power save: sets the lowest clock frequency.
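The behavior of the four governors can be sketched as a single selection function. This is a simplified illustration and not Xen's actual implementation: the function name, the governor identifiers and the rule used for the on-demand case (pick the lowest frequency that covers current demand, with utilization measured relative to the highest frequency) are assumptions of this sketch.

```python
from typing import Optional, Sequence


def select_frequency(governor: str, available_hz: Sequence[float],
                     utilization: float = 0.0,
                     user_hz: Optional[float] = None) -> float:
    """Pick a CPU frequency according to a simplified governor policy."""
    levels = sorted(available_hz)
    if governor == "performance":
        return levels[-1]                      # highest available clock
    if governor == "powersave":
        return levels[0]                       # lowest available clock
    if governor == "userspace":
        if user_hz not in levels:
            raise ValueError("user_hz must be one of the available levels")
        return user_hz                         # whatever the user asked for
    if governor == "ondemand":
        # Assumed rule: demand is utilization times the top frequency.
        demand_hz = utilization * levels[-1]
        for f in levels:
            if f >= demand_hz:
                return f
        return levels[-1]
    raise ValueError(f"unknown governor: {governor}")


# Hypothetical frequency levels (illustrative values only).
LEVELS = [1.0e9, 2.0e9, 3.0e9]
chosen = select_frequency("ondemand", LEVELS, utilization=0.5)
```

With the levels above and 50 % utilization, the on-demand rule selects the middle level, whereas the performance and power-save governors ignore utilization entirely.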

Apart from governors, Xen also supports C-states very well. When the CPU is not doing any useful work, Xen lets it enter a C-state, and when activation is required it switches the CPU back to the active state. Xen offers further benefits, such as offline and online migration of VMs, which can save a significant amount of energy when combined with VM consolidation algorithms. Xen also has performance overheads, like the DCD technique explained previously. VMware also contributes to energy efficiency by supporting basic system-level power management using dynamic voltage and frequency scaling. The system dynamically monitors CPU processing and utilization and periodically applies the appropriate ACPI P-states (Wei et al. 2009). In a pool of servers where virtualization is implemented, power is reduced by dynamically switching off spare servers (VMware Inc 2009). The performance evaluation of virtualization is provided in Table 7.
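The idea of consolidating VMs so that spare servers can be switched off can be sketched with a first-fit decreasing heuristic. The source does not name a specific consolidation algorithm, so first-fit decreasing is used here purely as a well-known stand-in; the loads are hypothetical CPU percentages and each host is assumed to have the same capacity.

```python
from typing import List, Sequence


def consolidate(vm_loads: Sequence[int], host_capacity: int) -> List[List[int]]:
    """Pack VM loads onto as few hosts as the heuristic finds.

    First-fit decreasing: consider VMs from largest to smallest and place
    each on the first host that still has room, opening a new host only
    when none fits. Returns one list of VM loads per powered-on host.
    """
    hosts: List[List[int]] = []
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError("a VM does not fit on any host")
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            hosts.append([load])  # no existing host had room
    return hosts


# Five VMs, initially one per server (illustrative CPU loads in percent).
placement = consolidate([50, 20, 40, 30, 60], host_capacity=100)
servers_switched_off = 5 - len(placement)
```

In this example the five VMs fit on two hosts, so three of the original five servers could be switched off. A real deployment would also have to account for memory, migration cost and SLA constraints, which this sketch ignores.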

Table 7 Performance evaluation of virtualization

It can be inferred that virtualization has helped in achieving energy efficiency. Experimental results suggest that the prototype is responsible for enforcing power limits across energy-aware and energy-unaware guests. Devices with different power states, processors dedicated to hardware-assisted virtualization, and multi-core architectures have been used in data centers (Friedman 2009).

These areas require further investigation and research to develop more energy-efficient algorithms. Among all available techniques, virtualization provides the best energy efficiency and, if properly designed and implemented, yields long-run benefits that reduce effort in terms of both manpower and money.

Summary and findings

Energy efficiency in big data remains challenging because of the variety of services that big data must process and the continued demand for high performance computing. The main methods for providing energy efficiency in complex systems are: (a) software-based energy-efficient techniques, (b) hardware-based energy-efficient techniques, and (c) energy-efficient architectures, algorithms and policies. The main objective of this paper is to examine the hardware- and software-based techniques used in big data complex adaptive systems. These techniques are deployed to achieve energy efficiency in the context of big data. Energy efficiency is an important requirement of current and future complex adaptive systems. Each of the techniques mentioned above has certain advantages and disadvantages. The two main categories of energy-efficient techniques are SPM and DPM. SPM has better energy efficiency but degraded performance, whereas DPM has shown better results from both the energy and the performance perspective. However, the implementation of each technique depends on the economic scale and operations of the company. If performance is important for an organization's operations, it will certainly choose DPM. If task execution is simple, runtime response is not needed and the workload is manageable, the organization will certainly adopt SPM techniques. Virtualization is mostly used in data centers to increase hardware utilization and manage resources efficiently. Parallel data processing and techniques such as MapReduce contribute a great deal to providing services efficiently. On the other hand, storing data at multiple locations poses the challenge of keeping it available all the time.

In this research, we found DVFS to be the best technique according to different parameters, and it is implemented by most data centers. According to researchers, it is one of the best techniques for the future and will contribute well to energy efficiency with the help of the multi-component approach. The multi-component approach uses DVFS to reduce CPU power consumption during the communication, execution and computation phases. However, changing trends bring new techniques, such as virtualization, which also contribute to efficient cloud management. Software and hardware support is also required to implement a specific technique. In the last few years, researchers have also been working on power-scalable hardware and software components to reduce the growing energy need while limiting performance loss.

The intelligent self-organizing power-saving architecture (ISPA) helps solve the problem of determining idle components, i.e., which computers need to be shut down in order to save power. This architecture assists in determining suitable idle computers for automatic shutdown or hibernation, based on rules that depend on organization policy. By selectively and intelligently tracking computers and turning them off or hibernating them, the scheme helps achieve productivity and lower cost. This architecture is helpful when energy consumption is high and resources must be used efficiently without performance loss. All contributions of our study are summarized in Table 8.
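A rule-based policy of the kind described for ISPA might look like the sketch below. The rules, thresholds and field names are entirely hypothetical and are not taken from the ISPA design; they only illustrate how a uniform, organization-wide policy could map the state of each computer to an action.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    idle_minutes: float
    has_unsaved_work: bool = False
    is_server: bool = False


def decide_action(pc: Computer, hibernate_after: float = 30.0,
                  shutdown_after: float = 120.0) -> str:
    """Apply a hypothetical company-wide rule set to one computer."""
    if pc.is_server:
        return "keep_on"        # assumed rule: servers are never touched
    if pc.idle_minutes >= shutdown_after and not pc.has_unsaved_work:
        return "shutdown"       # long idle and nothing to lose
    if pc.idle_minutes >= hibernate_after:
        return "hibernate"      # idle, but state must be preserved
    return "keep_on"


# Illustrative fleet.
fleet = [
    Computer("desk-01", idle_minutes=200),
    Computer("desk-02", idle_minutes=200, has_unsaved_work=True),
    Computer("desk-03", idle_minutes=5),
    Computer("db-01", idle_minutes=500, is_server=True),
]
actions = {pc.name: decide_action(pc) for pc in fleet}
```

Hibernating rather than shutting down machines with unsaved work is one way such a policy could limit the productivity cost that the text associates with aggressive power saving.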

Table 8 Energy efficiency in big data techniques and architecture

Open issues

Recent research has revealed that different energy-efficient techniques can significantly improve energy saving in cloud computing, big data and CASs. Most importantly, DVFS saves considerably more energy than all other existing techniques. Implementing DVFS not only helps save energy but also improves energy utilization at different levels of the network infrastructure. However, it is important to design and evaluate different test environments to enhance the utility of DVFS in different settings. A few common challenges of DVFS are listed below.

Lack in software controls

DVFS is mostly implemented in software, and some software does not allow voltage to be regulated properly because of hardware dependence. As a result, DVFS fails to save energy. Some software requires hardware changes to regulate over- and under-voltage. DVFS can also fail to achieve energy efficiency because of differences in component configuration. To succeed across different software, DVFS must have voltage modifications to adjust energy in RAMs, motherboards, chips, etc.

Formula modification

Because different materials are used in hardware, energy saving varies from component to component. To achieve maximum energy efficiency, it is desirable that most components use complementary metal-oxide-semiconductor (CMOS) technology. Since most chips are not made entirely of CMOS, (Eq. 1) gives different results for different components. It is therefore desirable to replace C in (Eq. 1) with a component-specific constant that reflects the component's capacitance, voltage flow, etc.
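A component-specific constant could enter the power model as sketched below. Eq. 1 is not reproduced in this section, so the sketch assumes it has the common CMOS dynamic-power form P = C·V²·f; the capacitance values and component names are hypothetical placeholders, not measured data.

```python
def dynamic_power(c_eff, voltage, frequency):
    """Dynamic power in watts, assuming P = c_eff * V^2 * f.

    c_eff is an effective switched capacitance in farads, voltage is in
    volts and frequency in hertz.
    """
    return c_eff * voltage ** 2 * frequency


# Hypothetical per-component constants replacing the single C of Eq. 1.
COMPONENT_C_EFF = {"cpu": 1.0e-9, "memory": 0.4e-9, "chipset": 0.2e-9}


def system_power(operating_points):
    """Sum dynamic power over components.

    operating_points maps a component name to (voltage, frequency).
    """
    return sum(
        dynamic_power(COMPONENT_C_EFF[name], v, f)
        for name, (v, f) in operating_points.items()
    )
```

Under this assumed model, halving both voltage and frequency of a component reduces its dynamic power by a factor of eight, which is why the per-component constant matters when comparing savings across parts.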

No detection of short burst

Sometimes CPU-bound processes need only a very short CPU burst to complete execution. During such bursts, CPU frequencies are not adjusted accordingly, so a significant amount of energy is wasted and the bursts go undetected even by DVFS. Short-burst detection and proper load management therefore need further investigation.
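One way to see why short bursts go unnoticed is to compare a coarse utilization average with an explicit scan for short busy runs. The sketch below is illustrative only: the 0/1 busy trace, the window size and both function names are assumptions, not part of any DVFS implementation discussed in the survey.

```python
def window_utilization(busy, window):
    """Average utilization per sampling window.

    busy is a list of 0/1 samples (1 means the CPU was busy).
    """
    result = []
    for i in range(0, len(busy), window):
        chunk = busy[i:i + window]
        result.append(sum(chunk) / len(chunk))
    return result


def detect_short_bursts(busy, max_len):
    """Return (start, length) for every busy run of at most max_len samples."""
    bursts = []
    start = None
    # A trailing 0 closes a run that reaches the end of the trace.
    for i, b in enumerate(list(busy) + [0]):
        if b and start is None:
            start = i
        elif not b and start is not None:
            if i - start <= max_len:
                bursts.append((start, i - start))
            start = None
    return bursts
```

For a trace with a 5-sample burst inside a 100-sample window, the window average is only 5%, which a threshold-based policy would treat as idle, while the run scan reports the burst explicitly.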

Energy efficiency in multi user-cases

The existing literature mainly focuses on wired networks and is limited to servers. In the multi-user case, different users may have different platforms and operating systems. This heterogeneity needs further exploration with respect to energy saving; the exact measures and achievable energy savings in the multi-user case are still unknown.

Frequency performance mismatch

DVFS relies on adjusting frequencies according to workload. However, how accurately frequencies match workloads is still unknown. In many cases, a wrong burst size or frequency adjustment hinders system performance. An intelligent, real-time decision process for altering frequencies is still lacking.

Workload mismatch

In DVFS, frequencies are altered on the basis of workload. A rigorous statistical analysis of the relation between frequency and workload is necessary to meet energy-efficiency objectives. For this, either the workload must be known in advance or adjustments must be made at runtime. How to allocate resources and frequencies correctly according to workload is not yet clearly understood.
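The runtime-adjustment option can be sketched as a simple predictor followed by a frequency choice. This is one possible approach under stated assumptions, not a method from the surveyed papers: the exponentially weighted moving average, the `headroom` margin and the frequency list are all hypothetical.

```python
def ewma_predict(history, alpha=0.5):
    """Predict the next utilization from past samples (fractions in [0, 1])."""
    estimate = history[0]
    for u in history[1:]:
        estimate = alpha * u + (1 - alpha) * estimate
    return estimate


def pick_frequency(predicted_util, frequencies, headroom=0.1):
    """Pick the lowest frequency that covers the predicted demand.

    predicted_util is a fraction of the capacity available at the highest
    frequency; headroom is a safety margin against under-prediction.
    """
    f_max = max(frequencies)
    demand = min(1.0, predicted_util + headroom) * f_max
    for f in sorted(frequencies):
        if f >= demand:
            return f
    return f_max
```

A larger `headroom` trades energy for a lower risk of running too slowly when the prediction is wrong, which is the frequency and workload mismatch described above.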

Inaccurate frequency measures for CPU bound processes

In DVFS implementations, a change in frequency sometimes results in performance degradation. Changing frequencies for I/O-bound processes still yields maximum performance, while the same change for CPU-bound processes causes performance loss and frequency variation. To obtain the maximum benefit from DVFS, the prediction models need to be modified; variations in DVFS policy and prediction models still need further investigation. Based on the preliminary results and studies, we consider DVFS a better energy-efficiency technique, one that also allows the power consumption of different components to be estimated quickly. However, power-aware job scheduling, workload management, accurate selection of bursts and component-based policy making can further enhance DVFS performance.
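The different sensitivity of CPU-bound and I/O-bound processes to frequency changes can be made concrete with a simple timing model. The sketch assumes that only the CPU portion of the run time stretches as frequency drops and that I/O time is unaffected; this is a common simplification, not a model taken from the survey, and all names and numbers are illustrative.

```python
def run_time(cpu_seconds, io_seconds, f, f_max):
    """Run time at frequency f; cpu_seconds is measured at f_max."""
    return cpu_seconds * (f_max / f) + io_seconds


def slowdown(cpu_seconds, io_seconds, f, f_max):
    """Run time at f relative to run time at f_max."""
    return run_time(cpu_seconds, io_seconds, f, f_max) / (cpu_seconds + io_seconds)


def lowest_frequency_within(cpu_seconds, io_seconds, frequencies, max_slowdown):
    """Lowest frequency whose slowdown stays within max_slowdown."""
    f_max = max(frequencies)
    for f in sorted(frequencies):
        if slowdown(cpu_seconds, io_seconds, f, f_max) <= max_slowdown:
            return f
    return f_max
```

With a 15% slowdown budget, a process that spends 9 of 10 seconds in I/O can run at half frequency, whereas a process that spends 9 of 10 seconds on the CPU has to stay at the maximum frequency.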

Conclusion and future work

In this paper, we surveyed different techniques used for enhancing the energy efficiency of big data, covering the contributions available in this domain from recent research. We critically analyzed different software- and hardware-based techniques used for processing, communication, computation and service provision in big data. We have evaluated the performance of each technique and drawn a core comparison, presented in Table 8. In the cloud, a significant amount of energy can be saved using integrated techniques. When designing big data applications, energy must be a prime concern. Some cloud computing applications consume more energy than traditional applications. ISPA helps reduce energy need by transitioning between low- and high-energy states in complex systems. This approach helps to cope with the growing energy needs of big data.

We critically investigated various techniques and found that DVFS is the best among them, although some performance issues are still associated with it. In future, for example, process-based energy efficiency could be explored as an alternative to shutting down the processor: if a process is I/O bound, the energy supplied to the CPU is decreased, and if it is CPU bound, the CPU is given the energy required for processing. Most importantly, a benchmark needs to be set for saving energy in big data, considering future trends and needs. Our broad conclusion is that energy efficiency must be given priority and that energy-efficient techniques must be implemented to meet the global energy challenge.

In future, algorithms based on artificial intelligence can be developed to optimize energy efficiency in big data. DVFS can also be optimized by adding specific parameters to enhance its performance. Alternative versions of DVFS could be devised that suit data-intensive applications with minimal performance loss. The concept of processing-specific energy consumption can also be explored (i.e., applications involving complex processing require more energy). Complex adaptive systems are drawn from many real-life situations and have gained considerable attention from the research community. In future, key trends related to complex adaptive systems in big data, along with their essential features and challenges, need to be explored further. We aim to explore different architectures of ISPA and to analyze how they fulfill the high energy demands of the corporate sector without disrupting user services or degrading user experience. In addition, special attention is needed to integrating ISPA with other techniques to meet energy demand in big data.

References

(2009) How VMware virtualization right size IT infrastructure to reduce power consumption
Albers S (2010) Energy efficient algorithms. Commun ACM 53(5):86–96
Article MathSciNet Google Scholar
Andrew LL, Lin M, Wieman A (2010) Optimality fairness and robustness in speed scaling design. In: Proceedings of ACM international Confrence on measurement and modeling of international computer System(SIGMETRICS 2010), New York, USA
Barroso LA, Holzle U (2007) The case for energy-proportional computing. In: IEEE computer pp 33–7
Batool K, Niazit MA (2015) Self-organized power consumption approximationin the internet of things. In: International Conference on Consumer Electronics (ICCE), 2015 IEEE, 9–12 Jan. 2015
Beloglazov A, Buyya R (2012) Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr Comput Pract Exper 24(13):1397–1420
Article Google Scholar
Benini L, Bogliolo A, Micheli GD (2000) A survey of design techniques for system-level dynamic power management. IEEE Trans Very Large Scale Integration(VLSI) Syst 8(3):299–316
Berl A, Gelenbe E, Girolamo M, Giuliani G, Meer H, Dang M, Pentikousis K (2009) Energy-Efficient Cloud computing. In: Oxford University Press on behalf of The British Computer Society
Bosilca G, Bouteiller A, Danalis A, Herault T, Lemarinier P, Dongarra J (2012) DAGuE: Aa generic distributed DAG engine for High Performance Computing. Parall Comp 3751
Bruschi J, Rumsey P, Anliker R, Chu L, Gregson S (2011) Best Practices Guide for Energy-Efficient Data Center Design by Rumsey Engineers under contract to the National Renewable Energy Laboratory
Buttazzo G (2002) Scalable Application for energy aware processors. Embed Softw pp153–165
Chen M, Mao S, Liu Y (2014) Big data: a survey. Mobile NetwAppl 19:171–209
Article MathSciNet Google Scholar
Chen Y, Alspaugh S, Borthakur D, Katz R (2012) Energy efficiency for large-scale map reduce workloads with significant interactive analysis. In: Proceedings of the 7th ACM European conference on Computer Systems, ser. EuroSys’12, 2012, pp. 43–56
Chenet Y (2010) To compress or not to compress-compute vs. IO tradeoffs for map reduce energy efficiency. In: Proceedings of the first ACM SIGCOMM workshop on Green networking, ACM, 2010, pp 23–28
Colarelli D, Grunewald D (2002) Massive arrays of idle disks for storage archives. In: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, ser. Supercomputing’02. Los Alamitos, CA, USA: IEEE Computer Society Press, 2002, pp 1–11
Dean J, Ghemawat S (2004) Map reduce: simplified data processing on large clusters. In: Proceedings of the 6th conference on Symposium on Operating Systems Design and Implementation, 2004, pp. 10–10
Douglis F, Krishnan P, Bershad B (1995) Adaptive disk spin down policies for mobile computers. Comp Syst 8(4):381–413
Google Scholar
Ekanayake J, Pallickara S, Fox G (2008) Map reduce for data intensive scientific analyses. In: Proceedings of the 2008 Fourth IEEE International Conference on eScience, ser. ESCIENCE’8-8-2008, pp 277–284
Friedman E (2009) SQL/Map Reduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. Proceedings VLDB Endowment 2(2):1402–1413
Article Google Scholar
Goiri In (2012) GreenHadoop: leveraging green energy in data processing frameworks. In: Proceedings of the 7th ACM European conference on Computer Systems, ser. EuroSys’12, 2012, pp 57–70
Gupta M, Singh S (2003) Greening of the internet. In: Proceedings of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM, 2003, New York, NY, USA, pp 19–26
Habiba U, Masood R, Shibliand M, Niazi MA (2014) Cloud identity management security issues and solutions: a taxonomy. Comp Adap Syst Model
Hwang CH, Wu AC (2000) A predictive system shutdown method for energy saving of event-driven computation. ACM Trans Design Automat Elect Syst (TODAES) 5(2):241
Ibrahim S et al. (2009) Evaluating map reduce on virtual machines: the Hadoop case. In: Proceedings of the 1st International Conference on Cloud Computing, ser. CloudCom’09, 2009, pp 519–528
Jiang D, Ooi BC, Shi L, Wu S (2010) The performance of map reduce: an in-depth study. Proc VLDB Endow 3(1–2):472–483
Article Google Scholar
Kaushik RT, Bhandarkar M (2010) Greenhdfs: towards an energy-conserving, storage-efficient, hybrid hadoop compute cluster. In: Proceedings of the 2010 international conference on Power aware computing and systems, ser. HotPower’10. Berkeley, CA, USA: USENIX Association, 2010, pp 1–9
Kim J, Rotem D (2012) Frep: energy proportionality for disk storage using replication. J Parallel Distrib Comp 72(8):960–974
Article MATH Google Scholar
Koomey JG (2007) Estimating Total power consumption by server in the US and the world, Analyst Press, Oakland
Krauth W (2006) Statistical mechanics: algorithms and computations. Oxford University Press, USA
Google Scholar
Kumar R, Gupta N, Charu S, Jain K, Jangir, SK (2014) Open source solution for cloud computing platform using OpenStack. In: IEEE International Conference on Consumer Electronics (ICCE), 2014, 9-12-2014
Lee S, Sakurai T (2000) Runtime voltage hopping for low power real time Systems. In: proceedings of the 37th Annual design Automation conference, Loss Angeles, CA, USA, pp 806–809
Meisner D, Gold BT, Wenisch TF (2009) Powernap: eliminating Server idle Power. ACM SIGPLAN Notices 44(3):205–216
Article Google Scholar
Meisner D, Gold BT, Thomas F (2009) Powernap: eliminating server idle power. In: Proceedings of ASPLOS09, Washington USA
Menon A (2012) Big data i-e Facebook. In: Proceedings of the 2012 workshop on Management of big data systems, ser. MBDS’12, 2012, pp 31–32
Moise D, Carpen-Amarie A (2012) Map reduce applications in the cloud: a cost evaluation of computation and storage. Data Manag Cloud Grid P2P Syst 7450:37–48
Nathuji R, Schwan K (2007) Virtual power: coordinated power management in virtualized enterprise systems. ACM SIGOPS Operating Systems Review, pp 265–278
Negru C, Pop F, Cristea V, Bessisy N, Li J (2013) Energy efficient cloud storage service: key issues and challenges. In: Fourth International Conference on Emerging Intelligent Data and Web Technologies
Niazi MA, Laghari S (2012) An intelligent self-organizing power-saving architecture: an agent-based approach. In: Fourth International Conference on Computational Intelligence, Modelling and Simulation, 2012
Pinheiro E, Balanchine R, Carrera EV, Heath T (2001) Load balancing and unbalancing for power and performance in cluster-based systems. In: Proceedings of the Workshop on Compilers and Operating Systems for Low Power, 2001, pp 182–195
Shuja J, Madani SA, Hayat K (2012) Energy-efficient data centers. Computing 94(12):973–994
Article MATH Google Scholar
Srivastava MB, Chandrakasan AP, Brodersen RW (1996) Predictive system shut down and other techniques for energy efficient programmable computation. IEEE Trans Very Large Scale Integ (VLSI) Syst 4(1):42–55
Article Google Scholar
VMware Inc (2009) vSphere resource management Guide
Valentini GL, Lassonde W, Khan SU, Min-Allah N, Madani SA, Li J, Zhang L, Wang L, Ghani N, Kolodziej J, Li H, Zomaya AY, Xu CZ, Balaji P, Vishnu A, Pinel F, Pecero JE, Kliazovich D, Bouvry P (2011a) An overview of energy efficiency techniques in cluster computing systems. Cluster Computing 16(1):3–15
Article Google Scholar
Valentini GL, Lassonde W, Khan SU, Li J, Zhang L (2011b) NDSU-CIIT Green Computing and Communications Laboratory, Department of Electrical and Computer Engineering. North Dakota State University, Fargo, ND 58108-6050, USA
Wei G, Liu J, Xu J, Lu G, Yu K, Tian K (2009) The on-going evolutions of power management in xen. Intel Corporation tech Rep 2009
Wiessel A, Bellosa F (2002) Process Cruise Control: event driven clock scaling for dynamic power management. In: proceedings of 2012 international conference on compilers architecture and synthesis for Embeded Systems, Grenoble, France, p 246

Download references

Authors’ contributions

MS designed the study. AM carried out the studies and performed the analysis. AM drafted the manuscript and MS wrote the conclusion. Both authors edited the paper individually and together. Both authors read and approved the final manuscript.

Acknowledgements

This research was supported by COMSATS Institute of Information Technology, Islamabad, Pakistan.

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and Affiliations

Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan
Abdul Majeed & Munam Ali Shah

Corresponding author

Correspondence to Abdul Majeed.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Majeed, A., Shah, M.A. Energy efficiency in big data complex systems: a comprehensive survey of modern energy saving techniques. Complex Adapt Syst Model 3, 6 (2015). https://doi.org/10.1186/s40294-015-0012-5

Received: 07 July 2015
Accepted: 03 December 2015
Published: 15 December 2015
DOI: https://doi.org/10.1186/s40294-015-0012-5
