Optimal Control Policy for Energy Management of a Commercial Bank

Tahir, Ifrah; Nasir, Ali; Algethami, Abdullah

doi:10.3390/en15062112

Open AccessArticle

Optimal Control Policy for Energy Management of a Commercial Bank

by

Ifrah Tahir

¹,

Ali Nasir

^1,*

and

Abdullah Algethami

²

¹

Department of Electrical Engineering, University of Central Punjab, Lahore 54782, Pakistan

²

Department of Mechanical Engineering, Taif University, Taif 21944, Saudi Arabia

^*

Author to whom correspondence should be addressed.

Energies 2022, 15(6), 2112; https://0-doi-org.brum.beds.ac.uk/10.3390/en15062112

Submission received: 15 February 2022 / Revised: 8 March 2022 / Accepted: 10 March 2022 / Published: 14 March 2022

Download

Browse Figures

Versions Notes

Abstract

:

There has been substantial research on Building Energy Management Systems. Most of the work has focused more on the management scheme and less on the specific structure or the nature of activities within each building. However, recently some attention is being paid to these specifics, and this paper is one of such efforts, where we consider the structure and nature of activities in the building for developing an energy management system custom designed for a bank branch where customers may arrive randomly based on a known probability distribution. Specifically, this paper presents a model for generating an optimal control policy to manage the electrical energy of a commercial bank building. A Markov Decision Process (MDP) model is proposed. The MDP model is solved for the calculation of an optimal control policy using stochastic dynamic programming. The advantage of the proposed model is that it can incorporate uncertainty involved in the problem. Another advantage is that the output control policy is optimal with respect to a discounted cost/reward function. A disadvantage of the proposed scheme is computational complexity. To overcome this disadvantage, a decomposition-based approach is proposed. A unique feature of the proposed MDP-based model is that it was developed for a specific type of building, i.e., a bank. The need for a Building Management System (BMS) that is specific for a particular type of building arises due to the fact that each building has its own working parameters and environment. Our focus is to give a customized BMS framework for a bank building. Practical implementation of the developed model is discussed and a case study is included for demonstration purposes. Results obtained from the case study indicate that considerable savings in the electrical energy expenditure can be achieved without compromising comfort. This is possible due to optimization of the control policy using the statistical information relevant to the problem.

Keywords:

energy management; optimization; Markov Decision Process (MDP)

1. Introduction

Energy demands all over the world are increasing day by day. As a result, the energy crisis has become a global challenge. Major work on the need for energy saving started in the early 1970s and 1980s. Various analyses and studies that were carried out resulted in attracting researchers towards solving the energy crises problem. Through looking at the total amount of energy consumed in various types of buildings, both residential and commercial, it was concluded that there is a need for buildings that consume energy intelligently. With ongoing technological advancements, increasing population and other factors [1], it can be observed [2] that with the passing years there is an increased risk of energy crisis. According to an estimation by the International Energy Agency (IEA), in buildings 32% of total energy consumption is in final form while 40% is in crude form. The environment program developed by the United Nations [2] has estimated that residential and commercial buildings consume about 60% of total electricity in the world. Therefore, implementing a Building Energy Management System (BEMS) that is cost effective has resulted in energy efficiency improvement from 5% to 40% [2,3,4]. Currently, many types of smart systems and energy saving techniques are in use to overcome the present scenario. The Building Energy Management System (BEMS) is a smart Building Management System which has defined techniques for managing energy effectively to obtain substantial benefits. It is important for a system to provide greater worth, i.e., apart from saving energy: it should have realizable design and must be cost effective.

It has been observed that most of the research conducted until now has been generic to all types of buildings, except for some very recent techniques [5,6,7,8]. This recent emerging trend points towards the importance of developing customized models for building energy management. Our work, however, differs from the recent research in that we consider co-optimization of energy and comfort as opposed to simple energy optimization. Secondly, we consider the dynamics of the change in the number of people in the building. Third, a bank branch building in itself has its own unique feature in terms of the activities, equipment, and internal structure. Note that not all buildings have similar parameters. Every building seen around us has different structure, type of construction, usage, location and environmental conditions. All the basic parameters and factors vary from building to building. For example, number of operational hours is one of the building-specific parameters. A school building or an office building operates at certain times, while a hospital building is operational day and night. Therefore, in this paper, the focus is solving an energy management problem that is specific for a bank branch building. A bank branch is the most common type of a commercial building. Detailed specifications are given later in this paper. Specifically, the purpose of this research is to provide an optimal control system for electrical energy management of a bank building that caters to the thermal and visual comfort of the occupants of the building. There are two main reasons to focus on bank branch buildings. First, there is a large number of bank branch buildings in urban areas, and hence the impact of energy savings is sizeable. Second, due to the nature of activities performed inside a bank building, the variation in the number of people (customers) inside the building at any given time is considerably high compared to other buildings such as schools, offices, and residential buildings. In a bank branch building, the customers continue to enter and exit at different rates throughout the day. This provides us a challenge in terms of energy and comfort management inside the building that we try to address in this paper.

Markov Decision Process (MDP)-based modelling is selected to solve the energy management problem. The MDP solution is optimal with respect to expected and discounted value of a specified cost function. MDP includes state variables, actions, transition probabilities, and cost functions. Going through the parameters that are required for computing the mentioned problem, the MDP model becomes computationally complex. The variables in the problem include number of customers, time of the day, day of the week, week of the month, month of the year, air conditioning units, lights, temperature, and source of electric power. Considering all these variables, our decision-making process becomes so large that computation of its solution is a challenge. Therefore, the decision-making process was divided into parts such that each month has a separate decision-making process, followed by a separate decision-making process for a week, a day and lastly an hour. This division of processes can help us reduce the computations by a large number. As a result, different decision-making processes shall be active during each hour and hence the control policy shall change accordingly.

Practical implementation of our proposed model requires collection of data related to the statistics of customers visiting the bank branch during each hour of a day. Probability generation for the random variables is performed using statistical data. Following that, the MDP model is created using the benchmark presented in this paper. The calculation of optimal policy is performed offline. The system can be installed in the bank to obtain feedback from the sensors and counters. Various sensors are needed to measure the lighting and temperature conditions of the environment. A counter connected at the gate of the branch is required to count the number of people entering and leaving. Microcontroller-based embedded systems and other embedded devices help to program and control the system. Connecting relays and switches are required to operate on the control signal and further implement the policy by acting on commands from the microcontroller to make devices work accordingly.

2. Literature Review

Building energy management has been discussed in the literature in many contexts, such as technological advancements, increasing population, shortage of natural resources, economic crisis, energy efficiency and effectiveness. For example, a model based on predictive control using a combination of regression tree and random forest has been developed [9]. The results showed a finite time receding horizon-controlled data driven model without using grey boxes of system dynamics [9]. Using the “SARSA” algorithm, a control operation was developed to maximize long-term rewards to encounter the load demand. The results in [10] showed a significant increase of total rewards, a faster convergence speed and good effect. A reinforcement learning controller developed and simulated using the MATLAB/Simulink environment shows that as compared to the other controllers, the reinforcement learning controller has equivalent or better performance even after a couple of simulated years [11]. The model-free Monte Carlo method proposed uses metrics based on state action value function or Q-function and an algorithm designed to a find feedforward plan for reinforcement learning [12]. After observing the residential demand response, an online algorithm was developed that estimates the impact of future prices and schedules residential device usage depending on a consumer’s decision on long-term cost analysis [13].

Various control schemes have been designed to ensure the sustainability and efficiency of BEMS. The most extensively used controllers in building engineering as per [14,15] are on-off switching controllers, proportional integral (PI) controllers, and proportional integral derivative (PID) controllers. To overcome the comfort factor inside buildings, research was carried out in the 1980s in [16,17] on predictive, adaptive and optimal controllers. These controllers have numerous drawbacks, such as that they require the model of a building for efficient operation. These controllers are sensitive in real-time application, due to which [18] one may obtain inaccurate results. Decentralized or distributed controllers explained in [19,20] act as agents and utilize true control techniques such as proportional-integral-derivative (PID) control, predictive control, etc. On the other hand, smart homes are the best example of centralized control, where only one agent is responsible for all sub-system control [21]. A latest hybrid approach discussed in [22] provides a hybrid design of building automation and control system with a smart readiness indicator. Advanced research based on artificial intelligence (neural networks, fuzzy logic, genetic algorithm, etc.) and distributed control networks offers numerous benefits in the field of building energy and comfort management. A recent review of strategies for Building Energy Management Systems regarding modelling predictive control, demand side management, optimization, and fault detection and diagnosis is presented in [23], whereas another review of the reinforcement of learning-based approaches for building energy management is discussed in [24].

Despite all of the valuable work and recent advances in the building energy management and automation systems, there still exists a gap between the generic approaches and more specific application of the same. In addition, there is limited work on stochastic optimization when it comes to designing a building automation system. This paper addresses both of these gaps, i.e., a specific model (with relevant variables such as number of customer) which caters to the requirements of a bank branch building. Secondly, our proposed model incorporates the uncertainty involved in the problem. Major contributions in the proposed work include the following:

Design of specific MDP model for solving the building energy management problem for a bank branch building, in addition to a proposed computational saving mechanism by decomposing the problem into two sub-problems, i.e., lighting control and thermal control (Section 4).
Specification of the practical implementation process involved with the proposed model (Section 3).
Presentation of a sample case study and analysis of the simulation results based on the statistical data obtained from an actual bank branch building in Lahore, Pakistan (Section 5).
Comparison of the proposed model with a generic MDP-based model for building energy management, highlighting the advantages of the proposed model over the existing approach (Section 6).

3. Background and Problem Formulation

3.1. Description of Problem

As mentioned earlier, the purpose of this research is to provide an optimal control system for electrical energy management of a bank building that caters to the thermal and visual comfort of the occupants of the building. The task of balancing the tradeoff between human comfort and energy consumption is a difficult one. Furthermore, different types of buildings have different environmental statuses, and also carry different parameters. A school, a hospital, a commercial building, a shopping mall and an office building all have a different set of parameters that needs consideration while developing an energy saving control system. There is not much of this sort in the literature that can be implemented to genre-specific buildings. Most of the systems are generic, so there is a need to introduce a control system that is customized for a specific type of building. The focus is upon a particular type of building, and for this work that is a bank building. A customized approach needs to be given to accomplish this task, as shown in Figure 1.

Some of the basic differences between any other commercial building and a bank building after the survey are as follows:

A bank building has two types of occupants: one that is fixed, i.e., the number of bank officers and staff, and second, the customers.
Number of customers visiting a branch can vary from 200 to 700 per day. This number increases before festivals and on salary days of the month.
Customer Service is fixed by a specific time zones, normally from 9 a.m. to 5 p.m.
It is mostly active for 5 days a week.
It has an Automatic Teller Machine (ATM) room that is operational 24 h.
It has an IT room that is also operational 24 h.
Size of a branch of a bank is usually between 1300 sq. feet to 2000 sq. feet. However, the head office of a bank can be larger and can have multiple floors.
Most of the staff have computers active during active hours of operation of the bank.
Multiple air conditioning units are used in a branch.

Considering the above-mentioned points, it can be noticed that a considerable amount of energy is consumed in banks across the globe. Therefore, there is a need to develop a customized BEMS that is dedicated to banks.

3.2. Markov Decision Process (MDP)

MDP is used in this work as there are uncertainties in the problem; for example, the arrival of a customer in the building is a stochastic event that is modeled in this work in Section 4.3. The number of occupants changes from time to time on a daily basis. This process consists of two basic sets, one is the set of the states of the problem and other is the set of actions/decisions. The process continues from state to state, and at every step the transition from one state to another state is dependent on the action.

The state space and other MDP sets (action, reward and transition) are defined as follows:

A set of states, S (assumed discrete).
A set of actions, A (assumed discrete).
Transition probabilities, P, which define the probability distribution over next states given the current state and current action P(S_t₊₁|S_t,A_t). Here the subscript represents the decision instant (or time).
A policy $π : S \to A$ is a mapping from states to actions.
A value function for a policy that gives the expected sum of discounted rewards when acting under that policy $V^{π} : S \to R$ .

V^{π} (s) = E [\sum_{t = 0}^{\infty} γ^{t} R (s_{t}) | s_{0} = s, a_{t} = π (s_{t})]

(1)

R (s_{t})

in the above equation is the reward of state

s

at instant

t

. Initial state

s_{0}

,

γ

is the discount factor that is used to differentiate between the immediate and non-immediate rewards. The action is represented by

a_{t}

. Optimal policy is the policy that yields the highest expected utility. Calculation of the optimal policy involves iterations.

3.3. Scalability Issues

Variables involved in our problem are number of customers, time of the day, day of the week, week of the month, month of the year, air conditioning units, on/off status of the lights, temperature, and power. Considering all these variables, our decision-making process becomes so large that its computation is a non-trivial challenge. Therefore, we divided our decision-making process in parts such that each hour of each day of each week of each month has its own decision policy. For example, the decision policy from 10:00 a.m. to 11:00 a.m. on a Tuesday of the first week of November is different from the decision policy from 10:00 a.m. to 11:00 a.m. on a Tuesday of the second week of November (there would be 2080 policies for the whole year assuming 52 weeks in the year and five working days during each week with eight working hours per day). The factors associated with the time span and working hours are day of the week, week of the month and month of the year. The days are represented through the symbol “D”, starting from Monday and ending Friday. Week of the month is symbolled as “W”, such that there are four weeks a month. All 12 months of the year are included from 1 to 12 denoted through the variable “M”, and hours are represented by “H”. Figure 2 shows the state space model decomposition.

This division of our major process can help us to reduce the computations by a large number. The decision-making process runs after every hour and the control is changed accordingly.

The optimal control policy created for our problem works such that after every hour the control will change, refreshing the parameters to obtain the desired upshot. This is performed to reduce the number of calculations in the decision-making process. Policy selection according to the requirements is carried out as shown in Figure 3. A clock input (Clk) is given to the controller with multiple policies (P1, P2, …, Pn). Each policy is capable of generating a corresponding optimal decision (

μ_{1}, μ_{2}, \dots, μ_{n}

) based on the information of the state (S). On the basis of the clock input, a control policy is selected and implemented to control the devices in the electrical network, i.e., the relevant

μ

is executed to operate the electrical devices. There are different transition probabilities for different cases; however, the state space remains the same.

4. Proposed MDP Model

In this section, the MDP model for the BEMS of a bank branch building is developed. Components of the MDP model listed earlier are developed here one by one.

4.1. Set of States

To reduce this computational complexity, it is favorable to make separate policies on an hourly basis (as discussed earlier). For each hour in a day a separate policy is worked through. The system refreshes itself every hour. The reduced state space with months, weeks and days omitted is given as under:

S = {s_{1}, s_{2}, s_{3}, \dots \dots, s_{n}}

(2)

The details of each state and its further subset are given by:

\begin{array}{l} s_{i} = {c_{i}, a c_{i, 1}, \dots \dots, a c_{i, m 1}, l_{i, 1}, \dots \dots, l_{i, m 2}, l_{i, s e t}, t_{i}, p_{i}} i \in {1, 2, \dots, n}, \\ c_{i} \in {0, 1, 2, \dots, y}, \\ a c_{i, j} \in {o n, o f f, e c o, p o w}, j \in {1, 2, \dots, m 1}, \\ l_{i, j} \in {o n, o f f}, j \in {1, 2, \dots, m 2}, \\ l_{s e t, i} \in {1, 2, \dots, m 2}, \\ t_{i} \in {- x, - x + δ, - x + 2 δ, \dots, 0, δ, 2 δ, \dots, x}, \\ p_{i} \in {g r i d, g e n} . \end{array}

(3)

where c_i represents number of customers present in the building in state i, ac_i,j represents the operating mode of j^th air conditioner (on, off, economy mode, and power mode) in state i, l_i,j represents the operating mode of j^th light in state i (a light in any state can either be on or off), l_i,set represents the amount of lights required to be on in state i, t_i represents difference between actual and desired temperature in state i, and p_i represents the active source of power in state i (the active power source in any state can either be the grid or the generators). Note that the total number of air conditioners in the building is represented by m1 and the total number of lights is represented by m2. Similarly, maximum number of customers is represented by y and maximum difference between actual and desired temperature is ±x. It is assumed that all the lights are identical, and all air conditioners are also identical.

The solution to this complexity can be carried out through the development of virtual zones. Creating separate virtual thermal zones and virtual visual zones for the whole bank branch can help reduce the computations. Reduced state space for thermal control is given by:

s_{i} = {c_{i}, a c_{i, 1}, \dots \dots, a c_{i, m 1}, t_{i}, p_{i}}

(4)

where

i ϵ {1, 2, 3, \dots \dots, n 1}

(5)

Similarly, for lighting control, the state space shall be given as:

s_{j} = {c_{j}, l_{j, 1}, \dots \dots, l_{j, m 2}, l_{j, s e t}, p_{j}}

where

j ϵ {1, 2, 3, \dots \dots, n 2}

Here,

n 1

and

n 2

are the number of states in the individual thermal and lighting control models. Note that the value of

n 1

and

n 2

shall be substantially less than that of

n

(in Equation (3)), as the variables are reduced by dividing the model into two separate ones.

4.2. Actions

In our model the actions that are needed to be taken are temperature control and light control. Action A1 represents the actions required for thermal control while A2 represents the actions for visual control. The action space sets for both are given below:

A 1 = {a c_{1} = o n, a c_{1} = o f f, a c_{1} = e c o, a c_{1} = p o w, \dots, a c_{m 1} = o n, a c_{m 1} = o f f, a c_{m 1} = e c o, a c_{m 1} = p o w,}

(6)

A 2 = {l_{1} = o n, l_{1} = o f f, \dots \dots \dots, l_{m 2} = o n, l_{m 2} = o f f}

(7)

4.3. Transition Probabilities

The random variables in our problem are the number of customers and temperature. The number of customers entering in the bank every day cannot be exactly determined in any case. The number of customers that enter the bank every hour is non-deterministic. We conducted a small survey ourselves in bank branches and managed to obtain details of the number of customers in a bank every hour for a whole week. For the same hour in different days the numbers vary.

The other random variable is the temperature. In looking through the variation of temperature there are several things which change the temperature. The opening and closing of the entrance door tend to change the temperature, and especially very close to the door the temperature maintenance is challenging. Furthermore, it is assumed that a laminated wall is there that does not allow external temperature to affect the internal environment. However, this is ideal case. In reality, the walls of the bank branch become hot in extreme summers and do affect the internal temperature. In addition to this, small variations such as heat from electronic devices, presence of hot beverages, and some autonomous factors also cause temperature changes. Resulting state transition diagram is shown in Figure 4.

The transition of states regarding temperature and number of customers can be observed (using appropriate sensors). The change in number of customers, i.e., entering and leaving the bank, affects the temperature. The addition of customers results in an increase in the temperature, and hence the air conditioner mode then needs to be changed. The continuous updating of information changes the states after fixed intervals. The states are linked together to provide all relevant data without any isolation for better and efficient control. Therefore, a state change causes transition of other states. The simple formulation of the probability for temperature change from data collection is given as:

P_{r} = \frac{N o . o f t i m e s t e m p^{+} = x}{T o t a l n u m b e r o f t r i a l s}

(8)

Otherwise, the transition probabilities for temperature change and number of customers can be given as functions of temperature itself, the present state of customers and air conditioning mode.

P_{r} (t e m p^{+} = x | t e m p, c, a c) = f (t e m p, c, a c)

(9)

P_{r} (c^{+} | c) = g (c)

(10)

Here, “f” and “g” are functions of state parameters that are collected either from statistical data or from probability theory.

4.4. Reward/Cost Functions

Reward or cost function describe how good or bad it is to take an action in a state. To create an appropriate cost function, one needs to find out the important parameters in the problem. The goal is to provide temperature and visual comfort, in addition to that saving energy and decreasing the overall energy consumption is also desirable. The total energy being consumed defines the working cost. The selection of cost function is performed to make actual values of state parameters closer to the desired values. For defining the cost, numerical values are assigned to the modes of air conditioners and lights as follows:

a c_{i} = 0 \leftrightarrow a c_{i} = o f f a c_{i} = 1 \leftrightarrow a c_{i} = e c o a c_{i} = 2 \leftrightarrow a c_{i} = o n a c_{i} = 3 \leftrightarrow a c_{i} = p o w l_{i} = 0 \leftrightarrow l_{i} = o f f l_{i} = 1 \leftrightarrow l_{i} = o n

(11)

The cost function for temperature control is given as:

C_{1} (s) = {| t |}^{2} c_{i} \times \sum_{j = 1}^{m 1} a c_{j} \times [(p = = g r i d) α_{1} + (p = = g e n) α_{2}]

(12)

The cost function for lighting control is given as:

C_{2} (s) = {| \sum_{i = 1}^{m 2} l_{i} - l_{i, s e t} |}^{2} c_{i} + \sum_{i = 1}^{m_{2}} l_{i} \times [(p = = g r i d) α_{1} + (p = = g e n) α_{2}]

(13)

The square of the difference of actual and desired states of temperature and lighting help us to avoid uncomfortable conditions. Square of the error is commonly used in optimal control problems [25] due to its characteristic of increasing the effort when the error is more than unity and reducing the effort when the error is less than unity. Due to this reason, the same approach was adopted in this work. When the difference is high then the required effort is also high, and on the other hand when there is less difference between actual and desired values, less effort is used.

α_{1}

and

α_{2}

are constants determining the voltage values for power “p” usage of either grid or generator.

4.5. Implementation Methodology

Optimal decision policy is created by taking all parameters into account, comparing actual and desired values, status of the power coming from either grid or generator, and most importantly number of customers present within the bank. Hence, a number of policies are created from different data for different time intervals. Afterwards, a clock-based selection control (as described earlier in Figure 3) provides the correct choice of policy as desired from the already calculated policies. Implementation steps are listed in Figure 5.

From the flow chart, the first four steps in the implementation phase can be performed offline. Creating an MDP model from the given data and evaluating it, then decomposing the model into further small MDPs to reduce computations, is performed offline. Then, an optimal control policy for each small model is evaluated. These policies are then uploaded into the embedded system and then closed loop control is executed. The next section presents a simulation-based case study for further clarification and analysis regarding our proposed approach.

5. Case Study

5.1. Parameter Values for Temperature Control

We defined values to state parameters (Table 1) using observations from a real bank routine. Four air conditioners are assumed to be mounted in the building with four modes each (as in Equation (11) of previous section).

The number of customers vary from 0 to 9 in our case study. The temperature is assumed to be in between 24 to 41 degrees Celsius, while that of the desired room temperature starts from a lowest of 24 and goes to a maximum of 27. The two operational modes of power are grid and generator, which have a cost function attached to them in the form of α1 and α2 with values 1 and 2 assigned to them, respectively, showing the cost of power from generator being double that of the grid. The total number of states in this case turns out to be 368,639.

The probabilities of increase in customers, customers remaining the same and decrease in number of customers are defined as:

P (C_{i n c} | C) = {\begin{matrix} 0.8, & C < 3 \\ 0.5, & 3 \leq C < 6 \\ 0.3, & C \geq 6 \end{matrix}

(14)

P (C_{s a m e} | C) = {\begin{matrix} 0.15, & C < 3 \\ 0.3, & 3 \leq C < 6 \\ 0.2, & C \geq 6 \end{matrix}

(15)

P (C_{d e c} | C) = {\begin{matrix} 0.05, & C < 3 \\ 0.2, & 3 \leq C < 6 \\ 0.5, & C \geq 6 \end{matrix}

(16)

The desired temperature once entered by the user is specified for the policy until changed by the user itself. The increase in temperature is uncertain as mentioned earlier. Therefore, the probability of temperature increase is given by:

P (t_{i n c} | t, C, a c) = 1 - (\frac{s a c}{12}) (σ)

(17)

where, sac is the sum of all air conditioners that are in the on state and

σ

is the specific case value taken for the policy ranging from 1 to 6. The probability defined in the above equation can be defined from the value of sac and

σ

. Note that Equation (17) is designed in such a way that if none of the air conditioners is on, then the temperature shall rise with a probability of 1. On the other hand, as more air conditioners are turned on, the probability of increase in the temperature decreases. The intensity of the increase in temperature is 1 degree per decision horizon (which is the precision of the model), and a decision horizon is the time interval between two consecutive decisions (actions). Selection of this time interval depends upon the user (usually it will be a minute to a few minutes). The probability of decrease in temperature shall be a complement of the probability in Equation (17).The values of

σ

for different ranges of temperature and customers are shown in the Table 2.

There are 16 actions in total regarding the air conditioners, as one air conditioner is being controlled with one action. The air conditioners are not all controlled simultaneously, instead we implied sequential control. By applying simultaneous control, the number of actions becomes much higher and hence the computations become complicated.

5.2. Parameter Values for Lighting Control

For lighting control, the following parameters are taken through our case study. We take 10 lights in total. The desired lighting condition can vary from no light on, i.e., 0, to all lights on, i.e., 10. Each light has its mode with 0 representing that the light is off and 1 indicating that the light is on. The increase and decrease in the lighting are certain. The two operational modes of power are grid and generator for the lighting case, as its effect was taken into account so that for higher values of lights the cost can differ noticeably. The cost function constants α₁ and α₂ with values 1 and 2 assigned, respectively, show that the cost of power from the generator is double that of the grid, as already stated. The total number of states for this case turns out to be 225,280 as shown in Table 3.

The probabilities of increase in customers, customers remaining the same and decrease in number of customers, however, remains the same for this case. As the increase and decrease of the lighting condition is certain, therefore no probabilities involved in turning the lights on or off.

5.3. Calculation of Optimal Policies

The optimal policy is calculated using the value iterations algorithm. In this algorithm, value of each state is updated according to the following relationship:

V a l u e_{c u r r e n t} (s) \leftarrow_{a}^{m a x} (- C (s, a) + γ \sum_{s^{'} \in S} P r (s^{'} | a, s) V a l u e_{p r e v i o u s} (s^{'}))

(18)

The values are repeatedly updated until we reach consensus between the consecutive values. In this work, we used the following criterion for the termination of the iterations:

E r r o r = | V a l u e_{p r e v i o u s} - V a l u e_{c u r r e n t} |

(19)

Here,

V a l u e_{p r e v i o u s}

is the value of a state in previous iteration (during the execution of the value iteration algorithm) and

V a l u e_{c u r r e n t}

is the corresponding value in the current iteration. The threshold for the value of error is set to go to a maximum of 10⁻⁶. After the error has reached this value, the iterations are terminated.

Once the iterations are terminated, the optimal policy is calculated using the following relationship:

P o l i c y (s) = \underset{a}{argmax} (- C (s, a) + \sum_{s^{'}} P r (s^{'} | a, s) V a l u e^{*} (s^{'}))

(20)

Here,

V a l u e^{*} (s^{'})

is the optimal value of the state

s^{'}

obtained through the above-mentioned iterations. The policy is needed to be far sighted, so the value of “γ” for value iteration was fixed to 0.95. This makes our policy able to be optimized for long-term decision making. For temperature control, a total of 203 iterations were required to obtain the error below the threshold. It took 40 s per iteration using core i5 with 6 GB RAM, while for the case of light control it took about 218 iterations. Each iteration was calculated in about 20 s using the same computer.

6. Results

6.1. Simulation Case 1: No Change in Parameters

The thermal control parameters (as mentioned in Table 1 above) when kept constant throughout a policy show a transition of levels randomly as an effect of natural fluctuations of temperature fall and rise. Figure 6 shows the results from the optimal policy where the state variables are plotted as a function of the number of decisions (actions). In this simulation, a total of 51 decisions (actions) were made (using the optimal policy), as indicated in the x-axis. At every decision, the new state is calculated based on the maximum likelihood. Likelihood is calculated based on the probabilities from Equations (14)–(17). In this particular case, the number of customers, that was a random parameter throughout, was introduced as a constant value of 5 (the bottom plot). The error is taken to zero from the beginning, taking the sum of air conditioners in the on state to maximum (in the top plot). The error reaches zero (within first couple of decisions) in the second plot of Figure 6. The policy makes a few air conditioners turn off, making the sum reach a minimum value of 4, perceived from the first plot. The lower air conditioning and other random factors tend to increase the temperature yet again and the error rises from the zero point. This change in error value makes the system operate more air conditioners in turn, and hence the process continues to trigger between the situations.

6.2. Simulation Case 2: Variation in Number of Customers

In the next results shown in Figure 7, we have varied the number of customers in the building as shown in the bottom graph of the figure. It can be observed from the top graph that the sum of air conditioners tuned on at any given time reaches its max value of 12 when the number of customers is increased for the first time (after the decision number 10). Afterwards, the number rises to 11 when the customers are increased again (after the decision number 30). However, throughout this process, the error in the desirable and actual temperature is maintained at near-zero value as indicated by the middle graph.

6.3. Simulation Case 3: Desired Temperature

Figure 8 shows the results where the desired temperature is varied periodically as shown in the bottom graph of the figure. In response, the sum of air conditioners turned on at any given time (top graph) has a clear rise and fall which matches with the rise and fall of the desired temperature. Most importantly, the error between the desired and the actual temperature is kept close to zero throughout the process except for the small transitions due to fast changes in the desired temperature.

6.4. Simulation Case 4: Effect of Power Sources

Figure 9 shows the results of variation in the cost of electricity due to changes in the available power source. Note that the power source 2 (bottom graph) is more expensive than the power source 1. Hence it may be noted from the top graph that whenever the power source is at value 2, the optimal policy tends to lower the number of air conditioners turned on. As for the earlier cases, the error in this case is kept near zero (middle graph).

6.5. Simulation Case 5: Desired Lighting

Figure 10 shows the results from the optimal lighting policy where the number of customers are varied periodically (the middle graph). The power source however is kept constant in this case to avoid confusion regarding the inference that we get from the results. It can be noted from the top graph that the lighting error is driven to zero and kept at near zero except for small positive value during the periods where the number of customers is low. This small value is an indicator of the anticipation of the optimal policy regarding the arrival of a customer. In this sense, the policy obtained using the probabilities of arrival and departure of customers (as proposed in this paper) is proactive in nature. Also, note that the cost of lighting is low as compared to the cost of air conditioning and the cost of discomfort due to lack of lighting. Therefore, it makes sense to minimize the risk of visual discomfort at the expense of minimal extra lighting.

6.6. Comparison with Existing Approaches

From the literature survey performed, to date, there is a wide range of ideas and work carried out on giving some optimal control solutions for Building Management Systems. However, no work has been conducted for the design and specifications of a bank branch in particular. The parameter designs differ from building to building due to change in design and environment. The Table 4 shows the difference between our approach and the ones that are already in existence.

7. Conclusions

This paper addressed the need for a customized model for energy management of a bank building that can be solved for a stochastic optimal decision policy using a stochastic dynamic programming algorithm, e.g., value iteration. Various strategies for implementation of the proposed approach as well as computational complexity reduction for scalability were also proposed. Simulation results and a comparison with the existing approach indicate the superiority and utility of the proposed approach. Specifically, we developed an MDP model that incorporates the probabilities of the arrival and departure of customers as well as the probabilities of the variations in the temperature. This innovation enables the optimal policy to make decisions that are aware of the dynamics of the number of persons present in the building at any given time. For example, just before the rush hour, the air conditioners shall be turned on in anticipation of the arrival of the customers. One of the basic challenges was the amount of uncertainty in our system. Researchers have mostly solved the constraint satisfaction problem to obtain desired results, which is not much efficient. In the case of other non-MDP schemes, a policy is not created. Instead, the constraint values are satisfied one by one. However, a decision process is embedded into the model for the Markov Decision Process and multiple decisions are made over time. We used MDP-based modelling that caters to uncertainty, and an optimal policy can be developed. The MDP model helps to represent problems where there are a large number of embedded decisions to be made for the formation of an optimal policy. By using MDP we allowed multiple decisions to be made within multiple time periods. We kept our problem case study simple enough to make an MDP model to better describe the sequential decisions for all plausible timing strategies. Our model works for calculations that are both online and offline. The majority of calculations for policy development are performed offline.

Our model is significantly comprehensive in design parameters. Decomposed decision making is easy and computationally less complex, with a lower number of calculations. We conducted a multi-level decomposition of MDP that made a perceptible reduction in the complexity of calculations. The decomposed MDP model has total computations divided into smaller numbers. The decomposition levels are categorized in the paper as level of time (hourly basis) and level of goal (temperature and lighting).

We introduced here a stochastic approach, which requires a certain amount of data before implementing it in an actual situation. The statistical data of number of customers from the bank branch are required. In order to derive the transition probabilities for temperature, it is necessary to provide the cause of temperature change in the environment. Once the data are collected and the probabilities are calculated, then this presented approach provides maximum advantage in terms of efficient energy management. Practically, in terms of its implementation, a microcontroller-based embedded system is required as is indicated in the paper.

As mentioned in the paper, the main idea behind this work is to develop a customized model for building energy and comfort management, as most of the existing research is generic in nature. In addition, the trend is already emerging for developing customized models for specific types of buildings. The idea proposed in this paper may be further extended to more types of buildings, for example, school buildings, library buildings, hospital buildings, etc. The advantage of developing such customized models is that there are certain specific state and action variables that are relevant to a specific type of building. Furthermore, the decomposition ideas discussed in this paper may also be carried forward to application in other MDP models for building energy management as well as to the general purpose MDP models.

Author Contributions

Conceptualization, A.N. and I.T.; methodology, I.T.; software, A.N.; validation, A.A., I.T. and A.N.; formal analysis, I.T.; investigation, I.T.; resources, A.A.; data curation, I.T.; writing—original draft preparation, I.T.; writing—review and editing, A.N. and A.A.; visualization, A.N.; supervision, A.N. and A.A.; project administration, A.N. and A.A.; funding acquisition, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are provided within the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Code Availability

Code is available on request with the corresponding author.

Abbreviations

Here we present the list of symbols and the associated meanings.

Symbol	Meaning
$V^{π} (s)$	Value of state $s$ under policy $π$
$E [.]$	Expected value
$γ$	Discount factor
$R (s)$	Reward of state $s$
$S$	Set of states
$s$	An element of $S$
$c_{i}$	Number of customers in i^th state
$a c_{i, j}$	Status of j^th air conditioner in i^th state
$l_{i, j}$	Status of j^th light bulb in i^th state
$l_{i, s e t}$	Amount of lighting required in i^th state
$t_{i}$	Difference between the actual and the desired temperature in i^th state
$p_{i}$	Active power source in i^th state
$A$	Set of actions
$a$	An element of $A$
$P_{r}$	Probability
$C (s)$	Cost associated with state $s$
$P (C_{i n c} \| C)$	Probability of increase in cost given cost $C$
$P (C_{d e c} \| C)$	Probability of decrease in cost given cost

References

Rinkesh. Causes and Solutions to the Global Energy Crisis. Available online: https://www.conserve-energy-future.com/causes-and-solutions-to-the-global-energy-crisis.php (accessed on 13 March 2022).
Dounis, A.; Caraiscos, C. Advanced control systems engineering for energy and comfort management in a building environment—A review. Renew. Sustain. Energy Rev. 2009, 13, 1246–1261. [Google Scholar] [CrossRef]
Costa, A.; Keane, M.; Torrens, J.I.; Corry, E. Building operation and energy performance: Monitoring, analysis and optimization toolkit. Appl. Energy 2013, 101, 310–316. [Google Scholar] [CrossRef]
Whitehouse, K.; Ranjan, J.; Lu, J.; Sookoor, T.; Saadat, M.; Burke, C.M.; Staengl, G.; Canfora, A.; Haj-Hariri, H. Towards occupancy-driven heating and cooling. IEEE Des.Test Comput. 2012, 29, 17–25. [Google Scholar] [CrossRef]
Luo, Z.; Lu, Y.; Cang, Y.; Yang, L. Study on dual-objective optimization method of life cycle energy consumption and economy of office building based on HypE genetic algorithm. Energy Build. 2022, 256, 111749. [Google Scholar] [CrossRef]
Hachem-Vermette, C.; Singh, K. Optimization of energy resources in various building cluster archetypes. Renew. Sustain. Energy Rev. 2022, 157, 112050. [Google Scholar] [CrossRef]
Iqbal, N.; Kim, D.H. IoT Task Management Mechanism Based on Predictive Optimization for Efficient Energy Consumption in Smart Residential Buildings. Energy Build. 2022, 257, 111762. [Google Scholar]
Xu, Y.; Zhang, G.; Yan, C.; Wang, G.; Jiang, Y.; Zhao, K. A two-stage multi-objective optimization method for envelope and energy generation systems of primary and secondary school teaching buildings in China. Build. Environ. 2021, 204, 108142. [Google Scholar] [CrossRef]
Jain, A.; Behl, M.; Mangharam, R. Data Predictive Control for building energy management. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 44–49. [Google Scholar]
Liu, J.; Tang, H.; Matsui, M.; Takanokura, M.; Zhou, L.; Gao, X.; Jing, L.; Hao, T.; Matsui, M.; Lei, Z.; et al. Optimal management of energy storage system based on reinforcement learning. In Proceedings of the 33rd Chinese Control Conference (CCC), Seattle, WA, USA, 24–26 May 2014. [Google Scholar]
Dalamagkidis, K.; Kolokotsa, D.; Kalaitzakis, K.; Stavrakakis, G. Reinforcement learning for energy conservation and comfort in buildings. Build. Environ. 2007, 42, 2686–2698. [Google Scholar] [CrossRef]
Ruelens, F.; Claessens, B.J.; Vandael, S.; De Schutter, B.; Babuska, R.; Belmans, R. Residential demand response of thermostatically controlled loads using batch reinforcement learning. IEEE Trans. Smart Grid 2017, 8, 2519–2531. [Google Scholar] [CrossRef] [Green Version]
O’Neill, D.; Levorato, M.; Goldsmith, A.; Mitra, U. Residential demand response using reinforcement learning. Smart Grid Communications (SmartGridComm). In Proceedings of the 2010 First IEEE International Conference, Gaithersburg, MD, USA, 4–6 October 2010. [Google Scholar]
Kasahara, M.; Matsuba, T.; Kuzuu, Y.; Yamazaki, T. Design and tuning of robust PID controller for HVAC systems. ASHRAE Trans. 1999, 105, 154. [Google Scholar]
Mathews, E.H.; Arndt, D.C.; Piani, C.B.; Van Heerden, E. Developing cost efficient control strategies to ensure optimal energy use and sufficient indoor comfort. Appl. Energy 2000, 66, 135–159. [Google Scholar] [CrossRef]
Henze, G.P.; Dodier, R.H.; Krarti, M. Development of a predictive optimal controller for thermal energy storage systems. HVACR Res. 1997, 3, 233–264. [Google Scholar] [CrossRef]
Curtiss, P.S.; Kreider, J.F.; Shavit, G. Neural Networks Applied to Buildings—A Tutorial and Case Studies in Prediction and Adaptive Control; No. CONF-960254; American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc.: Atlanta, GA, USA, 1996. [Google Scholar]
Yang, R.; Wang, L. Multi-zone building energy management using intelligent control and optimization. Sustain. Cities Soc. 2013, 6, 16–21. [Google Scholar] [CrossRef]
Wang, L.; Wang, Z.; Yang, R. Intelligent multiagent control system for energy and comfort management in smart and sustainable buildings. IEEE Trans. Smart Grid 2012, 3, 605–617. [Google Scholar] [CrossRef]
Kolokotsa, D.; Pouliezos, A.; Stavrakakis, G.; Lazos, C. Predictive control techniques for energy and indoor environmental quality management in buildings. Build. Environ. 2009, 44, 1850–1863. [Google Scholar] [CrossRef]
Cook, D.; Youngblood, M.; Heierman, E.; Gopalratnam, K.; Rao, S.; Litvin, A.; Khawaja, F. MavHome: An Agent-Based Smart Home. In Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, (PerCom 2003), Fort Worth, TX, USA, 4 March 2003. [Google Scholar]
Ożadowicz, A. A Hybrid Approach in Design of Building Energy Management System with Smart Readiness Indicator and Building as a Service Concept. Energies 2022, 15, 1432. [Google Scholar] [CrossRef]
Mariano-Hernández, D.; Callejo, L.H.; Zorita-Lamadrid, A.; Duque-Pérez, O.; García, F.S. A review of strategies for building energy management system: Model predictive control, demand side management, optimization, and fault detect & diagnosis. J. Build. Eng. 2021, 33, 101692. [Google Scholar]
Yu, L.; Qin, S.; Zhang, M.; Shen, C.; Jiang, T.; Guan, X. A review of deep reinforcement learning for smart building energy management. IEEE Internet Things J. 2021, 8, 12046–12063. [Google Scholar] [CrossRef]
Gomes, J.; Paiva, A.; Costa, S.; Balestrassi, P.; Paiva, E. Weighted multivariate mean square error for processes optimization: A case study on flux-cored arc welding for stainless steel claddings. Eur. J. Oper. Res. 2013, 226, 522–535. [Google Scholar] [CrossRef]

Figure 1. Block diagram.

Figure 2. State space decomposition.

Figure 3. Policy selection decomposed model.

Figure 4. State transition diagram.

Figure 5. Flow chart.

Figure 6. No change in parameters.

Figure 7. Variation in number of customers.

Figure 8. Desired temperature.

Figure 9. Effect of power sources.

Figure 10. Desired lighting.

Table 1. Parameter values for thermal control.

Sr. No.	Parameter Description	Values and Ranges
1	No. of air conditioners	0 to 4
2	Modes of each air conditioner	4 (0, 1, 2, 3)
3	Number of customers	0 to 9
4	Power modes	2 (1, 2)
5	Range of desirable temperature	24 to 27
6	Range of room temperature	24 to 41
7	α1	1
8	α2	2
9	Total states	368,639

Table 2. Selection of σ.

Sr. No.	Range of Customers	Temperature Range	$σ$
1	C ≥ 5	t (24, 28)	0.75
2	C ≥ 5	t (29, 35)	0.8
3	C ≤ 5	t (24, 28)	0.85
4	C ≤ 5	t (29, 35)	0.9
5	C ≥ 5	t (36, 41)	0.95
6	C ≤ 5	t (36, 41)	0.98

Table 3. Parameter values for lighting control.

Sr. No.	Parameter Description	Values and Ranges
1	Number of lights	10
2	Mode of light	2 (0, 1)
3	Desired lighting conditions	0 to 10
4	Number of customers	0 to 9
5	Power modes	2 (1, 2)
6	α1	1
7	α2	2
8	Total states	225,280

Table 4. Comparison.

Sr. No.	Proposed Approach	Previous Work
1	Specific for a particular type of building	Generic model was given
2	Time dependent	Time not taken into consideration
3	Different MDP model, i.e., decomposition is with respect to time	Model decomposition was goal based
4	Multiple air conditioning units	Only one centralized air conditioning unit
5	Multiple lighting switches with separate control of each	Only one lighting switch that controlled lighting levels

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tahir, I.; Nasir, A.; Algethami, A. Optimal Control Policy for Energy Management of a Commercial Bank. Energies 2022, 15, 2112. https://0-doi-org.brum.beds.ac.uk/10.3390/en15062112

AMA Style

Tahir I, Nasir A, Algethami A. Optimal Control Policy for Energy Management of a Commercial Bank. Energies. 2022; 15(6):2112. https://0-doi-org.brum.beds.ac.uk/10.3390/en15062112

Chicago/Turabian Style

Tahir, Ifrah, Ali Nasir, and Abdullah Algethami. 2022. "Optimal Control Policy for Energy Management of a Commercial Bank" Energies 15, no. 6: 2112. https://0-doi-org.brum.beds.ac.uk/10.3390/en15062112

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Optimal Control Policy for Energy Management of a Commercial Bank

Abstract

1. Introduction

2. Literature Review

3. Background and Problem Formulation

3.1. Description of Problem

3.2. Markov Decision Process (MDP)

3.3. Scalability Issues

4. Proposed MDP Model

4.1. Set of States

4.2. Actions

4.3. Transition Probabilities

4.4. Reward/Cost Functions

4.5. Implementation Methodology

5. Case Study

5.1. Parameter Values for Temperature Control

5.2. Parameter Values for Lighting Control

5.3. Calculation of Optimal Policies

6. Results

6.1. Simulation Case 1: No Change in Parameters

6.2. Simulation Case 2: Variation in Number of Customers

6.3. Simulation Case 3: Desired Temperature

6.4. Simulation Case 4: Effect of Power Sources

6.5. Simulation Case 5: Desired Lighting

6.6. Comparison with Existing Approaches

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Code Availability

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI