Journal of Building Performance Simulation, 2017
https://doi.org/10.1080/19401493.2017.1387607
Performance testing of energy models: are we using the right statistical metrics?
Debaditya Chakraborty* and Hazem Elzarka
College of Engineering and Applied Science, University of Cincinnati, P.O. Box 210071, Cincinnati, OH 45221-0071, USA
(Received 3 April 2017; accepted 29 September 2017)
Testing the predictive performance of energy models (EMs) is necessary to evaluate their accuracy. This paper investigates the adequacy of existing statistical metrics that are often used by professionals and researchers to test EMs. It shows that the coefficient of variance of root mean squared error (CVRMSE) and the mean bias error (MBE), which are prescribed in ASHRAE Guideline 14, are not suitable for system-level energy model testing, and it points out the limitations of CVRMSE, MBE, and also root mean squared error (RMSE). The analysis shows that the normalizing term of a statistical metric influences its accuracy in determining the predictive performance of EMs. An alternative metric (range normalized root mean squared error, RN_RMSE), which normalizes the RMSE by the range of the data, is proposed as a replacement for CVRMSE. It is shown that RN_RMSE, when used in tandem with R2, can provide a more meaningful and accurate representation of the performance of system-level EMs.
Abbreviations: EMs: energy models; IDFs: input data files; R2: coefficient of determination; RMSE: root mean squared error; CVRMSE: coefficient of variance of root mean squared error; MBE: mean bias error; RN_RMSE: range normalized root mean squared error
Keywords: energy modelling; statistical metrics; performance testing of energy models; normalized root mean squared
error
1. Introduction
Whole-building energy models (EMs) have been used in
the past for building design and retrofit, measurement and
verification, code compliance, green certification, qualification for tax credits, and utility incentives.¹ Recently, the trend has been shifting towards more advanced system-level energy modelling techniques, algorithms, and tools for model-based control, fault detection, optimization of building operation, predictive analysis, etc. (e.g. Al-Homoud 2001; Clarke et al. 2002; Lee, House, and Kyong 2004; Wetter 2011; Zhao and Magoulès 2012). The
existing statistical metrics for testing the predictive performance of EMs were proposed a few decades ago for
whole-building measurement and verification (M&V) of
energy conservation measures (Reddy and Claridge 2000;
Reddy 2006; ASHRAE 2014a). For example, ASHRAE
still prescribes the use of coefficient of variance of root
mean squared error (CVRMSE) and mean bias error
(MBE) for calibration of whole-building EMs. This paper
evaluates whether the widely used statistical metrics,
including CVRMSE and MBE, are suitable for testing the
predictive performance of system-level EMs such as cooling and heating energy consumption. We focus on system-level energy modelling because it is often considered for in-depth analysis of energy uses and for identifying and eliminating energy wastage.
*Corresponding author. Email: chakrada@mail.uc.edu
© 2017 International Building Performance Simulation Association (IBPSA)
‘Testing the predictive performance’, sometimes
referred to as ‘validation of models’ in the existing
literature, is an important step in energy modelling. The history and development of testing techniques for energy modelling software are well established in Judkoff et al. (1983), Bloomfield (1999), ASHRAE (2013, Chapter 19), and ASHRAE (2014b, Annex 23), which include
the following elements:
• Analytical tests – in which predicted results from a
program or algorithm are compared to results from
a known analytical solution or a generally accepted
mathematical method.
• Comparative tests – in which a program is compared
to itself or to other programs. Bloomfield (1999)
showed that this type of comparison can be a very
powerful way of identifying errors. This approach is
adopted in this research paper.
• Empirical tests – in which calculated results from
a program or algorithm are compared to monitored
data from a real building, test cell, or laboratory
experiment. Such comparisons are often used to calibrate EMs for M&V and to cross-validate data-driven EMs.
Every type of test mentioned above requires statistical metrics to report the results. For example, in comparative testing, statistical metrics are used to represent the gap between the modelled results of one program and the modelled results of another. Different types of statistical metrics have been used by researchers for testing the predictive performance of EMs, including CVRMSE and MBE, which are prescribed in ASHRAE Guideline 14 to test calibrated EMs. Garrett and New (2016) have questioned the suitability of CVRMSE and MBE. They suggested that further work is required to evaluate the adequacy of such metrics. In this regard, the objectives of our research paper are set as follows:

• To analyse the adequacy of prescribed and commonly used statistical performance metrics for validating system-level EMs.
• To identify the limitations associated with each of these statistical performance metrics.
• To suggest viable alternative statistical performance metrics that overcome the identified limitations of the widely used metrics.

In this research, we have used prototype IDFs (input data files) (available online²) developed by the U.S. Department of Energy (DOE) with EnergyPlus to generate a synthetic database for comparative testing. These prototype IDFs are used to generate various types of energy consumption data for benchmark buildings as per ASHRAE standards. These IDFs provide a consistent basis for research, energy code development, appliance standards, and measurement of progress toward the DOE energy goals (Torcellini et al. 2008). Using these IDFs, a controlled test set-up is created to generate different sets of energy-related data by varying the time-step granularity: a high-frequency time step (1 min) provides the synthetic baseline dataset, and lower frequency time steps (15 min, 30 min, and 1 h) provide the modelled datasets.

The rest of the paper is organized as follows. Section 2 explains the existing statistical metrics that are commonly used by researchers to test the predictive performance of EMs. Section 3 introduces an alternative statistical metric and explains its theoretical advantages. Section 4 outlines the methodology adopted in this paper to evaluate the adequacy of various statistical metrics. The results are provided and discussed in Section 5. Section 6 concludes this paper by reporting the major findings from this research, and Section 7 highlights the future scope of work.

Table 1. Summary of statistical performance metrics used by researchers in the last two decades.

Metric    Occurrences in research papers
R2        Dhar, Reddy, and Claridge (1999); Aydinalp-Koksal, Ugursal, and Fung (2002); Ben-Nakhi and Mahmoud (2004); Aydinalp-Koksal and Ugursal (2008); Lam et al. (2010); Jacob et al. (2010); Zhang et al. (2015); Deb et al. (2016)
RMSE      Aydinalp-Koksal, Ugursal, and Fung (2002); Zhang et al. (2015)
CVRMSE    Dhar, Reddy, and Claridge (1999); Aydinalp-Koksal, Ugursal, and Fung (2002); Dong, Cao, and Lee (2005); Pan, Huang, and Wu (2007); Aydinalp-Koksal and Ugursal (2008); Lam et al. (2010); Kwok, Yuen, and Lee (2011); Ke, Yeh, and Jian (2013); Kandil and Love (2014); Zhang et al. (2015)
MBE       Pan, Huang, and Wu (2007); Lam et al. (2010); Ke, Yeh, and Jian (2013); Kandil and Love (2014); Zhang et al. (2015); Gestwick and Love (2014)

Note: CVRMSE and MBE are the statistical metrics recommended in ASHRAE Guideline 14 for energy model calibration. Their respective tolerance values for calibrated simulated EMs should be within 15% and ±5% when monthly data are used, and 30% and ±10% when hourly data are used. R2 denotes the coefficient of determination. RMSE denotes the root mean squared error.
2. Statistical metrics for energy model performance testing
Predictive performance of EMs is quantified by summarizing the pairwise distance between the baseline values and modelled values using statistical metrics; this is also referred to as 'model accuracy', 'error', or 'goodness-of-fit' (e.g. Granderson and Price 2014). The statistical metrics commonly used by researchers to test the predictive performance of EMs, along with their occurrences in research papers, are summarized in Table 1.
Based on the literature review, the authors found that
the statistical metrics used for testing the predictive performance of EMs varied from article to article. As a
result, future researchers may have difficulty in replicating or comparing their results to previous work. In
this research, the various statistical metrics are described
and compared quantitatively to examine their robustness
and flexibility.
The following are descriptions of the statistical metrics given in Table 1. (For all equations described below, n is the total number of data points in the dataset; k = 1, 2, ..., n; Yk denotes the baseline values, Ŷk the modelled values, and μ the mean of the baseline dataset Y.)
• Coefficient of determination (R2 ) indicates how well
the regression estimate fits the data. The formula is
given by
\[ R^2 = 1 - \frac{\sum_{k=1}^{n}(Y_k - \hat{Y}_k)^2}{\sum_{k=1}^{n}(Y_k - \mu)^2}. \tag{1} \]
A value of 1 suggests that the model is perfect,
whereas 0 indicates that there is no correlation
between the modelled and baseline values. In other
words, a value closer to 1 is desirable. Sometimes
R2 can take negative values, which indicates that the
variance of the error is more than the variance of the
baseline data. In such cases, the mean of the baseline data is a better predictor than the model and such
models should be treated accordingly before implementation. More details about this are provided in
Section 5.4. An R2 value of 0.9 may be interpreted
as: ‘Ninety percent of the variance in the baseline
values can be explained by the modelled values.’
• Root mean squared error (RMSE) represents the
sample standard deviation of the differences between
modelled and baseline values, which is a measure of
accuracy in the modelled values. Mathematically, it
is represented as follows:
\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{n}(Y_k - \hat{Y}_k)^2}{n}}. \tag{2} \]
This metric is sensitive to the scale of data and may
range between 0 and ∞, where a lower value is
desirable.
• Coefficient of variance of root mean squared error
(CVRMSE) is derived by normalizing the RMSE
with the mean of the data and has the advantage of
providing a unit-less percentage value representing
the accuracy of EMs. It is mathematically defined as
follows:
\[ \mathrm{CVRMSE}(\%) = \frac{\sqrt{\frac{\sum_{k=1}^{n}(Y_k - \hat{Y}_k)^2}{n}}}{\mu} \times 100 = \frac{\mathrm{RMSE}}{\mu} \times 100. \tag{3} \]
This metric was proposed to eliminate the dependency of RMSE on the scale of data. As per
ASHRAE (2014a), the CVRMSE value for calibrated EMs developed using hourly data must be less
than 30%. Lower values of this metric are desirable.
• MBE indicates how well the modelled values match
with the baseline values. Mathematically, it is represented as follows:
\[ \mathrm{MBE}(\%) = \frac{\sum_{k=1}^{n}(Y_k - \hat{Y}_k)}{\sum_{k=1}^{n} Y_k} \times 100. \tag{4} \]
Positive values indicate that the model under-predicts the baseline values; negative values indicate that the model over-predicts the baseline values. As
per ASHRAE (2014a), the value of MBE for calibrated EMs developed by using hourly data must be
within ±10%. An MBE of 0 suggests that there is
no bias in the model. A major disadvantage associated with this metric is that it depicts the percentage
of total difference between baseline and modelled
values with respect to the total baseline value over
the entire simulated period. As a result, this metric
suffers from the cancellation of positive and negative errors, which can be misleading in terms of
interpreting the true performance of EMs.
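The four metrics defined above can be computed directly from paired baseline and modelled series. The following Python sketch is an illustration only (it is not the authors' original MATLAB code, and the array contents are hypothetical); it implements Equations (1)–(4):

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination, Equation (1)."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean squared error, Equation (2)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def cvrmse(y, y_hat):
    """Coefficient of variance of RMSE, Equation (3), in percent."""
    return rmse(y, y_hat) / np.mean(y) * 100.0

def mbe(y, y_hat):
    """Mean bias error, Equation (4), in percent."""
    return np.sum(y - y_hat) / np.sum(y) * 100.0

# Hypothetical example: y is a baseline series, y_hat the modelled series.
y = np.array([120.0, 150.0, 180.0, 210.0])
y_hat = np.array([118.0, 155.0, 176.0, 213.0])
print(r2(y, y_hat), rmse(y, y_hat), cvrmse(y, y_hat), mbe(y, y_hat))
```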
3. Proposed alternative statistical metric – range normalized root mean squared error (RN_RMSE)
Normalized forms of statistical metrics are useful to compare models developed on different datasets having dissimilar properties. As the name suggests, RN_RMSE is a
normalized form of RMSE in which the range of the dataset
is used for normalization. RN_RMSE is mathematically
defined as follows:
\[ \mathrm{RN\_RMSE}(\%) = \frac{\sqrt{\frac{\sum_{k=1}^{n}(Y_k - \hat{Y}_k)^2}{n}}}{\max(Y) - \min(Y)} \times 100 = \frac{\mathrm{RMSE}}{\mathrm{range}(Y)} \times 100. \tag{5} \]
This metric evaluates the accuracy of models and a lower
value is desirable. It is hypothesized that RN_RMSE can
provide more reliable estimates of the predictive performance of EMs as compared to CVRMSE. The reasons
behind this hypothesis are discussed below in Section 3.1.
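As a minimal sketch (again not part of the paper's original code), Equation (5) only changes the denominator of the RMSE-based metric:

```python
import numpy as np

def rn_rmse(y, y_hat):
    """Range normalized RMSE, Equation (5), in percent."""
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    return rmse / (np.max(y) - np.min(y)) * 100.0
```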
3.1. Advantage of RN_RMSE over CVRMSE from a theoretical perspective
In the analysis of data, it is often desirable to convert the
original data to a norm or common standard (Dodge 2003).
Converting the original data to z-scores can set a standard scale for different datasets (Salkind 2007). z-Score
(Z) transformation is a technique in which the mean (μ)
and standard deviation (σ ) of the original data (Y) are
transformed to zero and one, respectively, using
\[ Z = \frac{Y - \mu}{\sigma}. \tag{6} \]
Transforming the baseline and predicted energy data with the z-score technique was explored in this work. However, the drawback of such a process is that the importance of the mean (μ) and standard deviation (σ) is lost. Also, a metric based on the z-scores of the baseline and predicted energy data will not be able to determine the bias in the model, because the transformation always brings the basis back to zero. For example, consider a case where all the predicted
values (Ŷ) are biased from the baseline values (Y) by a constant large number (H), resulting in:
\[ \hat{Z} = \frac{\hat{Y} - \hat{\mu}}{\hat{\sigma}} = \frac{(Y + H) - (\mu + H)}{\sigma} = \frac{Y - \mu}{\sigma} = Z. \tag{7} \]
Even though there is a large bias (H) in the predictions, the resulting metric value (based on the respective z-scores) will turn out to be zero, which is undesirable. Readers may note that σ̂ = σ in Equation (7) because a constant additive difference between datasets affects only the measures of central tendency (e.g. mean, median, mode) but not the measures of variability (e.g. standard deviation, range).
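A short numeric check of Equation (7), sketched below with made-up numbers (not data from the paper), confirms that a constant bias disappears after z-scoring:

```python
import numpy as np

y = np.array([100.0, 120.0, 140.0, 160.0])   # baseline values Y
y_biased = y + 50.0                          # predictions shifted by a constant H = 50

def zscore(x):
    return (x - np.mean(x)) / np.std(x)

# The bias H cancels out, exactly as in Equation (7).
print(np.allclose(zscore(y), zscore(y_biased)))   # True
```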
Although z-score transformation was not utilized to
develop the alternative metric, its basic properties were
exploited to evaluate the adequacy of RN_RMSE over
CVRMSE. Z-score transformation works on the principle of shifting (by the mean) and scaling (by the standard
deviation) of data to allow comparison between datasets.
Accordingly, if a statistical metric can account for shifting and scaling differences between datasets, then it can be
considered to be a well-normalized metric that is suitable
for comparison. Scaled (referred to as multiplicative) differences are witnessed when datasets have different units of measurement, such as kilowatt-hours (kWh) or megajoules (MJ). Normalization techniques utilized in statistical performance metrics should be able to bring such differences to a notionally common scale. This is analogous to the scaling of data. Shifted (referred to as additive) differences are witnessed when datasets occupy different regions of the Euclidean space. Normalization techniques utilized in statistical performance metrics should also be able to account for such differences by normalizing w.r.t. (with respect to) the respective regions of the datasets and not w.r.t. the global origin '(0, 0)'. Otherwise, residuals (Yk − Ŷk) resulting from datasets that are far from the origin '(0, 0)' will give the illusion of being smaller in comparison to the residuals from datasets that are nearer to the origin. This is analogous to the shifting of data.
In the case of CVRMSE, the numerator (RMSE) is adjusted for models based on different data scales to a notionally common scale. The basic idea is that when two datasets differ in a multiplicative way by a constant 'm', i.e. when every data point is multiplied by 'm', then RMSE is not useful for comparison. In contrast, CVRMSE can be compared, as the resulting values RMSE/μ and (m × RMSE)/(m × μ) are equal, where μ is the mean of the baseline dataset. Therefore, for a multiplicative difference between datasets, CVRMSE provides both interesting and useful information about comparative model performance.

The problem with CVRMSE arises when datasets differ in an additive way by a constant 'a', i.e. when 'a' is added to every data point in the dataset. In that case, the respective CVRMSE values RMSE/μ and RMSE/(a + μ) are no longer useful for comparison. In such cases, RN_RMSE can provide meaningful and interesting information, as the resulting values RMSE/(maximum(Y) − minimum(Y)) and RMSE/((maximum(Y) + a) − (minimum(Y) + a)) are still equal. Notice that in the case of additive differences between datasets, 'a' does not appear in the numerator, because a uniform shift of all data points by a distance 'a' relative to the origin does not change the resulting RMSE value. In addition, RN_RMSE can also account for a multiplicative difference between datasets, as RMSE/(maximum(Y) − minimum(Y)) and (m × RMSE)/(m × (maximum(Y) − minimum(Y))) are equal. The above-mentioned concepts are illustrated using an example as shown in Figure 1.
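The same behaviour can be checked numerically. The hedged sketch below uses illustrative numbers only (the helper functions are redefined so the block is self-contained); an additive shift changes CVRMSE but not RN_RMSE, while a multiplicative rescaling leaves both unchanged:

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def cvrmse(y, y_hat):
    return rmse(y, y_hat) / np.mean(y) * 100.0

def rn_rmse(y, y_hat):
    return rmse(y, y_hat) / (np.max(y) - np.min(y)) * 100.0

y = np.array([110.0, 150.0, 190.0, 230.0])
y_hat = y + np.array([5.0, -8.0, 6.0, -3.0])   # identical residuals in every case below

cases = {
    "original":              (y,        y_hat),
    "additive (+100)":       (y + 100,  y_hat + 100),
    "multiplicative (x3.6)": (y * 3.6,  y_hat * 3.6),
}
for label, (yb, yb_hat) in cases.items():
    print(label, round(cvrmse(yb, yb_hat), 2), round(rn_rmse(yb, yb_hat), 2))
# CVRMSE changes under the additive shift; RN_RMSE is identical in all three cases.
```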
The statistical drawbacks of RMSE, CVRMSE and
MBE discussed above are conceptually illustrated in
Figure 1 with linear models developed using the least squares method in MATLAB.³ In these plots, hypothetical peak cooling energy consumption datasets are used to model their relationship w.r.t. the outside temperature. The drawbacks are interpreted via two cases, as follows:
Case 1 – Consider the comparison between Figure 1(a)
and 1(b) where both datasets have same variation (σ ), but
different mean (μ). The dataset in Figure 1(a) occupies
the rectangular region between ‘(18, 107)’ and ‘(28, 230)’,
whereas the dataset in Figure 1(b) occupies the rectangular region between ‘(18, 207)’ and ‘(28, 330)’. This is a
case where an additive difference exists between datasets
as mentioned above, where Y2 = Y1 + 100. Both models result in exactly the same residual terms as shown
in the plots. The corresponding CVRMSE and MBE values are different, whereas the RMSE, RN_RMSE, and
R2 values are the same for these two models. Therefore, the point is: when both models are developed on very similar datasets, involve the same level of difficulty in model development, are generated using the same algorithm, and result in exactly the same residual terms, why are the respective CVRMSE and MBE values different?
Case 2 – Consider the comparison between Figure 1(a)
and 1(c) where both datasets have different variation
(σ ) and mean (μ). This is a case where multiplicative
difference exists between datasets as mentioned above,
where Y3 = Y1 × 3.6. Notice that the normalized metrics
(CVRMSE, MBE, R2 and RN_RMSE) do not change their
respective values in the case of multiplicative difference,
which is desirable.
The cases discussed above imply that RN_RMSE and
R2 provide more accurate normalized estimates of the
predictive performance of models because they can normalize additive differences between datasets. In general,
energy-consumption-related datasets may differ additively, multiplicatively, or by a combination of both. For example, Figure 2 shows that it is not uncommon for additive differences to exist between energy consumption datasets. Notice the shift of region in the Euclidean space corresponding to each dataset, which is similar to the situation mentioned in Case 1.
Figure 1. Conceptual plots to illustrate the advantage of the proposed metrics from a statistical perspective: (a) linear model on dataset A, (b) linear model on dataset B and (c) linear model on dataset C. Note: σ is the standard deviation; μ is the mean of data.
[Figure 1 panel values: each panel plots the data points, a fitted linear model with 90% prediction bounds, and the residuals against outside temperature X (°C). (a) Y1 (kWh): model Y1 = 12.98X − 122.4, σ = 42.28 kWh, μ = 187.65 kWh, range = 120.53 kWh; MBE = −0.025%, CVRMSE = 10.03%, RMSE = 18.82 kWh, RN_RMSE = 15.61%, R2 = 0.78. (b) Y2 (kWh): model Y2 = 12.98X − 22.4, σ = 42.28 kWh, μ = 287.65 kWh, range = 120.53 kWh; MBE = −0.016%, CVRMSE = 6.54%, RMSE = 18.82 kWh, RN_RMSE = 15.61%, R2 = 0.78. (c) Y3 (MJ): model Y3 = 46.728X − 440.64, σ = 152.21 MJ, μ = 675.54 MJ, range = 433.91 MJ; MBE = −0.025%, CVRMSE = 10.03%, RMSE = 67.75 MJ, RN_RMSE = 15.61%, R2 = 0.78.]
Figure 2. Plot to illustrate that additive difference exists between energy consumption datasets. Note: The data shown above are synthetically produced in EnergyPlus with a prototype input data file as described in Section 1.
[Figure 2 plots hourly energy consumption (kWh) against time (hours) for the total electricity consumed by the facility and the electricity consumed for cooling; the shift of region in Euclidean space signifies the presence of additive difference between the two datasets.]
In reference to Figure 2, it is obvious that
residuals resulting from any model built on the total electricity consumption dataset will appear to be comparatively
smaller when normalized w.r.t. the reference point (0, 0)
in the Euclidean space. Therefore, statistical metric values may appear comparatively smaller or bigger when the
mean or sum of the data is used for normalization. However, if the actual originating points (minimum values) of
respective datasets are included in the denominator term of
normalized metrics, then more accurate estimates of model
performance can be obtained. RN_RMSE (Equation (5))
directly includes the minimum (originating point) in the
denominator term, whereas R2 (Equation (1)) refers to the
minimum indirectly in the denominator term.
Variability in energy consumption datasets provides a
way to describe how much they differ and allows comparison between them. Difference in the variability between
datasets influences the difficulty in modelling and the
resulting residuals. Since model residuals are influenced
by the variability in data, it is also important to include
a measure of data variability in the denominator term of
normalized statistical performance metrics. Otherwise, the
numerator and denominator may be out of proportion, which can create the illusion of an inflated or deflated metric
value. Thus, measures of variability such as the range
(in Equation (5)) and standard deviation (in Equation (1))
are preferred for normalizing the residuals as compared
to measures of central tendencies such as the mean (in
Equation (3)) or the sum (in Equation (4)) of the baseline
datasets. The theoretical points discussed in this section
suggest that RN_RMSE and R2 can not only provide more
accurate estimates of model performance but also allow
fair comparison between models developed on different
datasets. Therefore, this paper suggests that RN_RMSE
should be used as the primary metric for validation of
EMs instead of CVRMSE. R2 should be used as a secondary metric for validation of EMs, the reason for which
is explained in Section 5.4.
4. Methodology
In Section 3, the advantages of using RN_RMSE and R2 for testing the predictive performance of EMs are explained at a conceptual level. To further affirm
the advantages, a comparative testing methodology is
employed using synthetic energy consumption data as
described previously in Section 1. The synthetic data are
obtained from EnergyPlus simulations with prototype IDFs
representing small- and medium-size office buildings,
which are elaborated in Section 4.1. The technique adopted
in this research to prepare and compare multiple sets of
data points for each building under study is explained
in Section 4.2. The visualization technique that is used
to identify inadequacies of existing statistical metrics is
explained in Section 4.3.
4.1. EnergyPlus prototype IDFs
EnergyPlus is a simulation engine for energy modelling
in buildings, which is supported by the United States
Department of Energy and promoted through the Building Technologies Program of the Office of Energy Efficiency and Renewable Energy. EnergyPlus is well known as
an efficient tool in the building energy analysis community that combines the best capabilities and features from
BLAST and DOE-2 along with various new capabilities
(Crawley et al. 2001; EnergyPlus 2017). Applications of EnergyPlus include load calculation, energy simulation, building performance simulation, energy performance, and heat and mass balance, and it can be used to model cooling, heating, ventilating, lighting, and water consumption in buildings (Fumo, Mago, and Luck 2010). EnergyPlus is also open source, free to use, and can be downloaded online.⁴

Figure 3. Visualizing bad vs. good models using a scatter plot between modelled and baseline values: (a) bad model and (b) good model.
The U.S. Department of Energy's Building Technologies Program, in collaboration with the Pacific Northwest National Laboratory, Lawrence Berkeley National Laboratory, and National Renewable Energy Laboratory, has developed and made available prototype IDFs for different types of buildings in various locations representing all U.S. climate zones. Synthetic energy datasets are generated in EnergyPlus with prototype IDFs for small- and medium-size office buildings located in Fairbanks (subarctic climate), Phoenix (hot and dry climate), and San Francisco (warm and marine climate). The small office building is rectangular in shape with a total area of 510.97 m2, covering one floor with four perimeter zones, one core zone, and an attic zone. The window-to-wall ratio is 24.4% for the south orientation and 19.8% for the other three orientations. An air-source heat pump is used for conditioning the building, and a gas furnace is used as a backup for additional heating requirements. Conditioned air is supplied to the building through constant air volume units. The medium office building is also rectangular in shape with a total area of 4979.6 m2, covering three floors, where each floor has four perimeter zones and one core zone. The window-to-wall ratio is 33%. A packaged air-conditioning unit (PACU) is used for cooling requirements, and a gas furnace inside the PACU is used for heating requirements. Conditioned air is supplied to the building through variable air volume units with dampers and electric reheating coils.
4.2. Adequacy of various statistical performance metrics
As mentioned in Section 1, this paper studies the adequacy of various statistical metrics for validating the performance of system-level EMs, because system-level energy modelling is often necessary for in-depth analysis of energy uses and for identifying and eliminating energy wastage. All properties of both the small- and medium-size prototype building IDFs are kept unchanged except for the time step object, which is set to 60 min (1 h), 30 min, 15 min, and 1 min, so as to generate four different models for each energy consumption type (cooling and heating) for each of the three building locations mentioned above. Therefore, in total 2 × 4 × 2 × 3 = 48 EMs are developed during this research. The time step object specifies the time interval between successive zone heat and mass balance calculations in the EnergyPlus simulation software. Usually, the smaller the time step, the longer the software takes to run but the more accurate the results.⁵ Thus, in this paper, the energy consumption data generated using prototype IDFs with a one-minute time step are treated as the more accurate baseline data and are compared against the energy consumption data obtained from the 15, 30 and 60 min time steps. Such comparisons aid in understanding the adequacy of various statistical performance metrics. Henceforth, in this paper, the developed EMs corresponding to the different time steps are referred to by the following names:

• 1 min time step – Baseline
• 60 min (1 h) time step – Model 1
• 30 min time step – Model 2
• 15 min time step – Model 3

In Section 5, the hourly cooling and heating energy consumption data from models 1, 2, and 3 are compared to the baseline data using the various statistical performance metrics mentioned in Sections 2 and 3. Therefore, in this paper, nine different comparison reports are presented (3 models × 3 building locations) for each energy consumption type and each building type. The statistical metrics are subsequently evaluated by comparing their relative values and graphically visualizing the difference in the data using scatter plots.

Table 2. Performance testing results for cooling electricity consumption models for small office buildings.

Metric         Comparison            Fairbanks   Phoenix    San Francisco
MBE^a (%)      Baseline – Model 1    6.283       2.934      9.212
               Baseline – Model 2    3.804       1.864      5.504
               Baseline – Model 3    3.841       1.898      5.301
CVRMSE^a (%)   Baseline – Model 1    19.958      13.563     25.222
               Baseline – Model 2    13.643      7.036      15.102
               Baseline – Model 3    11.519      4.564      12.242
RMSE (kWh)     Baseline – Model 1    0.036       0.142      0.061
               Baseline – Model 2    0.025       0.074      0.036
               Baseline – Model 3    0.021       0.048      0.029
RN_RMSE (%)    Baseline – Model 1    1.020       1.679      1.593
               Baseline – Model 2    0.697       0.871      0.954
               Baseline – Model 3    0.589       0.565      0.773
R2 (0–1)       Baseline – Model 1    0.996       0.995      0.989
               Baseline – Model 2    0.998       0.999      0.996
               Baseline – Model 3    0.999       0.999      0.997

Note: RMSE, CVRMSE, MBE and RN_RMSE are estimates of model accuracy and lower values are desirable. R2 is also an estimate of model accuracy; 1 indicates a perfect fit, whereas 0 or negative values indicate a poor fit. (8760 data points were present in each of the test sets used to evaluate model performance.)
^a Prescribed in ASHRAE Guideline 14.

Figure 4. Cooling electricity consumption for small office buildings located in Fairbanks, Phoenix and San Francisco: (a) Fairbanks cooling, (b) Phoenix cooling and (c) San Francisco cooling. Note: 1 minute time step – Baseline. 60 min (1 hour) time step – Model 1. 30 min time step – Model 2. 15 min time step – Model 3. The axes are different for different cities.
4.3. Visualizing model performance using scatter plots
Testing model predictions is a critical step in energy model
validation. Scatter plots of modelled vs. baseline values are a common and reliable way to evaluate model predictions (Piñeiro et al. 2008). Plotting the data and showing the dispersion of the values is one way to find out how much the modelled values vary from the baseline values. In an ideal world, modelled values would be exactly equal to the baseline values; that is, the relationship between modelled data (x) and baseline data (y) could be represented as y = x. However, in the real world, modelled data often deviate from baseline data; therefore, the primary objective is always to keep the dispersion of data points around the y = x line to
a minimum. This is the fundamental statistical principle
that will be used in this paper to comparatively evaluate
the adequacy of various statistical metrics. For example,
the difference between a bad and a good model is illustrated in Figure 3(a) and 3(b), respectively. Although both
models consist of the same number of data points, the dispersion of these points from the y = x line in Figure 3(a) is
much larger than that in Figure 3(b), which suggests that
the modelled values in Figure 3(a) are inaccurate relative
to those in Figure 3(b). This visualization technique aids
in an adequate estimation of relative model performance
since it allows detailed observation of the deviation for
each individual data point. In contrast, statistical metrics
are summarized representations of the deviations between modelled and baseline data points, due to which some information about model performance is lost. Therefore, comparing the resulting metric values with detailed
scatter plots between modelled and baseline data points
can provide meaningful insight regarding the adequacy of
various statistical metrics. Although scatter plots are one of
the most powerful and widely used techniques for visual
data exploration (Keim et al. 2010), other methods may
also be appropriate depending upon the statistical context.
It is important to mention that statistical metrics are most
useful for computational purposes such as automatic selection of the best model from a variety of different models
and also to set threshold limits in codes and standards.
Therefore, it is inadvisable to overlook statistical metrics
and simply rely on visualization techniques to evaluate the
performance of EMs.
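As a hedged illustration of the visual check described above (this is not the authors' plotting code, matplotlib is assumed to be available, and the data arrays are placeholders), a modelled-vs-baseline scatter plot with the y = x reference line can be produced as follows:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: 'baseline' stands in for the 1 min run, 'modelled' for a coarser time step.
rng = np.random.default_rng(0)
baseline = rng.uniform(100.0, 400.0, size=200)
modelled = baseline + rng.normal(0.0, 15.0, size=200)

fig, ax = plt.subplots()
ax.scatter(modelled, baseline, s=10, alpha=0.6, label="data points")
lims = [min(baseline.min(), modelled.min()), max(baseline.max(), modelled.max())]
ax.plot(lims, lims, "k--", label="y = x")   # perfect-agreement reference line
ax.set_xlabel("Modelled energy consumption (kWh)")
ax.set_ylabel("Baseline energy consumption (kWh)")
ax.legend()
plt.show()
```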
5. Results and discussion
As discussed in Section 4, a comparative model assessment technique is adopted to determine the adequacy of
statistical performance metrics used by researchers to test
the predictive performance of EMs. The statistical metric values corresponding to each individual model are
calculated using Equations (1)–(5).
5.1. Case 1a: Small office building simulation results and discussion for cooling energy
The statistical metric values based on simulation results are
tabulated in Table 2 and the scatter plots of modelled vs.
baseline values are shown in Figure 4.
The following points can be clearly inferred by
analysing and comparing the metric values in Table 2
with corresponding scatter plots in Figure 4 as explained
in Section 4.3:
(1) For all three regions, model 3 is more accurate
than model 2, which in turn is more accurate than
model 1. This is also intuitive since model 3 has
a time step that is closer to the time step of the
baseline model as compared to the other two. However, this intrinsic property of the models cannot
be inferred from the MBE values: as per the MBE values for Fairbanks and Phoenix, model 2 appears to be more accurate than model 3. Except for MBE, all other metrics clearly
depict this property.
(2) The scatter plot for Phoenix model 1 clearly shows
more deviation of data points from y = x line as
compared to Fairbanks model 1 and San Francisco
model 1. Nonetheless, as per the CVRMSE and
MBE values, Phoenix model 1 appears to be the
most accurate among all three, which is clearly an
incorrect interpretation of model performance.
Table 3. Performance testing results for heating electricity consumption models for small office buildings.

Metric         Comparison            Fairbanks   Phoenix    San Francisco
MBE^a (%)      Baseline – Model 1    0.196       −0.157     −1.260
               Baseline – Model 2    −0.078      −3.213     −2.715
               Baseline – Model 3    −0.816      −4.726     −3.982
CVRMSE^a (%)   Baseline – Model 1    65.654      107.094    63.717
               Baseline – Model 2    36.575      68.559     41.303
               Baseline – Model 3    18.453      54.857     30.320
RMSE (kWh)     Baseline – Model 1    0.249       0.022      0.028
               Baseline – Model 2    0.139       0.014      0.018
               Baseline – Model 3    0.070       0.011      0.013
RN_RMSE (%)    Baseline – Model 1    3.558       0.686      0.806
               Baseline – Model 2    1.982       0.439      0.522
               Baseline – Model 3    1.000       0.352      0.383
R2 (0–1)       Baseline – Model 1    0.930       0.985      0.988
               Baseline – Model 2    0.978       0.994      0.995
               Baseline – Model 3    0.994       0.996      0.997

Note: RMSE, CVRMSE, MBE and RN_RMSE are estimates of model accuracy and a lower value is desirable. R2 is also an estimate of model accuracy; 1 indicates a perfect fit, whereas 0 or negative values indicate a poor fit. (8760 data points were present in each of the test sets used to evaluate model performance.)
^a Prescribed in ASHRAE Guideline 14.
(3) Since RMSE is not a normalized statistical metric, it can only be compared between models whose errors are measured on the same scale and in the same units. In other words, it cannot account for multiplicative differences between datasets, as mentioned previously in Section 3.
(4) RN_RMSE does not suffer from the limitations mentioned in points 1–3 and can provide an accurate interpretation of the corresponding model performance.
(5) Although it is hard to clearly distinguish between these models' performance using R2, as its corresponding values are very close to each other, this metric does not provide an incorrect estimate of relative model performance.
(6) It can be critically argued that having a lower CVRMSE for Phoenix is somehow justifiable because the average cooling energy consumption is relatively higher in Phoenix compared to the other two regions. However, as explained in Section 3, just because the average energy consumption of a dataset is relatively higher does not necessarily indicate that the modelling task is more difficult, the dataset is more complicated, or the resulting models are relatively different in terms of performance. Therefore, it is clear that CVRMSE will provide unfair estimates of the performance of models developed on a dataset that has a lower relative average energy consumption value. This phenomenon is even more apparent when EMs are developed on heating gas/electricity consumption data, because the heating requirement often varies widely over the year, which creates low average values for datasets with high variability. This results in very high CVRMSE values that can mislead analysts into discarding good EMs (see Sections 5.2 and 5.4 for more details).

Figure 5. Heating electricity consumption for small office buildings located in Fairbanks, Phoenix and San Francisco. Note: 1 minute time step – Baseline. 60 min (1 hour) time step – Model 1. 30 min time step – Model 2. 15 min time step – Model 3. The axes are different for different cities. (a) Fairbanks heating, (b) Phoenix heating and (c) San Francisco heating.
5.2. Case 1b: Small office building simulation results and discussion for heating energy
The statistical metric values based on simulation results are
tabulated in Table 3 and the scatter plots of modelled vs.
baseline values are shown in Figure 5.
Analysis details for this case are provided as follows:
(1) The argument against MBE provided in Section 5.1
also holds true in this case. As shown in Table 3
and Figure 5, no patterns emerge from the
MBE values which would suggest that Model 3 is the closest and Model 1 is the farthest from the baseline for each location, thereby providing misleading information about the corresponding model performance.
Table 4. Performance testing results for cooling electricity consumption models for medium office buildings.

Metric         Comparison            Fairbanks   Phoenix    San Francisco
MBE^a (%)      Baseline – Model 1    8.472       10.485     12.513
               Baseline – Model 2    8.562       10.291     10.389
               Baseline – Model 3    7.271       7.514      9.136
CVRMSE^a (%)   Baseline – Model 1    54.685      35.236     42.000
               Baseline – Model 2    40.567      24.870     28.342
               Baseline – Model 3    29.326      17.169     22.101
RMSE (kWh)     Baseline – Model 1    1.233       6.982      1.428
               Baseline – Model 2    0.915       4.928      0.963
               Baseline – Model 3    0.661       3.402      0.751
RN_RMSE (%)    Baseline – Model 1    2.569       6.304      3.171
               Baseline – Model 2    1.906       4.449      2.140
               Baseline – Model 3    1.378       3.072      1.669
R2 (0–1)       Baseline – Model 1    0.966       0.917      0.951
               Baseline – Model 2    0.981       0.959      0.978
               Baseline – Model 3    0.990       0.980      0.986

Note: RMSE, CVRMSE, MBE and RN_RMSE are estimates of model accuracy and a lower value is desirable. R2 is also an estimate of model accuracy; 1 indicates a perfect fit, whereas 0 or negative values indicate a poor fit. (8760 data points were present in each of the test sets used to evaluate model performance.)
^a Prescribed in ASHRAE Guideline 14.
(2) As mentioned in Section 5.1, CVRMSE can often provide a biased estimate of model performance. This point is further consolidated by the results provided in Table 3 and Figure 5. It is important to notice how inflated the CVRMSE values are for Phoenix and San Francisco, although there is no evidence of such poor model performance in the scatter plots provided in Figure 5. Since Phoenix and San Francisco do not require heating throughout the year, the average yearly consumption value, which is the denominator term in Equation (3), is very low, which caused the CVRMSE values to be so high.
(3) RMSE clearly suggests that Model 3 is the closest and Model 1 is the farthest from the baseline for each location, which is also intuitive since the time step for Model 3 is closest and the time step for Model 1 is farthest from that of the baseline. Since RMSE is an absolute metric, it does not make sense to compare its corresponding values for different locations.
(4) It can be seen from the RN_RMSE values in Table 3 and Figure 5 that this metric does not suffer from the limitations of MBE and CVRMSE mentioned in points 1 and 2, thereby providing analysts with more clarity about the models' performance.
(5) Although R2 does not suffer from the limitations mentioned above, it is not particularly useful in this case because the corresponding values are very close to each other. Therefore, it can be difficult for an analyst to distinguish clearly between the models.
5.3. Case 2a: Medium office building simulation results and discussion for cooling energy
It is theoretically explained in Section 3 that the proposed metric RN_RMSE can overcome the limitations of
CVRMSE and MBE. The benefits of the proposed metrics
are further consolidated using simulation results from a
medium-size office building. The statistical metric values
based on simulation results for cooling energy consumption are tabulated in Table 4 and the scatter plots of
modelled vs. baseline values are shown in Figure 6.
Results in Table 4 and Figure 6 support the same
inferences as those in Sections 5.1 and 5.2. For example:
(1) MBE for Fairbanks Model 2 is higher than Fairbanks Model 1, thereby suggesting that Model 1 is
closer to the baseline than Model 2. This is clearly
not the case based on the visualization provided
in Figure 6 as well as the known time steps of the
corresponding models.
(2) CVRMSE for Phoenix models are comparatively
lower than those for the other two locations, but the scatter plots in Figure 6 impart a different picture. The scatter plots clearly reveal that the dispersion of data points from the y = x line is maximum for the Phoenix models compared to the other locations (refer to Section 4.3 for the explanation). Therefore, it is shown once again that CVRMSE provides an incorrect estimate of the models' performance.
(3) As expected, RN_RMSE and R2 correctly depict the relative performance of EMs in this case as well.

Figure 6. Cooling electricity consumption for medium office buildings located in Fairbanks, Phoenix and San Francisco. Note: 1 minute time step – Baseline. 60 min (1 hour) time step – Model 1. 30 min time step – Model 2. 15 min time step – Model 3. The axes are different for different cities. (a) Fairbanks cooling, (b) Phoenix cooling and (c) San Francisco cooling.

5.4. Case 2b: Medium office building simulation results and discussion for heating energy
The statistical metric values based on simulation results for heating energy consumption in the medium-size office building are tabulated in Table 5 and the scatter plots of modelled vs. baseline values are shown in Figure 7. It was briefly mentioned in Section 3 that the proposed metric RN_RMSE should be used in tandem with R2 to correctly evaluate the performance of EMs. Analysing this case provides some very interesting insights about the capability of RN_RMSE when used together with R2.
Table 5. Performance testing results for heating gas consumption models for medium office buildings.

Metric         Comparison            Fairbanks   Phoenix     San Francisco
MBE^a (%)      Baseline – Model 1    −8.589      −224.1      −142.433
               Baseline – Model 2    −5.749      −202.717    −109.047
               Baseline – Model 3    −4.643      −187.53     −94.67
CVRMSE^a (%)   Baseline – Model 1    16.704      1822.256    724.472
               Baseline – Model 2    10.756      1669.183    578.161
               Baseline – Model 3    8.412       1558.341    511.576
RMSE (kWh)     Baseline – Model 1    5.713       0.881       1.039
               Baseline – Model 2    3.679       0.807       0.829
               Baseline – Model 3    2.877       0.753       0.734
RN_RMSE (%)    Baseline – Model 1    2.107       6.521       4.34
               Baseline – Model 2    1.357       5.974       3.463
               Baseline – Model 3    1.061       5.577       3.064
R2 (0–1)       Baseline – Model 1    0.992       −1.197      0.114
               Baseline – Model 2    0.996       −0.844      0.436
               Baseline – Model 3    0.998       −0.607      0.558

Note: RMSE, CVRMSE, MBE and RN_RMSE are estimates of model accuracy and a lower value is desirable. R2 is also an estimate of model accuracy; 1 indicates a perfect fit, whereas 0 or negative values indicate a poor fit. (8760 data points were present in each of the test sets used to evaluate model performance.)
^a Prescribed in ASHRAE Guideline 14.
The following points can be inferred from the results given in Table 5 and Figure 7:
(1) By observing the MBE and CVRMSE values, it seems at first that all models except for the Fairbanks models are unacceptable and must therefore be discarded. These values are just absurdly high, suggesting that the models are poor, but they do not provide any information to analysts about the reasons behind the poor performance.
(2) Interestingly, the RN_RMSE values are low, yet
the R2 values are negative for the Phoenix models and low for the San Francisco models. Statistically, it
is well known that high bias, high variance or nonlinear relationship between modelled and baseline
values can result in poor R2 values. Since the
RN_RMSE (normalized RMSE) values are low, it
is obvious that the models do not suffer from either
high bias or high variance. Therefore, the only possibility is that there exist nonlinear relationships
between the modelled and baseline values. This is
indeed true, which can be validated by observing
the scatter plots in Figure 7. Readers may refer to
any elemental statistical learning textbook, such as
James et al. (2013) to understand the basics of bias,
variance, and model error. Therefore, these two
metrics (RN_RMSE and R2), when used together, can reveal important and interesting insights about
the performance of EMs to analysts.
(3) MBE and CVRMSE serve as pointless measures
in this case, as they do not provide any information to the analyst about the problems in the models. In contrast, using RN_RMSE and R2 together can prove to be beneficial for the analyst. For example, consider Phoenix Model 3 in this case. Since a low RN_RMSE value combined with a negative R2 value possibly suggests a nonlinear relationship between model and baseline, an analyst may decide to apply a post-processing technique to the modelled values to improve model performance. For instance, by passing the modelled values through a second-order equation ('0.0246 × X² + 0.05323 × X − 0.0006672'), an analyst can significantly improve the performance of this model. The MBE, CVRMSE, RMSE, RN_RMSE, and R2 of the improved model are found to be −0.01%, 122.837%, 0.059 kWh, 0.44% and 0.99, respectively. The second-order equation used above for post-processing the modelled values is obtained using the Curve Fitting Toolbox in MATLAB.⁶ There exist several post-processing techniques to improve the performance of resulting models; discussing them is beyond the scope of this research paper.
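The post-processing step described in point (3) can be sketched generically: fit a low-order polynomial that maps the modelled values towards the baseline values and apply it to the model output. The snippet below is an illustration using NumPy's polynomial fit, not the MATLAB Curve Fitting Toolbox workflow used by the authors, and the data arrays are placeholders:

```python
import numpy as np

# Placeholder arrays: modelled and baseline heating consumption for one location.
modelled = np.array([0.10, 0.35, 0.60, 0.85, 1.10])
baseline = np.array([0.05, 0.28, 0.66, 1.05, 1.55])

# Fit a second-order polynomial correction from modelled to baseline values.
coeffs = np.polyfit(modelled, baseline, deg=2)
corrected = np.polyval(coeffs, modelled)

def rn_rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2)) / (np.max(y) - np.min(y)) * 100.0

print("RN_RMSE before post-processing:", round(rn_rmse(baseline, modelled), 2), "%")
print("RN_RMSE after post-processing: ", round(rn_rmse(baseline, corrected), 2), "%")
```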
6. Conclusion
Testing the predictive performance (also called validation)
of EMs is an important step, which requires as much attention as simulation programs, algorithms, and model development. The widely used statistical metrics for validating
calibrated EMs were proposed a few decades ago from
the perspective of whole-building measurement and verification of energy conservation measures. A recent shift in
trend towards system-level energy modelling demands better and more reliable statistical metrics to test the predictive
performance of resulting models. This paper demonstrates
that metrics such as RMSE, CVRMSE, and MBE are not
as reliable as other measures, especially for validation of
system-level EMs. CVRMSE and MBE cannot normalize
additive differences between datasets. Also, MBE suffers from the cancellation of positive and negative errors, and hence it cannot be used to evaluate the total variance in the error of resulting models. RMSE is not a unitless/dimensionless metric, and thus it cannot be used to
compare the predictive performance of EMs based on
different datasets.
Ideally, a statistical performance metric should be scale
and unit invariant; should not be prone to overestimation
or underestimation of the performance of EMs; and should be universally applicable to all types of EMs. Unfortunately, as illustrated in this paper, none of the widely
used statistical metrics satisfy all these criteria.

Figure 7. Heating energy consumption for medium office buildings located in Fairbanks, Phoenix and San Francisco. Note: 1 minute time step – Baseline. 60 min (1 hour) time step – Model 1. 30 min time step – Model 2. 15 min time step – Model 3. The axes are different for different cities. (a) Fairbanks heating, (b) Phoenix heating and (c) San Francisco heating.

Therefore, an alternative metric named range normalized root mean
squared error (RN_RMSE) is proposed in this paper. It is
shown that RN_RMSE can successfully normalize multiplicative as well as additive differences between datasets.
Also, RN_RMSE do not suffer from overestimation and
underestimation problems as the denominator, which is a
measure of the variability of the data, is proportional to the
numerator, which is a measure of variability of the residuals. Finally, it is suggested that R2 is used along with
RN_RMSE to identify any existing nonlinear relationship
between the modelled results and baseline data.
7. Future work
It should be pointed out that data from real buildings often
suffer from measurement-related inaccuracies that may
harm the accuracy of the models. Therefore, data preprocessing is an essential step that must be carried out before
energy data analysis and modelling. In this research, the
necessity for data preprocessing was eliminated by using
synthetic data as a surrogate for sensor data from real
buildings. Future research may include evaluating the adequacy of the proposed metric using measured data from
real buildings.
The notion of having an absolute cut-off criterion for
statistical metrics is baseless because case-specific energy
modelling tasks are unique. In some cases, the energy consumption may be important whereas, in others, peak load
or time of occurrence may be more critical. Also, cases
may vary for different building or system types. Therefore,
it is necessary to group cut-off criteria based on similarity
of cases. Future research may also include (1) synthesizing larger and case-specific datasets; (2) using those datasets to explore the range of values that the proposed statistical metrics can acquire when different algorithms are applied, which will allow us to evaluate the range of metric values that correspond to good EMs developed using superior algorithms; and (3) developing a way to determine cut-off criteria that must be met subject to specific applications, such as M&V, predictive analysis and fault detection.
In this paper, we have only considered system-level
energy consumption types. According to our preliminary
analysis, whole building energy consumption profiles also
vary additively from one another. Therefore, the proposed alternative metrics are also likely to be more useful
for testing the predictive performance of whole-building
energy models. Future research may include further analysis regarding this.
Notes
1. https://energy.gov/eere/buildings/about-building-energy-modeling
2. https://www.energycodes.gov/development/commercial/prototype_models
3. https://www.mathworks.com/help/curvefit/least-squares-fitting.html
4. https://energyplus.net
5. http://bigladdersoftware.com/epx/docs/8-0/input-output-reference/page-006.html
6. https://www.mathworks.com/help/curvefit/curve-fitting.html
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Debaditya Chakraborty http://orcid.org/0000-0002-2165-9440
References
Al-Homoud, M. S. 2001. “Computer-Aided Building Energy
Analysis Techniques.” Building and Environment 36 (4):
421–433.
ASHRAE. 2013. Fundamentals Handbook. IP Edition. Atlanta,
GA: ASHRAE.
ASHRAE. 2014a. “Guideline 14-2014, Measurement of Energy,
Demand and Water Savings.” American Society of Heating,
Ventilating, and Air Conditioning Engineers, Atlanta, GA.
ASHRAE. 2014b. “Standard 140-2014: Standard Method of Test
for the Evaluation of Building Energy Analysis Computer
Programs.” ASHRAE, Atlanta.
Aydinalp-Koksal, M., and V. I. Ugursal. 2008. “Comparison
of Neural Network, Conditional Demand Analysis, and
Engineering Approaches for Modeling End-Use Energy
Consumption in the Residential Sector.” Applied Energy 85
(4): 271–296.
Aydinalp-Koksal, M., V. I. Ugursal, and A. S. Fung. 2002.
“Modeling of the Appliance, Lighting, and Space-Cooling
Energy Consumptions in the Residential Sector using Neural
Networks.” Applied Energy 71 (2): 87–110.
Ben-Nakhi, A. E., and M. A. Mahmoud. 2004. “Cooling Load
Prediction for Buildings using General Regression Neural
Networks.” Energy Conversion and Management 45 (13):
2127–2141.
Bloomfield, D. P. 1999. “An Overview of Validation Methods for Energy and Environmental Software.” ASHRAE
Transactions 105: 685.
Clarke, J., J. Cockroft, S. Conner, J. Hand, N. Kelly, R. Moore, T.
O’Brien, and P. Strachan. 2002. “Simulation-Assisted Control in Building Energy Management Systems.” Energy and
Buildings 34 (9): 933–940.
Crawley, D. B., L. K. Lawrie, F. C. Winkelmann, W. F. Buhl,
Y. J. Huang, C. O. Pedersen, R. K. Strand, et al. 2001.
“Energyplus: Creating a New-Generation Building Energy
Simulation Program.” Energy and Buildings 33 (4): 319–
331.
Deb, C., L. S. Eang, J. Yang, and M. Santamouris. 2016.
“Forecasting Diurnal Cooling Energy Load for Institutional
Buildings using Artificial Neural Networks.” Energy and
Buildings 121: 284–297.
Dhar, A., T. A. Reddy, and D. Claridge. 1999. “A Fourier
Series Model to Predict Hourly Heating and Cooling Energy
Use in Commercial Buildings with Outdoor Temperature
as the Only Weather Variable.” Journal of Solar Energy
Engineering 121 (1): 47–53.
Dodge, Y. 2003. The Oxford Dictionary of Statistical Terms.
Oxford: Oxford University Press on Demand.
Dong, B., C. Cao, and S. E. Lee. 2005. “Applying Support
Vector Machines to Predict Building Energy Consumption
in Tropical Region.” Energy and Buildings 37 (5): 545–
553.
EnergyPlus. 2017. “The Encyclopedic Reference to Energyplus
Input and Output.” https://energyplus.net/sites/default/files/
pdfs/pdfs_v8.3.0/InputOutputReference.pdf.
Fumo, N., P. Mago, and R. Luck. 2010. “Methodology to
Estimate Building Energy Consumption using Energyplus
Benchmark Models.” Energy and Buildings 42 (12): 2331–
2337.
Garrett, A., and J. New. 2016. “Suitability of Ashrae Guideline
14 Metrics for Calibration.” ASHRAE Transactions 122 (1):
469–477.
Gestwick, M. J., and J. A. Love. 2014. “Trial Application of
Ashrae 1051-rp: Calibration Method for Building Energy
Simulation.” Journal of Building Performance Simulation 7
(5): 346–359.
Granderson, J., and P. N. Price. 2014. “Development and Application of a Statistical Methodology to Evaluate the Predictive
Accuracy of Building Energy Baseline Models.” Energy 66:
981–990.
Jacob, D., S. Dietz, S. Komhard, C. Neumann, and S. Herkel.
2010. “Black-Box Models for Fault Detection and Performance Monitoring of Buildings.” Journal of Building
Performance Simulation 3 (1): 53–62.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An
Introduction to Statistical Learning, Vol. 112. New York:
Springer.
Judkoff, R., D. Wortman, B. O'Doherty, and J. Burch. 1983. "A Methodology for Validating Building Energy Analysis Simulations." Tech. Rep., Report TR-254-1508, Solar Energy Research Institute.
Kandil, A.-E., and J. A. Love. 2014. “Signature Analysis Calibration of a School Energy Model Using Hourly Data.” Journal
of Building Performance Simulation 7 (5): 326–345.
Ke, M.-T., C.-H. Yeh, and J.-T. Jian. 2013. “Analysis of Building Energy Consumption Parameters and Energy Savings
Measurement and Verification by Applying Equest Software.” Energy and Buildings 61: 100–107.
Keim, D. A., M. C. Hao, U. Dayal, H. Janetzko, and P. Bak. 2010.
“Generalized Scatter Plots.” Information Visualization 9 (4):
301–311.
Kwok, S. S., R. K. Yuen, and E. W. Lee. 2011. “An intelligent Approach to Assessing the Effect of Building Occupancy on Building Cooling Load Prediction.” Building and
Environment 46 (8): 1681–1690.
Lam, J. C., K. K. Wan, S. Wong, and T. N. Lam. 2010. “Principal
Component Analysis and Long-Term Building Energy Simulation Correlation.” Energy Conversion and Management
51 (1): 135–139.
Lee, W.-Y., J. M. House, and N.-H. Kyong. 2004. “Subsystem Level Fault Diagnosis of a Building’s Air-Handling
Unit using General Regression Neural Networks.” Applied
Energy 77 (2): 153–170.
Pan, Y., Z. Huang, and G. Wu. 2007. “Calibrated Building Energy
Simulation and Its Application in a High-Rise Commercial
Building in Shanghai.” Energy and Buildings 39 (6): 651–
657.
Piñeiro, G., S. Perelman, J. P. Guerschman, and J. M. Paruelo.
2008. “How to Evaluate Models: Observed vs. Predicted
or Predicted vs. Observed?.” Ecological Modelling 216 (3):
316–322.
Reddy, T. A. 2006. “Literature Review on Calibration of Building
Energy Simulation Programs: Uses, Problems, Procedures,
Uncertainty, and Tools.” ASHRAE Transactions 112 (1):
226–240.
Reddy, T. A., and D. E. Claridge. 2000. “Uncertainty of “Measured” Energy Savings from Statistical Baseline Models.”
HVAC&R Research 6 (1): 3–20.
Salkind, N. J. 2007. Encyclopedia of Measurement and Statistics,
Vol. 2. Thousand Oaks, CA: Sage.
Torcellini, P., M. Deru, B. Griffith, K. Benne, M. Halverson,
D. Winiarski, and D. Crawley. 2008. “DOE Commercial Building Benchmark Models.” ACEEE 2008 Summer
Study on Energy Efficiency in Buildings. NREL Conference Paper NREL/CP-550-43291, 17–22. http://www.nrel.
gov/docs/fy08osti/43291.pdf.
Wetter, M. 2011. “A View on Future Building System Modeling and Simulation.” In Building Performance Simulation
for Design and Operation, edited by J. L. Hensen and R.
Lamberts, 481–503. London: Routledge.
Zhang, Y., Z. O’Neill, B. Dong, and G. Augenbroe. 2015. “Comparisons of Inverse Modeling Approaches for Predicting
Building Energy Performance.” Building and Environment
86: 177–190.
Zhao, H.-X., and F. Magoulès. 2012. “A Review on the Prediction of Building Energy Consumption.” Renewable and
Sustainable Energy Reviews 16 (6): 3586–3592.