978-3-319-66379-1 16

Wind Power Production Forecasting Using
Ant Colony Optimization and Extreme
Learning Machines
Maria Carrillo, Javier Del Ser, Miren Nekane Bilbao, Cristina Perfecto
and David Camacho
Abstract Nowadays the energy generation strategy of almost every nation around
the world relies on a strong contribution from renewable energy sources. In certain
countries wind energy accounts for a particularly high share of national production,
mainly due to their large-scale wind flow patterns. This potential has so far attracted public and private funds to support the
development of advanced wind energy technologies. However, the proliferation of
wind farms makes it challenging to achieve a proper electricity balance of the grid,
a problem that becomes further involved due to the fluctuations of wind generation
that occur at different time scales. Therefore, acquiring a predictive insight on the
variability of this renewable energy source becomes essential in order to optimally
inject the produced wind energy into the electricity grid. To this end the present
work elaborates on a hybrid predictive model for wind power production forecasting
based on meteorological data collected at different locations over the area where a
wind farm is located. The proposed method hybridizes Extreme Learning Machines
with a feature selection wrapper that models the discovery of the optimum subset of
predictors as a metric-based search for the optimum path through a solution graph
efficiently tackled via Ant Colony Optimization. Results obtained by our approach
for two real wind farms in Zamora and Galicia (Spain) are presented and discussed,
from which we conclude that the proposed hybrid model is able to efficiently reduce
the number of input features and enhance the overall model performance.
M. Carrillo ⋅ J. Del Ser (✉) ⋅ M. Nekane Bilbao ⋅ C. Perfecto
University of the Basque Country UPV/EHU, 48013 Bilbao, Spain
M. Carrillo
J. Del Ser
TECNALIA, 48160 Derio, Spain
J. Del Ser
Basque Center for Applied Mathematics (BCAM), 48009 Bilbao, Spain
D. Camacho
Universidad Autónoma de Madrid, 28049 Madrid, Spain
© Springer International Publishing AG 2018
M. Ivanović et al. (eds.), Intelligent Distributed Computing XI,
Studies in Computational Intelligence 737,
M. Carrillo et al.
Keywords Wind production ⋅ Supervised learning ⋅ Feature selection ⋅ Ant Colony
Optimization ⋅ Extreme Learning Machines
1 Introduction
The pace at which the energy grid is becoming fully digital has accelerated in the
last few years as a result of the advent and massive deployment of ICT-powered
infrastructure over this large-scale network [1]. Such a progressive digitalization of
the grid finds its roots not only in the need for a more fine-grained supervision of
the energy delivery process along its lines, but also in the technical advantages of
ICT technologies allowing for bidirectional information flows from the grid to the
operator, supervisor or customer, e.g. demand side management, fraud detection or
improved energy efficiency in buildings [2].
A particular byproduct of the aforementioned digitalization is the fact that technicians supervising and managing different levels of the grid are provided with a rich
data substrate from which to infer valuable, timely insights on the current status and
operation of the grid. Examples abound in this matter: following the above cases,
by properly sampling and transmitting information about the energy consumption of
end customers—by means of smart meters—operators can trigger actions to match
the overall generation to consumption along time or detect abnormal patterns in the
consumed energy traces that could be symptomatic of a non technical loss (e.g. tampering). In the context of renewable energy sources profitable benefits also arise from
the digitalization of the equipment required to collect and convey the captured energy
flow, with a strong emphasis on the crucial role taken by this technology when the
focus is placed on the maximization of the installation productivity or the pattern
characterization of the produced energy towards its injection upstream.
The wind energy sector has particularly leveraged the technical advantages and possibilities unlocked by the digitalization of equipment over the grid
[3]. A significant share of such advantages relies on the application of predictive models for the estimation of the produced energy within a certain time horizon. Such a
prediction can be inferred not only from the energy produced up to the time
when the prediction is made, but also from other parameters that impact the wind
flow patterns of the geographical area where the wind farm is located and, ultimately,
its predicted generation. This close link between wind dynamics and the energy produced by a wind farm has hitherto steered research efforts towards the prediction
of wind-related physical characteristics, on the assumption that the generated power
can be estimated therefrom.
From the technical point of view a wide range of predictive models has
been applied to this problem, from standard Machine Learning approaches such as
Neural Networks [4–6], Support Vector Machines [7, 8] or Decision Tree Regressors [9] to more elaborate schemes such as model ensembles [10] or Deep Learners
[11, 12]. Among them, a research strand that has lately gained momentum focuses
on the hybridization of nature-inspired heuristics and machine learning models as a
computationally efficient workaround to deal with the usually high dimensionality of
datasets processed in this application scenario. To cite a few, evolutionary solvers and
nature-inspired heuristics have been often utilized as efficient wrappers to configure
the underlying predictive model [13, 14] and/or select a subset of features [15–17]
under a maximal generalization performance criterion. Comprehensive surveys can
be found in [18–21].
This work joins the latter research trend by exploring the practical performance
of a hybrid wind power generation forecasting model based on Extreme Learning
Machines (ELM), a low-complexity variant of neural networks characterized by a
fast training process [22]. The novel ingredient with respect to the state of the art in
this topic is the conception of the feature selection process as a search for the optimal
path through a graph, which is algorithmically tackled by Ant Colony Optimization
(ACO, [23]), a bioinspired solver that has been applied in many graph-related problems such as scheduling [24, 25] and network analysis [26]. If the input features to
the ELM model are conceived as nodes of a fully connected graph, a colony of ants
can be used to find a good path through this feature space efficiently by virtue of the
collaborative behavior of this multi-agent solver. Results obtained for two different
wind farms located in Spain are discussed, from which we conclude that the
proposed hybrid scheme excels at constructing feature subsets of reduced dimensionality
that yield improved generalization performance.
The rest of the manuscript is structured as follows: the notation used throughout the paper and a formal statement of the feature selection problem around which this
research gravitates are given in Sect. 2, whereas Sect. 3 delves into the proposed hybrid
model, stressing how ELM and ACO are combined in a single algorithmic flow.
Results are presented and discussed in Sect. 4, and finally Sect. 5 concludes the paper
and outlines future lines of related research.
2 Notation and Problem Formulation
Following the schematic diagram in Fig. 1 we assume a wind farm comprising $M$ wind turbines, producing a total instantaneous power $P_t$ [W] at time $t$ which we aim to predict at time $t - \Delta t$, with $\Delta t$ denoting the prediction horizon. We further consider that $V$ meteorological variables of interest for the target variable (e.g. wind speed modulus, wind direction, U/V components and temperature) are obtained over the geographical location where wind turbines are located by a numerical weather prediction (NWP) model. Let $\mathbf{z}_t^{\mathbf{p}}$ denote the vector of meteorological variables obtained for position $\mathbf{p} \in \{\mathbf{p}_1, \ldots, \mathbf{p}_P\}$, where $\mathbf{p}_i \in \mathbb{R}^2$ denotes the geographical coordinates (latitude/longitude) of point $i$ and $|\mathbf{z}_t^{\mathbf{p}}| = V$. Therefore, the entire set of meteorological features registered over all $P$ locations at time $t$ will be denoted as

$$\mathbf{Z}_t \doteq \{\mathbf{z}_t^{\mathbf{p}_1}, \mathbf{z}_t^{\mathbf{p}_2}, \ldots, \mathbf{z}_t^{\mathbf{p}_P}\}, \tag{1}$$

comprising a total of $PV$ variables.

Fig. 1 Diagram of the system model addressed in this work with $M = 3$ turbines, $V = 3$ meteorological variables and $\Delta t = X = 3$ time steps

In order to properly capture short-term time correlations that could prevail beneath the data we will extend the above partial feature vector $\mathbf{Z}_t$ with an $X$-sized window of both produced power values and meteorological variables before the instant at hand, namely,

$$\mathbf{x}_t \doteq \{\mathbf{Z}_t, \mathbf{Z}_{t-1}, \ldots, \mathbf{Z}_{t-X}, P_t, P_{t-1}, \ldots, P_{t-X}\}, \tag{2}$$

which is used as an input to a predictive model $M_{\boldsymbol{\theta}}(\cdot)$ controlled by parameters $\boldsymbol{\theta}$ and used to predict $P_{t+\Delta t}$ as $\hat{P}_{t+\Delta t} = M_{\boldsymbol{\theta}}(\mathbf{x}_t)$. To this end the model is trained over a set of supervised training examples $\{(\mathbf{x}_t, P_{t+\Delta t})\}_{t \in \mathcal{T}_{trn}}$ and evaluated in terms of generalization performance over a test set $\{\mathbf{x}_t\}_{t \in \mathcal{T}_{tst}}$. Here, $\mathcal{T}_{trn}$ and $\mathcal{T}_{tst}$ denote the time instants corresponding to supervised training instances and unsupervised test samples, respectively.
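As an illustration of how the windowed input vector and its target could be assembled, the following minimal Python sketch pairs each lagged vector with the power value `horizon` steps ahead. The function name `build_instances`, the flattened per-step layout of the meteorological data and the toy numbers are assumptions made for this example; the window is taken literally from Expression (2), i.e. lags 0 to X, which is also an assumption.

```python
from typing import List, Sequence, Tuple


def build_instances(
    met: Sequence[Sequence[float]],  # met[t]: all meteorological values at time t (flattened)
    power: Sequence[float],          # power[t]: produced power at time t
    window: int,                     # X: number of past steps appended to the current one
    horizon: int,                    # prediction horizon, in time steps
) -> Tuple[List[List[float]], List[float]]:
    """Pair each lagged input vector x_t with its target P_{t+horizon}."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Start once `window` past steps exist; stop while the target is still in range.
    for t in range(window, len(power) - horizon):
        row: List[float] = []
        for lag in range(window + 1):   # meteorological blocks at t, t-1, ..., t-window
            row.extend(met[t - lag])
        for lag in range(window + 1):   # produced power at t, t-1, ..., t-window
            row.append(power[t - lag])
        inputs.append(row)
        targets.append(power[t + horizon])
    return inputs, targets


# Toy data (made-up numbers): 6 hourly steps, 2 meteorological values per step.
met = [[float(t), float(10 * t)] for t in range(6)]
power = [100.0 + t for t in range(6)]
X, y = build_instances(met, power, window=1, horizon=1)
print(len(X), X[0], y[0])
```

Each row holds the meteorological blocks first and the power lags last, so a feature subset can later be expressed as a list of column indices.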
Following the same rationale as in prior work on feature selection in wind prediction, filtering out irrelevant or redundant features not only makes the model learning process less time consuming, but also renders the model itself less prone to
overfitting (lower variance) thanks to a more compact learned knowledge. This is especially important when dealing with supervised learning problems with a relatively
high number of features. In this context, wind power prediction problems based on
multi-parametric, multi-site meteorological variables with an additional window to
account for autoregressive components undoubtedly call for the adoption of feature
selection schemes: the total number of features contained in $\mathbf{x}_t$ is $|\mathbf{x}_t| = P(V + 1)X$,
which for a minimal setup of e.g. $P = 20$ points, $V = 8$ variables, and a window of
$X = 3$ time instants, amounts to $|\mathbf{x}_t| = 540$ input features.
Having said this, feature selection can be formulated as an optimization problem
guided by a fitness metric that reflects the predictive performance of the model when
processing any given subset of features $\mathcal{F}' \subseteq \mathcal{F}$, with $\mathcal{F}$ denoting the set of all variables included in the feature vector $\mathbf{x}_t$ as per (2). Such a metric should quantify the
generalization performance of the model when processing unseen data instances. To
this end, cross-validation (CV) methods help estimate how the predictive model
will generalize to an independent test set. In its simplest form ($K$-fold CV), the training set is divided into $K$ disjoint subsets, and the generalization of
the model is estimated by averaging the partial performance scores attained by the
model when trained over $K - 1$ folds and tested with the remaining one.
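The fold-averaging procedure just described can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the names `cv_fitness`, `r2_score` and `slope_model` are hypothetical, the folds are drawn by random shuffling, and the one-parameter `slope_model` is only a toy stand-in for the actual learner.

```python
import random
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]
FitPredict = Callable[[Matrix, List[float], Matrix], List[float]]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; assumes y_true is not constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def cv_fitness(X: Matrix, y: List[float], subset: List[int],
               fit_predict: FitPredict, n_folds: int = 5,
               seed: Optional[int] = None) -> float:
    """Average validation score over n_folds folds, using only the columns in `subset`."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)                # assumption: randomly drawn folds
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_trn = [[X[i][j] for j in subset] for i in train]
        y_trn = [y[i] for i in train]
        X_val = [[X[i][j] for j in subset] for i in fold]
        y_val = [y[i] for i in fold]
        scores.append(r2_score(y_val, fit_predict(X_trn, y_trn, X_val)))
    return sum(scores) / n_folds


def slope_model(X_trn: Matrix, y_trn: List[float], X_val: Matrix) -> List[float]:
    """Toy learner: least-squares line through the origin on the first selected column."""
    num = sum(row[0] * target for row, target in zip(X_trn, y_trn))
    den = sum(row[0] ** 2 for row in X_trn)
    return [num / den * row[0] for row in X_val]


# Toy data (made-up): the target depends only on column 0.
X = [[float(i), float(i % 3)] for i in range(1, 21)]
y = [2.0 * row[0] for row in X]
score = cv_fitness(X, y, subset=[0], fit_predict=slope_model, n_folds=5, seed=1)
print(score)
```

Passing a different `seed` on every call yields a different fold partition per evaluated subset, which is one simple way to avoid tying the search to a single split.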
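Mathematically, if we denote as $\mathcal{T}_{trn}^{k}$ the time instants corresponding to the $k$-th fold in which the training set is split (i.e. $\mathcal{T}_{trn}^{1} \cup \mathcal{T}_{trn}^{2} \cup \ldots \cup \mathcal{T}_{trn}^{K} = \mathcal{T}_{trn}$), a model can be learned from $\{(\mathbf{x}_t, P_{t+\Delta t})\}_{t \in \mathcal{T}_{trn} \setminus \mathcal{T}_{trn}^{k}}$, whose output $M_{\boldsymbol{\theta}}(\{\mathbf{x}_t\}_{t \in \mathcal{T}_{trn}^{k}})$ can be compared to $\{P_{t+\Delta t}\}_{t \in \mathcal{T}_{trn}^{k}}$ to yield a score $\xi(k) \in \mathbb{R}^{+}$ associated to fold $k$. By averaging such scores over $k$ an estimation of the expected performance $\hat{\xi}$ of the model when facing the test dataset $\{\mathbf{x}_t\}_{t \in \mathcal{T}_{tst}}$ can be obtained. It is important to note that $\hat{\xi}$ depends on the candidate feature subset $\mathcal{F}'$, as it determines the dimensions of the datasets involved in the model construction. For coherence through subsequent formulation we will explicitly indicate this dependence in the performance scores $\xi(k)$ and $\hat{\xi}$ as $\xi(k, \mathcal{F}')$ and $\hat{\xi}(\mathcal{F}')$, respectively.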
With this notation in mind, the optimization problem considered in this manuscript aims at finding an optimal feature subset such that the expected generalization performance of a model $M_{\boldsymbol{\theta}}$ is maximized, i.e.

$$\max_{\mathcal{F}' \subseteq \mathcal{F}} \hat{\xi}(\mathcal{F}') \doteq \frac{1}{K} \sum_{k=1}^{K} \xi(k, \mathcal{F}'), \tag{3}$$

where the search is done over the space of all possible combinations of features. Exhaustively evaluating all such possibilities for $\mathcal{F}'$ would require a total of $\sum_{x=1}^{|\mathcal{F}|} \binom{|\mathcal{F}|}{x}$ cross-validation processes with the model $M_{\boldsymbol{\theta}}(\cdot)$ and dataset at hand. The exponential complexity of this search space motivates the adoption of heuristic wrappers capable of exploring it efficiently. This is indeed the rationale of the hybrid ACO-ELM model explained in the following section.
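To give a sense of scale for the exhaustive search mentioned above, the short Python sketch below counts the non-empty subsets of a feature set and checks the count against the closed form 2^n − 1. The function name `n_candidate_subsets` is hypothetical; the value 450 is the number of features reported for the experiments in Sect. 4.

```python
from math import comb


def n_candidate_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets: sum over x = 1..n of C(n, x)."""
    return sum(comb(n_features, x) for x in range(1, n_features + 1))


small = n_candidate_subsets(10)    # small enough to enumerate by brute force
large = n_candidate_subsets(450)   # feature count of the experiments in Sect. 4
print(small, len(str(large)))      # subset count for 10 features, digit count for 450
```

Since each candidate subset costs one full cross-validation run, even a very fast learner cannot make exhaustive evaluation feasible at this size.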
3 Proposed Hybrid Scheme
As described in Algorithm 1, the proposed predictive model leverages the low complexity featured by ELMs and the efficient search over graphs provided by ACO.
The main idea is to model the search space as a fully connected graph $\mathcal{G} = \{\mathcal{N}, \mathcal{E}\}$,
where each node $u \in \mathcal{N}$ represents a feature in $\mathcal{F}$.
When sent through this graph, ants construct the subset of features proposed as
a candidate solution by starting at a randomly selected node and moving along the
edges connecting every node to each other. As explained in [23] for the Ant System (AS),
when an ant finds a food source, it deposits pheromone on its way back to the nest.
This pheromone trace can be detected by other ants, which prefer to follow trails where
more pheromone was deposited. However, pheromones evaporate over time, which
implies that either more ants deposit pheromone on a trail or the pheromone on this
trail disappears.
Algorithm 1: Proposed ACO-ELM model for wind power forecasting.
Input: Historical meteorological information $\mathbf{z}_t^{\mathbf{p}_1}, \ldots, \mathbf{z}_t^{\mathbf{p}_P}$ at positions $\{\mathbf{p}_1, \ldots, \mathbf{p}_P\}$, test times $\mathcal{T}_{tst}$, number of folds $K$, prediction horizon $\Delta t$, feature window size $X$, ELM model $M_{ELM}(\cdot)$, number of ACO generations $I$, number of ants $A$.
Output: Predicted wind power $\hat{P}_{t+\Delta t}$ of every test instance.
1  Construct training instances $\{(\mathbf{x}_t, P_{t+\Delta t})\}_{t \in \mathcal{T}_{trn}}$ as per (1) and (2).
2  Construct test instances $\mathbf{x}_t$ by proceeding accordingly for $t \in \mathcal{T}_{tst}$.
3  Set all edge probabilities of the solution graph to $1/(|\mathcal{F}| - 1)$ (i.e. equally likely).
4  foreach i = 1 to I do
5      foreach a = 1 to A do
6          Deploy ant $a$ in the solution graph $\mathcal{G}$ based on the average sum of edges connected to every node in the graph.
7          Let ant $a$ move along the graph based on the existing edge probabilities, assigning the visited nodes to the components of the solution $\mathcal{F}^a(i) \subseteq \mathcal{F}$.
8          Evaluate the quality $\hat{\xi}(\mathcal{F}^a(i))$ of the path $\mathcal{F}^a(i)$ as the average predictive performance of $M_{ELM}(\cdot)$ computed via $K$-fold cross-validation over $\{(\mathbf{x}_t, P_{t+\Delta t})\}_{t \in \mathcal{T}_{trn}}$ using the reduced feature subset $\mathcal{F}^a(i)$.
9          Update edge probabilities $p_{uv}(i)$ using Expression (4) with the pheromone $\tau_{uv}(i)$ equal to $\xi(\mathcal{F}^a(i))$.
10     Evaporate the quantity of deployed pheromones on edge $(u, v)$ as per (5).
11 Let the path with highest quality $\hat{\xi}(\mathcal{F}^a(I))$ among all ants denote the optimal feature subset $\mathcal{F}^{a_{best}}$ produced by the ACO wrapper.
12 Learn a model $M_{ELM}(\cdot)$ from the training set $\{(\mathbf{x}_t, P_{t+\Delta t})\}_{t \in \mathcal{T}_{trn}}$ using $\mathcal{F}^{a_{best}}$.
13 Predict wind power for each test instance as $\hat{P}_{t+\Delta t} = M_{ELM}(\mathbf{x}_t)$ with $t \in \mathcal{T}_{tst}$.
This nature-inspired principle is indeed embraced in our proposed approach: the movement of ants through the solution graph is guided probabilistically by the pheromone deployed by other ants through their paths, whose intensity is driven by the quality of the solution found by every ant. The fitness of a given path is computed as a $K$-fold cross-validated measure of the predictive performance of an ELM model $M_{ELM}(\cdot)$ when learning from the features of the training set selected by the nodes that compose the path of the ant at hand. When arranged in colonies, a specific number $A$ of artificial ants build a solution step by step, selecting the next edge by considering the quantity of pheromone deposited by the previous ants. The more pheromone is deposited, the higher the probability that the ant moves to this node. In this regard, the probability that in step $i \in \{1, \ldots, I\}$ ant $a \in \{1, \ldots, A\}$ goes from node $u \in \mathcal{N}$ to node $w \in \mathcal{N}$ in the solution graph is given by

$$p^{a}_{uw}(i) = \frac{\tau_u(i) + \tau_w(i)}{\sum_{w' \in \mathcal{N} \setminus \{u, w\}} \tau_{uw'}(i)}, \tag{4}$$

where $\tau_{uw}(i)$ is the total quantity of pheromones on the edge connecting nodes (features) $u$ and $w$ at generation $i$, and $\mathcal{N} \setminus \{u, w\}$ denotes the set of all nodes in the graph except $u$ and $w$. If pheromones become more important at any two given nodes (features), then a promising area of found solutions is explored more deeply.
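A pheromone-guided walk of this kind can be sketched in Python as follows. This is not a literal transcription of Expression (4): the sketch assumes node-level pheromone, weights each candidate by the sum of the pheromone at the current and candidate nodes, and normalizes over the nodes not yet visited so that the weights form a proper distribution. The names `transition_probs` and `walk`, the fixed `path_len` and the toy pheromone values are all hypothetical.

```python
import random
from typing import Dict, List, Set


def transition_probs(current: int, pheromone: Dict[int, float],
                     visited: Set[int]) -> Dict[int, float]:
    """Probability of moving from `current` to each node not yet visited."""
    weights = {w: pheromone[current] + pheromone[w]
               for w in pheromone if w not in visited}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


def walk(pheromone: Dict[int, float], path_len: int, rng: random.Random) -> List[int]:
    """Build one ant path of `path_len` distinct nodes (path_len <= number of nodes)."""
    current = rng.choice(sorted(pheromone))   # random starting node
    path, visited = [current], {current}      # memory of visited nodes avoids loops
    while len(path) < path_len:
        probs = transition_probs(current, pheromone, visited)
        nodes = list(probs)
        current = rng.choices(nodes, weights=[probs[n] for n in nodes])[0]
        path.append(current)
        visited.add(current)
    return path


# Toy pheromone levels (made-up) on four feature nodes.
pheromone = {0: 1.0, 1: 1.0, 2: 2.0, 3: 4.0}
probs = transition_probs(0, pheromone, visited={0})
path = walk(pheromone, path_len=3, rng=random.Random(0))
print(probs, path)
```

With these toy values, node 3 is the most likely next step from node 0 because it carries the most pheromone.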
Pheromone evaporation is also included in the model as a form of forgetting the pheromone deployed on traversed edges if it is not reinforced by new ants passing along them. At the end of a generation $i$, that is when all ants have built a solution, the amount of pheromones on each node is updated as

$$\tau_u(i + 1) = \rho \cdot \tau_u(i) + \mathbb{I}(u \in \mathcal{F}^a(i))\, \xi(\mathcal{F}^a(i)), \tag{5}$$

where $\rho$ is the evaporation rate aimed at avoiding convergence to a locally optimal solution, and $\mathbb{I}(\cdot)$ is an auxiliary indicator function taking value 1 if its argument is true and 0 otherwise. Every ant also maintains a memory of its visited nodes so as to avoid loops along its path. Once all generations have been completed, the path characterized by the highest quality $\hat{\xi}(\mathcal{F}^a(I))$ among all ants in the colony is declared as the optimal feature subset, based on which the ELM model is trained and the predicted wind power for the test set is produced.
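The evaporate-and-deposit update can be sketched in Python as follows. The sketch assumes that the deposits of all ants in a generation are accumulated on the nodes they visited, that the evaporation factor multiplies the previous pheromone level, and that path qualities are non-negative; the function name `update_pheromone` and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def update_pheromone(pheromone: Dict[int, float], paths: Sequence[List[int]],
                     qualities: Sequence[float], rho: float) -> Dict[int, float]:
    """Decay every node by the factor rho, then add each ant's quality to its visited nodes."""
    updated = {u: rho * tau for u, tau in pheromone.items()}
    for path, quality in zip(paths, qualities):
        for u in set(path):          # the indicator: deposit only where u is in the path
            updated[u] += quality
    return updated


# Toy generation (made-up values): two ants, three feature nodes.
pheromone = {0: 1.0, 1: 1.0, 2: 1.0}
new_pheromone = update_pheromone(pheromone, paths=[[0, 1], [1]],
                                 qualities=[0.5, 0.25], rho=0.5)
print(new_pheromone)
```

Node 2 is visited by no ant in this toy generation, so its pheromone only decays.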
To end with the description of the proposed hybrid model, the ELM model used
in this paper is selected due to its fast learning procedure over a topological
neural structure similar to that of multi-layer perceptrons. The most significant characteristic
of the ELM training procedure is that it can be carried out by randomly setting the
hidden-layer weights of the underlying neural network, and then computing the output weights via the (Moore-Penrose generalized) inverse of the hidden-layer output matrix [22]. This yields an extremely agile learning procedure which
makes this learner very suitable for wrapper-based feature selection problems.
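The training procedure described above can be sketched with NumPy in a few lines. This is a minimal illustration rather than the implementation used in the paper: the class name `ELMRegressor`, the tanh activation, the Gaussian initialization and the synthetic data are assumptions made for the example.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden: int = 30, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer output matrix H (tanh activation is an assumption).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y) -> "ELMRegressor":
        X = np.asarray(X, dtype=float)
        # Step 1: draw the input weights and biases at random; they are never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Step 2: solve for the output weights with the Moore-Penrose pseudo-inverse of H.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X) -> np.ndarray:
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


# Synthetic regression data (made-up), only to exercise the class.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
model = ELMRegressor(n_hidden=30, seed=0).fit(X, y)
mse = float(np.mean((model.predict(X) - y) ** 2))
print(mse)
```

Because fitting reduces to a single pseudo-inverse, the model can be retrained cheaply for every candidate feature subset evaluated by a wrapper.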
4 Experimental Results and Discussion
In order to assess the performance of the proposed hybrid model several Monte Carlo
experiments have been carried out by using data from two different wind farms in
Spain, namely, Peña Roldana (Zamora, hereafter labeled as ROLDANA) and Faro
Farelo (Galicia, FARELO). The farm corresponding to the ROLDANA dataset comprises $M = 22$ turbines with a total nominal power of 36,740 kW, whereas FARELO
comprises $M = 18$ turbines with a total nominal power of 30,060 kW. The collected
data span from January 2013 to October 2015, with a time step between wind power
measurements of 1 h. An NWP model was used to interpolate temperature ($V_1$), wind
speed modulus ($V_2$) and wind U/V components ($V_3$ and $V_4$) over a rectangular grid of $P = 45$
points located in the surroundings of the wind farm. The prediction horizon was set
to $\Delta t = 1$ time step (i.e. short-term forecasting), while a total of $X = 2$ past values of every feature were accumulated as potential input predictors, giving rise to
$45 \cdot 5 \cdot 2 = 450$ possible features per scenario.
The scope of the simulations discussed in this section is to validate the predictive performance gain achieved when using the proposed ACO wrapper with respect
to the case when no feature selection is made. Methodologically speaking, several
simulation aspects are worth highlighting. To begin with, 20 independent experiments
per simulated scenario have been run in order to account for the stochasticity
of the ACO algorithm as per (4); consequently, results must be assessed statistically.
Furthermore, the folds in which the training dataset is split for evaluation (line 8 in
Algorithm 1) are not the same for all produced candidate solutions; otherwise there
is a risk for the overall feature selection process to overfit the model to the specific
distribution of folds computed at the beginning of the algorithm. To end with the
specifications of the simulation benchmark, a colony of $A = 10$ ants, an ELM with
30 hidden neurons and $I = 50$ iterations have been configured. The measure of predictive performance is the so-called coefficient of determination or $R^2$ score,
whose best value (i.e. perfect prediction) is 1, while $R^2 = 0$ corresponds to the case
where the model always outputs the expected value of the target variable.
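For reference, the coefficient of determination is commonly defined as follows. The symbols $\hat{P}_t$ (predicted power), $\bar{P}$ (mean of the observed targets) and the index set $\mathcal{T}$ of evaluated samples are notation assumed here for illustration:

```latex
R^2 = 1 - \frac{\sum_{t \in \mathcal{T}} \left(P_t - \hat{P}_t\right)^2}
               {\sum_{t \in \mathcal{T}} \left(P_t - \bar{P}\right)^2},
\qquad
\bar{P} = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} P_t
```

Setting $\hat{P}_t = P_t$ for all $t$ gives $R^2 = 1$, whereas the constant prediction $\hat{P}_t = \bar{P}$ gives $R^2 = 0$; a model that does worse than this constant predictor yields a negative score.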
Figure 2a, b summarize the results obtained by the ACO-ELM model over the ROLDANA
and FARELO datasets, respectively. The plots depict the convergence of the cross-validated $R^2$ score used as the fitness of the ACO wrapper during its search process. It
is important to note that in both simulated scenarios the feature selection process provides a predictive gain with respect to the case when no feature selection is made (in
the order of 0.3 in the $R^2$ scale), which is all the more relevant given the 10 features selected on
average by the ACO wrapper for both scenarios (less than 3% of the original feature set).

Fig. 2 In blue, $R^2$ convergence plots of the proposed ACO-ELM model for (a) ROLDANA and (b) FARELO
datasets (vertical axis: $R^2$ score; horizontal axis: iteration number $i$). Cross-validated $R^2$ scores of 0.32 (ROLDANA) and 0.302 (FARELO) are obtained by the
ELM model when no feature selection is made
5 Concluding Remarks and Future Research Lines
This paper has elaborated on a hybrid predictive model that combines Extreme
Learning Machines and Ant Colony Optimization for wind power production forecasting. Its main design principle is to represent the space of possible input features
to the model as nodes of a solution graph, which is efficiently explored by using
ant colonies guided by a fitness equal to the cross-validated prediction score of the
underlying model. The adoption of Extreme Learning Machines ensures a light
optimization procedure of the overall model due to the renowned low-complexity
training process of this particular class of supervised learners. The performance of
the proposed regression model has been assessed with real data recorded in
two different wind farms located in Spain characterized by very distinct wind patterns. The performance enhancement obtained by the proposed hybrid approach is
promising, with $R^2$ increases of nearly 0.3 with respect to the case where
no feature selection is made.
Future research efforts will be invested towards accelerating the convergence
properties of the ACO wrapper by adding heuristic information to the pheromone
calculation in Expression (4). Among other ideas, we will concentrate on how to
reflect the collinearity between nodes u and v in this expression so as to avoid transitions between nodes (features) when they are strongly correlated to each other.
Furthermore, other swarm heuristics will also be investigated as alternative feature selection wrappers.
Acknowledgements This work has been co-funded by the following research projects: EphemeCH
(TIN2014-56494-C4-4-P) by the Spanish Ministry of Economy and Competitivity, CIBERDINE
(S2013/ICE-3095), both under the European Regional Development Fund FEDER, by Airbus
Defence & Space (FUAM-076914 and FUAM-076915), and by the Basque Government under its
ELKARTEK program (KK-2016/00096, BID3ABI project).
References
1. Farhangi, H.: The path of the smart grid. IEEE Power Energy Mag. 8(1) (2010)
2. Wissner, M.: The smart grid-a saucerful of secrets? Appl. Energy 88(7), 2509–2518 (2011)
3. Murthy, K.S.R., Rahi, O.P.: A comprehensive review of wind resource assessment. Renew.
Sustain. Energy Rev. 72, 1320–1342 (2017)
4. Kaur, T., Kumar, S., Segal, R.: Application of artificial neural network for short term wind
speed forecasting. In: International Conference on Power and Energy Systems: Towards Sustainable Energy (PESTSE), pp. 1–5 (2016)
5. Ata, R.: Artificial neural networks applications in wind energy systems: a review. Renew. Sustain. Energy Rev. 49, 534–562 (2015)
6. Alexiadis, M.C., Dokopoulos, P.S., Sahsamanoglou, H.S., Manousaridis, I.M.: Short-term
forecasting of wind speed and related electrical power. Solar Energy 63(1), 61–68 (1998)
7. Mohandes, M.A., Halawani, T.O., Rehman, S., Hussain, A.A.: Support vector machines for
wind speed prediction. Renew. Energy 29(6), 939–947 (2004)
8. Zhao, P., Xia, J., Dai, Y., He, J.: Wind speed prediction using support vector regression. In: 5th
IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 882–886 (2010)
9. Troncoso, A., Salcedo-Sanz, S., Casanova-Mateo, C., Riquelme, J.C., Prieto, L.: Local models-based regression trees for very short-term wind speed prediction. Renew. Energy 81, 589–598
10. Heinermann, J., Kramer, O.: Machine learning ensembles for wind power prediction. Renew.
Energy 89, 671–679 (2016)
11. Hu, Q., Zhang, R., Zhou, Y.: Transfer learning for short-term wind speed prediction with deep
neural networks. Renew. Energy 85, 83–95 (2016)
12. Dalto, M., Matuško, J., Vašak, M.: Deep neural networks for ultra-short-term wind forecasting.
In: IEEE International Conference on Industrial Technology (ICIT), pp. 1657–1663 (2015)
13. Salcedo-Sanz, S., Ortiz-Garcia, E.G., Perez-Bellido, A.M., Portilla-Figueras, A., Prieto, L.:
Short term wind speed prediction based on evolutionary support vector regression algorithms.
Expert Syst. Appl. 38(4), 4052–4057 (2011)
14. Liu, D., Niu, D., Wang, H., Fan, L.: Short-term wind speed forecasting using wavelet transform
and support vector machines optimized by genetic algorithm. Renew. Energy 62, 592–597
15. Jursa, R., Rohrig, K.: Short-term wind power forecasting using evolutionary algorithms for
the automated specification of artificial intelligence models. Int. J. Forecast. 24(4), 694–709
16. Salcedo-Sanz, S., Pastor-Sanchez, A., Prieto, L., Blanco-Aguilera, A., Garcia-Herrera, R.: Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization
Extreme learning machine approach. Energy Convers. Manag. 87, 10–18 (2014)
17. Salcedo-Sanz, S., Pastor-Sanchez, A., Del Ser, J., Prieto, L., Geem, Z.W.: A Coral Reefs Optimization algorithm with Harmony Search operators for accurate wind speed prediction. Renew.
Energy 75, 93–101 (2015)
18. Foley, A.M., Leahy, P.G., Marvuglia, A., McKeogh, E.J.: Current methods and advances in
forecasting of wind power generation. Renew. Energy 37(1), 1–8 (2012)
19. Colak, I., Sagiroglu, S., Yesilbudak, M.: Data mining and wind power prediction: A literature
review. Renew. Energy 46, 241–247 (2012)
20. Giebel, G., Brownsword, R., Kariniotakis, G., Denhard, M., Draxl, C.: The state-of-the-art in
short-term prediction of wind power: A literature overview. ANEMOS. plus (2011)
21. Lei, M., Shiyan, L., Chuanwen, J., Hongling, L., Yan, Z.: A review on the forecasting of wind
speed and generated power. Renew. Sustain. Energy Rev. 13(4), 915–920 (2009)
22. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1), 489–501 (2006)
23. Dorigo, M., Birattari, M., Stutzle, T.: Ant Colony Optimization. IEEE Comput. Intell. Mag.
1(4), 28–39 (2006)
24. Gonzalez-Pardo, A., Camacho, D.: A new csp graph-based representation for ant colony optimization. In: IEEE Congress on Evolutionary Computation, pp. 689–696 (2013)
25. Gonzalez-Pardo, A., Camacho, D.: A new csp graph-based representation to resource-constrained project scheduling problem. In: IEEE Congress on Evolutionary Computation
(CEC), pp. 344–351 (2014)
26. Gonzalez-Pardo, A., Jung, J.J., Camacho, D.: ACO-based clustering for Ego Network analysis.
Future Gener. Comput. Syst. 66, 160–170 (2017)