R studio Machine learningClide
Book Chapters for Ideas/About-the-Editors_2018_Big-Data-Application-in-Power-Systems.pdf
About the Editors
Reza Arghandeh is an Assistant Professor in the ECE Department at Florida
State University. He is director of the Collaborative Intelligent Infrastructure Lab. He was a postdoctoral scholar at the University of California,
Berkeley’s California Institute for Energy and Environment from 2013 to 2015. He has 5 years of industrial experience in power and energy systems. He completed his PhD in Electrical Engineering with a specialization in power systems at Virginia
Tech. He holds Master’s degrees in Industrial and System Engineering from
Virginia Tech (2013) and in Energy Systems from the University of Manchester (2008). From 2011 to 2013, he was a power system software designer at Electrical
Distribution Design Inc. in Virginia. Dr. Arghandeh’s research interests include,
but are not limited to, data analysis and decision support for smart grids and smart cities using statistical inference, machine learning, information theory,
and operations research. He is a recipient of the Association of Energy Engineers
(AEE) Scholarship 2012, the UC Davis Green Tech Fellowship 2011, and the best paper award from the ASME 2012 Power Conference and IEEE PESGM
2015. He is the chair of the IEEE Task Force on Big Data Application for Power
Yuxun Zhou is currently a PhD candidate in the Department of EECS, UC Berkeley.
Prior to that, he obtained the Diplome d’Ingenieur in applied mathematics
from Ecole Centrale Paris and a BS degree from Xi’an Jiaotong University. Yuxun has published more than 30 refereed articles and has received several
student awards. His research interests are in machine learning theories and algorithms for modern sensor-rich, ubiquitously connected cyber-physical systems, including smart grids, power distribution networks, and smart buildings.
Book Chapters for Ideas/Acknowledgments_2018_Big-Data-Application-in-Power-Systems.pdf
The idea for this book goes back a few years, to when we were analyzing
smart meter and SCADA data from some Californian electric utilities using different machine learning and statistical inference techniques. Later on, we started to work
on phasor measurement unit (PMU) and micro-PMU data streams, which have
much higher resolution than smart meter data. The PMU and power quality recording data (120 Hz to 30 kHz and beyond), plus highly spatially distributed
data from smart meters, marked the advent of big data in power systems.
Utilities are already dealing with big data challenges, given the lack of workforce expertise and of suitable infrastructure to handle and
process the massive data. We are sure that some of our readers have had a similar
experience. On top of that, in the near future every house may have rooftop solar panels, controllable loads, smart appliances, electric vehicles, and various
software-enabled hardware that will be more connected in the era of Internet
This book is a step toward data-driven utilities, presenting a combination of
a high-level view of utility enterprise architecture, data analysis methodology, and various applications of data analytics in power transmission and distribution networks.
We have been lucky enough to have great maestros in our lives: our parents, Ali & Soodabeh Arghandeh and Yanping & Suxue Zhou, and our advisers, Prof. Robert
Broadwater and Prof. Saifur Rahman at Virginia Tech and Prof. Costas Spanos
and Prof. Alexandra von Meier at UC Berkeley.
In this book, we have a collection of highly recognized experts in academia
and industry in the field of power systems and data analysis from all around the world. We would like to thank them all for their outstanding contributions.
We would like to thank Dr. Heather Paudler for her valuable input on the
book. We extend special thanks to Renata R. Rodrigues and Ana C. A. Garcia from the Elsevier editorial team for their invaluable help and advice during
the different stages of preparation for this book. We also appreciate Honoka
Hamano’s efforts in designing the book cover, icons for each section, and
various other creative graphics inside the book.
Finally, we would like to thank several reviewers for valuable comments on
preliminary drafts of this book: Jeffrey S. Katz, Ricardo Bessa, John D. McDonald, Carol L. Stimmel, Mohammad Babakmehr, Elena Mocanu, Madeleine Gibescu,
Mehrdad Majidi, Gian Antonio Susto, Deepjyoti Deka, Fabio Rinaldi, Feng Gao,
Han Zou, Ming Jin, Ruoxi Jia, Yingchen Zhang, Behzad Najafi, Amin Hassanzadeh, Mihye Ahn, Hanif Livani, Matthias Stifter, Saverio Bolognani,
Michael Chertkov, Amirhessam Tahmassebi, Madhavi Konila Sriram, Roy Dong,
and Jose Cordova.
We look forward to hearing from our readership; please contact us with any
comments, suggestions, and questions.
Reza Arghandeh Florida State University, Tallahassee, FL, United States
Yuxun Zhou University of California, Berkeley, CA, United States
Book Chapters for Ideas/Chapter-10---Future-Trends-for-Big-Data-Applic_2018_Big-Data-Application-in-.pdf
Future Trends for Big Data Application in Power Systems
Ricardo J. Bessa INESC Technology and Science—INESC TEC, Porto, Portugal
The technological revolution in the electric power system sector is producing large volumes of data
with a pertinent impact on the business and functional processes of system operators, generation companies, and grid users. Big data techniques can be applied to state estimation, forecasting, and control problems, as well as to support the participation of market agents in the electricity market. This
chapter presents a review of the application of data mining techniques to these problems. Trends
like feature extraction/reduction and distributed learning are identified and discussed. The knowledge extracted from power system and market data has a significant impact on key performance indicators, like operational efficiency (e.g., operating expenses), investment deferral, and quality of
supply. Furthermore, business models related to big data processing and mining are emerging and boosting new energy services.
The advent of Smart Grids, with advances in information and communication technologies (ICT) and the installation of new measurement devices, such as phasor measurement units (PMU) and remote terminal units (RTU) in secondary
substations (MV/LV), allied to additional information collected by SCADA, will generate a large volume of data streams.
Equipment installed in MV/LV substations collects imported/exported active power, voltage magnitude, and reactive power in four quadrants, and a distribution system operator (DSO) can easily operate more than 10,000 secondary
substations. In HV/MV substations, of which there can be more than 1000 in one DSO, additional data is collected through the SCADA, such as current, active and
reactive power flow in the network feeders, switch and capacitor bank status,
as well as variables related to electric transformers (e.g., input/output voltage, temperature, tap changer position, transformer oil level, insulation level of
transformer oil, load). This high volume of grid data has different constraints
in terms of communication latency and availability. For instance, significant technical and economic constraints are expected in the real-time communication between smart meters and secondary substations, which requires new
Big Data Application in Power Systems. https://doi.org/10.1016/B978-0-12-811968-6.00010-3
Copyright © 2018 Elsevier Inc. All rights reserved.
approaches for the real-time monitoring of low voltage (LV) networks. Moreover, the time resolution of the data collected by different equipment differs: PMU collects high-frequency data, while RTU, in general, collects low-frequency data (e.g.,
PMU can provide high-update-rate data to a transmission system operator (TSO).
For instance, the Texas Synchrophasor Network collects 30 measurements per
second from each PMU (e.g., voltage/current magnitude and phase, frequency), which means 108,000 lines of comma-separated data per hour and 2.6 million
lines for a 24 h period; for 15 PMUs, file storage is about 1 GB per day.
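The figures quoted above can be checked with a few lines of arithmetic. The sketch below reproduces the line counts from the 30 samples per second rate and derives the average line size implied by the storage figure; treating 1 GB as 10^9 bytes is an assumption made here, not something stated in the text.

```python
# Back-of-the-envelope check of the PMU data volumes quoted in the text.
SAMPLES_PER_SECOND = 30   # reporting rate per PMU (from the text)
PMU_COUNT = 15            # number of PMUs (from the text)
BYTES_PER_GB = 1e9        # assumption: decimal gigabyte

lines_per_hour = SAMPLES_PER_SECOND * 3600   # one CSV line per sample
lines_per_day = lines_per_hour * 24

# Average line size implied by "about 1 GB per day for 15 PMUs".
implied_bytes_per_line = BYTES_PER_GB / (PMU_COUNT * lines_per_day)

print(lines_per_hour, lines_per_day, round(implied_bytes_per_line, 1))
```

Under that assumption, the storage figure corresponds to roughly 26 bytes per line, which is plausible for a short comma-separated record.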
This data, collected at different voltage levels, is essential to revisit classical TSO and DSO grid management functions, such as forecasting, state estimation,
and operational planning, and to develop new tools that increase the real-time awareness
of operators and support predictive maintenance strategies for network components.
The renewable energy sources (RES) industry is also installing and operating monitoring sensors at the wind turbine and photovoltaic panel level, which
generates a large volume of data that needs to be preprocessed and analyzed
in real time and transferred to upstream decision centers. For instance, a 2.5 MW wind turbine has more than 120 sensors inside the rotor, the generator,
and on the blades, which gather 10,000 data points every second. They feed
the information to a remote database, which stores 4 TB from 25,000 turbines around the world.1 The same is valid for a gas turbine engine, which generates
520 GB per day, in contrast to Twitter, where a day of real-time feeds represents
around 80 GB.2 This data can be used for reliability and performance monitoring, predictive maintenance, and asset management of conventional and RES
power plants. Eventually, the outcome of the data analysis at the power plant
level can feed power system reliability assessment tools by providing, for instance, data-driven time-varying failure rates.
In addition to all these electrical and mechanical variables, there are also exogenous variables with significant impact on power system and power plant
operation and planning, such as measured and predicted weather variables
(e.g., wind speed, temperature, and solar irradiance) that can form a grid of spatial-temporal weather information in a region and/or country.
Electricity markets are already generating large volumes of data, like offer
curves (per unit) in different sessions, energy and ancillary services prices, as
1 Source: http://www.gereports.com/post/118712460090/move-over-slow-food-slow-wind-might-be-the-latest/ (accessed on October 2016).
2 Source: http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business (accessed on October 2016).
well as locational marginal prices (LMP) for each node of the transmission network. The foreseen creation of flexibility markets at the distribution level will increase the volume of data and its spatial scale. The planned investment in
interconnection capacity between different control areas, and the increased integration of RES in power systems with LMP, makes spatial-temporal modeling of large-scale time series vital for operational and planning purposes. Therefore,
knowledge extraction from big data can create additional value for both market
players and system operators.
All these problems require different layers of data handling: (i) data acquisition
and transmission; (ii) data management (e.g., frameworks like Hadoop or Spark); (iii) data analytics, which can comprise knowledge extraction from data,
optimization, and decision-aid methods. The first two layers have already achieved a
high technology readiness level, with different solutions available in the market [3,4]. However, standardization of the data model, ICT for real-time data transmission, and cybersecurity issues remain areas needing significant improvement.
The scope of this chapter is the big data analytics layer, and the overall objective
is to discuss the main challenges related to knowledge extraction in different
power system-related problems and cover new (and evolving) problems, such as distributed learning and optimization, spatial-temporal modeling of time
series, data reduction, assimilation, and visualization methods. The entire electric power system is covered, going from extra HV to LV, without overlooking the wholesale and retail electricity markets.
This chapter is organized as follows: Section 2 describes the data-driven techniques for dynamic and steady-state analysis of transmission systems, as well as the interaction between transmission and distribution system operators;
in Section 3, the additional monitoring and control capabilities provided by
advanced data mining techniques are discussed in a Smart Grids context; Section 4 discusses the knowledge extraction from failure data to support asset
management strategies of system operators and generation companies; the
added value of big data techniques for electricity market bidding and simulation is discussed in Section 5, while Section 6 discusses its application to boost
demand-side flexibility. The conclusions are presented in Section 7.
2 TRANSMISSION SYSTEM
At the transmission system level, the increasing penetration of RES is demanding new monitoring and management tools for both interconnected and
isolated systems. A new generation of decision-aid tools will supply the operator with valuable information to check the security level of the economic dispatch and/or electricity market-clearing, considering RES variability and
uncertainty, as well as to increase the real-time awareness and derive recommendations to support preventive decisions.
2.1 Dynamic Behavior Analysis
The installation of PMU at different voltage levels generates important information to warn operators and system-level controllers about impending transient stability issues, support their preventive decisions, and perform
postmortem analysis. The California Independent System Operator (CAISO)
defined use cases that describe the inclusion of PMU data for grid operations, control, and modeling tasks. The use cases identified seven scenarios to
demonstrate the value of PMU data:
1. The PMU network triggers an alarm (e.g., rate of frequency change,
modes of oscillation, rate of damping) for a recommendation system that
generates a set of control actions for the operator.
2. Measure the frequency difference between main and isolated grids for
system restoration after a disturbance and determine how much
generation must be changed to reconnect the separated grids.
3. Postmortem analysis of system events to understand the causes of
disturbance, which is used to validate offline dynamic models and contingency simulation tools.
4. Validation of grid code and market models for new types of resources,
such as RES and storage.
5. Detect transient instability and derive preventive control actions that can
respond to specific or wide-area grid problems, e.g., angular and voltage
stability, low-frequency oscillations.
6. Identify poorly damped interarea oscillations and design smart control
actions to mitigate the oscillations, e.g., use PMU to tune power system
stabilizers.
7. Increase the line rating of transmission lines in real time. The PMU data can
detect postcontingency technical problems and activate the preventive
control actions from scenario (5) to mitigate the violations in real time by reconfiguring the system (e.g., increase generation or decrease load).
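As an illustration of scenario (1), the following is a minimal sketch of an alarm on the rate of change of frequency computed from a PMU frequency stream. The function names, the 0.5 Hz/s threshold, and the synthetic data are assumptions chosen for illustration; they are not taken from the CAISO use cases.

```python
def rocof(freq_hz, sample_rate_hz):
    """Rate of change of frequency (Hz/s) from first differences."""
    return [(b - a) * sample_rate_hz for a, b in zip(freq_hz, freq_hz[1:])]


def rocof_alarms(freq_hz, sample_rate_hz=30.0, threshold_hz_per_s=0.5):
    """Indices of samples whose rate of change exceeds the threshold.

    The threshold is a hypothetical value, not an operational setting.
    """
    rates = rocof(freq_hz, sample_rate_hz)
    return [i + 1 for i, r in enumerate(rates) if abs(r) > threshold_hz_per_s]


# Synthetic stream: steady 60 Hz, then a drop of 0.05 Hz per sample.
freq = [60.0] * 5 + [60.0 - 0.05 * k for k in range(1, 4)]
alarms = rocof_alarms(freq)
print(alarms)
```

In a real deployment the alarm would feed the recommendation system mentioned in the scenario; here it only returns the sample indices at which the threshold is crossed.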
The Electric Power Research Institute (EPRI) identifies the following applications for PMU data: (i) improvement of state estimation; (ii) oscillation
detection and control; (iii) voltage stability monitoring and control;
(iv) load model validation; (v) system restoration and event analysis.
It should be stressed that the use of PMU data demands a portfolio of different tools at the control center level, which corresponds to the enhancement of classical functions and to the development of new functions. Examples of
related tools are the state estimator, voltage stability analysis, volt/Var control,
and RES dispatch. A PMU network combined with decision trees can be used to
match the generator trip signature with the overall system dynamics, aiming at
finding the most likely location of an event in real time. The data processing and machine learning fitting were performed offline and in a controlled environment, since the training consisted of 53 events that match known generator
trips. An industrialization of this solution would require machine learning algorithms for classification problems that are able to cope with high-speed data
streams and detect concept drift.
Other potential applications are: line trip detection, which requires postprocessing
methods, such as a low-pass filter to remove high-frequency noise and a second
one to get the trend of the frequency data; and online prediction of transient stability (i.e., three-phase faults at different buses) with a decision tree algorithm
in order to derive corrective control rules.
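The two-filter postprocessing idea can be sketched with moving averages, which are simple FIR low-pass filters. This is only an illustration under stated assumptions: the window lengths, the synthetic frequency trace, and the use of the largest negative deviation as an event marker are choices made here, not the method of the cited work.

```python
def moving_average(x, window):
    """Causal moving average, a simple FIR low-pass filter."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= window:
            acc -= x[i - window]          # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


# Synthetic frequency trace: alternating measurement noise around 60 Hz,
# followed by a 0.1 Hz step drop at sample 40 (hypothetical event).
freq = [60.0 + 0.01 * (-1) ** k for k in range(40)]
freq += [59.9 + 0.01 * (-1) ** k for k in range(40, 80)]

denoised = moving_average(freq, 4)        # first filter: remove fast noise
trend = moving_average(denoised, 20)      # second filter: slow trend
deviation = [d - t for d, t in zip(denoised, trend)]

# The sample where the denoised signal falls furthest below its trend.
event_index = min(range(len(deviation)), key=lambda i: deviation[i])
print(event_index)
```

The deviation between the two filtered signals stays near zero in steady state and becomes strongly negative shortly after the step, which is what makes it usable as a detection feature.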
The seamless integration of PMU in power system operational tools will require a data analytics platform that integrates batch, real-time, and iterative data
processing. Apache Spark is emerging as the cluster computing platform for future power systems. The trend is toward distributed computing for data
collection and analytics. However, there is a need to develop algorithms that
are parallelizable, so that the computational load can be distributed across multiple nodes.
Furthermore, this efficient computational framework does not remove the need for data reduction and compression techniques, which should be flexible to the different operating conditions, e.g., compress less data under disturbance
conditions. Classical techniques, such as principal component analysis
and discrete wavelet transform, can be extended to this problem to have time-varying (potentially combined with change detection) and situation-dependent characteristics. Clustering algorithms can also be used to group
the dynamic response of generators (i.e., transient responses of generator rotor angles), and a classification algorithm can then be used to forecast the dynamic signature of a
system using a dataset of postdisturbance responses.
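A minimal sketch of situation-dependent compression is given below, using one level of the Haar wavelet transform and a threshold that is lowered under disturbance conditions so that less information is discarded. The threshold values, the `disturbed` flag, and the sample data are assumptions for illustration only.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = math.sqrt(2.0)
    pairs = list(zip(x[0::2], x[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Inverse of haar_step."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def compress(x, disturbed, calm_threshold=0.05, disturbed_threshold=0.005):
    """Zero small detail coefficients; keep more of them when disturbed."""
    threshold = disturbed_threshold if disturbed else calm_threshold
    approx, detail = haar_step(x)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return approx, kept


signal = [1.00, 1.02, 0.99, 1.01, 1.00, 0.98, 1.01, 1.00]  # per-unit voltage
_, kept_calm = compress(signal, disturbed=False)
_, kept_disturbed = compress(signal, disturbed=True)
print(sum(d != 0.0 for d in kept_calm), sum(d != 0.0 for d in kept_disturbed))
```

In practice the `disturbed` flag would come from a change detection step, and several decomposition levels would be used; the single level here keeps the example short.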
Failure in communication creates missing values in the power system dynamic
response. The state of the art consists of using the linear autoregressive with
exogenous input (ARX) model to estimate system dynamics, together with an input location selection methodology based on a coherency function. The spatial
and temporal dependencies between the system variables can be further
exploited with the different families of covariance functions associated with Gaussian process theory to improve the missing value estimation tasks.
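To make the ARX idea concrete, the sketch below fits a first-order model y[t] = a·y[t-1] + b·u[t-1] by least squares and uses it to estimate a sample assumed to be lost in communication. The model order, the synthetic input, and the coefficient values are assumptions; the input location selection based on a coherency function is not shown.

```python
import math


def fit_arx(y, u):
    """Least-squares fit of y[t] = a*y[t-1] + b*u[t-1] (2x2 normal equations)."""
    syy = syu = suu = sy1 = su1 = 0.0
    for t in range(1, len(y)):
        yp, up, yt = y[t - 1], u[t - 1], y[t]
        syy += yp * yp
        syu += yp * up
        suu += up * up
        sy1 += yp * yt
        su1 += up * yt
    det = syy * suu - syu * syu
    a = (sy1 * suu - su1 * syu) / det
    b = (su1 * syy - sy1 * syu) / det
    return a, b


# Synthetic, noise-free data generated with a = 0.8 and b = 0.5 (hypothetical).
u = [math.sin(0.3 * t) for t in range(40)]
y = [0.0]
for t in range(1, 40):
    y.append(0.8 * y[t - 1] + 0.5 * u[t - 1])

# Fit on the first 30 samples, then estimate sample 30 as if it were missing.
a_hat, b_hat = fit_arx(y[:30], u[:30])
y30_estimate = a_hat * y[29] + b_hat * u[29]
print(round(a_hat, 3), round(b_hat, 3))
```

With measurement noise the recovered coefficients would only approximate the true ones, and a higher model order would usually be needed for real system dynamics.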
Machine learning algorithms can also be used to give a real-time quantitative security evaluation of the current operating state of the system (i.e., expected frequency deviation) based on historical states and observations of the power system variables. This research line was further explored in microgrids and isolated systems.
2.2 Steady-State Analysis
The tools for steady-state analysis of power systems, such as power flow and state estimation algorithms, have reached a high technology readiness level, and several
commercial solutions are already available. The current challenge is to integrate
new and diverse types of information into these classical algorithms and capture the spatial-temporal structure of variable dependencies, while guaranteeing a high
Past developments in state estimation algorithms already included information
from load forecasts to predict the future states of the power system. For
instance, the dependency between nodal injection forecast errors can be modeled with a covariance matrix [19,20]. The load forecast and state estimation theories can be merged to forecast the future values of the power system state variables (bus voltage magnitude and phase) and then calculate the load values as a function of the state parameters. This new load forecast paradigm enables
the use of additional data, such as voltage phase from PMU or electrical variables collected from multiarea networks, and the construction of local forecast models for different subnetworks.
However, the modeling of spatial-temporal dependencies is indispensable and requires a method suitable for a large-scale implementation. Gaussian copulas
can be employed to model the spatial-temporal dependency structure between
random variables, but have two limitations: (i) lack of flexibility in modeling different types of tail dependence; (ii) low scalability when the number of
random variables increases.
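For readers unfamiliar with the construction, the sketch below samples a bivariate Gaussian copula: two correlated standard normal variables are mapped through the normal CDF so that each margin is uniform while the dependence is governed by a single correlation parameter. The parameter value, sample size, and function name are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf                 # standard normal CDF
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate z2 with z1 so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))   # uniform margins
    return pairs


pairs = sample_gaussian_copula(rho=0.8, n=2000)
print(pairs[:3])
```

Each uniform margin can then be passed through the inverse CDF of a variable of interest, for example a forecast error distribution. The Gaussian copula has no tail dependence for correlations below one, which is one way to read limitation (i) above, and a d-dimensional version needs a d × d correlation matrix, which relates to limitation (ii).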
The effect of RES and load uncertainty (and variability) in state estimation, together with frequent topological changes, leads to significant state shifts in
power system operation. This problem can be mitigated by developing data-driven solutions, instead of using a single data point (the last state estimation). Kernel ridge regression with a Bayesian framework that uses historical data
collected by the energy management system can tackle this problem.
Another relevant trend is the use of distributed learning approaches for robust
state estimation that result in minimal data exchange between neighboring
areas, mitigate privacy issues, and can run locally in grid equipment. This distributed learning paradigm relies on the alternating direction method of multipliers (ADMM), which combines the decomposability offered by the dual ascent
method with the superior convergence properties of the method of multipliers, which means that problems with nondifferentiable objective functions can be
easily addressed and it is possible to perform parallel optimization. It is
also possible to apply other variants, such as the Douglas-Rachford and block coordinate descent methods [26,27]. It is important to stress the nonlinear
nature of the AC power system, which results in a nonconvex problem for
the state estimator.
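The structure of ADMM can be seen in a toy consensus problem, sketched below: each area holds one local measurement and the areas agree on a common value while only exchanging their current estimates. The quadratic local cost, the penalty parameter, and the data are assumptions; this convex example is not an AC state estimator, which is nonconvex as noted above.

```python
def consensus_admm(local_values, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z."""
    n = len(local_values)
    u = [0.0] * n      # scaled dual variables, one per area
    z = 0.0            # shared consensus variable
    x = [0.0] * n
    for _ in range(iterations):
        # Local step: each area solves its own small problem (parallelizable).
        x = [(a + rho * (z - ui)) / (1.0 + rho)
             for a, ui in zip(local_values, u)]
        # Coordination step: average of the local estimates plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: penalize disagreement with the consensus value.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x


area_measurements = [1.02, 0.98, 1.01, 0.99]   # hypothetical per-unit voltages
z, x = consensus_admm(area_measurements)
print(round(z, 6))
```

For this cost the consensus value converges to the average of the local measurements, and only `x` and `u` leave each area, never the raw data behind them.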
The same paradigm can be applied to RES forecasting to explore geographically
distributed time-series information. The vector autoregression (VAR) framework can be applied to forecast thousands of time series in a distributed
fashion by combining ADMM with the LASSO framework to exploit the sparsity in
the model’s coefficients.
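When LASSO is solved with ADMM, the L1 penalty is typically handled by a soft-thresholding step, which is what drives small coefficients exactly to zero. The sketch below shows that operator alone; the coefficient values and the threshold are hypothetical and not taken from any VAR model in the text.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa*|v|: shrink toward zero, clip at zero."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


# Hypothetical VAR coefficients linking one site to five neighbors.
coefficients = [0.9, -0.03, 0.02, -0.4, 0.01]
sparse = [soft_threshold(c, 0.05) for c in coefficients]
print(sparse)
```

Coefficients smaller in magnitude than the threshold vanish, so only the strongest cross-site dependencies remain in the model.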
The practical implementation of the distributed learning paradigm requires an
adequate choice of the distributed processing platform, which can be divided into two types: (i) horizontal scaling: distribute the workload across several
servers, i.e., a decentralized and distributed cluster (cloud) computing framework;
(ii) vertical scaling: involves installing more processors, memory, and faster hardware inside a single machine.
For horizontal scaling, message passing interface (MPI) was the first communication protocol to distribute and exchange the data between peers; Apache Hadoop, with MapReduce as the data processing scheme, emerged later; and
Apache Spark is now the prevalent solution. For iterative algorithms like ADMM, MapReduce is not adequate due to disk I/O limitations, while Spark performs
in-memory computations that overcome these limitations for iterative processes. The most popular vertical scale-up technologies are high-performance computing clusters, multicore processors, and graphics processing
units (GPU). The ADMM algorithm and variants can be implemented in these
2.3 TSO-DSO Cooperation
The data exchange between TSO and DSO will contribute to increasing the security of both systems at different time scales, ranging from real time to long-term planning. The European project evolvDSO developed a use case for TSO-DSO cooperation, which firstly means bidirectional exchange of information,
both historical and real-time data, regarding the operating conditions of the transmission and distribution systems. Secondly, it can also mean the
DSO supporting the TSO operational and planning tasks, for instance, by controlling the active and reactive power in the primary substation or elaborating a
joint expansion plan for both systems. Cooperation is needed since presently
the distribution system is a black box to the TSO and vice versa. Moreover, considering the increasing integration of distributed energy resources in the distribution system, the operation of both networks becomes challenging and
cannot be decoupled. The new flexible resources (e.g., demand response, DR) are also at the distribution system level, which requires new TSO-DSO
technical protocols for their activation and management.
This increasing cooperation will mean additional data to be integrated and explored in the management tasks of both TSO and DSO. One trend is the
development of tools capable of estimating the flexibility range of active and
reactive power at the TSO-DSO boundary and separating this flexibility by total cost. The same exercise can be conducted for lower voltage levels
of the power system.
For dynamic analysis, the trend is to estimate the dynamic response of load
aggregated at the network node level for a time domain between one and several seconds. One example is probabilistic methodologies based on processing and classifying large amounts of historical load data at each bus and standard
dynamic signatures of individual load categories obtained from laboratory/field tests. Another is dynamic equivalent models constructed for the distribution networks that are able to reflect the aggregated behavior of different
resources with respect to system requirements such as frequency containment
reserve. Machine learning algorithms, such as artificial neural networks, can be used as surrogate models for the dynamic equivalents.
3 DISTRIBUTION SYSTEM
The big data trends in the distribution system are mainly driven by two objectives. Firstly, increase the monitoring capability of MV and LV networks and
develop fast decision-aid methods for operators. Secondly, implement predictive active management strategies that take advantage of flexibility from
distributed energy resources to mitigate the impact of RES uncertainty and
The smart grid paradigm increases the monitoring capability of the distribution
system. However, it might be unmanageable to have real-time monitoring of all the devices in the distribution system, particularly at the LV level. Machine
learning algorithms installed in intelligent electronic devices can support
power system monitoring by providing several functionalities, such as reconstruction of missing signals, state estimation, asset monitoring and diagnosis,
and fault location. These functions should have low computational requirements (e.g., no need to store data, ability to run on low-cost processors) and the possibility to adjust under evolving conditions.
For LV grids, the trend is to explore data collected from smart meters and RTU installed in MV/LV substations to give operators close to real-time situational awareness
with low communication costs. Smart meter data can be used
to increase the knowledge about the LV network topology and characteristics. For instance, it can be used to reduce geographical information system errors
(e.g., connectivity errors in the network topology) and for phase detection.
Data-driven methods, such as autoencoder extreme learning machines
(AE-ELM), can be employed to estimate, close to real time, voltage magnitude and active power for all nodes of the LV network by using only a subset of
meters with real-time communication capability [36,37]. This new smart grid
function can generate under/overvoltage alarms for operators and trigger control management functions to solve the technical problems. These techniques
provide accurate information about voltage magnitude. With only 30% of the
total meters having real-time communication, the AE-ELM state estimator estimates: (i) voltage magnitude values with a mean absolute error (MAE)
of 0.49 V; (ii) active power quantities with an MAE of 0.35 kW. The largest
MAE was 0.79 V.
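For reference, the MAE metric used to report these results is the average of the absolute differences between measured and estimated values. The sketch below computes it for a handful of hypothetical node voltages; the numbers are invented for illustration and are unrelated to the reported 0.49 V.

```python
def mean_absolute_error(measured, estimated):
    """Average absolute difference between two equally long sequences."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - e) for m, e in zip(measured, estimated)) / len(measured)


# Hypothetical LV node voltages in volts: metered values vs. estimator output.
measured_v = [230.0, 231.5, 228.0, 229.5]
estimated_v = [230.5, 231.0, 228.5, 229.0]
mae_v = mean_absolute_error(measured_v, estimated_v)
print(mae_v)
```

Because MAE is expressed in the unit of the variable, a voltage MAE in volts and an active power MAE in kilowatts cannot be compared directly with each other.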
The challenge is how to monitor the operating conditions of multiple LV
networks at the same time and derive control strategies to solve detected technical problems. This problem requires new techniques for data streaming visualization and dimension reduction that summarize the operating conditions of