טכניון מכון טכנולוגי לישראל
הטכניון מכון טכנולוגי לישראל - בית הספר ללימודי מוסמכים  
Ph.D Thesis
Ph.D StudentDalal Gal
SubjectDecision Making under Uncertainty: Power System Applications
and Finite-Time Analyses
DepartmentDepartment of Electrical Engineering
Supervisor Professor Shie Mannor
Full Thesis textFull thesis text - English Version


Abstract

The first part of this thesis focuses on decision-making processes for power networks. These can be categorized per four time-scales: real-time, short-term, mid-term and long-term; each encompasses its own set of visible information and considerations. Examples are day-ahead generator scheduling, months-ahead asset management, and years-ahead system development. We put our primary emphasis on mid-term asset management. We formulate it as a chance-constrained stochastic program, which proves challenging due to high-dimensional decision variables in the case of large networks; non-convex mathematical forms; sub-problems encapsulating shorter time-scale decision making; and high uncertainty stemming from intermittent energy. To tackle the computational burden of hierarchically simulating lower-level decision making, we introduce a novel concept -- using machine learning for designing `proxies' that quickly approximate outcomes of short-term decisions.

An additional, natural approach for solving such dynamic control problems is reinforcement learning (RL). To facilitate joint decision making among several stakeholders, we propose a new hierarchical RL model to be served as a proxy for real-time power grid reliability. Our matching algorithm alternates between slow time-scale policy improvement and fast time-scale value function evaluation.


The second part of this thesis provides novel finite sample analyses of widely used policy evaluation algorithms in RL. Namely, we consider the case of function approximation, on which existing literature is relatively scarce and apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Using recently developed stochastic approximation techniques, we provide convergence rates both in expectation and with high-probability for the celebrated TD(0) and obviate artificial alterations by exploiting strong properties of it. Additionally, we introduce the first-ever finite sample analysis for two-timescale stochastic approximation. It thus sheds light on the finite-time behavior and stepsize selection for the famous Gradient-TD algorithms. Moreover, for the first time in the RL literature, our results are also applicable to non-square-summable stepsizes .