Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Oliker Nurit
Subject: Multivariate Data Analysis for Contamination Event Detection in Water Distribution Systems
Department: Department of Civil and Environmental Engineering
Supervisor: Professor Avi Ostfeld
Full thesis text: English version


Abstract

This study develops two event detection models that form a decision support system for alerting on contamination events in water distribution systems. The models use on-line measurements of general water quality parameters (e.g. pH, turbidity, conductivity) that are already monitored in water networks.

Both models apply multivariate analysis to the measurement data: they examine each measurement relative to the multivariate system, explore the relations between water quality parameters, and detect changes in their mutual patterns. The first model is a weighted support vector machine (SVM), a supervised classifier with autonomous calibration of its parameters. The second is an unsupervised minimum volume ellipsoid (MVE) model.

The models include a preliminary data cleaning step, which removes noisy and erroneous measurements from the data, and two major modular elements: a classifier that detects outlier measurements, followed by a sequence analysis that classifies events. Both models are updated continuously and exploit a constantly growing database.
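The two-stage structure (per-measurement outlier flags, then a sequence analysis that declares events) can be illustrated with a minimal sketch. The abstract does not state the actual sequence rule, so the sliding-window vote below, the function name `detect_events`, and the parameters `window` and `min_outliers` are assumptions made only for illustration.

```python
from collections import deque


def detect_events(outlier_flags, window=5, min_outliers=3):
    """Return one boolean per time step, True where an event is declared.

    Assumed rule (not taken from the thesis): an event is declared when
    at least `min_outliers` of the last `window` measurements were
    flagged as outliers by the classifier stage.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` flags
    alarms = []
    for flag in outlier_flags:
        recent.append(bool(flag))
        alarms.append(sum(recent) >= min_outliers)
    return alarms


# Hypothetical classifier output: 1 = outlier, 0 = normal measurement.
flags = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0]
alarms = detect_events(flags, window=5, min_outliers=3)
```

Under this rule a single isolated outlier never raises an alarm, while a cluster of outliers does, which is the purpose of separating outlier detection from event classification.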

The classic SVM was extended to a weighted model that assigns a different weight to each measurement vector when constructing the classifier. The weights accomplish two goals: compensating for the difference in size between the two classes' data sets (there are far more normal than event-time measurements), and accounting for time through a time decay coefficient, which ascribes higher importance to recent observations when classifying the measurement at a given time step. All model parameters were determined by data-driven optimization, so the calibration of the model is completely autonomous.
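The weighting idea described above can be sketched as follows. The exact weight formula is not given in the abstract, so the product of an inverse-class-frequency term and an exponential time decay term is an assumption, as are the function name `sample_weights` and the parameter `decay`. Weights of this kind could, for example, be passed as per-sample weights when fitting an SVM.

```python
def sample_weights(labels, timestamps, now, decay=0.99):
    """Compute one weight per measurement vector.

    Assumed form (illustrative only):
      weight = class_balance * decay ** (now - t)
    where class_balance = n / (k * n_class), so that every class
    carries the same total weight when decay == 1.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)

    weights = []
    for y, t in zip(labels, timestamps):
        class_w = n / (k * counts[y])   # rarer class gets a larger weight
        time_w = decay ** (now - t)     # older measurements get a smaller weight
        weights.append(class_w * time_w)
    return weights


# Hypothetical data: three normal measurements (0) and one event measurement (1).
labels = [0, 0, 0, 1]
timestamps = [0, 1, 2, 3]
w = sample_weights(labels, timestamps, now=3, decay=0.5)
```

With `decay=1.0` only the class balancing remains, which makes it easy to check that both classes contribute the same total weight.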

The MVE model applies an unsupervised classification method, which eliminates the need for a labeled data set (i.e. event examples) when constructing the classifier. In the absence of sufficient knowledge about how a contamination event influences the measured parameters, avoiding any assumption about how an event manifests contributes to the model's reliability and generality.
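Once an ellipsoid has been fitted to normal measurements, classifying a new measurement reduces to a membership test. The sketch below shows only that test, not the fitting procedure, which the abstract does not describe. The function name `inside_ellipsoid`, the center, and the shape matrix are hypothetical values chosen for illustration.

```python
def inside_ellipsoid(x, center, shape):
    """Return True if x lies inside the ellipsoid
    {x : (x - c)^T A (x - c) <= 1}, with center c and
    symmetric positive definite shape matrix A.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    # Quadratic form d^T A d
    q = sum(d[i] * shape[i][j] * d[j] for i in range(n) for j in range(n))
    return q <= 1.0


# Hypothetical 2-D example with (pH, conductivity) measurements.
# Semi-axes of 0.5 pH units and 50 conductivity units give A = diag(1/0.5^2, 1/50^2).
center = [7.5, 500.0]
shape = [[4.0, 0.0],
         [0.0, 0.0004]]

normal_point = inside_ellipsoid([7.6, 510.0], center, shape)   # close to the center
outlier_point = inside_ellipsoid([8.2, 500.0], center, shape)  # pH far from the center
```

A measurement that falls outside the ellipsoid would be treated as an outlier, with no event examples needed to define the boundary.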

The main innovations of the study in the field of contamination event detection are the multivariate data analysis, the application of weighted SVM and MVE models, and the use of an unsupervised classification model, which reduces the reliance on unfounded simulated events.

The models were trained and tested on real water utility data and showed convincing results compared to previous studies. Overall, the MVE model was superior in all aspects, featuring better detection ability and higher classification accuracy.