Technion - Israel Institute of Technology, Graduate School
M.Sc Thesis
M.Sc Student: Ashkenazi Ortal
Subject: Creating Screening Questionnaires from Archival Social Media Data
Department: Department of Industrial Engineering and Management
Supervisors: Dr. Ofra Amir, Dr. Elad Yom-Tov


Abstract

Screening questionnaires are used in medicine as a diagnostic aid. Creating them is a long and expensive process, which might be automated through analysis of social media posts related to symptoms and behaviors prior to diagnosis.

Here we propose a method for generating a screening questionnaire for a given medical condition from social media postings. The method first identifies a cohort of users through their posts in dedicated patient groups, along with a control group of users who reported similar symptoms but did not report being diagnosed with the condition of interest. Posts made prior to diagnosis are then used to generate decision rules that distinguish between the groups, by clustering the symptoms mentioned in those posts and training a decision tree. We validate the generated rules by correlating their output with the scores that medical doctors gave to matching cases.

Questionnaires for three conditions (endometriosis, lupus, and gout) were produced using data from several hundred Reddit users and rated by doctors. The average Pearson’s correlation between the medical doctors’ scores and the decision rules was 0.58 (endometriosis), 0.40 (lupus), and 0.27 (gout).
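The validation measure reported above, Pearson's correlation, can be illustrated with a minimal sketch. The function name `pearson` and the two small lists are hypothetical and made up for illustration only; they are not data from the thesis, and the exact way rule outputs and doctor scores were encoded is an assumption here (a binary rule output against a numeric doctor score).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical example (not thesis data): 1 means the decision rule flags
# the case, and doctor_scores are illustrative ratings of the same cases.
rule_outputs = [1, 0, 1, 1, 0, 0]
doctor_scores = [4, 2, 5, 3, 1, 2]
r = pearson(rule_outputs, doctor_scores)
```

A value of `r` near 1 would mean the rules and the doctors rank the cases similarly, while a value near 0 would mean little linear agreement.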

Our results suggest that the process of questionnaire generation can be, at least partly, automated. The generated questionnaires have the advantage of being based on the experience of real people, but they currently fall short in capturing the context, duration, and timing of symptoms.