Burnout Syndrome is one of the most prevalent mental health conditions of our time: surveys have found that 49% of the Swiss working population is at elevated risk of burnout. At the same time, identifying Burnout Syndrome is complex. This thesis explores methods from the field of Natural Language Processing (NLP) to detect burnout from text data. Such an approach could be used in clinical settings to support mental health professionals in their assessments.
This research-oriented thesis explores the possibilities of clinical burnout detection using methods from Machine Learning and Data Analysis. In a first step, we aimed to detect burnout; in a second step, we tried to isolate and distinguish burnout from depression, another prevalent mental health condition that often overlaps or co-occurs with burnout.
Since we did not have access to authentic clinical text data from burnout or depression patients, the data set had to be assembled by hand from various sources such as social media platforms, articles and interviews. Considerable care was needed to avoid introducing unwanted bias, as the different sources exhibited different language styles. The resulting data set is fully anonymous and has the labels Burnout, Depression and Control.
First, we trained a Random Forest classifier as a baseline. The main model developed in this project is a fine-tuned BERT model for the German language. BERT is a transformer-based, state-of-the-art NLP model introduced by Google in 2018. Fine-tuning is done by adding layers on top of the BERT-base model and training with supervised learning. Finally, we explored Affective Word Lists, which are dictionaries with numerical scores indicating, for example, how strongly a word is associated with positive or negative emotions.
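To illustrate how an affective word list can be turned into a text-level feature, the following is a minimal sketch rather than the procedure used in this thesis. The lexicon `TOY_VALENCE`, its scores, and the helper names `tokenize` and `mean_valence` are hypothetical and chosen only for illustration; real word lists are much larger and may use different scales.

```python
import re

# Hypothetical toy lexicon: word -> valence score in [-1, 1].
# The entries and values are made up for illustration only.
TOY_VALENCE = {
    "erschöpft": -0.8,
    "müde": -0.5,
    "hoffnungslos": -0.9,
    "zufrieden": 0.7,
    "glücklich": 0.9,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def mean_valence(text, lexicon=TOY_VALENCE):
    """Average the scores of all tokens that appear in the lexicon.

    Returns None when no token is covered, so that "no evidence"
    is not confused with a neutral score of 0.0.
    """
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)
```

Averaging only over covered tokens keeps the score independent of text length, at the cost of ignoring how many words the lexicon missed; a coverage ratio could be reported alongside it.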
We obtained promising results with the fine-tuned BERT model, which reached an overall accuracy of 81.6%. The robustness of the model was stress-tested using keyword masking and cross-validation. In addition, some interesting linguistic markers specific to burnout and depression could be identified using data-analytic methods.
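As a rough illustration of keyword masking, the sketch below assumes that it means replacing label-revealing terms with a placeholder before classification, so that a model cannot rely on them alone. The keyword list, the `[MASK]` placeholder, and the function name `mask_keywords` are assumptions for this example; the terms actually masked in this work are not listed here.

```python
import re

# Hypothetical keyword list; not taken from the thesis.
KEYWORDS = ["burnout", "depression", "depressiv", "ausgebrannt"]


def mask_keywords(text, keywords=KEYWORDS, mask="[MASK]"):
    """Replace every word that starts with a keyword by the mask token."""
    # Try longer keywords first so they win over shorter alternatives.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # \w* also swallows inflections and suffixes, e.g. "Depressionen".
    pattern = re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)
    return pattern.sub(mask, text)
```

Comparing a classifier's accuracy on the original and the masked texts then gives an indication of how much it depends on these explicit terms.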