Mental Health Forecasting with R
Forecasting fluctuations in mental health indicators is an increasingly active research area in both statistics and clinical data science. Although some argue that psychological states are too complex and individualized to predict, several studies suggest that, when the problem is correctly formulated and modeled, mood and mental health trajectories can be forecast with a meaningful level of accuracy. This view emphasizes building robust statistical and machine learning models through careful variable selection, hyperparameter tuning, and appropriate functional forms. A related strand of research applies time series analysis and decomposition to forecast future mental health states.
Previous studies attempting to forecast mental health patterns can be classified into three strands according to the estimation techniques and variables they use. The first strand uses linear or non-linear regression techniques, often incorporating demographic and clinical covariates. The second strand draws on time series and econometric methods, such as ARIMA, autoregressive distributed lag (ARDL) models, quantile regression (QR), and Granger causality tests, to forecast mood scores, symptom severity, or therapy adherence. The third strand uses modern machine learning and deep learning tools to predict mental health outcomes such as depression relapse or anxiety episodes.
A time series is a sequence of data points ordered in time, and forecasting models are fitted to predict future values from previously observed patterns, optionally alongside time-varying or time-invariant covariates. Three properties should be checked when working with mental health time series:
- Is the time series stationary?
- Is the dependent variable autocorrelated?
- Is there seasonality in the time series?
Stationarity
A time series is stationary if its statistical properties, such as its mean, variance, and autocorrelation structure, do not change over time. Stationarity matters because, in its absence, a forecasting model fitted on one time window can perform inconsistently on another. For mental health data such as daily mood scores or PHQ-9 ratings, stationarity is what allows summary statistics like means, variances, and correlations to describe the data accurately across all time points of interest.
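One informal way to check this is to split a series into consecutive windows and compare their means and variances. The sketch below uses simulated data rather than the mood dataset from this article, and `window_stats()` and both series are hypothetical names introduced only for illustration.

```r
# Informal stationarity check: compare summary statistics across time windows.
# All data here are simulated; window_stats() is a hypothetical helper.
set.seed(42)
n <- 400
days <- seq_len(n)

stable_mood   <- 6 + rnorm(n, sd = 0.8)                # constant mean and variance
trending_mood <- 4 + 0.01 * days + rnorm(n, sd = 0.8)  # mean drifts upward over time

window_stats <- function(x, n_windows = 4) {
  # Assign each observation to one of n_windows consecutive blocks
  window_id <- ceiling(seq_along(x) * n_windows / length(x))
  data.frame(
    window   = seq_len(n_windows),
    mean     = as.numeric(tapply(x, window_id, mean)),
    variance = as.numeric(tapply(x, window_id, var))
  )
}

stable_stats   <- window_stats(stable_mood)
trending_stats <- window_stats(trending_mood)

print(stable_stats)    # window means stay close to one another
print(trending_stats)  # window means rise from one window to the next
```

This is only a sanity check. A formal unit root test, such as `PP.test()` from base R's stats package, would be a natural next step.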
Autocorrelation
Autocorrelation measures the relationship between a variable’s current value and its past values — in other words, how much today’s mood predicts tomorrow’s mood. This temporal dependency can appear in a correlogram as a sinusoidal pattern, indicating that similar mood states recur at a predictable lag (e.g., weekly cycles driven by work schedules or therapy sessions).
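As a rough illustration, the sample autocorrelation can be computed with base R's `acf()`. The series below is a simulated first-order autoregressive process, and the carry-over strength `phi = 0.7` is an assumed value, not an estimate from real mood data.

```r
# Sample autocorrelation of a simulated AR(1) series (not real mood data).
set.seed(1)
phi <- 0.7  # assumed strength of day-to-day carry-over
mood_dev <- arima.sim(model = list(ar = phi), n = 1000)  # deviations from a mean mood

acf_result <- acf(mood_dev, lag.max = 14, plot = FALSE)

# Element 1 of acf_result$acf is lag 0 (always 1); element k + 1 holds lag k
lag1 <- acf_result$acf[2]
lag7 <- acf_result$acf[8]
cat("lag 1:", round(lag1, 2), " lag 7:", round(lag7, 2), "\n")

# plot(acf_result) would draw the correlogram
```

For a process like this one, the autocorrelation is strongest at lag 1 and fades as the lag grows, so yesterday's mood is more informative than last week's.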
Seasonality
Seasonality refers to periodic fluctuations occurring at regular intervals. In mental health contexts, this is especially meaningful: depressive symptoms often worsen in winter months (seasonal affective disorder), anxiety may peak around academic or fiscal deadlines, and mood can cycle weekly around social routines. Seasonality can be identified from an autocorrelation plot with a sinusoidal shape — the period of the sinusoid gives the length of the season.
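Reading the season length off an autocorrelation function can be sketched as follows. The weekly cycle, its amplitude, and the noise level are all simulated assumptions, and the search is restricted to the first ten lags so that only one full cycle is in view.

```r
# Find the dominant seasonal lag from the ACF of a simulated weekly cycle.
set.seed(7)
n_days <- 364
day <- seq_len(n_days)
weekly_mood <- 6 + sin(2 * pi * day / 7) + rnorm(n_days, sd = 0.3)

acf_result <- acf(weekly_mood, lag.max = 10, plot = FALSE)

# Drop lag 0, then find the lag with the highest autocorrelation
lags <- as.vector(acf_result$lag)[-1]
acf_values <- as.vector(acf_result$acf)[-1]
season_length <- lags[which.max(acf_values)]

cat("Estimated season length:", season_length, "days\n")
```

With real data the peak is usually less clean, so it helps to inspect the full correlogram rather than rely on a single maximum.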
Mental health time series are not always stationary. Non-stationary processes have means and variances that change over time, whereas stationary processes fluctuate around a constant long-term mean. Modeling non-stationary data with traditional methods can produce spurious associations, suggesting a relationship where none exists. To obtain consistent, reliable results, non-stationary mental health data should first be transformed into stationary data, for example by differencing or detrending. Autocorrelation and seasonality in psychological and ecological momentary assessment (EMA) data have both been discussed extensively in the literature.
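One common transformation is first-order differencing, which replaces each value with its change from the previous time point. The sketch below applies base R's `diff()` to a simulated series with a linear trend; the slope and noise level are assumptions chosen for illustration.

```r
# First-order differencing removes a linear trend from a simulated series.
set.seed(123)
n <- 730
day <- seq_len(n)
slope <- 0.005
trending_mood <- 3 + slope * day + rnorm(n, sd = 0.5)

# diff() returns x[t] - x[t - 1], so the result is one observation shorter
mood_diff <- diff(trending_mood)

# Compare the mean of the second half with the mean of the first half
half <- n %/% 2
level_shift <- mean(trending_mood[(half + 1):n]) - mean(trending_mood[1:half])
diff_shift  <- mean(mood_diff[(half + 1):(n - 1)]) - mean(mood_diff[1:half])

cat("Shift in mean, original series:   ", round(level_shift, 2), "\n")
cat("Shift in mean, differenced series:", round(diff_shift, 2), "\n")
```

The original series has a clearly different mean in its second half, while the differenced series does not. Differencing addresses a drifting level only; a changing variance usually calls for a different transformation.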
Here we focus on forecasting daily mood scores using the tidymodels and modeltime frameworks, validating on a time-ordered holdout set. The dataset is a simulated daily self-reported mood index (scale 1–10) covering several years, loaded from a CSV export from an EMA platform.
library(PerformanceAnalytics)
library(tidyverse)
library(modeldata)
library(forecast)
library(tidymodels)
library(modeltime)
library(timetk)
library(lubridate)
mood_data <- read_csv("daily_mood_scores.csv")
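The CSV itself is not included here. For readers who want to run the later steps, the sketch below builds a stand-in data frame with the column names the later code relies on (`date`, `daily_mood_score`, `daily_mood_change`, `weekly_mood_change`). The generating process, including the seasonal amplitudes, the noise level, and how the two change columns are defined, is purely an assumption for illustration.

```r
# Hypothetical stand-in for daily_mood_scores.csv, base R only.
# The generating process below is an assumption, not the article's data.
set.seed(2024)
dates <- seq(as.Date("2019-01-01"), as.Date("2022-12-31"), by = "day")
n <- length(dates)
day_of_year <- as.numeric(format(dates, "%j"))
day_index <- seq_len(n)

score <- 6 +
  0.8 * cos(2 * pi * (day_of_year - 182) / 365.25) +  # lower in winter, higher in summer
  0.4 * sin(2 * pi * day_index / 7) +                  # weekly rhythm
  rnorm(n, sd = 0.6)                                   # day-to-day noise
score <- pmin(pmax(round(score, 1), 1), 10)            # keep within the 1-10 scale

mood_data_sim <- data.frame(
  date = dates,
  daily_mood_score = score,
  daily_mood_change = c(NA, diff(score)),                   # change from the previous day
  weekly_mood_change = c(rep(NA, 7), diff(score, lag = 7))  # change from 7 days earlier
)

str(mood_data_sim)
```

Assigning `mood_data <- mood_data_sim` would let the remaining code run without the CSV.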
Plotting the daily and weekly change scores gives a first look at how mood fluctuates over different periods. Base R's plot() is used here; timetk's plot_time_series() offers a date-aware alternative:
plot(mood_data$daily_mood_change)
plot(mood_data$weekly_mood_change)
Next we convert the date column to the Date class and split the data chronologically, with the first 80% of observations used for training and the final 20% held out for testing:
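# Parse the date column and make sure rows are in chronological order
mood_ts <- mood_data %>%
  mutate(date = as.Date(date)) %>%
  arrange(date)
# Create the split once: the first 80% of rows train, the last 20% test
splits <- initial_time_split(mood_ts, prop = 0.8)
train_data <- training(splits)
test_data <- testing(splits)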
Visualizing the train/test split makes it easy to verify that temporal ordering is preserved, a critical requirement for mental health time series, where future data must never inform past predictions:
train_data %>%
  mutate(type = "train") %>%
  bind_rows(test_data %>% mutate(type = "test")) %>%
  ggplot(aes(x = date, y = daily_mood_score, color = type)) +
  geom_line()
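A single holdout split yields one estimate of forecast accuracy. Rolling-origin resampling repeats the same idea over several consecutive windows, always training on the past and assessing on the period that follows. The helper below is a base R sketch with hypothetical names (`rolling_origin_folds`, `initial`, `assess`); it is not the API of rsample or timetk.

```r
# Rolling-origin folds as row indices: each fold trains on everything up to
# train_end and is assessed on the next `assess` rows. Hypothetical helper.
rolling_origin_folds <- function(n, initial, assess) {
  stopifnot(initial + assess <= n)
  train_ends <- seq(from = initial, to = n - assess, by = assess)
  lapply(train_ends, function(train_end) {
    list(
      train = seq_len(train_end),
      test  = (train_end + 1):(train_end + assess)
    )
  })
}

folds <- rolling_origin_folds(n = 100, initial = 60, assess = 10)

for (i in seq_along(folds)) {
  cat(sprintf("Fold %d: train 1-%d, test %d-%d\n",
              i, max(folds[[i]]$train), min(folds[[i]]$test), max(folds[[i]]$test)))
}
```

In every fold the largest training index is smaller than the smallest test index, which is the property that keeps future observations out of model fitting.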
Three models are fitted on the training data. First, an ARIMA model capturing autoregressive mood dynamics:
arima_model <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(daily_mood_score ~ date, data = train_data)
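To make "autoregressive" concrete, the sketch below fits a fixed AR(1) model with base R's `arima()` on simulated data and forecasts one week ahead. This differs from the `auto_arima` engine, which selects the model orders automatically; the order (1, 0, 0), the coefficient 0.7, and the simulated series are all assumptions for illustration.

```r
# Fit an AR(1) model to a simulated series and forecast 7 steps ahead.
set.seed(99)
true_phi <- 0.7
mood_sim <- 6 + arima.sim(model = list(ar = true_phi), n = 1000)

ar_fit <- arima(mood_sim, order = c(1, 0, 0))  # includes a mean term by default
print(round(coef(ar_fit), 2))                  # estimated ar1 and intercept (mean)

forecast_7 <- predict(ar_fit, n.ahead = 7)
print(round(forecast_7$pred, 2))  # point forecasts revert toward the mean
print(round(forecast_7$se, 2))    # standard errors grow with the horizon
```

The growing standard errors are a reminder that an autoregressive model becomes less informative the further ahead it is asked to predict.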
Second, a Prophet regression model that explicitly handles seasonality — well-suited for weekly and yearly mood cycles:
prophet_model <- prophet_reg() %>%
  set_engine("prophet") %>%
  fit(daily_mood_score ~ date, data = train_data)
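Prophet represents periodic effects with Fourier terms, that is, pairs of sine and cosine curves. The sketch below illustrates that idea with ordinary least squares on simulated data. It is not Prophet itself and leaves out its trend changepoints and other components; the amplitudes and variable names are assumptions.

```r
# Weekly seasonality captured by one sine/cosine pair, fitted with lm().
set.seed(11)
n_days <- 364
day <- seq_len(n_days)
weekly_mood <- 6 + 1.0 * sin(2 * pi * day / 7) + 0.5 * cos(2 * pi * day / 7) +
  rnorm(n_days, sd = 0.3)

sim <- data.frame(
  mood     = weekly_mood,
  sin_week = sin(2 * pi * day / 7),
  cos_week = cos(2 * pi * day / 7)
)

fourier_fit <- lm(mood ~ sin_week + cos_week, data = sim)
print(round(coef(fourier_fit), 2))  # estimates should sit near 6, 1.0 and 0.5

# Forecast the next week by evaluating the same terms at future day indices
future_day <- (n_days + 1):(n_days + 7)
future <- data.frame(
  sin_week = sin(2 * pi * future_day / 7),
  cos_week = cos(2 * pi * future_day / 7)
)
next_week <- predict(fourier_fit, newdata = future)
print(round(next_week, 2))
```

Adding more sine and cosine pairs at higher frequencies lets the fitted seasonal shape depart from a pure sinusoid, at the cost of more parameters.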
Third, a linear regression model with month as a factor, capturing seasonal effects like winter depression:
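# month(label = TRUE) returns an ordered factor; ordered = FALSE makes lm()
# estimate one dummy coefficient per month instead of polynomial contrasts
tslm_model <- linear_reg() %>%
  set_engine("lm") %>%
  fit(daily_mood_score ~ as.numeric(date) + factor(month(date, label = TRUE), ordered = FALSE),
      data = train_data)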
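What the month factor contributes can be seen in a base R sketch on simulated data, where mood is assumed to drop by one point in December, January, and February. The size of the effect, the noise level, and the variable names are illustrative assumptions.

```r
# Linear trend plus month dummies on simulated data with a winter dip.
set.seed(5)
dates <- seq(as.Date("2020-01-01"), as.Date("2022-12-31"), by = "day")
month_code <- format(dates, "%m")  # "01" ... "12", independent of locale
is_winter <- month_code %in% c("12", "01", "02")
mood <- 6.5 - 1.0 * is_winter + rnorm(length(dates), sd = 0.5)

sim <- data.frame(date = dates, mood = mood, month = factor(month_code))

month_fit <- lm(mood ~ as.numeric(date) + month, data = sim)

# January ("01") is the reference level, so each month coefficient is the
# estimated difference from January after accounting for the linear trend
july_vs_january     <- coef(month_fit)[["month07"]]
december_vs_january <- coef(month_fit)[["month12"]]
cat("July vs January:    ", round(july_vs_january, 2), "\n")
cat("December vs January:", round(december_vs_january, 2), "\n")
```

The July coefficient recovers most of the simulated winter dip, while December, itself a winter month, stays close to the January baseline.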
All three models are combined into a modeltime table for unified comparison:
forecast_table <- modeltime_table(
  arima_model,
  prophet_model,
  tslm_model
)
Model performance on the held-out test set is then evaluated:
forecast_table %>%
  modeltime_calibrate(test_data) %>%
  modeltime_accuracy()
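Among the metrics that `modeltime_accuracy()` reports by default are MAE, MAPE, and RMSE. The toy example below computes them by hand to show what each one measures; the four actual and predicted mood scores are made-up values.

```r
# Hand-computed accuracy metrics on a toy example (made-up values).
actual    <- c(6, 7, 5, 8)
predicted <- c(5, 7, 6, 6)
errors <- actual - predicted  # 1, 0, -1, 2

mae  <- mean(abs(errors))                 # average absolute error
rmse <- sqrt(mean(errors^2))              # penalizes large errors more heavily
mape <- mean(abs(errors / actual)) * 100  # average error as a percentage of actual

print(round(c(mae = mae, rmse = rmse, mape = mape), 2))
# mae = 1.00, rmse = 1.22, mape = 15.42
```

Because RMSE squares the errors before averaging, the single two-point miss pulls it above the MAE, which is why the two metrics can rank models differently.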
Finally, the comparative forecast trajectories can be visualized to assess which model best tracks actual mood fluctuations over time:
# Forecast over the test window and overlay the observed test values
forecast_table %>%
  modeltime_calibrate(test_data) %>%
  modeltime_forecast(new_data = test_data, actual_data = test_data) %>%
  plot_modeltime_forecast()