Flu Data Analysis
Fitting
Model Fitting
For the purpose of this exercise, the main predictor of interest is RunnyNose, although we also care about all other predictors. We are going to fit one model for each of the two outcomes (body temperature and nausea) using RunnyNose alone. In addition, we are going to fit one model for each of the two outcomes using all predictors. While fitting to one outcome, the other outcome is treated as a predictor (e.g., when fitting body temperature, nausea is a predictor).
This should all be completed with the tidymodels framework. We will be using the following commands:
linear_reg()
set_engine("lm")
logistic_reg()
set_engine("glm")
1. Load Data + Fitting Prep
First, let's load the packages needed for the fitting.
library(dplyr)
library(tidyverse)
library(recipes)
library(here)
library(parsnip)
library(dotwhisker)
library(performance)
Now, we can load and preview the processed data set.
flu_clean <- readRDS(here::here("fluanalysis", "data", "processed_data", "flu_data_processed"))
glimpse(flu_clean)
Rows: 730
Columns: 32
$ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y…
$ ChestCongestion <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ ChillsSweats <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, …
$ NasalCongestion <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ CoughYN <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, …
$ Sneeze <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, …
$ Fatigue <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ SubjectiveFever <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes…
$ Headache <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes…
$ Weakness <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi…
$ WeaknessYN <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ CoughIntensity <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve…
$ CoughYN2 <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes…
$ Myalgia <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, …
$ MyalgiaYN <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ RunnyNose <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No…
$ AbPain <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N…
$ ChestPain <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, …
$ Diarrhea <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,…
$ EyePn <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye…
$ Insomnia <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y…
$ ItchyEye <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,…
$ Nausea <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y…
$ EarPn <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y…
$ Hearing <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,…
$ Pharyngitis <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, …
$ Breathless <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, …
$ ToothPn <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N…
$ Vision <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ Vomit <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N…
$ Wheeze <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N…
$ BodyTemp <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, …
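Finally, we can use set_engine() to create generic linear and logistic regression model specifications. These are not fitted yet; they only record the model type and the engine that will do the fitting.
linear <- linear_reg() %>%
  set_engine("lm")
logistic <- logistic_reg() %>%
  set_engine("glm")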
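To make the difference between a specification and a fitted model concrete, here is a minimal sketch. It assumes the parsnip package is installed, and it uses a small simulated data frame (`toy`, a hypothetical stand-in, not the flu data) so that it runs on its own.

```r
library(parsnip)

# Hypothetical toy data standing in for flu_clean (not the real study data)
set.seed(123)
toy <- data.frame(
  RunnyNose = factor(sample(c("No", "Yes"), 100, replace = TRUE)),
  BodyTemp  = rnorm(100, mean = 99, sd = 1)
)

# A specification only records the model type and engine; nothing is estimated yet
linear_spec <- linear_reg() %>%
  set_engine("lm")

# Estimation happens when the specification meets a formula and data
toy_fit <- linear_spec %>%
  fit(BodyTemp ~ RunnyNose, data = toy)

# The underlying stats::lm object is stored in the $fit element
class(toy_fit$fit)
```

The same specification object can be reused for several formulas, which is why it is created once here and then piped into fit() in each of the following sections.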
2. Linear Model Fitting
We are going to fit a linear model to the continuous outcome, body temperature, using the main predictor of interest, runny nose.
lm_fit_rn <- linear %>%
  fit(BodyTemp ~ RunnyNose, data = flu_clean)
lm_fit_rn
parsnip model object
Call:
stats::lm(formula = BodyTemp ~ RunnyNose, data = data)
Coefficients:
(Intercept) RunnyNoseYes
99.1431 -0.2926
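With a single two-level factor as the predictor, ordinary least squares reduces to a comparison of group means: the intercept is the mean outcome in the reference group (RunnyNose = No) and the slope is the difference between the two group means. Read that way, the output above says the average body temperature is about 99.14 without a runny nose and about 0.29 lower with one. The sketch below checks this property on simulated data (the `toy` data frame and its effect size are made up for illustration) using base R's lm(), which is the engine behind the parsnip fit.

```r
# Hypothetical toy data; the -0.3 effect is an arbitrary choice for illustration
set.seed(42)
toy <- data.frame(
  RunnyNose = factor(sample(c("No", "Yes"), 200, replace = TRUE))
)
toy$BodyTemp <- 99 + rnorm(200, sd = 1) - 0.3 * (toy$RunnyNose == "Yes")

fit <- lm(BodyTemp ~ RunnyNose, data = toy)

# Mean body temperature within each RunnyNose group
group_means <- tapply(toy$BodyTemp, toy$RunnyNose, mean)

# Intercept equals the mean of the reference level ("No")
intercept_matches <- all.equal(unname(coef(fit)[1]), unname(group_means["No"]))

# Slope equals the difference in group means ("Yes" minus "No")
slope_matches <- all.equal(
  unname(coef(fit)[2]),
  unname(group_means["Yes"] - group_means["No"])
)

c(intercept_matches, slope_matches)
```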
3. Linear Model Fitting (cont.)
Now we are going to fit a linear model to the continuous outcome, body temperature, using all the predictors.
lm_fit_all <- linear %>%
  fit(BodyTemp ~ ., data = flu_clean)
lm_fit_all
parsnip model object
Call:
stats::lm(formula = BodyTemp ~ ., data = data)
Coefficients:
(Intercept) SwollenLymphNodesYes ChestCongestionYes
97.925243 -0.165302 0.087326
ChillsSweatsYes NasalCongestionYes CoughYNYes
0.201266 -0.215771 0.313893
SneezeYes FatigueYes SubjectiveFeverYes
-0.361924 0.264762 0.436837
HeadacheYes WeaknessMild WeaknessModerate
0.011453 0.018229 0.098944
WeaknessSevere WeaknessYNYes CoughIntensityMild
0.373435 NA 0.084881
CoughIntensityModerate CoughIntensitySevere CoughYN2Yes
-0.061384 -0.037272 NA
MyalgiaMild MyalgiaModerate MyalgiaSevere
0.164242 -0.024064 -0.129263
MyalgiaYNYes RunnyNoseYes AbPainYes
NA -0.080485 0.031574
ChestPainYes DiarrheaYes EyePnYes
0.105071 -0.156806 0.131544
InsomniaYes ItchyEyeYes NauseaYes
-0.006824 -0.008016 -0.034066
EarPnYes HearingYes PharyngitisYes
0.093790 0.232203 0.317581
BreathlessYes ToothPnYes VisionYes
0.090526 -0.022876 -0.274625
VomitYes WheezeYes
0.165272 -0.046665
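Three coefficients in this output are NA (WeaknessYNYes, CoughYN2Yes, MyalgiaYNYes). lm() reports NA for a term that is an exact linear combination of terms already in the model. A plausible reading, which is an assumption here since the source does not say how these columns were built, is that each yes/no variable is fully determined by its graded counterpart (for example, WeaknessYN is Yes whenever Weakness is anything other than None). The sketch below reproduces that situation on simulated data.

```r
# Hypothetical toy data: WeaknessYN is derived entirely from Weakness
set.seed(1)
severity <- c("None", "Mild", "Moderate", "Severe")
toy <- data.frame(
  Weakness = factor(sample(severity, 200, replace = TRUE), levels = severity)
)
toy$WeaknessYN <- factor(ifelse(toy$Weakness == "None", "No", "Yes"))
toy$BodyTemp <- rnorm(200, mean = 99, sd = 1)

fit <- lm(BodyTemp ~ Weakness + WeaknessYN, data = toy)

# The WeaknessYNYes dummy equals the sum of the three Weakness dummies,
# so it carries no extra information and its coefficient is reported as NA
coef(fit)
```

Dropping the redundant yes/no columns before fitting would remove the NA entries without changing the fitted values.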
4. Linear Model Comparisons
lm_fit_rn
parsnip model object
Call:
stats::lm(formula = BodyTemp ~ RunnyNose, data = data)
Coefficients:
(Intercept) RunnyNoseYes
99.1431 -0.2926
lm_fit_all
parsnip model object
Call:
stats::lm(formula = BodyTemp ~ ., data = data)
Coefficients:
(Intercept) SwollenLymphNodesYes ChestCongestionYes
97.925243 -0.165302 0.087326
ChillsSweatsYes NasalCongestionYes CoughYNYes
0.201266 -0.215771 0.313893
SneezeYes FatigueYes SubjectiveFeverYes
-0.361924 0.264762 0.436837
HeadacheYes WeaknessMild WeaknessModerate
0.011453 0.018229 0.098944
WeaknessSevere WeaknessYNYes CoughIntensityMild
0.373435 NA 0.084881
CoughIntensityModerate CoughIntensitySevere CoughYN2Yes
-0.061384 -0.037272 NA
MyalgiaMild MyalgiaModerate MyalgiaSevere
0.164242 -0.024064 -0.129263
MyalgiaYNYes RunnyNoseYes AbPainYes
NA -0.080485 0.031574
ChestPainYes DiarrheaYes EyePnYes
0.105071 -0.156806 0.131544
InsomniaYes ItchyEyeYes NauseaYes
-0.006824 -0.008016 -0.034066
EarPnYes HearingYes PharyngitisYes
0.093790 0.232203 0.317581
BreathlessYes ToothPnYes VisionYes
0.090526 -0.022876 -0.274625
VomitYes WheezeYes
0.165272 -0.046665
compare_performance(lm_fit_rn,lm_fit_all)
# Comparison of Model Performance Indices
Name | Model | AIC | AIC weights | BIC | BIC weights | R2 | R2 (adj.) | RMSE | Sigma
--------------------------------------------------------------------------------------------------------
lm_fit_rn | _lm | 2329.346 | 2.89e-06 | 2343.125 | 1.00 | 0.012 | 0.011 | 1.188 | 1.190
lm_fit_all | _lm | 2303.840 | 1.000 | 2469.189 | 4.22e-28 | 0.129 | 0.086 | 1.116 | 1.144
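The weight columns can be checked by hand. The usual Akaike-weight formula takes each model's difference from the best (lowest) criterion value, converts it to a relative likelihood with exp(-0.5 * difference), and normalises so the weights sum to one. The sketch below applies that formula to the AIC and BIC values printed in the table; `akaike_weights` is a helper name introduced here for illustration, not a function from the performance package.

```r
# Helper introduced for illustration: information-criterion weights
akaike_weights <- function(ic) {
  delta   <- ic - min(ic)        # difference from the best model
  rel_lik <- exp(-0.5 * delta)   # relative likelihood of each model
  rel_lik / sum(rel_lik)         # normalise to sum to 1
}

# Values copied from the comparison table above
aic <- c(lm_fit_rn = 2329.346, lm_fit_all = 2303.840)
bic <- c(lm_fit_rn = 2343.125, lm_fit_all = 2469.189)

aic_w <- akaike_weights(aic)
bic_w <- akaike_weights(bic)

signif(aic_w, 3)
signif(bic_w, 3)
```

The results agree with the table: AIC puts essentially all of its weight on the full model, while BIC, which penalises the extra parameters more heavily, puts essentially all of its weight on the RunnyNose-only model.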
5. Logistic Model Fitting
glm_fit_rn <- logistic %>%
  fit(Nausea ~ RunnyNose, data = flu_clean)
glm_fit_rn
parsnip model object
Call: stats::glm(formula = Nausea ~ RunnyNose, family = stats::binomial,
data = data)
Coefficients:
(Intercept) RunnyNoseYes
-0.65781 0.05018
Degrees of Freedom: 729 Total (i.e. Null); 728 Residual
Null Deviance: 944.7
Residual Deviance: 944.6 AIC: 948.6
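Logistic regression coefficients are on the log-odds scale, so they are easier to read after conversion. Exponentiating the slope gives an odds ratio, and plogis() turns a log-odds value into a probability. The sketch below does this arithmetic with the two coefficients printed above; the variable names are introduced here for illustration.

```r
# Coefficients copied from the glm_fit_rn output above
b0 <- -0.65781   # intercept: log-odds of nausea when RunnyNose = No
b1 <-  0.05018   # change in log-odds when RunnyNose = Yes

odds_ratio <- exp(b1)          # odds of nausea, RunnyNose Yes vs No
p_no       <- plogis(b0)       # predicted probability of nausea, RunnyNose = No
p_yes      <- plogis(b0 + b1)  # predicted probability of nausea, RunnyNose = Yes

round(c(odds_ratio = odds_ratio, p_no = p_no, p_yes = p_yes), 3)
```

The odds ratio is close to 1 and the two predicted probabilities differ by roughly one percentage point, which is consistent with the residual deviance (944.6) being almost identical to the null deviance (944.7).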
6. Logistic Model Fitting (cont.)
glm_fit_all <- logistic %>%
  fit(Nausea ~ ., data = flu_clean)
glm_fit_all
parsnip model object
Call: stats::glm(formula = Nausea ~ ., family = stats::binomial, data = data)
Coefficients:
(Intercept) SwollenLymphNodesYes ChestCongestionYes
0.222870 -0.251083 0.275554
ChillsSweatsYes NasalCongestionYes CoughYNYes
0.274097 0.425817 -0.140423
SneezeYes FatigueYes SubjectiveFeverYes
0.176724 0.229062 0.277741
HeadacheYes WeaknessMild WeaknessModerate
0.331259 -0.121606 0.310849
WeaknessSevere WeaknessYNYes CoughIntensityMild
0.823187 NA -0.220794
CoughIntensityModerate CoughIntensitySevere CoughYN2Yes
-0.362678 -0.950544 NA
MyalgiaMild MyalgiaModerate MyalgiaSevere
-0.004146 0.204743 0.120758
MyalgiaYNYes RunnyNoseYes AbPainYes
NA 0.045324 0.939304
ChestPainYes DiarrheaYes EyePnYes
0.070777 1.063934 -0.341991
InsomniaYes ItchyEyeYes EarPnYes
0.084175 -0.063364 -0.181719
HearingYes PharyngitisYes BreathlessYes
0.323052 0.275364 0.526801
ToothPnYes VisionYes VomitYes
0.480649 0.125498 2.458466
WheezeYes BodyTemp
-0.304435 -0.031246
Degrees of Freedom: 729 Total (i.e. Null); 695 Residual
Null Deviance: 944.7
Residual Deviance: 751.5 AIC: 821.5
7. Logistic Model Comparisons
glm_fit_rn
parsnip model object
Call: stats::glm(formula = Nausea ~ RunnyNose, family = stats::binomial,
data = data)
Coefficients:
(Intercept) RunnyNoseYes
-0.65781 0.05018
Degrees of Freedom: 729 Total (i.e. Null); 728 Residual
Null Deviance: 944.7
Residual Deviance: 944.6 AIC: 948.6
glm_fit_all
parsnip model object
Call: stats::glm(formula = Nausea ~ ., family = stats::binomial, data = data)
Coefficients:
(Intercept) SwollenLymphNodesYes ChestCongestionYes
0.222870 -0.251083 0.275554
ChillsSweatsYes NasalCongestionYes CoughYNYes
0.274097 0.425817 -0.140423
SneezeYes FatigueYes SubjectiveFeverYes
0.176724 0.229062 0.277741
HeadacheYes WeaknessMild WeaknessModerate
0.331259 -0.121606 0.310849
WeaknessSevere WeaknessYNYes CoughIntensityMild
0.823187 NA -0.220794
CoughIntensityModerate CoughIntensitySevere CoughYN2Yes
-0.362678 -0.950544 NA
MyalgiaMild MyalgiaModerate MyalgiaSevere
-0.004146 0.204743 0.120758
MyalgiaYNYes RunnyNoseYes AbPainYes
NA 0.045324 0.939304
ChestPainYes DiarrheaYes EyePnYes
0.070777 1.063934 -0.341991
InsomniaYes ItchyEyeYes EarPnYes
0.084175 -0.063364 -0.181719
HearingYes PharyngitisYes BreathlessYes
0.323052 0.275364 0.526801
ToothPnYes VisionYes VomitYes
0.480649 0.125498 2.458466
WheezeYes BodyTemp
-0.304435 -0.031246
Degrees of Freedom: 729 Total (i.e. Null); 695 Residual
Null Deviance: 944.7
Residual Deviance: 751.5 AIC: 821.5
compare_performance(glm_fit_rn,glm_fit_all)
Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
# Comparison of Model Performance Indices
Name | Model | AIC | AIC weights | BIC | BIC weights | Tjur's R2 | RMSE | Sigma | Log_loss | Score_log | Score_spherical | PCP
------------------------------------------------------------------------------------------------------------------------------------------------
glm_fit_rn | _glm | 948.566 | 2.52e-28 | 957.752 | 1.000 | 1.169e-04 | 0.477 | 1.139 | 0.647 | -107.871 | 0.012 | 0.545
glm_fit_all | _glm | 821.471 | 1.00 | 982.227 | 4.84e-06 | 0.247 | 0.414 | 1.040 | 0.515 | -Inf | 0.002 | 0.658
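The Tjur's R2 column is the least familiar index in this table. By its standard definition (the coefficient of discrimination), it is the mean predicted probability among observations where the outcome occurred minus the mean predicted probability among those where it did not. The sketch below computes it by hand on simulated data with base R's glm(); the `toy` data frame and its effect size are made up, and the source does not show how the performance package computes the value internally.

```r
# Hypothetical toy data: nausea made more likely when Vomit = Yes
set.seed(2024)
n <- 300
toy <- data.frame(
  Vomit = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
p_true <- plogis(-1 + 2 * (toy$Vomit == "Yes"))
toy$Nausea <- factor(ifelse(runif(n) < p_true, "Yes", "No"),
                     levels = c("No", "Yes"))

fit <- glm(Nausea ~ Vomit, family = binomial, data = toy)

# Fitted probabilities of Nausea = "Yes" for every observation
p_hat <- fitted(fit)

# Tjur's R2: difference in mean fitted probability between outcome groups
tjur_r2 <- mean(p_hat[toy$Nausea == "Yes"]) - mean(p_hat[toy$Nausea == "No"])
tjur_r2
```

A value near 0, as for glm_fit_rn in the table, means the model assigns nearly the same probability to both outcome groups; a larger value, as for glm_fit_all, means the two groups are separated better.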