Flu Data Analysis

Fitting

Author

Vijay Panthayir

Model Fitting

For the purpose of this exercise, the main predictor of interest is going to be RunnyNose and that we care about all other predictors. We are going to fit models with each of the two outcomes (body temperature and nausea) and RunnyNose. In addition, we are going to fit models with each of the two outcomes (body temperature and nausea) with all predictors. While fitting to one outcome, the other outcome will be considered a predictor (ex. when fitting predictors to body temperature, nausea will be considered a predictor).

This should all be completed with the tidymodels framework. We will be using the following commands:
linear_reg()
set_engine("lm")
logistic_reg()
set_engine("glm")

1. Load Data + Fitting Prep

First, lets load the packages needed for the fitting.

library(dplyr)
library(tidyverse)
library(recipes)
library(here)
library(parsnip)
library(dotwhisker)
library(performance)

Now, we can load and preview the processed data set.

flu_clean <- readRDS(here::here("fluanalysis", "data","processed_data", "flu_data_processed"))
glimpse(flu_clean)
Rows: 730
Columns: 32
$ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y…
$ ChestCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ ChillsSweats      <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, …
$ NasalCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ CoughYN           <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, …
$ Sneeze            <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, …
$ Fatigue           <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ SubjectiveFever   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes…
$ Headache          <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes…
$ Weakness          <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi…
$ WeaknessYN        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ CoughIntensity    <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve…
$ CoughYN2          <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes…
$ Myalgia           <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, …
$ MyalgiaYN         <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ RunnyNose         <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No…
$ AbPain            <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N…
$ ChestPain         <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, …
$ Diarrhea          <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,…
$ EyePn             <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye…
$ Insomnia          <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y…
$ ItchyEye          <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,…
$ Nausea            <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y…
$ EarPn             <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y…
$ Hearing           <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,…
$ Pharyngitis       <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, …
$ Breathless        <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, …
$ ToothPn           <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N…
$ Vision            <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ Vomit             <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N…
$ Wheeze            <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N…
$ BodyTemp          <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, …

Finally, we can use set_engine() to create general linear and logistic models.

linear <- linear_reg() %>% 
              set_engine("lm")
logistic <- logistic_reg() %>%
              set_engine("glm")

2. Linear Model Fitting

We are going to fit a linear model to the continuous outcome, body temperature, using the main predictor of interest, runny nose.

lm_fit_rn <- linear %>%
              fit(BodyTemp~RunnyNose,data=flu_clean)
lm_fit_rn
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ RunnyNose, data = data)

Coefficients:
 (Intercept)  RunnyNoseYes  
     99.1431       -0.2926  

3. Linear Model Fitting (cont.)

Now we are going to fit a linear model to the continuous outcome, body temperature, using all the predictors.

lm_fit_all <- linear %>%
              fit(BodyTemp~ . ,data=flu_clean)
lm_fit_all
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ ., data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
             97.925243               -0.165302                0.087326  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.201266               -0.215771                0.313893  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
             -0.361924                0.264762                0.436837  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.011453                0.018229                0.098944  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.373435                      NA                0.084881  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.061384               -0.037272                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
              0.164242               -0.024064               -0.129263  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA               -0.080485                0.031574  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.105071               -0.156806                0.131544  
           InsomniaYes             ItchyEyeYes               NauseaYes  
             -0.006824               -0.008016               -0.034066  
              EarPnYes              HearingYes          PharyngitisYes  
              0.093790                0.232203                0.317581  
         BreathlessYes              ToothPnYes               VisionYes  
              0.090526               -0.022876               -0.274625  
              VomitYes               WheezeYes  
              0.165272               -0.046665  

4. Linear Model Comparisons

lm_fit_rn
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ RunnyNose, data = data)

Coefficients:
 (Intercept)  RunnyNoseYes  
     99.1431       -0.2926  
lm_fit_all
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ ., data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
             97.925243               -0.165302                0.087326  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.201266               -0.215771                0.313893  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
             -0.361924                0.264762                0.436837  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.011453                0.018229                0.098944  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.373435                      NA                0.084881  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.061384               -0.037272                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
              0.164242               -0.024064               -0.129263  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA               -0.080485                0.031574  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.105071               -0.156806                0.131544  
           InsomniaYes             ItchyEyeYes               NauseaYes  
             -0.006824               -0.008016               -0.034066  
              EarPnYes              HearingYes          PharyngitisYes  
              0.093790                0.232203                0.317581  
         BreathlessYes              ToothPnYes               VisionYes  
              0.090526               -0.022876               -0.274625  
              VomitYes               WheezeYes  
              0.165272               -0.046665  
compare_performance(lm_fit_rn,lm_fit_all)
# Comparison of Model Performance Indices

Name       | Model |      AIC | AIC weights |      BIC | BIC weights |    R2 | R2 (adj.) |  RMSE | Sigma
--------------------------------------------------------------------------------------------------------
lm_fit_rn  |   _lm | 2329.346 |    2.89e-06 | 2343.125 |        1.00 | 0.012 |     0.011 | 1.188 | 1.190
lm_fit_all |   _lm | 2303.840 |       1.000 | 2469.189 |    4.22e-28 | 0.129 |     0.086 | 1.116 | 1.144

5. Logistic Model Fitting

glm_fit_rn <- logistic %>%
              fit(Nausea~RunnyNose,data=flu_clean)
glm_fit_rn
parsnip model object


Call:  stats::glm(formula = Nausea ~ RunnyNose, family = stats::binomial, 
    data = data)

Coefficients:
 (Intercept)  RunnyNoseYes  
    -0.65781       0.05018  

Degrees of Freedom: 729 Total (i.e. Null);  728 Residual
Null Deviance:      944.7 
Residual Deviance: 944.6    AIC: 948.6

6. Logistic Model Fitting (cont.)

glm_fit_all <- logistic %>%
              fit(Nausea~ . ,data=flu_clean)
glm_fit_all
parsnip model object


Call:  stats::glm(formula = Nausea ~ ., family = stats::binomial, data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
              0.222870               -0.251083                0.275554  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.274097                0.425817               -0.140423  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
              0.176724                0.229062                0.277741  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.331259               -0.121606                0.310849  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.823187                      NA               -0.220794  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.362678               -0.950544                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
             -0.004146                0.204743                0.120758  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA                0.045324                0.939304  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.070777                1.063934               -0.341991  
           InsomniaYes             ItchyEyeYes                EarPnYes  
              0.084175               -0.063364               -0.181719  
            HearingYes          PharyngitisYes           BreathlessYes  
              0.323052                0.275364                0.526801  
            ToothPnYes               VisionYes                VomitYes  
              0.480649                0.125498                2.458466  
             WheezeYes                BodyTemp  
             -0.304435               -0.031246  

Degrees of Freedom: 729 Total (i.e. Null);  695 Residual
Null Deviance:      944.7 
Residual Deviance: 751.5    AIC: 821.5

7. Logistic Model Comparisons

glm_fit_rn
parsnip model object


Call:  stats::glm(formula = Nausea ~ RunnyNose, family = stats::binomial, 
    data = data)

Coefficients:
 (Intercept)  RunnyNoseYes  
    -0.65781       0.05018  

Degrees of Freedom: 729 Total (i.e. Null);  728 Residual
Null Deviance:      944.7 
Residual Deviance: 944.6    AIC: 948.6
glm_fit_all
parsnip model object


Call:  stats::glm(formula = Nausea ~ ., family = stats::binomial, data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
              0.222870               -0.251083                0.275554  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.274097                0.425817               -0.140423  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
              0.176724                0.229062                0.277741  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.331259               -0.121606                0.310849  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.823187                      NA               -0.220794  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.362678               -0.950544                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
             -0.004146                0.204743                0.120758  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA                0.045324                0.939304  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.070777                1.063934               -0.341991  
           InsomniaYes             ItchyEyeYes                EarPnYes  
              0.084175               -0.063364               -0.181719  
            HearingYes          PharyngitisYes           BreathlessYes  
              0.323052                0.275364                0.526801  
            ToothPnYes               VisionYes                VomitYes  
              0.480649                0.125498                2.458466  
             WheezeYes                BodyTemp  
             -0.304435               -0.031246  

Degrees of Freedom: 729 Total (i.e. Null);  695 Residual
Null Deviance:      944.7 
Residual Deviance: 751.5    AIC: 821.5
compare_performance(glm_fit_rn,glm_fit_all)
Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
# Comparison of Model Performance Indices

Name        | Model |     AIC | AIC weights |     BIC | BIC weights | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log | Score_spherical |   PCP
------------------------------------------------------------------------------------------------------------------------------------------------
glm_fit_rn  |  _glm | 948.566 |    2.52e-28 | 957.752 |       1.000 | 1.169e-04 | 0.477 | 1.139 |    0.647 |  -107.871 |           0.012 | 0.545
glm_fit_all |  _glm | 821.471 |        1.00 | 982.227 |    4.84e-06 |     0.247 | 0.414 | 1.040 |    0.515 |      -Inf |           0.002 | 0.658