Flu Data Analysis

Wrangling

Author

Vijay Panthayi

First, we load the tidyverse package.

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.2.2
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.2.0     ✔ stringr 1.5.0
✔ readr   2.1.2     ✔ forcats 0.5.1
Warning: package 'ggplot2' was built under R version 4.2.2
Warning: package 'dplyr' was built under R version 4.2.2
Warning: package 'stringr' was built under R version 4.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Next, we build the file path with here::here() and import the SympAct_Any_Pos.Rda file with readRDS().

sympact_raw <- here::here("fluanalysis","data","raw_data","SympAct_Any_Pos.Rda")
flu_data_raw <- readRDS(sympact_raw)
summary(flu_data_raw)
                                           DxName1   
 Influenza like illness - Clinical Dx          :328  
 Influenza - Virus Identified                  :131  
 Fever, unspecified                            :101  
 Cough                                         : 66  
 Acute pharyngitis, unspecified                : 50  
 Acute upper respiratory infection, unspecified: 22  
 (Other)                                       : 37  
                                 DxName2   
 Influenza - Virus Identified        :126  
 Influenza like illness - Clinical Dx:115  
 Fever, unspecified                  : 45  
 Cough                               : 41  
 Acute pharyngitis, unspecified      : 31  
 (Other)                             : 97  
 NA's                                :280  
                                 DxName3   
 Influenza - Virus Identified        : 23  
 Influenza like illness - Clinical Dx: 14  
 Cough                               : 10  
 Fever, unspecified                  :  6  
 Acute pharyngitis, unspecified      :  4  
 (Other)                             : 52  
 NA's                                :626  
                                           DxName4   
 Influenza - Virus Identified                  :  3  
 Acute upper respiratory infection, unspecified:  2  
 Encounter for immunization                    :  2  
 Influenza like illness - Clinical Dx          :  2  
 Acute pharyngitis, unspecified                :  1  
 (Other)                                       :  9  
 NA's                                          :716  
                                                                                               DxName5   
 Acute suppurative otitis media without spontaneous rupture of ear drum, right ear                 :  0  
 Encounter for immunization                                                                        :  0  
 Headache                                                                                          :  1  
 Other infectious mononucleosis without complication                                               :  0  
 Strain of other flexor muscle, fascia and tendon at forearm level, right arm, subsequent encounter:  0  
 NA's                                                                                              :734  
                                                                                                         
 Unique.Visit       ActivityLevel    ActivityLevelF SwollenLymphNodes
 Length:735         Min.   : 0.000   3      :125    No :421          
 Class :character   1st Qu.: 3.000   5      : 97    Yes:314          
 Mode  :character   Median : 4.000   4      : 95                     
                    Mean   : 4.463   2      : 80                     
                    3rd Qu.: 6.000   7      : 68                     
                    Max.   :10.000   6      : 66                     
                                     (Other):204                     
 ChestCongestion ChillsSweats NasalCongestion CoughYN   Sneeze    Fatigue  
 No :326         No :131      No :170         No : 75   No :340   No : 64  
 Yes:409         Yes:604      Yes:565         Yes:660   Yes:395   Yes:671  
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
 SubjectiveFever Headache      Weakness   WeaknessYN  CoughIntensity CoughYN2 
 No :230         No :115   None    : 49   No : 49    None    : 47    No : 47  
 Yes:505         Yes:620   Mild    :224   Yes:686    Mild    :156    Yes:688  
                           Moderate:341              Moderate:360             
                           Severe  :121              Severe  :172             
                                                                              
                                                                              
                                                                              
     Myalgia    MyalgiaYN RunnyNose AbPain    ChestPain Diarrhea  EyePn    
 None    : 79   No : 79   No :211   No :642   No :501   No :636   No :622  
 Mild    :214   Yes:656   Yes:524   Yes: 93   Yes:234   Yes: 99   Yes:113  
 Moderate:327                                                              
 Severe  :115                                                              
                                                                           
                                                                           
                                                                           
 Insomnia  ItchyEye  Nausea    EarPn     Hearing   Pharyngitis Breathless
 No :316   No :553   No :477   No :573   No :705   No :121     No :438   
 Yes:419   Yes:182   Yes:258   Yes:162   Yes: 30   Yes:614     Yes:297   
                                                                         
                                                                         
                                                                         
                                                                         
                                                                         
 ToothPn   Vision    Vomit     Wheeze       BodyTemp     
 No :569   No :716   No :656   No :514   Min.   : 97.20  
 Yes:166   Yes: 19   Yes: 79   Yes:221   1st Qu.: 98.20  
                                         Median : 98.50  
                                         Mean   : 98.94  
                                         3rd Qu.: 99.30  
                                         Max.   :103.10  
                                         NA's   :5       
                                RapidFluA  
 Positive for Influenza A            :169  
 Presumptive Negative For Influenza A:159  
 NA's                                :407  
                                           
                                           
                                           
                                           
                                RapidFluB                        PCRFluA   
 Positive for Influenza B            : 26    Influenza A Detected    :120  
 Presumptive Negative For Influenza B:302    Influenza A Not Detected: 33  
 NA's                                :407   Assay Invalid            :  0  
                                            Indeterminate            :  1  
                                            NA's                     :581  
                                                                           
                                                                           
                      PCRFluB     TransScore1    TransScore1F  TransScore2   
  Influenza B Detected    :  9   Min.   :0.000   0: 13        Min.   :0.000  
  Influenza B Not Detected:145   1st Qu.:3.000   1: 53        1st Qu.:2.000  
 Assay Invalid            :  0   Median :4.000   2:107        Median :3.000  
 NA's                     :581   Mean   :3.473   3:157        Mean   :2.917  
                                 3rd Qu.:5.000   4:210        3rd Qu.:4.000  
                                 Max.   :5.000   5:195        Max.   :4.000  
                                                                             
 TransScore2F  TransScore3    TransScore3F  TransScore4    TransScore4F
 0: 13        Min.   :0.000   0: 24        Min.   :0.000   0: 50       
 1: 89        1st Qu.:1.000   1:166        1st Qu.:2.000   1:103       
 2:138        Median :2.000   2:222        Median :3.000   2:154       
 3:201        Mean   :2.148   3:323        Mean   :2.576   3:230       
 4:294        3rd Qu.:3.000                3rd Qu.:4.000   4:198       
              Max.   :3.000                Max.   :4.000               
                                                                       
  ImpactScore      ImpactScore2     ImpactScore3    ImpactScoreF ImpactScore2F
 Min.   : 2.000   Min.   : 2.000   Min.   : 0.00   8      :105   7      :107  
 1st Qu.: 8.000   1st Qu.: 7.000   1st Qu.: 3.00   9      :104   8      :102  
 Median : 9.000   Median : 8.000   Median : 5.00   10     : 88   9      : 90  
 Mean   : 9.514   Mean   : 8.581   Mean   : 5.06   7      : 84   10     : 86  
 3rd Qu.:11.000   3rd Qu.:10.000   3rd Qu.: 7.00   11     : 82   6      : 85  
 Max.   :18.000   Max.   :17.000   Max.   :13.00   12     : 58   11     : 59  
                                                   (Other):214   (Other):206  
 ImpactScore3F ImpactScoreFD   TotalSymp1     TotalSymp1F    TotalSymp2   
 4      :134   8      :105   Min.   : 5.00   12     : 86   Min.   : 4.00  
 5      :112   9      :104   1st Qu.:11.00   13     : 84   1st Qu.:10.00  
 3      :108   10     : 88   Median :13.00   14     : 80   Median :12.00  
 6      :102   7      : 84   Mean   :12.99   11     : 72   Mean   :12.43  
 7      : 66   11     : 82   3rd Qu.:15.00   10     : 62   3rd Qu.:15.00  
 2      : 64   12     : 58   Max.   :23.00   15     : 61   Max.   :22.00  
 (Other):149   (Other):214                   (Other):290                  
   TotalSymp3   
 Min.   : 3.00  
 1st Qu.:10.00  
 Median :12.00  
 Mean   :11.66  
 3rd Qu.:14.00  
 Max.   :21.00  
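
The summary reports NA counts one variable at a time. A minimal sketch of how the same information could be gathered in a single step, using a small made-up data frame in place of flu_data_raw (the toy values and the names toy, na_per_column, and rows_kept are assumptions for illustration only):

```r
# Hypothetical toy data standing in for flu_data_raw; values are made up.
toy <- data.frame(
  BodyTemp  = c(98.3, NA, 100.4, NA),
  RapidFluA = c("Positive", NA, NA, NA),
  Nausea    = c("Yes", "No", "No", "Yes")
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows with no missing values in a chosen subset of columns.
rows_kept <- sum(complete.cases(toy[, c("BodyTemp", "Nausea")]))
print(rows_kept)
```

Running colSums(is.na(...)) on the real data before cleaning would show which of the retained variables cause rows to be dropped.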
                

The following are steps taken to clean the data set for exploration:

  1. We use select() with !c() and contains() to exclude the variables that we do not want to analyze.
  2. We then use drop_na() to remove every observation that contains an NA.
  3. Finally, we use anyNA() and glimpse() to check that the previous steps worked properly.

flu_data_clean <- flu_data_raw %>%
  select(!c(contains(c("Score", "Total", "FluA", "FluB", "DxName", "Activity")), "Unique.Visit")) %>%
  drop_na()
anyNA(flu_data_clean)
[1] FALSE
glimpse(flu_data_clean)
Rows: 730
Columns: 32
$ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y…
$ ChestCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ ChillsSweats      <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, …
$ NasalCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ CoughYN           <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, …
$ Sneeze            <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, …
$ Fatigue           <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ SubjectiveFever   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes…
$ Headache          <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes…
$ Weakness          <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi…
$ WeaknessYN        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ CoughIntensity    <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve…
$ CoughYN2          <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes…
$ Myalgia           <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, …
$ MyalgiaYN         <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ RunnyNose         <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No…
$ AbPain            <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N…
$ ChestPain         <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, …
$ Diarrhea          <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,…
$ EyePn             <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye…
$ Insomnia          <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y…
$ ItchyEye          <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,…
$ Nausea            <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y…
$ EarPn             <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y…
$ Hearing           <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,…
$ Pharyngitis       <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, …
$ Breathless        <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, …
$ ToothPn           <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N…
$ Vision            <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ Vomit             <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N…
$ Wheeze            <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N…
$ BodyTemp          <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, …
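
The cleaning steps above can be tried on a small made-up tibble to see how the selection and the NA removal interact. This is a sketch under stated assumptions: the toy values are invented, and only the column names mimic the real data. Note that contains() ignores case by default, and that columns are removed before drop_na() runs, so an NA in an excluded column does not cost a row.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; column names mimic the real ones, values are made up.
toy <- tibble(
  Unique.Visit = c("a", "b", "c"),
  DxName1      = c("Cough", "Fever", NA),
  TransScore1  = c(1, 2, 3),
  Nausea       = factor(c("Yes", "No", "Yes")),
  BodyTemp     = c(98.6, NA, 100.2)
)

toy_clean <- toy %>%
  # Drop every column whose name contains "Score" or "DxName", plus Unique.Visit.
  select(!c(contains(c("Score", "DxName")), "Unique.Visit")) %>%
  # Drop rows that still contain an NA in the remaining columns.
  drop_na()

glimpse(toy_clean)
```

Only Nausea and BodyTemp remain. The second row is dropped for its missing BodyTemp, while the third row survives because its only NA was in DxName1, which had already been removed.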

Finally, we save the cleaned data set into the processed_data folder using the saveRDS() function.

flu_clean <- here::here("fluanalysis","data","processed_data","flu_data_processed")
saveRDS(flu_data_clean,file=flu_clean)
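
One way to confirm that a saved object can be read back unchanged is a quick round trip. The sketch below is not part of the original workflow: it uses a made-up data frame and a temporary file (toy_clean, tmp_path, and toy_reloaded are hypothetical names), whereas the real path comes from here::here().

```r
# Made-up stand-in for flu_data_clean.
toy_clean <- data.frame(
  Nausea   = factor(c("Yes", "No")),
  BodyTemp = c(98.3, 100.4)
)

# Temporary path used only for this sketch.
tmp_path <- tempfile(fileext = ".rds")

saveRDS(toy_clean, file = tmp_path)
toy_reloaded <- readRDS(tmp_path)

# all.equal() returns TRUE when the reloaded object matches the original.
print(isTRUE(all.equal(toy_clean, toy_reloaded)))
```

readRDS() does not depend on the file extension, so the same check would apply to a file saved without one.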