tags: [Data Cleaning, R]  categories: [Coding]

Subsetting relevant phenotypic data

Introduction

The original dataset used for the descriptive statistics in previous entries contained data for all 801 individuals. However, the ethics approval for this study only permits data from adults to be included, so all data for minors must be removed. In addition, the relevant quantitative variables must be selected from the larger phenotypic dataset, because not all quantitative measurements are endophenotypes.

Aim

Remove data for minor individuals and subset relevant quantitative measures.

Methods and Results

On inspecting the participants’ self-reported age from the questionnaire, I found some apparent discrepancies with their recorded date of birth (DOB), as well as some missing values. I therefore recalculated each participant’s age at the time of data collection (2008) from DOB, which has no missing values, using Microsoft Excel. I added the calculated ages to the phenotypic dataset as a new column, “Age.excel”.

  1. Load this dataset into R:

phen.data.age <- read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/NIES_master_database-age.csv')
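The Excel step could also be scripted in R so that it is reproducible alongside the rest of the analysis. The sketch below recomputes age from DOB and flags participants whose questionnaire age disagrees with it. It assumes DOB is stored as day/month/year text (as in the printed rows) and that age is simply 2008 minus the birth year, which appears consistent with the Age.excel values in the printed output; exact collection dates are not recorded here, so this is not an exact age. Because a questionnaire age can lag the year difference by one when the birthday falls after the collection date, only differences larger than one year are flagged. The helper names `age_from_dob` and `flag_age_discrepancy` and the one-year tolerance are hypothetical choices, and the toy rows are copied from the printed output.

```r
# Hypothetical helper: age at collection as (collection year - birth year).
# Assumes DOB is text in day/month/year form, e.g. "4/04/1991".
age_from_dob <- function(dob, collection_year = 2008L) {
  dob_date <- as.Date(dob, format = "%d/%m/%Y")
  collection_year - as.integer(format(dob_date, "%Y"))
}

# Hypothetical helper: TRUE where the reported age differs from the
# calculated age by more than `tolerance` years; missing reports are not flagged.
flag_age_discrepancy <- function(reported, calculated, tolerance = 1) {
  !is.na(reported) & abs(reported - calculated) > tolerance
}

# Toy rows copied from the printed output (DOB and questionnaire age).
toy.ages <- data.frame(
  DOB = c("4/04/1991", "19/12/1995", "21/11/1953"),
  UV.questionnaire_Age = c(16, 11, 53),
  stringsAsFactors = FALSE
)
toy.ages$Age.calc <- age_from_dob(toy.ages$DOB)
toy.ages$Age.flag <- flag_age_discrepancy(toy.ages$UV.questionnaire_Age,
                                          toy.ages$Age.calc)
print(toy.ages)
```

On the real data, the same two calls could be applied to `phen.data.age$DOB` and `phen.data.age$UV.questionnaire_Age` to list the discrepant participants.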
  2. Subset data for minor participants
phen.data.minors<-phen.data.age[phen.data.age$Age.excel<=17,]
head(phen.data.minors)
##       UUID LAB_ID PID SID Ped_ID_.BK_no.   NIES NIES_GWAS Miles_coreped
## 166 400167     NA                     NA ES_167         0             0
## 186 400187     NA                   7000 ES_187         1             1
## 207 400208     NA                   7010 ES_208         1             1
## 291 400292     NA                     NA ES_292         0             0
## 309 400310     NA                   3808 ES_310         1             1
## 314 400315     NA                   3830 ES_315         1             1
##     NIES_DNA DNA_Box DNA_Postition Gender        DOB Current.height
## 166        0      NA               Female  4/04/1991             NA
## 186        1       4            B9 Female 19/09/1997            153
## 207        1       4            A3   Male 24/09/1997             NA
## 291        0      NA                 Male 22/03/1991            183
## 309        1       4            C6 Female 22/06/1994            158
## 314        1       4            C7 Female 19/12/1995            161
##     Current.weight Smoker Glaucoma.medications          Eye.Colour
## 166             NA  FALSE                      Hazel / Light Brown
## 186             65  FALSE                                    Brown
## 207             NA  FALSE                                         
## 291             80  FALSE                                         
## 309             44  FALSE                                     Blue
## 314             60  FALSE                                    Green
##     Logmar.VA.Right RVA.with.PH Logmar.VA.Left LVA.with.PH
## 166           -0.18       -0.18          -0.18       -0.18
## 186           -0.12       -0.12          -0.12       -0.12
## 207            0.00        0.00           0.00        0.00
## 291            0.00        0.00          -0.10       -0.10
## 309           -0.10       -0.10          -0.20       -0.20
## 314            0.00        0.00           0.02        0.02
##     R.Sph..pre.dilate. R.Cyl..pre.dilate. R.Axis..pre.dilate.
## 166               0.50               0.00                   0
## 186               0.75              -0.50                  27
## 207               0.50              -0.25                 125
## 291               0.00               0.00                   0
## 309               0.25              -0.25                 146
## 314              -0.50              -0.25                 178
##     L.Sph..pre.dilate. L.Cyl..pre.dilate. L.Axis..pre.dilate. IPDprecyclo
## 166               0.75               0.00                   0          NA
## 186              -0.50              -0.25                  46          NA
## 207               0.25               0.00                   0          NA
## 291               0.50              -0.50                  28          NA
## 309               0.00               0.00                   0          NA
## 314              -0.75              -0.25                  19          NA
##     R.Sph R.Cyl R.Axis L.Sph L.Cyl L.Axis IPDpostcyclo Glasses R.Sph.gl
## 166    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 186    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 207    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 291    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 309    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 314    NA    NA     NA    NA    NA     NA           NA   s gls       NA
##     R.Cyl.gl R.Axis.gl Add L.Sph.gl L.Cyl.gl L.Axis.gl R.K.value.H
## 166       NA        NA  NA       NA       NA        NA       42.25
## 186       NA        NA  NA       NA       NA        NA       43.00
## 207       NA        NA  NA       NA       NA        NA       41.50
## 291       NA        NA  NA       NA       NA        NA       40.25
## 309       NA        NA  NA       NA       NA        NA       42.25
## 314       NA        NA  NA       NA       NA        NA       44.25
##     R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 166              172       43.00               82       42.25
## 186              178       45.00               88       43.00
## 207              171       42.00               81       41.50
## 291               97       41.75                7       40.00
## 309               97       42.75                7       42.00
## 314               96       44.75                6       44.50
##     L.K.value.H.Axis L.K.value.V L.K.value.V.Axis Cover.Test.D Lang.score
## 166                8       43.00               98           NA        550
## 186                3       44.75               93           NA        550
## 207              176       42.00               86           NA        550
## 291               87       41.25              177           NA        550
## 309               89       42.50              179           NA        550
## 314              103       44.75               13           NA        550
##     Ocular.Motility  IOP.time R.Pachimetry L.Pachimetry R.Axial.Length
## 166                 0/01/1900          526          515          23.09
## 186                 0/01/1900          550          554          22.82
## 207                 0/01/1900          511          502          24.21
## 291                 0/01/1900          507          519          24.69
## 309                 0/01/1900          555          556          23.39
## 314                 0/01/1900          559          572          23.34
##     L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg L.IOP.mmHg
## 166          23.10       3.38       3.35         17         15
## 186          22.82       3.51       3.49         13         14
## 207          24.23       3.27       3.80         14         15
## 291          24.73       3.43       3.18         17         15
## 309          23.52       3.52       3.56         19         17
## 314          23.27       3.76       3.80         19         21
##                        Anterior.segment Pterygium Disc.size.RE
## 166                              NAD OU     FALSE            M
## 186                              NAD OU     FALSE            M
## 207 OD corneal nerves prominent, OS NAD     FALSE            M
## 291                              NAD OU     FALSE            M
## 309                              NAD OU     FALSE            M
## 314                              NAD OU     FALSE           NR
##     Disc.size.LE CDR.RE CDR.LE FDT.MD_RE FDT.MD_LE FDT.PSD_RE FDT.PSD_LE
## 166            M    0.3    0.4        NA        NA         NA         NA
## 186            M    0.3    0.3        NA        NA         NA         NA
## 207            M    0.3    0.3        NA        NA         NA         NA
## 291            M    0.3    0.3        NA        NA         NA         NA
## 309            M    0.3    0.3        NA        NA         NA         NA
## 314           NR    0.1    0.1        NA        NA         NA         NA
##                   Comments R.L.handed R.L.eye.dominance Other.disease
## 166 Slight aniscoria L > R          R                                
## 186                                 R                                
## 207                                 R                                
## 291                                 R                                
## 309                                 R                                
## 314                                 R                                
##     UV.questionnaire_NIES.code UV.questionnaire_First.name Maiden.name
## 166                        167                          NA          NA
## 186                        187                          NA          NA
## 207                        208                          NA          NA
## 291                        292                          NA          NA
## 309                        310                          NA          NA
## 314                        315                          NA          NA
##     UV.questionnaire_Surname UV.questionnaire_Age Age.excel
## 166                       NA                   16        17
## 186                       NA                   10        11
## 207                       NA                   10        11
## 291                       NA                   16        17
## 309                       NA                   13        14
## 314                       NA                   11        13
##     UV.questionnaire_Sex Lived.in.Norfolk.Island Location.
## 166                    F                      16          
## 186                    F                      10          
## 207                    M                      10          
## 291                    M                       1   Vanuatu
## 309                    F                       7    Cairns
## 314                    F                      11          
##     From.age.in.years Duration.outside.Norfolk.Island Hair.colour
## 166                NA                              NA       Brown
## 186                NA                              NA       Brown
## 207                NA                              NA       Brown
## 291                 0                              15       Black
## 309                 0                               6       Brown
## 314                NA                              NA       Brown
##               Burns.Tans Sunburn.and.pain         Time.outside
## 166 never burns but tans       2-10 times greater than 3/4 day
## 186      burns then tans       2-10 times       1/2 of the day
## 207           don't know            never     less than 1/4day
## 291           don't know            never                     
## 309 never burns but tans       2-10 times     less than 1/4day
## 314      burns then tans       2-10 times       1/2 of the day
##     Outdoors.and.hats Outdoors.and.sunglasses Winter..Indoors.and.Outdoors
## 166            seldom                   never               mostly indoors
## 186           usually                  seldom              mostly outdoors
## 207   1/2 of the time                  seldom                  1/2 and 1/2
## 291                                                                       
## 309   1/2 of the time         1/2 of the time                  1/2 and 1/2
## 314   1/2 of the time                   never                  1/2 and 1/2
##         At.school Feel.colder.than.people Arc.Welding
## 166       neither                  seldom          No
## 186 yes, hat only                  seldom          No
## 207 yes, hat only                   never          No
## 291                                                  
## 309     yes, both                  seldom          No
## 314 yes, hat only                   never          No
##     If.yes..eye.protection If.yes..flash.burn.       Reason.for.sunglasses
## 166                                                                       
## 186                                                                fashion
## 207                                            protection from eye disease
## 291                                                                       
## 309                                                                fashion
## 314                                                                       
##     Do.not.wear.sunglasses
## 166                       
## 186           Inconvenient
## 207                       
## 291                       
## 309        Not fashionable
## 314         Uncomfortable
  • There were 28 minors in the dataset.
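Counting minors directly, rather than reading the count off the data frame, makes the number easy to re-check. A minimal sketch using a hypothetical toy data frame (`toy.age`) whose UUIDs and ages are copied from the printed rows; on the real data, `nrow(phen.data.minors)` should give the same count as `n_minors`, provided Age.excel has no missing values.

```r
# Toy stand-in for phen.data.age: UUIDs and ages copied from printed rows.
toy.age <- data.frame(UUID = c(400167, 400187, 219960, 313180, 400310),
                      Age.excel = c(17, 11, 53, 55, 14))

# Count missing ages first: NA ages would be neither minor nor adult.
n_missing_age <- sum(is.na(toy.age$Age.excel))
n_minors <- sum(toy.age$Age.excel <= 17, na.rm = TRUE)
n_adults <- sum(toy.age$Age.excel > 17, na.rm = TRUE)

cat("Minors:", n_minors, " Adults:", n_adults,
    " Missing age:", n_missing_age, "\n")
```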
  3. Remove rows of data for minors
phen.data.adults<-phen.data.age[phen.data.age$Age.excel>17,]
head(phen.data.adults)
##     UUID LAB_ID   PID      SID Ped_ID_.BK_no.   NIES NIES_GWAS
## 1 219960     NA 19960 10015417           2438 ES_001         0
## 2 313180   6876 13180 10016420            104 ES_002         0
## 3 314911   6864 14911 10014320           6059 ES_003         0
## 4 111150   1115                            NA ES_004         0
## 5 363161   6962 63161 10014308             NA ES_005         0
## 6 110460   1046                            NA ES_006         0
##   Miles_coreped NIES_DNA DNA_Box DNA_Postition Gender        DOB
## 1             1        0      NA                 Male  6/01/1955
## 2             1        0      NA                 Male 21/11/1953
## 3             1        0      NA               Female 17/04/1955
## 4             0        0      NA                 Male  2/10/1942
## 5             0        0      NA               Female  4/09/1948
## 6             0        0      NA               Female 21/10/1938
##   Current.height Current.weight Smoker Glaucoma.medications
## 1             NA             NA  FALSE                     
## 2            170             78  FALSE                     
## 3            180             86   TRUE                     
## 4            183             NA   TRUE                     
## 5             NA             69   TRUE                     
## 6            162             NA   TRUE                     
##            Eye.Colour Logmar.VA.Right RVA.with.PH Logmar.VA.Left
## 1               Brown            0.02        0.02          -0.04
## 2               Brown            0.10        0.10           0.16
## 3 Hazel / Light Brown            0.00        0.00           0.00
## 4                Blue            0.30        0.10           0.08
## 5               Brown            0.00        0.00          -0.10
## 6 Hazel / Light Brown            0.24        0.16           0.36
##   LVA.with.PH R.Sph..pre.dilate. R.Cyl..pre.dilate. R.Axis..pre.dilate.
## 1       -0.04               0.25               0.00                   0
## 2        0.10               0.00              -0.75                  28
## 3        0.00               1.25              -1.25                 148
## 4        0.08               1.25              -0.25                  97
## 5       -0.10               1.25              -0.25                  37
## 6        0.14               4.00              -0.50                  46
##   L.Sph..pre.dilate. L.Cyl..pre.dilate. L.Axis..pre.dilate. IPDprecyclo
## 1               0.25              -0.50                  79          NA
## 2              -0.50              -0.25                 164          NA
## 3               1.25              -0.25                  24          NA
## 4               1.50              -0.50                  81          NA
## 5               1.25              -0.75                 164          NA
## 6               3.75              -0.75                 180          NA
##   R.Sph R.Cyl R.Axis L.Sph L.Cyl L.Axis IPDpostcyclo Glasses R.Sph.gl
## 1    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 2    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 3    NA    NA     NA    NA    NA     NA           NA   c gls       NA
## 4    NA    NA     NA    NA    NA     NA           NA   s gls       NA
## 5    NA    NA     NA    NA    NA     NA           NA   c gls       NA
## 6    NA    NA     NA    NA    NA     NA           NA   s gls       NA
##   R.Cyl.gl R.Axis.gl Add L.Sph.gl L.Cyl.gl L.Axis.gl R.K.value.H
## 1       NA        NA  NA       NA       NA        NA       42.00
## 2       NA        NA  NA       NA       NA        NA       41.25
## 3       NA        NA  NA       NA       NA        NA          NA
## 4       NA        NA  NA       NA       NA        NA       44.75
## 5       NA        NA  NA       NA       NA        NA       44.75
## 6       NA        NA  NA       NA       NA        NA       42.00
##   R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 1                2       43.00               92       42.50
## 2                7       42.25               97       41.50
## 3               NA          NA               NA          NA
## 4                5       45.00               95       45.00
## 5                0       44.75               90       44.25
## 6                8       43.25               98       42.25
##   L.K.value.H.Axis L.K.value.V L.K.value.V.Axis Cover.Test.D Lang.score
## 1                5       43.50               95           NA        550
## 2              168       42.00               78           NA        550
## 3               NA          NA               NA           NA        550
## 4               60       45.25              150           NA         NA
## 5              178       44.75               88           NA        550
## 6              177       43.25               87           NA        550
##   Ocular.Motility  IOP.time R.Pachimetry L.Pachimetry R.Axial.Length
## 1                 0/01/1900          532          554          24.31
## 2                 0/01/1900          608          612          25.02
## 3                 0/01/1900          507          510          22.78
## 4                 0/01/1900          560          559          23.02
## 5                 0/01/1900          556          562          21.75
## 6                 0/01/1900          498          501          23.06
##   L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg L.IOP.mmHg
## 1          24.10       3.09       3.03         14         14
## 2          25.21       3.38       3.92         16         15
## 3          22.80       3.40       3.45         26         22
## 4          22.98       3.00       2.85         14         14
## 5          22.04       2.60       2.53         22         21
## 6          23.17       2.94       3.04         18         20
##                                                 Anterior.segment Pterygium
## 1                              OD NAD, OS mild cortical cataract     FALSE
## 2 OD NAD, OS flecks on posterior capsule 'present since age 16.'      TRUE
## 3                                                                    FALSE
## 4                                     cortical cataract OU (R>L)     FALSE
## 5                                           OD pterygium, OS NAD      TRUE
## 6                                                         NAD OU     FALSE
##   Disc.size.RE Disc.size.LE CDR.RE CDR.LE FDT.MD_RE FDT.MD_LE FDT.PSD_RE
## 1            L            L    0.9    0.9      6.08      5.07      2.050
## 2            L            L    0.9    0.7      1.21     -0.72      1.600
## 3            L            L    0.7    0.7      2.33      2.28      2.653
## 4           NR           NR    0.2    0.2        NA        NA         NA
## 5            M            M    0.3    0.3        NA        NA         NA
## 6            L            L    0.6    0.6        NA        NA         NA
##   FDT.PSD_LE
## 1       2.09
## 2       2.78
## 3       2.92
## 4         NA
## 5         NA
## 6         NA
##                                                                       Comments
## 1                                                        Suspicious discs (OU)
## 2   Suspicious discs. Fields OK, but left 1 small defect.  Check in 1-2 years.
## 3 OHT but field defect on R not classic. Need a 6-12/12 review no axial length
## 4                                                                             
## 5                                                      No diabetic retinopathy
## 6                                                                             
##   R.L.handed R.L.eye.dominance Other.disease UV.questionnaire_NIES.code
## 1          R                                                          1
## 2          L                                                          2
## 3          L                                                          3
## 4          R                                                          4
## 5          R                                                          5
## 6          R                                                          6
##   UV.questionnaire_First.name Maiden.name UV.questionnaire_Surname
## 1                          NA          NA                       NA
## 2                          NA          NA                       NA
## 3                          NA          NA                       NA
## 4                          NA          NA                       NA
## 5                          NA          NA                       NA
## 6                          NA          NA                       NA
##   UV.questionnaire_Age Age.excel UV.questionnaire_Sex
## 1                   52        53                    M
## 2                   53        55                    M
## 3                   52        53                    F
## 4                   65        66                    M
## 5                   59        60                    F
## 6                   68        70                    F
##   Lived.in.Norfolk.Island      Location. From.age.in.years
## 1                      37    SA, WA, NSW                18
## 2                      46            PNG                NA
## 3                      21 Australia/ PNG                 0
## 4                      28                                0
## 5                      16                                0
## 6                      16            QLD                 0
##   Duration.outside.Norfolk.Island Hair.colour           Burns.Tans
## 1                              15       Black      burns then tans
## 2                               7       Brown never burns but tans
## 3                              31       Brown        burns not tan
## 4                              37       Brown        burns not tan
## 5                              43       Brown      burns then tans
## 6                              52       Brown        burns not tan
##     Sunburn.and.pain         Time.outside Outdoors.and.hats
## 1 more than 10 times greater than 3/4 day            always
## 2         2-10 times                 none   1/2 of the time
## 3 more than 10 times     less than 1/4day   1/2 of the time
## 4         2-10 times         cannot judge            always
## 5               once     less than 1/4day           usually
## 6         2-10 times     less than 1/4day           usually
##   Outdoors.and.sunglasses Winter..Indoors.and.Outdoors
## 1         1/2 of the time              mostly outdoors
## 2                  seldom                  1/2 and 1/2
## 3                 usually                  1/2 and 1/2
## 4                  seldom                  1/2 and 1/2
## 5                  always               mostly indoors
## 6                 usually               mostly indoors
##              At.school Feel.colder.than.people Arc.Welding
## 1 yes, sunglasses only                  seldom         Yes
## 2              neither         1/2 of the time         Yes
## 3              neither         1/2 of the time          No
## 4              neither                  seldom         Yes
## 5              neither                  seldom          No
## 6              neither                  seldom          No
##   If.yes..eye.protection If.yes..flash.burn.       Reason.for.sunglasses
## 1                     No                  No protection from eye disease
## 2                     No                  No                       glare
## 3                                                                  glare
## 4                     No                  No                       glare
## 5                                                                driving
## 6                                                                  glare
##   Do.not.wear.sunglasses
## 1           Inconvenient
## 2        Decrease vision
## 3                       
## 4          Not necessary
## 5                       
## 6  Prescription glasses
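The bracket subsets above split cleanly only because Age.excel is derived from DOB, which has no missing values. If an age were ever missing, `df[df$Age.excel > 17, ]` would return an extra all-NA row, and that participant would fall into neither subset. The sketch below, on a hypothetical toy data frame with one missing age, contrasts bracket indexing with `subset()` (which treats an NA condition as FALSE) and adds a simple check that minors and adults together account for every participant.

```r
# Hypothetical toy data: participant 2 has no calculated age.
toy.split <- data.frame(UUID = c(1, 2, 3, 4), Age.excel = c(17, NA, 53, 20))

# Bracket indexing: the NA comparison produces an all-NA row in the result.
adults_bracket <- toy.split[toy.split$Age.excel > 17, ]

# subset() drops rows where the condition is NA, so only real adults remain.
adults_subset <- subset(toy.split, Age.excel > 17)
minors_subset <- subset(toy.split, Age.excel <= 17)

# Partition check: FALSE here reveals the participant with a missing age.
partition_ok <- nrow(minors_subset) + nrow(adults_subset) == nrow(toy.split)
print(partition_ok)
```

On the real data, the same check would compare `nrow(phen.data.minors) + nrow(phen.data.adults)` with `nrow(phen.data.age)`.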
  4. Subset the relevant quantitative variables, based on the methods paper (https://www.ncbi.nlm.nih.gov/pubmed/21314255)
quant.variables<- c("UUID", "Current.height", "Current.weight", "R.K.value.H", "R.K.Value.H.Axis", "R.K.value.V", "R.K.value.V.Axis", "L.K.value.H",
                    "L.K.value.H.Axis", "L.K.value.V", "L.K.value.V.Axis", "R.Pachimetry", "L.Pachimetry", "R.Axial.Length",
                    "L.Axial.Length", "AC.Depth.R", "AC.Depth.L", "R.IOP.mmHg", "L.IOP.mmHg", "CDR.RE", "CDR.LE", "Age.excel")
quant.data.adults<- phen.data.adults[quant.variables]
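Selecting columns with `[` stops with a generic "undefined columns selected" error if any name is misspelled, and the column names here mix capitalisation (for example `R.K.Value.H.Axis` next to `R.K.value.V.Axis`), so a typo is easy to make. A minimal sketch of a hypothetical helper, `select_columns`, that names the missing columns before subsetting; the toy data frame stands in for `phen.data.adults`.

```r
# Hypothetical helper: subset columns, but report exactly which names are absent.
select_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  df[cols]
}

# Toy stand-in with the original, inconsistently capitalised names.
toy.cols <- data.frame(UUID = 1:2,
                       R.K.Value.H.Axis = c(2, 7),
                       R.K.value.V.Axis = c(92, 97))

ok <- select_columns(toy.cols, c("UUID", "R.K.Value.H.Axis"))

# A lower-case "value" is a different name, so this reports it by name.
err <- tryCatch(select_columns(toy.cols, c("UUID", "R.K.value.H.Axis")),
                error = function(e) conditionMessage(e))
print(err)
```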
  5. Check the summary statistics
summary(quant.data.adults)
##       UUID        Current.height  Current.weight    R.K.value.H   
##  Min.   :110150   Min.   :147.0   Min.   : 31.00   Min.   :36.00  
##  1st Qu.:273430   1st Qu.:162.5   1st Qu.: 65.00   1st Qu.:42.00  
##  Median :320921   Median :170.0   Median : 75.00   Median :43.00  
##  Mean   :321681   Mean   :170.0   Mean   : 77.56   Mean   :42.97  
##  3rd Qu.:400378   3rd Qu.:178.0   3rd Qu.: 87.00   3rd Qu.:44.00  
##  Max.   :400804   Max.   :195.0   Max.   :140.00   Max.   :48.25  
##                   NA's   :130     NA's   :155      NA's   :16     
##  R.K.Value.H.Axis  R.K.value.V    R.K.value.V.Axis  L.K.value.H   
##  Min.   :  0.00   Min.   :37.00   Min.   :  1.00   Min.   :32.75  
##  1st Qu.: 42.00   1st Qu.:42.75   1st Qu.: 62.00   1st Qu.:42.00  
##  Median : 91.00   Median :43.75   Median : 89.00   Median :43.00  
##  Mean   : 93.74   Mean   :43.80   Mean   : 91.19   Mean   :42.97  
##  3rd Qu.:156.00   3rd Qu.:45.00   3rd Qu.:127.00   3rd Qu.:44.00  
##  Max.   :180.00   Max.   :55.00   Max.   :189.00   Max.   :47.25  
##  NA's   :16       NA's   :16      NA's   :16       NA's   :15     
##  L.K.value.H.Axis  L.K.value.V    L.K.value.V.Axis  R.Pachimetry  
##  Min.   :  0.0    Min.   :39.00   Min.   :  0.00   Min.   :428.0  
##  1st Qu.: 24.0    1st Qu.:42.81   1st Qu.: 65.25   1st Qu.:527.0  
##  Median : 88.5    Median :43.75   Median : 90.00   Median :546.0  
##  Mean   : 87.8    Mean   :43.87   Mean   : 92.22   Mean   :546.2  
##  3rd Qu.:147.0    3rd Qu.:44.75   3rd Qu.:119.75   3rd Qu.:570.0  
##  Max.   :180.0    Max.   :52.00   Max.   :180.00   Max.   :656.0  
##  NA's   :15       NA's   :15      NA's   :15       NA's   :14     
##   L.Pachimetry   R.Axial.Length  L.Axial.Length    AC.Depth.R   
##  Min.   :424.0   Min.   :20.95   Min.   :21.29   Min.   :2.090  
##  1st Qu.:526.0   1st Qu.:22.90   1st Qu.:22.87   1st Qu.:3.058  
##  Median :547.0   Median :23.50   Median :23.46   Median :3.310  
##  Mean   :546.3   Mean   :23.56   Mean   :23.54   Mean   :3.320  
##  3rd Qu.:568.0   3rd Qu.:24.09   3rd Qu.:24.11   3rd Qu.:3.550  
##  Max.   :658.0   Max.   :27.66   Max.   :34.43   Max.   :4.950  
##  NA's   :16      NA's   :13      NA's   :13      NA's   :13     
##    AC.Depth.L      R.IOP.mmHg      L.IOP.mmHg        CDR.RE      
##  Min.   :2.000   Min.   : 6.00   Min.   : 8.00   Min.   :0.0000  
##  1st Qu.:3.040   1st Qu.:14.00   1st Qu.:14.00   1st Qu.:0.3000  
##  Median :3.280   Median :16.00   Median :16.00   Median :0.4000  
##  Mean   :3.306   Mean   :15.88   Mean   :16.06   Mean   :0.4057  
##  3rd Qu.:3.553   3rd Qu.:18.00   3rd Qu.:18.00   3rd Qu.:0.5000  
##  Max.   :5.130   Max.   :28.00   Max.   :33.00   Max.   :0.9900  
##  NA's   :13      NA's   :2       NA's   :3       NA's   :19      
##      CDR.LE         Age.excel    
##  Min.   :0.0000   Min.   :18.00  
##  1st Qu.:0.3000   1st Qu.:44.00  
##  Median :0.4000   Median :55.00  
##  Mean   :0.4022   Mean   :54.69  
##  3rd Qu.:0.5000   3rd Qu.:66.00  
##  Max.   :0.9000   Max.   :91.00  
##  NA's   :17
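`summary()` reports missing values one column at a time (for example 130 for Current.height and 155 for Current.weight), so it does not show how many adults have complete data across all selected variables. A minimal sketch on a hypothetical toy data frame; on the real data the same two lines could be run on `quant.data.adults`.

```r
# Hypothetical toy data with scattered missing values.
toy.missing <- data.frame(UUID = 1:4,
                          Current.height = c(170, NA, 180, NA),
                          R.IOP.mmHg = c(16, 26, NA, 14))

# Missing values per column (the same counts summary() prints as NA's).
missing_per_column <- colSums(is.na(toy.missing))

# Number of participants with no missing value in any selected column.
n_complete <- sum(complete.cases(toy.missing))

print(missing_per_column)
print(n_complete)
```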

Discussion

This subset contains the quantitative measurements attributed to ocular endophenotypes. It also retains the unique identifier (UUID), height, weight and age, which are not endophenotypes. Given how the keratometry axes are defined, I suspect there may be nonsensical values in the keratometry readings. In particular, the axis measurements are in degrees and should not exceed 180°. There is a value of 189 under R.K.value.V.Axis, which may be a typographical error and should be corrected in the dataset. However, I am not certain whether an axis of 0° is a genuine measurement or whether it signifies a missing value. Because an axis of 0° describes the same meridian as 180°, a recorded 0 is plausible in principle, but this should be fact-checked with a professional.
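Instead of scanning the `summary()` output by eye, the range check on the axes could be scripted so that the affected participants are listed by UUID. A sketch, assuming the four keratometry axis columns in `quant.data.adults` and treating values above 180 or equal to 0 as needing review; the zeros are only flagged, not assumed to be errors, since their meaning still has to be confirmed. `flag_axis_values` is a hypothetical helper and the toy UUIDs are made up.

```r
# Hypothetical helper: list axis readings above max_deg or equal to 0.
flag_axis_values <- function(df, axis_cols, id_col = "UUID", max_deg = 180) {
  flagged <- lapply(axis_cols, function(col) {
    v <- df[[col]]
    bad <- which(!is.na(v) & (v > max_deg | v == 0))
    data.frame(UUID = df[[id_col]][bad],
               column = rep(col, length(bad)),
               value = v[bad],
               issue = if (length(bad) > 0) {
                 ifelse(v[bad] > max_deg, "above maximum", "zero")
               } else {
                 character(0)
               },
               stringsAsFactors = FALSE)
  })
  do.call(rbind, flagged)
}

# Toy data (made-up UUIDs) with one out-of-range value and one zero.
toy.axes <- data.frame(UUID = c(101, 102, 103),
                       R.K.value.V.Axis = c(189, 90, 0),
                       L.K.value.V.Axis = c(95, NA, 180))

flags <- flag_axis_values(toy.axes, c("R.K.value.V.Axis", "L.K.value.V.Axis"))
print(flags)
```

On the real data, the call would pass `quant.data.adults` and `c("R.K.Value.H.Axis", "R.K.value.V.Axis", "L.K.value.H.Axis", "L.K.value.V.Axis")`.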