| tags: [ Data Cleaning R ] categories: [Coding ]
Subsetting relevant phenotypic data
Introduction
The original dataset that was used for descriptive statistics in previous entries contained data for all 801 individuals. However, the ethics approval for this study only allows for adult data to be included in the study. All data for minor individuals must therefore be removed. Additionally, relevant quantitative variables must be selected from the bigger phenotypic dataset since not all quantitative measurements are endophenotypes.
Aim
Remove data for minor individuals and subset relevant quantitative measures.
Methods and Results
Upon inspection of the participants’ reported age on the questionnaire, there were some apparent discrepancies with their recorded DOB and some missing values. I therefore re-calculated age at the time of collection (2008) based on DOB, which has no missing values, using Microsoft Excel. I added the calculated ages in the phenotypic dataset under “Age.excel” and 1. loaded this dataset into R:
phen.data.age <- read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/NIES_master_database-age.csv')
- Subset data for minor participants
phen.data.minors<-phen.data.age[phen.data.age$Age.excel<=17,]
head(phen.data.minors)
## UUID LAB_ID PID SID Ped_ID_.BK_no. NIES NIES_GWAS Miles_coreped
## 166 400167 NA NA ES_167 0 0
## 186 400187 NA 7000 ES_187 1 1
## 207 400208 NA 7010 ES_208 1 1
## 291 400292 NA NA ES_292 0 0
## 309 400310 NA 3808 ES_310 1 1
## 314 400315 NA 3830 ES_315 1 1
## NIES_DNA DNA_Box DNA_Postition Gender DOB Current.height
## 166 0 NA Female 4/04/1991 NA
## 186 1 4 B9 Female 19/09/1997 153
## 207 1 4 A3 Male 24/09/1997 NA
## 291 0 NA Male 22/03/1991 183
## 309 1 4 C6 Female 22/06/1994 158
## 314 1 4 C7 Female 19/12/1995 161
## Current.weight Smoker Glaucoma.medications Eye.Colour
## 166 NA FALSE Hazel / Light Brown
## 186 65 FALSE Brown
## 207 NA FALSE
## 291 80 FALSE
## 309 44 FALSE Blue
## 314 60 FALSE Green
## Logmar.VA.Right RVA.with.PH Logmar.VA.Left LVA.with.PH
## 166 -0.18 -0.18 -0.18 -0.18
## 186 -0.12 -0.12 -0.12 -0.12
## 207 0.00 0.00 0.00 0.00
## 291 0.00 0.00 -0.10 -0.10
## 309 -0.10 -0.10 -0.20 -0.20
## 314 0.00 0.00 0.02 0.02
## R.Sph..pre.dilate. R.Cyl..pre.dilate. R.Axis..pre.dilate.
## 166 0.50 0.00 0
## 186 0.75 -0.50 27
## 207 0.50 -0.25 125
## 291 0.00 0.00 0
## 309 0.25 -0.25 146
## 314 -0.50 -0.25 178
## L.Sph..pre.dilate. L.Cyl..pre.dilate. L.Axis..pre.dilate. IPDprecyclo
## 166 0.75 0.00 0 NA
## 186 -0.50 -0.25 46 NA
## 207 0.25 0.00 0 NA
## 291 0.50 -0.50 28 NA
## 309 0.00 0.00 0 NA
## 314 -0.75 -0.25 19 NA
## R.Sph R.Cyl R.Axis L.Sph L.Cyl L.Axis IPDpostcyclo Glasses R.Sph.gl
## 166 NA NA NA NA NA NA NA s gls NA
## 186 NA NA NA NA NA NA NA s gls NA
## 207 NA NA NA NA NA NA NA s gls NA
## 291 NA NA NA NA NA NA NA s gls NA
## 309 NA NA NA NA NA NA NA s gls NA
## 314 NA NA NA NA NA NA NA s gls NA
## R.Cyl.gl R.Axis.gl Add L.Sph.gl L.Cyl.gl L.Axis.gl R.K.value.H
## 166 NA NA NA NA NA NA 42.25
## 186 NA NA NA NA NA NA 43.00
## 207 NA NA NA NA NA NA 41.50
## 291 NA NA NA NA NA NA 40.25
## 309 NA NA NA NA NA NA 42.25
## 314 NA NA NA NA NA NA 44.25
## R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 166 172 43.00 82 42.25
## 186 178 45.00 88 43.00
## 207 171 42.00 81 41.50
## 291 97 41.75 7 40.00
## 309 97 42.75 7 42.00
## 314 96 44.75 6 44.50
## L.K.value.H.Axis L.K.value.V L.K.value.V.Axis Cover.Test.D Lang.score
## 166 8 43.00 98 NA 550
## 186 3 44.75 93 NA 550
## 207 176 42.00 86 NA 550
## 291 87 41.25 177 NA 550
## 309 89 42.50 179 NA 550
## 314 103 44.75 13 NA 550
## Ocular.Motility IOP.time R.Pachimetry L.Pachimetry R.Axial.Length
## 166 0/01/1900 526 515 23.09
## 186 0/01/1900 550 554 22.82
## 207 0/01/1900 511 502 24.21
## 291 0/01/1900 507 519 24.69
## 309 0/01/1900 555 556 23.39
## 314 0/01/1900 559 572 23.34
## L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg L.IOP.mmHg
## 166 23.10 3.38 3.35 17 15
## 186 22.82 3.51 3.49 13 14
## 207 24.23 3.27 3.80 14 15
## 291 24.73 3.43 3.18 17 15
## 309 23.52 3.52 3.56 19 17
## 314 23.27 3.76 3.80 19 21
## Anterior.segment Pterygium Disc.size.RE
## 166 NAD OU FALSE M
## 186 NAD OU FALSE M
## 207 OD corneal nerves prominent, OS NAD FALSE M
## 291 NAD OU FALSE M
## 309 NAD OU FALSE M
## 314 NAD OU FALSE NR
## Disc.size.LE CDR.RE CDR.LE FDT.MD_RE FDT.MD_LE FDT.PSD_RE FDT.PSD_LE
## 166 M 0.3 0.4 NA NA NA NA
## 186 M 0.3 0.3 NA NA NA NA
## 207 M 0.3 0.3 NA NA NA NA
## 291 M 0.3 0.3 NA NA NA NA
## 309 M 0.3 0.3 NA NA NA NA
## 314 NR 0.1 0.1 NA NA NA NA
## Comments R.L.handed R.L.eye.dominance Other.disease
## 166 Slight aniscoria L > R R
## 186 R
## 207 R
## 291 R
## 309 R
## 314 R
## UV.questionnaire_NIES.code UV.questionnaire_First.name Maiden.name
## 166 167 NA NA
## 186 187 NA NA
## 207 208 NA NA
## 291 292 NA NA
## 309 310 NA NA
## 314 315 NA NA
## UV.questionnaire_Surname UV.questionnaire_Age Age.excel
## 166 NA 16 17
## 186 NA 10 11
## 207 NA 10 11
## 291 NA 16 17
## 309 NA 13 14
## 314 NA 11 13
## UV.questionnaire_Sex Lived.in.Norfolk.Island Location.
## 166 F 16
## 186 F 10
## 207 M 10
## 291 M 1 Vanuatu
## 309 F 7 Cairns
## 314 F 11
## From.age.in.years Duration.outside.Norfolk.Island Hair.colour
## 166 NA NA Brown
## 186 NA NA Brown
## 207 NA NA Brown
## 291 0 15 Black
## 309 0 6 Brown
## 314 NA NA Brown
## Burns.Tans Sunburn.and.pain Time.outside
## 166 never burns but tans 2-10 times greater than 3/4 day
## 186 burns then tans 2-10 times 1/2 of the day
## 207 don't know never less than 1/4day
## 291 don't know never
## 309 never burns but tans 2-10 times less than 1/4day
## 314 burns then tans 2-10 times 1/2 of the day
## Outdoors.and.hats Outdoors.and.sunglasses Winter..Indoors.and.Outdoors
## 166 seldom never mostly indoors
## 186 usually seldom mostly outdoors
## 207 1/2 of the time seldom 1/2 and 1/2
## 291
## 309 1/2 of the time 1/2 of the time 1/2 and 1/2
## 314 1/2 of the time never 1/2 and 1/2
## At.school Feel.colder.than.people Arc.Welding
## 166 neither seldom No
## 186 yes, hat only seldom No
## 207 yes, hat only never No
## 291
## 309 yes, both seldom No
## 314 yes, hat only never No
## If.yes..eye.protection If.yes..flash.burn. Reason.for.sunglasses
## 166
## 186 fashion
## 207 protection from eye disease
## 291
## 309 fashion
## 314
## Do.not.wear.sunglasses
## 166
## 186 Inconvenient
## 207
## 291
## 309 Not fashionable
## 314 Uncomfortable"
- There were 28 minors from the study.
- Remove rows of data of minors
phen.data.adults<-phen.data.age[phen.data.age$Age.excel>17,]
head(phen.data.adults)
## UUID LAB_ID PID SID Ped_ID_.BK_no. NIES NIES_GWAS
## 1 219960 NA 19960 10015417 2438 ES_001 0
## 2 313180 6876 13180 10016420 104 ES_002 0
## 3 314911 6864 14911 10014320 6059 ES_003 0
## 4 111150 1115 NA ES_004 0
## 5 363161 6962 63161 10014308 NA ES_005 0
## 6 110460 1046 NA ES_006 0
## Miles_coreped NIES_DNA DNA_Box DNA_Postition Gender DOB
## 1 1 0 NA Male 6/01/1955
## 2 1 0 NA Male 21/11/1953
## 3 1 0 NA Female 17/04/1955
## 4 0 0 NA Male 2/10/1942
## 5 0 0 NA Female 4/09/1948
## 6 0 0 NA Female 21/10/1938
## Current.height Current.weight Smoker Glaucoma.medications
## 1 NA NA FALSE
## 2 170 78 FALSE
## 3 180 86 TRUE
## 4 183 NA TRUE
## 5 NA 69 TRUE
## 6 162 NA TRUE
## Eye.Colour Logmar.VA.Right RVA.with.PH Logmar.VA.Left
## 1 Brown 0.02 0.02 -0.04
## 2 Brown 0.10 0.10 0.16
## 3 Hazel / Light Brown 0.00 0.00 0.00
## 4 Blue 0.30 0.10 0.08
## 5 Brown 0.00 0.00 -0.10
## 6 Hazel / Light Brown 0.24 0.16 0.36
## LVA.with.PH R.Sph..pre.dilate. R.Cyl..pre.dilate. R.Axis..pre.dilate.
## 1 -0.04 0.25 0.00 0
## 2 0.10 0.00 -0.75 28
## 3 0.00 1.25 -1.25 148
## 4 0.08 1.25 -0.25 97
## 5 -0.10 1.25 -0.25 37
## 6 0.14 4.00 -0.50 46
## L.Sph..pre.dilate. L.Cyl..pre.dilate. L.Axis..pre.dilate. IPDprecyclo
## 1 0.25 -0.50 79 NA
## 2 -0.50 -0.25 164 NA
## 3 1.25 -0.25 24 NA
## 4 1.50 -0.50 81 NA
## 5 1.25 -0.75 164 NA
## 6 3.75 -0.75 180 NA
## R.Sph R.Cyl R.Axis L.Sph L.Cyl L.Axis IPDpostcyclo Glasses R.Sph.gl
## 1 NA NA NA NA NA NA NA s gls NA
## 2 NA NA NA NA NA NA NA s gls NA
## 3 NA NA NA NA NA NA NA c gls NA
## 4 NA NA NA NA NA NA NA s gls NA
## 5 NA NA NA NA NA NA NA c gls NA
## 6 NA NA NA NA NA NA NA s gls NA
## R.Cyl.gl R.Axis.gl Add L.Sph.gl L.Cyl.gl L.Axis.gl R.K.value.H
## 1 NA NA NA NA NA NA 42.00
## 2 NA NA NA NA NA NA 41.25
## 3 NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA 44.75
## 5 NA NA NA NA NA NA 44.75
## 6 NA NA NA NA NA NA 42.00
## R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 1 2 43.00 92 42.50
## 2 7 42.25 97 41.50
## 3 NA NA NA NA
## 4 5 45.00 95 45.00
## 5 0 44.75 90 44.25
## 6 8 43.25 98 42.25
## L.K.value.H.Axis L.K.value.V L.K.value.V.Axis Cover.Test.D Lang.score
## 1 5 43.50 95 NA 550
## 2 168 42.00 78 NA 550
## 3 NA NA NA NA 550
## 4 60 45.25 150 NA NA
## 5 178 44.75 88 NA 550
## 6 177 43.25 87 NA 550
## Ocular.Motility IOP.time R.Pachimetry L.Pachimetry R.Axial.Length
## 1 0/01/1900 532 554 24.31
## 2 0/01/1900 608 612 25.02
## 3 0/01/1900 507 510 22.78
## 4 0/01/1900 560 559 23.02
## 5 0/01/1900 556 562 21.75
## 6 0/01/1900 498 501 23.06
## L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg L.IOP.mmHg
## 1 24.10 3.09 3.03 14 14
## 2 25.21 3.38 3.92 16 15
## 3 22.80 3.40 3.45 26 22
## 4 22.98 3.00 2.85 14 14
## 5 22.04 2.60 2.53 22 21
## 6 23.17 2.94 3.04 18 20
## Anterior.segment Pterygium
## 1 OD NAD, OS mild cortical cataract FALSE
## 2 OD NAD, OS flecks on posterior capsule 'present since age 16.' TRUE
## 3 FALSE
## 4 cortical cataract OU (R>L) FALSE
## 5 OD pterygium, OS NAD TRUE
## 6 NAD OU FALSE
## Disc.size.RE Disc.size.LE CDR.RE CDR.LE FDT.MD_RE FDT.MD_LE FDT.PSD_RE
## 1 L L 0.9 0.9 6.08 5.07 2.050
## 2 L L 0.9 0.7 1.21 -0.72 1.600
## 3 L L 0.7 0.7 2.33 2.28 2.653
## 4 NR NR 0.2 0.2 NA NA NA
## 5 M M 0.3 0.3 NA NA NA
## 6 L L 0.6 0.6 NA NA NA
## FDT.PSD_LE
## 1 2.09
## 2 2.78
## 3 2.92
## 4 NA
## 5 NA
## 6 NA
## Comments
## 1 Suspicious discs (OU)
## 2 Suspicious discs. Fields OK, but left 1 small defect. Check in 1-2 years.
## 3 OHT but field defect on R not classic. Need a 6-12/12 review no axial length
## 4
## 5 No diabetic retinopathy
## 6
## R.L.handed R.L.eye.dominance Other.disease UV.questionnaire_NIES.code
## 1 R 1
## 2 L 2
## 3 L 3
## 4 R 4
## 5 R 5
## 6 R 6
## UV.questionnaire_First.name Maiden.name UV.questionnaire_Surname
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## UV.questionnaire_Age Age.excel UV.questionnaire_Sex
## 1 52 53 M
## 2 53 55 M
## 3 52 53 F
## 4 65 66 M
## 5 59 60 F
## 6 68 70 F
## Lived.in.Norfolk.Island Location. From.age.in.years
## 1 37 SA, WA, NSW 18
## 2 46 PNG NA
## 3 21 Australia/ PNG 0
## 4 28 0
## 5 16 0
## 6 16 QLD 0
## Duration.outside.Norfolk.Island Hair.colour Burns.Tans
## 1 15 Black burns then tans
## 2 7 Brown never burns but tans
## 3 31 Brown burns not tan
## 4 37 Brown burns not tan
## 5 43 Brown burns then tans
## 6 52 Brown burns not tan
## Sunburn.and.pain Time.outside Outdoors.and.hats
## 1 more than 10 times greater than 3/4 day always
## 2 2-10 times none 1/2 of the time
## 3 more than 10 times less than 1/4day 1/2 of the time
## 4 2-10 times cannot judge always
## 5 once less than 1/4day usually
## 6 2-10 times less than 1/4day usually
## Outdoors.and.sunglasses Winter..Indoors.and.Outdoors
## 1 1/2 of the time mostly outdoors
## 2 seldom 1/2 and 1/2
## 3 usually 1/2 and 1/2
## 4 seldom 1/2 and 1/2
## 5 always mostly indoors
## 6 usually mostly indoors
## At.school Feel.colder.than.people Arc.Welding
## 1 yes, sunglasses only seldom Yes
## 2 neither 1/2 of the time Yes
## 3 neither 1/2 of the time No
## 4 neither seldom Yes
## 5 neither seldom No
## 6 neither seldom No
## If.yes..eye.protection If.yes..flash.burn. Reason.for.sunglasses
## 1 No No protection from eye disease
## 2 No No glare
## 3 glare
## 4 No No glare
## 5 driving
## 6 glare
## Do.not.wear.sunglasses
## 1 Inconvenient
## 2 Decrease vision
## 3
## 4 Not necessary
## 5
## 6 Prescription glasses"
- Subset relevant quantitative variables - based on the methods paper (https://www.ncbi.nlm.nih.gov/pubmed/21314255)
quant.variables<- c("UUID", "Current.height", "Current.weight", "R.K.value.H", "R.K.Value.H.Axis", "R.K.value.V", "R.K.value.V.Axis", "L.K.value.H",
"L.K.value.H.Axis", "L.K.value.V", "L.K.value.V.Axis", "R.Pachimetry", "L.Pachimetry", "R.Axial.Length",
"L.Axial.Length", "AC.Depth.R", "AC.Depth.L", "R.IOP.mmHg", "L.IOP.mmHg", "CDR.RE", "CDR.LE", "Age.excel")
quant.data.adults<- phen.data.adults[quant.variables]
- Check summary statistics
summary(quant.data.adults)
## UUID Current.height Current.weight R.K.value.H
## Min. :110150 Min. :147.0 Min. : 31.00 Min. :36.00
## 1st Qu.:273430 1st Qu.:162.5 1st Qu.: 65.00 1st Qu.:42.00
## Median :320921 Median :170.0 Median : 75.00 Median :43.00
## Mean :321681 Mean :170.0 Mean : 77.56 Mean :42.97
## 3rd Qu.:400378 3rd Qu.:178.0 3rd Qu.: 87.00 3rd Qu.:44.00
## Max. :400804 Max. :195.0 Max. :140.00 Max. :48.25
## NA's :130 NA's :155 NA's :16
## R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## Min. : 0.00 Min. :37.00 Min. : 1.00 Min. :32.75
## 1st Qu.: 42.00 1st Qu.:42.75 1st Qu.: 62.00 1st Qu.:42.00
## Median : 91.00 Median :43.75 Median : 89.00 Median :43.00
## Mean : 93.74 Mean :43.80 Mean : 91.19 Mean :42.97
## 3rd Qu.:156.00 3rd Qu.:45.00 3rd Qu.:127.00 3rd Qu.:44.00
## Max. :180.00 Max. :55.00 Max. :189.00 Max. :47.25
## NA's :16 NA's :16 NA's :16 NA's :15
## L.K.value.H.Axis L.K.value.V L.K.value.V.Axis R.Pachimetry
## Min. : 0.0 Min. :39.00 Min. : 0.00 Min. :428.0
## 1st Qu.: 24.0 1st Qu.:42.81 1st Qu.: 65.25 1st Qu.:527.0
## Median : 88.5 Median :43.75 Median : 90.00 Median :546.0
## Mean : 87.8 Mean :43.87 Mean : 92.22 Mean :546.2
## 3rd Qu.:147.0 3rd Qu.:44.75 3rd Qu.:119.75 3rd Qu.:570.0
## Max. :180.0 Max. :52.00 Max. :180.00 Max. :656.0
## NA's :15 NA's :15 NA's :15 NA's :14
## L.Pachimetry R.Axial.Length L.Axial.Length AC.Depth.R
## Min. :424.0 Min. :20.95 Min. :21.29 Min. :2.090
## 1st Qu.:526.0 1st Qu.:22.90 1st Qu.:22.87 1st Qu.:3.058
## Median :547.0 Median :23.50 Median :23.46 Median :3.310
## Mean :546.3 Mean :23.56 Mean :23.54 Mean :3.320
## 3rd Qu.:568.0 3rd Qu.:24.09 3rd Qu.:24.11 3rd Qu.:3.550
## Max. :658.0 Max. :27.66 Max. :34.43 Max. :4.950
## NA's :16 NA's :13 NA's :13 NA's :13
## AC.Depth.L R.IOP.mmHg L.IOP.mmHg CDR.RE
## Min. :2.000 Min. : 6.00 Min. : 8.00 Min. :0.0000
## 1st Qu.:3.040 1st Qu.:14.00 1st Qu.:14.00 1st Qu.:0.3000
## Median :3.280 Median :16.00 Median :16.00 Median :0.4000
## Mean :3.306 Mean :15.88 Mean :16.06 Mean :0.4057
## 3rd Qu.:3.553 3rd Qu.:18.00 3rd Qu.:18.00 3rd Qu.:0.5000
## Max. :5.130 Max. :28.00 Max. :33.00 Max. :0.9900
## NA's :13 NA's :2 NA's :3 NA's :19
## CDR.LE Age.excel
## Min. :0.0000 Min. :18.00
## 1st Qu.:0.3000 1st Qu.:44.00
## Median :0.4000 Median :55.00
## Mean :0.4022 Mean :54.69
## 3rd Qu.:0.5000 3rd Qu.:66.00
## Max. :0.9000 Max. :91.00
## NA's :17
Discussion
This subset of data contains the quantitative measurements that are attributed to ocular endophenotypes, with the exception of the unique identifiers (UUID), height, weight, and age. I suspect that there may be non-sense data in the keratometry readings, given the definition of the measurements. In particular, the axis measurements in degrees should maximise at 180degrees. There is a value of 189 under R K-value V axis, which may be a typographical error and should be replaced in the data set. However, I am not certain whether a measurement of 0degrees is possible or if they signify a missing value. This should be fact checked with a professional.