| tags: [ R Imputation PCA ] categories: [Coding Experiments ]
Multiple PCA Imputation of Phenotypic Data
Introduction
Multiple imputation using PCA is an effective method for imputing an incomplete data set with missing at random (MAR) data. It takes into account the variability of the data and the uncertainty of the predicted values. By doing so, the resulting imputed data set has integrity and is filled with values that will be valuable in subsequent statistical analyses. This is unlike other standard methods of imputation, like using mean or regression imputation, which is single dimensional and may skew results of further analyses.
Methods and Results
1. load data
phen.data.age <- read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/NIES_master_database-age.csv')
2. Exclude data from minors (<18yrs)
phen.data.adults<-phen.data.age[phen.data.age$Age.excel>17,]
3. Subset quantitative data only from relevant phenotypes
quant.variables<- c("R.K.value.H", "R.K.Value.H.Axis", "R.K.value.V", "R.K.value.V.Axis", "L.K.value.H",
"L.K.value.H.Axis", "L.K.value.V", "L.K.value.V.Axis", "R.Pachimetry", "L.Pachimetry", "R.Axial.Length", "L.Axial.Length", "AC.Depth.R", "AC.Depth.L", "R.IOP.mmHg", "L.IOP.mmHg", "CDR.RE", "CDR.LE")
quant.data.adults<- phen.data.adults[quant.variables]
head(quant.data.adults)
## R.K.value.H R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 1 42.00 2 43.00 92 42.50
## 2 41.25 7 42.25 97 41.50
## 3 NA NA NA NA NA
## 4 44.75 5 45.00 95 45.00
## 5 44.75 0 44.75 90 44.25
## 6 42.00 8 43.25 98 42.25
## L.K.value.H.Axis L.K.value.V L.K.value.V.Axis R.Pachimetry L.Pachimetry
## 1 5 43.50 95 532 554
## 2 168 42.00 78 608 612
## 3 NA NA NA 507 510
## 4 60 45.25 150 560 559
## 5 178 44.75 88 556 562
## 6 177 43.25 87 498 501
## R.Axial.Length L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg
## 1 24.31 24.10 3.09 3.03 14
## 2 25.02 25.21 3.38 3.92 16
## 3 22.78 22.80 3.40 3.45 26
## 4 23.02 22.98 3.00 2.85 14
## 5 21.75 22.04 2.60 2.53 22
## 6 23.06 23.17 2.94 3.04 18
## L.IOP.mmHg CDR.RE CDR.LE
## 1 14 0.9 0.9
## 2 15 0.9 0.7
## 3 22 0.7 0.7
## 4 14 0.2 0.2
## 5 21 0.3 0.3
## 6 20 0.6 0.6
4.Duplicate data set
ocular_data <-quant.data.adults
5. Convert to double matrix to ensure that all data is ‘numeric’
ocular_data <- as.matrix(ocular_data)
6. Perform imputation
require(missMDA)
## Loading required package: missMDA
require(FactoMineR)
## Loading required package: FactoMineR
nbdim = estim_ncpPCA(ocular_data, method = 'EM', method.cv="Kfold")
##
|
| | 0%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|= | 1%
|
|= | 2%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|== | 3%
|
|=== | 4%
|
|=== | 5%
|
|==== | 6%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|===== | 7%
|
|===== | 8%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|====== | 9%
|
|======= | 10%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======= | 11%
|
|======== | 12%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========= | 13%
|
|========= | 14%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========== | 15%
|
|=========== | 16%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|=========== | 17%
|
|============ | 18%
|
|============ | 19%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============= | 20%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============== | 21%
|
|============== | 22%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|=============== | 23%
|
|================ | 24%
|
|================ | 25%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|================= | 26%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 29%
|
|==================== | 30%
|
|==================== | 31%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|===================== | 32%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|====================== | 33%
|
|====================== | 34%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================= | 35%
|
|======================== | 36%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================== | 37%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========================= | 38%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========================== | 39%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========================== | 40%
|
|=========================== | 41%
|
|============================ | 42%
|
|============================ | 43%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============================= | 44%
|
|============================== | 45%
|
|============================== | 46%
|
|=============================== | 47%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|================================ | 48%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|================================ | 49%
|
|================================= | 51%
|
|================================= | 52%
|
|================================== | 53%
|
|=================================== | 54%
|
|=================================== | 55%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|==================================== | 56%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|===================================== | 57%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|===================================== | 58%
|
|====================================== | 59%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================================= | 60%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================================= | 61%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================================== | 62%
|
|========================================= | 63%
|
|========================================= | 64%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 68%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============================================= | 69%
|
|============================================= | 70%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============================================== | 71%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|=============================================== | 72%
|
|=============================================== | 73%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|================================================ | 74%
|
|================================================= | 75%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|================================================= | 76%
|
|================================================== | 77%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 80%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 83%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|====================================================== | 84%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|======================================================= | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 91%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 94%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============================================================== | 95%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
##
|
|============================================================== | 96%
|
|=============================================================== | 97%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 100%
nbdim
## $ncp
## [1] 3
##
## $criterion
## 0 1 2 3 4 5
## 552952.3 554974.3 551044.5 531838.7 544813.3 631030.9
res.comp = MIPCA(ocular_data, ncp = nbdim$ncp, nboot = 1000)
res.comp$ncp
## NULL
7. Plot individuals to view uncertainty of predicted values
png('indiv_supp_plot.png', width = 3200, height = 3200, res = 150)
plot.MIPCA(res.comp, choice = "ind.supp")
dev.off()
## png
## 2
8. Plot variables to view uncertainty of predicted values
png('var_factor_plot.png', width = 3200, height = 3200, res = 150)
plot.MIPCA(res.comp, choice = "var")
dev.off()
## png
## 2
Discussion
Phenotypic data was successfully imputed. The individual supplementary projection and variable factor map plots show the uncertainty of the imputation for each individual and each variable. The plots suggest that the imputation was successful and there are relatively small uncertainties. The next step would be to perform summary and distribution statistics on the imputed data set in order to make an informed decision about whether to transform data or not before proceeding to perform a PCA.
This exercise will have to repeated to include more variables. I originally excluded the visual acuity (logMAR) and prism test (cyl and axis) variables as they were not included in the original analysis from the NI study and are not technically endophenotypes (?). However, after further discussions, we have decided that there may be merit to adding the values into my analysis. Therefore, I will be including these variables and repeating this imputation step.
Note: this multiple PCA imputation identified three components