| tags: [ R Imputation PCA ] categories: [Coding Experiments ]

Multiple PCA Imputation of Phenotypic Data

Introduction

Multiple imputation using PCA is an effective method for imputing an incomplete data set with missing at random (MAR) data. It takes into account the variability of the data and the uncertainty of the predicted values. By doing so, the resulting imputed data set has integrity and is filled with values that will be valuable in subsequent statistical analyses. This is unlike other standard methods of imputation, like using mean or regression imputation, which is single dimensional and may skew results of further analyses.

https://link.springer.com/article/10.1007/s11634-011-0086-7

Methods and Results

1. load data

phen.data.age <- read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/NIES_master_database-age.csv')

2. Exclude data from minors (<18yrs)

phen.data.adults<-phen.data.age[phen.data.age$Age.excel>17,]

3. Subset quantitative data only from relevant phenotypes

quant.variables<- c("R.K.value.H", "R.K.Value.H.Axis", "R.K.value.V", "R.K.value.V.Axis", "L.K.value.H",
                   "L.K.value.H.Axis", "L.K.value.V", "L.K.value.V.Axis", "R.Pachimetry", "L.Pachimetry", "R.Axial.Length", "L.Axial.Length", "AC.Depth.R", "AC.Depth.L", "R.IOP.mmHg", "L.IOP.mmHg", "CDR.RE", "CDR.LE")
quant.data.adults<- phen.data.adults[quant.variables]

head(quant.data.adults)
##   R.K.value.H R.K.Value.H.Axis R.K.value.V R.K.value.V.Axis L.K.value.H
## 1       42.00                2       43.00               92       42.50
## 2       41.25                7       42.25               97       41.50
## 3          NA               NA          NA               NA          NA
## 4       44.75                5       45.00               95       45.00
## 5       44.75                0       44.75               90       44.25
## 6       42.00                8       43.25               98       42.25
##   L.K.value.H.Axis L.K.value.V L.K.value.V.Axis R.Pachimetry L.Pachimetry
## 1                5       43.50               95          532          554
## 2              168       42.00               78          608          612
## 3               NA          NA               NA          507          510
## 4               60       45.25              150          560          559
## 5              178       44.75               88          556          562
## 6              177       43.25               87          498          501
##   R.Axial.Length L.Axial.Length AC.Depth.R AC.Depth.L R.IOP.mmHg
## 1          24.31          24.10       3.09       3.03         14
## 2          25.02          25.21       3.38       3.92         16
## 3          22.78          22.80       3.40       3.45         26
## 4          23.02          22.98       3.00       2.85         14
## 5          21.75          22.04       2.60       2.53         22
## 6          23.06          23.17       2.94       3.04         18
##   L.IOP.mmHg CDR.RE CDR.LE
## 1         14    0.9    0.9
## 2         15    0.9    0.7
## 3         22    0.7    0.7
## 4         14    0.2    0.2
## 5         21    0.3    0.3
## 6         20    0.6    0.6

4.Duplicate data set

ocular_data <-quant.data.adults

5. Convert to double matrix to ensure that all data is ‘numeric’

ocular_data <- as.matrix(ocular_data)

6. Perform imputation

require(missMDA)
## Loading required package: missMDA
require(FactoMineR)
## Loading required package: FactoMineR
nbdim = estim_ncpPCA(ocular_data, method = 'EM', method.cv="Kfold")
## 
  |                                                                       
  |                                                                 |   0%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   6%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |========                                                         |  12%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |===========                                                      |  16%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=============                                                    |  20%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=====================                                            |  32%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |========================                                         |  36%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |========================                                         |  37%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=========================                                        |  38%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==========================                                       |  39%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |===============================                                  |  47%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |================================                                 |  48%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |=================================                                |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |====================================                             |  56%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=====================================                            |  57%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  59%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=======================================                          |  60%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=======================================                          |  61%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  68%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==============================================                   |  71%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |=================================================                |  75%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  77%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  83%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |======================================================           |  84%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  94%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==============================================================   |  95%
## Warning in impute(X, ncp = ncp, scale = scale, method = method, threshold =
## threshold, : Stopped after 1000 iterations
## 
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================| 100%
nbdim
## $ncp
## [1] 3
## 
## $criterion
##        0        1        2        3        4        5 
## 552952.3 554974.3 551044.5 531838.7 544813.3 631030.9
res.comp = MIPCA(ocular_data, ncp = nbdim$ncp, nboot = 1000)

res.comp$ncp
## NULL

7. Plot individuals to view uncertainty of predicted values

png('indiv_supp_plot.png', width = 3200, height = 3200, res = 150)
plot.MIPCA(res.comp, choice = "ind.supp") 
dev.off()
## png 
##   2

8. Plot variables to view uncertainty of predicted values

png('var_factor_plot.png', width = 3200, height = 3200, res = 150)
plot.MIPCA(res.comp, choice = "var") 
dev.off()
## png 
##   2

Discussion

Phenotypic data was successfully imputed. The individual supplementary projection and variable factor map plots show the uncertainty of the imputation for each individual and each variable. The plots suggest that the imputation was successful and there are relatively small uncertainties. The next step would be to perform summary and distribution statistics on the imputed data set in order to make an informed decision about whether to transform data or not before proceeding to perform a PCA.

This exercise will have to repeated to include more variables. I originally excluded the visual acuity (logMAR) and prism test (cyl and axis) variables as they were not included in the original analysis from the NI study and are not technically endophenotypes (?). However, after further discussions, we have decided that there may be merit to adding the values into my analysis. Therefore, I will be including these variables and repeating this imputation step.

Note: this multiple PCA imputation identified three components