| tags: [ Data Cleaning Genomic Data Heritability PCA ] categories: [Coding Experiments ]

Using variants unfiltered for MAF for PCA and heritability analysis

Introduction

Methods and Results

1. Run PCA with 20 PCs on unfiltered MAF data set

plink1.9 --bfile merged_nies/merged_nies_geno_210818 --pca 20 var-wts --out plink_output/nies_geno_pca261118

Note:

  • the merged_nies_geno_210818 contains the data set after HWE and missingness filters (contains 3.9 million variants).

  • 20 PCs included. From the previous entry, it seemed that including 20PCs produced consistently higher heritability estimates than including 1 - 5 PC.

2. Generate PCA plots

Load .eigenvec file

nies_geno_eigenvec261118 <- read.table('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/plink_output/nies_geno_pca261118.eigenvec', header = F)

Plot PC1 Vs PC2

plot(nies_geno_eigenvec261118$V3, nies_geno_eigenvec261118$V4, xlab = "PC1", ylab = "PC2", main = "PC1 vs PC2 eigenvectors") 

3. Make GRM

./gcta64 --bfile merged_nies/merged_nies_geno_210818 --make-grm --out gcta_output/merged_nies_grm31218

4. Perform PCA with 20 PCs

./gcta64 --bfile merged_nies/merged_nies_geno_210818 --pca 20 --out gcta_output/merged_nies_pca31218
merged_nies_eigenvec31218 <- read.table('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/gcta_output/merged_nies_pca31218.eigenvec', header = F)

write.csv(merged_nies_eigenvec31218, 'C:/Users/Martha/Documents/Honours/Project/honours.project/Data/gcta_output/merged_nies_eigenvec31218.csv')

5. Estimate heritability of individual traits

for i in {1..27}
do ./gcta64 --reml --grm gcta_output/merged_nies_grm31218 --pheno nies_pheno_020918.txt --mpheno $i --reml-pred-rand --covar nies_sex_covar.txt --qcovar nies_cont_covar31218.txt --out gcta_output/heritability_031218/$i.est
done

6. Estimate heritability of principal components

for i in {1..9}
do ./gcta64 --reml --grm gcta_output/merged_nies_grm31218 --pheno pheno_pca_coord.txt --mpheno $i --reml-pred-rand --covar nies_sex_covar.txt --qcovar nies_cont_covar31218.txt --out gcta_output/heritability_031218/$i.pc.est
done

7. View heritability estimates

heritability_031218 <- read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/gcta_output/heritability_031218/new_heritability31218.csv', header = F)

heritability_031218
##                     V1    V2       V3 V4 V5    V6       V7
## 1                Trait    h2     pval NA PC    h2     pval
## 2          R.K.value.V 0.798 2.80E-09 NA  1 0.663 8.64E-07
## 3          totaluvafmm 0.687 1.47E-06 NA  4  0.49 8.80E-04
## 4                R.CDR 0.686 1.71E-06 NA  2 0.383 1.73E-02
## 5         L.Pachimetry 0.685 1.54E-05 NA  7  0.31 1.80E-02
## 6   R.Cyl..pre.dilate. 0.659 7.76E-08 NA  6 0.116 2.36E-01
## 7          L.K.value.V 0.657 4.03E-06 NA  8 0.084 2.72E-01
## 8         R.Pachimetry 0.639 3.79E-05 NA  3     0 5.00E-01
## 9                L.CDR 0.606 1.12E-04 NA  5     0 5.00E-01
## 10         R.K.value.H 0.581 1.99E-05 NA  9     0 5.00E-01
## 11         L.K.value.H 0.435 1.16E-03 NA                  
## 12      L.Axial.Length 0.409 4.05E-03 NA                  
## 13      R.Axial.Length 0.375 9.49E-03 NA                  
## 14          R.IOP.mmHg 0.302 3.38E-02 NA                  
## 15          L.AC.Depth  0.17 1.46E-01 NA                  
## 16 L.Axis..pre.dilate. 0.157 1.84E-01 NA                  
## 17          L.IOP.mmHg 0.157 1.76E-01 NA                  
## 18          R.AC.Depth 0.122 2.39E-01 NA                  
## 19 R.Axis..pre.dilate. 0.092 3.05E-01 NA                  
## 20    R.K.Value.H.Axis 0.083 3.37E-01 NA                  
## 21  R.Sph..pre.dilate. 0.025 4.47E-01 NA                  
## 22         R.Logmar.VA     0 5.00E-01 NA                  
## 23         L.Logmar.VA     0 5.00E-01 NA                  
## 24  L.Sph..pre.dilate.     0 5.00E-01 NA                  
## 25  L.Cyl..pre.dilate.     0 5.00E-01 NA                  
## 26    R.K.value.V.Axis     0 5.00E-01 NA                  
## 27    L.K.value.H.Axis     0 5.00E-01 NA                  
## 28    L.K.value.V.Axis     0 5.00E-01 NA

These heritability estimates are again different from the previous results. The differences are most likely due to the inclusion and exclusion of rare and low frequency variants.

American Journal of Human Genetics - for publishing paper PLOS Genetics