| tags: [ GCTA Genomic Data Heritability Linux ] categories: [Coding Experiments ]
Trying GCTA to perform heritability analysis
Introduction
Although heritability estimates have been obtained from gaston, I was unable to obtain signficance values for the estimates. This is important as it will help me prioritise the phenotypes and principal components that will be carried forward to the pedigree-based GWAS. Therefore, I will attempt to re-calculate the heritability estimates using GCTA (http://cnsgenomics.com/software/gcta/#Overview). GCTA is a command line program that uses PLINK input and output.
Methods and Results
1. Download Linux version of GCTA and unzip in /Data folder.
2 Check that GCTA has been installed properly
./gcta64
3. Make GRM from the NIES genomic data
./gcta64 --bfile merged_nies_geno_210818 --make-grm --out gcta_output/merged_nies_grm
4. Perform PCA on genomic data, including 2 PCs
./gcta64 --bfile merged_nies_geno_210818 --make-grm --pca 2 --out gcta_output/merged_nies_pca
From the PCA performed in gaston, only 2 principal components are required to correct for population stratification in this data set.
5. Try estimating heritability for one trait to see what the output is and if the estimate matches gaston results.
Ideally, the GCTA estimates should be very similar as it uses a linear mixed model for estimation as well. The values are not expected to be identical because different packages/softwares perform analyses slightly differently and will result in decimal point differences.
./gcta64 --reml --grm gcta_output/merged_nies_grm --pheno nies_pheno_020918.txt --mpheno 25 --reml-pred-rand -qcovar gcta_output/merged_nies_pca --out gcta_output/est_her_test
Note: the above function should allow me to generate the heritability estimates one-by-one. The “mpheno” function takes a PLAIN TEXT FILE with family and individual IDs in the first two columns respectively, and then data for the phenotypic variables in subsequent columns. The mpheno modifier takes a number from which you can specify the column for heritability estimation e.g. ‘25’ will take the data in the 27th column (i.e. 25th phenotype column = R.CDR)
The output of this exercise is as follows:
Source | Variance | SE |
---|---|---|
V(G) | 0.025569 | 0.006542 |
V(e) | 0.017285 | 0.004785 |
Vp | 0.042853 | 0.003667 |
V(G)/Vp | 0.596660 | 0.122792 |
LogL | 384.161 | |
logL0 | 375.013 | |
LRT | 18.296 | |
df | 1 | |
Pval | 9.4535e-06 |
The value relating to V(G)/Vp gives the heritability estimate for the trait (R CDR) and the corresponding p-value suggests that this value is statistically signficant. Interestingly, the estimate obtained for this trait is very similar to the estimate obtained using ‘gaston’ (in previous entry).
6. Repeat exercise for other traits
read.csv('C:/Users/Martha/Documents/Honours/Project/honours.project/Data/gcta_output/trait_heritability_comparison.csv')
## GCTA X X.1 X.2 X.3 gaston
## 1 Trait h2 LRT Pval NA trait
## 2 R K-val V 0.756885 28.886 3.84E-08 NA R.K.value.V
## 3 L Pachimetry 0.667679 23.898 5.08E-07 NA L.Pachimetry
## 4 R Pachimetry 0.600582 19.815 4.27E-06 NA R.CDR
## 5 R CDR 0.59666 18.296 9.45E-06 NA L.K.value.V
## 6 L K-val V 0.593822 16.975 1.89E-05 NA L.CDR
## 7 R Cyl 0.559755 17.911 1.16E-05 NA R.Pachimetry
## 8 L CDR 0.540132 14.5 7.01E-05 NA R.Cyl..pre.dilate.
## 9 R K-val H 0.518472 14.657 6.45E-05 NA R.K.value.H
## 10 Total UVAF 0.458766 10.262 6.79E-04 NA totaluvafmm
## 11 R Axial Length 0.37054 7.356 3.34E-03 NA L.K.value.H
## 12 L Axial Length 0.353147 6.614 5.06E-03 NA R.Axial.Length
## 13 R ACD 0.312516 6.96 4.17E-03 NA L.Axial.Length
## 14 L K-val H 0.2944 4.182 2.04E-02 NA R.AC.Depth
## 15 L ACD 0.293333 6.386 5.75E-03 NA L.AC.Depth
## 16 R IOP 0.265188 4.139 2.09E-02 NA R.IOP.mmHg
## 17 L IOP 0.216258 3.149 3.80E-02 NA L.IOP.mmHg
## 18 L Axis 0.105924 0.639 2.12E-01 NA
## 19 R Axis 0.067026 0.247 3.10E-01 NA
## 20 R LogMar 0 0 5.00E-01 NA
## 21 L LogMar 0 0 5.00E-01 NA
## 22 R Sph 0 0 5.00E-01 NA
## 23 L Sph 0 0 5.00E-01 NA
## 24 L Cyl 0 0 5.00E-01 NA
## 25 R K-val H Axis 0 0 5.00E-01 NA
## 26 R K-val V Axis 0 0 5.00E-01 NA
## 27 L K-val H Axis 0 0 5.00E-01 NA
## 28 L K-val V Axis 0 0 5.00E-01 NA
## X.4 X.5 X.6 X.7 X.8 X.9
## 1 h2 lrt pval NA NA NA
## 2 0.7600244 33.802544 3.05E-09 NA NA NA
## 3 0.6480675 22.505262 1.05E-06 NA NA NA
## 4 0.6318947 30.979183 1.30E-08 NA NA NA
## 5 0.611756 23.141242 7.53E-07 NA NA NA
## 6 0.5944413 28.727694 4.17E-08 NA NA NA
## 7 0.5835286 19.154887 6.03E-06 NA NA NA
## 8 0.5464705 16.842366 2.03E-05 NA NA NA
## 9 0.530101 18.546107 8.29E-06 NA NA NA
## 10 0.4347234 9.311987 1.14E-03 NA NA NA
## 11 0.3625959 10.350966 6.47E-04 NA NA NA
## 12 0.3503376 6.988205 4.10E-03 NA NA NA
## 13 0.3379609 6.653844 4.95E-03 NA NA NA
## 14 0.3068032 7.235551 3.57E-03 NA NA NA
## 15 0.2828547 6.287883 6.08E-03 NA NA NA
## 16 0.2500742 4.0985 2.15E-02 NA NA NA
## 17 0.2135849 3.612603 2.87E+00 NA NA NA
## 18 NA NA NA
## 19 NA NA NA
## 20 NA NA NA
## 21 NA NA NA
## 22 NA NA NA
## 23 NA NA NA
## 24 NA NA NA
## 25 NA NA NA
## 26 NA NA NA
## 27 NA NA NA
## 28 NA NA NA
The gaston results were from Miles. The same traits appear to be heritable (p <0.05) and estimates are very similar, some only differing by decimal point values.
The issue with my current method of estimating heritability is that I have to repeat the code for each trait and separate file sets are generated for each trait. I wrote a loop for the function, however, separate files are still generated for each trait.
for i in {1..27}
do
./gcta64 --reml --grm gcta_output/merged_nies_grm --pheno nies_pheno_020918.txt --mpheno $i --reml-pred-rand --qcovar gcta_output/merged_nies_pca.eigenvec --out gcta_output/$i.est
head gcta_output/$i.est.hsq
done
This loop generates the heritability estimates for each trait and generates separate files for each estimate. GCTA does not take a phenotypic file with headers, which is why the files are labelled with numbers instead of the trait.