Introduction

After cleaning the phenotypic data and performing PCA, the first part of Aim 1 is completed. The second part will involve cleaing and performing PCA on the genomic data. The genomic data is currently in two separate files as they were generated using two different methods (SNP-array and WGS) for different studies. 506 individuals have SNP-array data, and 108 have WGS data, although not all of these individuals were part of the NIES. PLINK will be used to handle the genomic data. PLINK is an open-source program for analysing genomic data and is a command line tool.

Methods

1. Download PLINK1.9

2. Install and Use the Linux Bash Shell

Control panel > Programs > Windows Subsystem for Linux > restart computer Download Ubuntu from Windows Store > Launch > set up username and password

2. Launch PLINK

In Ubuntu > plink1.9 > output will show details about the program and is a way of checking that the program launched successfully.

3. Test basic statistical functions on PLINK

plink1.9 –bfile NImerged.impute2.chrAllX.2014-Oct-02 –freq –out plink_output/NI_gen_data_test

This function assesses allele frequencies. The output indicates that there were 9,343,280 variants from the WGS data, and the genotyping rate is 99.79%. The output lists all the variants and organises them by chromosome. There are 6 columns: chromosome number (CHR), SNP ID (SNP), allele 1 (A1), allele 2 (A2), allele frequency (MAF), number of allele observations (NCHROBS).

Getting started with PLINK