Monday, May 16, 2011

Imputation using BEAGLE

Just some quick notes on imputation and some of the issues I've run into:
- If e.g. you only have 10-20 individuals, you can run into problems with allele order, e.g. "
- Imputing against 1000 Genomes guzzles memory, so raise the Java heap limit, e.g.:

java -Xmx10000m -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar

- You need to check for alleles that need flipping, and then flip them:

plink --bfile file --flip flip.txt --make-bed --out flip.flipped
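
One way to build a list like flip.txt is to compare the alleles in the PLINK .bim file with those in the reference marker file and keep the SNPs that only match after complementing. This is a minimal sketch, not a complete strand check. It assumes the marker file has the SNP ID in column 1 and the two alleles in columns 3 and 4, and toy.bim and toy.markers are made-up files for illustration. Strand-ambiguous A/T and C/G SNPs cannot be resolved this way, so the sketch skips them.

```shell
# Toy inputs (hypothetical; real files come from plink and the 1KG panel)
cat > toy.bim <<'EOF'
1 rs1 0 1000 A G
1 rs2 0 2000 T C
1 rs3 0 3000 A T
EOF
cat > toy.markers <<'EOF'
rs1 1000 A G
rs2 2000 A G
rs3 3000 A T
EOF

# First file: remember the reference alleles for each SNP ID.
# Second file: print SNPs that match the reference only after complementing.
awk '
BEGIN { c["A"]="T"; c["T"]="A"; c["C"]="G"; c["G"]="C" }
NR == FNR { ref[$1] = $3 SUBSEP $4; next }
($2 in ref) {
    split(ref[$2], r, SUBSEP)
    a = $5; b = $6
    if (c[a] == b) next    # A/T or C/G: strand-ambiguous, skip
    same    = (a == r[1] && b == r[2]) || (a == r[2] && b == r[1])
    flipped = (c[a] == r[1] && c[b] == r[2]) || (c[a] == r[2] && c[b] == r[1])
    if (!same && flipped) print $2
}' toy.markers toy.bim > flip.txt
```

With the toy data only rs2 ends up in flip.txt: rs1 already matches and rs3 is ambiguous.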

- Remove e.g. SNPs with duplicated IDs
- Remove e.g. SNPs where alleles are not observed:

gawk '(($3 == "-") || ($4 == "-")) { print $1 }' 1kg_markers.txt | grep '^rs' > weird.1kg
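
For the duplicated-ID case mentioned above, a minimal sketch: list every SNP ID that occurs more than once in the .bim file and hand that list to plink --exclude. The file names toy_dups.bim, dup_ids.txt and file.nodups are hypothetical.

```shell
# Toy .bim file with one duplicated ID (hypothetical data)
cat > toy_dups.bim <<'EOF'
1 rs1 0 1000 A G
1 rs2 0 2000 T C
1 rs2 0 2000 T C
1 rs3 0 3000 A T
EOF

# Column 2 of a .bim file is the SNP ID; uniq -d prints each repeated ID once.
awk '{ print $2 }' toy_dups.bim | sort | uniq -d > dup_ids.txt

# Then, on the real data (this drops every copy of each listed ID):
# plink --bfile file --exclude dup_ids.txt --make-bed --out file.nodups
```

Note that excluding by ID removes all copies of a duplicated SNP, not just the extra ones.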

- If going from e.g. Affy5 to imputing against 1KG, you need to update the SNP coordinates to hg19. Obvious, but easily forgotten:

plink ... --update-map mapfile_1kg.txt --recode-beagle ...
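
The file given to --update-map has two columns: SNP ID and new base-pair position. A minimal sketch of building one from per-chromosome marker files, assuming the ID is in column 1 and the hg19 position in column 2. The toy.chr*.markers files and toy_mapfile.txt are hypothetical stand-ins, not the real 1KG files.

```shell
# Toy per-chromosome marker files (hypothetical stand-ins for the 1KG ones)
printf 'rs1 1000 A G\nrs2 2000 C T\n' > toy.chr1.markers
printf 'rs9 500 G A\n' > toy.chr2.markers

# Keep only rs IDs and write "ID position" pairs for --update-map.
cat toy.chr1.markers toy.chr2.markers \
    | awk '$1 ~ /^rs/ { print $1, $2 }' > toy_mapfile.txt
```

SNPs on the array that are missing from this file keep their old coordinates, so they would still need to be dropped or lifted over separately.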

- I just ran my imputation in a loop over chromosomes, but this is not so easy if you have many individuals:

for (( i=1; i<23; i++ )); do
bsub -q week -o swage3_imputed_1kg.beagle.chr${i}.lsf.log -e swage3_imputed_1kg.beagle.chr${i}.lsf.err -R "rusage[mem=10000]" -P swage3-beagle-chr${i} \
java -Xmx10000m -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar \
unphased=swage3_imputed_1kg.chr-${i}.dat \
missing=0 \
log=swage3_imputed_1kg.chr-${i}.log \
markers=EUR.20100804.chr${i}.markers \