Showing posts with label Hapmap. Show all posts

Monday, May 16, 2011

Imputation using BEAGLE

Just some quick notes on imputation and some of the issues I've run into:
- If e.g. you only have 10-20 individuals, you can run into problems with allele order, e.g. "
- The 1000 Genomes reference guzzles memory, so give Java a large heap, e.g.:

java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar

- You need to check for alleles to flip...and flip them

plink --bfile file --flip flip.txt --make-bed --out flip.flipped
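One way to build flip.txt is to compare the alleles in the PLINK .bim file against the reference markers file. The following is a minimal sketch, not necessarily how the list was produced for this run. It assumes the markers file has the columns ID, position, allele 1, allele 2, and the function name list_flips is made up for illustration. Strand-ambiguous SNPs (A/T and C/G) cannot be resolved from the alleles alone, so the sketch skips them.

```shell
# list_flips BIM MARKERS: print IDs of SNPs whose allele pair matches the
# reference only after complementing both alleles (A<->T, C<->G).
# Assumed layouts: BIM = chr, id, cM, bp, a1, a2; MARKERS = id, pos, a1, a2.
list_flips() {
  awk '
    function comp(a) {
      if (a == "A") return "T"
      if (a == "T") return "A"
      if (a == "C") return "G"
      if (a == "G") return "C"
      return a
    }
    # order-independent key for an allele pair
    function key(a, b) { return (a < b) ? (a b) : (b a) }
    NR == FNR { ref[$1] = key($3, $4); next }   # first file: reference markers
    ($2 in ref) {
      k  = key($5, $6)
      kc = key(comp($5), comp($6))
      if (k == kc) next                          # A/T or C/G: ambiguous, skip
      if (k != ref[$2] && kc == ref[$2]) print $2
    }
  ' "$2" "$1"
}

# Usage (file names are placeholders):
#   list_flips file.bim 1kg_markers.txt > flip.txt
```

SNPs that match neither directly nor after flipping are not reported by this sketch and would need a separate look.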

- Remove e.g. SNPs with duplicated IDs
- Remove e.g. SNPs where alleles are not observed:

gawk '(($3 == "-") || ($4 == "-")) { print $1 }' 1kg_markers.txt | grep '^rs' > weird.1kg
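The two removal steps above can be combined into a single exclusion list. This is a rough sketch under the assumption that SNP IDs sit in column 2 of the PLINK .bim file; the function name build_exclude_list and the output file names are hypothetical.

```shell
# build_exclude_list BIM WEIRD: print the sorted union of
#   - SNP IDs that occur more than once in column 2 of the .bim file
#   - SNP IDs listed in WEIRD (e.g. weird.1kg from the gawk command above)
build_exclude_list() {
  { awk '{ print $2 }' "$1" | sort | uniq -d; cat "$2"; } | sort -u
}

# Usage (file names are placeholders):
#   build_exclude_list file.bim weird.1kg > exclude.snps
#   plink --bfile file --exclude exclude.snps --make-bed --out file.clean
```

Note that excluding by ID drops every copy of a duplicated SNP, not just the extra ones.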

- If going from e.g. Affy5 to impute in 1KG, you need to update SNP coordinates to hg19 - obvious but easily forgotten:

plink ... --update-map mapfile_1kg.txt --recode-beagle ...
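The map file for --update-map is just SNP ID and new base-pair position, one pair per line. A minimal sketch for deriving it from the reference markers file, assuming column 1 is the ID and column 2 is the hg19 position (the function name make_update_map is made up):

```shell
# make_update_map MARKERS: print "ID position" pairs for plink --update-map,
# keeping only rs IDs. Assumes MARKERS columns are id, position, allele1, allele2.
make_update_map() {
  awk '$1 ~ /^rs/ { print $1, $2 }' "$1"
}

# Usage (file names are placeholders):
#   make_update_map 1kg_markers.txt > mapfile_1kg.txt
```

SNPs that are not listed in the map file keep their old coordinates, so it may be worth restricting the data to SNPs present in the reference first.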

- I just ran my imputation in a loop over chromosomes, but that is not so easy if you have many individuals:

# Submit one LSF job per autosome (chromosomes 1-22)
for (( i=1; i<23; i++ ))
do
bsub -q week -o swage3_imputed_1kg.beagle.chr${i}.lsf.log -e swage3_imputed_1kg.beagle.chr${i}.lsf.err -R "rusage[mem=10000]" -P swage3-beagle-chr${i} \
java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar \
unphased=swage3_imputed_1kg.chr-${i}.dat \
missing=0 \
log=swage3_imputed_1kg.chr-${i}.log \
markers=EUR.20100804.chr${i}.markers \
phased=EUR.20100804.chr${i}.bgl.phased \
out=swage3_imputed_1kg.chr-${i}
done
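
With many individuals, one possible workaround (an assumption, not part of the run above) is to split the sample into batches and run the same loop once per batch. A rough sketch that writes keep lists from a PLINK .fam file; the function name make_batches, the batch size and the file prefixes are all hypothetical.

```shell
# make_batches FAM SIZE PREFIX: write PREFIX.000, PREFIX.001, ... each holding
# the FID and IID columns for up to SIZE individuals (the two-column format
# that plink --keep reads).
make_batches() {
  awk -v size="$2" -v prefix="$3" '{
    f = sprintf("%s.%03d", prefix, int((NR - 1) / size))
    if (f != prev) {                 # moved on to a new batch file
      if (prev != "") close(prev)
      prev = f
    }
    print $1, $2 > f
  }' "$1"
}

# Usage (file names and batch size are placeholders):
#   make_batches file.fam 500 batch
#   plink --bfile file --keep batch.000 --make-bed --out file.batch000
```

Each batch then goes through the same conversion and per-chromosome submission as the full data set.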