Useful scripts and code for genetics research: May 2011

Just some quick notes on imputation and some of the issues I've run into:

- If you only have e.g. 10-20 individuals, you can run into problems with allele order, e.g. "

- The 1000 Genomes reference panel guzzles memory, so give Java a large heap, e.g.:

java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar

- You need to check for alleles on the opposite strand (alleles to flip) and flip them:

plink --bfile file --flip flip.txt --make-bed --out flip.flipped
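
One way to build flip.txt is to compare each SNP's alleles in your PLINK .bim against the reference alleles in the 1KG BEAGLE markers file. This is a minimal sketch, assuming the standard .bim layout (chr, ID, cM, bp, allele 1, allele 2), markers files with columns ID, bp, allele 1, allele 2, and the same rs IDs on both sides. find_flips is a hypothetical helper, not part of PLINK or BEAGLE. A/T and C/G SNPs are skipped because complementing alone can't tell you their strand.

```shell
#!/usr/bin/env bash
# Sketch: print IDs of SNPs whose alleles match the reference only after
# complementing, i.e. SNPs that look like they are on the opposite strand.
# Assumes .bim columns: chr, id, cM, bp, a1, a2 and
# markers columns: id, bp, allele1, allele2 (hypothetical helper).
find_flips() {
  local bim=$1 markers=$2
  awk '
    function comp(a) {
      if (a == "A") return "T"
      if (a == "T") return "A"
      if (a == "C") return "G"
      if (a == "G") return "C"
      return a
    }
    # First file (markers): remember the reference alleles for each ID
    NR == FNR { ref[$1] = $3 " " $4; next }
    # Second file (.bim): only SNPs that are also in the reference
    ($2 in ref) {
      split(ref[$2], r, " ")
      a1 = $5; a2 = $6
      # Skip strand-ambiguous A/T and C/G SNPs
      if (a1 == comp(a2)) next
      same = (a1 == r[1] && a2 == r[2]) || (a1 == r[2] && a2 == r[1])
      c1 = comp(a1); c2 = comp(a2)
      flipped = (c1 == r[1] && c2 == r[2]) || (c1 == r[2] && c2 == r[1])
      if (!same && flipped) print $2
    }
  ' "$markers" "$bim"
}
```

For example, find_flips file.bim <(cat EUR.20100804.chr*.markers) > flip.txt, then run the plink --flip command above.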

- Remove problem SNPs, e.g. those with duplicated IDs.
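
A minimal sketch for finding duplicated IDs, assuming a PLINK .bim file where column 2 is the SNP ID. find_dup_ids is a hypothetical helper. It prints each ID that occurs more than once, one per line, which is the plain ID-list format that plink's --exclude option reads.

```shell
#!/usr/bin/env bash
# Sketch: print every SNP ID (column 2 of a PLINK .bim) that appears
# more than once, one ID per line (hypothetical helper).
find_dup_ids() {
  local bim=$1
  awk '{ print $2 }' "$bim" | sort | uniq -d
}
```

For example: find_dup_ids file.bim > dups.txt, then plink --bfile file --exclude dups.txt --make-bed --out file.nodup.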

- Remove e.g. SNPs where an allele is not observed (shown as "-" in the allele columns, 3 and 4, of the 1KG markers file):

gawk '(($3 == "-") || ($4 == "-")) { print $1 }' 1kg_markers.txt | grep '^rs' > weird.1kg

- If you're imputing e.g. Affy5 data up to 1KG, you need to update SNP coordinates to hg19 (obvious, but easily forgotten):

plink ... --update-map mapfile_1kg.txt --recode-beagle ...
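
The plink call above needs a map file. One way to build it (an assumption, not necessarily how mapfile_1kg.txt was made here) is to pull the ID and position columns out of the 1KG markers files for the SNPs present in your .bim, since PLINK's --update-map reads a two-column file of SNP ID and new base-pair position. make_update_map is a hypothetical helper; it only updates positions, not chromosomes.

```shell
#!/usr/bin/env bash
# Sketch: build a two-column (SNP ID, hg19 bp position) file for
# plink --update-map from 1KG BEAGLE markers files (hypothetical helper).
# Assumes markers columns: id, bp, allele1, allele2, and that the
# study .bim uses the same rs IDs as the reference.
make_update_map() {
  local bim=$1; shift
  # Remaining arguments: one or more markers files
  awk '
    NR == FNR { want[$2] = 1; next }   # IDs present in the study .bim
    ($1 in want) { print $1, $2 }      # ID and new position from 1KG
  ' "$bim" "$@"
}
```

For example: make_update_map file.bim EUR.20100804.chr*.markers > mapfile_1kg.txt.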

- I just ran my imputation one chromosome at a time in a loop, though this is not so easy if you have many individuals:

# Submit one BEAGLE job per autosome (chromosomes 1-22) to LSF
for (( i=1; i<23; i++ )); do
  bsub -q week \
    -o swage3_imputed_1kg.beagle.chr${i}.lsf.log \
    -e swage3_imputed_1kg.beagle.chr${i}.lsf.err \
    -R "rusage[mem=10000]" \
    -P swage3-beagle-chr${i} \
    java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar \
      unphased=swage3_imputed_1kg.chr-${i}.dat \
      missing=0 \
      log=swage3_imputed_1kg.chr-${i}.log \
      markers=EUR.20100804.chr${i}.markers \
      phased=EUR.20100804.chr${i}.bgl.phased \
      out=swage3_imputed_1kg.chr-${i}
done
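
With 22 near-identical submissions it can help to keep the Java heap and the LSF memory reservation in one variable and to print the commands before actually submitting. This is a minimal sketch, assuming your cluster's rusage mem units are MB (as the matching 10000 values suggest). MEM_MB, DRY_RUN, BEAGLE_JAR and submit_chr are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: per-chromosome BEAGLE submissions driven by one memory setting,
# so -Xmx and the LSF rusage reservation stay in sync.
# MEM_MB, DRY_RUN, BEAGLE_JAR and submit_chr are illustrative names;
# set DRY_RUN=0 to really call bsub.
MEM_MB=${MEM_MB:-10000}
DRY_RUN=${DRY_RUN:-1}
BEAGLE_JAR=/psych/genetics_data/bin/beagle.3.0.4/beagle.jar

submit_chr() {
  local i=$1
  local -a cmd
  cmd=(bsub -q week
    -o "swage3_imputed_1kg.beagle.chr${i}.lsf.log"
    -e "swage3_imputed_1kg.beagle.chr${i}.lsf.err"
    -R "rusage[mem=${MEM_MB}]"
    -P "swage3-beagle-chr${i}"
    java "-Xmx${MEM_MB}m" -Djava.io.tmpdir=./ -jar "$BEAGLE_JAR"
    "unphased=swage3_imputed_1kg.chr-${i}.dat"
    missing=0
    "log=swage3_imputed_1kg.chr-${i}.log"
    "markers=EUR.20100804.chr${i}.markers"
    "phased=EUR.20100804.chr${i}.bgl.phased"
    "out=swage3_imputed_1kg.chr-${i}")
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}

for (( i=1; i<=22; i++ )); do
  submit_chr "$i"
done
```

Since the JVM uses some memory beyond -Xmx, you may want the reservation to be a bit larger than the heap.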


Monday, May 16, 2011

Imputation using BEAGLE
