- If e.g. you only have 10-20 individuals, you can run into problems with allele order, e.g. "
- Imputing against 1000 Genomes guzzles memory, so give the JVM a large heap, e.g.
java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar
- You need to check for alleles that are on the opposite strand to the reference... and flip them:
plink --bfile file --flip flip.txt --make-bed --out flip.flipped
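One way to build the list of SNPs to flip is to compare the alleles in your .bim file with those in the reference markers file. This is only a rough sketch on toy data: the file names study.bim and ref.markers are made up, and it assumes the markers file has the columns ID, position, allele A, allele B. A/T and C/G SNPs look the same on both strands, so this comparison cannot catch them.

```shell
# Toy inputs (hypothetical file names).
# study.bim:   chr, SNP ID, cM, bp, A1, A2
# ref.markers: SNP ID, position, allele A, allele B (assumed layout)
printf '1 rs1 0 100 A G\n1 rs2 0 200 T C\n1 rs3 0 300 A C\n' > study.bim
printf 'rs1 100 A G\nrs2 200 A G\nrs3 300 G T\n' > ref.markers

awk '
# Complement of a single base; anything else is returned unchanged
function comp(base) {
    if (base == "A") return "T"
    if (base == "T") return "A"
    if (base == "C") return "G"
    if (base == "G") return "C"
    return base
}
# First file: remember the reference alleles for each SNP ID
NR == FNR { refa[$1] = $3; refb[$1] = $4; next }
# Second file: only look at SNPs that are present in the reference
($2 in refa) {
    x = refa[$2]; y = refb[$2]
    # Alleles already agree (in either order): nothing to do
    if (($5 == x && $6 == y) || ($5 == y && $6 == x)) next
    # Alleles agree only after complementing: candidate for flipping
    cx = comp($5); cy = comp($6)
    if ((cx == x && cy == y) || (cx == y && cy == x)) print $2
}' ref.markers study.bim > flip.txt

cat flip.txt
```

In the toy data rs2 and rs3 end up in flip.txt, which can then be passed to the plink --flip command above.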
- Remove e.g. SNPs with duplicated IDs
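A quick way to find duplicated IDs, sketched on a toy .bim file (demo.bim and dup.snps are made-up names; the SNP ID is the second column of a .bim file):

```shell
# Toy .bim file: chr, SNP ID, cM, bp, A1, A2 (tab-separated)
printf '1\trs1\t0\t100\tA\tG\n1\trs2\t0\t200\tC\tT\n1\trs1\t0\t300\tA\tC\n' > demo.bim

# List every SNP ID that occurs more than once
cut -f2 demo.bim | sort | uniq -d > dup.snps
cat dup.snps

# Then drop them with PLINK (not run here; this removes every copy of each listed ID):
# plink --bfile demo --exclude dup.snps --make-bed --out demo.nodup
```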
- Remove e.g. SNPs where alleles are not observed:
gawk '(($3 == "-") || ($4 == "-")) { print $1 }' 1kg_markers.txt | grep '^rs' > weird.1kg
- If going from e.g. Affy5 to impute in 1KG, you need to update SNP coordinates to hg19 - obvious but easily forgotten:
plink ... --update-map mapfile_1kg.txt --recode-beagle ...
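The file given to --update-map has two columns: SNP ID and new position. If your reference markers file already carries hg19 positions, one possible shortcut is to cut the map file straight out of it. This is a sketch on toy data, assuming the markers file has ID in column 1 and position in column 2; demo.markers and demo_mapfile.txt are made-up names.

```shell
# Toy markers file: SNP ID, hg19 position, allele A, allele B (assumed layout)
printf 'rs1 10583 G A\nrs2 10611 C G\n' > demo.markers

# Keep only ID and position, the two columns --update-map reads
awk '{ print $1, $2 }' demo.markers > demo_mapfile.txt
cat demo_mapfile.txt
```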
- I just ran my imputation in a loop, one LSF job per chromosome, but this is not so easy if you have many individuals:
for(( i=1; i<23; i++ ))
do
bsub -q week -o swage3_imputed_1kg.beagle.chr${i}.lsf.log -e swage3_imputed_1kg.beagle.chr${i}.lsf.err -R "rusage[mem=10000]" -P swage3-beagle-chr${i} \
java -Xmx10000m -Djava.io.tmpdir=./ -jar /psych/genetics_data/bin/beagle.3.0.4/beagle.jar \
unphased=swage3_imputed_1kg.chr-${i}.dat \
missing=0 \
log=swage3_imputed_1kg.chr-${i}.log \
markers=EUR.20100804.chr${i}.markers \
phased=EUR.20100804.chr${i}.bgl.phased \
out=swage3_imputed_1kg.chr-${i}
done
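Once the jobs have gone through the queue, it is worth checking that all 22 chromosomes actually produced something. A rough sketch: the function name missing_chrs is made up, and the log names follow the bsub -o pattern from the loop above. A non-empty LSF log only tells you the job ran and wrote output, not that Beagle finished cleanly, so still look inside the logs.

```shell
# Print the chromosomes whose LSF log is missing or empty.
# $1 is the log prefix; the default matches the -o pattern in the loop above.
missing_chrs() {
    prefix=${1:-swage3_imputed_1kg.beagle}
    for i in $(seq 1 22); do
        # -s is true only if the file exists and is not empty
        [ -s "${prefix}.chr${i}.lsf.log" ] || echo "$i"
    done
}

missing_chrs
```

Any chromosome number it prints can be resubmitted on its own with the same bsub command.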