FINEMAP



FINEMAP is a program for fine-mapping causal SNPs in genomic regions associated with complex traits and disease. FINEMAP is computationally efficient because it uses summary statistics from genome-wide association studies, and robust because it uses a shotgun stochastic search algorithm (Hans et al., 2007). It produces accurate results in a fraction of the processing time of existing approaches. It is therefore well suited to analyzing the growing amounts of data produced in genome-wide association studies and emerging sequencing or biobank projects.

Command-line arguments

--config  Evaluate a single causal configuration without performing shotgun stochastic search. Subprogram.
--corr-config  The posterior probability of a causal configuration is set to zero if it includes a pair of SNPs with absolute correlation above this threshold. Default is 0.95 (with --sss).
--dataset  Option to specify a delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2). All datasets are processed by default.
--help  Command-line help.
--in-files  Master file (see below). With --sss/--config.
--log  Option to write output to the log files specified in column 'log' in the master file. No log files are written by default.
--n-causal-snps  Maximum number of allowed causal SNPs. Default is 5.
--n-configs-top  Number of top causal configurations to be saved. Default is 50000.
--n-convergence  Number of iterations that the added probability mass is required to be below the specified threshold (--prob-tol) before the shotgun stochastic search is terminated. Default is 1000.
--n-iterations  Maximum number of iterations before the shotgun stochastic search is terminated. Default is 100000.
--prior-k  Option to use prior probabilities for the number of causal SNPs as specified in K files (see below) in the master file. SNPs are by default assumed to be causal with probability 1 / (# of SNPs in the genomic region).
--prior-k0  Prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs, not during fine-mapping itself. Default is 0.0.
--prior-std  Comma-separated list of prior standard deviations of effect sizes. Default is 0.05.
--prob-tol  Tolerance at which the added probability mass (over --n-convergence iterations) is considered small enough to terminate the shotgun stochastic search. Default is 0.001.
--rsids  Comma-separated list of SNP identifiers corresponding to the rsid column in Z files (see below). With --config.
--sss  Fine-mapping with shotgun stochastic search. Subprogram.
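
The way --n-iterations, --n-convergence and --prob-tol interact can be illustrated with a small sketch. The function below is hypothetical and is not FINEMAP's implementation. It assumes one reading of the descriptions above: the search stops once the probability mass added per iteration has stayed below --prob-tol for --n-convergence consecutive iterations, or once --n-iterations is reached.

```python
def iterations_until_stop(added_mass, prob_tol=0.001,
                          n_convergence=1000, n_iterations=100000):
    """Return the 1-based iteration at which the search would stop.

    added_mass is a sequence with the probability mass added in each
    iteration (made-up input for illustration). Returns None if neither
    stopping rule fires before the sequence ends.
    """
    below = 0  # consecutive iterations with added mass below prob_tol
    for i, mass in enumerate(added_mass, start=1):
        below = below + 1 if mass < prob_tol else 0
        if below >= n_convergence or i >= n_iterations:
            return i
    return None


# Toy trace: the added mass drops below the tolerance from iteration 3 on.
trace = [0.5, 0.2, 0.0005, 0.0001, 0.0002]
stop_at = iterations_until_stop(trace, prob_tol=0.001, n_convergence=3)
print(stop_at)
```

With the defaults, a search that stops adding mass early still runs at least 1000 further iterations before it terminates under this reading.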

Input

(1) Master file

The master file is a semicolon-separated text file and contains no spaces. It starts with a header line of column names (z, ld, snp, config, log and n_samples in the example below), followed by one dataset per line.

A master file with two datasets could look as follows.

z;ld;snp;config;log;n_samples
dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.log;5363
dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.log;5363
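
As a quick sanity check before running FINEMAP, a master file can be parsed with Python's csv module. This is a minimal sketch: the helper name read_master is made up for illustration, and the assumption that datasets are numbered from 1 in line order (matching the numbers passed to --dataset) is not stated explicitly in this document.

```python
import csv
import io

MASTER_TEXT = """z;ld;snp;config;log;n_samples
dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.log;5363
dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.log;5363
"""


def read_master(text):
    """Parse master-file text into a list of dicts, one per dataset."""
    if " " in text:
        raise ValueError("master file must not contain spaces")
    rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
    for row in rows:
        row["n_samples"] = int(row["n_samples"])  # sample size as integer
    return rows


datasets = read_master(MASTER_TEXT)
# Assumed numbering: dataset 1 is the first line after the header.
for number, row in enumerate(datasets, start=1):
    print(number, row["z"], row["ld"], row["n_samples"])
```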

(2) Z file

The dataset.z file is a space-delimited text file and contains the GWAS summary statistics, one SNP per line. It contains exactly the following column names in this order: rsid, chromosome, position, noneff_allele, eff_allele, maf, beta, se.

A dataset.z file with three SNPs could look as follows.

rsid chromosome position noneff_allele eff_allele maf beta se
rs1 10 1 T C 0.35 0.0050 0.0208
rs2 10 1 A G 0.04 0.0368 0.0761
rs3 10 1 G A 0.18 0.0228 0.0199
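
The sketch below reads such a file and derives a z-score for each SNP as beta / se, the usual single-SNP test statistic. The helper read_z is hypothetical and only illustrates the file layout; it is not part of FINEMAP.

```python
Z_TEXT = """rsid chromosome position noneff_allele eff_allele maf beta se
rs1 10 1 T C 0.35 0.0050 0.0208
rs2 10 1 A G 0.04 0.0368 0.0761
rs3 10 1 G A 0.18 0.0228 0.0199
"""

EXPECTED_COLUMNS = ["rsid", "chromosome", "position", "noneff_allele",
                    "eff_allele", "maf", "beta", "se"]


def read_z(text):
    """Parse Z-file text and add a z_score (beta / se) to every SNP."""
    lines = text.strip().splitlines()
    header = lines[0].split(" ")
    if header != EXPECTED_COLUMNS:
        raise ValueError("unexpected column names or order: %r" % header)
    records = []
    for line in lines[1:]:
        fields = line.split(" ")
        if len(fields) != len(header):
            raise ValueError("wrong number of fields: %r" % line)
        record = dict(zip(header, fields))
        for key in ("maf", "beta", "se"):
            record[key] = float(record[key])
        record["z_score"] = record["beta"] / record["se"]
        records.append(record)
    return records


snps = read_z(Z_TEXT)
for snp in snps:
    print(snp["rsid"], round(snp["z_score"], 3))
```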

(3) LD file

The dataset.ld file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation). The order of the SNPs in the dataset.ld must correspond to the order of variants in dataset.z.

A dataset.ld file with three SNPs could look as follows.

1.00 0.95 0.98
0.95 1.00 0.96
0.98 0.96 1.00
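
Because a correlation matrix must be square and symmetric with ones on the diagonal, it can be worth validating an LD file before fine-mapping. The sketch below does that for a small matrix and also illustrates the kind of check implied by --corr-config: whether a configuration contains a pair of SNPs with absolute correlation above the threshold. Both function names are made up for illustration and do not reflect FINEMAP's internals.

```python
from itertools import combinations


def check_ld(matrix, tol=1e-8):
    """Raise ValueError unless matrix is a plausible correlation matrix."""
    n = len(matrix)
    for row in matrix:
        if len(row) != n:
            raise ValueError("LD matrix must be square")
    for i in range(n):
        if abs(matrix[i][i] - 1.0) > tol:
            raise ValueError("diagonal entry %d is not 1" % i)
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                raise ValueError("asymmetric entries at (%d, %d)" % (i, j))
            if abs(matrix[i][j]) > 1.0 + tol:
                raise ValueError("correlation outside [-1, 1]")


def exceeds_corr_threshold(matrix, snp_indices, threshold=0.95):
    """True if any pair of SNPs in the configuration has |r| > threshold."""
    return any(abs(matrix[i][j]) > threshold
               for i, j in combinations(snp_indices, 2))


ld = [
    [1.00, 0.95, 0.98],
    [0.95, 1.00, 0.96],
    [0.98, 0.96, 1.00],
]
check_ld(ld)  # passes silently for a valid matrix
print(exceeds_corr_threshold(ld, [0, 1]))  # pair with r = 0.95
print(exceeds_corr_threshold(ld, [0, 2]))  # pair with r = 0.98
```

SNP indices are 0-based positions in the matrix, which correspond to the line order of the Z file.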

(4) Optional K file

By default, FINEMAP assumes that SNPs are causal with prior probability 1 / (# of SNPs in the genomic region). As an alternative, it is possible to specify prior probabilities for the number of causal SNPs in the genomic region by using a dataset.k file. This is a space-delimited text file and contains the prior probabilities pk = Pr(# of causal SNPs is k) for k = 1,...,K, where K is the number of entries in the dataset.k file. The prior probabilities must be non-negative and will be normalized to sum to one.

A dataset.k file allowing for three causal SNPs with p1 = 0.6, p2 = 0.3 and p3 = 0.1 would look as follows.

0.6 0.3 0.1
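
The normalization described above can be sketched as follows. The helper read_k_priors is hypothetical; it only shows that non-negative entries are rescaled to sum to one, so that "6 3 1" and "0.6 0.3 0.1" describe the same prior.

```python
def read_k_priors(text):
    """Parse K-file text into {k: Pr(# of causal SNPs is k)}, normalized."""
    values = [float(v) for v in text.split()]
    if not values:
        raise ValueError("K file is empty")
    if any(v < 0 for v in values):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("at least one prior probability must be positive")
    # Entry k (counting from 1) is the prior for exactly k causal SNPs.
    return {k: v / total for k, v in enumerate(values, start=1)}


priors = read_k_priors("0.6 0.3 0.1")
unnormalized = read_k_priors("6 3 1")
print(priors)
print(unnormalized)
```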

Output

(1) SNP file

The dataset.snp file is a space-delimited text file. It contains the GWAS summary statistics and model-averaged posterior summaries for each SNP, one SNP per line.

(2) CONFIG file

The dataset.config file is a space-delimited text file. It contains the posterior summaries for each causal configuration, one configuration per line.

(3) LOG file

The dataset.log file contains additional output.

Fine-mapping example

Using genotype data with 50 SNPs and 5363 individuals, a quantitative phenotype was simulated using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain z-scores. SNP correlations were computed from GWAS genotype data.

Fine-mapping the SNPs in genomic region 1 in the example folder is done as follows.

./finemap_v1.2_MacOSX --sss --in-files example/data --dataset 1
./finemap_v1.2_x86_64 --sss --in-files example/data --dataset 1
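
When many regions are processed from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from options documented above; the helper build_sss_command is hypothetical, and the binary name and master-file path are the ones used in this example.

```python
import shlex


def build_sss_command(binary, in_files, datasets=None, n_causal_snps=None):
    """Build the argument list for a shotgun stochastic search run."""
    command = [binary, "--sss", "--in-files", in_files]
    if datasets:
        # Comma-separated dataset numbers, e.g. 1,2
        command += ["--dataset", ",".join(str(d) for d in datasets)]
    if n_causal_snps is not None:
        command += ["--n-causal-snps", str(n_causal_snps)]
    return command


command = build_sss_command("./finemap_v1.2_x86_64", "example/data",
                            datasets=[1])
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if paths contain unusual characters.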

Single causal configuration example

The same data as in the fine-mapping example above are used. Information about a single causal configuration can be obtained without performing shotgun stochastic search by specifying SNP identifiers as follows.

./finemap_v1.2_MacOSX --config --in-files example/data --dataset 1 --rsids rs30,rs11
./finemap_v1.2_x86_64 --config --in-files example/data --dataset 1 --rsids rs30,rs11

References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).
Hans, C. et al. Shotgun stochastic search for "large p" regression. J Am Stat Assoc 102, 507-516 (2007).

Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.