FINEMAP



FINEMAP is a program for identifying causal SNPs in genomic regions associated with complex traits and disease. It is computationally efficient because it works from summary statistics of genome-wide association studies, and robust because it uses a shotgun stochastic search algorithm (Hans et al., 2007). It produces accurate results in a fraction of the processing time of existing approaches and is therefore well suited to analyzing the growing amounts of data produced by genome-wide association studies and emerging sequencing or biobank projects.

Command-line arguments

--corr    Determine highly correlated SNPs    Subprogram
--corr-config    The posterior probability of a causal configuration is set to zero if it includes a pair of SNPs with absolute correlation above this threshold    Default is 0.95 (with --sss)
--corr-filter    SNPs are discarded such that no pair of SNPs remains with absolute correlation above this threshold    Default is 0.95 (with --corr)
--help    Command-line help
--in-files    Master file (see below)    With --sss/--corr
--log    Option to write output to log files specified in column 'log' in the master file    No log files are written by default
--n-causal-max    Maximum number of allowed causal SNPs    Default is 5
--n-configs-top    Number of top causal configurations to be saved    Default is 50000
--n-convergence    Number of iterations that the added probability mass is required to be below the specified threshold (--prob-tol) before the shotgun stochastic search is terminated    Default is 1000
--n-iterations    Maximum number of iterations before the shotgun stochastic search is terminated    Default is 100000
--prior-k    Option to use prior probabilities for the number of causal SNPs as specified in K files (see below) in the master file    SNPs are by default assumed to be causal with probability 1 / (# of SNPs in the genomic region)
--prior-k0    Prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs but not during fine-mapping itself    Default is 0.0
--prior-std    Prior standard deviation of effect sizes    Default is 0.05
--prob-tol    Tolerance at which the added probability mass (over --n-convergence iterations) is considered small enough to terminate the shotgun stochastic search    Default is 0.001
--regions    Option to specify delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2)    All regions are processed by default
--sss    Fine-mapping with shotgun stochastic search    Subprogram
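
The interplay of --n-iterations, --n-convergence and --prob-tol can be sketched in Python as a stopping rule. This is only one possible reading of the option descriptions above, not FINEMAP's source code: the function name should_stop and the bookkeeping list added_mass are hypothetical, and the sketch assumes that the probability mass is summed over the last --n-convergence iterations.

```python
def should_stop(added_mass, n_iterations=100000, n_convergence=1000, prob_tol=0.001):
    """Decide whether the search should terminate after the iterations seen so far.

    added_mass[i] is the posterior probability mass newly added in iteration i
    (a hypothetical bookkeeping list, not a FINEMAP data structure).
    """
    done = len(added_mass)
    if done >= n_iterations:
        # Hard cap on the number of iterations, cf. --n-iterations.
        return True
    if done < n_convergence:
        # Not enough history yet to judge convergence.
        return False
    # Assumed reading: the mass added over the last n_convergence
    # iterations must fall below the tolerance given by --prob-tol.
    return sum(added_mass[-n_convergence:]) < prob_tol
```

With the defaults, the search in this sketch would stop once 1000 consecutive iterations together add less than 0.001 probability mass, or after 100000 iterations at the latest.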

Input

(1) Master file

The master file is a semicolon-separated text file that must not contain spaces. Its first line holds the column names, and each following line describes one dataset.

A master file with two datasets could look as follows.

z;ld;snp;config;log;n-ind
dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.log;5363
dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.log;5363
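
For readers who prepare or check master files with a script, a minimal Python sketch for parsing this layout follows. The function name read_master is hypothetical, and treating the n-ind column as an integer is an assumption based on the example values.

```python
import csv
import io

MASTER_TEXT = """z;ld;snp;config;log;n-ind
dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.log;5363
dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.log;5363
"""

def read_master(text):
    """Parse master-file text into one dict per dataset, keyed by column name."""
    if " " in text:
        raise ValueError("master file must not contain spaces")
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        # Assumption: n-ind is an integer count, as in the example (5363).
        row["n-ind"] = int(row["n-ind"])
        rows.append(row)
    return rows

datasets = read_master(MASTER_TEXT)
for number, dataset in enumerate(datasets, start=1):
    print(number, dataset["z"], dataset["ld"], dataset["n-ind"])
```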

(2) Z file

The dataset.z file is a space-delimited text file with two columns and one SNP per line. In the example below, the first column is the SNP identifier and the second is its z-score.

A dataset.z file with three SNPs could look as follows.

rs1 0.240
rs2 0.483
rs3 1.145
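
A small Python sketch for reading such a file is given here as an illustration. The function name read_z is hypothetical, and the interpretation of the two columns as SNP identifier and z-score is taken from the example rather than from a formal specification.

```python
Z_TEXT = """rs1 0.240
rs2 0.483
rs3 1.145
"""

def read_z(text):
    """Return a list of (snp_id, z_score) tuples in file order."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(fields)}")
        records.append((fields[0], float(fields[1])))
    return records

z_records = read_z(Z_TEXT)
```

Keeping the records in file order matters because the LD file is matched to the Z file by position.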

(3) LD file

The dataset.ld file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation). The order of the SNPs in dataset.ld must match the order of the SNPs in dataset.z.

A dataset.ld file with three SNPs could look as follows.

1.00 0.95 0.98
0.95 1.00 0.96
0.98 0.96 1.00
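
Because a correlation matrix is square and symmetric with ones on the diagonal, an LD file can be sanity-checked before it is passed to FINEMAP. The Python sketch below shows one way to do this; the function name read_ld and the tolerance are assumptions, and FINEMAP itself may validate its input differently.

```python
def read_ld(text, n_snps, tol=1e-8):
    """Parse a space-delimited correlation matrix and run basic sanity checks."""
    rows = [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != n_snps or any(len(row) != n_snps for row in rows):
        raise ValueError(f"expected a {n_snps} x {n_snps} matrix")
    for i in range(n_snps):
        if abs(rows[i][i] - 1.0) > tol:
            raise ValueError(f"diagonal entry {i + 1} is not 1")
        for j in range(i):
            if abs(rows[i][j]) > 1.0 + tol:
                raise ValueError(f"entry ({i + 1}, {j + 1}) is outside [-1, 1]")
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(f"entries ({i + 1}, {j + 1}) and ({j + 1}, {i + 1}) differ")
    return rows

# n_snps would normally be the number of lines in the matching Z file.
ld = read_ld("1.0 0.5\n0.5 1.0\n", n_snps=2)
```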

(4) Optional K file

By default, FINEMAP assumes that SNPs are causal with prior probability 1 / (# of SNPs in the genomic region). As an alternative, it is possible to specify prior probabilities for the number of causal SNPs in the genomic region by using a dataset.k file. This is a space-delimited text file and contains the prior probabilities pk = Pr(# of causal SNPs is k) for k = 1,...,K, where K is the number of entries in the dataset.k file. The prior probabilities must be non-negative and will be normalized to sum to one.

A dataset.k file allowing for three causal SNPs with p1 = 0.6, p2 = 0.3 and p3 = 0.1 would look as follows.

0.6 0.3 0.1
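
The normalization mentioned above can be illustrated with a short Python sketch. The function name read_k is hypothetical; the sketch only mirrors the stated rules (non-negative entries, rescaled to sum to one) and is not FINEMAP's own parser.

```python
def read_k(text):
    """Return {k: Pr(# of causal SNPs is k)} from the contents of a K file."""
    probs = [float(x) for x in text.split()]
    if not probs:
        raise ValueError("K file is empty")
    if any(p < 0 for p in probs):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("at least one prior probability must be positive")
    # Rescale so that the probabilities sum to one.
    return {k: p / total for k, p in enumerate(probs, start=1)}

# Unnormalized weights 6, 3 and 1 give approximately p1 = 0.6, p2 = 0.3, p3 = 0.1.
priors = read_k("6 3 1")
```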

Output

(1) SNP file

The dataset.snp file is a space-delimited text file. It contains the model-averaged posterior summaries for each SNP, one SNP per line.

(2) CONFIG file

The dataset.config file is a space-delimited text file. It contains the posterior summaries for each causal configuration, one configuration per line.

(3) LOG file

The dataset.log file contains additional output.

Fine-mapping example

Using genotype data with 50 SNPs and 5363 individuals, a quantitative phenotype was simulated using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain z-scores. SNP correlations were computed from individual-level genotype data.
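
To make the two kinds of summary data concrete, the Python sketch below derives z-scores and SNP correlations from a toy genotype matrix. It is an illustration under stated assumptions, not the procedure used to create the example data: the text does not say which single-SNP test was used, so the sketch takes the t-statistic of a simple linear regression of the phenotype on each SNP, and the toy data and function names are made up.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def single_snp_z(genotypes, phenotype):
    """t-statistic of the slope in a simple linear regression of phenotype on one SNP."""
    r = pearson(genotypes, phenotype)
    n = len(phenotype)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def ld_matrix(snps):
    """Matrix of pairwise Pearson correlations between SNP genotype vectors."""
    return [[pearson(a, b) for b in snps] for a in snps]

# Toy data: 3 SNPs (allele counts 0/1/2) and 6 individuals.
snps = [[0, 1, 2, 1, 0, 2],
        [0, 1, 2, 1, 1, 2],
        [2, 0, 1, 1, 2, 0]]
phenotype = [0.1, 0.9, 2.2, 1.1, 0.3, 1.8]

z_scores = [single_snp_z(g, phenotype) for g in snps]
ld = ld_matrix(snps)
```

For large samples such as the 5363 individuals in the example, the t-statistic is practically indistinguishable from a z-score.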

Fine-mapping the SNPs in genomic region 1 in the example folder is done as follows:

./finemap --sss --in-files example/data --regions 1

Correlation example

The same data as in the fine-mapping example above are used. A SNP correlation matrix (with Pearson's correlation coefficients) that is not positive definite can sometimes be repaired by discarding SNPs such that no pair of SNPs remains with absolute correlation greater than a specified threshold (--corr-filter, default is 0.95). A search through the correlation matrix is performed to determine the SNPs that need to be removed. The absolute values of pair-wise correlations are considered. If two SNPs have an absolute correlation above the threshold, the mean absolute correlation of each of the two SNPs is computed and the SNP with the larger mean absolute correlation is removed.

Pair-wise SNP correlations can be reduced, for instance according to the threshold |r_ij| < 0.98, as follows:

./finemap --corr --in-files example/data --corr-filter 0.98

This removes SNPs 1, 24, 27, 35 and 37.
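
The filtering idea described in this section can be sketched in Python as follows. This is a minimal illustration, not FINEMAP's implementation: the function name corr_filter is hypothetical, and two details that the description leaves open are assumptions here, namely that the most strongly correlated remaining pair is handled first and that the mean absolute correlation is taken over the SNPs not yet removed.

```python
def corr_filter(ld, threshold=0.95):
    """Return (kept, removed) SNP indices so that no kept pair has |r| > threshold."""
    kept = list(range(len(ld)))
    removed = []

    def mean_abs(snp):
        # Mean absolute correlation of one SNP with all other remaining SNPs.
        others = [abs(ld[snp][other]) for other in kept if other != snp]
        return sum(others) / len(others)

    while True:
        # Assumption: handle the most strongly correlated remaining pair first.
        worst = None
        for pos, i in enumerate(kept):
            for j in kept[pos + 1:]:
                r = abs(ld[i][j])
                if r > threshold and (worst is None or r > worst[0]):
                    worst = (r, i, j)
        if worst is None:
            return kept, removed
        _, i, j = worst
        # Remove the SNP of the pair with the larger mean absolute correlation.
        drop = i if mean_abs(i) >= mean_abs(j) else j
        kept.remove(drop)
        removed.append(drop)

ld = [[1.00, 0.95, 0.98],
      [0.95, 1.00, 0.96],
      [0.98, 0.96, 1.00]]
kept, removed = corr_filter(ld, threshold=0.95)
```

In this toy matrix only the third SNP (index 2) is removed: it belongs to both pairs above 0.95 and has the larger mean absolute correlation (0.97 against 0.965 for the first SNP). Indices in the sketch are zero-based, whereas the SNP numbers reported above appear to be one-based.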

References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).
Hans, D. et al. Shotgun stochastic search for "large p" regression. J Am Stat Assoc 102, 507-516 (2007).

Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.