FINEMAP



FINEMAP-ing articles

- Refining fine-mapping: effect sizes and regional heritability. bioRxiv. (2018).
- Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. (2017).
- FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).

FINEMAP is a program for statistical fine-mapping in genomic regions associated with complex traits and disease. FINEMAP is computationally efficient because it uses summary statistics from genome-wide association studies, and robust because it applies a shotgun stochastic search algorithm (Hans et al., 2007). It produces accurate results in a fraction of the processing time of existing approaches. It is therefore well suited to analyzing the growing amounts of data produced in genome-wide association studies and emerging sequencing or biobank projects.


Command-line arguments

--cond    Fine-mapping with stepwise conditioning    Subprogram
--cond-pvalue    Option to set the p-value threshold for declaring genome-wide significance    Default is 5 × 10^-8
--config    Evaluate a single causal configuration without performing shotgun stochastic search    Subprogram
--corr-config    Option to set the posterior probability of a causal configuration to zero if it includes a pair of SNPs with absolute correlation above this threshold    Default is 0.95
--dataset    Option to specify a delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2)    All datasets are processed by default
--flip-beta    Option to read a column 'flip' in the Z file with binary indicators specifying if the direction of the estimated SNP effect sizes needs to be flipped to match SNP correlations    With --cond, --config and --sss
--force-n-samples    Option to allow correlations in a BCOR file to be computed on a set of samples with a different size than the GWAS sample size    With --cond, --config and --sss
--help    Command-line help
--in-files    Master file (see below)    With --cond, --config and --sss
--log    Option to write output to log files specified in column 'log' in the master file    No log files are written by default
--n-causal-snps    Option to set the maximum number of allowed causal SNPs    Default is 5
--n-configs-top    Option to set the number of top causal configurations to be saved    Default is 50000
--n-conv-sss    Option to set the number of iterations that the added probability mass is required to be below the specified threshold (--prob-conv-sss-tol) before the shotgun stochastic search is terminated    Default is 100
--n-iter    Option to set the maximum number of iterations before the shotgun stochastic search is terminated    Default is 100000
--n-threads    Option to set the number of concurrent threads    Default is 1
--prior-k    Option to use prior probabilities for the number of causal SNPs as specified in K files (see below) in the master file    SNPs are by default assumed to be causal with probability 1 / (# of SNPs in the genomic region)
--prior-k0    Option to set the prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs but not during fine-mapping itself    Default is 0.0
--prior-snps    Option to read a column 'prob' in the Z file with prior probabilities that a SNP is causal in order to define the prior probability for each causal configuration    With --sss
--prior-std    Option to specify a comma-separated list of prior standard deviations of effect sizes    Default is 0.05
--prob-conv-sss-tol    Option to set the tolerance at which the added probability mass (over --n-conv-sss iterations) is considered small enough to terminate the shotgun stochastic search    Default is 0.001
--prob-cred-set    Option to set the probability at which the credible interval includes a causal SNP    Default is 0.95
--pvalue-snps    Option to set a p-value threshold at which SNPs are included    Default is 1.0
--rsids    Option to specify a comma-separated list of SNP identifiers corresponding with the rsid column in Z files (see below)    With --config
--sss    Fine-mapping with shotgun stochastic search    Subprogram
--std-effects    Option to print mean and standard deviation of the posterior effect size distribution for standardized dosages    Default is allele dosage
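
The interplay of --n-iter, --n-conv-sss and --prob-conv-sss-tol can be illustrated with a small stopping rule. The following Python sketch is not FINEMAP code: it assumes that the added probability mass must stay below the tolerance for a run of consecutive iterations, and the function name stopping_iteration and its input trace are hypothetical.

```python
def stopping_iteration(added_mass, n_iter=100000, n_conv=100, tol=0.001):
    """Return the 1-based iteration at which the search would stop.

    added_mass: probability mass added in each iteration (hypothetical trace).
    n_iter, n_conv, tol mirror --n-iter, --n-conv-sss and --prob-conv-sss-tol.
    Assumption: the mass must be below tol for n_conv consecutive iterations.
    """
    below = 0  # length of the current run of iterations below the tolerance
    for i, mass in enumerate(added_mass, start=1):
        below = below + 1 if mass < tol else 0
        if below >= n_conv or i >= n_iter:
            return i
    return None  # the trace ended before either criterion was met


# Two productive iterations followed by three that add almost nothing.
trace = [0.2, 0.05, 0.0001, 0.0001, 0.0001]
print(stopping_iteration(trace, n_conv=3))
```

With the defaults, a search that keeps finding new probability mass runs until --n-iter is reached, whereas a search that has stalled stops after --n-conv-sss quiet iterations.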

Input

(1) Master file

The master file is a semicolon-separated text file and contains no space. It contains the following mandatory column names and one dataset per line.
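
As an illustration of the layout, the following Python sketch builds a master file with one dataset. The column names (z, ld, snp, config, cred, log, n_samples) are assumptions inferred from the file types described in this document and should be checked against the FINEMAP version in use; the prefix example/data and the sample size 5363 are taken from the examples below.

```python
import csv
import io

# Assumed column names, inferred from the input and output file types
# described in this document; verify them for your FINEMAP version.
COLUMNS = ["z", "ld", "snp", "config", "cred", "log", "n_samples"]


def master_file_text(datasets):
    """Build master-file text: semicolon-separated, one dataset per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(COLUMNS)
    for prefix, n_samples in datasets:
        # One path per file type, followed by the GWAS sample size.
        row = [f"{prefix}.{ext}" for ext in COLUMNS[:-1]] + [str(n_samples)]
        if any(" " in field for field in row):
            raise ValueError("master file entries must not contain spaces")
        writer.writerow(row)
    return buf.getvalue()


text = master_file_text([("example/data", 5363)])
print(text)
```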

(2) Z file

The dataset.z file is a space-delimited text file and contains the GWAS summary statistics one SNP per line. It contains the mandatory column names in the following order.
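
A minimal Python sketch of reading such a file is given below. Only the rsid column (and the optional flip and prob columns) is named in this document; the remaining column names and all values in the sample text are assumptions made for illustration. The sketch also derives a z-score per SNP as beta divided by its standard error.

```python
# Hypothetical Z file content: column names other than 'rsid' are assumed,
# and the values are invented for illustration.
Z_TEXT = """rsid chromosome position allele1 allele2 maf beta se
rs1 10 1 T C 0.35 0.04 0.02
rs2 10 2 A G 0.04 -0.03 0.06
"""


def read_z(text):
    """Parse space-delimited summary statistics, one SNP per line."""
    lines = text.strip().splitlines()
    header = lines[0].split(" ")
    rows = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(" ")
        if len(fields) != len(header):
            raise ValueError(f"line {number}: expected {len(header)} fields")
        row = dict(zip(header, fields))
        # z-score from the estimated effect size and its standard error
        row["z"] = float(row["beta"]) / float(row["se"])
        rows.append(row)
    return rows


for snp in read_z(Z_TEXT):
    print(snp["rsid"], round(snp["z"], 3))
```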

(3) LD file

The dataset.ld file is a space-delimited text file and contains the SNP correlation matrix (Pearson's correlation).
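
For illustration, the sketch below computes Pearson correlations between SNP dosage vectors in plain Python and formats them as space-delimited rows. It assumes a square matrix without a header row, with SNPs in the same order as in the Z file; the dosage values and function names are hypothetical, and dedicated LD software would normally be used for real data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long dosage vectors.

    A monomorphic SNP (zero variance) would cause a division by zero here.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ld_matrix(dosages):
    """dosages: one list per SNP, holding one allele dosage per individual."""
    return [[pearson(a, b) for b in dosages] for a in dosages]


def ld_text(matrix):
    """Format the matrix as space-delimited text, one row per SNP."""
    return "\n".join(" ".join(f"{r:.4f}" for r in row) for row in matrix) + "\n"


# Invented dosages for three SNPs in six individuals.
dosages = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 1],
    [2, 1, 0, 1, 2, 0],  # mirror image of the first SNP
]
matrix = ld_matrix(dosages)
print(ld_text(matrix))
```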

(4) BCOR file

See the BCOR file format description for details.

(5) Optional K file

By default, FINEMAP assumes that SNPs are causal with prior probability 1 / (# of SNPs in the genomic region). As an alternative, it is possible to specify prior probabilities for the number of causal SNPs in the genomic region by using a dataset.k file. This is a space-delimited text file and contains the prior probabilities pk = Pr(# of causal SNPs is k) for k = 1,...,K, where K is the number of entries in the dataset.k file. The prior probabilities must be non-negative and will be normalized to sum to one.
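
The validation and normalization described above can be sketched in Python as follows. The function name parse_k is hypothetical and the code is an illustration of the stated rules, not FINEMAP's implementation.

```python
def parse_k(text):
    """Parse space-delimited prior weights and normalize them to sum to one.

    Entry k (1-based) is the prior probability of exactly k causal SNPs.
    """
    values = [float(v) for v in text.split()]
    if not values:
        raise ValueError("K file is empty")
    if any(v < 0 for v in values):
        raise ValueError("prior probabilities must be non-negative")
    total = sum(values)
    if total == 0:
        raise ValueError("at least one prior probability must be positive")
    return [v / total for v in values]


# Unnormalized weights for k = 1 and k = 2 causal SNPs.
print(parse_k("3 1"))
```

Here K = 2, so configurations with more than two causal SNPs receive no prior probability.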

Output

(1) SNP file

The dataset.snp file is a space-delimited text file. It contains the GWAS summary statistics and model-averaged posterior summaries for each SNP one per line.

(2) CONFIG file

The dataset.config file is a space-delimited text file. It contains the posterior summaries for each causal configuration one per line.

(3) CRED file

The dataset.cred file is a space-delimited text file. It contains the 95% credible sets for each causal signal in the genomic region. For each credible set, the following posterior summaries are provided

CRED files are generated for those values of k (the number of causal SNPs in the genomic region) that have the largest posterior probability. For a specific k, FINEMAP takes the k-SNP causal configuration with the highest posterior probability and then asks, for the l-th SNP in that set, which other candidates could replace that SNP in this causal configuration. The l-th credible set shows the best candidate SNPs and their posterior probability of being in a k-SNP causal configuration that additionally contains k - 1 SNPs. Note that the k - 1 SNPs are chosen to have the highest posterior probability in their credible set.
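
To make the notion of a 95% credible set concrete, the sketch below applies the generic construction: rank candidate SNPs by posterior probability and add them until the cumulative probability reaches the coverage. This is a simplified illustration and not the conditional procedure FINEMAP applies to each causal signal; the SNP identifiers and probabilities are invented.

```python
def credible_set(probs, coverage=0.95):
    """Smallest set of top-ranked SNPs whose probabilities reach the coverage.

    probs: dict mapping SNP identifier to posterior probability
    (expected to sum to about one over the candidates of one signal).
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        total += prob
        if total >= coverage:
            break
    return chosen, total


# Invented posterior probabilities for the candidates of one signal.
candidates = {"rs30": 0.70, "rs11": 0.20, "rs7": 0.06, "rs3": 0.04}
snps, mass = credible_set(candidates)
print(snps, round(mass, 2))
```

The coverage argument plays the role of --prob-cred-set, whose default is 0.95.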

(4) LOG file

The dataset.log file outputs additional information. It contains the following output.

Fine-mapping example

Using genotype data with 50 SNPs and 5363 individuals, a quantitative phenotype was simulated using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain z-scores. SNP correlations were computed from GWAS genotype data.

Single causal configuration example

The same data as in the fine-mapping example above are used. Information about a single causal configuration can be obtained without performing shotgun stochastic search by specifying SNP identifiers as follows:

./finemap_v1.4_MacOSX --config --in-files example/data --dataset 1 --rsids rs30,rs11
./finemap_v1.4_x86_64 --config --in-files example/data --dataset 1 --rsids rs30,rs11

References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016).
Hans, D. et al. Shotgun stochastic search for "large p" regression. J Am Stat Assoc 102, 507-516 (2007).

Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.