**Command-line arguments**

**|**

**Input**

**|**

**Output**

**|**

**Fine-mapping example**

**|**

**Correlation example**

FINEMAP is a program for identifying causal SNPs in genomic regions associated with complex traits and disease. FINEMAP is computationally efficient by using summary statistics from genome-wide association studies and robust by using a shotgun stochastic search algorithm (Hans *et al*., 2007). It produces accurate results in a fraction of processing time of existing approaches and is therefore the ideal tool for analyzing growing amounts of data produced in genome-wide association studies and emerging sequencing or biobank projects.

### Download

### (license)

- Be aware that FINEMAP v1.1 cannot handle large effect size regions!
- finemap_v1.1_MacOSX.tgz (Mac OS X)
- finemap_v1.1_x86_64.tgz (Unix)
- (Updated 20-July-2016)

- To receive email reminders about updates of FINEMAP, send an email to finemap@christianbenner.com.

### Command-line arguments

--corr | Determine highly correlated SNPs | Subprogram | ||

--corr-config | The posterior probability of a causal configuration is set to zero if it includes a pair of SNPs with absolute correlation above this threshold | Default is 0.95 (with --sss) | ||

--corr-filter | SNPs are discarded such that no pair of SNPs remains with absolute correlation greater above this threshold | Default is 0.95 (with --corr) | ||

--help | Command-line help | |||

--in-files | Master file (see below) | With --sss/--corr | ||

--log | Option to write output to log files specified in column 'log' in the master file | No log files are written by default | ||

--n-causal-max | Maximum number of allowed causal SNPs | Default is 5 | ||

--n-configs-top | Number of top causal configurations to be saved | Default is 50000 | ||

--n-convergence | Number of iterations that the added probability mass is required to be below the specified threshold (--prob-tol) before the shotgun stochastic search is terminated | Default is 1000 | ||

--n-iterations | Maximum number of iterations before the shotgun stochastic search is terminated | Default is 100000 | ||

--prior-k | Option to use prior probabilities for the number of causal SNPs as specified in K files (see below) in the master file | SNPs are by default assumed to be causal with probability 1 / (# of SNPs in the genomic region) | ||

--prior-k0 | Prior probability that there is no causal SNP in the genomic region. Only used when computing posterior probabilities for the number of causal SNPs but not during fine-mapping itself | Default is 0.0 | ||

--prior-std | Prior standard deviation of effect sizes | Default is 0.05 | ||

--prob-tol | Tolerance at which the added probability mass (over --n-convergence iterations) is considered small enough to terminate the shotgun stochastic search | Default is 0.001 | ||

--regions | Option to specify delimiter-separated list of datasets for fine-mapping as given in the master file (e.g. 1,2 or 1|2) | All regions are processed by default | ||

--sss | Fine-mapping with shotgun stochastic search | Subprogram |

### Input

#### (1) Master file

The master file is a **semicolon-separated** text file and contains no space. It contains the following column names and one dataset per line.

z column contains the names of Z files (input).

ld column contains the names of LD files (input).

snp column contains the names of SNP files (output).

config column contains the names of CONFIG files (output).

n-ind column contains the GWAS sample sizes.

k column contains the optional K files (input).

log column contains the optional LOG files (output).

File extensions must correspond with the column names in the header line!

A master file with two datasets could look as follows.

z;ld;snp;config;log;n-ind |

dataset1.z;dataset1.ld;dataset1.snp;dataset1.config;dataset1.log;5363 |

dataset2.z;dataset2.ld;dataset2.snp;dataset2.config;dataset2.log;5363 |

#### (2) Z file

The dataset.z file is a **space-delimited** text file. It contains the following two columns and one SNP per line.

1st column contains the SNP identifiers. The identifier can be a rsID number or combination of chromosome name and genomic position (e.g. XXX:yyy).

2nd column contains the

*z*-scores.

A dataset.z file with three SNPs could look as follows.

rs1 0.240 |

rs2 0.483 |

rs3 1.145 |

The order of SNPs in dataset.z must correspond to the order of SNPs in dataset.ld!

#### (3) LD file

The dataset.ld file is a **space-delimited** text file and contains the SNP correlation matrix (Pearson's correlation). The order of the SNPs in the dataset.ld must correspond to the order of variants in dataset.z.

A dataset.ld file with three SNPs could look as follows.

1.00 0.95 0.98 |

0.95 1.00 0.96 |

0.97 0.96 1.00 |

Ideally, the SNP correlation matrix is computed from the genotype data on the same samples from which the GWAS summary statistics orginate. Read

*here*what could happen if SNP correlations from reference genotypes (e.g. 1000 Genomes Project) do not match well with the GWAS summary statistics.

#### (4) Optional K file

By default, FINEMAP assumes that SNPs are causal with prior probability 1 / (# of SNPs in the genomic region). As an alternative, it is possible to specify prior probabilities for the number of causal SNPs in the genomic region by using a dataset.k file. This is a **space-delimited** text file and contains the prior probabilities *p _{k}* = Pr(# of causal SNPs is

*k*) for

*k*= 1,...,

*K*, where

*K*is the number of entries in the dataset.k file. The prior probabilities must be non-negative and will be normalized to sum to one.

A dataset.k file allowing for three causal SNPs with *p*_{1} = 0.6, *p*_{2} = 0.3 and *p*_{3} = 0.1 would look as follows.

0.6 0.3 0.1 |

We assume that the genomic region includes at least one causal SNP and thus

*p*_{0}= 0. A non-zero prior probability*p*_{0}that there is no causal SNP in the genomic region can be specified with the command-line argument --prior-k0. This value is only used when computing posterior probabilities*p*_{k|data}= Pr(# of causal SNPs is*k*| data) but not during fine-mapping itself. We further assumen that*p*= 0 for_{k}*k*=*K*+1,...,*m*, where*m*is the number of SNPs in the dataset.z file.

### Output

#### (1) SNP file

The dataset.snp file is a **space-delimited** text file. It contains the model-averaged posterior summaries for each SNP one per line.

index column contains the line numbers in which SNPs appear in the dataset.z file.

snp column contains the SNP identifiers.

snp_prob column contains the marginal Posterior Inclusion Probabilities (PIP). The PIP for the

*l*th SNP is the posterior probability that this SNP is causal. It is computed by summing up the posterior probabilities of all causal configurations in the dataset.config file in which*l*th SNP is included.snp_log10bf column contains the log

_{10}Bayes factors. The Bayes factor quantifies the evidence that the*l*th SNP is causal with log_{10}Bayes factors greater than 2 reporting decisive evidence.

#### (2) CONFIG file

The dataset.config file is a **space-delimited** text file. It contains the posterior summaries for each causal configuration one per line.

rank column contains the ranking.

config column contains the SNP identifiers.

config_prob column contains the posterior probability. For a causal configuration γ, an entry in this column is the posterior probability that SNPs in this configuration are causal.

config_log10bf column contains the log

_{10}Bayes factors. The Bayes factor quantifies the evidence for a causal configuration γ over the null configuration γ_{0}.

#### (3) LOG file

The dataset.log file outputs additional information. It contains the following output.

Posterior probabilities Pr(# of causal SNPs is

*k*| data) for*k*= 1,...,*K*, where*K*is the maximum number of allowed causal SNPs.A log

_{10}Bayes factor to quantify the evidence of at least one causal SNP in the genomic region.

### Fine-mapping example

Using genotype data with 50 SNPs and 5363 individuals, a quantitative phenotype was simulated using a linear model with 2 causal SNPs. Single-SNP testing was performed to obtain *z*-scores

Fine-mapping the SNPs in genomic region 1 in the example folder is done follows:

./finemap --sss --in-files example/data --regions 1### Correlation example

The same data as in the fine-mapping example above are used. Repairing non-positive definiteness of a SNP correlation matrix (with Pearson's correlation coefficients) can sometimes be done by discarding SNPs such that no pair of SNPs remains with absolute correlation greater than some specified threshold

Pair-wise SNP correlations can be reduced, for instance according to threshold |*r _{ij}* |<0.98, as follows:

This removed the SNPs 1, 24, 27, 35, 37.

### References

Benner, C. et al. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493-1501 (2016). |

Hans, D. et al. Shotgun stochastic search for "large p" regression. J Am Stat Assoc 102, 507-516 (2007). |

### Acknowledgements

Matti Pirinen contributed to the design and implementation of FINEMAP.