News and updates
New releases and related tools will be announced through the mailing list.
Getting Help
Questions about TopHat should be sent to tophat.cufflinks@gmail.com. Please do not email technical questions to TopHat contributors directly.
Releases
version 2.0.0 (BETA), 4/09/2012
- Source code
- Linux x86_64 binary
- Mac OS X x86_64 binary
Related Tools
- Cufflinks: Isoform assembly and quantitation for RNA-Seq
- Bowtie: Ultrafast short read alignment
- TopHat-Fusion: An algorithm for Discovery of Novel Fusion Transcripts
- CummeRbund: Visualization of RNA-Seq differential analysis
Pre-built indexes
| H. sapiens, UCSC hg18 | 2.7 GB |
| H. sapiens, UCSC hg19 | 2.7 GB |
| M. musculus, UCSC mm9 | 2.4 GB |
All indexes are for assemblies, not contigs. Unplaced or unlocalized sequences and alternate haplotype assemblies are excluded.
Some unzip programs cannot handle archives >2 GB. If you have problems downloading or unzipping a >2 GB index, try downloading in two parts.
Check .zip file integrity with MD5s.
Pre-built indexes are compatible with Bowtie versions 0.9.8 and later. For older indexes, please contact us.
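The MD5 check mentioned above can be scripted. The following is a minimal sketch, assuming you have copied the published MD5 string for the archive by hand; the function names are illustrative and not part of TopHat or Bowtie.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that archives larger than 2 GB do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected_md5):
    """Compare the computed digest with the published one.
    The comparison ignores case and surrounding whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `archive_is_intact` returns False for a downloaded index, the download is probably truncated or corrupted and should be repeated before unzipping.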
Publications
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics doi:10.1093/bioinformatics/btp120
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
Contributors
- Cole Trapnell
- Daehwan Kim
- Geo Pertea
- Harold Pimentel
- Ryan Kelley
- Lior Pachter
- Steven Salzberg
Manual
What is TopHat?
TopHat is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie. TopHat runs on Linux and OS X.
What types of reads can I use TopHat with?
TopHat was designed to work with reads produced by the Illumina Genome Analyzer, although users have been successful in using TopHat with reads from other technologies. In TopHat 1.1.0, we began supporting Applied Biosystems' Colorspace format. The software is optimized for reads 75bp or longer. Mixing paired-end and single-end reads together is not supported.
How does TopHat find junctions?
TopHat finds splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome, TopHat identifies potential exons, since many RNA-Seq reads will contiguously align to the genome. Using this initial mapping information, TopHat builds a database of possible splice junctions, and then maps the reads against these junctions to confirm them.
Short read sequencing machines can currently produce reads 100bp or longer, but many exons are shorter than this, and so would be missed in the initial mapping. TopHat solves this problem by splitting all input reads into smaller segments, and then mapping them independently. The segment alignments are "glued" back together in a final step of the program to produce the end-to-end read alignments.
TopHat generates its database of possible splice junctions from three sources of evidence. The first source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron.
The second source is only used when TopHat is run with paired end reads. When reads in a pair come from different exons of a transcript, they will generally be mapped far apart in the genome coordinate space.
When this happens, TopHat tries to "close" the gap between them by looking for subsequences of the genomic interval between mates with a total length about equal to the expected distance between mates. The "introns" in this subsequence are added to the database.
The third, and strongest, source of evidence for a splice junction is when two segments from the same read are mapped far apart, or when an internal segment fails to map. With long (>=75bp) reads, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. With shorter reads, TopHat only reports alignments across "GT-AG" introns.
Prerequisites
To use TopHat, you will need the following programs in your PATH:
Because TopHat outputs and handles alignments in BAM format, you will need to download and install the SAM tools. You may want to take a look at the Getting started guide for more detailed installation instructions, including installation of SAM tools and Boost. You will also need Python version 2.4 or higher.
Obtaining and installing TopHat
You can download the latest source release and precompiled binaries for Linux and Mac OS X from the Releases section. See the Getting started guide for detailed instructions about installing TopHat from the binary package or building TopHat and its dependencies from source.
To install TopHat from the source package, unpack the tarball and change directory to the package directory as follows:
    tar zxvf tophat-2.0.0.tar.gz
    cd tophat-2.0.0/
Configure the package, specifying the install path and the library dependencies as needed (see the Getting started guide for details):
    ./configure --prefix=<install_prefix> --with-boost=<boost_install_prefix> --with-bam=<samtools_install_prefix>
Finally, build and install TopHat:
    make
    make install
As detailed in the Getting started guide, if you want to install TopHat 2 without overwriting a previous version of TopHat already installed on your system, specify a new, separate <install_prefix> for the ./configure command above. After the 'make install' step, copy the tophat2 script from <install_prefix>/bin to a directory that is in your shell's PATH, so that you can invoke the new version of TopHat with the command 'tophat2'.
Below you will find a detailed list of command-line options you can use to control TopHat. Beginning users should take a look at the Getting started guide for a tutorial on installing and running TopHat and its prerequisites.
Using TopHat
The following is a detailed description of the options used to control the tophat script:
Usage:
    tophat [options]* <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
When running TopHat with paired ends, it is critical that the *_1 files and the *_2 files appear in separate comma-separated lists, and that the order of the files in the two lists is the same.
NOTE: TopHat can align reads that are up to 1024 bp, and it handles paired end reads, but we do not recommend mixing several "types" of reads in the same TopHat run. For example, mixing 100bp single end reads and 2x27bp paired ends into the same TopHat run will give bad results. If you'd like to combine results from several "flavors" of RNA-Seq reads, you can run first with one of your sets, and feed the junctions produced by that run into future TopHat runs as externally supplied junctions with the -j option (see below).
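One way to keep the two paired-end lists in matching order is to build them from a single list of mate pairs. The following is a minimal sketch of that idea; the helper name `build_tophat_command` and all file names are hypothetical, and only the argument layout from the usage line above is assumed.

```python
def build_tophat_command(index_base, pairs, options=()):
    """Assemble an argument list of the form
    tophat [options]* <index_base> <reads_1 list> <reads_2 list>.

    `pairs` is a list of (mate_1_file, mate_2_file) tuples. Building both
    comma-separated lists from the same tuples keeps their order identical.
    """
    if not pairs:
        raise ValueError("at least one pair of read files is required")
    for mate_1, mate_2 in pairs:
        if "," in mate_1 or "," in mate_2:
            # A comma inside a file name would be read as a list separator.
            raise ValueError("file names must not contain commas")
    reads_1 = ",".join(mate_1 for mate_1, _ in pairs)
    reads_2 = ",".join(mate_2 for _, mate_2 in pairs)
    return ["tophat", *options, index_base, reads_1, reads_2]
```

The returned list can be passed to `subprocess.run` without going through a shell, which avoids quoting problems with unusual file names.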
Bowtie 2 specific options:
Bowtie 2 provides many options so that users can have more flexibility as to how reads are mapped. TopHat 2 allows users to pass many of these options to Bowtie 2 by preceding the Bowtie 2 option name with the --b2- prefix. Please refer to the Bowtie 2 website for detailed information.
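The prefix rule is a simple string transformation. The sketch below uses a hypothetical helper, `to_b2_option`, to show it; it does not check whether TopHat 2 accepts a given option, since only some Bowtie 2 options can be passed through, so the option list in this manual remains the authority.

```python
def to_b2_option(bowtie2_option):
    """Turn a Bowtie 2 option name into its TopHat 2 pass-through spelling
    by stripping leading dashes and adding the --b2- prefix."""
    name = bowtie2_option.lstrip("-")
    if not name:
        raise ValueError("empty option name")
    return "--b2-" + name
```

For example, a Bowtie 2 option written as `--very-sensitive` would be spelled `--b2-very-sensitive` on the TopHat 2 command line.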
Reads can be aligned to potential fusion transcripts if the --fusion-search option is specified. The fusion alignments are reported in SAM format using custom fields XF and XP (see the output format) and some additional information about fusions will be reported (see fusions.out). Once mapping is done, you can run tophat-fusion-post to filter out fusion transcripts (see the TopHat-Fusion website for more details).
The options below allow you to validate your own list of known transcripts or junctions with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.
The options below allow you to validate your own indels with your RNA-Seq data. Note that the chromosome names in the files provided with the options below must match the names in the Bowtie index. These names are case-sensitive.
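Because the name match is case-sensitive, it can help to compare the two sets of chromosome names before starting a run. The following is a minimal sketch, assuming the names from your annotation file and from the Bowtie index have already been collected into lists; both function names are illustrative.

```python
def unmatched_chromosomes(annotation_names, index_names):
    """Return annotation names with no exact (case-sensitive) match in the index."""
    return sorted(set(annotation_names) - set(index_names))


def case_only_mismatches(annotation_names, index_names):
    """For each unmatched annotation name, list index names that differ only in case.
    These are the most likely candidates for a simple renaming fix."""
    by_lower = {}
    for name in index_names:
        by_lower.setdefault(name.lower(), []).append(name)
    return {
        name: by_lower[name.lower()]
        for name in unmatched_chromosomes(annotation_names, index_names)
        if name.lower() in by_lower
    }
```

An empty result from `unmatched_chromosomes` means every name in the supplied file has an exact counterpart in the index.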
TopHat Output
The tophat script produces a number of files in the directory in which it was invoked. Most of these files are internal, intermediate files that are generated for use within the pipeline. The output files you will likely want to look at are:
