Releases
| TopHat 1.0.12 (BETA) | 10/28/09 |
Pre-built indexes
| H. sapiens, UCSC hg18 | 2.7 GB |
| H. sapiens, UCSC hg19 | 2.7 GB |
| M. musculus, UCSC mm9 | 2.4 GB |
All indexes are for assemblies, not contigs. Unplaced or unlocalized sequences and alternate haplotype assemblies are excluded.
Some unzip programs cannot handle archives >2 GB. If you have problems downloading or unzipping a >2 GB index, try downloading in two parts.
Check .zip file integrity with MD5s.
Pre-built indexes are compatible with Bowtie versions 0.9.8 and later. For older indexes, please contact us.
Publications
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics doi:10.1093/bioinformatics/btp120
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
Getting started
Install quick-start
Download and extract the appropriate TopHat and Bowtie releases. The Bowtie executables must be in your PATH.
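One way to confirm that the Bowtie executables are visible before building TopHat is a small PATH check. The sketch below is only an illustration: the helper name missing_tools is hypothetical, and the list of executable names (bowtie, bowtie-build, bowtie-inspect) is an assumption that you should adjust to match your Bowtie release.

```python
import shutil
from typing import Iterable, List, Optional


def missing_tools(names: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the names that cannot be found as executables on the search path.

    When path is None, the PATH environment variable is searched.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust to what your Bowtie release ships.
    wanted = ["bowtie", "bowtie-build", "bowtie-inspect"]
    missing = missing_tools(wanted)
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All tools found in PATH.")
```

If anything is reported as missing, add the directory containing the Bowtie binaries to your PATH and run the check again.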
To install TopHat, unpack the tarball and change to the package directory as follows:

    tar zxvf tophat-1.0.7.tar.gz
    cd tophat-1.0.7/

Now build the package:

    ./configure --prefix=/path/to/install/directory/
    make

Finally, install TopHat:
    make install

You may want to add the tophat script to your PATH.

Testing the installation
After you've installed Bowtie and TopHat, you should test the pipeline on a simple test data set (test_data.tar.gz), which is available for download. This data is not meant to exhaustively test all the features of TopHat. It only verifies that the installation worked. Unpack the data, change to the test_data directory, and then run tophat:

    tar zxvf test_data.tar.gz
    cd test_data
    tophat -r 20 test_ref reads_1.fq reads_2.fq

If TopHat ran successfully, you should see some lines of output, like this:

    [Mon May 4 11:07:23 2009] Beginning TopHat run (v1.0.7)
    -----------------------------------------------
    [Mon May 4 11:07:23 2009] Preparing output location ./tophat_out/
    [Mon May 4 11:07:23 2009] Checking for Bowtie index files
    [Mon May 4 11:07:23 2009] Checking for reference FASTA file
    [Mon May 4 11:07:23 2009] Checking for Bowtie
            Bowtie version: 0.9.9.1
    [Mon May 4 11:07:23 2009] Checking reads
            seed length: 75bp
            format: fastq
            quality scale: phred
            Splitting reads into 3 segments
    [Mon May 4 11:07:23 2009] Mapping reads against test_ref with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
            Splitting reads into 3 segments
    [Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against test_ref with Bowtie
    [Mon May 4 11:07:24 2009] Searching for junctions via coverage islands
    [Mon May 4 11:07:24 2009] Searching for junctions via mate-pair closures
    [Mon May 4 11:07:24 2009] Retrieving sequences for splices
    [Mon May 4 11:07:24 2009] Indexing splices
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Joining segment hits
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Mapping reads against segment_juncs with Bowtie
    [Mon May 4 11:07:24 2009] Joining segment hits
    [Mon May 4 11:07:24 2009] Reporting output tracks
    -----------------------------------------------
    Run complete [00:00:00 elapsed]

The directory tophat_out should now contain a file named junctions.bed. This file should contain a pair of junctions on the reference sequence "test_chromosome".

Preparing your reference
To find junctions with TopHat, you'll first need to install a Bowtie index for the organism in your RNA-Seq experiment. The Bowtie site provides pre-built indices for human, mouse, fruit fly, and others. If there's no index for your organism, it's easy to build one yourself. TopHat also requires a FASTA file (.fa) for your reference. If this file is not found alongside the other index files, the program will use the Bowtie index you give it to build this file and save it to the output directory. This step can take up to an hour for a human-sized genome. To skip this step in future runs, you can move the FASTA file from the tophat_out directory to the directory containing the Bowtie index files.

Preparing your reads
TopHat currently accepts reads in FASTA or FASTQ format, though FASTQ is recommended. You may need to convert your reads from another format to one of these. Maq's fq_all2std.pl converts many formats into FASTQ. For reads in SRF format, we recommend using the tools bundled with the Staden io_lib package.

Note: TopHat does not support mixing FASTA and FASTQ reads in the same input file, so don't run TopHat on FASTQ and FASTA files in the same run.

Running TopHat
TopHat will map your reads first by running Bowtie to identify places where reads map end to end.
Since your reads came from spliced transcripts in an RNA-Seq experiment, Bowtie will identify "islands" in your reference genome where reads piled up. Many of these islands will be exons. TopHat will then run a program to find splice junctions using the reads that did not get mapped to an island. So to identify junctions, you do not need to run Bowtie yourself, as TopHat will do it for you.

TopHat needs you to specify a path to the index files and one or more input files containing your reads, separated by commas. The first argument should be the full path to the directory containing the index plus the prefix of the index files. To start the TopHat pipeline, enter the command:

    tophat /path/to/h_sapiens reads1.fq,reads2.fq,reads3.fq

Be sure to check out the TopHat manual, as the pipeline has a few options you might want to use to get better results or get them more quickly.

Examining your output
TopHat produces several files of output. You can quickly examine your results by loading the two tracks coverage.wig and junctions.bed into the UCSC genome browser. You may want to change the display mode of the "junctions" track to "pack".
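
If you would rather take a quick look at junctions.bed without a genome browser, a few lines of Python can list the junction intervals. This is a minimal sketch that assumes only the standard BED layout, in which the first three whitespace-separated columns are the sequence name, start, and end; any further columns are ignored. The names Junction and read_junctions are hypothetical, and the sample records are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int
    end: int


def read_junctions(lines: Iterable[str]) -> List[Junction]:
    """Collect (chrom, start, end) from BED lines, skipping header lines."""
    junctions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip UCSC "track" and "browser" header lines.
        if fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"not a BED record: {line!r}")
        junctions.append(Junction(fields[0], int(fields[1]), int(fields[2])))
    return junctions


if __name__ == "__main__":
    # Made-up sample records; real coordinates will differ.
    sample = [
        "track name=junctions",
        "test_chromosome\t100\t300",
        "test_chromosome\t350\t500",
    ]
    for junction in read_junctions(sample):
        print(junction.chrom, junction.start, junction.end)
```

To run it on real output, pass an open file instead of the sample list, for example read_junctions(open("tophat_out/junctions.bed")). For the test data set described above, the result should contain two entries on "test_chromosome".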
