PrESSTo Human Enhancers Documentation page

 

PrESSTo, the promoter and enhancer slider selector tool, is a web tool and database of promoters and enhancers obtained during the FANTOM5 project.

 

1. Background research

 

The FANTOM5 consortium have extracted RNA transcripts from a multitude of different primary cells and tissues using the CAGE experiment (short for Cap Analysis of Gene Expression). As a result, mappings of transcription start sites and their usage in human primary cells and post-mortem tissues were produced in order to construct a comprehensive overview of gene expression across the human body.

This has been done using CAGE and single-molecule sequencing. The result is a unique gene expression profile atlas, focused specifically on core promoter utilization.

 

Because active enhancer regions are transcribed, we identified a distinct bidirectional CAGE pattern that can predict enhancer regions from CAGE data not associated with promoters. Because enhancer transcription is a powerful proxy of cell-specific enhancer activity, we could define an atlas of active, in vivo bidirectionally transcribed enhancers across the human body using the FANTOM5 panel of tissue and primary cell samples. As with promoters, each enhancer has a measure of expression within each primary cell and tissue. Based on these expression levels, an enhancer can be specific to a set of primary cells and tissues or can be broadly (ubiquitously) expressed. Expression levels are measured in CAGE tags (raw or normalized); the expression levels used in PrESSTo are measured in TPM (tags per million).
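As a rough illustration of the TPM unit, the sketch below scales raw tag counts from a single library to tags per million. It assumes the plain definition (tags in a region divided by all tags in the library, times one million); the region names and counts are made up, and the FANTOM5 pipeline may apply further normalization that is not shown here.

```python
def tags_per_million(tag_counts):
    """Scale raw CAGE tag counts from one library to tags per million (TPM).

    Assumes the plain definition: count / total tags in the library * 1e6.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return {region: 0.0 for region in tag_counts}
    return {region: count * 1_000_000 / total
            for region, count in tag_counts.items()}


# Hypothetical library with 2,000 tags in total (illustrative numbers only).
library = {"enhancer_A": 10, "enhancer_B": 40, "promoter_C": 1950}
tpm = tags_per_million(library)
print(tpm["enhancer_A"])  # 10 tags out of 2,000 scaled to one million tags
```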

 

In PrESSTo we have simplified the samples by aggregating anatomically/functionally similar cells and tissues into “facets”. Examples of cell facets are neurons, T cells and monocytes; examples of tissue facets are brain, liver and heart tissue.

 

 

2. Defining sliders and searching (the search page)

 

Based on the expression of individual promoters and enhancers across all cells and all tissues we have transformed the expression values into percentage values. Given this we can define sliders from 0 to 100% which we can use to filter enhancers and promoters based on their expression across a set of samples.

For instance, moving the liver slider to 10% and the intestine slider to 50% returns all enhancers where at least 10% of CAGE tags are from liver and at least 50% are from intestine.
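The liver/intestine example can be sketched in a few lines. This is only an illustration of the idea, not the PrESSTo implementation: the function names, the enhancer names and the TPM values are hypothetical, and "at least" is read here as a greater-than-or-equal comparison.

```python
def expression_percentages(facet_tpm):
    """Turn per-facet expression (TPM) into percentages of the enhancer's total."""
    total = sum(facet_tpm.values())
    if total == 0:
        return {facet: 0.0 for facet in facet_tpm}
    return {facet: 100.0 * value / total for facet, value in facet_tpm.items()}


def passes_sliders(percentages, sliders):
    """True if every slider minimum is reached (assumption: >= comparison)."""
    return all(percentages.get(facet, 0.0) >= minimum
               for facet, minimum in sliders.items())


# Hypothetical enhancers with made-up TPM values per facet.
enhancers = {
    "enh1": {"liver": 2.0, "intestine": 6.0, "brain": 2.0},  # 20% / 60% / 20%
    "enh2": {"liver": 1.0, "intestine": 3.0, "brain": 6.0},  # 10% / 30% / 60%
    "enh3": {"liver": 0.5, "intestine": 9.0, "brain": 0.5},  #  5% / 90% /  5%
}
sliders = {"liver": 10.0, "intestine": 50.0}

hits = [name for name, facet_tpm in enhancers.items()
        if passes_sliders(expression_percentages(facet_tpm), sliders)]
print(hits)
```

Only the first enhancer satisfies both sliders; the second fails the intestine constraint and the third fails the liver constraint.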

In addition, we can set location constraints based on the genomic location of promoters and enhancers to further refine the search. The location search offers the option to search near known genes, with default values of 100 kbp upstream and downstream of the gene boundaries. Since enhancers are bidirectionally transcribed, they have no associated strand (+ or -).
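A gene-based location search can be pictured as a window around the gene. The sketch below assumes 0-based, half-open coordinates (as in BED files) and treats any overlap with the window as a match; whether PrESSTo requires overlap or full containment is not stated here, so that choice is an assumption, as are the function names and coordinates.

```python
def gene_search_window(gene_start, gene_end, flank=100_000):
    """Extend gene boundaries by `flank` bp on both sides (default 100 kbp)."""
    return max(0, gene_start - flank), gene_end + flank


def in_window(enh_chrom, enh_start, enh_end, chrom, win_start, win_end):
    """True if the enhancer overlaps the window; strand is ignored because
    enhancers have no associated strand."""
    return enh_chrom == chrom and enh_start < win_end and enh_end > win_start


# Hypothetical gene on chr1 spanning 500,000-520,000.
window = gene_search_window(500_000, 520_000)
print(window)
print(in_window("chr1", 399_900, 400_100, "chr1", *window))
```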

 

 

2.1 Slider Search Details:

The percentage (%) number for each cell type/tissue is the share of the enhancer's total expression (normalized CAGE tag counts from all cells and tissues) that comes from that particular cell type/tissue. The percentage number is a lower bound: only enhancers with a higher percentage of expression than the set value for the cell type/tissue will be returned.

The percentage numbers for each cell/tissue can be set in three ways:

 

§  Directly typing the number in the input box;

 

§  Moving the slider along the slider bar;

 

§  Clicking the arrow buttons at the ends of the slider bar to increase/decrease the percentage. The "<" and ">" buttons change the percentage in increments of 0.5%. The "<<" and ">>" buttons change the percentage in increments of 5%.

 

The primary cell/tissue percentage values can be disabled by clicking the "Disable" button. Disabling the percentage values for cells, tissues or both means that their percentage numbers will not be taken into account when searching for enhancers (they are all treated as 0%).

Clicking the "Reset" buttons resets all primary cell and/or tissue percentage values to 0%.

The sum of all non-zero primary cell/tissue percentages cannot exceed 100%. The remaining percentage available is shown above the "Reset" button.
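The slider rules described in this section (step sizes, the 0 to 100% range, the 100% total and the reset) can be summarized in a small sketch. All names are hypothetical, and rejecting a value that would push the total above 100% is an assumption; the web interface may instead clamp the value.

```python
SMALL_STEP = 0.5  # "<" and ">" buttons
LARGE_STEP = 5.0  # "<<" and ">>" buttons


def click(value, delta):
    """Apply one arrow-button click and keep the result within 0..100."""
    return min(100.0, max(0.0, value + delta))


def remaining(sliders):
    """Percentage still available before the total reaches 100%."""
    return 100.0 - sum(sliders.values())


def set_slider(sliders, facet, value):
    """Return updated sliders; reject values that break the 100% total
    (assumption: the real interface may clamp instead of rejecting)."""
    if not 0.0 <= value <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    updated = {**sliders, facet: value}
    if sum(updated.values()) > 100.0:
        raise ValueError("slider total would exceed 100%")
    return updated


def reset(sliders):
    """Set every slider back to 0%."""
    return {facet: 0.0 for facet in sliders}


sliders = {"liver": 10.0, "intestine": 50.0, "brain": 0.0}
print(remaining(sliders))            # budget left for further sliders
print(click(10.0, LARGE_STEP))       # one ">>" click on the liver slider
```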

 

2.2 Location search details:

There are two ways to set the location search information: a basic search (chromosome, start-site and end-site) and a gene-based search.

When one of the two types of location search is selected, its search parameter area is displayed with a light green background while the other option's area is displayed in gray.

To move from the gene-based search to the basic search (removing/resetting the gene options), double-click the left box containing the basic search components (chromosome, start-site, end-site), or change the value of any of these components.

Note that moving from the gene-based search to the basic location search by double-clicking will trigger all the search parameters to reset to the initial location search, which spans across all chromosomes.

The "Sort by" selector on the right-most side sets which cell or organ expression value the results are sorted by. Results are sorted in descending order.

The location search can be disabled by clicking the "Disable" button on the top-right corner. Disabling the location search implies that the search will be conducted among all the chromosomes.

 

2.3 Result options:

 

The number of resulting enhancers will be automatically updated after each change in sliders or location values. The number is shown in the box on the right with the caption “number of hits”. In addition, the cell and tissue slider filters can be changed to the following:

 

§  Cell AND Organ (Tissue) Constraints: apply the constraints from both the cell and the tissue slider sets, so that the resulting enhancer set is a combination of cell AND tissue constraints (only enhancers that satisfy both the cell AND the tissue constraints are returned)

 

§  Cell OR Organ (Tissue) Constraints: apply the constraints from either the cell or the tissue slider set, so that the resulting enhancer set is a combination of cell OR tissue constraints (enhancers that satisfy either the cell OR the tissue constraints are returned)

 

§  Cell Constraints Only: only apply the constraints set by the cell sliders and ignore the tissue sliders when searching

 

§  Organ (Tissue) Constraints Only: only apply the constraints set by the tissue sliders and ignore the cell sliders when searching
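The four filter options listed above amount to different ways of combining two yes/no answers. The sketch below makes this explicit; the mode names, the data layout and the percentages are hypothetical and only mirror the descriptions in the list.

```python
def meets(percentages, sliders):
    """True if every slider minimum is reached (missing facets count as 0%)."""
    return all(percentages.get(facet, 0.0) >= minimum
               for facet, minimum in sliders.items())


def is_hit(enhancer, cell_sliders, tissue_sliders, mode):
    """Combine cell and tissue constraints according to the selected option."""
    cell_ok = meets(enhancer["cells"], cell_sliders)
    tissue_ok = meets(enhancer["tissues"], tissue_sliders)
    if mode == "cell_and_tissue":
        return cell_ok and tissue_ok
    if mode == "cell_or_tissue":
        return cell_ok or tissue_ok
    if mode == "cell_only":
        return cell_ok
    if mode == "tissue_only":
        return tissue_ok
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical enhancer: passes the cell constraint, fails the tissue one.
enhancer = {
    "cells": {"neurons": 40.0, "T cells": 5.0},
    "tissues": {"brain": 15.0, "liver": 2.0},
}
cell_sliders = {"neurons": 30.0}
tissue_sliders = {"brain": 20.0}

for mode in ("cell_and_tissue", "cell_or_tissue", "cell_only", "tissue_only"):
    print(mode, is_hit(enhancer, cell_sliders, tissue_sliders, mode))
```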

 

By clicking the “See Detailed Results >>” link the user will be directed to the viewer page displaying detailed information about the resulting enhancers.

 

 

 

3. View Resulting Enhancers (the viewer page)

 

Based on the expression and genomic location constraints, the number of resulting enhancers will differ. After selecting your search options from the search page click the “See Detailed Results >>” link to go to the viewer page. 

 

The viewer page displays detailed information about the resulting enhancers and allows the user to perform actions on the resulting set such as downloading and viewing in genome browsers.  The complete set of enhancer information and actions is displayed below.

 

Your selected enhancers are shown in the list on the left. A maximum of 50 enhancers is displayed on each page. You can use the "First", "Prev", "Next" and "Last" buttons at the bottom-left side of the page to view a different subset. Clicking on an enhancer shows information about that individual enhancer:

 

§  Percentage and Expression Tables: percentage and expression tables for primary cells (facets), organs (facets) and libraries (sample-level expression). For the primary cell and organ facets, the tables indicate whether the enhancer is significantly over-represented in a specific primary cell or organ facet, based on statistical tests described in Andersson et al. Over-representation is indicated by red percentage bars, as opposed to gray bars for facets that are not over-represented. The sample-wise expression table contains links to SSTAR, which holds detailed information about each sample. Most table columns can be sorted by clicking on the table column header.

 

§  View enhancer in UCSC Genome Browser: Input UCSC track information such as name, description and color and view the individual enhancer as a track in the UCSC Genome Browser.

 

 

§  View enhancer in ZENBU Genome Browser: View enhancer information similar to UCSC in the ZENBU Genome Browser from FANTOM. A special feature in ZENBU is the option to view expression levels.

 

§  View Overlaps: View enhancer-promoter and enhancer-SNP overlaps. An enhancer-promoter overlap occurs when the distance from an enhancer center to a promoter summit is within 500 kbp and the adjusted p-value of the Pearson correlation satisfies FDR < 10^-5. Enhancer centers and promoter summits are described in the publications listed in section 6 of this documentation. If an enhancer-promoter overlap exists, the promoter can be further inspected by clicking on the promoter annotation and going to the PrESSTo human promoter selector. An enhancer-SNP overlap occurs when the distance between an enhancer center and a SNP location is within 200 bp. If an enhancer-SNP overlap exists, the SNP can be further inspected by clicking on the SNP id.
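The two overlap rules can be written as simple predicates. This is a sketch under stated assumptions: both positions are on the same chromosome, "within" is read as inclusive, and the FDR value is taken as given (how it is computed is not shown). The function names and coordinates are hypothetical.

```python
PROMOTER_MAX_DISTANCE = 500_000  # 500 kbp
FDR_CUTOFF = 1e-5                # FDR < 10^-5
SNP_MAX_DISTANCE = 200           # 200 bp


def promoter_overlap(enhancer_center, promoter_summit, fdr):
    """Enhancer-promoter overlap: close enough and significantly correlated."""
    close = abs(enhancer_center - promoter_summit) <= PROMOTER_MAX_DISTANCE
    return close and fdr < FDR_CUTOFF


def snp_overlap(enhancer_center, snp_position):
    """Enhancer-SNP overlap: SNP lies within 200 bp of the enhancer center."""
    return abs(enhancer_center - snp_position) <= SNP_MAX_DISTANCE


# Hypothetical positions on one chromosome.
print(promoter_overlap(1_000_000, 1_400_000, fdr=1e-6))
print(snp_overlap(1_000_000, 1_000_150))
```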

 

Based on the selection parameters from the slider tool, a number of enhancers can be found. Given the resulting enhancers the following options are available:

§  Download all found enhancers as BED files.

 

§  View all found enhancers in the UCSC Genome Browser: Input UCSC track information such as name, description and color and view the enhancers as a custom track in the UCSC Genome Browser.

 

§  Get DNA Sequences for all found enhancers. In addition to the enhancer sequences given by their genomic coordinates, a total of 600 extra nucleotides can be added to the 5' and/or 3' end of each individual sequence.

 

§  Motif Discovery: Perform basic motif discovery on the resulting enhancer sequences using the MEME motif discovery tool. The enhancer sequences can be padded with a total of 600 extra nucleotides at the 5’ and/or 3’ end. The MEME job results are sent via email if the user enters a valid email address. Note: The online version of MEME accepts a maximum of 1,000 sequences, which in total must not exceed 60,000 nucleotides. All sequences must be longer than 8 nucleotides. In addition, the upstream and downstream extensions must not exceed 600 nucleotides.
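The export options above can be illustrated with a short sketch that writes a BED custom track, pads coordinates and checks the MEME input limits quoted in the note. It is not the PrESSTo code: the helper names are hypothetical, coordinates are assumed to be 0-based and half-open as in BED, and the 600-nucleotide limit is applied to each side separately, which is one possible reading of the text.

```python
MAX_SEQUENCES = 1_000   # MEME online limit quoted above
MAX_TOTAL_NT = 60_000   # total nucleotides over all sequences
MIN_LENGTH = 8          # sequences must be longer than this
MAX_EXTENSION = 600     # assumption: limit applies per side


def bed_track(enhancers, name, description, color):
    """Build a UCSC custom track: one track line plus 4-column BED rows."""
    header = f'track name="{name}" description="{description}" color={color}'
    rows = [f"{chrom}\t{start}\t{end}\t{label}"
            for chrom, start, end, label in enhancers]
    return "\n".join([header] + rows) + "\n"


def padded_interval(start, end, upstream=0, downstream=0):
    """Extend an interval on the 5' and/or 3' side, never below position 0."""
    if not (0 <= upstream <= MAX_EXTENSION and 0 <= downstream <= MAX_EXTENSION):
        raise ValueError("extensions must be between 0 and 600 nucleotides")
    return max(0, start - upstream), end + downstream


def meme_problems(sequences):
    """List the MEME input limits that a set of sequences violates."""
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append("too many sequences")
    if sum(len(seq) for seq in sequences) > MAX_TOTAL_NT:
        problems.append("too many nucleotides in total")
    if any(len(seq) <= MIN_LENGTH for seq in sequences):
        problems.append("sequence of 8 nt or shorter")
    return problems


# Hypothetical result set with a single enhancer.
print(bed_track([("chr1", 100, 400, "enh1")], "hits", "PrESSTo hits", "255,0,0"))
print(padded_interval(1000, 1400, upstream=600))
print(meme_problems(["ACGTACGT"]))
```

Checking the limits before submitting avoids a rejected MEME job; for large result sets, tightening the sliders or the location window is the simplest way to get under 1,000 sequences.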

 

 

 

 

4.  Motif Over-representation (the motif page)

 

This site contains collections of sequence motifs enriched in human bidirectional CAGE enhancers as described in Andersson et al. 2014, Nature. The underlying CAGE data were produced at the RIKEN OSC as part of the international FANTOM consortium and are described in The FANTOM 5 Consortium and the RIKEN PMI and CLST (DGT), Nature.

 

Sequence motifs were extracted from enhancers or promoters (either all or those showing differential activity in primary cells or tissues) using HOMER, a suite of tools for motif discovery and next-gen sequencing analysis developed by Christopher Benner (Integrative Genomics Core, Salk Institute, San Diego).

 

5. Predefined tracks (the tracks page)

 

Using statistical methods, we have defined several enhancer tracks used in Andersson et al. Most track sets are defined in the BED format unless specified otherwise and can be downloaded. These include:

 

§  Extensive enhancers: ubiquitous enhancers expressed over the entire set of cell facets; ubiquitous enhancers expressed over the entire set of tissue (organ) facets; TSS-enhancer associations (RefSeq promoters only); permissive and robust enhancer sets as defined in Andersson et al. PrESSTo searches and information are only associated with the robust enhancer set.

 

§  Enhancers specifically expressed in each individual facet from the cell or the tissue (organ) facet sets

 

 

The above enhancer sets can be marked and downloaded individually or in combination by selecting the desired sets.

 

§  Enhancers expressed in all facets (both cell and organ);

 

§  Enhancers that are positively differentially expressed in each facet against any other facet

 

§  Binary matrix of enhancer usage (significant enhancer expression compared to random regions; 1 means used) across all considered FANTOM libraries

 

§  Expression (TPM and RLE normalized) matrix of enhancers across all considered FANTOM libraries

 

§  Enhancer-FANTOM Robust Promoter associations: enhancer-promoter associations are obtained by correlating (Pearson) enhancer-promoter pairs within 500 kbp over cell type and organ facets and filtering by an adjusted correlation p-value: FDR < 10^-5. The enhancer-promoter associations are computed between enhancers and all FANTOM robust promoters.
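To make the correlation step in the last item concrete, the sketch below computes a Pearson correlation coefficient between an enhancer and a promoter across facets. The expression values are made up, and the significance test and FDR adjustment applied to these correlations are not reproduced here; see Andersson et al. for the actual procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression (TPM) of one enhancer and one promoter over 4 facets.
enhancer_expression = [1.0, 2.0, 3.0, 4.0]
promoter_expression = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(enhancer_expression, promoter_expression)
print(round(r, 6))
```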

 

 

6. Additional information and citation

 

For more details on the enhancer tracks or any other research aspects behind enhancers or promoters please consult the following publications:

 

§  The FANTOM 5 Consortium and the RIKEN PMI and CLST (DGT), Nature 507, 462–470, doi:10.1038/nature13182  *

 

§  Andersson et al, Nature 507, 455–461, doi:10.1038/nature12787 *

 

§  Arner et al, Science 347, 1010–1014, doi:10.1126/science.1259418

 

 

* Papers to cite if you are using PrESSTo