TRES Transcription Regulatory Element Search
a tool for Comparative Promoter Analysis


1. Introduction

2. Comparative promoter analysis
    a. Phylogenetic footprinting
    b. Genes with common expression profiles

3. Methods
    a. Input sequence set
    b. Searching TF binding sites using TRANSFAC weight matrices
    c. Searching TF binding sites / cis-elements from TRANSFAC, ooTFD and PLACE Databases
    d. Searching palindromic motifs (inverted repeats)
    e. k-tuple search
    f. Conservation of motifs can be studied relative to TATA box

4. References

5. Contact



Introduction:

TRES can search as many as 20 related promoter sequences simultaneously for conserved regulatory modules, which makes it useful for comparative promoter analysis.



Comparative Promoter Analysis

When a single promoter sequence is searched, one often finds many putative elements scattered all over the sequence, which makes it difficult to choose candidates for further experimental analysis.

On the other hand, when many related but diverse promoter sequences are searched simultaneously, the motifs conserved across them are more likely to be functionally important. Such a search also gives information about the conservation of motif positions relative to the transcription initiation site, the TATA box, or each other.



Phylogenetic Footprinting

Phylogenetic footprinting is an approach to elucidate common regulatory modules by studying evolutionarily related genes (Gumucio et al., 1996). Since continuously occurring mutational events accumulate at neutral positions but are eliminated in functional regions, it is argued that motifs conserved in diverse orthologous promoter sequences are more likely to have a functional role.

Phylogenetic footprints have been defined as six or more contiguous conserved bases in multiple alignments of orthologous genes (Gumucio et al., 1996). Since the TRES k-tuple search can detect conserved motifs irrespective of their position or orientation, it is a powerful tool for phylogenetic footprinting.
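The footprint definition above can be illustrated with a minimal Python sketch. It assumes the orthologous sequences have already been aligned to equal length with '-' as the gap character; the function name find_footprints and the toy alignment are hypothetical and are not part of TRES.

```python
def find_footprints(aligned, min_len=6):
    """Return (start, end, motif) for each run of at least min_len
    alignment columns that are identical and gap-free in every sequence.
    Positions are 0-based alignment columns; end is exclusive."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    rows = [s.upper() for s in aligned]
    footprints = []
    run_start = None
    # Iterate one column past the end so that a trailing run is closed.
    for col in range(length + 1):
        conserved = (
            col < length
            and rows[0][col] != "-"
            and all(r[col] == rows[0][col] for r in rows)
        )
        if conserved:
            if run_start is None:
                run_start = col
        else:
            if run_start is not None and col - run_start >= min_len:
                footprints.append((run_start, col, rows[0][run_start:col]))
            run_start = None
    return footprints


# Toy alignment (made up for illustration).
alignment = ["ACGTGACGTTTA", "TCGTGACGATTA", "GCGTGACG-TTA"]
print(find_footprints(alignment))
```

Unlike this alignment-based sketch, a k-tuple search does not require the motif to sit in the same alignment column in every sequence.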

The selected orthologous promoter sequences should be from moderately diverse species so that there has been sufficient evolutionary time for mutations to accumulate in non-functional regions. Species with cumulative phylogenetic branch lengths of more than about 200 million years (e.g. orthologous genes from human, mouse, bovine and chicken) offer good candidates for such analysis (Duret and Bucher, 1997).



Genes with common expression profiles:

Genes that are specifically expressed in the same tissue at the same time might share common regulatory programs and might be recognised by common trans-acting factors. Therefore, conserved motifs in genes showing common expression profiles are likely to be involved in spatial/temporal expression. With ongoing whole-genome projects and advances in "DNA-chip technology", it is possible to identify large numbers of genes sharing common expression profiles. TRES can be useful for detecting putative regulatory elements in such sets of genes.



METHODS


Input Sequence Set

TRES reads the input sequences in FASTA format. As many as 20 related sequences, each of at most 1000 bp, can be searched simultaneously.
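As an illustration of these input limits, the following Python sketch parses FASTA text and checks the set against the 20-sequence and 1000-bp limits stated above. The function names read_fasta and check_input_set are hypothetical; this is not the parser TRES itself uses.

```python
MAX_SEQUENCES = 20   # limit stated in this document
MAX_LENGTH = 1000    # maximum sequence length in bp, as stated above


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def check_input_set(records):
    """Raise ValueError if the set exceeds the documented limits."""
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences are allowed")
    for name, seq in records:
        if len(seq) > MAX_LENGTH:
            raise ValueError(f"{name}: longer than {MAX_LENGTH} bp")
    return records


example = ">geneA\nACGTAC\ngtacgt\n>geneB promoter\nTTGACA\n"
print(check_input_set(read_fasta(example)))
```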

It is important to note that the sequences in the set should be moderately diverse so that there is sufficient background noise. Pairwise alignment scores can indicate the extent of diversity between sequences. For example, sequences with pairwise alignment scores in the range of about 45 to 65 are suitable for comparative analysis.



Searching TF binding sites using TRANSFAC weight matrices

With the TRES:Matrix-Scan module, the input sequences can be searched for conserved TF binding sites using the nucleotide frequency distribution matrices described in the TRANSFAC database (Heinemeyer et al., 1999). The position weights and matrix similarity scores are calculated essentially according to Quandt et al. (1995), except that gaps are not considered and a pre-processed library of normalised weight matrices is used at runtime.

During matrix scanning, the matrix slides along each sequence and the matrix similarity score at each position is calculated by simply adding the normalised weights. This directly gives a comparable value in the range 0 to 100. A TF binding site is considered conserved only when the matrix similarity score is above the user-selected cut-off value, which can be set in the range 85 to 100. A higher cut-off reduces the chance of false positives. A matrix cut-off score greater than about 95 may be considered highly stringent.
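The scanning step can be sketched in Python as follows. This is a simplified illustration, not the TRES implementation: the normalisation shown here only rescales base frequencies so that the best possible site scores 100, and it omits the position-specific conservation weighting of Quandt et al. (1995). The names normalise and scan and the toy count matrix are hypothetical, and only the forward strand is scanned.

```python
BASES = "ACGT"


def normalise(counts):
    """Convert per-position base counts into weights such that the
    best-matching site sums to 100 (simplified; an assumption)."""
    freqs = []
    for col in counts:
        total = sum(col[b] for b in BASES)
        freqs.append({b: col[b] / total for b in BASES})
    best = sum(max(col.values()) for col in freqs)
    return [{b: 100.0 * col[b] / best for b in BASES} for col in freqs]


def scan(sequence, weights, cutoff=85.0):
    """Slide the matrix along the sequence and report windows whose
    summed weights reach the cut-off, as (start, site, score)."""
    seq = sequence.upper()
    width = len(weights)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Characters other than A, C, G, T contribute nothing.
        score = sum(weights[i].get(base, 0.0) for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, round(score, 1)))
    return hits


# Toy count matrix (made up for illustration, not from TRANSFAC).
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 2},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
weights = normalise(toy_counts)
print(scan("TTACGTAACTT", weights, cutoff=85.0))
```

Because the weights are normalised once in advance, scoring a window at runtime needs only one lookup and one addition per position.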



Searching TF binding sites / cis-elements from TRANSFAC, ooTFD and PLACE Databases

A large number of TF binding sites / cis-elements / enhancer elements have been described in the form of consensus sequences. Presently, 3980 TF binding sites from the TRANSFAC database (Heinemeyer et al., 1999), 5919 TF binding sites from the ooTFD database (Ghosh, 2000) and 240 plant cis-acting elements from the PLACE database (Higo et al., 1999) can be searched in promoter DNA sequences using TRES. To avoid false positives, only sites with a recognition sequence of at least 6 bases have been included.

The IUPAC consensus characters and match scores used while searching for a motif are described in Table-1. Users can require a perfect match to a motif or can define the level of mismatch to be tolerated during the search. A mismatch level of 1 in 10 bases may be considered sufficiently stringent.
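A consensus search with a mismatch tolerance can be sketched in Python as below. The sketch uses the standard IUPAC nucleotide codes and a simple match/mismatch count; it does not reproduce the match scores of Table-1, and the function name find_consensus is hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_consensus(sequence, consensus, max_mismatches=0):
    """Return (start, site, mismatches) for every window that matches
    the IUPAC consensus with at most max_mismatches mismatches.
    Only the given (forward) strand is searched in this sketch."""
    seq = sequence.upper()
    pattern = consensus.upper()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        mismatches = sum(
            1 for base, code in zip(window, pattern) if base not in IUPAC[code]
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Example: the consensus CANNTG searched with no mismatches allowed.
print(find_consensus("GGCACGTGTTCATGTG", "CANNTG"))
```

For a 10-base consensus, the stringency suggested above corresponds to calling this sketch with max_mismatches=1.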



Searching palindromic motifs (inverted repeats)

Palindromic motifs or inverted repeats have the unique features of dyad symmetry and the ability to form hairpins or loops. They facilitate protein binding in homodimer or heterodimer form. A majority of transcription factors of the bZIP and bHLH families have palindromic recognition sequences and bind as dimers. Important advantages of dimerisation are the stability of protein-protein and protein-DNA interactions and the generation of diversity from a limited number of transcription factors (Lamb and McKnight, 1991). Therefore, conserved palindromes may have functional significance in transcription regulation.
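In sequence terms, a palindromic motif is a window that equals its own reverse complement. The Python sketch below finds such windows of a fixed length and keeps those shared by every sequence in a set. It is only an illustration under the assumption of perfect, ungapped palindromes; the function names are hypothetical and the search TRES performs may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(sequence, length=6):
    """Return (start, motif) for each window of the given length that
    reads the same on both strands (perfect inverted repeat, no spacer)."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if set(window) <= set("ACGT") and window == reverse_complement(window):
            hits.append((start, window))
    return hits


def conserved_palindromes(sequences, length=6):
    """Return the palindromic motifs present in every sequence."""
    shared = None
    for seq in sequences:
        found = {motif for _, motif in find_palindromes(seq, length)}
        shared = found if shared is None else shared & found
    return sorted(shared or [])


print(find_palindromes("TTCACGTGAA"))
print(conserved_palindromes(["TTCACGTGAA", "GGCACGTGCC", "ACACGTGT"]))
```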



k-tuple search

A k-tuple is a substring or window of length 'k' on a sequence. In a k-tuple search, all possible substrings (sliding windows) from each individual sequence are searched for conservation in all the sequences on both strands. Thus a k-tuple search can detect any motif of length 'k' that is common to the given set of sequences.

It is important that the sequences in the set are not too similar; otherwise a very large number of k-tuples will be conserved in all of them.

k-tuple search can be done only when there are a minimum of 4 sequences in the set.
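The k-tuple search can be sketched in Python as a set intersection over the k-tuples of both strands of every sequence. This sketch assumes exact matches and reports each motif once, in the alphabetically smaller of its two strand orientations; the function names are hypothetical, and the actual TRES search may differ in detail.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ktuples_both_strands(sequence, k):
    """Collect every k-tuple found on either strand of the sequence."""
    forward = sequence.upper()
    tuples = set()
    for strand in (forward, reverse_complement(forward)):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            if set(window) <= set("ACGT"):
                tuples.add(window)
    return tuples


def conserved_ktuples(sequences, k=6, min_sequences=4):
    """Return k-tuples present (on either strand) in all sequences."""
    if len(sequences) < min_sequences:
        raise ValueError(f"at least {min_sequences} sequences are required")
    shared = None
    for seq in sequences:
        found = ktuples_both_strands(seq, k)
        shared = found if shared is None else shared & found
    # A shared k-tuple and its reverse complement always occur together;
    # keep one representative per pair.
    return sorted({min(t, reverse_complement(t)) for t in shared})


# Toy sequences (made up); the fourth carries the motif on the other strand.
toy = ["AAGATAAGCC", "CCCGATAAGT", "TTGATAAGAA", "GCTTATCGGA"]
print(conserved_ktuples(toy, k=6))
```

With very similar input sequences the intersection stays large, which is why moderately diverse sequences give more informative results.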



Conservation of motifs can be studied relative to TATA box

Since the TATA box is the site of assembly of the Transcription Initiation Complex (TIC), spatial conservation of putative TF binding sites relative to the TATA box might be significant for the interaction of promoter-bound TFs with the TIC.

When this option is selected, TRES searches for the TATA box, by convention, in the last 150 bp of each input sequence using the TATA-box weight matrix described by Bucher (1990).
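The idea of reporting motif positions relative to the TATA box can be sketched in Python as below. Note the assumption: the sketch locates the TATA box with the literal string TATAAA as a stand-in, because the Bucher (1990) weight matrix is not reproduced here. The function names locate_tata and relative_positions are hypothetical.

```python
def locate_tata(sequence, window=150, pattern="TATAAA"):
    """Return the start of the first occurrence of pattern within the
    last `window` bases of the sequence, or None if it is absent.
    The literal pattern is a stand-in for a weight-matrix search."""
    seq = sequence.upper()
    offset = max(0, len(seq) - window)
    index = seq.find(pattern, offset)
    return None if index == -1 else index


def relative_positions(motif_starts, tata_start):
    """Express motif start positions relative to the TATA box start.
    Negative values lie upstream of the TATA box."""
    return [start - tata_start for start in motif_starts]


# Toy promoter (made up): a motif at position 100, TATAAA at position 156.
promoter = "G" * 100 + "CACGTG" + "G" * 50 + "TATAAA" + "G" * 30
tata = locate_tata(promoter)
print(tata, relative_positions([promoter.find("CACGTG")], tata))
```

Comparing these relative offsets across the input sequences shows whether a motif keeps a fixed distance from the TATA box.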



References

Bucher,P. (1990) Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol., 212, 563-578.

Duret,L. and Bucher,P. (1997) Searching for regulatory elements in human noncoding sequences. Curr. Opinion Struct. Biol., 7, 399-406.

Ghosh,D. (2000) Object oriented Transcription Factor Database (ooTFD). Nucleic Acids Res., 28, 308-310.

Gumucio,D.L., Shelton,D.A., Zhu,W., Millinoff,D., Gray,T., Bock,J.H., Slightom,J.L. and Goodman,M. (1996) Evolutionary strategies for the elucidation of cis and trans factors that regulate the developmental switching programs of the β-like globin genes. Mol. Phylogenet. Evol., 5, 18-32.

Heinemeyer,T., Chen,X., Karas,H., Kel,A.E., Kel,O.V., Liebich,I., Meinhardt,T., Reuter,I., Schacherer,F. and Wingender,E. (1999) Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms. Nucleic Acids Res., 27, 318-322.

Higo,K., Ugawa,Y., Iwamoto,M. and Korenaga,T. (1999) Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res., 27, 297-300.

Lamb,P. and McKnight,S.L. (1991) Diversity and specificity in transcription regulation: the benefits of heterotypic dimerization. Trends Biochem. Sci., 16, 417-422.

Quandt,K., Frech,K., Karas,H., Wingender,E. and Werner,T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 23, 4878-4884.



Contact

For comments and suggestions please contact:

Vidya S. Gupta                                                             Mukund V. Katti
Email:  vidya@ems.ncl.res.in                                       Email:  mvkatti@kelvin.ncl.res.in

Division of Biochemical Sciences
National Chemical Laboratory
Pune- 411 008
INDIA

Telephone: 91-20-5893034
Fax:            91-20-5884032