Our group studies several aspects of RNA biology using computational means. To enable analyses of splice variation, we have developed a fully automated software tool that starts from large-scale sequence data sets (such as Unigene) and performs all the steps necessary to generate a web-accessible database of all the splice forms observed in the sequence data.
An essential component of this tool is a novel algorithm that we developed for mapping cDNA and EST sequences to their corresponding genome. The algorithm integrates information about gene structure, splice sites and sequencing errors in a Bayesian probabilistic framework to infer the most likely mapping of a cDNA sequence to the genome.
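As a rough illustration of what a Bayesian choice between candidate mappings can look like, the sketch below scores each candidate by a log-likelihood (sequencing errors) plus a log-prior (canonical versus non-canonical introns) and keeps the highest-scoring one. This is not the group's algorithm: the `CandidateMapping` fields, the `error_rate` and `p_canonical` values, and the example candidates are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateMapping:
    """One hypothetical way of placing a cDNA on the genome."""
    name: str
    matches: int               # aligned bases identical to the genome
    mismatches: int            # aligned bases that differ (treated as sequencing errors)
    canonical_introns: int     # introns with canonical splice sites
    noncanonical_introns: int  # introns without canonical splice sites


def log_posterior(c, error_rate=0.01, p_canonical=0.999):
    """Unnormalised log-posterior: log-likelihood plus log-prior.

    error_rate and p_canonical are assumed values, not fitted parameters.
    """
    log_likelihood = (c.matches * math.log(1.0 - error_rate)
                      + c.mismatches * math.log(error_rate))
    log_prior = (c.canonical_introns * math.log(p_canonical)
                 + c.noncanonical_introns * math.log(1.0 - p_canonical))
    return log_likelihood + log_prior


def best_mapping(candidates, **params):
    """Return the candidate with the highest unnormalised log-posterior."""
    return max(candidates, key=lambda c: log_posterior(c, **params))


# Two made-up candidates: A has one mismatch but only canonical introns,
# B matches perfectly but requires one non-canonical intron.
candidates = [
    CandidateMapping("A", matches=499, mismatches=1,
                     canonical_introns=3, noncanonical_introns=0),
    CandidateMapping("B", matches=500, mismatches=0,
                     canonical_introns=2, noncanonical_introns=1),
]
best = best_mapping(candidates)
```

Under these assumed parameters a single sequencing error is considered more plausible than a non-canonical intron, so the prior on gene structure can outweigh a slightly better raw alignment.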
We are using the resulting data about splice variants to study the mechanisms regulating alternative splicing. We found, for example, that exons that are included in some transcripts and skipped in others differ from constitutive exons in several respects. Most notably, their length distribution is wider, their splice sites are “weaker”, and they have a lower frequency of several known splice enhancer motifs.
We also found that a simple model describing the binding specificity of the spliceosome to splice sites can explain the frequent occurrence of small changes in the location of splice sites. This indicates that noise, manifested in the stochastic binding of the spliceosome to neighbouring, competing splice sites, can explain much of the splice variation inferred from sequence databases. On the other hand, we found that, for many exons that can be either included or skipped, inclusion is correlated with the choice of transcription start site.
We are currently developing computational methods for the analysis of high-throughput short RNA sequencing data. We have applied these methods mostly to the identification of novel regulatory RNAs [2,3,4 and www.mirz.unibas.ch/smiRNA-annotation]; other groups have used short RNA sequencing techniques to identify binding sites for transcription factors and RNA-binding proteins, promoter regions and alternative splice forms.
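The idea of stochastic competition between neighbouring splice sites can be illustrated with a toy calculation. The sketch below assumes that each candidate site has a log-odds binding score and that the probability of using a site is proportional to the exponential of its score. Both the functional form and the scores are assumptions made for illustration, not the model or parameters used in our work.

```python
import math


def site_usage_probabilities(scores):
    """Convert per-site log-odds scores into usage probabilities.

    Assumes usage is proportional to exp(score), i.e. a softmax over
    the competing sites. Subtracting the maximum keeps exp() stable.
    """
    top = max(scores.values())
    weights = {site: math.exp(s - top) for site, s in scores.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


# Hypothetical scores for a strong site and a weaker site 3 nt downstream.
scores = {"canonical": 8.0, "shifted_+3": 6.0}
probs = site_usage_probabilities(scores)
```

In this toy setting the weaker neighbouring site is still used in a minority of transcripts, which is the kind of small shift in splice-site location that such noise would produce.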
Key lab techniques: algorithms for spliced alignment and inference of splice variants from sequencing data, analysis of high-throughput short RNA sequencing data, non-coding RNA gene prediction, discovery of regulatory motifs, target prediction for small regulatory RNAs.
Interest in alternative splicing: Alternative splicing is one of the main mechanisms that contribute to the complexity of eukaryotes. While much is known about the mechanism of splicing, the factors that contribute to the expression of a specific splice form at a particular time, in a particular cell, are largely unknown. The availability of large data sets of cDNA and EST sequences, as well as of complete genomes from various eukaryotes, has enabled us and others to uncover signals that contribute to the regulation of alternative splicing through computational analyses.
Lab contact: Yvonne Steger: Yvonne.firstname.lastname@example.org
Lab website: www.biozentrum.unibas.ch/zavolan/index.html