Fang Cheng
NYU Bioinformatics Group,
715 Broadway, Rm 1009
New York, NY 10012
Phone: 212 998 33xx
Fax: 212 998 33xx
E-mail: fc417 [at] nyu.edu
Research Interest
I joined the bioinformatics group as a biology Ph.D. student. My major research interest is in genome evolution.
The classic view of evolution combined the Darwinian concepts of gradualism and natural selection with random mutation and Mendelian segregation as the mechanisms of evolutionary variability. This perspective has provided a powerful theoretical framework for many studies of the evolution of individual protein-coding regions; however, it fails to explain the origin of complex integrated systems and the ubiquity of natural genetic engineering systems. In other words, the classic view is not sufficient to explain the evolution of biological complexity, which goes far beyond gradual changes in protein-coding regions.
We favor a modular view of the operation of genetic systems, which emphasizes the composite, systemic nature of individual genetic loci and the physical and structural functions of non-coding regions. Many computational problems arise when studies of evolution are extended to the transcriptional and post-transcriptional levels and to non-protein-coding regions, especially the repetitive elements that make up the majority of the genomic sequence in higher organisms. Taking advantage of our group's strong computational background, we aim to understand the large-scale organization of genomes, the role of repetitive DNA sequences (especially transposable elements) in genome rearrangement, and how genome structure affects the transcriptome. Tools and methodology developed for whole-genome comparison and for integrating sequence properties with expression data could also be used in other studies of genome evolution and functional genomics.
Current Projects
I am currently working on several projects in the NYU Bioinformatics Group.
- Effects of intron-hosted transposable elements on local transcriptional/post-transcriptional regulation:
We favor a systemic view of the genome, in which genomes are organized like integrated computer programs, as systems of routines and subroutines, rather than as collections of independent genetic units. DNA sequences that do not code for protein structure determine the system architecture of the genome. Transposable elements (TEs), a special population of such non-coding regions, are attracting increasing attention for several reasons: their mobility; the gene-structure components they carry (promoters, terminators, LTRs, splice sites), which provide modules for reorganizing local gene structures; their insertion-site specificity; and their ability to sense internal and external signals, coordinating cellular status with their local effects. Specifically, we are focusing on the correlation between two events: TE insertion into genes and the gain of new splicing isoforms by those genes.
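One standard way to quantify the correlation between these two events is to count genes in a 2x2 table (TE in an intron or not, new isoform or not) and ask whether the overlap is larger than chance. The sketch below is a minimal one-sided Fisher's exact test using only the standard library; the function name, table layout and counts are illustrative assumptions, not this project's pipeline or data. SciPy's `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-sided p-value.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                        new isoform   no new isoform
        TE in intron         a               b
        no TE                c               d

    Returns P(X >= a) under the hypergeometric null of no association.
    """
    te_genes = a + b        # row total: genes hosting an intronic TE
    iso_genes = a + c       # column total: genes that gained an isoform
    n = a + b + c + d       # all genes in the comparison
    total = comb(n, iso_genes)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(te_genes, k) * comb(n - te_genes, iso_genes - k)
               for k in range(a, min(te_genes, iso_genes) + 1))
    return tail / total

# Illustrative counts only, not data from this project.
print(fisher_greater(3, 1, 1, 3))
```

For the toy table above the p-value is 17/70, about 0.243; real gene counts from a genome annotation would simply replace the four arguments.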
Using Drosophila annotation release 3.1, we observed a strong correlation between these two events in our preliminary tests. However, explaining this correlation in evolutionary terms requires analysis of several related species, since any property observed in a single genome could be only a transient state of a dynamic system. The complete genomic sequences of Drosophila pseudoobscura and Anopheles gambiae provide good comparison genomes for tracing the evolutionary history of each event and the possible effect of one on the other. Although high-quality annotation for these genomes is not yet available, we think the study is feasible by taking advantage of abundant EST data and whole-genome alignment. The study relates non-coding genomic signatures to local properties at the transcriptome level, a question that has not been addressed in any published work, and should bring valuable insights into the role of non-coding DNA in the evolution of genome complexity. In addition, since the difficulties met in this study are common problems in genome comparison, the methodology and tools developed for this project could also be used in other research.
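Tracing the history of an event across these genomes can be framed as a small parsimony problem. The sketch below applies Fitch's algorithm to one presence/absence character on the fixed tree ((D. melanogaster, D. pseudoobscura), A. gambiae). It assumes that presence or absence of a TE at an orthologous intronic site has already been called from the whole-genome alignment; the leaf labels and input format are hypothetical, and the same function could be applied to isoform presence inferred from ESTs.

```python
def fitch(tree, states):
    """Fitch parsimony for one binary character on a rooted binary tree.

    tree:   nested 2-tuples of leaf names, e.g. (("D_mel", "D_pse"), "A_gam")
    states: leaf name -> True if the TE is present at the orthologous site
    Returns (candidate root states, minimum number of gains/losses).
    """
    if isinstance(tree, str):              # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    shared = lset & rset
    if shared:                             # children agree: no extra change needed
        return shared, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagreement costs one change

# Hypothetical leaf labels for D. melanogaster, D. pseudoobscura, A. gambiae.
TREE = (("D_mel", "D_pse"), "A_gam")
print(fitch(TREE, {"D_mel": True, "D_pse": False, "A_gam": False}))
```

A TE found only in D. melanogaster returns `({False}, 1)`: absent at the root, with one insertion on the melanogaster lineage. A TE shared by both Drosophila species but absent from A. gambiae returns `({False, True}, 1)`, because without a further outgroup a gain on the Drosophila stem and a loss in the Anopheles lineage are equally parsimonious.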
- Tool for functional analysis based on Gene Ontology (GO) data:
In this project, I am working with several computer scientists to develop an interactive analysis tool for Gene Ontology data. Current GO tools are browsers for searching the GO database, which describes gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. As a browser, our tool supports batch queries of the GO database and displays the results as a subgraph of the whole Gene Ontology DAG, starting from the most detailed functional category that covers all of the user-submitted genes and tracing all the way down to the most detailed functional terms for each individual gene in the list. None of the current GO browsers presents search results graphically in this way. Beyond browsing, our tool provides the following analysis functions: (1) Calculate the functional distance between any selected gene and all other genes in the user-submitted list, based on the probability of seeing two genes in a particular functional category given the number of genes grouped in that category. The results will be presented as a network in which edge length represents functional distance; users will be able to adjust the lengths according to knowledge from other resources. (2) Cluster genes based on a user-selected functional distance value. (3) Map user-selected functional clusters onto chromosomes and run simple statistical tests based on the clustering results and the chromosomal positions.
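The query subgraph and functions (1) and (2) can be sketched on a toy example. This is a minimal sketch under stated assumptions, not the tool itself: the ontology, gene names and annotations are hypothetical; category membership follows the GO true-path rule (a gene annotated to a term also belongs to all of that term's ancestors); the functional distance is one possible reading of the description, taking, over all categories two genes share, the smallest probability that two genes drawn at random from the submitted list both fall in that category; and clustering uses single linkage at a user-chosen threshold.

```python
from itertools import combinations
from math import comb

# Hypothetical toy ontology: term -> parent terms (is_a edges of a DAG).
PARENTS = {
    "biological_process": set(),
    "metabolism": {"biological_process"},
    "lipid_metabolism": {"metabolism"},
    "transport": {"biological_process"},
    "lipid_transport": {"transport", "lipid_metabolism"},
    "ion_transport": {"transport"},
}
# Hypothetical annotations: gene -> most specific terms.
ANNOT = {"g1": {"lipid_transport"}, "g2": {"lipid_metabolism"},
         "g3": {"ion_transport"}, "g4": {"ion_transport"}, "g5": {"metabolism"}}

def ancestors(term):
    """The term itself plus every term reachable through parent edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def gene_terms(gene):
    """True-path closure: a gene belongs to every ancestor of its annotations."""
    return set().union(*(ancestors(t) for t in ANNOT[gene]))

def query_subgraph(genes):
    """Subgraph from the most specific terms shared by all genes down to each gene's terms."""
    per_gene = [gene_terms(g) for g in genes]
    common = set.intersection(*per_gene)
    # Deepest common terms: shared terms that are not an ancestor of another shared term.
    deepest = {t for t in common
               if not any(t in ancestors(u) for u in common if u != t)}
    nodes = {t for terms in per_gene for t in terms if ancestors(t) & deepest}
    edges = {(c, p) for c in nodes for p in PARENTS[c] if p in nodes}
    return deepest, nodes, edges

def functional_distance(g, h, universe):
    """Smallest chance probability that two random genes from `universe` share a
    category that g and h share (near 0 = specific category, 1 = only the root)."""
    pairs = comb(len(universe), 2)
    best = 1.0
    for c in gene_terms(g) & gene_terms(h):
        k = sum(1 for x in universe if c in gene_terms(x))
        best = min(best, comb(k, 2) / pairs)
    return best

def cluster(genes, threshold):
    """Single-linkage clustering: join genes whose distance is <= threshold."""
    root = {g: g for g in genes}
    def find(g):
        while root[g] != g:
            root[g] = root[root[g]]   # path halving keeps the trees shallow
            g = root[g]
        return g
    for g, h in combinations(genes, 2):
        if functional_distance(g, h, genes) <= threshold:
            root[find(g)] = find(h)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())
```

With the toy data, `query_subgraph(["g1", "g2"])` starts from `lipid_metabolism` and contains the single edge from `lipid_transport` to it, and `cluster(sorted(ANNOT), 0.1)` returns `[["g1", "g2"], ["g3", "g4"], ["g5"]]`; raising the threshold to 0.3 merges all five genes.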
This tool will be especially useful when users submit a list of genes suggested to be functionally related by different large-scale investigations, e.g. microarray experiments or yeast two-hybrid screens, because it can suggest the specific biological processes in which that group of genes works together, as well as possible false positives in the user-defined functional clusters. We should be able to post demos here very soon.
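Function (3) of the GO tool, mapping a functional cluster onto chromosomes and testing it, could be sketched as a permutation test for physical co-localization. The statistic (number of cluster gene pairs on the same chromosome arm within a fixed window), the window size and the `gene -> (arm, start)` input format are all assumptions made for illustration, not the tool's actual test.

```python
import random
from itertools import combinations

def close_pairs(genes, pos, window):
    """Count pairs of genes on the same chromosome whose starts lie within `window` bp."""
    return sum(1 for g, h in combinations(genes, 2)
               if pos[g][0] == pos[h][0] and abs(pos[g][1] - pos[h][1]) <= window)

def colocalization_test(cluster, pos, window, n_perm=1000, seed=0):
    """Permutation test: are the cluster's genes closer together on the chromosomes
    than equally sized random gene sets drawn from `pos`?

    pos: gene -> (chromosome arm, start coordinate), a hypothetical input format.
    Returns (observed close pairs, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    observed = close_pairs(cluster, pos, window)
    genome = sorted(pos)
    hits = sum(1 for _ in range(n_perm)
               if close_pairs(rng.sample(genome, len(cluster)), pos, window) >= observed)
    # The +1 terms count the observed set itself, so p is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Sampling random gene sets of the same size keeps the null distribution tied to the real gene layout of the genome, so no parametric model of gene spacing is needed; the cost is that small p-values require many permutations.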
Useful Links
Gene Ontology Main Web Site.
Berkeley Drosophila Genome Project.
Whole-genome comparative analysis of D. melanogaster & D. pseudoobscura.