The objective of this assignment is to determine the correlation between exons and protein domains. The working hypothesis is that eukaryotic protein domains predominantly evolve within a single exon, thereby facilitating easier “domain exchange” between genes/proteins.
Your assignment is to answer the following questions.
- How often are domains found within a single exon, and how often do they cross exon boundaries?
- Are these numbers significant, i.e., are they different from what is expected by chance? Perform a randomization experiment where you ignore each domain’s actual position within a protein and place it randomly.
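One possible way to approach both questions is sketched below in Python. It is only an illustration under stated assumptions, not a prescribed solution: each protein is assumed to be given as a tuple of (length, exon boundaries, domains), where a boundary b means that an exon junction lies between residues b and b+1, and domains are 1-based inclusive (start, end) pairs. All function names and the toy data are hypothetical. Randomly placed domains are allowed to overlap each other, which is a simplification you may want to revisit.

```python
import bisect
import random


def spans_single_exon(start, end, boundaries):
    """Return True if residues start..end (1-based, inclusive) lie in one exon.

    boundaries is a sorted list of residue positions b such that an exon
    junction lies between residue b and residue b + 1 (assumed convention).
    """
    # The exon index of a residue is the number of boundaries strictly before it.
    return bisect.bisect_left(boundaries, start) == bisect.bisect_left(boundaries, end)


def keep_position(start, end, length):
    """Placement function that leaves the domain where it was observed."""
    return start, end


def make_random_placer(rng):
    """Build a placement function that moves a domain to a uniform random position."""
    def place(start, end, length):
        size = end - start + 1
        new_start = rng.randint(1, length - size + 1)  # randint is inclusive
        return new_start, new_start + size - 1
    return place


def count_single_exon(proteins, placer):
    """Count domains that fall within a single exon after applying placer."""
    total = 0
    for length, boundaries, domains in proteins:
        for start, end in domains:
            s, e = placer(start, end, length)
            total += spans_single_exon(s, e, boundaries)
    return total


def randomization_test(proteins, n_rounds=1000, seed=0):
    """Compare the observed single-exon count with randomly placed domains."""
    rng = random.Random(seed)
    observed = count_single_exon(proteins, keep_position)
    placer = make_random_placer(rng)
    null = [count_single_exon(proteins, placer) for _ in range(n_rounds)]
    # One-sided empirical p-value with the usual +1 correction.
    as_extreme = sum(1 for x in null if x >= observed)
    p_value = (as_extreme + 1) / (n_rounds + 1)
    return observed, null, p_value


# Toy data, invented for illustration: (protein length, boundaries, domains)
toy_proteins = [
    (300, [100, 200], [(10, 90), (110, 190)]),
    (150, [75], [(60, 90)]),
]
observed, null, p_value = randomization_test(toy_proteins, n_rounds=200)
print(observed, sum(null) / len(null), p_value)
```

Keeping domain lengths and exon structures fixed while randomizing only the positions means the null distribution reflects chance placement alone. Whether a one-sided test is the right choice depends on how you phrase your hypothesis.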
You should make a genome-wide survey, i.e., use as many protein-coding genes as possible. Excluding data requires serious justification.
Download exon data for human genes from Ensembl’s BioMart. There is domain information available at Ensembl, but you might want to use domain definitions from Pfam.
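Exon data are given in nucleotide coordinates, while domain definitions are given in protein (residue) coordinates, so the two need to be brought onto the same scale before they can be compared. The Python sketch below shows one way to do this, assuming you have already extracted, for each transcript, the number of coding nucleotides contributed by each exon in 5' to 3' order with the stop codon excluded. The function name and input format are hypothetical and not tied to any particular BioMart export. A codon split across two exons is assigned here to the downstream exon, which is an arbitrary choice.

```python
from itertools import accumulate


def protein_boundaries(cds_lengths):
    """Convert per-exon coding lengths (nucleotides) to residue boundaries.

    Returns a sorted list of residue positions b such that an exon junction
    lies between residue b and residue b + 1. A codon split by a junction is
    assigned to the downstream exon (assumption made for this sketch).
    """
    # Cumulative coding length at each junction; the last value is the CDS end.
    junctions = list(accumulate(cds_lengths))[:-1]
    # Integer division gives the last residue fully encoded before the junction.
    residues = {n // 3 for n in junctions}
    # Drop position 0 (no complete residue before the junction) and duplicates.
    return sorted(b for b in residues if b > 0)


# Invented example: three exons contributing 300, 150 and 90 coding nucleotides
print(protein_boundaries([300, 150, 90]))
```

If split codons matter for your analysis, you could instead record the junction phase and treat residues encoded by two exons as a separate case.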