VILLUM Research Center for Plant Plasticity

Workshop 2 on database mining and gene annotation

(Prerequisite: have the NCBI Stand Alone BLAST software installed on your laptop)

By visiting professor

David Nelson
University of Tennessee Health Science Center, Memphis TN
Dept. of Microbiology, Immunology and Biochemistry

With the NCBI Stand Alone BLAST software, I will demonstrate how to mine all the P450s from a transcriptome using a query set. The query set may be a defined set of P450 families from a given taxon, or it could be 1000 P450 sequences from algae. Once the BLAST results are returned, I will show you how to filter for unique hits and recover these accessions with a single command from your BLAST database, which may contain >1,000,000 sequences. If the hits are nucleotide sequences from a transcriptome, I will show you how to use the Virtual Ribosome website to batch translate your sequences (this works well except on pseudogenes). Once the sequences are in protein FASTA format, we will batch BLAST search them against named plant P450s to recover best-hit IDs. We can sort by alignment length to mark short sequences, and then sort by %ID to mark those below 40% (possible new families). Finally, we can sort by best-hit CYP name to group the new sequences into families and subfamilies in preparation for assigning names. (A PowerPoint tutorial will be provided.)
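
As a rough illustration of the filtering and sorting steps described above, here is a minimal Python sketch. It is not the workshop's own procedure. It assumes the BLAST results were saved in the default 12-column tabular format (-outfmt 6). The function names, the 300-residue cutoff for "short" alignments, and the choice of bit score to pick the best hit are illustrative assumptions; only the 40% identity threshold comes from the description above.

```python
from typing import Dict, List

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated BLAST output into one dict per hit."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rows.append(dict(zip(COLUMNS, line.split("\t"))))
    return rows


def unique_subject_ids(rows: List[Dict[str, str]]) -> List[str]:
    """Return each subject (hit) ID once, in order of first appearance.

    The resulting list could be written to a file and used to pull the
    sequences out of the BLAST database in one batch.
    """
    seen = set()
    ordered = []
    for row in rows:
        subject = row["sseqid"]
        if subject not in seen:
            seen.add(subject)
            ordered.append(subject)
    return ordered


def best_hit_table(rows: List[Dict[str, str]],
                   min_length: int = 300,      # assumed cutoff, not from the source
                   min_identity: float = 40.0  # threshold named in the text
                   ) -> List[Dict[str, object]]:
    """Keep the top-scoring hit per query, flag it, and sort by best-hit name."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    table = []
    for query, row in best.items():
        length = int(row["length"])
        pident = float(row["pident"])
        table.append({
            "query": query,
            "best_hit": row["sseqid"],
            "pident": pident,
            "length": length,
            "short": length < min_length,
            "possible_new_family": pident < min_identity,
        })
    # Sorting by best-hit name groups queries that share a family prefix.
    table.sort(key=lambda entry: entry["best_hit"])
    return table
```

In this sketch, unique_subject_ids would be applied to the first search (query set against the transcriptome), and best_hit_table to the second search (translated sequences against named plant P450s).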

NB! We need to know how many people will attend the workshop, so if you plan to attend, please send an e-mail to Anne: abm@plen.ku.dk.
