VecScreen
BLAST the Vector Out of Your Sequence
(GenBank Indexers Say "Use It!")

VecScreen is a system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. VecScreen was originally developed at NCBI for the GenBank Annotation Staff, who use it to verify that sequences submitted for inclusion in the database are free from contaminating vector sequence. It is now available to the research public through the VecScreen Web site to help researchers identify and remove any segments of vector origin prior to sequence analysis or submission. Early identification of any foreign segments can avert erroneous conclusions about the biological significance of the sequence, prevent time and effort from being wasted in analysis of contaminated sequence, and speed the release of the sequence in a public database.
 

The UniVec Database

VecScreen runs an optimized blastn search against UniVec, a specialized non-redundant vector database, using the sequence to be screened as the query. Screening against UniVec is efficient because redundant subsequences have been eliminated, leaving only one copy of every unique sequence segment from a large number of vectors. This reduces UniVec to less than 15% of the size of an equivalent database containing the full sequences for the same set of vectors. In addition to vector sequences, UniVec contains sequences for the adapters, linkers, and primers commonly used in cloning cDNA or genomic DNA, so contamination with these oligonucleotides is detected during the same screen. UniVec also uses a “pseudo-circularization” process: the first 49 bases of each circular vector sequence are appended to its end, so that a match spanning the start and end of the linearized database entry is not missed. The current version of UniVec represents 971 vector and oligonucleotide sequences. To see the sequences used to build the database, go to the UniVec Representation List, accessible from the VecScreen page.
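The pseudo-circularization step can be illustrated with a minimal Python sketch. It is not the UniVec build code: the function name `pseudo_circularize`, the toy vector sequence, and the use of exact substring search in place of blastn are all assumptions made for illustration. Only the 49-base overlap comes from the description above.

```python
def pseudo_circularize(vector_seq: str, overlap: int = 49) -> str:
    """Append the first `overlap` bases of a circular sequence to its end.

    Hypothetical helper; the 49-base default follows the article's description.
    """
    return vector_seq + vector_seq[:overlap]


# Toy "circular vector" (made-up sequence, 80 bases), stored linearly.
vector = "G" * 10 + "ACGT" * 15 + "T" * 10

# A contaminating fragment that spans the point where the circle was cut:
# the last 10 bases of the linear entry followed by its first 10 bases.
junction = vector[-10:] + vector[:10]

circular = pseudo_circularize(vector)

# Exact substring search stands in for blastn here (an assumption).
hit_in_linear = vector.find(junction)      # -1: the fragment is missed
hit_in_circular = circular.find(junction)  # found across the old end

print(hit_in_linear, hit_in_circular)
```

In the linear entry the junction fragment goes undetected, while the pseudo-circularized entry contains it starting at the last 10 bases of the original sequence.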
 

The VecScreen Graphic

VecScreen summarizes the blastn output as a graphical representation of the query sequence, color-coded to show the location of segments that match vector sequences. Each match is assigned to one of four levels of significance: strong, moderate, weak, or suspicious.
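As a rough text-only analogue of this graphic, the Python sketch below paints a query of a given length with one character per base. Everything in it is hypothetical: the function `render_matches`, the symbols used in place of colors, the example segments, and the rule that a more significant category wins where segments overlap. None of it reflects how VecScreen itself assigns categories or draws its display.

```python
# Symbols stand in for VecScreen's colors (an assumption for this sketch).
SYMBOLS = {"strong": "S", "moderate": "M", "weak": "W", "suspicious": "?"}
# Assumed precedence: a lower rank overwrites a higher one on overlap.
RANK = {"strong": 0, "moderate": 1, "weak": 2, "suspicious": 3}


def render_matches(query_len, segments):
    """Return a one-character-per-base track for the query.

    segments: list of (start, end, category) with 1-based inclusive coordinates.
    """
    track = ["-"] * query_len  # "-" marks bases with no vector match
    # Paint the least significant segments first so stronger ones overwrite them.
    for start, end, category in sorted(
        segments, key=lambda seg: RANK[seg[2]], reverse=True
    ):
        for pos in range(start - 1, end):
            track[pos] = SYMBOLS[category]
    return "".join(track)


# Made-up segments on a 40-base query.
segments = [(1, 8, "strong"), (6, 12, "weak"), (30, 34, "suspicious")]
bar = render_matches(40, segments)
print(bar)
```

For these made-up segments, the strong match covers bases 1 to 8, the weak match shows only where it extends past the strong one (bases 9 to 12), and bases 30 to 34 are flagged as suspicious.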

Give VecScreen a try at www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html.

PK, DW


NCBI News | Fall 99