 |
Drosophila Sequence Gets 40-Megabase Boost from Celera
|
The Drosophila melanogaster genome sequence is nearing completion. GenBank currently contains 40 million base pairs of finished Drosophila genomic sequence from the NIH-funded Berkeley Drosophila Genome Project and the European Drosophila Genome Project. An additional 100 million base pairs of unfinished sequence, including over 600 BACs from the Berkeley project, are in the HTG division of GenBank. Celera Genomics recently submitted another 10 million base pairs of unfinished contigs, which are also in the HTG division. To BLAST against these new sequences, select the HTGS database on the Advanced BLAST page and restrict the search to Drosophila melanogaster using the organism filter box.
|
|
More dbSNP Data by FTP
|
NCBI now provides the full dbSNP database in three formats: flatfile, FASTA, and SQL DDL/table dumps. Complete copies of the database will be refreshed weekly. See the README file for a complete description of the various formats (ftp://ncbi.nlm.nih.gov/snp/00readme).
|
 |
Human Contig Breaks 10-Megabase Barrier
|
A segment of DNA sequence from chromosome 22 has become the first contiguous human sequence over 10 Mb in length. The Sanger Centre, Washington University, and the University of Oklahoma contributed to this sequence. See the list of contigs for human chromosome 22 on NCBI's Human Genome Sequencing page.
|
 |
UniGene for Zebra Fish
|
The zebra fish (Danio rerio) has been added to the UniGene lineup. Over 5,600 zebra fish clusters are represented. Zebra fish joins the human, mouse, and rat versions of UniGene.
|
 |
Submitting GSS Sequences
|
After January 1, 2000, GenBank will no longer process Genome Survey Sequence (GSS) submissions made with BankIt or Sequin. Instead, use the custom GSS submission procedures (www.ncbi.nlm.nih.gov/dbEST/how_to_submit.html). The e-mail address for GSS submissions is: batch-sub@ncbi.nlm.nih.gov.
|
 |
Standalone BLAST 2.0.10
|
The latest version of Standalone BLAST is the first to include a standalone version of BLAST 2 Sequences (called bl2seq). Standalone BLAST 2.0.10 also allows searches of multiple databases, which circumvents two current limits on database size: a maximum of 2 gigabytes for any database and a maximum of 4 billion base pairs for nucleotide databases. Databases that exceed these limits can now be formatted as a series of smaller volumes using the program formatdb, also included in the BLAST 2.0.10 package. The blastall program performs the multidatabase search.
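The idea of splitting an oversized database into volumes can be illustrated with a short Python sketch. This is not how formatdb is implemented. It only shows one simple way to group sequence records into consecutive volumes so that no volume exceeds a base-pair limit. The function name split_into_volumes and the toy records are hypothetical, and only the 4-billion-base-pair limit is modeled, not the 2-gigabyte file-size limit.

```python
from typing import Iterable, Iterator, List, Tuple

# Limit quoted in the text: 4 billion base pairs per nucleotide database.
MAX_BASES_PER_VOLUME = 4_000_000_000

Record = Tuple[str, str]  # (identifier, sequence)


def split_into_volumes(records: Iterable[Record],
                       max_bases: int = MAX_BASES_PER_VOLUME
                       ) -> Iterator[List[Record]]:
    """Group records into consecutive volumes of at most max_bases bases.

    Hypothetical helper for illustration; formatdb does its own splitting.
    """
    volume: List[Record] = []
    used = 0
    for name, seq in records:
        if len(seq) > max_bases:
            raise ValueError(f"{name} alone exceeds the volume limit")
        # Start a new volume when the next record would overflow this one.
        if volume and used + len(seq) > max_bases:
            yield volume
            volume, used = [], 0
        volume.append((name, seq))
        used += len(seq)
    if volume:
        yield volume


# Toy data: a 10-base limit stands in for the real 4-billion-base limit.
toy = [("seq1", "ACGTAC"), ("seq2", "GGTTA"), ("seq3", "AC"), ("seq4", "TTTTTTTT")]
volumes = list(split_into_volumes(toy, max_bases=10))
for number, vol in enumerate(volumes):
    names = [name for name, _ in vol]
    print(number, names, sum(len(seq) for _, seq in vol))
```

Records are never split across volumes in this sketch, so a single sequence longer than the limit is reported as an error rather than truncated.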
|
 |
COGs Includes 21 Genomes
|
Clusters of Orthologous Groups (COGs) now incorporates 21 complete genomes, tripling the number of organisms initially represented. In addition to phylogenetic patterns, the COGs may now be searched using free-text words or protein and gene names. PSI-BLAST searches may be launched using a set of similar sequences from the COGs database to construct a position-specific scoring matrix (PSSM). This PSSM can then be used to search for remote homologs.
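For readers unfamiliar with the term, a position-specific scoring matrix can be sketched in a few lines of Python. This is a simplified illustration, not the PSI-BLAST procedure, which uses its own sequence weighting and pseudocount scheme. The sketch assumes an ungapped alignment of equal-length sequences, a uniform background frequency, and a fixed pseudocount. The function names build_pssm and score and the toy sequences are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: List[str],
               alphabet: str = AMINO_ACIDS,
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one column of log-odds scores per alignment position.

    Assumptions (illustrative only): ungapped sequences of equal length,
    uniform background frequencies, and a fixed additive pseudocount.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same length")

    background = 1.0 / len(alphabet)
    pssm: List[Dict[str, float]] = []
    for position in range(length):
        # Count residues in this column, starting from the pseudocount.
        counts = {residue: pseudocount for residue in alphabet}
        for seq in alignment:
            counts[seq[position]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pssm.append({residue: math.log2((counts[residue] / total) / background)
                     for residue in alphabet})
    return pssm


def score(pssm: List[Dict[str, float]], segment: str) -> float:
    """Sum the per-position scores for a segment as long as the matrix."""
    return sum(column[residue] for column, residue in zip(pssm, segment))


# Toy set of similar sequences, standing in for members of one cluster.
similar = ["MKVL", "MKIL", "MRVL", "MKVL"]
matrix = build_pssm(similar)
print(score(matrix, "MKVL"), score(matrix, "AAAA"))
```

A segment that resembles the input sequences scores higher than an unrelated one, which is the property that lets a PSSM pick out more distant relatives than a single query sequence would.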
|
 |
BLAST E-Mail Server
|
The BLAST e-mail server has been upgraded to use the new QBLAST system. As a result, the server no longer supports BLAST 1.4 searches or the RIPEM encryption software.
|
|