The original Cray Bioinformatics Library (CBL) is a low-level set of library routines that use proprietary Cray hardware to implement nucleotide and protein sequence manipulations common in bioinformatics. Written in Fortran and Cray assembly language (most routines are callable from C), the original CBL was coded and optimized on a Cray SV1 vector machine. Cray also has a port for their new X1.
The Portable CBL is the open source version, written in C, that implements the computational primitives in a generic fashion with little regard to specific hardware. The CBL routines gain performance by operating on compressed data whenever possible. In the case of nucleotide data, for example, two bits are enough to represent each of the four nucleotides, so a 64-bit word can hold a sequence of 32 nucleotides instead of the 8 that fit when each is stored as an 8-bit ASCII character. A CBL search routine can then take advantage of the compression by comparing whole words, one containing all or part of a compressed query and the other containing part of a compressed database. Since the compressed query and database are effectively 1/4 of their ASCII size, each word comparison covers four times as many nucleotides, and a significant performance gain is realized. In addition to 2-bit compression, CBL supports 4-bit and 5-bit compression for larger alphabets. The CBL will continue to grow as additional biological computational primitives are identified and implemented.
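The word-at-a-time comparison described above can be sketched in a few lines of C. This is only an illustration of the idea, not CBL code: the function names (base_code, pack32, count_mismatches), the particular 2-bit codes, and the choice to store the first base in the low-order bits are all assumptions made for the example, and the encoding CBL actually uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit codes for this sketch; not necessarily CBL's encoding. */
static uint64_t base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    default:            return 3; /* 'T'/'t'; other characters are not handled here */
    }
}

/* Pack up to 32 ASCII nucleotides into one 64-bit word,
 * first base in the low-order bits (an assumed layout). */
static uint64_t pack32(const char *seq, size_t n)
{
    assert(n <= 32);
    uint64_t word = 0;
    for (size_t i = 0; i < n; i++)
        word |= base_code(seq[i]) << (2 * i);
    return word;
}

/* Count mismatching bases between two packed words holding n bases each. */
static int count_mismatches(uint64_t a, uint64_t b, size_t n)
{
    assert(n <= 32);
    uint64_t diff = a ^ b; /* a nonzero 2-bit field marks a mismatch */

    /* Fold each 2-bit field into its low bit: 1 if the bases differ. */
    uint64_t marks = (diff | (diff >> 1)) & 0x5555555555555555ULL;
    if (n < 32)
        marks &= (1ULL << (2 * n)) - 1; /* ignore unused high-order fields */

    int count = 0;
    while (marks) {        /* clear the lowest set bit until none remain */
        marks &= marks - 1;
        count++;
    }
    return count;
}
```

With these assumed codes, pack32("ACGT", 4) yields 0xE4, and a single XOR followed by a bit count compares up to 32 bases at once. A mismatch-tolerant search would apply the same comparison at each alignment of query and database words.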
A paper delivered at the 2003 Cray User Group describes the library internals and gives performance measurements. (Reprinted here by permission of the Cray User Group Incorporated. All other rights are reserved. Please note that my contact info has changed.)
cb_amino_translate_ascii - translate nucleotides to amino acids
cb_compress - compresses nucleotide or amino acid ASCII data
cb_copy_bits - copy contiguous sequence of memory bits
cb_countn_ascii - counts A, C, T, G, and N characters in a string
cb_fasta_convert - restructure the memory image of a FASTA file
cb_free - frees memory allocated with cb_malloc in Cray version; simply calls free() in portable version
cb_irand - generates an array of random bits
cb_malloc - allocate block aligned memory region in Cray version; simply calls malloc() in portable version
cb_read_fasta - loads data from a FASTA file into memory arrays
cb_repeatn - find short tandem repeats in a nucleotide string
cb_revcompl - reverse complements compressed nucleotide data
cb_searchn - gap-free nucleotide search allowing mismatches
cb_uncompress - uncompress nucleotide or amino acid data to ASCII
cb_version - returns the version number of libcbl
cb_swa_fw - compute Smith-Waterman cell scores with ASCII input and full word output
Subscribe to the mailing list (low volume)
cb_isort & cb_isort1 - unsigned integer radix sort with and w/o index array
cb_cghistn - histograms of cg density in a string
cb_swn_fw & cb_swn4_fw - same as cb_swa_fw, except with 2- or 4-bit nucleotide input
cb_nmer - creates up to 64-bit-length short sequences from each starting point in the input string
James Long
jlong@alaska.edu
Biotechnology Computing Research Group
Arctic Region Supercomputing Center