The Portable Cray Bioinformatics Library

The original Cray Bioinformatics Library (CBL) is a low-level set of library routines that use proprietary Cray hardware to implement common nucleotide and protein sequence manipulations in bioinformatics. Written in Fortran and Cray assembly language (most routines are callable from C), the original CBL was coded and optimized on a Cray SV1 vector machine. Cray also has a port for its new X1.

The Portable CBL is the open source version, written in C, that implements the computational primitives in a generic fashion with little regard to specific hardware. The CBL routines gain performance by operating on compressed data whenever possible. For nucleotide data, for example, two bits are sufficient to represent each of the four nucleotides, so a 64-bit word can hold a sequence of 32 nucleotides instead of the 8 it holds as one-byte ASCII characters. A CBL search routine can then take advantage of the compression by comparing whole words, one containing all or part of a compressed query and the other containing part of a compressed database. Since the query and database are effectively one quarter of their original size, each word comparison covers four times as many nucleotides, which yields a significant performance gain. In addition to 2-bit compression, CBL supports 4-bit and 5-bit compression for larger alphabets. The CBL will continue to grow as additional biological computational primitives are identified and implemented.
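The 2-bit packing and whole-word comparison described above can be sketched in a few lines of C. This is only an illustration and not CBL code: the names base_code, pack32 and mismatches are hypothetical, and the encoding (A=0, C=1, G=2, T=3, first base in the highest bits) is an assumption that may differ from what cb_compress actually produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 2-bit code (hypothetical, not necessarily CBL's): A=0 C=1 G=2 T=3. */
int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;   /* not a plain nucleotide */
    }
}

/* Pack up to 32 ASCII bases into one 64-bit word, first base in the
 * highest two bits. Unused low bits stay zero.
 * Returns 0 on success, -1 if a character is not A, C, G or T. */
int pack32(const char *seq, size_t n, uint64_t *out)
{
    uint64_t w = 0;
    assert(n <= 32);
    for (size_t i = 0; i < n; i++) {
        int code = base_code(seq[i]);
        if (code < 0)
            return -1;
        w |= (uint64_t)code << (62 - 2 * i);
    }
    *out = w;
    return 0;
}

/* Count differing bases among the first n positions of two packed words.
 * One XOR compares all positions at once; the rest only tallies the result. */
int mismatches(uint64_t a, uint64_t b, size_t n)
{
    uint64_t x, diff;
    int count = 0;
    assert(n <= 32);
    if (n == 0)
        return 0;
    x = a ^ b;
    /* Fold each 2-bit pair into its low bit: set if the pair differs. */
    diff = (x | (x >> 1)) & 0x5555555555555555ULL;
    /* Keep only the n leading positions. */
    diff &= ~(uint64_t)0 << (64 - 2 * n);
    while (diff) {               /* portable population count */
        diff &= diff - 1;
        count++;
    }
    return count;
}
```

Packing "ACGTACGT" and "ACGAACGT" with pack32 and passing both words to mismatches with n = 8 reports a single mismatch, found with one XOR rather than eight character comparisons.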

A paper delivered at the 2003 Cray User Group meeting describes the library internals with performance measurements. (Reprinted here by permission of the Cray User Group Incorporated. All other rights are reserved. Please note that my contact information has changed.)

Portable CBL v1.1 routines

cb_amino_translate_ascii - translate nucleotides to amino acids

cb_compress - compresses nucleotide or amino acid ASCII data

cb_copy_bits - copy contiguous sequence of memory bits

cb_countn_ascii - counts A, C, T, G, and N characters in a string

cb_fasta_convert - restructure the memory image of a FASTA file

cb_free - frees memory allocated with cb_malloc in Cray version
        - simply calls free() in portable version
          
cb_irand - generates an array of random bits

cb_malloc - allocate block aligned memory region in Cray version
          - simply calls malloc() in portable version
            
cb_read_fasta - loads data from a FASTA file into memory arrays

cb_repeatn - find short tandem repeats in a nucleotide string

cb_revcompl - reverse complements compressed nucleotide data

cb_searchn - gap-free nucleotide search allowing mismatches

cb_uncompress - uncompress nucleotide or amino acid data to ASCII

cb_version - returns the version number of libcbl

cb_swa_fw - compute Smith-Waterman cell scores with ASCII input
            and full word output
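As an illustration of the kind of work a routine can do directly on compressed data, the sketch below reverse complements one 64-bit word of 2-bit nucleotides without unpacking it. It is not the cb_revcompl implementation: the function name revcomp_word is hypothetical, and the sketch assumes the encoding A=0, C=1, G=2, T=3 with the first base in the highest bits, so that complementing a base is a bitwise NOT of its two bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse complement n (1..32) bases packed 2 bits each into w, first
 * base in the highest bits, unused low bits zero.
 * Assumed code (hypothetical): A=0 C=1 G=2 T=3, so complement == NOT. */
uint64_t revcomp_word(uint64_t w, size_t n)
{
    assert(n >= 1 && n <= 32);

    /* Complement every base at once. */
    w = ~w;

    /* Reverse the order of the 32 two-bit pairs by swapping
     * progressively larger groups. */
    w = ((w >> 2)  & 0x3333333333333333ULL) | ((w & 0x3333333333333333ULL) << 2);
    w = ((w >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((w & 0x0F0F0F0F0F0F0F0FULL) << 4);
    w = ((w >> 8)  & 0x00FF00FF00FF00FFULL) | ((w & 0x00FF00FF00FF00FFULL) << 8);
    w = ((w >> 16) & 0x0000FFFF0000FFFFULL) | ((w & 0x0000FFFF0000FFFFULL) << 16);
    w = (w >> 32) | (w << 32);

    /* The n bases now sit in the lowest 2n bits; shift them back to the
     * top, which also discards the complemented padding. */
    return w << (64 - 2 * n);
}
```

With this encoding, the word for AACG (0x06 in the top byte) maps to the word for CGTT (0x6F in the top byte), and the palindromic ACGT maps to itself.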

Download ver 1.1

Subscribe to the mailing list (low volume)

Coming in v1.2

cb_isort & cb_isort1 - unsigned integer radix sort with and without index array
cb_cghistn - histograms of CG density in a string
cb_swn_fw & cb_swn4_fw - same as cb_swa_fw, except with 2- or 4-bit nucleotide input
cb_nmer - creates short sequences of up to 64 bits in length from each
          starting point in the input string

James Long
Biotechnology Computing Research Group
University of Alaska Fairbanks
PO Box 757000
Fairbanks, AK 99775
USA
Voice: (907) 474-5769
Fax: (907) 474-5712

Email: jlong@alaska.edu
Arctic Region Supercomputing Center