Saturday, January 30, 2010

Bioinformatics Software and APIs

Automatic Comparative Sequence Analysis (AutoCSA) is mutation detection software designed to detect small mutations (1-50 bases) in sequence traces. The software can detect both homozygous and heterozygous base substitutions, as well as small insertions and deletions, with high sensitivity. It is designed with high-throughput environments in mind, so the analysis of large amounts of data is easy to automate with little manual intervention.
DbCon is a Java library that provides a simple interface to DBCP. It offers distributed pooling configuration and provides a clean layer of separation between Java code and SQL. Important features are ConnectionPooling (via DBCP), SingletonConnectionPools (connection pools with only one connection), DataSources backed by the same connection pools, SqlLibrary for SQL storage (available both in a custom format and in SqlLibXmlFormat), ObjectFacades, corrections for the OracleMetadata foibles, and PasswordObfuscation. The software, originally developed by the Sanger Cancer Genome Project, is available under a BSD license.
PICNIC is an algorithm to predict absolute allelic copy number variation from microarray cancer data.
A commonly used software package for statistical analysis of microarray data sets is TIGR-MeV. It accepts multiple input data formats. There are two main families of data sets: single-chip and multiple-chip.
Single-chip data sets contain one row per spot (gene) and one column per criterion. The different columns typically contain information about green channel intensity, red channel intensity, green channel background, red channel background, spot number, and position of the spot on the slide. There are more elaborate formats like GenePix, spot and TMev that contain additional columns.
Multiple-chip data sets combine information from multiple chips. Each chip can represent a given experiment, sample, tissue type, patient type or time point. Such a data set contains one row per spot and one column per chip (plus a few columns with a description of the gene and some additional parameters). The information in the experiment columns is usually restricted to red/green ratios or log(ratios).
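To make the relation between the two families concrete, here is a minimal Python sketch of how single-chip rows could be reduced to one log ratio per spot and then combined into a multiple-chip table. It is not how TIGR-MeV does it; the column order, the chip names, the intensity values and the choice of log2 with simple background subtraction are all my own assumptions for illustration.

```python
import math

# Hypothetical single-chip rows: (spot, green, red, green_background, red_background).
# All values are made up for illustration.
chip_a = [
    ("gene1", 1200.0, 2100.0, 200.0, 100.0),
    ("gene2", 900.0, 500.0, 100.0, 100.0),
]
chip_b = [
    ("gene1", 600.0, 2100.0, 100.0, 100.0),
    ("gene2", 150.0, 900.0, 200.0, 100.0),  # green signal below background
]


def log_ratios(rows):
    """Return {spot: log2(red / green)} after background subtraction."""
    result = {}
    for spot, green, red, green_bg, red_bg in rows:
        g = green - green_bg
        r = red - red_bg
        # The log ratio is undefined at or below background, so skip the spot.
        if g <= 0 or r <= 0:
            continue
        result[spot] = math.log2(r / g)
    return result


def combine(chips):
    """Build a multiple-chip table: one row per spot, one column per chip."""
    per_chip = {name: log_ratios(rows) for name, rows in chips.items()}
    spots = sorted(set().union(*per_chip.values()))
    # Missing measurements are stored as None.
    return {spot: [per_chip[name].get(spot) for name in chips] for spot in spots}


table = combine({"chip_a": chip_a, "chip_b": chip_b})
for spot, values in table.items():
    print(spot, values)
```

Each row of `table` corresponds to one spot and each list position to one chip, which is the shape described for multiple-chip data sets. A spot that could not be measured on a chip shows up as `None`.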

Genome, RNA and Protein Data Sources

Biological knowledge is distributed amongst many different general and specialized databases. The standard resource for finding biological databases is the Database Issue of NAR. I will discuss a few data sets that came to my attention for various reasons.
The Cancer Gene Census is an ongoing effort to catalogue those genes for which mutations have been causally implicated in cancer. The database is searchable by cancer genes that are amplified in cancer, by chromosome, by cancer genes that are characterised by frameshift mutations, by germline mutations, by large deletions (covering the abnormalities that result in allele loss/loss of heterozygosity at many recessive cancer genes), by missense mutations, by nonsense mutations, by other types of mutations (small in-frame deletions and insertions as found in KIT/PDGFRA, and larger duplications/insertions as found in FLT3 and EGFR), by cancer genes that are somatically mutated, by splicing mutations, by cancer gene symbol, and by translocations.
COSMIC is a database of somatic mutation information relating to human cancers. It contains information on publications, samples and mutations, including samples that were found to be negative for mutations during screening (which enables mutation frequencies to be calculated for different genes in different cancer types). The data include benign neoplasms and other benign proliferations, in situ and invasive tumours, recurrences, metastases and cancer cell lines. The data can be queried by tissue, histology or gene, and displayed as a graph, as a table or exported in various formats.
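The reason the negative samples matter is that a frequency needs a denominator: mutated samples divided by all screened samples. A small Python sketch of that calculation follows. The record layout, the gene and tissue names and the counts are hypothetical and do not reflect the COSMIC export format or real data.

```python
from collections import defaultdict

# Hypothetical screening records: (gene, tissue, mutation_found).
# Names and counts are invented for illustration only.
records = [
    ("GENE_A", "tissue_x", True),
    ("GENE_A", "tissue_x", True),
    ("GENE_A", "tissue_x", False),
    ("GENE_A", "tissue_x", False),
    ("GENE_A", "tissue_y", True),
    ("GENE_A", "tissue_y", False),
    ("GENE_A", "tissue_y", False),
    ("GENE_A", "tissue_y", False),
]


def mutation_frequency(records):
    """Return {(gene, tissue): mutated samples / screened samples}."""
    counts = defaultdict(lambda: [0, 0])  # [mutated, screened]
    for gene, tissue, mutated in records:
        entry = counts[(gene, tissue)]
        entry[1] += 1  # every record is a screened sample, positive or negative
        if mutated:
            entry[0] += 1
    return {key: mutated / screened for key, (mutated, screened) in counts.items()}


frequencies = mutation_frequency(records)
for (gene, tissue), freq in sorted(frequencies.items()):
    print(gene, tissue, freq)
```

If only the positive samples were recorded, every gene and tissue combination would appear to have a frequency of 1, so the negative records are what makes the comparison between tissues meaningful.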
The Pfam database is a large collection of protein families. A protein family is represented by multiple sequence alignments and profile hidden Markov models (HMMs). In order to understand the function of domains (one or more functional regions), the focus is on their combinations found in nature. The website allows searches for alignments, trees, protein structure and other functional information for each family. The Pfam libraries of HMMs can be used locally to define domains in complete genomes.
The Rfam database is a collection of RNA families. An RNA family is represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). RNA families fall into functional classes: non-coding RNA genes, structured cis-regulatory elements and self-splicing RNAs. These functional classes often have a conserved secondary structure, which may be better preserved than the RNA sequence. In contrast to the HMMs in Pfam, CMs are used to model both RNA sequence and secondary structure. Rfam allows the user to search a query sequence against a library of covariance models (locally with the INFERNAL package or on the web site), and to view multiple sequence alignments and family annotation.

Thursday, January 28, 2010

How I would use Apple's new iPad

Yesterday, Steve Jobs introduced the newest gadget from Apple. The tablet computer got its eagerly awaited debut. This device, which Apple calls the iPad, sits between the phone and the laptop computer. You can't make phone calls with it, at least not in the conventional manner. Still, there is an interesting opportunity to use it as a VoIP phone because of the attractive data plans for the 3G version. But as a mobile phone it is simply too large. The iPad can be used as an ebook reader. Its 9.7", 1024 x 768 pixel, 132 ppi display is larger than the Kindle's (6", 600 x 800 pixel, 167 ppi), and it is in color. This could make for a reading experience comparable to reading a real book or a newspaper. It supports the EPUB format, an open ebook standard. However, keep in mind that EPUB (like other formats) has issues rendering equations at this time. The second strength of the iPad is browsing the web. Apple built a great UI that could make it an even better experience than on a laptop computer. I would add iWork as the third strength of the iPad. From what I have seen in the demonstration, it could be better than the laptop/desktop version.
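The ppi figures quoted for the two displays can be checked from the resolution and the diagonal: the pixel count along the diagonal divided by the diagonal length in inches. A quick Python sketch, where the function name is my own:

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: pixels along the diagonal divided by the diagonal in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


ipad = pixels_per_inch(1024, 768, 9.7)
kindle = pixels_per_inch(600, 800, 6.0)
print(round(ipad), round(kindle))  # prints: 132 167
```

So the Kindle packs its pixels more densely, while the iPad has the larger screen and more pixels in total.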

Wednesday, January 27, 2010

Science and Ethics

What are the ethical foundations of scientific practice and the personal and professional issues that researchers encounter in their work?

Tuesday, January 26, 2010

Ice Crystals on Snow Surface

I took this photo last weekend while we were skiing in the Harz. It shows tiny ice crystals all over the surface of the snow. I am not quite sure about the meteorological explanation for this phenomenon. I think the fluffy snow sculptures with the glittering crystals against the sun are beautiful. I did some soft focus and focal b&w post-processing on the image.

Sunday, January 17, 2010

Crisis Response: Support Disaster Relief in Haiti

A 7.0 magnitude earthquake struck Haiti on January 12th. Join recovery efforts mobilizing around the world to assist earthquake victims. Your donation will help disaster victims rebuild their lives and their communities.
Help map Haiti - Directly assist relief workers in saving lives. Post-Earthquake Imagery from GeoEye:
If you have any information that helps people connect with their family and loved ones in Haiti, please use the person finder below.



For the next two weeks, I will connect anyone calling my land line here in Berlin, (030)57709925, to any number in Haiti for free.

Wednesday, December 16, 2009

International Calling Rates

Some visitors to Europe may be surprised how much they have to pay for their telephone calls. Most Europeans are probably used to it, but there is actually no reason to pay that much. Below is a comparison of calling rates. Most of the services require some sort of internet access, which may incur extra fees. Rates are without a calling plan.

I admit that this selection is tailored to my own use. If you wish to update some rates or add more providers and destinations to the table, just send me your email address.

Friday, December 11, 2009

Ten Simple Rules for ...

  1. Doing Your Best Research
  2. Selecting a Postdoctoral Position
  3. Successful Collaboration
  4. Making Good Oral Presentations
  5. Getting Published (video)
  6. Getting Grants (video)
  7. Reviewers (How to not review a paper)
  8. Good Poster Presentation
  9. Organizing a Scientific Meeting
  10. Searching and Organizing the Scientific Literature
  11. Successful Start-up
  12. Mathematical Writing
This has been published many times and I still think it is very useful.