Abstract
Sequence alignment is a bioinformatics tool used to find homology relationships in large molecular databases. It can be mapped onto the physical model of directed polymers in random media. We consider the finite-temperature version of local sequence alignment for proteins and study the transition between the linear phase, where the free energy grows linearly with the sequence length, and the biologically relevant logarithmic phase, where it grows logarithmically. By means of numerical simulations and finite-size-scaling analysis, we determine the phase diagram in the plane spanned by the gap costs and the temperature, using the most frequently used parameter set for protein alignment. The critical exponents that describe the parameter-driven transition are found to depend explicitly on the temperature. Furthermore, we study the shape of the (free-)energy distribution close to the transition by rare-event simulations down to very small probabilities. It is well known that in the logarithmic region the distribution of optimal scores is described by a modified Gumbel distribution. We confirm that this also applies to the free-energy distribution. In the linear phase, however, the distribution crosses over to a modified Gaussian distribution.
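The idea of finite-temperature local alignment can be illustrated with a minimal sketch. It assumes a toy scoring scheme (match +1, mismatch -1) and a linear gap cost `delta`, not the protein parameter set and gap model used in the paper, and the function name `local_alignment_free_energy` is hypothetical. The sketch replaces the max in the Smith-Waterman recursion by a Boltzmann-weighted sum, computed in log space, so that F = -T ln Z with the energy taken as minus the score (k_B = 1). As T goes to 0, -F reduces to the optimal local alignment score. This is one common way to define such a model, not necessarily the exact formulation of the paper.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a short list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def local_alignment_free_energy(a, b, T, delta, match=1.0, mismatch=-1.0):
    """Return F = -T ln Z for finite-temperature local alignment of a and b.

    Hypothetical toy model (assumption): score(x, y) = match if x == y
    else mismatch, linear gap cost delta, energy = -score.
    """
    n, m = len(a), len(b)
    # logZ[i][j]: log of the partition function of local alignments ending
    # at (i, j). Row and column 0 hold only the empty alignment: Z = 1.
    logZ = [[0.0] * (m + 1) for _ in range(n + 1)]
    ends = []  # log Z for every end point (i, j) with i, j >= 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            logZ[i][j] = logsumexp([
                0.0,                          # start a new (empty) alignment
                logZ[i - 1][j - 1] + s / T,   # align a[i-1] with b[j-1]
                logZ[i - 1][j] - delta / T,   # gap in b
                logZ[i][j - 1] - delta / T,   # gap in a
            ])
            ends.append(logZ[i][j])
    # Local alignment: sum the Boltzmann weights over all end points.
    return -T * logsumexp(ends)


if __name__ == "__main__":
    # Mean -F for random sequence pairs of increasing length; whether it
    # grows linearly or logarithmically with L distinguishes the two phases.
    rng = random.Random(0)
    for L in (25, 50, 100):
        samples = []
        for _ in range(10):
            a = "".join(rng.choice("ACGT") for _ in range(L))
            b = "".join(rng.choice("ACGT") for _ in range(L))
            samples.append(-local_alignment_free_energy(a, b, T=0.5, delta=3.0))
        print(L, sum(samples) / len(samples))
```

Working in log space avoids overflow of exp(score / T) at low temperature. Because every path contributes a positive weight, -F is never smaller than the optimal score, and the gap grows with T as suboptimal alignments gain entropy.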
Received 13 July 2009
DOI: https://doi.org/10.1103/PhysRevE.80.061913
©2009 American Physical Society