Article

Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications

1 School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA
2 School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
3 Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2020, 25(4), 67; https://0-doi-org.brum.beds.ac.uk/10.3390/mca25040067
Submission received: 7 July 2020 / Revised: 26 September 2020 / Accepted: 1 October 2020 / Published: 10 October 2020
(This article belongs to the Section Natural Sciences)

Abstract:
Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.

1. Introduction

Computational and mathematical applications play a critical role in the analysis of the structure and function of biological molecules, including ribonucleic acid (RNA). RNA is an essential biological polymer with many roles including information transfer, regulation of gene expression, and catalysis of chemical reactions. The primary structure of an RNA molecule may be understood as a sequence of nucleotides, each carrying one of four bases: adenine, uracil, guanine, and cytosine. As is standard, we frequently abbreviate these as A, U, G, and C, respectively. RNA molecules are single-stranded and may therefore interact with themselves, forming A–U, G–U, and G–C bonds. The secondary structure of an RNA molecule is a set of such bonds.
The determination of secondary structure is an important step to understanding an RNA molecule’s full shape and therefore its function [1,2]. Accordingly, secondary structure information is commonly used in tertiary structure prediction algorithms, see, e.g., [3,4,5,6]. Identifying the secondary structure of RNA is crucial to understanding its function and mechanism in a cell [7]. Thus, the structure of RNA is critical to the development of biological and pharmaceutical therapeutics. Biologists use inexpensive and expedient means to sequence RNA, but the experimental determination of structure is more difficult and time-consuming. Therefore, computational methods are the primary means to determine possible RNA secondary structures.
For decades, one of the main computational approaches for examining RNA structure and branching properties has been thermodynamic free energy minimization under the Nearest Neighbor Thermodynamic Model (NNTM) [8,9,10]. This free energy is in turn used in algorithms to predict secondary structure given an RNA sequence, see, e.g., [11,12,13]. Under the NNTM, the free energy of a structure is computed as the sum of the free energies of its various substructures. Many common programs (e.g., mFold, RNAFold, RNA Structure, sFold, Vienna RNA, etc.) take a single sequence as input and produce secondary structures based on NNTM energy minimizations performed via dynamic programming. Nearest neighbor parameter sets include both a set of rules, referred to as equations or features, and a set of parameter values used by the equations. Separate rules exist for predicting stabilities of helices, hairpin loops, small internal loops, large internal loops, bulge loops, multi-branch loops, and exterior loops. Other branching properties of interest include, but are not limited to, average ladder distance, maximum ladder distance, maximum branching degree, average contact distance, average branching degree, degree of branching at the exterior loop, number of multi-loops with n branches, etc. The online nearest neighbor database (NNDB) archives and stores complete nearest neighbor sets, including rules and corresponding parameter values [14].
A common challenge is inferring whether the predicted results of the NNTM for a set of RNA structural features or branching properties fall within expected dispersion thresholds for a given energy model. For example, is the number of hairpins more than 2–3 standard deviations greater than the expected mean for a given energy model? This challenge is particularly vexing if the sequence is relatively long (greater than 1000 nucleotides). If structural features or branching properties are determined to exceed expected energy model dispersion thresholds, this can convey scientific and/or mechanistic insight. Continuing with our hairpin example, what if an NNTM model produces a result where the number of hairpins seems rather large for the given sequence length? If the number of hairpins exceeds the expected dispersion of the NNTM model, it might be inferred that the greater number of hairpins is evidence of natural selection.
The primary objective of the present study is to enable mathematical determination of the dispersion of RNA secondary structural features for a given sequence length. We present a Markov chain-based algorithm to provide samples of the branching structure under the NNTM and Gibbs distribution, but without reference to a particular sequence of nucleotides. The algorithm enables one to determine where the predicted feature or branching property of an actual sequence falls within this distribution and, in turn, whether the predicted NNTM feature or branching property is within expected dispersion limits.
In particular, this work investigates RNA substructures called multi-loops, the places where three or more helices join. Though multi-loops are crucial to the overall shape of a secondary structure, the models used to predict them algorithmically do not produce accurate results [15]. This investigation builds on an existing model of RNA branching [16] and provides a theoretical grounding for a Markov chain which may be used to algorithmically investigate branching properties of secondary structure models. The investigational foundation is a model for RNA secondary structure developed by Hower and Heitsch [16], in which secondary structures are in bijection with plane trees and the minimal energy structures of the model have been previously characterized. The present study characterizes the full Gibbs distribution of possible structures. Notably, Bakhtin and Heitsch [17] analyzed a very similar model and determined degree sequence properties of the distribution of plane trees asymptotically. However, the present study utilizes a Markov chain-based sampling algorithm to investigate the Gibbs distribution in the finite case. A full explanation of the plane tree model as well as the derivation of the energy functions is provided in Section 2.1.

2. Methods

The methods are divided into an overview of the RNA secondary structure NNTM plane tree model and energy functions (Section 2.1) and an all-encompassing explanation of the mathematical preliminaries that lay the foundation for the derived results and corresponding algorithms (Section 2.2).

2.1. Derivation of Energy Functions

The energy function studied here is derived from the Nearest Neighbor Thermodynamic Model (NNTM). The numerical parameters from the NNTM can be found in the NNDB [14]. In calculating energy functions for the sequences, we consider thermodynamic parameter values published by Turner in 1989 [8], 1999 [9], and 2004 [10].
The plane trees that we study in this paper come from two combinatorial RNA sequences, both of the form $\mathrm{A}_4(\mathrm{Y}_5\,\mathrm{Z}\,\mathrm{A}_4\,\mathrm{Y}\,\mathrm{Z}_5\,\mathrm{A}_4)^n$. The sequences of interest have $(Y, Z) = (C, G)$ or $(Y, Z) = (G, C)$. For both of these sequences, the set of maximally-paired secondary structures is in bijection with the set of plane trees of size n [18]. Figure 1 shows one example of a secondary structure and corresponding plane tree.
These specific combinatorial sequences are chosen because they allow for the study of the relationship between NNTM multiloop parameters and the branching behavior of secondary structures without interference from the energy contributions of specific base pairing combinations. In particular, the only places where the free energy differs between different secondary structures (for the same sequence) are in the type and number of multi-loops, the branching at the exterior loop, the number of hairpins, and the number of internal nodes. All of these energies directly relate to branching, not to specific base pairs. This simplification, achieved by focusing only on multi-loops and branching, both makes the model more amenable to theoretical analysis and speeds computation.
Note that these secondary structures should not be considered representative of naturally occurring secondary structures. Instead, the only properties of interest in these structures are branching-related properties.
Three constants, a, b, and c, determine the free energy contribution of multiloops under the NNTM. The value of a encodes the energy penalty per multiloop. The constant b specifies the energy penalty per single-stranded nucleotide in a multiloop. The value of c gives the energy penalty for each helix branching from a multiloop.
In addition to the multiloop parameters a , b , c discussed above, we must account for the energy contributions of stacking base pairs, hairpins, interior loops, and dangling energy contributions. The energy of one helix is given by h. The energy associated with a hairpin is f, and the energy contribution of an interior loop is i. Finally, the parameter g encodes the dangling energy contributions. All of these values can be computed directly from the parameters found in the NNTM.
We wish to compute the energy of the structure corresponding to a plane tree t having (down) degree sequence $d_0, d_1, \ldots, d_{n-1}$ and root degree r. Note that the down degree of a node x is equal to the number of children of x, and, in the down degree sequence, $d_i$ is the number of non-root nodes with exactly i children. The energy contribution of all hairpin loops will be $d_0 f$, and similarly, the total energy of all interior loops will be $d_1 i$. For a multi-loop having down degree j, the energy contribution will be $a + 4b(j+1) + c(j+1) + g(j+1)$, and so the contribution of all multi-loops is given by $\sum_{j=2}^{n} d_j \left(a + 4b(j+1) + c(j+1) + g(j+1)\right)$. The root vertex of the tree corresponds to the exterior loop and has energy contribution $gr$. Finally, our structure has n helices, each with energy h. Summing all of these components gives the total energy
$$d_0 f + d_1 i + \sum_{j=2}^{n} d_j\left(a + 4b(j+1) + c(j+1) + g(j+1)\right) + nh + gr$$
$$= (f - a - 4b - c - g)\,d_0 + (i - a - 8b - 2c - 2g)\,d_1 + (-4b - c)\,r + (a + 8b + 2c + h + 2g)\,n,$$
where we have used the facts $\sum_{k=0}^{n-1} d_k = n$ and $\sum_{k=0}^{n-1} k\,d_k = n - r$.
Set $\alpha = f - a - 4b - c - g$, $\beta = i - a - 8b - 2c - 2g$, $\gamma = -4b - c$, and $\delta = a + 8b + 2c + h + 2g$. Then, the energy function is $\alpha d_0 + \beta d_1 + \gamma r + \delta n$. Since n will be fixed, we disregard the term $\delta n$, giving
$$E(t) = \alpha d_0 + \beta d_1 + \gamma r.$$
Though we study these energy functions for arbitrary values of ( α , β , γ ) , numerical values for both the input energy parameters from NNTM and the resulting energy function coefficients are given in Table 1.
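For concreteness, the change of variables above is easy to encode directly; the following Python sketch (the function and argument names are our own) computes the coefficients and the resulting tree energy. Numerical values of the input constants for the Turner 1989, 1999, and 2004 parameter sets should be taken from Table 1 and the NNDB.

def energy_coefficients(a, b, c, f, i, h, g):
    """Coefficients (alpha, beta, gamma, delta) of the energy function
    alpha*d0 + beta*d1 + gamma*r + delta*n derived above.
    a, b, c: NNTM multiloop penalties; f: hairpin energy; i: interior-loop
    energy; h: helix energy; g: dangling-end contribution."""
    alpha = f - a - 4 * b - c - g
    beta = i - a - 8 * b - 2 * c - 2 * g
    gamma = -4 * b - c
    delta = a + 8 * b + 2 * c + h + 2 * g
    return alpha, beta, gamma, delta

def plane_tree_energy(d0, d1, r, alpha, beta, gamma):
    """E(t) = alpha*d0 + beta*d1 + gamma*r; the delta*n term is dropped
    because n is fixed."""
    return alpha * d0 + beta * d1 + gamma * r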

2.2. Mathematical Preliminaries

Section 2.2 of this manuscript provides the necessary mathematical background, including a formal introduction of combinatorial objects and a review of the relevant Markov chain mixing results used to construct our resultant sampling Markov chain and corresponding mixing time proof in Section 3.

2.2.1. Combinatorial Objects

A plane tree is a rooted, ordered tree. We will use $T_n$ to denote the set of plane trees with n edges. It is known that $|T_n|$ is given by the nth Catalan number $C_n = \frac{1}{n+1}\binom{2n}{n}$. In a plane tree, a leaf is a node with down degree 0, and an internal node is a non-root node with down degree 1. For a given plane tree t, we will use $d_0(t)$ to denote the number of leaves and $d_1(t)$ to denote the number of internal nodes.
For a plane tree t, the energy of the tree is given by
$$E(t) = \alpha\, d_0(t) + \beta\, d_1(t),$$
where α and β are real parameters of the energy function. Note that this function is a simplification of the model due to Hower and Heitsch [16] discussed in Section 2.1. Making this simplification effectively disregards the energy contribution of the exterior loop, which is small in comparison to the total energy of a structure, especially for the longer sequences that are of interest to us. Other authors have made similar simplifications, e.g., [17].
For our purposes, we consider α and β to be arbitrary but fixed. We will consider a Gibbs distribution g on the set T n , where the weight of each tree t is given by
$$g(t) = \frac{e^{-E(t)}}{Z},$$
where $Z = \sum_{y \in T_n} e^{-E(y)}$ is a normalizing constant.
A Motzkin path of length n is a lattice path from (0, 0) to (n, 0), which consists of steps along the vectors U = (1, 1), H = (1, 0), and D = (1, −1) and never crosses below the x-axis. We can also represent Motzkin paths as strings from the alphabet {U, H, D} where, in any prefix, the number of Us is greater than or equal to the number of Ds. The number of Motzkin paths of length n is given by the Motzkin numbers $M_n$, where
$$M_n = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{2k} C_k.$$
Motzkin numbers and Motzkin paths have been well-studied in the combinatorics literature, see, e.g., [20,21,22,23,24].
A Dyck path is a Motzkin path with no H steps. It is easy to see that a Dyck path must have even length, so we will use $D_n$ to denote the set of Dyck paths of length 2n. It is well known that $|D_n| = C_n$ (see, e.g., [25]).
A 2-Motzkin path is a Motzkin path in which (1, 0) steps are given one of two distinguishable colors. Let $M_m^2$ be the set of all 2-Motzkin paths of length m. We can also represent 2-Motzkin paths as strings from the alphabet {U, H, I, D}, where, as before, the number of Ds never exceeds the number of Us in any prefix. In such a string x, we denote by $|x|_a$ the number of times the symbol a appears in x, where $a \in \{U, H, I, D\}$. Notice that we always have $|x|_U = |x|_D$. For any $x \in M_n^2$ and $k \in \{1, \ldots, n\}$, let $x(k)$ denote the symbol at index k in the string representation of x. Additionally, the skeleton of a 2-Motzkin path x is the Dyck path of Us and Ds which results from removing all Hs and Is from x. We will denote the skeleton of x by $\sigma(x)$.
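These string-level definitions translate directly into code; the following minimal Python sketch (the function names are our own) checks the prefix condition and extracts the skeleton.

def is_2motzkin(x):
    """Prefix condition: in every prefix of x (a string over 'UHID'),
    the number of D's never exceeds the number of U's, and the U's and
    D's balance overall."""
    height = 0
    for symbol in x:
        height += (symbol == 'U') - (symbol == 'D')
        if height < 0:
            return False
    return height == 0

def skeleton(x):
    """The skeleton sigma(x): the Dyck path left after deleting all H's and I's."""
    return ''.join(s for s in x if s in 'UD')

# Example: 'UIHD' is a valid 2-Motzkin path with skeleton 'UD'.
assert is_2motzkin('UIHD') and skeleton('UIHD') == 'UD'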

2.2.2. A Bijection Between $T_n$ and $M_{n-1}^2$

We will use the particular bijection Φ : T n M n 1 2 between plane trees and 2-Motzkin paths from Deutsch [26], which neatly encodes information about d 0 and d 1 . For clarity, we will overview the bijection here.
For a given plane tree t with n edges, assign a label from the set { U , H , I , D } to each edge e according to the following rules:
  • If e is the leftmost edge off a non-root node of down degree at least 2, assign the label U.
  • If e is the rightmost edge off a non-root node of down degree at least 2, assign the label D.
  • If e is the only edge off a non-root node of down degree 1, assign the label I.
  • If e is an edge off the root node, or if e is neither the leftmost nor the rightmost edge off its parent node, assign the label H.
Now, if we traverse t in preorder reading off these labels, we get a 2-Motzkin path of length n. However, this path will always begin with H, so we define $\Phi(t)$ to be the 2-Motzkin path of length n − 1 after this initial H is removed. Figure 2 gives an example of this labeling process. From Deutsch, we know not only that Φ is a bijection, but also that if $x = \Phi(t)$ then $|x|_I = d_1(t)$ and $|x|_U + |x|_H + 1 = d_0(t)$.
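The forward direction of this labeling is short enough to state as code; a Python sketch appears below (the nested-list tree representation is our own choice, and Algorithm 2 in Section 3.1 gives pseudocode for the inverse map).

def phi(tree):
    """Deutsch's bijection Phi: label the edges of a plane tree in
    preorder and drop the leading H. A tree is a nested list: each node
    is the list of its children."""
    labels = []
    def visit(node, is_root):
        degree = len(node)
        for index, child in enumerate(node):
            if is_root:
                labels.append('H')            # edge off the root
            elif degree == 1:
                labels.append('I')            # only edge off a down-degree-1 node
            elif index == 0:
                labels.append('U')            # leftmost edge, down degree >= 2
            elif index == degree - 1:
                labels.append('D')            # rightmost edge, down degree >= 2
            else:
                labels.append('H')            # middle edge
            visit(child, False)
    visit(tree, True)
    return ''.join(labels[1:])                # remove the initial H

# Example: the path tree root -> a -> b gives 'I'; a root whose single
# child has two leaf children gives 'UD'.
assert phi([[[]]]) == 'I' and phi([[[], []]]) == 'UD'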
Using this bijection, it is natural to extend our energy function to 2-Motzkin paths. We define the energy of a 2-Motzkin path x to be
$$E(x) = \alpha\left(|x|_U + |x|_H + 1\right) + \beta\,|x|_I,$$
and we extend our definition of the distribution g to M n 2 accordingly. We note that, while this energy function does not capture all possible weightings on 2-Motzkin paths, it does capture all weightings possible under our simplification of the model due to Hower and Heitsch [16] after applying the bijection due to Deutsch [26].
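Because the energy depends on x only through its symbol counts, the Gibbs distribution can be brute-forced for very small m; the following Python sketch (illustrative only, with names of our own choosing) computes it exactly and is useful as a reference when testing any sampler.

from itertools import product
from math import exp

def exact_gibbs(m, alpha, beta):
    """Enumerate all 2-Motzkin paths of length m and return the exact
    Gibbs distribution {path: probability}. Exponential in m, so only
    usable for small m."""
    def valid(x):
        height = 0
        for s in x:
            height += (s == 'U') - (s == 'D')
            if height < 0:
                return False
        return height == 0
    weights = {}
    for letters in product('UHID', repeat=m):
        if valid(letters):
            x = ''.join(letters)
            energy = alpha * (x.count('U') + x.count('H') + 1) + beta * x.count('I')
            weights[x] = exp(-energy)
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}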

2.2.3. Markov Chains

A Markov chain M is a sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in a state space Ω subject to the condition that
$$\Pr(X_{t+1} = y \mid X_t = x, X_{t-1} = s_{t-1}, \ldots, X_0 = s_0) = \Pr(X_{t+1} = y \mid X_t = x).$$
All Markov chains that we consider will be implicitly time-homogeneous (meaning $\Pr(X_{t+1} = y \mid X_t = x)$ does not depend on t) and finite (meaning $|\Omega| < \infty$). The transition matrix of a time-homogeneous Markov chain is the matrix $P : \Omega \times \Omega \to [0, 1]$ given by
$$P(x, y) = \Pr(X_{t+1} = y \mid X_t = x).$$
It is easy to see that if $X_0$ has distribution vector $\mathbf{x}$, then $X_t$ has distribution vector $P^t\mathbf{x}$.
A finite Markov chain with transition matrix P is said to be ergodic if it has the following two properties.
  • Irreducibility: For any $x, y \in \Omega$, there is some integer $t \in \mathbb{N}$ for which $P^t(x, y) > 0$.
  • Aperiodicity: For any state $x \in \Omega$, we have $\gcd\{t \in \mathbb{N} : P^t(x, x) > 0\} = 1$.
It is well-known that if M is ergodic, then there exists a unique distribution vector π, the stationary distribution, such that $P\pi = \pi$, and $\lim_{t\to\infty} P^t(x, y) = \pi(y)$ for any states $x, y \in \Omega$. Additionally, we call M reversible if for all states $x, y \in \Omega$, we have $\pi(x)P(x, y) = \pi(y)P(y, x)$.
For ϵ > 0, the mixing time τ(ϵ) of M is given by
$$\tau(\epsilon) = \min\left\{ t \in \mathbb{N} : \forall s \geq t,\ \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} \left|P^s(x, y) - \pi(y)\right| < \epsilon \right\}.$$
Intuitively, the mixing time gives a measure of the number of steps required for M to get sufficiently close to its stationary distribution from any starting state.
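The inner quantity is a total variation distance; for small state spaces it can be computed directly, which is handy for empirically watching a chain converge (e.g., against the brute-force distribution sketched in Section 2.2.2). A minimal Python helper, with names of our own:

def total_variation(p, q):
    """Total variation distance between two distributions given as
    dicts mapping states to probabilities."""
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)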
Let M be a finite ergodic Markov chain over a state space Ω with transition matrix P. Let the eigenvalues of P be $\lambda_0, \lambda_1, \ldots, \lambda_{|\Omega|-1}$ such that $1 = \lambda_0 > |\lambda_1| \geq \cdots \geq |\lambda_{|\Omega|-1}|$. The spectral gap of M is given by $\mathrm{Gap}(M) = 1 - |\lambda_1|$. As is standard, it will be convenient to denote the inverse of the spectral gap by the relaxation time $\tau_{rel}(M) := 1/\mathrm{Gap}(M)$.
Additionally, the spectral gap is given by the following functional definition [27].
$$\mathrm{Gap}(M) = \inf_f \frac{\sum_{x,y \in \Omega} |f(x) - f(y)|^2\, \pi(x) P(x, y)}{\sum_{x,y \in \Omega} |f(x) - f(y)|^2\, \pi(x)\pi(y)},$$
where the infimum is taken over all non-constant functions f : Ω R . A direct consequence of this definition of the spectral gap is the following lemma.
Lemma 1.
Let $M_1$ and $M_2$ be ergodic Markov chains over Ω with the same stationary distribution. Let $P_1$ and $P_2$ be the transition matrices of $M_1$ and $M_2$ respectively. If for all $x, y \in \Omega$ and for some constant $c > 0$ we have $P_1(x, y) \geq c\,P_2(x, y)$, then $\mathrm{Gap}(M_1) \geq c\,\mathrm{Gap}(M_2)$.
Additionally, spectral gap is related to the mixing time by the following lemma [28].
Lemma 2.
Let M be an ergodic Markov chain with state space Ω, and let $\lambda_0, \lambda_1, \ldots, \lambda_{|\Omega|-1}$ be the eigenvalues of the transition matrix P as defined above. Then, for all ϵ > 0 and $x \in \Omega$, we have
$$\frac{|\lambda_1|}{\mathrm{Gap}(M)} \log\frac{1}{2\epsilon} \;\leq\; \tau(\epsilon) \;\leq\; \frac{1}{\mathrm{Gap}(M)} \log\frac{1}{\pi(x)\,\epsilon}.$$
We say that a Markov chain M, whose state space depends on a variable $n \in \mathbb{N}$, is rapidly mixing if τ(ϵ) is bounded above by some polynomial in n and $\log(\epsilon^{-1})$. For the specific chains studied in this manuscript, we will show that τ(ϵ) is bounded by a polynomial in n and $\log(\epsilon^{-1})$ if and only if $\tau_{rel}(M)$ is bounded by a polynomial in n and $\log(\epsilon^{-1})$. Our next lemma presents sufficient conditions.
Lemma 3.
Let M be an ergodic Markov chain with state space Ω and let $\lambda_0, \lambda_1, \ldots, \lambda_{|\Omega|-1}$ be the eigenvalues of its transition matrix. Let ϵ > 0. If τ(ϵ) is bounded by a polynomial in n and $\log(\epsilon^{-1})$, then $\tau_{rel}$ is also bounded by a polynomial in n and $\log(\epsilon^{-1})$. Further, suppose we have $\log(1/\pi(x))$ bounded by some polynomial q(n) for all $x \in \Omega$. Then, $\tau_{rel}(M)$ being bounded by a polynomial in n and $\log(\epsilon^{-1})$ implies that τ(ϵ) is also bounded by some polynomial in n and $\log(\epsilon^{-1})$.
Proof. 
Suppose that $\tau(\epsilon) \leq p(n, \log(\epsilon^{-1}))$, where p is a polynomial. Beginning with the left-hand side of Lemma 2, note that
$$\frac{|\lambda_1|}{1 - |\lambda_1|}\log\frac{1}{2\epsilon} = \left(\tau_{rel}(M) - 1\right)\log\frac{1}{2\epsilon}.$$
Then, applying Lemma 2 and the bound on τ(ϵ),
$$\tau_{rel}(M) \leq \frac{\tau(\epsilon)}{\log((2\epsilon)^{-1})} + 1 \leq \frac{p(n, \log(\epsilon^{-1}))}{\log((2\epsilon)^{-1})} + 1 \leq p'(n, \log(\epsilon^{-1})),$$
where $p'$ is again a polynomial in n and $\log(\epsilon^{-1})$.
Turning now to the converse, suppose that we have $\tau_{rel} \leq p(n, \log(\epsilon^{-1}))$ for some polynomial p. Additionally, suppose $\log(1/\pi(x)) \leq q(n)$ for all $x \in \Omega$, for some polynomial q.
Applying Lemma 2,
$$\tau(\epsilon) \leq \tau_{rel}(M)\log\frac{1}{\pi(x)\,\epsilon} \leq p(n, \log(\epsilon^{-1}))\left(\log(\epsilon^{-1}) + q(n)\right) \leq p'(n, \log(\epsilon^{-1})),$$
where $p'$ is some polynomial. □

2.2.4. Coupling

A coupling of a Markov chain M on Ω is a chain $(X_t, Y_t)_{t=0}^{\infty}$ on $\Omega \times \Omega$ for which the following properties hold.
  • Each chain $(X_t)_{t=0}^{\infty}$ and $(Y_t)_{t=0}^{\infty}$, when viewed in isolation, is a copy of M, given initial states $X_0 = x$ and $Y_0 = y$.
  • Whenever $X_t = Y_t$, we have $X_{t+1} = Y_{t+1}$.
Formally, the first item above requires that the joint distribution of $(X_t, Y_t)$ (given $(X_{t-1}, Y_{t-1})$) should satisfy the property that the marginal of $X_t$ (and also $Y_t$) is consistent with the probability transitions of M. We define the coupling time T to be
$$T = \max_{x,y \in \Omega} \mathbb{E}\left[\min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}\right].$$
The following lemma [29] is useful in bounding the coupling time T.
Lemma 4.
Suppose that $(X_t, Y_t)_{t=0}^{\infty}$ is a coupling of a Markov chain M. Let φ be an integer-valued distance function on $\Omega \times \Omega$ taking values in the range $[0, B]$, and suppose that $\varphi(x, y) = 0$ if and only if $x = y$. Let $\varphi(t) = \varphi(X_t, Y_t)$. Suppose that the coupling satisfies $\mathbb{E}\left[\varphi(t+1) - \varphi(t) \mid X_t, Y_t\right] \leq 0$. Additionally, suppose that whenever $\varphi(t) > 0$, $\mathbb{E}\left[|\varphi(t+1) - \varphi(t)|^2 \mid X_t, Y_t\right] \geq V$. Then, the expected coupling time satisfies $\mathbb{E}\left[T_{x,y}\right] \leq \varphi(0)\left(2B - \varphi(0)\right)/V$.
Coupling time and mixing time are then related by the following theorem [28].
Theorem 1.
A Markov chain M with coupling time T has mixing time τ ( ϵ ) bounded by
$$\tau(\epsilon) \leq \left\lceil T e \log(\epsilon^{-1}) \right\rceil.$$

2.2.5. Decomposition

We use two disjoint decomposition methods for bounding the spectral gap, one developed by Martin and Randall [30], and a very recent one given by Hermon and Salez [31], building on the work by Jerrum, Son, Tetali and Vigoda [32]. We use both theorems because, while the latter gives better bounds, the former has more relaxed conditions, which is necessary in one of our applications. The setup for both methods is the same.
Let M be an ergodic, reversible Markov chain over a state space Ω with transition matrix P and stationary distribution π. Suppose Ω can be partitioned into disjoint subsets $\Omega_1, \ldots, \Omega_m$. For each $i \in [m]$, let $M_i$ be the restriction of M to $\Omega_i$, which is obtained by rejecting any transition that would leave $\Omega_i$, and let $P_i$ be the transition matrix of $M_i$. Additionally, we define $\bar{M}$ to be the projection chain of M over the state space [m] as follows. Let the transition matrix $\bar{P}$ of $\bar{M}$ be given by
$$\bar{P}(i, j) = \frac{1}{\pi(\Omega_i)} \sum_{x \in \Omega_i} \sum_{y \in \Omega_j} \pi(x) P(x, y).$$
One can check that $\bar{M}$ is reversible and has stationary distribution
$$\bar{\pi}(i) = \pi(\Omega_i),$$
while each $M_i$ has stationary distribution
$$\pi_i(x) = \frac{\pi(x)}{\bar{\pi}(i)}.$$
With this notation, we have the following theorem by Martin and Randall [30].
Theorem 2.
Defining $M_i$ and $\bar{M}$ as above, we have
$$\mathrm{Gap}(M) \geq \frac{1}{2}\,\mathrm{Gap}(\bar{M})\,\min_{i \in [m]} \mathrm{Gap}(M_i).$$
The theorem due to Hermon and Salez obtains better bounds if, for each pair $(i, j) \in [m] \times [m]$ with $\bar{P}(i, j) > 0$, we can find an effective joint distribution (often referred to as a “coupling”) $\kappa_{ij} : \Omega_i \times \Omega_j \to [0, 1]$ of the distributions $\pi_i$ and $\pi_j$. In other words, we must have
$$\forall x \in \Omega_i, \quad \sum_{y \in \Omega_j} \kappa_{ij}(x, y) = \pi_i(x),$$
$$\forall y \in \Omega_j, \quad \sum_{x \in \Omega_i} \kappa_{ij}(x, y) = \pi_j(y).$$
The quality of the joint distribution κ is defined as
$$\chi := \chi(\kappa) := \min \frac{\pi(x) P(x, y)}{\bar{\pi}(i)\, \bar{P}(i, j)\, \kappa_{ij}(x, y)},$$
where the minimum is taken over all ( x , y , i , j ) with x Ω i , y Ω j for which P ¯ ( i , j ) > 0 and κ i j ( x , y ) > 0 . Hermon and Salez [31] prove the following.
Theorem 3.
With P, $\bar{P}$, $P_i$, and χ defined as above,
$$\mathrm{Gap}(M) \geq \min\left\{ \chi\,\mathrm{Gap}(\bar{M}),\ \min_{i \in [m]} \mathrm{Gap}(M_i) \right\}.$$
The utility of these decomposition theorems is that they allow us to break down a more complicated Markov chain into pieces that are easier to analyze. If we can show that the pieces rapidly mix, and the projection chain rapidly mixes, then we may conclude that the original chain rapidly mixes as well.
Additionally, to aid with the analysis of some projection chains, we will need another lemma from [30].
Let $M_M$ be the Markov chain on [m] with Metropolis transitions $P_M(i, j) = \frac{1}{2\Delta}\min\left\{1, \frac{\pi(\Omega_j)}{\pi(\Omega_i)}\right\}$ whenever $\bar{P}(i, j) > 0$, where Δ is the maximum degree of vertices in the transition graph of $\bar{M}$. Let $\partial_i(\Omega_j) = \{y \in \Omega_j : \exists x \in \Omega_i \text{ with } P(x, y) > 0\}$. Then we have the following.
Lemma 5.
With $M_M$ as defined above, suppose there exist constants a > 0 and b > 0 with
1. $P(x, y) \geq a$ for all x, y such that $P(x, y) > 0$;
2. $\pi(\partial_i(\Omega_j)) \geq b\,\pi(\Omega_j)$ for all i, j with $\bar{P}(i, j) > 0$.
Then $\mathrm{Gap}(\bar{M}) \geq ab \cdot \mathrm{Gap}(M_M)$.
In order to help analyze the mixing time of M M , we will also require the following two lemmas. Note that Lemma 6 is used only in the proof of Lemma 7.
Lemma 6.
Let $(a_i)_{i=1}^{m}$ be a log-concave sequence, with $a_i > 0$ for all $1 \leq i \leq m$. Then,
$$\frac{a_{i+1}}{a_i} \geq \frac{a_{j+1}}{a_j}$$
for all $1 \leq i \leq j \leq m - 1$.
Proof. 
In order to use induction, we will slightly reframe the statement. We will prove
$$\frac{a_{i+1}}{a_i} \geq \frac{a_{i+1+k}}{a_{i+k}}$$
for all $k \geq 0$ with $i + k \leq m - 1$.
We now proceed by induction on k. The base case, k = 0 , is trivial.
Now fix $l > 0$ and suppose that the induction hypothesis is true for $k = l - 1$; that is,
$$\frac{a_{i+1}}{a_i} \geq \frac{a_{i+l}}{a_{i+l-1}}.$$
By log-concavity, $a_{i+l}^2 \geq a_{i+l-1}\,a_{i+l+1}$, or, equivalently,
$$\frac{a_{i+l}}{a_{i+l-1}} \geq \frac{a_{i+l+1}}{a_{i+l}}.$$
Therefore,
$$\frac{a_{i+1}}{a_i} \geq \frac{a_{i+l}}{a_{i+l-1}} \geq \frac{a_{i+l+1}}{a_{i+l}},$$
where the first inequality follows from the induction hypothesis, and the second inequality follows from log-concavity. □
Lemma 7.
Let π be a probability distribution on [m]. Let M be a Markov chain on [m] with the transition probabilities
$$P(i, j) = \begin{cases} \frac{1}{4}\min\left\{1, \frac{\pi(j)}{\pi(i)}\right\} & \text{if } |i - j| = 1, \\ 0 & \text{if } |i - j| > 1, \end{cases}$$
and the appropriate self-loop probabilities P(i, i). If π(i) is log-concave in i, then M has mixing time (and hence also relaxation time) $O(m^2)$.
Proof. 
We define a coupling $(X_t, Y_t)$ on M as follows. If $X_t \neq Y_t$, then at time step t + 1, flip a fair coin.
  • If heads, set $Y_{t+1} = Y_t$. Let l be either 1 or −1, each with probability 1/2. If possible, let $X_{t+1} = X_t + l$ with probability $\frac{1}{2}\min\left\{1, \frac{\pi(X_t + l)}{\pi(X_t)}\right\}$. Otherwise, let $X_{t+1} = X_t$.
  • If tails, set $X_{t+1} = X_t$, and update $Y_{t+1}$ the same way as we did for $X_{t+1}$ in the previous case.
Now, suppose that for some t we have $X_t = i$ and $Y_t = j$ for $i \neq j$. WLOG, assume that $i < j$. Let $\varphi(t) = \varphi(X_t, Y_t) = j - i$, and let $\Delta\varphi(t) = \varphi(t) - \varphi(t-1)$. Note that we have two moves, with probabilities P(i, i − 1) and P(j, j + 1), which will increase the distance φ by 1, and similarly two moves, with probabilities P(i, i + 1) and P(j, j − 1), which will decrease the distance by 1. Then we have
$$\mathbb{E}(\Delta\varphi(t)) = -P(i, i+1) + P(i, i-1) + P(j, j+1) - P(j, j-1).$$
By the log-concavity of π(i) and Lemma 6, we have $P(i, i+1) \geq P(j, j+1)$ and $P(i, i-1) \leq P(j, j-1)$. Therefore, the expected change in φ(t) is always non-positive. We also have
$$\mathbb{E}\left[\Delta\varphi(t)^2 \mid X_t, Y_t\right] = P(j, j+1) + P(i, i+1) + P(j, j-1) + P(i, i-1) = \frac{1}{4}\left[\min\left\{1, \frac{\pi(j+1)}{\pi(j)}\right\} + \min\left\{1, \frac{\pi(i+1)}{\pi(i)}\right\} + \min\left\{1, \frac{\pi(j-1)}{\pi(j)}\right\} + \min\left\{1, \frac{\pi(i-1)}{\pi(i)}\right\}\right].$$
We claim that $\mathbb{E}\left[\Delta\varphi^2 \mid X_t, Y_t\right] \geq \frac{1}{4}$. Suppose, for contradiction, that the expectation is less than $\frac{1}{4}$. Then, for each of the minimum functions in the above expression, 1 must be the larger argument. Equivalently, $\pi(i-1) < \pi(i)$, $\pi(i) > \pi(i+1)$, $\pi(j-1) < \pi(j)$, and $\pi(j) > \pi(j+1)$.
But then π has strict local maxima at both i and j, so π(i) is not unimodal in i and is therefore also not log-concave in i, contradicting our hypothesis. Therefore we have $\mathbb{E}\left[\Delta\varphi^2 \mid X_t, Y_t\right] \geq \frac{1}{4}$, as desired. □

3. Results

Here we present the constructed Markov chain and corresponding algorithms devised for the sampling task and the proof of an upper bound on the relaxation time—that the chain mixes rapidly. Collectively, the results illustrate an analytical approach to calculate the dispersion of the secondary structure and corresponding branching properties of RNA based on the NNTM energy function minimization and without reference to a specific nucleotide sequence.

3.1. Our Markov Chain on $M_m^2$

We define a Markov chain $M = X_0, X_1, X_2, \ldots$ on $M_m^2$ to sample 2-Motzkin paths as a representation of plane trees. Here, we use $m = n - 1$ to denote the length of the 2-Motzkin paths corresponding to plane trees with n edges.
We define each step of M as follows. First, pick a random element l uniformly from { 1 , 2 , 3 , 4 } . Now choose y as follows.
  • If l = 1, pick a random pair of consecutive symbols in $X_t$, and call this pair s. If s is UD or HH, let s′ be either UD or HH with probabilities $\frac{1}{1 + e^{-\alpha}}$ and $\frac{e^{-\alpha}}{1 + e^{-\alpha}}$, respectively. Let y be the string $X_t$ with s replaced by s′. Otherwise, let $y = X_t$.
  • If l = 2, pick i uniformly from $\{1, \ldots, m\}$. If $X_t(i)$ is H or I, choose a symbol c to be either H or I with probabilities $\frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}}$ and $\frac{e^{-\beta}}{e^{-\alpha} + e^{-\beta}}$, respectively. Let y be the 2-Motzkin path given by changing the symbol $X_t(i)$ to c. Otherwise, let $y = X_t$.
  • If l = 3, pick i and j each uniformly from $\{1, \ldots, m\}$. If each of $X_t(i)$ and $X_t(j)$ is either U or D, let y be the string $X_t$ with the symbols at indices i and j swapped. Otherwise, let $y = X_t$.
  • If l = 4, pick a random pair of consecutive symbols in $X_t$, and call this pair s. If s is of the form ab or ba for some $a \in \{U, D\}$ and $b \in \{H, I\}$, let s′ be the reverse of s, and let y be the string $X_t$ with s replaced by s′. Otherwise, let $y = X_t$.
If y is a valid 2-Motzkin path, set X t + 1 = y with probability 1 2 . Otherwise, set X t + 1 = X t .
One can see that M is irreducible by noting that every path can be transformed to the path consisting of all H’s. To make this transformation, first use the l = 4 rule to move all H’s and I’s to the end of the path. If there are any U’s in the path, we must now have at least one consecutive pair U D . Use the l = 1 rule to convert the U D to a H H . From here we can repeat, again moving all H’s to the end and replacing U D with H H , until only H’s and I’s remain. Finally, we can use the l = 2 rule to convert all I’s to H’s. Since all of these steps can also be taken in reverse, this gives a procedure to move between two arbitrary paths, demonstrating irreducibility. We can also conclude that M is aperiodic, due to the existence of self-loops. Combined with irreducibility, this establishes that M is ergodic.
We claim that M is reversible with respect to the stationary distribution $\pi(x) = \frac{e^{-E(x)}}{Z}$, where $Z = \sum_{y \in M_m^2} e^{-E(y)}$. This can be easily verified by considering the four move types listed above. For example, for the first move type given above (transforming UD to HH and vice versa), let x and y be the states of interest. Suppose that y has the consecutive symbols HH where x contains UD, and note that the probability of selecting this particular move type and pair of positions is the same from x as from y, so it suffices to compare the remaining factors. Then,
$$\pi(x)P(x, y) = \frac{e^{-\alpha(|x|_U + |x|_H + 1) - \beta|x|_I}}{Z} \cdot \frac{e^{-\alpha}}{1 + e^{-\alpha}} = \frac{e^{-\alpha((|y|_U + 1) + (|y|_H - 2) + 1) - \beta|y|_I}}{Z} \cdot \frac{e^{-\alpha}}{1 + e^{-\alpha}} = \frac{e^{-\alpha(|y|_U + |y|_H + 1) - \beta|y|_I}}{Z} \cdot \frac{1}{1 + e^{-\alpha}} = \pi(y)P(y, x).$$
One can verify that similar computations hold for the remaining three types of moves. Therefore, we conclude that the chain M has stationary distribution $\pi(x) = \frac{e^{-E(x)}}{Z}$.
The Markov chain M can be implemented in pseudocode as in Algorithm 1. Here, the Ber ( p ) function returns true with probability p, and false otherwise. We also use the addition of strings to denote concatenation.
Algorithm 1: The main Markov chain algorithm. This pseudocode calculates $X_t$ given $X_0$.
Require: $X_0$ is a valid 2-Motzkin path of length m.
  x ← $X_0$
  for s = 1 to t do
    y ← x
    l ← randInt(1, 4)
    if l = 1 then
      i ← randInt(1, m − 1)
      if x[i : i + 1] = UD and Ber$\left(\frac{e^{-\alpha}}{2(1 + e^{-\alpha})}\right)$ then
        y[i : i + 1] ← HH
      else if x[i : i + 1] = HH and Ber$\left(\frac{1}{2(1 + e^{-\alpha})}\right)$ then
        y[i : i + 1] ← UD
    else if l = 2 then
      i ← randInt(1, m)
      if x[i] = I and Ber$\left(\frac{e^{-\alpha}}{2(e^{-\alpha} + e^{-\beta})}\right)$ then
        y[i] ← H
      else if x[i] = H and Ber$\left(\frac{e^{-\beta}}{2(e^{-\alpha} + e^{-\beta})}\right)$ then
        y[i] ← I
    else if l = 3 then
      i ← randInt(1, m)
      j ← randInt(1, m)
      if (x[i] ∈ {U, D} and x[j] ∈ {U, D}) and Ber(1/2) then
        y[i] ← x[j]
        y[j] ← x[i]
        if y is not a valid 2-Motzkin path then
          y ← x
    else if l = 4 then
      i ← randInt(1, m − 1)
      if ((x[i] ∈ {U, D} and x[i + 1] ∈ {H, I}) or (x[i] ∈ {H, I} and x[i + 1] ∈ {U, D})) and Ber(1/2) then
        y[i : i + 1] ← x[i + 1] + x[i]
    x ← y
  return x
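For readers who prefer executable code, here is a direct Python transcription of a single step of Algorithm 1. It is a sketch under our own representation choices (a path is a list of one-character symbols, and m ≥ 2 is assumed), not the authors' reference implementation.

import math
import random

def is_valid(x):
    """Prefix condition for a 2-Motzkin path given as a list of symbols."""
    height = 0
    for s in x:
        height += (s == 'U') - (s == 'D')
        if height < 0:
            return False
    return height == 0

def step(x, alpha, beta):
    """One transition of the chain M from state x (list of 'U','H','I','D')."""
    m = len(x)
    y = list(x)
    l = random.randint(1, 4)
    if l == 1:
        i = random.randrange(m - 1)
        pair = x[i] + x[i + 1]
        if pair == 'UD' and random.random() < math.exp(-alpha) / (2 * (1 + math.exp(-alpha))):
            y[i], y[i + 1] = 'H', 'H'
        elif pair == 'HH' and random.random() < 1 / (2 * (1 + math.exp(-alpha))):
            y[i], y[i + 1] = 'U', 'D'
    elif l == 2:
        i = random.randrange(m)
        wH, wI = math.exp(-alpha), math.exp(-beta)
        if x[i] == 'I' and random.random() < wH / (2 * (wH + wI)):
            y[i] = 'H'
        elif x[i] == 'H' and random.random() < wI / (2 * (wH + wI)):
            y[i] = 'I'
    elif l == 3:
        i, j = random.randrange(m), random.randrange(m)
        if x[i] in 'UD' and x[j] in 'UD' and random.random() < 0.5:
            y[i], y[j] = x[j], x[i]
            if not is_valid(y):          # reject swaps that break the path
                y = list(x)
    else:  # l == 4: swap an adjacent U/D with an H/I (always stays valid)
        i = random.randrange(m - 1)
        if (x[i] in 'UD') != (x[i + 1] in 'UD') and random.random() < 0.5:
            y[i], y[i + 1] = x[i + 1], x[i]
    return y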
Additionally, in order to convert the 2-Motzkin path X t into a plane tree, we use the algorithm in Algorithm 2, which assumes the existence of a Node object with children and parent attributes.
Algorithm 2: Algorithm to convert a sampled 2-Motzkin path to a plane tree. The pseudocode calculates $\Phi^{-1}(x)$.
Require: x is a valid 2-Motzkin path of length m.
  root ← new Node()
  // u is the node to which a new child is attached for an H or D symbol
  u ← root
  // v is always the most recently added node
  v ← new Node()
  // the stack keeps track of previous values of u
  stack ← new Stack()
  root.children.append(v)
  for i = 1 to m do
    node ← new Node()
    if x[i] = U then
      v.children.append(node)
      stack.push(u)
      u ← v
    else if x[i] = I then
      v.children.append(node)
    else if x[i] = H then
      u.children.append(node)
    else if x[i] = D then
      u.children.append(node)
      u ← stack.pop()
    v ← node
  return root
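As a usage sketch (with arbitrary, untuned burn-in and spacing constants of our own choosing), one can run the chain from the all-H path and read off the number of hairpins directly from the path via the identity $d_0 = |x|_U + |x|_H + 1$, without building the tree:

def sample_hairpin_counts(m, alpha, beta, burn_in=500000, n_samples=100, spacing=10000):
    """Record d0 = |x|_U + |x|_H + 1 (the hairpin count of the
    corresponding plane tree) at spaced intervals along the chain.
    Relies on step() from the sketch following Algorithm 1."""
    x = ['H'] * m                       # a valid starting state
    counts = []
    for t in range(burn_in + n_samples * spacing):
        x = step(x, alpha, beta)
        if t >= burn_in and (t - burn_in) % spacing == 0:
            counts.append(x.count('U') + x.count('H') + 1)
    return counts

For tree-level statistics (e.g., maximum branching degree), the sampled path can first be converted with Algorithm 2.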

3.2. Mixing Time Results

Our main result is to prove the rapid mixing of the Markov chain defined in Section 3.1. An upper bound on the relaxation time is achieved by bounding the spectral gap from below. A spectral gap bound for the complex chain at hand is obtained through the use of multiple decomposition theorems, which give bounds on the spectral gap of the complex chain in terms of the spectral gaps of multiple simpler chains. The disjoint decomposition theorem due to Martin and Randall [30] provides a flexible approach to the decomposition of Markov chains. Very recent work by Hermon and Salez [31], building on the work of Jerrum, Son, Tetali, and Vigoda [32], proves a decomposition theorem with tighter bounds but stronger hypotheses.
Since this proof involves multiple decomposition steps, we provide an overview here. The primary tools used in this proof are the two decomposition theorems presented in Section 2.2.5. We first partition the state space of all 2-Motzkin paths by the number of Us in the path. The projection chain from this first decomposition is linear and is proved to be rapidly mixing using a result of Martin and Randall [30] (Lemma 8). Each of the restriction chains is decomposed again, this time by the pattern of H and I symbols. The projection chains for this second decomposition are shown to be rapidly mixing by coupling (Lemma 9). The restriction chains are decomposed a third time, this time according to the skeleton of U and D steps. The projection chains for this third decomposition are shown to be rapidly mixing by comparison to the classic mountain-valley moves chain on Dyck paths (Lemma 10). This last set of restriction chains is found to be rapidly mixing by isomorphism to the chain consisting of adjacent transpositions on binary strings (Lemma 11). Finally, starting from the most restricted chains, we use the decomposition theorems to obtain a bound on the spectral gap of the original chain (Theorem 4).
We now proceed with a formal presentation. We will use a series of decompositions of M. We will first decompose our state space $M_m^2$ into $S_0, \ldots, S_{\lfloor m/2 \rfloor}$, where
$$S_k = \{x \in M_m^2 : |x|_U = k\}.$$
Let M k denote the Markov chain M restricted to the set S k , and let M ¯ be the projection chain over this decomposition as outlined for Theorem 2.
Additionally, we will decompose each $S_k$ into the sets $\{T_{k,q} : q \in (H + I)^{m - 2k}\}$, where $(H + I)^{m - 2k}$ denotes the set of strings of length $m - 2k$ from the alphabet {H, I}. We define $T_{k,q}$ to be the set of 2-Motzkin paths $x \in S_k$ such that the substring of H and I symbols in x is q. Let $M_{k,q}$ denote the chain $M_k$ restricted to $T_{k,q}$, and let $\bar{M}_k$ be the projection chain of $M_k$ over this decomposition.
Finally, we decompose each $T_{k,q}$ into the partition $\{U_{k,q,s} : s \in D_k\}$ based on the skeletons of the 2-Motzkin paths. For each $s \in D_k$, we define
$$U_{k,q,s} = \{x \in T_{k,q} : \sigma(x) = s\}.$$
As before, we let M k , q , s be the Markov chain M k , q restricted to U k , q , s , and let M ¯ k , q be the appropriate projection chain. For clarity, this four-level decomposition is summarized in Figure 3.
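For example, the path $x = UIHD \in M_4^2$ has $|x|_U = 1$, so $x \in S_1$; its substring of H and I symbols is $q = IH$, so $x \in T_{1,IH}$; and its skeleton is $\sigma(x) = UD$, so $x \in U_{1,IH,UD}$.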
Lemma 8.
M ¯ has relaxation time τ r e l ( M ¯ ) = O ( m 4 ) .
Proof. 
The chain $\bar{M}$ is a linear chain with states $k \in \{0, \ldots, \lfloor m/2 \rfloor\}$, and with stationary distribution
$$\bar{\pi}(k) = \pi(S_k) = \frac{C_k}{Z_m} \sum_{i=0}^{m-2k} \binom{m}{2k}\binom{m-2k}{i} e^{-\alpha(k+i+1) - \beta(m-2k-i)} = \frac{e^{-\alpha(k+1)}}{Z_m} \binom{m}{2k} C_k \left(e^{-\alpha} + e^{-\beta}\right)^{m-2k},$$
where $\bar{\pi}$ is defined as in Section 2.2.5. Notice that the transitions in M which move between the $S_k$ sets are those which change an HH substring into a UD substring, or vice versa. Thus, the transitions in $\bar{M}$ only increase or decrease k by at most 1. We seek to apply Lemma 5. To choose a, notice that for $x \in S_k$ and $y \in S_{k \pm 1}$ with $P(x, y) > 0$, we have
$$P(x, y) = \frac{1}{4(m-1)} \cdot \frac{1}{1 + e^{-\alpha}} \quad \text{or} \quad P(x, y) = \frac{1}{4(m-1)} \cdot \frac{e^{-\alpha}}{1 + e^{-\alpha}}.$$
Note that the factor 1/4 comes from the probability of choosing l = 1, and the factor 1/(m − 1) comes from the fact that there are m − 1 adjacent pairs to pick from. Then,
$$P(x, y) \geq \frac{1}{4(m-1)\left(1 + e^{|\alpha|}\right)}.$$
Thus, we pick $a = \frac{1}{4(m-1)(1 + e^{|\alpha|})}$.
To pick b, we let
$$\partial_-(S_k) = \{y \in S_k : \exists x \in S_{k-1},\ P(x, y) > 0\}$$
for $k \in \{1, \ldots, \lfloor m/2 \rfloor\}$, and we let
$$\partial_+(S_k) = \{y \in S_k : \exists x \in S_{k+1},\ P(x, y) > 0\}$$
for $k \in \{0, \ldots, \lfloor m/2 \rfloor - 1\}$.
Additionally, let $A_k$, for $k \in \{1, \ldots, \lfloor m/2 \rfloor\}$, be the subset of $S_k$ consisting of the 2-Motzkin paths in which the first D symbol appears immediately after a U. Let $B_k$, for $k \in \{0, \ldots, \lfloor m/2 \rfloor - 1\}$, be the subset of $S_k$ consisting of the 2-Motzkin paths in which a pair of adjacent H symbols occurs before all other H or I symbols. It is easy to see that $A_k \subseteq \partial_-(S_k)$ and $B_k \subseteq \partial_+(S_k)$. We have
$$\pi(A_k) = \frac{C_k\, e^{-\alpha(k+1)}}{Z_m} \binom{m-1}{2k-1} \left(e^{-\alpha} + e^{-\beta}\right)^{m-2k},$$
as there are $C_k$ ways to arrange the U and D symbols and $\binom{m-1}{2k-1}$ ways to insert the $m - 2k$ H or I symbols (treating H and I as being identical for now) without placing anything between the first D and the U immediately before it. The energy contribution of the U and D symbols is given by $e^{-\alpha(k+1)}$, and the energy contribution of the H and I symbols is $\left(e^{-\alpha} + e^{-\beta}\right)^{m-2k}$. The required normalizing constant is $Z_m$. Similarly, we also get
$$\pi(B_k) = \frac{C_k\, e^{-\alpha(k+3)} e^{-2\beta}}{Z_m} \binom{m-1}{2k} \left(e^{-\alpha} + e^{-\beta}\right)^{m-2k-2}$$
because there are $C_k$ ways to arrange the U and D symbols and $\binom{m-1}{2k}$ ways to insert $m - 2k - 1$ H or I symbols (treating the initial pair of H's as a single symbol gives us only $m - 2k - 1$ symbols to insert). The energy contribution of the U's, D's, and the initial two H's is given by $e^{-\alpha(k+3)} e^{-2\beta}$, and the energy contribution of the remaining H's and I's is $\left(e^{-\alpha} + e^{-\beta}\right)^{m-2k-2}$. Finally, $Z_m$ is again a normalizing constant.
Hence, combining these two results, we have
$$\frac{\pi(\partial_-(S_k))}{\pi(S_k)} \geq \frac{\pi(A_k)}{\pi(S_k)} = \frac{2k}{m}$$
and
$$\frac{\pi(\partial_+(S_k))}{\pi(S_k)} \geq \frac{\pi(B_k)}{\pi(S_k)} = \frac{m - 2k}{m}\left(\frac{e^{-\alpha} e^{-\beta}}{e^{-\alpha} + e^{-\beta}}\right)^2.$$
Thus, we may let $b = \frac{1}{m}\left(\frac{e^{-\alpha} e^{-\beta}}{e^{-\alpha} + e^{-\beta}}\right)^2$.
Applying Lemma 5, we get that $\mathrm{Gap}(\bar{M}) \geq \frac{\mathrm{Gap}(M_M)}{O(m^2)}$. Additionally, one can check that $\bar{\pi}(i)$ is log-concave in i. Hence, using Lemma 7, we get $\tau_{rel}(M_M) = O(m^2)$, and in turn $\tau_{rel}(\bar{M}) = O(m^4)$, as claimed. □
Lemma 9.
$\bar{M}_k$ has mixing time $\tau(\bar{M}_k) = O(m \log m)$, for all k.
Proof. 
Notice that $\bar{M}_k$ appears as a chain with states q in the set $Q = (H + I)^{m - 2k}$. Additionally, transitions in $\bar{M}_k$ only occur between strings in Q that differ at only one index. The stationary distribution of $\bar{M}_k$ is given by $\bar{\pi}_k(q) \propto e^{(\beta - \alpha)|q|_H}$, where we have intentionally used the constant of proportionality to remove all dependence on k, which we consider, in this context, to be fixed.
Additionally, for $q_1, q_2 \in Q$ which differ at exactly one index, we have the transition probability
$$\bar{P}_k(q_1, q_2) = \begin{cases} \dfrac{(m - 2k)\, e^{-\alpha}}{4m\left(e^{-\alpha} + e^{-\beta}\right)} & \text{if } |q_2|_H = |q_1|_H + 1, \\[2mm] \dfrac{(m - 2k)\, e^{-\beta}}{4m\left(e^{-\alpha} + e^{-\beta}\right)} & \text{if } |q_2|_H = |q_1|_H - 1. \end{cases}$$
We may show that $\bar{M}_k$ rapidly mixes by a simple coupling argument. Let $(X_t, Y_t)_{t=0}^{\infty}$ be our coupled Markov chain on $Q \times Q$. We define one step in this coupled chain as follows.
  • With probability $1 - \frac{m - 2k}{4m}$, set $(X_{t+1}, Y_{t+1}) = (X_t, Y_t)$.
  • Otherwise, pick a random index $j \in [m - 2k]$. Let $a \in \{H, I\}$ be a random symbol such that $\Pr(a = H) = \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}}$ and $\Pr(a = I) = \frac{e^{-\beta}}{e^{-\alpha} + e^{-\beta}}$. Now let $X_{t+1}$ and $Y_{t+1}$ be $X_t$ and $Y_t$ respectively, each with the jth symbol changed to a.
One can check that each of $(X_t)_t$ and $(Y_t)_t$ is indeed a copy of $\bar{M}_k$. Additionally, notice that we will have $X_t = Y_t$ once all $m - 2k$ possible indices j have been updated. By the Coupon Collector Theorem, the coupling time of this chain is $T_{\bar{M}_k} = \frac{4m}{m - 2k} \cdot O((m - 2k)\log(m - 2k)) = O(m \log m)$. Thus, using Theorem 1, the mixing time (and the relaxation time) is also $O(m \log m)$. □
Lemma 10.
$\bar{M}_{k,q}$ has relaxation time $\tau_{rel}(\bar{M}_{k,q}) = O(m^2)$, for all pairs (k, q).
Proof. 
Notice that all $x \in T_{k,q}$ have equal energy, and that $|U_{k,q,s}| = \binom{m}{2k}$ for all s. Thus, $\bar{M}_{k,q}$ has a uniform stationary distribution. If we represent each set $U_{k,q,s}$ by the Dyck path s, we can think of $\bar{M}_{k,q}$ as a chain over $D_k$. Since all the transitions in $M_{k,q}$ that move between the $U_{k,q,s}$ sets are moves that exchange the positions of a U and a D, the transitions in $\bar{M}_{k,q}$ are simply the moves on elements of $D_k$ which exchange a U with a D. We call these moves on the elements of $D_k$ transposition moves.
For each $s_1, s_2 \in D_k$ that differ by a transposition move, the transition probabilities in our projection chain are given by
$$\bar{P}_{k,q}(s_1, s_2) = \frac{1}{\pi(U_{k,q,s_1})} \sum_{x \in U_{k,q,s_1}} \sum_{y \in U_{k,q,s_2}} \pi(x) P(x, y) = \frac{1}{|U_{k,q,s_1}|} \sum_{x \in U_{k,q,s_1}} \sum_{y \in U_{k,q,s_2}} P(x, y) = \frac{1}{\binom{m}{2k}} \sum_{\substack{x, y \\ P(x,y) > 0}} \frac{1}{4m^2} = \frac{1}{4m^2}.$$
The last equality above relies on counting the number of terms in the sum. Notice that for each $x \in U_{k,q,s_1}$, there is a unique $y \in U_{k,q,s_2}$ for which $P(x, y) > 0$. Therefore, the number of terms is simply $|U_{k,q,s_1}| = \binom{m}{2k}$. Compare this chain to the traditional mountain-valley Markov chain on $D_k$, which we will denote by $M'$. The transition probabilities of $M'$ are given by $P'(s_1, s_2) = \frac{1}{k^2}$ for each pair $(s_1, s_2)$ which differ by a mountain-valley move. It is known from Cohen [33] that $\mathrm{Gap}(M') = \frac{1}{O(k^2)}$. Thus, applying Lemma 1 to $\bar{M}_{k,q}$ and $M'$, we see that $\mathrm{Gap}(\bar{M}_{k,q}) = \frac{1}{O(m^2)}$. □
Lemma 11.
M k , q , s has relaxation time τ r e l ( M k , q , s ) = O ( m 3 ) , for all valid triples ( k , q , s ) .
Proof. 
Notice that transitions in $M_{k,q,s}$ consist only of moves which involve swapping an H or an I with an adjacent U or D. Additionally, all 2-Motzkin paths in $U_{k,q,s}$ have equal energy, so for all $x, y \in U_{k,q,s}$ such that $P(x, y) > 0$, we have $P(x, y) = \frac{1}{8(m-1)}$.
To determine the mixing time of $M_{k,q,s}$, consider an isomorphic chain. Let U be the set of all binary strings of length m with 2k zeros and $m - 2k$ ones. Let $M''$ be the Markov chain on U where each step does nothing with probability 7/8 and swaps a random pair of adjacent (potentially identical) digits with probability 1/8. From Wilson [34], we know that the spectral gap of $M''$ is $\frac{1}{O(m^3)}$. Identifying the U and D symbols of a path in $U_{k,q,s}$ with zeros and its H and I symbols with ones gives an isomorphism between $M_{k,q,s}$ and $M''$, so we conclude that $\tau_{rel}(M_{k,q,s}) = O(m^3)$. □
Finally, we can combine our bounds on the spectral gaps of all of these chains to prove our main result.
Theorem 4.
The Markov chain M has relaxation time τ r e l ( M ) = O ( m 7 ) , for all α , β R .
Proof. 
We use Lemmas 11 and 10 with Theorem 3 to obtain a bound on Gap ( M k , q ) . We define a coupling κ s 1 , s 2 for each pair ( s 1 , s 2 ) D k × D k with P ¯ k , q ( s 1 , s 2 ) > 0 . For each such pair, notice that the set of pairs ( x , y ) U k , q , s 1 × U k , q , s 2 with P ( x , y ) > 0 is a perfect matching. Thus, we may set
$$\kappa_{s_1,s_2}(x, y) = \begin{cases} \binom{m}{2k}^{-1} & \text{if } P(x, y) > 0, \\ 0 & \text{if } P(x, y) = 0. \end{cases}$$
To compute χ, we begin by observing that $\pi(x) = \pi(y)$ for all $x, y \in T_{k,q}$. Note also that $|U_{k,q,s}| = \binom{m}{2k}$ for all skeletons s of length 2k. Before computing χ, we start by finding $\bar{P}(s_1, s_2)$:
$$\bar{P}(s_1, s_2) = \frac{1}{\pi(U_{k,q,s_1})} \sum_{x \in U_{k,q,s_1},\, y \in U_{k,q,s_2}} \pi(x) P(x, y) = \frac{1}{\pi(U_{k,q,s_1})} \sum_{x \in U_{k,q,s_1},\, y \in U_{k,q,s_2}} \pi(x)\, \frac{1}{4m^2} = \frac{1}{\pi(U_{k,q,s_1})} \cdot |U_{k,q,s_1}|\, \frac{\pi(x)}{4m^2} = \frac{1}{4m^2}.$$
We now proceed with the calculation of χ. Recall that the minimum is taken over all tuples $(x, y, s_1, s_2)$ where $\bar{P}(s_1, s_2) > 0$ and $\kappa_{s_1,s_2}(x, y) > 0$:
$$\chi = \min \frac{\pi(x) P(x, y)}{\bar{\pi}(s_1)\, \bar{P}(s_1, s_2)\, \kappa_{s_1,s_2}(x, y)} = \min \frac{\pi(x) \cdot \frac{1}{4m^2}}{\pi(U_{k,q,s_1}) \cdot \frac{1}{4m^2} \cdot \binom{m}{2k}^{-1}} = \frac{\binom{m}{2k}}{\binom{m}{2k}} = 1.$$
Theorem 3 then gives
$$\mathrm{Gap}(M_{k,q}) \geq \min\left\{ \chi\,\mathrm{Gap}(\bar{M}_{k,q}),\ \min_s \mathrm{Gap}(M_{k,q,s}) \right\} \geq \min\left\{ \frac{1}{O(m^2)}, \frac{1}{O(m^3)} \right\} = \frac{1}{O(m^3)}.$$
Similarly, we define a coupling κ q 1 , q 2 for each pair ( q 1 , q 2 ) ( H + I ) m 2 k × ( H + I ) m 2 k with P ¯ k ( q 1 , q 2 ) > 0 to apply Theorem 3 to M ¯ k . Notice that once again, the set of pairs ( x , y ) T k , q 1 × T k , q 2 for which P ( x , y ) > 0 forms a perfect matching. Thus, we take
$$\kappa_{q_1,q_2}(x, y) = \begin{cases} \left(\binom{m}{2k} C_k\right)^{-1} & \text{if } P(x, y) > 0, \\ 0 & \text{if } P(x, y) = 0. \end{cases}$$
To compute χ for this coupling, we again begin with a few preliminary computations. In all of the following, let $x \in T_{k,q_1}$ and $y \in T_{k,q_2}$ with $\bar{P}(q_1, q_2) > 0$. Note that $q_1$ and $q_2$ have the same length and differ at only one index. We will show the computations for the case where $q_1$ has an I where $q_2$ has an H. The computations for the other case are nearly identical.
Note that $P(x, y) = \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}}$. Note also
$$\bar{\pi}(q_1) = \pi(T_{k,q_1}) = \pi(x)\,|T_{k,q_1}| = \pi(x)\, C_k \binom{m}{2k}$$
and
$$\bar{P}(q_1, q_2) = \frac{1}{\pi(T_{k,q_1})} \sum_{x \in T_{k,q_1},\, y \in T_{k,q_2}} \pi(x) P(x, y) = \frac{1}{\pi(T_{k,q_1})} \cdot \pi(x)\, |T_{k,q_1}| \cdot \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}} = \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}}.$$
Now we can compute
$$\chi = \min \frac{\pi(x) P(x, y)}{\bar{\pi}(q_1)\, \bar{P}(q_1, q_2)\, \kappa_{q_1,q_2}(x, y)} = \min \frac{\pi(x) \cdot \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}}}{\pi(x)\, C_k \binom{m}{2k} \cdot \frac{e^{-\alpha}}{e^{-\alpha} + e^{-\beta}} \cdot \left(C_k \binom{m}{2k}\right)^{-1}} = 1.$$
Applying Theorem 3 then gives
$$\mathrm{Gap}(M_k) \geq \min\left\{ \chi\,\mathrm{Gap}(\bar{M}_k),\ \min_q \mathrm{Gap}(M_{k,q}) \right\} = \min\left\{ \frac{1}{O(m \log m)}, \frac{1}{O(m^3)} \right\} = \frac{1}{O(m^3)}.$$
Unfortunately, we have not been able to find a useful coupling for $\bar{M}$, so for the last step of our decomposition, we apply Theorem 2. Since $\mathrm{Gap}(\bar{M}) = \frac{1}{O(m^4)}$ and $\mathrm{Gap}(M_k) = \frac{1}{O(m^3)}$ for all k, we have
$$\mathrm{Gap}(M) \geq \frac{1}{2}\,\mathrm{Gap}(\bar{M}) \min_{k \leq \lfloor m/2 \rfloor} \mathrm{Gap}(M_k) = \frac{1}{2} \cdot \frac{1}{O(m^4)} \cdot \frac{1}{O(m^3)} = \frac{1}{O(m^7)},$$
establishing Theorem 4. □
Finally, an application of Lemma 3 allows us to conclude that the mixing time is also polynomially-bounded.
Corollary 1.
M is rapidly mixing.
Proof. 
In order to apply Lemma 3, we need to obtain a polynomial bound on $\log(1/\pi(x))$ for all $x \in \Omega$. Let $t \in \Omega$ have minimum energy (and hence maximum weight) among all elements of Ω. For any $x \in \Omega$,
$$\log\frac{1}{\pi(x)} = \log\frac{\sum_{y \in \Omega} e^{-\alpha d_0(y) - \beta d_1(y)}}{e^{-\alpha d_0(x) - \beta d_1(x)}} \leq \log\frac{C_n\, e^{-\alpha d_0(t) - \beta d_1(t)}}{e^{-\alpha d_0(x) - \beta d_1(x)}} \leq \log\left(C_n\, e^{(|\alpha| + |\beta|)n} \cdot e^{(|\alpha| + |\beta|)n}\right) \leq 2n\log 2 + \log\frac{1}{n+1} + 2(|\alpha| + |\beta|)n,$$
where we used $|\Omega| = C_n$, the bounds $d_0(y), d_1(y) \leq n$ for every $y \in \Omega$, and $C_n \leq \frac{2^{2n}}{n+1}$.
This gives us the required polynomial bound, and therefore Lemma 3 implies that M is rapidly mixing. □

4. Discussion and Conclusions

The goal of this work was to identify a Markov chain and construct a corresponding algorithm by which to examine the non-uniform distribution and dispersion properties of NNTM RNA secondary structures and branching properties independent of a specific nucleotide sequence. This study successfully identifies the existence of a Markov chain, with a provably polynomial mixing time, which generates a Gibbs distribution on plane trees. This stationary probability distribution models branching characteristics of RNA secondary structure under the NNTM. While the exploration of sampled structures obtained from this algorithm is beyond the scope of the presented results, pseudocode (see Section 3.1) is provided to facilitate future work in this area. Below we discuss the direct applications and implications of this work to RNA modeling, the possibility of implementing a dynamic programming approach, the possibility of an approach using stochastic context-free grammars, other biological applications of this work, contributions of this work towards independent mathematical research interests, and limitations and future directions of the present work.

4.1. Applications to RNA Modeling

The most straightforward application of this work is in understanding the background distribution of the branching behavior for secondary structures predicted under the NNTM. While the NNTM is widely used to predict secondary structures from sequence data, little is known about the general branching characteristics of the predicted structures, independent of a specific input sequence. Quantities such as the number of hairpins, the maximum branching in a multiloop, the average branching in a multiloop, and the maximum ladder distance of the structure [7,35] help to characterize the branching behavior and could be computed from samples obtained from this algorithm. These quantities also have been studied in native structures and/or could be easily obtained from databases such as RNA STRAND [36]. The parameter values of α , β , and γ corresponding to various revisions of the NNTM are given in Table 1 in Section 2.1. The Markov chain and corresponding algorithms presented will enable biologists to calculate the dispersion of key branching properties for a specific energy function. As described with the detailed hairpin dispersion example in the Introduction (Section 1), knowing whether branching properties fall within acceptable dispersion limits is crucial for deducing potential functional insight or hypothesizing other scientific ramifications.
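As an illustration of how such quantities could be read off the sampler's output, the following Python sketch computes a few branching statistics from a plane tree built by Algorithm 2 (the statistic names are our own shorthand; the 'children' attribute follows the Node object assumed there, and maximum ladder distance is omitted since it depends on helix lengths):

def branching_statistics(root):
    """Simple branching statistics of a plane tree produced by Algorithm 2."""
    stats = {'hairpins': 0, 'interior_loops': 0, 'max_branching': 0}
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        degree = len(node.children)
        stats['max_branching'] = max(stats['max_branching'], degree)
        if not is_root:
            if degree == 0:
                stats['hairpins'] += 1           # leaves <-> hairpin loops
            elif degree == 1:
                stats['interior_loops'] += 1     # down-degree-1 nodes <-> interior loops
        stack.extend((child, False) for child in node.children)
    return stats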
Another key application to RNA modeling of the presented algorithms is the ability to explore the parameter space of possible values for α and β. While the various revisions of the NNTM correspond to specific values for these parameters, in principle any real-valued parameters could be used. Finding values for these parameters that approximate reality remains an open question. Yet, determining how differences in parameter values change the distribution of NNTM branching properties, such as maximum ladder distance, is crucial. Moreover, parameter space exploration is necessary to identify and further explore the phase transitions that exist. The presented Markov chain and corresponding algorithms expedite such future computational experimentation. Therefore, collectively, the presented algorithm enables exploration that will greatly improve understanding of NNTM-based RNA secondary structures and branching properties, as well as identify potential limitations or specific branching structures where the NNTM models do not sufficiently emulate reality. For example, NNTM-based free energy minimization algorithms achieved an accuracy of at least 60% in only 9% of 16S secondary structures analyzed by Doshi et al. [15].
The algorithm presented here can only sample under an energy function of the form α d 0 + β d 1 , and this does not capture the entirety of the model presented in [16], which considers energy functions of the form α d 0 + β d 1 + γ r . However, the missing term, γ r , represents the energy contribution of the exterior loop, and the exterior loop contributes less of the total free energy as sequence length increases. Therefore, when interested in sequences of at least moderate length, this algorithm may be able to provide insight, as long as information about the exterior loop is not the specific object of study. Note that other authors have made similar simplifications with respect to the exterior loop, e.g., [17].

4.2. Possibility of a Dynamic Programming Approach

This sampling problem to calculate the dispersion of NNTM RNA secondary structure and properties utilized Markov chain techniques. However, is it possible to utilize a dynamic programming algorithm? It is straightforward to sample Dyck paths under a uniform probability distribution using dynamic programming techniques. However, it is not clear whether a similar technique could be used for the Gibbs distribution we define here, due to the complexity of the energy function. In particular, large numeric computations may be required to handle the variable k, the number of U steps in a path. While Alonso presents a way to sample from the unweighted distribution $\Pr(k = l) \propto \binom{m}{2l} C_l$ in O(n) time without large computations [37], it is unclear if a similar method may be used for the present application.
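To make the first claim concrete, here is a standard dynamic programming sampler for the uniform distribution on Dyck paths, written in Python (a sketch; integer counts make each choice exact):

import random
from functools import lru_cache

def uniform_dyck(n):
    """Sample a uniformly random Dyck path of length 2n."""
    @lru_cache(maxsize=None)
    def ways(k, h):
        # number of valid completions of length k from height h,
        # staying nonnegative and ending at height 0
        if h < 0 or h > k:
            return 0
        if k == 0:
            return 1
        return ways(k - 1, h + 1) + ways(k - 1, h - 1)
    path, h = [], 0
    for k in range(2 * n, 0, -1):
        up = ways(k - 1, h + 1)
        if random.randrange(ways(k, h)) < up:   # exact probability up / ways(k, h)
            path.append('U'); h += 1
        else:
            path.append('D'); h -= 1
    return ''.join(path)

Adapting this scheme to the Gibbs distribution is exactly where the difficulty with the variable k arises.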

4.3. Possibility of an SCFG Approach

Stochastic context-free grammars (SCFGs) have been widely used in the field of RNA secondary structure prediction, e.g., [38,39,40,41]. Most commonly, the probabilities for production rules in an SCFG are determined by training on a set of known secondary structures, often including covariance information from homologous structures. These approaches are not immediately applicable to the problem we study here, as they do not give any insight into the NNTM multiloop energy parameters.
However, some authors have constructed SCFGs based on the NNTM. In particular, Nebel and Scheid [38] construct an SCFG with 29 distinct production rules to mirror the NNTM features. They also present a sampling algorithm allowing for sampling structures of a fixed size using the grammar. However, they do not actually compute probabilities for the production rules that would allow one to sample from a Gibbs distribution (with NNTM energy) and instead rely on training on a set of known structures. Indeed, it is not clear from the paper whether such a set of probabilities must exist.
Even in the case of the simplified model we present in this manuscript, it is not clear how to assign probabilities to production rules in an SCFG so that the probability of obtaining a given structure matches the Gibbs probability under the NNTM. See Supplement 1: SCFG for more details.
Even if a suitable SCFG could be formulated, the SCFG approach is not necessarily superior. The sampling algorithm presented by Nebel and Scheid has time complexity $O(n^3)$ and space complexity $O(n^2)$. While the algorithm we present has a large time complexity, it requires only linear space, which may be an advantage for some applications.
Even though we cannot easily formulate an SCFG, it is reasonable to ask whether a context-free grammar (such as that presented in Supplement 1) could nonetheless serve as the basis for a dynamic programming algorithm. In fact, this is possible. The key idea is to create a table for each non-terminal symbol X and populate entry k of the table with
$$\sum_{t} e^{-E(t)},$$
where the sum is taken over all trees $t \in T_k$ which can be derived from the symbol X.
Once the tables have been populated with these (non-normalized) probabilities, a stochastic backtracking procedure can be used to obtain samples.
However, as in Section 4.2, the assumption that each arithmetic operation can be performed in unit time is not appropriate here. Because the entries of the dynamic programming tables are in fact pieces of the partition function, the numbers involved may have up to $O(n)$ digits, making each arithmetic operation substantially more expensive. While a polynomial-time dynamic programming algorithm based on a context-free grammar is therefore possible, an efficient dynamic programming algorithm would require substantially more work.
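The following sketch illustrates the table-plus-backtracking idea on the grammar S → U S D S for Dyck paths. As a simplified stand-in for the full NNTM energy, each path is weighted by λ raised to its number of returns to the x-axis (the Gibbs-type distribution studied by Martin and Randall [30], mentioned in Section 4.5 below); for the sketch, λ is assumed to be a positive integer so that exact integer arithmetic applies. It reuses uniform_dyck_path from the sketch in Section 4.2.

```python
import random
from math import comb

def catalan(i):
    # The i-th Catalan number counts Dyck paths of semilength i.
    return comb(2 * i, i) // (i + 1)

def weighted_tables(n, lam):
    """Z[j] = sum over Dyck paths of semilength j of lam**(#returns to the x-axis)."""
    C = [catalan(i) for i in range(n + 1)]
    Z = [1] + [0] * n
    for j in range(1, n + 1):
        # First-return decomposition: path = U P D Q, where P (semilength i)
        # is elevated and contributes no returns, and the closing D is one return.
        Z[j] = lam * sum(C[i] * Z[j - 1 - i] for i in range(j))
    return C, Z

def sample_weighted(n, lam, C, Z):
    """Stochastic backtracking through the first-return decomposition."""
    if n == 0:
        return ""
    r = random.randrange(Z[n])
    for i in range(n):
        w = lam * C[i] * Z[n - 1 - i]
        if r < w:
            # The elevated part is uniform among Dyck paths of semilength i.
            return "U" + uniform_dyck_path(i) + "D" + sample_weighted(n - 1 - i, lam, C, Z)
        r -= w
```

Because Z[n] is (a piece of) the partition function, its integer entries grow to Θ(n) digits, which is precisely the arithmetic-cost caveat noted above.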

4.4. Extended Applications

The Markov chain mixing analysis techniques explored in this manuscript have potential applications in a variety of fields. Markov chain Monte Carlo algorithms are widely used in machine learning [42], econometrics [43], and Bayesian statistics [44], among other fields. In virtually all applications, an understanding of mixing time increases confidence in the results, and in some situations it may also allow for more efficient algorithm selection and implementation.
While many Markov chains with nonuniform stationary distributions have been used for biological applications (e.g., [45,46,47,48]), theoretical guarantees on the mixing time are generally not known. Instead, researchers must rely on convergence heuristics, and in fact, many introductions to Markov chain Monte Carlo written for biologists explain such heuristic techniques [49,50,51,52]. Of course, heuristics can be misleading, and rigorous mixing time guarantees would be significantly preferable. The same techniques used in this work might be used to generate algorithms with rigorous mixing time bounds for other biological problems concerning a nonuniform distribution.
The mathematical techniques used in this manuscript have been widely used in mathematics, physics, and computer science, demonstrating their broader applicability. For numerous examples, we direct the reader to the books of Levin, Peres, and Wilmer [53]; Montenegro and Tetali [54]; and Jerrum [55].
As an example where similar techniques have found utility in biological applications, it is interesting to briefly consider the study of cladograms, which arise from phylogenetic trees. Mathematically, a cladogram is a binary tree with n labeled leaves and n − 2 unlabeled internal nodes. While an explicit formula is known for the exact number of cladograms of a given size, mixing time under certain dynamics has also been studied. For example, Aldous [56] studied a Markov chain in which a leaf is removed at random and then attached to a random edge of the tree, proving that the relaxation time is bounded below by order $n^2$ and above by order $n^3$. Further work by Schweinsberg [57] later established an upper bound of $O(n^2)$ on the relaxation time, closing the gap between the upper and lower bounds.

4.5. Independent Mathematical Research Interests

The plane trees examined as a model for RNA secondary structure are of independent mathematical interest. As Catalan objects, they have been studied combinatorially (see, for example, [25,58]), and Markov chains on Catalan objects have received significant attention over the years [33,34,59,60,61], most commonly discussed in the language of Dyck paths; however, very few results provide tight estimates on the corresponding mixing times. Cohen's thesis [33] gives an overview of the known mixing time results for chains on Catalan objects. All of the chains surveyed there have the uniform distribution over the Catalan-sized state space as their stationary distribution. Among these, essentially the only chain with tight bounds (upper and lower bounds differing by a small multiplicative constant) is due to Wilson [34], who obtained a relaxation time of $O(n^3)$ for the walk consisting of adjacent transpositions on Dyck paths. In comparison, the chain using all (allowed) transpositions has been shown to have relaxation time $O(n^2)$ [59], and is further conjectured to have relaxation time $O(n)$, in analogy with the random transposition shuffle of n cards.
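As a concrete illustration of the kind of chain being discussed, the following sketch implements one step of the adjacent-transposition walk on Dyck paths analyzed by Wilson [34] (our rendering for illustration, not code from that paper). Because the proposal is symmetric and invalid moves are rejected, the chain's stationary distribution is uniform.

```python
import random

def is_dyck(path):
    """Check that a U/D string never dips below the x-axis and ends at height 0."""
    h = 0
    for s in path:
        h += 1 if s == "U" else -1
        if h < 0:
            return False
    return h == 0

def adjacent_transposition_step(path):
    """Propose swapping steps i and i+1; accept only if still a Dyck path."""
    p = list(path)
    i = random.randrange(len(p) - 1)
    p[i], p[i + 1] = p[i + 1], p[i]
    proposal = "".join(p)
    return proposal if is_dyck(proposal) else path
```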
Judging from the limited progress on several of these chains, determining the mixing or relaxation time is typically a challenging problem, even when the stationary distribution is uniform.
In the current work, the RNA secondary structure modeling naturally leads to a state-space on Catalan objects with a nonuniform distribution, making the corresponding mixing time analysis even more challenging. Another example where mixing times are estimated for Markov chains on Catalan objects with nonuniform stationary distribution is the work of Martin and Randall [30], which examines a Gibbs distribution on Dyck paths weighted by the number of returns to the x-axis.

4.6. Limitations and Future Directions

While the mixing time bound proved here is polynomial, it is almost certainly too large to certify any practical computational sampling experiment. However, we conjecture that the actual mixing time is much smaller, and future work may provide a better bound. Even without additional theoretical results, interesting work is possible using the algorithm we present together with heuristic methods for evaluating Markov chain mixing; see ([62], Ch. 8) for a discussion of heuristic methods for monitoring Markov chain convergence. One such heuristic is sketched below.
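For instance, a minimal sketch of the Gelman–Rubin diagnostic, one standard convergence heuristic (not a method from the present paper), applied to a scalar statistic recorded along several independent runs of the chain:

```python
# Gelman-Rubin statistic (R-hat); values near 1 are consistent with,
# but do not prove, convergence of the chains.
def gelman_rubin(chains):
    """chains: m >= 2 equal-length lists of a scalar statistic
    (e.g., maximum ladder distance) from independent runs."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain and within-chain variance estimates.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n   # pooled variance estimate
    return (var_hat / W) ** 0.5
```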
The results of this study provide an important mathematical foundation for examining the dispersion of RNA secondary structures and branching properties using a Markov chain. However, more work is necessary to optimize the developed computational application for incorporation into the software used by biologists who study RNA. Example questions that compel further investigation include:
  • Can the mixing time bound in our main result be improved?
  • Is there a rapidly mixing chain, with the same stationary distribution studied here, whose transitions correspond naturally to moves on the set of plane trees? Mixing time bounds on the chain of matching exchange moves, as defined in [63], would be especially interesting, as such a chain may relate to RNA folding kinetics.
  • Is there a rapidly mixing chain converging to the Gibbs distribution under the full energy function of the NNTM model used here [16]? The chain presented here uses only the parameters α and β, setting γ = 0.
  • Is there a stochastic context-free grammar which generates secondary structures (in our simplified model or using the full NNTM) according to a Gibbs distribution with NNTM energy?

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/2297-8747/25/4/67/s1. Supplement 1: SCFG.

Author Contributions

Conceptualization, A.K., K.P.; investigation, A.K., K.P., P.T., C.M.; formal analysis, A.K., K.P.; funding acquisition, A.K., K.P., P.T., C.M.; methodology, A.K., K.P.; supervision, P.T., C.M.; validation, A.K., K.P., P.T., C.M.; writing—original draft, A.K., K.P., C.M.; writing—review and editing, A.K., P.T., C.M. All authors have read and approved the published version of the manuscript.

Funding

Funding provided by the National Science Foundation Graduate Research Fellowship Program to A.K.; the Georgia Institute of Technology President's Undergraduate Research Award to K.P.; National Science Foundation grant DMS-1811935 to P.T.; and a National Science Foundation CAREER award (1944247), National Institutes of Health grant R21CA232249, and an Alzheimer's Association Research Grant to C.M.

Acknowledgments

The authors would like to thank Christine Heitsch for generous consultation. The authors would also like to thank the reviewers for their insightful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RNA    ribonucleic acid
NNTM   Nearest Neighbor Thermodynamic Model
SCFG   stochastic context-free grammar

References

  1. Doudna, J.A. Structural genomics of RNA. Nat. Struct. Biol. 2000, 7, 954–956.
  2. Tinoco, I., Jr.; Bustamante, C. How RNA folds. J. Mol. Biol. 1999, 293, 271–281.
  3. Massire, C.; Westhof, E. MANIP: An interactive tool for modelling RNA. J. Mol. Graph. Model. 1998, 16, 197–205.
  4. Seetin, M.G.; Mathews, D.H. Automated RNA tertiary structure prediction from secondary structure and low-resolution restraints. J. Comput. Chem. 2011, 32, 2232–2244.
  5. Zhao, Y.; Gong, Z.; Xiao, Y. Improvements of the Hierarchical Approach for Predicting RNA Tertiary Structure. J. Biomol. Struct. Dyn. 2011, 28, 815–826.
  6. Zhao, Y.; Huang, Y.; Gong, Z.; Wang, Y.; Man, J.; Xiao, Y. Automated and fast building of three-dimensional RNA structures. Sci. Rep. 2012, 2, 734.
  7. Borodavka, A.; Singaram, S.W.; Stockley, P.G.; Gelbart, W.M.; Ben-Shaul, A.; Tuma, R. Sizes of long RNA molecules are determined by the branching patterns of their secondary structures. Biophys. J. 2016, 111, 2077–2085.
  8. Jaeger, J.A.; Turner, D.H.; Zuker, M. Improved predictions of secondary structures for RNA. Proc. Natl. Acad. Sci. USA 1989, 86, 7706–7710.
  9. Mathews, D.H.; Sabina, J.; Zuker, M.; Turner, D.H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 1999, 288, 911–940.
  10. Mathews, D.H.; Disney, M.D.; Childs, J.L.; Schroeder, S.J.; Zuker, M.; Turner, D.H. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. USA 2004, 101, 7287–7292.
  11. Ding, Y.; Chan, C.Y.; Lawrence, C.E. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res. 2004, 32, W135–W141.
  12. Hofacker, I.L.; Fontana, W.; Stadler, P.F.; Bonhoeffer, L.S.; Tacker, M.; Schuster, P. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chemical Monthly 1994, 125, 167–188.
  13. Mathuriya, A.; Bader, D.A.; Heitsch, C.E.; Harvey, S.C. GTfold: A scalable multicore code for RNA secondary structure prediction. In Proceedings of the 2009 ACM Symposium on Applied Computing, Honolulu, HI, USA, 8–12 March 2009; pp. 981–988.
  14. Turner, D.H.; Mathews, D.H. NNDB: The nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2009, 38, D280–D282.
  15. Doshi, K.J.; Cannone, J.J.; Cobaugh, C.W.; Gutell, R.R. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinf. 2004, 5, 105.
  16. Hower, V.; Heitsch, C.E. Parametric analysis of RNA branching configurations. Bull. Math. Biol. 2011, 73, 754–776.
  17. Bakhtin, Y.; Heitsch, C.E. Large deviations for random trees and the branching of RNA secondary structures. Bull. Math. Biol. 2009, 71, 84–106.
  18. Heitsch, C.; Poznanović, S. Combinatorial insights into RNA secondary structure. In Discrete and Topological Models in Molecular Biology; Springer: Berlin, Germany, 2014; pp. 145–166.
  19. Lorenz, R.; Bernhart, S.H.; Zu Siederdissen, C.H.; Tafer, H.; Flamm, C.; Stadler, P.F.; Hofacker, I.L. ViennaRNA Package 2.0. Algorithms Mol. Biol. 2011, 6, 26.
  20. Donaghey, R.; Shapiro, L.W. Motzkin numbers. J. Comb. Theory Ser. A 1977, 23, 291–301.
  21. Bernhart, F.R. Catalan, Motzkin, and Riordan numbers. Discret. Math. 1999, 204, 73–112.
  22. Eu, S.P.; Fu, T.S.; Hou, J.T.; Hsu, T.W. Standard Young tableaux and colored Motzkin paths. J. Comb. Theory Ser. A 2013, 120, 1786–1803.
  23. Baril, J.L.; Kirgizov, S.; Petrossian, A. Motzkin paths with a restricted first return decomposition. Integers 2019, 19, A46.
  24. Fang, W. A partial order on Motzkin paths. Discret. Math. 2020, 343, 111802.
  25. Stanley, R.P. Enumerative Combinatorics: Volume 1, 2nd ed.; Cambridge University Press: New York, NY, USA, 2011.
  26. Deutsch, E.; Shapiro, L.W. A bijection between ordered trees and 2-Motzkin paths and its many consequences. Discret. Math. 2002, 256, 655–670.
  27. Madras, N.; Randall, D. Markov chain decomposition for convergence rate analysis. Ann. Appl. Probab. 2002, 12, 581–606.
  28. Randall, D. Rapidly Mixing Markov Chains with Applications in Computer Science and Physics. Comput. Sci. Eng. 2006, 8, 30–41.
  29. Luby, M.; Randall, D.; Sinclair, A. Markov Chain Algorithms for Planar Lattice Structures. SIAM J. Comput. 2001, 31, 167–192.
  30. Martin, R.; Randall, D. Sampling adsorbing staircase walks using a new Markov chain decomposition method. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, 12–14 November 2000; pp. 492–502.
  31. Hermon, J.; Salez, J. Modified log-Sobolev inequalities for strong-Rayleigh measures. arXiv 2019, arXiv:1902.02775.
  32. Jerrum, M.; Son, J.B.; Tetali, P.; Vigoda, E. Elementary bounds on Poincaré and log-Sobolev constants for decomposable Markov chains. Ann. Appl. Probab. 2004, 14, 1741–1765.
  33. Cohen, E. Problems in Catalan Mixing and Matchings in Regular Hypergraphs. Ph.D. Thesis, Georgia Institute of Technology, Atlanta, GA, USA, 2016.
  34. Wilson, D.B. Mixing times of lozenge tiling and card shuffling Markov chains. Ann. Appl. Probab. 2004, 14, 274–325.
  35. Yoffe, A.M.; Prinsen, P.; Gopal, A.; Knobler, C.M.; Gelbart, W.M.; Ben-Shaul, A. Predicting the sizes of large RNA molecules. Proc. Natl. Acad. Sci. USA 2008, 105, 16153–16158.
  36. Andronescu, M.; Bereg, V.; Hoos, H.H.; Condon, A. RNA STRAND: The RNA secondary structure and statistical analysis database. BMC Bioinf. 2008, 9, 340.
  37. Alonso, L. Uniform generation of a Motzkin word. Theor. Comput. Sci. 1994, 134, 529–536.
  38. Nebel, M.E.; Scheid, A. Evaluation of a sophisticated SCFG design for RNA secondary structure prediction. Theory Biosci. 2011, 130, 313–336.
  39. Rivas, E.; Eddy, S.R. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 2000, 16, 583–605.
  40. Rivas, E.; Eddy, S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999, 285, 2053–2068.
  41. Knudsen, B.; Hein, J. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res. 2003, 31, 3423–3428.
  42. Andrieu, C.; De Freitas, N.; Doucet, A.; Jordan, M.I. An introduction to MCMC for machine learning. Mach. Learn. 2003, 50, 5–43.
  43. Chib, S. Introduction to simulation and MCMC methods. In The Oxford Handbook of Bayesian Econometrics; Oxford University Press: Oxford, UK, 2011.
  44. Jackman, S. Estimation and inference via Bayesian simulation: An introduction to Markov chain Monte Carlo. Am. J. Political Sci. 2000, 44, 375–404.
  45. Huelsenbeck, J.P.; Larget, B.; Alfaro, M.E. Bayesian Phylogenetic Model Selection Using Reversible Jump Markov Chain Monte Carlo. Mol. Biol. Evol. 2004, 21, 1123–1133.
  46. Kim, S.; Li, H.; Dougherty, E.R.; Cao, N.; Chen, Y.; Bittner, M.; Suh, E.B. Can Markov chain models mimic biological regulation? J. Biol. Syst. 2002, 10, 337–357.
  47. Lusseau, D. Effects of tour boats on the behavior of bottlenose dolphins: Using Markov chains to model anthropogenic impacts. Conserv. Biol. 2003, 17, 1785–1793.
  48. Said, M.R.; Oppenheim, A.V.; Lauffenburger, D.A. Modeling cellular signal processing using interacting Markov chains. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 6–10 April 2003.
  49. Gelman, A.; Rubin, D.B. Markov chain Monte Carlo methods in biostatistics. Stat. Methods Med. Res. 1996, 5, 339–355.
  50. Hamra, G.; MacLehose, R.; Richardson, D. Markov Chain Monte Carlo: An Introduction for Epidemiologists. Int. J. Epidemiol. 2013, 42, 627–634.
  51. Nascimento, F.F.; dos Reis, M.; Yang, Z. A biologist's guide to Bayesian phylogenetic analysis. Nat. Ecol. Evol. 2017, 1, 1446–1454.
  52. Van Ravenzwaaij, D.; Cassey, P.; Brown, S.D. A simple introduction to Markov chain Monte Carlo sampling. Psychon. Bull. Rev. 2018, 25, 143–154.
  53. Levin, D.A.; Peres, Y.; Wilmer, E.L. Markov Chains and Mixing Times; American Mathematical Society: Providence, RI, USA, 2009.
  54. Montenegro, R.R.; Tetali, P. Mathematical Aspects of Mixing Times in Markov Chains; Now Publishers Inc.: Boston, MA, USA, 2006.
  55. Jerrum, M. Counting, Sampling and Integrating: Algorithms and Complexity; Springer Science & Business Media: New York, NY, USA, 2003.
  56. Aldous, D.J. Mixing time for a Markov chain on cladograms. Comb. Probab. Comput. 2000, 9, 191–204.
  57. Schweinsberg, J. An O(n²) bound for the relaxation time of a Markov chain on cladograms. Random Struct. Algorithms 2002, 20, 59–70.
  58. Dershowitz, N.; Zaks, S. Ordered trees and non-crossing partitions. Discret. Math. 1986, 62, 215–218.
  59. Cohen, E.; Tetali, P.; Yeliussizov, D. Lattice path matroids: Negative correlation and fast mixing. arXiv 2015, arXiv:1505.06710.
  60. McShine, L.; Tetali, P. On the mixing time of the triangulation walk and other Catalan structures. In Proceedings of Randomization Methods in Algorithm Design, Princeton, NJ, USA, 12–14 December 1997; pp. 147–160.
  61. Sohoni, M. Rapid mixing of some linear matroids and other combinatorial objects. Graphs Comb. 1999, 15, 93–107.
  62. Robert, C.P.; Casella, G. Introducing Monte Carlo Methods with R; Springer: Berlin, Germany, 2010.
  63. Heitsch, C.E.; Tetali, P. Meander graphs. In Proceedings of the 23rd International Conference on Formal Power Series and Algebraic Combinatorics, Reykjavik, Iceland, 13–17 June 2011; pp. 469–480.
Figure 1. A ribonucleic acid (RNA) secondary structure for one of the combinatorial RNA sequences used in this work and its corresponding plane tree. The ordering of the edges in the plane tree is derived from the 3’ to 5’ ordering of the RNA sequence. Note that the exterior loop corresponds to the root of the plane tree. The diagram in (a) was generated by ViennaRNA [19]. (a) A maximally-paired secondary structure for A4(C5GA4CG5A4)4 has 4 helices; (b) The corresponding plane tree has 4 edges and encodes the branching pattern seen in the secondary structure.
Figure 2. A plane tree with edges labeled according to the bijection Φ, along with its corresponding 2-Motzkin path.
Figure 3. The four-level decomposition of $M_m^2$ (left), and the projection chains corresponding to each decomposition (right).
Table 1. Nearest Neighbor Thermodynamic Model (NNTM) parameters and resulting energy functions. Energy functions are of the form $\alpha d_0 + \beta d_1 + \gamma r$.

| Y | Z | Turner | a   | b   | c    | h     | f   | i   | g    | α    | β    | γ    |
|---|---|--------|-----|-----|------|-------|-----|-----|------|------|------|------|
| C | G | 89     | 4.6 | 0.4 | 0.1  | −10.9 | 3.8 | 3.0 | −1.6 | −0.9 | −1.8 | −1.7 |
| G | C | 89     | 4.6 | 0.4 | 0.1  | −16.5 | 3.5 | 3.0 | −1.9 | −0.9 | −1.2 | −1.7 |
| C | G | 99     | 3.4 | 0   | 0.4  | −12.9 | 4.5 | 2.3 | −1.6 | −2.3 | 1.3  | −0.4 |
| G | C | 99     | 3.4 | 0   | 0.4  | −16.9 | 4.1 | 2.3 | −1.9 | −2.2 | 1.9  | −0.4 |
| C | G | 04     | 9.3 | 0   | −0.9 | −12.9 | 4.5 | 2.3 | −1.1 | −2.8 | −3.0 | −0.9 |
| G | C | 04     | 9.3 | 0   | −0.9 | −16.9 | 4.1 | 2.3 | −1.5 | −2.8 | −2.2 | −0.9 |
