Article

Asymptotic Description of Neural Networks with Correlated Synaptic Weights

INRIA Sophia Antipolis Méditerranée, 2004 Route des Lucioles, 06410 Sophia Antipolis, France
*
Author to whom correspondence should be addressed.
Entropy 2015, 17(7), 4701-4743; https://doi.org/10.3390/e17074701
Submission received: 13 February 2015 / Revised: 23 May 2015 / Accepted: 23 June 2015 / Published: 6 July 2015
(This article belongs to the Special Issue Entropic Aspects in Statistical Physics of Complex Systems)

Abstract:
We study the asymptotic law of a network of interacting neurons when the number of neurons becomes infinite. Given a completely connected network of neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic law of the network when the number of neurons goes to infinity. We introduce the process-level empirical measure of the trajectories of the solutions to the equations of the finite network of neurons and the averaged law (with respect to the synaptic weights) of the trajectories of the solutions to the equations of the network of neurons. The main result of this article is that the image law through the empirical measure satisfies a large deviation principle with a good rate function which is shown to have a unique global minimum. Our analysis of the rate function also allows us to characterize the limit measure as the image of a stationary Gaussian measure defined on a transformed set of trajectories.

1. Introduction

The goal of this paper is to study the asymptotic behavior and large deviations of a network of interacting neurons when the number of neurons becomes infinite. Our network may be thought of as a network of weakly-interacting diffusions: thus before we begin we briefly overview other asymptotic analyses of such systems. In particular, a lot of work has been done on spin glass dynamics, including Ben Arous and Guionnet on the mathematical side [1–4] and Sompolinsky and his co-workers on the theoretical physics side [5–8]. Furthermore, the large deviations of weakly interacting diffusions have been extensively studied by Dawson and Gärtner [9,10], and more recently by Budhiraja, Dupuis and Fischer [11,12]. More references to previous work on this particular subject can be found in these references.
Because the dynamics of spin glasses is not too far from that of networks of interacting neurons, Sompolinsky also successfully explored this particular topic [13] for fully connected networks of rate neurons, i.e., neurons represented by the time variation of their firing rates (the number of spikes they emit per unit of time), as opposed to spiking neurons, i.e., neurons represented by the time variation of their membrane potential (including the individual spikes). For an introduction to these notions, the interested reader is referred to such textbooks as [14–16]. In their study of the continuous-time dynamics of networks of rate neurons, Sompolinsky and his colleagues assumed, as in the work on spin glasses, that the coupling coefficients, called the synaptic weights in neuroscience, were independent and identically distributed random variables with zero-mean Gaussian laws. The main result obtained by Ben Arous and Guionnet for spin glass networks using a large deviations approach (respectively by Sompolinsky and his colleagues for networks of rate neurons using the local chaos hypothesis) under the previous hypotheses is that the averaged law of Langevin spin glass (respectively rate neuron) dynamics is chaotic in the sense that the averaged law of a finite number of spins (respectively neurons) converges to a product measure as the system gets very large.
The next theoretical efforts in the direction of understanding the averaged law of rate neurons are those of Cessac, Moynot and Samuelides [17–21]. From the technical viewpoint, the study of the collective dynamics is done in discrete time, assuming no leak (this term is explained below) in the individual dynamics of each of the rate neurons. Moynot and Samuelides obtained a large deviation principle, were able to describe in detail the limit averaged law that had been obtained by Cessac using the local chaos hypothesis, and proved rigorously the propagation of chaos property. Moynot extended these results to the more general case where the neurons can belong to two populations, the synaptic weights are non-Gaussian (with some restrictions) but still independent and identically distributed, and the network is not fully connected (with some restrictions) [18].
The common thread of all the above approaches is that, in the large network limit, the neurons are (probabilistically) independent of each other. This independence is desirable because it facilitates a reduction to the macroscopic level, since the net activity of the network can be accurately represented by the mean activity of any particular neuron. However, as our results further below demonstrate, complete independence between the neurons is not the only situation in which one may obtain an accurate reduction to the macroscopic level. We are therefore motivated to incorporate in the network model the fact that the synaptic weights are not independent and are in fact often highly correlated. One of the reasons for this is the plasticity processes at work at the level of the synaptic connections between neurons; see for example [22] for a biological viewpoint, and [14,16,23] for a more computational and mathematical account of these phenomena.
Our results imply that there are system-wide correlations between the neurons, even in the asymptotic limit. The key reason why we do not have propagation of chaos is that the Radon-Nikodym derivative $\frac{dQ^N}{dP^{\otimes N}}$ of the averaged laws in Proposition 8 cannot be tensored into $N$ independent and identically distributed processes, whereas the simpler assumptions on the weight function Λ in Moynot and Samuelides allow the Radon-Nikodym derivative to be tensored. We remind the reader that the Radon-Nikodym derivative of a measure with respect to another measure is an extension to more general spaces of the following simple result from differential calculus: given two differentiable functions $F(x)$ and $G(x)$ defined on $\mathbb{R}$ with derivatives $f(x)$ and $g(x)$, the ratio of the differentials $dF(x)$ and $dG(x)$ is equal to $f(x)/g(x)$ whenever $g(x) \neq 0$. In this example, the first measure is $f(x)\,dx$ and the second $g(x)\,dx$. The interested reader may look at standard textbooks on real and complex analysis such as [24].
A very important implication of our result is that the mean-field behavior is insufficient to characterize the behavior of a population. Our limit process $\mu_e$ is system-wide and ergodic. Our work challenges the assumption held by some that one cannot have a "concise" macroscopic description of a neural network without an assumption of asynchronicity at the local population level.
In more detail, the problem we solve in this paper is the following. Given a completely connected network of firing-rate neurons in which the synaptic weights are Gaussian correlated random variables, we describe the asymptotic behavior of the network when the number of neurons goes to infinity. As in [18,19] we study a discrete-time dynamics, but unlike these authors we cope with more complex intrinsic dynamics of the neurons; in particular we allow for a leak (to be explained in more detail below). In the large-size limit, the neurons are highly correlated. The probabilistic law is ergodic, which basically means that it is invariant under a shift of the indices. Despite the non-trivial correlations, we are able to obtain a macroscopic process $\mu_e$ which describes the large-size behavior of the system. Furthermore, we are able to obtain various "reductions" to the macroscopic level, as outlined in Section 6.
To be complete, let us mention that this problem has already been partially explored in physics by Sompolinsky and Zippelius [5,6] and in mathematics by Alice Guionnet [4], who analyzed symmetric spin glass dynamics, i.e., the case where the matrix of the coupling coefficients (the synaptic weights in our case) is symmetric. This is a very special case of correlation. The work in [25] is also an important step forward in the direction of understanding spin glass dynamics when more general correlations are present.
Let us also mention very briefly another class of approaches toward the description of very large populations of neurons where the individual spikes generated by the neurons are considered. The model for individual neurons is usually of the class of Integrate and Fire (IF) neurons [26] and the underlying mathematical tools are those of the theory of point-processes [27]. Important results have been obtained in this framework by Gerstner and his collaborators, e.g., [28,29] in the case of deterministic synaptic weights. Related to this approach but from a more mathematical viewpoint, important results on the solutions of the mean-field equations have been obtained in [30]. In the case of spiking neurons but with a continuous dynamics (unlike that of IF neurons), the first author and collaborators have recently obtained some limit equations that describe the asymptotic dynamics of fully connected networks of neurons [31] with independent synaptic weights.
Because of the correlation of the synaptic weights, the natural space to work in is the infinite-dimensional space, noted $\mathcal{T}^{\mathbb{Z}}$, of the trajectories of a countably-infinite set of neurons, and the set of stationary probability measures defined on this set, noted $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$.
We introduce the process-level empirical measure, noted $\hat\mu_N$, of the $N$ trajectories of the solutions to the equations of the network of $N$ neurons, and the averaged (with respect to the synaptic weights) law $Q^N$ of these trajectories. The first result of this article (Theorem 1) is that the image law $\Pi^N$ of $Q^N$ through $\hat\mu_N$ satisfies a large deviation principle (LDP) with a good rate function $H$ which is shown to have a unique global minimum, $\mu_e$. We remind the reader that the notion of an image law is simply the extension of the usual notion of a change of variables to more complicated objects than functions, namely probability measures. The interested reader is referred to, e.g., the textbook [32]. Thus, with respect to the measure $\Pi^N$ on $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$, if the set $X$ contains the measure $\delta_{\mu_e}$, then $\Pi^N(X) \to 1$ as $N \to \infty$, whereas if $\delta_{\mu_e}$ is not in the closure of $X$, $\Pi^N(X) \to 0$ exponentially fast as $N \to \infty$, and the constant in the exponential rate is determined by the rate function. Our analysis of the rate function also allows us, and this is our second result (Theorem 3), to characterize the limit measure $\mu_e$ as the image of a stationary Gaussian measure $\overline{\mu_e}$ defined on a transformed set of trajectories $\mathcal{T}^{\mathbb{Z}}$. This is potentially very useful for applications since $\overline{\mu_e}$ can be completely characterized by its mean and spectral density. Furthermore, the rate function allows us to quantify the probability of finite-size effects. Theorems 1 and 3 allow us to characterize the average (over the synaptic weights) behavior of the network. We also derive, and this is our third result, some properties of the infinite-size network that are true for almost all realizations of the synaptic weights (Theorems 4 and 6).
The paper is organized as follows. In Section 2 we describe the equations of our network of neurons and the type of correlation between the synaptic weights, define the proper state spaces, and introduce the different probability measures that are necessary for establishing our results, in particular the process-level empirical measure $\hat\mu_N$, the law $\Pi^N$, and the image $R^N$ through $\hat\mu_N$ of the law of the uncoupled neurons. We state the principal result of this paper in Theorem 1.
In Section 3 we introduce a certain Gaussian process attached to a given measure in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ or $\mathcal{M}_{1,S}^+(\mathcal{T}^N)$ and motivate this introduction by showing that the Radon-Nikodym derivative of $Q^N$ with respect to the law of the uncoupled neurons can be expressed through the Gaussian process corresponding to the empirical measure $\hat\mu_N$. This allows us to compute the Radon-Nikodym derivative of $\Pi^N$ with respect to $R^N$ for any measure in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$. Using these results, Section 4 is dedicated to the proof of the existence of a strong LDP for the measures $\Pi^N$. In Section 5 we show that the good rate function obtained in the previous section has a unique global minimum and we characterize it as the image of a stationary Gaussian measure. Section 6 is dedicated to drawing some important consequences of our first main theorem, in particular some quenched results. Section 7 explores some possible extensions of our work and we conclude with Section 8.

2. The Neural Network Model

We consider a fully connected network of $N$ neurons. Not all sets of neurons are fully connected but many are, e.g., within the same cortical column. One of the major aims of this article is to quantify how quickly the system converges to its limit, so the rate function gives us a means of assessing whether the number of neurons in a cortical column is sufficiently high for the mean field equations to be accurate. For simplicity but without loss of generality, we assume $N$ odd [33] and write $N = 2n+1$, $n \geq 0$. The state of the neurons is described by the variables $(U_t^j)$, $j = -n,\dots,n$, $t = 0,\dots,T$, which represent the values of the neurons' membrane potentials.

2.1. The Model Equations

The equation describing the time variation of the membrane potential $U^j$ of the $j$th neuron writes
$$U_t^j = \gamma U_{t-1}^j + \sum_{i=-n}^{n} J_{ji}^N f\bigl(U_{t-1}^i\bigr) + B_{t-1}^j, \qquad j = -n,\dots,n,\quad t = 1,\dots,T. \tag{1}$$
$f: \mathbb{R} \to\, ]0,1[$ is a monotonically increasing bijection which we assume to be Lipschitz continuous. Its Lipschitz constant is noted $k_f$. We could for example employ $f(x) = (1 + \tanh(gx))/2$, where the parameter $g$ can be used to control the slope of the "sigmoid" $f$ at the origin $x = 0$.
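For that particular sigmoid the Lipschitz constant can be read off directly; the following short computation is only an illustration of how $k_f$ relates to the gain parameter $g$ (it plays no role in the sequel):
$$f'(x) = \frac{g}{2}\bigl(1 - \tanh^2(gx)\bigr) \leq \frac{g}{2}, \qquad\text{so that}\qquad k_f = \sup_{x\in\mathbb{R}}|f'(x)| = \frac{g}{2},$$
the supremum being attained at $x = 0$, consistent with $g$ controlling the slope at the origin.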
This equation involves the parameters $\gamma$, $J_{ij}^N$, and the time processes $B_t^j$, $i,j = -n,\dots,n$, $t = 0,\dots,T-1$. The initial conditions are discussed at the beginning of Section 2.2.2.
γ is in [0, 1) and determines the time scale of the intrinsic dynamics, i.e., without interactions, of the neurons. If γ = 0 the dynamics is said to have no leak.
The $B_t^j$'s represent random fluctuations of the membrane potential of neuron $j$. They are independent random processes with the same law. We assume that at each time instant $t$, the $B_t^j$'s are independent and identically distributed random variables distributed as $\mathcal{N}_1(0,\sigma^2)$ [34].
The $J_{ij}^N$'s are the synaptic weights. $J_{ij}^N$ represents the strength with which the "presynaptic" neuron $j$ influences the "postsynaptic" neuron $i$. They are Gaussian random variables, independent of the membrane fluctuations, whose mean is given by
$$E\bigl[J_{ij}^N\bigr] = \frac{\bar J}{N}, \tag{2}$$
where $\bar J$ is some number independent of $N$.
We note $J^N$ the $N \times N$ matrix of the synaptic weights, $J^N = (J_{ij}^N)_{i,j=-n,\dots,n}$. Their covariance is assumed to satisfy the following shift-invariance property:
$$\mathrm{cov}\bigl(J_{ij}^N, J_{kl}^N\bigr) = \mathrm{cov}\bigl(J_{i+m,j+n}^N, J_{k+m,l+n}^N\bigr)$$
for all indexes $i, j, k, l = -n,\dots,n$ and all integers $m$ and $n$, the indexes being taken modulo $N$. Here, and throughout this paper, $i \bmod N$ is taken to lie between $-n$ and $n$.
Remark 1. This shift invariance property is technically useful since it allows us to use the tools of Fourier analysis. In terms of the neural population it means that the neurons “live” on a circle. Therefore, unlike in the uncorrelated case studied in the papers cited in the introduction, we have to indirectly introduce a notion of space.
We stipulate the covariances through a covariance function $\Lambda: \mathbb{Z}^2 \to \mathbb{R}$ and assume that they scale as $1/N$. We write
$$\mathrm{cov}\bigl(J_{ij}^N, J_{kl}^N\bigr) = \frac{1}{N}\Lambda\bigl((k-i) \bmod N,\ (l-j) \bmod N\bigr). \tag{3}$$
The function $\Lambda$ is even:
$$\Lambda(-k,-l) = \Lambda(k,l),$$
corresponding to the simultaneous exchange of the two presynaptic and postsynaptic neurons ($\mathrm{cov}(J_{ij}^N, J_{kl}^N) = \mathrm{cov}(J_{kl}^N, J_{ij}^N)$!). To be a well-defined covariance function, $\Lambda$ must be a positive-definite function, i.e., satisfy the following property: for any $(b_{ij})_{i,j \in [-n,n]}$,
$$\sum_{i,j,k,l=-n}^{n} b_{ij}\, b_{kl}\, \Lambda(k-i,\, l-j) \geq 0.$$
This implies that the two-dimensional Fourier transform of $\Lambda$ (also called the power spectral density) is positive, see also Proposition 1 below. Furthermore, for any $k, l \in \mathbb{Z}^+$, $\Lambda(0,0) \geq |\Lambda(k,l)|$.
We must make further assumptions on $\Lambda$ to ensure that the system is well-behaved as the number of neurons $N$ goes to infinity. We assume that the series $(\Lambda(k,l))_{k,l\in\mathbb{Z}}$ is absolutely convergent, i.e.,
$$\Lambda_{sum} = \sum_{k,l=-\infty}^{\infty}|\Lambda(k,l)| < \infty, \tag{4}$$
and furthermore that
$$\Lambda_{\min} = \sum_{k,l=-\infty}^{\infty}\Lambda(k,l) > 0. \tag{5}$$
We let $\Lambda^N$ be the restriction of $\Lambda$ to $[-n,n]^2$, i.e., $\Lambda^N(i,j) = \Lambda(i,j)$ for $-n \leq i,j \leq n$.
We next introduce the spectral properties of $\Lambda$ that are crucial for the results in this paper. We use throughout the paper the notation that if $x$ is some quantity, $\tilde x$ represents its Fourier transform, in a sense that depends on the particular space where $x$ is defined. For example, $\tilde\Lambda$ is the $2\pi$ doubly periodic Fourier transform of the function $\Lambda$, whose properties are described in the next proposition. Similarly, $\tilde\Lambda^N$ is the two-dimensional Discrete Fourier Transform (DFT) of the doubly periodic sequence $\Lambda^N$. The proof of the following proposition is obvious.
Proposition 1. The sum $\tilde\Lambda(\theta_1,\theta_2)$ of the absolutely convergent series $(\Lambda(k,l)\, e^{-i(k\theta_1 + l\theta_2)})_{k,l}$ is continuous on $[-\pi,\pi[^2$ and positive. The covariance function $\Lambda$ is recovered from the inverse Fourier transform of $\tilde\Lambda$:
$$\Lambda(k,l) = \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\tilde\Lambda(\theta_1,\theta_2)\, e^{i(k\theta_1 + l\theta_2)}\, d\theta_1\, d\theta_2.$$
Moreover, there exists $\tilde\Lambda_{\min} > 0$ such that
$$\tilde\Lambda^N(0,0) \geq \tilde\Lambda_{\min} > 0$$
for all $N$.
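To make these assumptions concrete, the following sketch (not part of the paper's argument) builds a hypothetical covariance function $\Lambda$, checks the positivity of its two-dimensional DFT numerically, draws a weight matrix $J^N$ with mean $\bar J/N$ and covariance (3) by circulant sampling, and then simulates Equation (1); the separable exponential choice of $\Lambda$ and all parameter values are illustrative assumptions only:

```python
import numpy as np

def sample_weights(N, Lam, Jbar=1.0, rng=None):
    """Draw the N x N weight matrix J^N with E[J_ij] = Jbar/N and
    cov(J_ij, J_kl) = Lam((k - i) mod N, (l - j) mod N) / N, cf. Equation (3)."""
    rng = rng or np.random.default_rng(0)
    n = N // 2
    idx = np.arange(-n, n + 1)
    # Restriction Lambda^N on [-n, n]^2, rolled so that index (0, 0) comes first.
    LamN = np.array([[Lam(k, l) for l in idx] for k in idx])
    LamN = np.roll(LamN, (-n, -n), axis=(0, 1))
    spec = np.fft.fft2(LamN).real                 # 2D DFT: tilde-Lambda^N
    assert spec.min() > -1e-9, "Lambda is not positive-definite on the grid"
    # Circulant sampling: real stationary Gaussian field with covariance Lambda^N.
    w = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    field = (N * np.fft.ifft2(np.sqrt(np.clip(spec, 0.0, None)) * w)).real
    return Jbar / N + field / np.sqrt(N)

# Hypothetical covariance function, even and absolutely summable ((4) and (5)).
Lam = lambda k, l: 0.5 * np.exp(-abs(k) - abs(l))
N, T, gamma, sigma = 51, 200, 0.5, 1.0
J = sample_weights(N, Lam)

# Simulating Equation (1): gamma is the leak, f the sigmoid, B_t^j the noise.
rng = np.random.default_rng(1)
f = lambda x: 0.5 * (1.0 + np.tanh(2.0 * x))
U = np.zeros((T + 1, N))
U[0] = rng.standard_normal(N)                     # i.i.d. initial law mu_I
for t in range(1, T + 1):
    U[t] = gamma * U[t - 1] + J @ f(U[t - 1]) + sigma * rng.standard_normal(N)
```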

2.2. The Laws of the Uncoupled and Coupled Processes

2.2.1. Preliminaries

Sets of Trajectories, Temporal and Spatial Projections
The time evolution of one membrane potential is represented by the set $\mathbb{R}^{[0,T]} := \mathcal{T}$ of finite sequences $(u_t)_{t=0,\dots,T}$ of length $T+1$ of real numbers. $\mathcal{T}^N$ is the set of sequences $(u^{-n},\dots,u^n)$ ($N = 2n+1$) of elements of $\mathcal{T}$ that we use to describe the solutions to Equation (1). Similarly, we note $\mathcal{T}^{\mathbb{Z}}$ the set of doubly infinite sequences of elements of $\mathcal{T}$. If $u$ is in $\mathcal{T}^{\mathbb{Z}}$ we note $u^i$, $i \in \mathbb{Z}$, its $i$th coordinate, an element of $\mathcal{T}$. Hence $u = (u^i)_{i=-\infty}^{\infty}$.
Given the integers $s$ and $t$, $0 \leq s \leq t \leq T$, we define the temporal projection $\pi_{s,t}: \mathcal{T} \to \mathbb{R}^{[s,t]} := \mathcal{T}_{s,t}$, the set of finite sequences of length $t-s+1$ of real numbers, by $\pi_{s,t}(u) = (u_r)_{r=s}^{t} := u_{s,t}$. When $s = t$ we note $\pi_t$ and $\mathcal{T}_t$ rather than $\pi_{t,t}$ and $\mathcal{T}_{t,t}$. The temporal projection $\pi_{s,t}$ extends in a natural way to $\mathcal{T}^N$ and $\mathcal{T}^{\mathbb{Z}}$: for example $\pi_{s,t}$ maps $\mathcal{T}^N$ to $\mathcal{T}_{s,t}^N$. We define the spatial projection $\pi^N: \mathcal{T}^{\mathbb{Z}} \to \mathcal{T}^N$ ($N = 2n+1$) to be $\pi^N(u) = (u^{-n},\dots,u^n)$. Temporal and spatial projections commute, i.e., $\pi^N \circ \pi_{s,t} = \pi_{s,t} \circ \pi^N$.
The shift operator $S: \mathcal{T}^{\mathbb{Z}} \to \mathcal{T}^{\mathbb{Z}}$ is defined by
$$(Su)^i = u^{i+1}, \qquad i \in \mathbb{Z}. \tag{6}$$
Given the element $u = (u^{-n},\dots,u^n)$ of $\mathcal{T}^N$ we form the doubly infinite periodic sequence
$$p_N(u) = \bigl(\dots, u^{n-1}, u^n, u^{-n}, \dots, u^n, u^{-n}, u^{-n+1}, \dots\bigr), \tag{7}$$
which is an element of $\mathcal{T}^{\mathbb{Z}}$. We have $(p_N(u))^i = u^{i \bmod N}$; $p_N$ is a mapping $\mathcal{T}^N \to \mathcal{T}^{\mathbb{Z}}$. With a slight abuse of notation we also note $S$ the shift operator induced by $S$ on $\mathcal{T}^N$ through the function $p_N$:
$$Su = \pi^N\bigl(S\, p_N(u)\bigr), \qquad u \in \mathcal{T}^N. \tag{8}$$
Topologies on the Sets of Trajectories
We equip $\mathcal{T}^{\mathbb{Z}}$ with the projective topology, i.e., the topology generated by the following metric. For $u, v \in \mathcal{T}^N$, we define their distance $d_N(u,v)$ to be
$$d_N(u,v) = \sup_{|j|\leq n,\ 0\leq s\leq T}\bigl|u_s^j - v_s^j\bigr|.$$
This allows us to define the following metric over $\mathcal{T}^{\mathbb{Z}}$, whereby if $u, v \in \mathcal{T}^{\mathbb{Z}}$ then
$$d(u,v) = \sum_{N=1}^{\infty} 2^{-N}\bigl(d_N\bigl(\pi^N u, \pi^N v\bigr) \wedge 1\bigr),$$
where $a \wedge b$ is the smallest of $a$ and $b$. Equipped with this topology, $\mathcal{T}^{\mathbb{Z}}$ is Polish (a complete, separable metric space).
The metric $d$ generates the Borelian $\sigma$-algebra $\mathcal{B}(\mathcal{T}^{\mathbb{Z}}) := \mathcal{B}$. It is generated by the coordinate functions $(u_t^i)$, $i \in \mathbb{Z}$, $t = 0,\dots,T$. The spatial and temporal projections defined above can be used to define the corresponding $\sigma$-algebras on the sets $\mathcal{T}_{s,t}^N$, e.g., $\mathcal{B}_{s,t}^N = \pi^N\bigl(\pi_{s,t}(\mathcal{B})\bigr)$, $0 \leq s \leq t \leq T$.
Probability Measures on the Sets of Trajectories
We note $\mathcal{M}_1^+(\mathcal{T}^{\mathbb{Z}})$ (respectively $\mathcal{M}_1^+(\mathcal{T}^N)$) the set of probability measures on $(\mathcal{T}^{\mathbb{Z}}, \mathcal{B})$ (respectively $(\mathcal{T}^N, \mathcal{B}^N)$).
For $\mu \in \mathcal{M}_1^+(\mathcal{T}^{\mathbb{Z}})$, we denote its marginal distribution at time $t$ by $\mu_t = \mu \circ \pi_t^{-1}$. Similarly, $\mu_{s,t}^N$ is its $N$-dimensional spatial, $(t-s+1)$-dimensional temporal marginal $\mu \circ (\pi^N)^{-1} \circ \pi_{s,t}^{-1}$.
We denote the conditional probability distribution of $\mu$, given $U_0^j = u_0^j$ (for all $j$), by $\mu^{u_0}$. This is understood to be a probability measure over $\mathcal{T}_{1,T}^{\mathbb{Z}}$ equipped with $\mathcal{B}(\mathcal{T}_{1,T}^{\mathbb{Z}}) := \mathcal{B}_{1,T}$.
We note $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ the set of stationary probability measures on $\mathcal{T}^{\mathbb{Z}}$: given a random variable $u$ with values in $\mathcal{T}^{\mathbb{Z}}$ governed by $\mu$ in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$, the random variable $Su$ is governed by $\mu$ as well, with $S$ the shift operator defined by Equation (6) (equivalently, $\mu \circ S^{-1} = \mu$). With a slight abuse of notation, we define $\mathcal{M}_{1,S}^+(\mathcal{T}^N)$ to be the set of all $\mu^N \in \mathcal{M}_1^+(\mathcal{T}^N)$ satisfying the following property: if $(u^{-n},\dots,u^n)$ are random variables governed by $\mu^N$, then for all $|m| \leq n$, $(u^{m-n},\dots,u^{m+n})$ has the same law as $(u^{-n},\dots,u^n)$ (recall that the indexing is taken modulo $N$), or equivalently $\mu^N \circ S^{-1} = \mu^N$ (remember Equation (8)).
Remark 2. Note that the stationarity discussed here is a spatial stationarity.
Process-Level Empirical Measure
We next introduce the following process-level empirical measure, see e.g., [35]. Given an element $u = (u^{-n},\dots,u^n)$ in $\mathcal{T}^N$ we associate with it the measure, noted $\hat\mu_N(u^{-n},\dots,u^n)$, in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ defined by
$$\hat\mu_N: \mathcal{T}^N \to \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \quad\text{such that}\quad d\hat\mu_N(u^{-n},\dots,u^n)(y) = \frac{1}{N}\sum_{i=-n}^{n}\delta_{S^i p_N(u)}(y), \tag{10}$$
where $S$ is the shift operator defined in Equation (6).
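Concretely, integrating an observable against $\hat\mu_N(u)$ amounts to averaging the observable over the $N$ cyclic shifts of the periodized configuration $p_N(u)$. A minimal sketch, with a hypothetical observable that only looks at a finite window of coordinates:

```python
import numpy as np

def empirical_integral(F, u, window):
    """Integrate F against the process-level empirical measure of Equation (10).
    u has shape (N, T+1); F reads the coordinates 0, ..., window-1 of a
    configuration, which for S^i p_N(u) are the rows (i + j) mod N of u."""
    N = u.shape[0]
    return np.mean([F(u[(i + np.arange(window)) % N]) for i in range(N)])

# Example: a pair statistic between neighbouring neurons at the final time.
u = np.random.default_rng(0).standard_normal((51, 11))   # N = 51, T = 10
print(empirical_integral(lambda blk: blk[0, -1] * blk[1, -1], u, window=2))
```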
Remark 3. This is a significant difference from previous work dealing with uncorrelated weights (e.g., [19]), where the $N$ processes are coupled through the "usual" empirical measure $d\hat\mu_N(u^{-n},\dots,u^n)(y) = \frac{1}{N}\sum_{i=-n}^{n}\delta_{u^i}(y)$, which is a measure on $\mathcal{T}$. In our case, because of the correlations and as shown in Section 3.4, the processes are coupled through the process-level empirical measure of Equation (10), which is a probability measure on $\mathcal{T}^{\mathbb{Z}}$. This makes our analysis more biologically realistic, since we know that correlations between the synaptic weights do exist, but technically more involved.
Topology on Sets of Measures
We next equip $\mathcal{M}_1^+(\mathcal{T}^{\mathbb{Z}})$ with the topology of weak convergence, as follows. For $\mu^N, \nu^N \in \mathcal{M}_1^+(\mathcal{T}^N)$, we note $D_N$ the Wasserstein distance induced by the metric $k_f\, d_N(u,v) \wedge 1$:
$$D_N\bigl(\mu^N, \nu^N\bigr) = \inf_{J}\bigl\{E_J\bigl[k_f\, d_N(u,v) \wedge 1\bigr]\bigr\}, \tag{11}$$
where $k_f$ is the positive constant defined at the start of Section 2.1 and $J$ is the set of all measures in $\mathcal{M}_1^+(\mathcal{T}^N \times \mathcal{T}^N)$ with $N$-dimensional marginals $\mu^N$ and $\nu^N$.
Remark 4. The use of $k_f$ in Equation (11) is technical and serves to simplify the proof of Proposition 5.
For $\mu, \nu \in \mathcal{M}_1^+(\mathcal{T}^{\mathbb{Z}})$, we define
$$D(\mu, \nu) = 2\sum_{n=0}^{\infty}\kappa_n D_N\bigl(\mu^N, \nu^N\bigr), \tag{12}$$
where $N = 2n+1$. Here $\kappa_n = \max\bigl(\lambda_n, 2^{-N}\bigr)$ and $\lambda_n = \sum_{k=-\infty}^{\infty}|\Lambda(k,n)|$. We note that this metric is well-defined because $D_N(\mu^N, \nu^N) \leq 1$ and $\sum_{n=0}^{\infty}\kappa_n < \infty$. It can be shown that $\mathcal{M}_1^+(\mathcal{T}^{\mathbb{Z}})$ equipped with this metric is Polish.

2.2.2. Coupled and Uncoupled Processes

We specify the initial conditions for Equation (1) as $N$ independent and identically distributed random variables $(U_0^j)_{j=-n,\dots,n}$. Let $\mu_I$ be the individual law on $\mathbb{R}$ of $U_0^j$; it follows that the joint law of the variables is $\mu_I^{\otimes N}$ on $\mathbb{R}^N$. We note $P$ the law of the solution to one of the uncoupled equations (1), where we take $J_{ij}^N = 0$, $i,j = -n,\dots,n$. $P$ is the law of the solution to the following stochastic difference equation:
$$U_t = \gamma U_{t-1} + B_{t-1}, \qquad t = 1,\dots,T, \tag{13}$$
the law of the initial condition being µI. This process can be characterized exactly, as follows.
Let $\Psi: \mathcal{T} \to \mathcal{T}$ be the following bicontinuous bijection. Writing $v = \Psi(u)$, we define
$$\begin{cases} v_s = \Psi_s(u) = u_s - \gamma u_{s-1}, & s = 1,\dots,T, \\ v_0 = \Psi_0(u) = u_0. \end{cases} \tag{14}$$
The following proposition is evident from Equations (13) and (14).
Proposition 2. The law $P$ of the solution to Equation (13) writes
$$P = \bigl(\mathcal{N}_T(0_T, \sigma^2\mathrm{Id}_T) \otimes \mu_I\bigr) \circ \Psi, \tag{15}$$
where $0_T$ is the $T$-dimensional vector with coordinates equal to 0 and $\mathrm{Id}_T$ is the $T$-dimensional identity matrix.
We later employ the convention that if $u = (u^{-n},\dots,u^n) \in \mathcal{T}^N$ then $\Psi(u) = (\Psi(u^{-n}),\dots,\Psi(u^n))$. A similar convention applies if $u \in \mathcal{T}^{\mathbb{Z}}$. We also use the notation $\Psi_{1,T}$ for the mapping $\mathcal{T} \to \mathcal{T}_{1,T}$ such that $\Psi_{1,T} = \pi_{1,T} \circ \Psi$.
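Proposition 2 also gives a direct recipe for sampling the uncoupled law $P$: draw $u_0 \sim \mu_I$ and $v_{1,T} \sim \mathcal{N}_T(0_T, \sigma^2\mathrm{Id}_T)$, then invert $\Psi$, which is the recursion $u_t = \gamma u_{t-1} + v_t$. A minimal sketch, with a standard normal standing in for $\mu_I$:

```python
import numpy as np

def sample_P(n_traj, T, gamma, sigma, rng=np.random.default_rng(0)):
    """Sample trajectories from the uncoupled law P of Equation (13) by
    applying Psi^{-1} to N_T(0_T, sigma^2 Id_T) (x) mu_I, cf. Proposition 2."""
    u = np.empty((n_traj, T + 1))
    u[:, 0] = rng.standard_normal(n_traj)           # placeholder for mu_I
    v = sigma * rng.standard_normal((n_traj, T))    # the Gaussian part
    for t in range(1, T + 1):                       # u_t = gamma u_{t-1} + v_t
        u[:, t] = gamma * u[:, t - 1] + v[:, t - 1]
    return u
```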
Reintroducing the coupling between the neurons, we note $Q^N(J^N)$ the element of $\mathcal{M}_1^+(\mathcal{T}^N)$ which is the law of the solution to Equation (1) conditioned on $J^N$. We let $Q^N = E^J\bigl[Q^N(J^N)\bigr]$ be the law averaged with respect to the weights. The reason for this is as follows. We want to study the empirical measure $\hat\mu_N$ on path space. There is no reason for this to be a simple problem since, for a fixed interaction $J^N$, the variables $(U^{-n},\dots,U^n)$ are not exchangeable. So we first study the law of $\hat\mu_N$ averaged over the interaction, before proving in Section 6 some almost sure properties of this law. $Q^N$ is a common construction in the physics of interacting particle systems and is known as the annealed law [36].
We may thus infer the following lemma.
Lemma 1. $P^{\otimes N}$, $Q^N$ and $\hat\mu_N^N$ (the $N$-dimensional marginal of $\hat\mu_N$) are in $\mathcal{M}_{1,S}^+(\mathcal{T}^N)$.
Since the application $\Psi$ defined in Equation (14) plays a central role in the sequel, we introduce the following definition.
Definition 1. For each measure $\mu \in \mathcal{M}_1^+(\mathcal{T}^N)$ or $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ we define $\bar\mu$ to be $\mu \circ \Psi^{-1}$.
In particular, note that
$$\bar P = \mathcal{N}_T\bigl(0_T, \sigma^2\mathrm{Id}_T\bigr) \otimes \mu_I.$$
Finally we introduce the image laws in terms of which the principal results of this paper are formulated.
Definition 2. Let $\Pi^N$ (respectively $R^N$) be the image law of $Q^N$ (respectively $P^{\otimes N}$) through the function $\hat\mu_N: \mathcal{T}^N \to \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ defined by Equation (10).
The central result of this paper is in the next theorem.
Theorem 1. $\Pi^N$ is governed by a large deviation principle (LDP) with a good rate function $H$ (to be found in Definition 5). That is, if $F$ is a closed set in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$, then
$$\varlimsup_{N\to\infty} N^{-1}\log\Pi^N(F) \leq -\inf_{\mu\in F} H(\mu). \tag{16}$$
Conversely, for all open sets $O$ in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$,
$$\varliminf_{N\to\infty} N^{-1}\log\Pi^N(O) \geq -\inf_{\mu\in O} H(\mu). \tag{17}$$
Here $\varlimsup_N$ is the lim-sup and $\varliminf_N$ is the lim-inf. By "good rate function" we mean that for all $a \geq 0$, the following set is compact:
$$\{\nu : H(\nu) \leq a\},$$
see for example [37,38].
Remark 5. We recall that the above LDP is also called a strong LDP.
Our proof of Theorem 1 proceeds in several steps. We prove in Sections 4.1 and 4.3 that $\Pi^N$ satisfies a weak LDP, i.e., that it satisfies Equation (16) when $F$ is compact and Equation (17) for all open $O$. We also prove in Section 4.2 that $\{\Pi^N\}$ is exponentially tight, and in Section 4.4 that $H$ is a good rate function. It directly follows from these results that $\Pi^N$ satisfies a strong LDP with good rate function $H$ [38]. Finally, in Section 5 we prove that $H$ has a unique minimum $\mu_e$, to which $\hat\mu_N$ converges weakly as $N \to \infty$. This minimum is a (stationary) Gaussian measure which we describe in detail in Theorem 3.

3. The Good Rate Function

In the sections to follow we will obtain an LDP for the process with correlations ($\Pi^N$) via the (simpler) process without correlations ($R^N$). However, to do this we require an expression for the Radon-Nikodym derivative of $\Pi^N$ with respect to $R^N$, which is the main result of this section. The derivative will be expressed in terms of a function $\Gamma_{[N]}: \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \to \mathbb{R}$. We will first define $\Gamma_{[N]}(\mu)$, demonstrating that it may be expressed in terms of a Gaussian process $G_{[N]}^\mu$ (to be defined below), and then use this to determine the Radon-Nikodym derivative of $\Pi^N$ with respect to $R^N$.

3.1. Gaussian Processes

Given $\mu$ in $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ we define a stationary Gaussian process $G^\mu$ with values in $\mathcal{T}_{1,T}^{\mathbb{Z}}$. For all $i$ the mean of $G_t^{\mu,i}$ is given by $c_t^\mu$, where
$$c_t^\mu = \bar J \int_{\mathcal{T}^{\mathbb{Z}}} f\bigl(u_{t-1}^i\bigr)\, d\mu(u), \qquad t = 1,\dots,T,\ i \in \mathbb{Z}, \tag{18}$$
the above integral being well-defined because of the definition of $f$ (which maps the state variable to the activity, see the text after Equation (1)) and independent of $i$ due to the stationarity of $\mu$.
We now define the covariance of $G^\mu$. We first define the following matrix-valued process.
Definition 3. Let $M^{\mu,k}$, $k \in \mathbb{Z}$, be the $T \times T$ matrix defined by (for $s, t \in [1,T]$)
$$M_{st}^{\mu,k} = \int_{\mathcal{T}^{\mathbb{Z}}} f\bigl(u_{s-1}^0\bigr) f\bigl(u_{t-1}^k\bigr)\, d\mu(u), \tag{19}$$
the above integral being well-defined because of the definition of f.
These matrices satisfy
$${}^t M^{\mu,k} = M^{\mu,-k}, \tag{20}$$
because of the stationarity of $\mu$. Recall that ${}^t(\cdot)$ denotes the transpose. Furthermore, they feature a spectral representation, i.e., there exists a $T \times T$ matrix-valued measure $\tilde M^\mu = (\tilde M_{st}^\mu)_{s,t=1,\dots,T}$ with the following properties. Each $\tilde M_{st}^\mu$ is a complex measure on $[-\pi,\pi[$ of finite total variation and such that
$$M^{\mu,k} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\theta}\, \tilde M^\mu(d\theta). \tag{21}$$
Relations (20) and (21) imply the following relations, for all Borelian sets $A \subset [-\pi,\pi[$:
$${}^t\tilde M^\mu(A) = \tilde M^\mu(-A) = \tilde M^\mu(A)^*, \tag{22}$$
where $*$ indicates complex conjugation. We may infer from this that $\tilde M^\mu$ is Hermitian-valued. The spectral representation means that for all vectors $W \in \mathbb{R}^T$, ${}^t W \tilde M^\mu(d\theta) W$ is a positive measure on $[-\pi,\pi[$.
The covariance between the Gaussian vectors $G^{\mu,i}$ and $G^{\mu,i+k}$ is defined to be
$$K^{\mu,k} = \sum_{l=-\infty}^{\infty} \Lambda(k,l)\, M^{\mu,l}. \tag{23}$$
We note that the above summation converges for all $k \in \mathbb{Z}$ since the series $(\Lambda(k,l))_{k,l\in\mathbb{Z}}$ is absolutely convergent and the entries of $M^{\mu,l}$ are bounded by 1 for all $l \in \mathbb{Z}$.
It follows immediately from the definition that for $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ and $k \in \mathbb{Z}$ we have
$${}^t K^{\mu,k} = K^{\mu,-k}. \tag{24}$$
This is necessary for the covariance function to be well-defined. The following proposition may be easily proved from the above definitions.
Proposition 3. The sequence $(K^{\mu,k})_{k\in\mathbb{Z}}$ has spectral density $\tilde K^\mu$ given by
$$\tilde K^\mu(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \tilde\Lambda(\theta,-\varphi)\, \tilde M^\mu(d\varphi). \tag{25}$$
That is, $\tilde K^\mu$ is Hermitian positive and satisfies ${}^t\tilde K^\mu(\theta) = \tilde K^\mu(-\theta)$ and $K^{\mu,k} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\theta}\, \tilde K^\mu(\theta)\, d\theta$.
Proof. The proof essentially consists of demonstrating that the matrix function
$$\tilde K^\mu(\theta) = \sum_{k=-\infty}^{\infty} K^{\mu,k}\, e^{-ik\theta}$$
is well-defined on $[-\pi,\pi[$ and is equal to the expression in the statement of the proposition. Afterwards, we will prove that $\tilde K^\mu$ is positive.
From Equation (23) we obtain that, for all $s, t \in [1,\dots,T]$,
$$\bigl|K_{st}^{\mu,k}\bigr| \leq \sum_{l=-\infty}^{\infty} |\Lambda(k,l)|.$$
This shows that, because by Equation (4) the series $(\Lambda(k,l))_{k,l\in\mathbb{Z}}$ is absolutely convergent, $\tilde K^\mu(\theta)$ is well-defined on $[-\pi,\pi[$. The fact that $\tilde K^\mu(\theta)$ is Hermitian follows from Equations (25) and (24).
Combining Equations (21), (23) and (25) we write
$$\tilde K^\mu(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl(\sum_{m=-\infty}^{\infty}\sum_{k=-\infty}^{\infty}\Lambda(k,m)\, e^{-i(k\theta - m\varphi)}\Bigr)\, \tilde M^\mu(d\varphi).$$
This can be rewritten in terms of the spectral density $\tilde\Lambda$ of $\Lambda$:
$$\tilde K^\mu(\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde\Lambda(\theta,-\varphi)\, \tilde M^\mu(d\varphi).$$
We note that $\tilde K^\mu(\theta)$ is positive because, for all vectors $W$ of $\mathbb{R}^T$,
$${}^t W\tilde K^\mu(\theta) W = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde\Lambda(\theta,-\varphi)\,\bigl({}^t W\tilde M^\mu(d\varphi) W\bigr),$$
the spectral density $\tilde\Lambda$ is positive and the measure ${}^t W\tilde M^\mu(d\varphi) W$ is positive. The identity ${}^t\tilde K^\mu(\theta) = \tilde K^\mu(-\theta)$ follows from Equation (24). □
We may also define the $N$-dimensional Gaussian process $G_{[N]}^\mu$ with values in $\mathcal{T}_{1,T}^N$ as follows. The mean of $G_{[N]}^{\mu,i}$, $i = -n,\dots,n$, is given by Equation (18) (or rather its finite-dimensional analog) and the covariance between $G_{[N]}^{\mu,i}$ and $G_{[N]}^{\mu,i+k}$ is given by
$$K_{[N]}^{\mu,k} = \sum_{m=-n}^{n} \Lambda(k,m)\, M^{\mu,m}, \tag{28}$$
for $k = -n,\dots,n$. Equation (24) holds for $K_{[N]}^{\mu,k}$, $k = -n,\dots,n$. This finite sequence has a Hermitian positive Discrete Fourier Transform denoted by $\tilde K_{[N]}^{\mu,l}$, for $l = -n,\dots,n$.
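As a purely numerical illustration, the matrices $M^{\mu,m}$ of Definition 3 and the covariances $K_{[N]}^{\mu,k}$ of Equation (28) can be estimated from one sample of a spatially stationary measure by replacing the integrals with empirical averages over the cyclic shifts; the estimator below is a sketch under that assumption:

```python
import numpy as np

def estimate_K_N(u, Lam, f=np.tanh):
    """Estimate the blocks K_[N]^{mu,k} of Equation (28), k = -n, ..., n,
    from one sample u of shape (N, T+1) of a (spatially) stationary measure."""
    N, n = u.shape[0], u.shape[0] // 2
    F = f(u[:, :-1])                    # f(u_{s-1}^i) for s = 1, ..., T
    # M^{mu,m}_{st}: empirical average of f(u_{s-1}^0) f(u_{t-1}^m) over shifts.
    M = {m: sum(np.outer(F[i], F[(i + m) % N]) for i in range(N)) / N
         for m in range(-n, n + 1)}
    return {k: sum(Lam(k, m) * M[m] for m in range(-n, n + 1))
            for k in range(-n, n + 1)}
```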

3.2. Convergence of Gaussian Processes

The finite-dimensional system "converges" to the infinite-dimensional system in the following sense. In what follows, and throughout the paper, we use the Frobenius norm on the $T \times T$ matrices. We write $\tilde K_{[N]}^\mu(\theta) = \sum_{k=-n}^{n} K_{[N]}^{\mu,k}\exp(-ik\theta)$. Note that for $|j| \leq n$, $\tilde K_{[N]}^\mu(2\pi j/N) = \tilde K_{[N]}^{\mu,j}$. The lemma below follows directly from the absolute convergence of $\sum_{j,k}|\Lambda(j,k)|$.
Lemma 2. Fix $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$. For all $\varepsilon > 0$, there exists an $N$ such that for all $M > N$ and all $j$ such that $2|j|+1 \leq M$, $\|K_{[M]}^{\mu,j} - K^{\mu,j}\| < \varepsilon$, and for all $\theta \in [-\pi,\pi[$, $\|\tilde K_{[M]}^\mu(\theta) - \tilde K^\mu(\theta)\| \leq \varepsilon$.
Lemma 3. The eigenvalues of $\tilde K_{[N]}^{\mu,l}$ and $\tilde K^\mu(\theta)$ are upper-bounded by $\rho_K \stackrel{\mathrm{def}}{=} T\Lambda_{sum}$, where $\Lambda_{sum}$ is defined in Equation (4).
Proof. Let $W \in \mathbb{R}^T$. We find from Proposition 3 and Equation (4) that
$${}^t W\tilde K^\mu(\theta) W \leq \frac{\Lambda_{sum}}{2\pi}\int_{-\pi}^{\pi}{}^t W\tilde M^\mu(d\varphi) W = \Lambda_{sum}\,{}^t W M^{\mu,0} W.$$
The eigenvalues of $M^{\mu,0}$ are all positive (since it is a correlation matrix), which means that each eigenvalue is upper-bounded by the trace, which in turn is upper-bounded by $T$. The proof in the finite-dimensional case follows similarly. □
We note $K_{[N]}^\mu$ the $(NT \times NT)$ covariance matrix of the sequence of Gaussian random variables $(G_{[N]}^{\mu,-n},\dots,G_{[N]}^{\mu,n})$. Because of the properties of the matrices $K_{[N]}^{\mu,k}$, $k = -n,\dots,n$, this is a symmetric block circulant matrix. It is also positive, being a covariance matrix.
We let $A_{[N]}^\mu = K_{[N]}^\mu\bigl(\sigma^2\mathrm{Id}_{NT} + K_{[N]}^\mu\bigr)^{-1}$. This is well-defined because $K_{[N]}^\mu$ is diagonalizable (being symmetric and real) and has positive eigenvalues (being a covariance matrix). It follows from Lemma 20 in Appendix A that this is a symmetric block circulant matrix, with blocks $A_{[N]}^{\mu,k}$ ($k = -n,\dots,n$) such that
$${}^t A_{[N]}^{\mu,k} = A_{[N]}^{\mu,-k}$$
and that the matrices
$$\tilde A_{[N]}^{\mu,l} = \sum_{k=-n}^{n} A_{[N]}^{\mu,k}\, e^{-\frac{2\pi i k l}{N}} = \tilde K_{[N]}^{\mu,l}\bigl(\sigma^2\mathrm{Id}_T + \tilde K_{[N]}^{\mu,l}\bigr)^{-1}$$
are Hermitian positive.
In the limit $N \to \infty$ we may define
$$\tilde A^\mu(\theta) = \tilde K^\mu(\theta)\bigl(\sigma^2\mathrm{Id}_T + \tilde K^\mu(\theta)\bigr)^{-1}.$$
The Fourier series of $\tilde A^\mu$ is absolutely convergent as a consequence of Wiener's theorem. We thus find that, for $l \in \mathbb{Z}$,
$$A^{\mu,l} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde A^\mu(\theta)\, e^{il\theta}\, d\theta = \lim_{N\to\infty} A_{[N]}^{\mu,l},$$
and $\tilde A^\mu(\theta) = \sum_{l=-\infty}^{\infty} A^{\mu,l} e^{-il\theta}$. Let $\tilde A_{[N]}^\mu(\theta) = \sum_{k=-n}^{n} A_{[N]}^{\mu,k}\exp(-ik\theta)$ and note that for $|j| \leq n$, $\tilde A_{[N]}^\mu(2\pi j/N) = \tilde A_{[N]}^{\mu,j}$.
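Numerically, passing from $\tilde K^\mu$ to $\tilde A^\mu$ is a single linear solve per frequency; the sketch below also checks the spectral bound that Lemma 7 establishes further down (all eigenvalues stay below $\rho_K/(\sigma^2+\rho_K) < 1$):

```python
import numpy as np

def A_from_K(K_tilde, sigma2):
    """tilde-A(theta) = tilde-K(theta) (sigma^2 Id_T + tilde-K(theta))^{-1}
    for a single Hermitian positive T x T matrix tilde-K(theta)."""
    T = K_tilde.shape[0]
    A = K_tilde @ np.linalg.inv(sigma2 * np.eye(T) + K_tilde)
    assert np.linalg.eigvalsh((A + A.conj().T) / 2).max() < 1.0  # cf. Lemma 7
    return A
```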
Lemma 4. The map $B \mapsto B\bigl(\sigma^2\mathrm{Id}_T + B\bigr)^{-1}$ is Lipschitz continuous over the set $\Delta = \bigl\{\tilde K_{[N]}^\mu(\theta), \tilde K^\mu(\theta) : \mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}),\ N > 0,\ \theta \in [-\pi,\pi]\bigr\}$.
Proof. The proof is straightforward using the boundedness of the eigenvalues of the matrices in $\Delta$. □
The following lemma is a consequence of Lemmas 2 and 4.
Lemma 5. Fix $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$. For all $\varepsilon > 0$, there exists an $N$ such that for all $M > N$ and all $\theta \in [-\pi,\pi[$, $\|\tilde A_{[M]}^\mu(\theta) - \tilde A^\mu(\theta)\| \leq \varepsilon$.
The above-defined matrices have the following "uniform convergence" properties.
Proposition 4. Fix $\nu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$. For all $\varepsilon > 0$, there exists an open neighbourhood $V_\varepsilon(\nu)$ such that for all $\mu \in V_\varepsilon(\nu)$, all $s, t \in [1,T]$ and all $\theta \in [-\pi,\pi[$,
$$\bigl|\tilde K_{st}^\nu(\theta) - \tilde K_{st}^\mu(\theta)\bigr| \leq \varepsilon,$$
$$\bigl|\tilde A_{st}^\nu(\theta) - \tilde A_{st}^\mu(\theta)\bigr| \leq \varepsilon,$$
$$\bigl|c_s^\nu - c_s^\mu\bigr| \leq \varepsilon,$$
and for all $N > 0$ and all $k$ such that $|k| \leq n$,
$$\bigl|\tilde K_{[N],st}^{\nu,k} - \tilde K_{[N],st}^{\mu,k}\bigr| \leq \varepsilon,$$
and
$$\bigl|\tilde A_{[N],st}^{\nu,k} - \tilde A_{[N],st}^{\mu,k}\bigr| \leq \varepsilon.$$
Proof. The proof is found in Appendix B. □
Before we close this section we define a subset of $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ which appears naturally: it is the subset where the rate function (to be defined) is not infinite, see Section 3.3.2 and Lemma 8.
Definition 4. Let $\mathcal{E}_2$ be the subset of $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ defined by
$$\mathcal{E}_2 = \bigl\{\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \,\big|\, E^{\bar\mu_{1,T}}\bigl[\|v^0\|^2\bigr] < \infty\bigr\},$$
where $v \in \mathcal{T}_{1,T}^{\mathbb{Z}}$ and
$$E^{\bar\mu_{1,T}}\bigl[\|v^0\|^2\bigr] = \int_{\mathcal{T}_{1,T}^{\mathbb{Z}}}\bigl\|\pi^1(v)\bigr\|^2\, d\bar\mu_{1,T}(v) = \int_{\mathcal{T}_{1,T}}\|v^0\|^2\, \bar\mu_{1,T}^1(dv^0).$$
For this set of measures, we may define the stationary process $(v^k)_{k\in\mathbb{Z}}$ in $\mathcal{T}_{1,T}^{\mathbb{Z}}$, where $v_s^k = \Psi_s(u^k)$, $s = 1,\dots,T$. This process has a finite mean $E^{\bar\mu_{1,T}}[v^0]$, noted $\bar v^\mu$. It admits the following spectral density measure, noted $\tilde v^\mu$, such that
$$E^{\bar\mu_{1,T}}\bigl[v^0\,{}^t v^k\bigr] = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ik\theta}\, \tilde v^\mu(d\theta). \tag{37}$$

3.3. Definition of the Functional Γ

In this section we define and study a functional $\Gamma_{[N]} = \Gamma_{[N],1} + \Gamma_{[N],2}$, which will be used to characterize the Radon-Nikodym derivative of $\Pi^N$ with respect to $R^N$. Let $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$, and let $(\mu^N)_{N\geq 1}$ be the $N$-dimensional marginals of $\mu$ (for $N = 2n+1$ odd).

3.3.1. Γ1

We define
$$\Gamma_{[N],1}(\mu) = -\frac{1}{2N}\log\Bigl(\det\Bigl(\mathrm{Id}_{NT} + \frac{1}{\sigma^2}K_{[N]}^\mu\Bigr)\Bigr). \tag{38}$$
Because of the remarks after Lemma 3, the spectrum of $K_{[N]}^\mu$ is positive, so that of $\mathrm{Id}_{NT} + \frac{1}{\sigma^2}K_{[N]}^\mu$ is strictly positive (in effect larger than 1) and the above expression makes sense. Moreover, $\Gamma_{[N],1}(\mu) \leq 0$.
We now define $\Gamma_1(\mu) = \lim_{N\to\infty}\Gamma_{[N],1}(\mu)$. The following lemma indicates that this is well-defined.
Lemma 6. When $N$ goes to infinity, the limit of Equation (38) is given by
$$\Gamma_1(\mu) = -\frac{1}{4\pi}\int_{-\pi}^{\pi}\log\Bigl(\det\Bigl(\mathrm{Id}_T + \frac{1}{\sigma^2}\tilde K^\mu(\theta)\Bigr)\Bigr)\, d\theta \tag{39}$$
for all $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$.
Proof. Through Lemma 20 in Appendix A, we have that
$$\Gamma_{[N],1}(\mu) = -\frac{1}{2N}\sum_{l=-n}^{n}\log\Bigl(\det\Bigl(\mathrm{Id}_T + \frac{1}{\sigma^2}\tilde K_{[N]}^\mu\Bigl(\frac{2\pi l}{N}\Bigr)\Bigr)\Bigr),$$
where we recall that $\tilde K_{[N]}^\mu(\frac{2\pi l}{N}) = \tilde K_{[N]}^{\mu,l}$. Since, by Lemma 2, $\tilde K_{[N]}^\mu(\theta)$ converges uniformly to $\tilde K^\mu(\theta)$, it is evident that the above expression converges to the desired result. □
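Lemma 20's block-circulant diagonalization is also what makes $\Gamma_{[N],1}$ cheap to evaluate: instead of one $NT \times NT$ determinant, one computes $N$ determinants of $T \times T$ DFT blocks. A sketch, reusing the block estimator given after Equation (28):

```python
import numpy as np

def Gamma_N1(K, N, sigma2):
    """Gamma_[N],1 = -(1/2N) sum_l log det(Id_T + K_tilde^{mu,l} / sigma^2),
    where K maps k = -n, ..., n to the T x T block K_[N]^{mu,k}."""
    n, T = N // 2, K[0].shape[0]
    total = 0.0
    for l in range(-n, n + 1):
        Ktl = sum(K[k] * np.exp(-2j * np.pi * k * l / N)
                  for k in range(-n, n + 1))
        total += np.linalg.slogdet(np.eye(T) + Ktl / sigma2)[1]
    return -total / (2 * N)
```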
Proposition 5. $\Gamma_{[N],1}$ and $\Gamma_1$ are bounded below and continuous on $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$.
Proof. Applying Lemma 19 in the case $Z = (G_{[N]}^{\mu,-n} - c^\mu,\dots,G_{[N]}^{\mu,n} - c^\mu)$, $a = 0$, $b = \sigma^2$, we write
$$\Gamma_{[N],1}(\mu) = \frac{1}{N}\log E\Bigl[\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{k=-n}^{n}\bigl\|G_{[N]}^{\mu,k} - c^\mu\bigr\|^2\Bigr)\Bigr].$$
Using Jensen's inequality we have
$$\Gamma_{[N],1}(\mu) \geq -\frac{1}{2N\sigma^2}E\Bigl[\sum_{k=-n}^{n}\bigl\|G_{[N]}^{\mu,k} - c^\mu\bigr\|^2\Bigr] = -\frac{1}{2\sigma^2}E\bigl[\bigl\|G_{[N]}^{\mu,0} - c^\mu\bigr\|^2\bigr].$$
By definition of $K_{[N]}^{\mu,0}$, the right-hand side is equal to $-\frac{1}{2\sigma^2}\mathrm{Trace}\bigl(K_{[N]}^{\mu,0}\bigr)$. From Equation (28), we find that
$$\mathrm{Trace}\bigl(K_{[N]}^{\mu,0}\bigr) = \sum_{m=-n}^{n}\Lambda(0,m)\,\mathrm{Trace}\bigl(M^{\mu,m}\bigr).$$
It follows from Equation (19) that $0 \leq \mathrm{Trace}(M^{\mu,m}) \leq T$. Hence $\Gamma_{[N],1}(\mu) \geq -\beta_1$, where
$$\beta_1 = \frac{T\Lambda_{sum}}{2\sigma^2}.$$
It follows from Lemma 6 that $-\beta_1$ is a lower bound for $\Gamma_1(\mu)$ as well.
The continuity of both $\Gamma_{[N],1}$ and $\Gamma_1$ follows from the expressions (38) and (39), the continuity of the applications $\mu \to \tilde K_{[N]}^\mu$ and $\mu \to \tilde K^\mu$ (Proposition 4), and the continuity of the determinant. □

3.3.2. Γ2

For $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ we define
$$\Gamma_{[N],2}(\mu) = \int_{\mathcal{T}_{1,T}^N}\phi_N(\mu, v)\, \bar\mu_{1,T}^N(dv), \tag{42}$$
where $\phi_N: \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \times \mathcal{T}_{1,T}^N \to \mathbb{R}$ is defined by
$$\phi_N(\mu, v) = \frac{1}{2\sigma^2}\Bigl(\frac{1}{N}\sum_{j,k=-n}^{n}{}^t\bigl(v^j - c^\mu\bigr)A_{[N]}^{\mu,k}\bigl(v^{k+j} - c^\mu\bigr) + \frac{2}{N}\sum_{j=-n}^{n}\bigl\langle c^\mu, v^j\bigr\rangle - \|c^\mu\|^2\Bigr). \tag{43}$$
$\Gamma_{[N],2}(\mu)$ is finite on the subset $\mathcal{E}_2$ of $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ defined in Definition 4. If $\mu \notin \mathcal{E}_2$, then we set $\Gamma_{[N],2}(\mu) = \infty$.
We define $\Gamma_2(\mu) = \lim_{N\to\infty}\Gamma_{[N],2}(\mu)$. The following proposition indicates that $\Gamma_2(\mu)$ is well-defined.
Proposition 6. If the measure $\mu$ is in $\mathcal{E}_2$, i.e., if $E^{\bar\mu_{1,T}}[\|v^0\|^2] < \infty$, then $\Gamma_2(\mu)$ is finite and writes
$$\Gamma_2(\mu) = \frac{1}{2\sigma^2}\Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde A^\mu(\theta) : \tilde v^\mu(d\theta) + {}^t c^\mu\bigl(\tilde A^\mu(0) - \mathrm{Id}_T\bigr)c^\mu + 2E^{\bar\mu_{1,T}}\bigl[{}^t v^0\bigl(\mathrm{Id}_T - \tilde A^\mu(0)\bigr)c^\mu\bigr]\Bigr).$$
The ":" symbol indicates the double contraction on the indexes. One also has
$$\Gamma_2(\mu) = \frac{1}{2\sigma^2}\Bigl(\lim_{n\to\infty}\sum_{k=-n}^{n}\int_{\mathcal{T}_{1,T}^{\mathbb{Z}}}{}^t\bigl(v^0 - c^\mu\bigr)A^{\mu,k}\bigl(v^k - c^\mu\bigr)\, d\bar\mu_{1,T}(v) + 2E^{\bar\mu_{1,T}}\bigl[\langle c^\mu, v^0\rangle\bigr] - \|c^\mu\|^2\Bigr).$$
Proof. Using Equations (37) and (43), the stationarity of $\mu$ and the fact that $\sum_{k=-n}^{n}A_{[N]}^{\mu,k} = \tilde A_{[N]}^\mu(0)$, we have
$$\Gamma_{[N],2}(\mu) = \frac{1}{4\pi\sigma^2}\int_{-\pi}^{\pi}\sum_{k=-n}^{n}e^{ik\theta}A_{[N]}^{\mu,k} : \tilde v^\mu(d\theta) + \frac{1}{\sigma^2}\int_{\mathcal{T}_{1,T}^N}\bigl\langle c^\mu, \bigl(\mathrm{Id}_T - \tilde A_{[N]}^\mu(0)\bigr)v^0\bigr\rangle\, d\bar\mu_{1,T}^N(v) - \frac{1}{2\sigma^2}\,{}^t c^\mu\bigl(\mathrm{Id}_T - \tilde A_{[N]}^\mu(0)\bigr)c^\mu.$$
From the spectral representation of $A_{[N]}^\mu$ we find that
$$\Gamma_{[N],2}(\mu) = \frac{1}{4\pi\sigma^2}\int_{-\pi}^{\pi}\tilde A_{[N]}^\mu(\theta) : \tilde v^\mu(d\theta) + \frac{1}{\sigma^2}E^{\bar\mu_{1,T}}\bigl[{}^t v^0\bigl(\mathrm{Id}_T - \tilde A_{[N]}^\mu(0)\bigr)c^\mu\bigr] - \frac{1}{2\sigma^2}\,{}^t c^\mu\bigl(\mathrm{Id}_T - \tilde A_{[N]}^\mu(0)\bigr)c^\mu.$$
Since (according to Lemma 5) $\tilde A_{[N]}^\mu(\theta)$ converges uniformly to $\tilde A^\mu(\theta)$ as $N \to \infty$, it follows by dominated convergence that $\Gamma_{[N],2}(\mu)$ converges to the first expression in the proposition.
The second expression for $\Gamma_2(\mu)$ follows analogously, although this time we make use of the fact that the partial sums of the Fourier series of $\tilde A^\mu$ converge uniformly to $\tilde A^\mu$ (because the Fourier series is absolutely convergent). □
We next obtain more information about the eigenvalues of the matrices $\tilde A_{[N]}^{\mu,k} = \tilde A_{[N]}^\mu\bigl(\frac{2k\pi}{N}\bigr)$ (where $k = -n,\dots,n$) and $\tilde A^\mu(\theta)$.
Lemma 7. There exists $0 < \alpha < 1$ such that for all $N$ and $\mu$, the eigenvalues of $\tilde A_{[N]}^{\mu,k}$, $\tilde A^\mu(\theta)$ and $A_{[N]}^\mu$ are less than or equal to $\alpha$.
Proof. By Lemma 3, the eigenvalues of $\tilde K^\mu(\theta)$ are positive and upper-bounded by $\rho_K$. Since $\tilde K^\mu(\theta)$ and $\bigl(\sigma^2\mathrm{Id}_T + \tilde K^\mu(\theta)\bigr)^{-1}$ are coaxial (because $\tilde K^\mu$ is Hermitian and therefore diagonalisable), we may take
$$\alpha = \frac{\rho_K}{\sigma^2 + \rho_K}.$$
This upper bound also holds for $\tilde A_{[N]}^{\mu,k}$, and for the eigenvalues of $A_{[N]}^\mu$, because of Lemma 20. □
We wish to prove that $\Gamma_{[N],2}$ is lower semicontinuous. A consequence of this will be that $\Gamma_{[N],2}$ is measurable with respect to $\mathcal{B}\bigl(\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})\bigr)$. In effect, we prove in Appendix C that $\phi_N(\mu, v)$ defined by Equation (43) satisfies
$$\phi_N(\mu, v) \geq -\beta_2$$
for some positive constant $\beta_2$ defined in Equation (87) in Appendix C.
We then have the following proposition.
Proposition 7. $\Gamma_{[N],2}$ is lower-semicontinuous.
Proof. We define $\phi_{N,M}(\mu, v) = 1_{B_M}(v)\bigl(\phi_N(\mu, v) + \beta_2\bigr)$, where $1_{B_M}$ is the indicator function of $B_M$ and $v \in B_M$ if $N^{-1}\sum_{j=-n}^{n}\|v^j\|_2^2 \leq M$. We have just seen that $\phi_{N,M} \geq 0$. We also define
$$\Gamma_{[N],2}^M(\mu) = \int_{\mathcal{T}_{1,T}^N}\phi_{N,M}(\mu, v)\, \bar\mu_{1,T}^N(dv) - \beta_2.$$
Suppose that $\nu_n \to \mu$ with respect to the weak topology on $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$. Observe that
$$\bigl|\Gamma_{[N],2}^M(\mu) - \Gamma_{[N],2}^M(\nu_n)\bigr| \leq \Bigl|\int_{\mathcal{T}_{1,T}^N}\phi_{N,M}(\mu, v)\, \bar\mu_{1,T}^N(dv) - \int_{\mathcal{T}_{1,T}^N}\phi_{N,M}(\mu, v)\, \bar\nu_{n,1,T}^N(dv)\Bigr| + \Bigl|\int_{\mathcal{T}_{1,T}^N}\phi_{N,M}(\mu, v)\, \bar\nu_{n,1,T}^N(dv) - \int_{\mathcal{T}_{1,T}^N}\phi_{N,M}(\nu_n, v)\, \bar\nu_{n,1,T}^N(dv)\Bigr|.$$
We may infer from the above expression that $\Gamma_{[N],2}^M(\mu)$ is continuous (with respect to $\mu$) for the following reasons. The first term on the right-hand side converges to zero because $\phi_{N,M}$ is continuous and bounded (with respect to $v$). The second term converges to zero because $\phi_{N,M}(\mu, v)$ is a continuous function of $\mu$, see Proposition 4.
Since $\Gamma_{[N],2}^M(\mu)$ grows to $\Gamma_{[N],2}(\mu)$ as $M \to \infty$, we may conclude that $\Gamma_{[N],2}(\mu)$ is lower semicontinuous with respect to $\mu$. □
We define $\Gamma_{[N]}(\mu) = \Gamma_{[N],1}(\mu) + \Gamma_{[N],2}(\mu)$. We may conclude from Propositions 5 and 7 that $\Gamma_{[N]}$ is measurable.

3.4. The Radon-Nikodym Derivative

In this section we determine the Radon-Nikodym derivative of $\Pi^N$ with respect to $R^N$. To do this, we must first compute the Radon-Nikodym derivative of $Q^N$ with respect to $P^{\otimes N}$, which is done in the next proposition.
Proposition 8. The Radon-Nikodym derivative of $Q^N$ with respect to $P^{\otimes N}$ is given by the following expression:
$$\frac{dQ^N}{dP^{\otimes N}}(u^{-n},\dots,u^n) = E\Bigl[\exp\Bigl(\frac{1}{\sigma^2}\sum_{j=-n}^{n}\Bigl(\bigl\langle\Psi_{1,T}(u^j), G^j\bigr\rangle - \frac{1}{2}\bigl\|G^j\bigr\|^2\Bigr)\Bigr)\Bigr], \tag{46}$$
the expectation being taken over the $NT$-dimensional Gaussian process $(G^i)$, $i = -n,\dots,n$, given by
$$G_t^i = \sum_{j=-n}^{n} J_{ij}^N f\bigl(u_{t-1}^j\bigr), \qquad t = 1,\dots,T,$$
and the function $\Psi$ being defined by Equation (14). Note that the $(G_t^i)$ constitute a finite-dimensional Gaussian process because they are linear combinations of the Gaussian random variables $J_{ij}^N$.
Proof. For fixed $J^N$, we let $R_{J^N}: \mathbb{R}^{N(T+1)} \to \mathbb{R}^{N(T+1)}$ be the mapping $u \to y$, i.e., $R_{J^N}(u^{-n},\dots,u^n) = (y^{-n},\dots,y^n)$, where for $j = -n,\dots,n$,
$$\begin{cases} y_0^j = u_0^j, \\ y_t^j = u_t^j - \gamma u_{t-1}^j - G_t^j, & t = 1,\dots,T. \end{cases}$$
The determinant of the Jacobian of $R_{J^N}$ is 1 for the following reasons. Since $\frac{\partial y_s^j}{\partial u_t^k} = 0$ if $t > s$, the determinant is $\prod_{s=0}^{T} D_s$, where $D_s$ is the Jacobian of the map $(u_s^{-n},\dots,u_s^n) \to (y_s^{-n},\dots,y_s^n)$ induced by $R_{J^N}$. However, $D_s$ is evidently 1. Similar reasoning implies that $R_{J^N}$ is a bijection.
It may be seen that the random vector $Y = R_{J^N}(U)$, with $U$ the solution of Equation (1), is such that $Y_0^j = U_0^j$ and $Y_t^j = B_{t-1}^j$, where $|j| \leq n$ and $t = 1,\dots,T$. Therefore
$$Y^j \sim \mathcal{N}_T\bigl(0, \sigma^2\mathrm{Id}_T\bigr) \otimes \mu_I, \qquad j = -n,\dots,n.$$
Since the determinant of the Jacobian of $R_{J^N}$ is one, we obtain the law $Q^N(J^N)$ by applying the inverse of $R_{J^N}$ to the above distribution, i.e.,
$$Q^N(J^N)(du) = (2\pi\sigma^2)^{-\frac{NT}{2}}\exp\Bigl(-\frac{1}{2\sigma^2}\bigl\|R_{J^N}(u)\bigr\|^2\Bigr)\prod_{j=-n}^{n}\Bigl(\mu_I\bigl(du_0^j\bigr)\prod_{t=1}^{T}du_t^j\Bigr).$$
Note that, exceptionally, $\|\cdot\|$ here denotes the Euclidean norm in $\mathbb{R}^{N(T+1)}$ or $\mathcal{T}^N$.
Recalling that $P^{\otimes N} = Q^N(0)$, we therefore find that
$$\frac{dQ^N(J^N)}{dP^{\otimes N}}(u) = \exp\Bigl(-\frac{1}{2\sigma^2}\Bigl(\bigl\|R_{J^N}(u)\bigr\|^2 - \bigl\|R_0(u)\bigr\|^2\Bigr)\Bigr).$$
Taking the expectation of this with respect to $J^N$ yields the result. □
In fact, as stated in the proposition below, the Gaussian system $(G_s^i)$, $i = -n,\dots,n$, $s = 1,\dots,T$, has the same law as the system $G_{[N]}^{\hat\mu_N}$, as defined in Equation (28) and afterwards.
Proposition 9. Fix $u \in \mathcal{T}^N$. The covariance of the Gaussian system $(G_s^i)$, where $i = -n,\dots,n$ and $s = 1,\dots,T$, writes $K_{[N]}^{\hat\mu_N(u)}$. For each $i$, the mean of $G^i$ is $c^{\hat\mu_N(u)}$.
The proof of this proposition is an easy verification left to the reader. We obtain an alternative expression for the Radon-Nikodym derivative in Equation (46) by applying Lemma 19 in Appendix A. That is, we substitute $Z = (G^{-n},\dots,G^n)$, $a = \frac{1}{\sigma^2}\bigl(\Psi_{1,T}(u^{-n}),\dots,\Psi_{1,T}(u^n)\bigr)$, and $b = \frac{1}{\sigma^2}$ into the formula in Lemma 19. After noting Proposition 9 we thus find that:
Proposition 10. The Radon-Nikodym derivatives write as
$$\frac{dQ^N}{dP^{\otimes N}}(u^{-n},\dots,u^n) = \exp\bigl(N\Gamma_{[N]}\bigl(\hat\mu_N(u^{-n},\dots,u^n)\bigr)\bigr), \qquad \frac{d\Pi^N}{dR^N}(\mu) = \exp\bigl(N\Gamma_{[N]}(\mu)\bigr).$$
Here $\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$, $\Gamma_{[N]}(\mu) = \Gamma_{[N],1}(\mu) + \Gamma_{[N],2}(\mu)$, and the expressions for $\Gamma_{[N],1}$ and $\Gamma_{[N],2}$ have been defined in Equations (38) and (42).
The second expression in the above proposition follows from the first one because $\Gamma_{[N]}$ is measurable.
Remark 6. Proposition 10 shows that the processes solutions of Equation (1) are coupled through the process-level empirical measure, unlike in the case of independent weights where they are coupled through the usual empirical measure. As mentioned in Remark 3, this significantly complicates the mathematical analysis.

4. The Large Deviation Principle

In this section we prove the principal result of this paper (Theorem 1): the image laws $\Pi^N$ satisfy an LDP with good rate function $H$ (to be defined below). We do this by first establishing an LDP for the image law with uncoupled weights ($R^N$), see Definition 2, and then using the Radon-Nikodym derivative of Proposition 10 to establish the full LDP for $\Pi^N$. Therefore our first task is to state the LDP governing $R^N$.
Let $\mu, \nu$ be probability measures over a Polish space $\Omega$ equipped with its Borelian $\sigma$-algebra. The Kullback-Leibler divergence of $\mu$ relative to $\nu$ (also called the relative entropy) is
$$I^{(2)}(\mu, \nu) = \int_\Omega\log\Bigl(\frac{d\mu}{d\nu}\Bigr)\, d\mu \tag{48}$$
if $\mu$ is absolutely continuous with respect to $\nu$, and $I^{(2)}(\mu, \nu) = \infty$ otherwise. It is a standard result that
$$I^{(2)}(\mu, \nu) = \sup_{f \in C_b(\Omega)}\bigl\{E^\mu[f] - \log E^\nu[\exp(f)]\bigr\}, \tag{49}$$
the supremum being taken over all bounded continuous functions.
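For intuition, when both measures are Gaussian the divergence (48) has a standard closed form; the sketch below (a textbook formula, not specific to this paper) evaluates it and cross-checks it against a Monte Carlo estimate of $E^\mu[\log(d\mu/d\nu)]$:

```python
import numpy as np

def kl_gaussians(m0, S0, m1, S1):
    """I^(2)(mu, nu) for mu = N(m0, S0) and nu = N(m1, S1) (closed form)."""
    k, S1inv, dm = len(m0), np.linalg.inv(S1), m1 - m0
    return 0.5 * (np.trace(S1inv @ S0) + dm @ S1inv @ dm - k
                  + np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1])

def log_pdf(x, m, S):
    d = x - m
    quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
    return -0.5 * (quad + np.linalg.slogdet(S)[1] + len(m) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
m0, S0, m1, S1 = np.zeros(2), np.eye(2), np.ones(2), 2.0 * np.eye(2)
x = rng.multivariate_normal(m0, S0, size=100_000)
print(kl_gaussians(m0, S0, m1, S1),                      # closed form
      np.mean(log_pdf(x, m0, S0) - log_pdf(x, m1, S1)))  # definition (48)
```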
For $\mu \in \mathcal{M}_{1,S}^+(\Omega^{\mathbb{Z}})$ and $\nu \in \mathcal{M}_1^+(\Omega)$, the process-level entropy of $\mu$ with respect to $\nu$ is defined to be
$$I^{(3)}(\mu, \nu) = \lim_{N\to\infty}\frac{1}{N}I^{(2)}\bigl(\mu^N, \nu^{\otimes N}\bigr). \tag{50}$$
See Lemma IX.2.4 in [35] for a proof that this (possibly infinite) limit always exists (the superadditivity of the sequence $I^{(2)}(\mu^N, \nu^{\otimes N})$ follows from Equation (49)).
Theorem 2. $R^N$ is governed by a large deviation principle with good rate function [39,40]
$$I^{(3)}(\mu, P) = I^{(3)}(\mu_0, \mu_I) + \int I^{(3)}\bigl(\mu^{u_0}, P^{u_0}\bigr)\, d\mu_0(u_0), \tag{51}$$
where $\mu_0 = \mu \circ \pi_0^{-1}$ is the time marginal of $\mu$ at time 0 and $\mu^{u_0} \in \mathcal{M}_{1,S}^+(\mathcal{T}_{1,T}^{\mathbb{Z}})$ is the conditional probability distribution of $\mu$ given $u_0$. In addition, the set of measures $\{R^N\}$ is exponentially tight.
Proof. $R^N$ satisfies an LDP with good rate function $I^{(3)}(\mu, P)$ [35]. In turn, a sequence of probability measures (such as $\{R^N\}$) over a Polish space satisfying a large deviations upper bound with a good rate function is exponentially tight [38].
It is an identity in [41] that
$$I^{(2)}\bigl(\mu^N, P^{\otimes N}\bigr) = I^{(2)}\bigl(\mu_0^N, \mu_I^{\otimes N}\bigr) + \int_{\mathbb{R}^N}I^{(2)}\bigl(\mu^{N,u_0}, P^{\otimes N,u_0}\bigr)\, d\mu_0^N(u_0). \tag{52}$$
It follows directly from the variational expression (49) that
$$I^{(2)}\bigl(\mu^{2N,u_0}, P^{\otimes 2N,u_0}\bigr) \geq 2I^{(2)}\bigl(\mu^{N,u_0}, P^{\otimes N,u_0}\bigr). \tag{53}$$
Note that although our convention throughout this paper is for $N$ to be odd, the limit in Equation (50) exists for any sequence of integers going to $\infty$. We divide Equation (52) by $N$ and consider the subsequence of all $N$ of the form $N = 2^k$ for $k \in \mathbb{Z}^+$. It follows from Equation (53) that $N^{-1}I^{(2)}(\mu^{N,u_0}, P^{\otimes N,u_0})$ is nondecreasing as $N = 2^k \to \infty$ (for all $u_0$), so that Equation (51) follows by the monotone convergence theorem. □
Because $\Psi$ is bijective and bicontinuous, it may easily be shown that
$$I^{(2)}\bigl(\mu^N, P^{\otimes N}\bigr) = I^{(2)}\bigl(\bar\mu^N, \bar P^{\otimes N}\bigr),$$
$$I^{(3)}(\mu, P) = I^{(3)}(\bar\mu, \bar P).$$
Before we move to a statement of the LDP governing $\Pi^N$, we state the following relationship between the set $\mathcal{E}_2$ (see Definition 4) and the set of stationary measures which have a finite Kullback-Leibler divergence or process-level entropy with respect to $P$.
Lemma 8.
$$\bigl\{\mu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) : I^{(3)}(\mu, P) < \infty\bigr\} \subset \mathcal{E}_2.$$
See Lemma 10 in [42] for a proof. We are now in a position to define what will be the rate function of the LDP governing $\Pi^N$.
Definition 5. Let $H$ be the function $\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \to \mathbb{R} \cup \{+\infty\}$ defined by
$$H(\mu) = \begin{cases} +\infty & \text{if } I^{(3)}(\mu, P) = \infty, \\ I^{(3)}(\mu, P) - \Gamma(\mu) & \text{otherwise.} \end{cases}$$
Here $\Gamma(\mu) = \Gamma_1(\mu) + \Gamma_2(\mu)$ and the expressions for $\Gamma_1$ and $\Gamma_2$ have been defined in Lemma 6 and Proposition 6. Note that because of Proposition 6 and Lemma 8, whenever $I^{(3)}(\mu, P)$ is finite, so is $\Gamma(\mu)$.

4.1. Lower Bound on the Open Sets

We prove the second half of Theorem 1.
Lemma 9. For all open sets $O$, Equation (17),
$$\varliminf_{N\to\infty} N^{-1}\log\Pi^N(O) \geq -\inf_{\mu\in O} H(\mu),$$
holds.
Proof. From the expression for the Radon-Nikodym derivative in Proposition 10 we have
$$\Pi^N(O) = \int_O\exp\bigl(N\Gamma_{[N]}(\mu)\bigr)\, dR^N(\mu).$$
If $\mu \in O$ is such that $I^{(3)}(\mu, P) = \infty$, then $H(\mu) = \infty$ and evidently Equation (17) holds. We now prove Equation (17) for all $\mu \in O$ such that $I^{(3)}(\mu, P) < \infty$. Let $\varepsilon > 0$ and let $Z_\varepsilon^N(\mu) \subset O$ be an open neighbourhood containing $\mu$ such that $\inf_{\nu \in Z_\varepsilon^N(\mu)}\Gamma_{[N]}(\nu) \geq \Gamma_{[N]}(\mu) - \varepsilon$. Such $\{Z_\varepsilon^N(\mu)\}$ exist for all $N$ because of the lower semicontinuity of $\Gamma_{[N]}$ (see Propositions 5 and 7). Then
$$\varliminf_{N\to\infty} N^{-1}\log\Pi^N(O) = \varliminf_{N\to\infty} N^{-1}\log\int_O\exp\bigl(N\Gamma_{[N]}(\nu)\bigr)\, dR^N(\nu) \geq \varliminf_{N\to\infty} N^{-1}\log\Bigl(R^N\bigl(Z_\varepsilon^N(\mu)\bigr)\times\inf_{\nu\in Z_\varepsilon^N(\mu)}\exp\bigl(N\Gamma_{[N]}(\nu)\bigr)\Bigr) \geq -I^{(3)}(\mu, P) + \varliminf_{N\to\infty}\inf_{\nu\in Z_\varepsilon^N(\mu)}\Gamma_{[N]}(\nu) \quad\text{(because of Theorem 2)} \geq -I^{(3)}(\mu, P) + \varliminf_{N\to\infty}\Gamma_{[N]}(\mu) - \varepsilon = -I^{(3)}(\mu, P) + \Gamma(\mu) - \varepsilon.$$
The last equality follows from Lemma 6 and Proposition 6. Since $\varepsilon$ is arbitrary, we may take the limit as $\varepsilon \to 0$ to obtain Equation (17). Since Equation (17) is true for all $\mu \in O$, the lemma is proved. □

4.2. Exponential Tightness of ΠN

We begin with the following technical lemma, the proof of which can be found in Appendix D.
Lemma 10. There exist constants $c > 0$ and $a > 1$ such that, for all $N$,
$$\int_{\mathcal{T}^N}\exp\Bigl(aN\phi_N\bigl(\hat\mu_N(u), \Psi_{1,T}(u)\bigr)\Bigr)\, P^{\otimes N}(du) \leq \exp(Nc),$$
where $\phi_N$ is defined in Equation (43).
This lemma allows us to prove the exponential tightness.
Proposition 11. The family $\{\Pi^N\}$ is exponentially tight.
Proof. Let $B \in \mathcal{B}\bigl(\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})\bigr)$. We have from Proposition 10
$$\Pi^N(B) = \int_{(\hat\mu_N)^{-1}(B)}\exp\bigl(N\Gamma_{[N]}\bigl(\hat\mu_N(u)\bigr)\bigr)\, P^{\otimes N}(du).$$
Through Hölder's inequality, we find that for any $a > 1$:
$$\Pi^N(B) \leq R^N(B)^{\bigl(1-\frac{1}{a}\bigr)}\Bigl(\int_{(\hat\mu_N)^{-1}(B)}\exp\bigl(aN\Gamma_{[N]}\bigl(\hat\mu_N(u)\bigr)\bigr)\, P^{\otimes N}(du)\Bigr)^{\frac{1}{a}}.$$
Now it may be observed that
$$\int_{\mathcal{T}^N}\exp\bigl(aN\Gamma_{[N]}\bigl(\hat\mu_N(u)\bigr)\bigr)\, P^{\otimes N}(du) = \int_{\mathcal{T}^N}\exp\Bigl(aN\phi_N\bigl(\hat\mu_N(u), \Psi_{1,T}(u)\bigr) + aN\Gamma_{[N],1}\bigl(\hat\mu_N(u)\bigr)\Bigr)\, P^{\otimes N}(du).$$
Since $\Gamma_{[N],1} \leq 0$, it follows from Lemma 10 that
$$\Pi^N(B) \leq R^N(B)^{\bigl(1-\frac{1}{a}\bigr)}\exp\Bigl(\frac{Nc}{a}\Bigr).$$
By the exponential tightness of $\{R^N\}$ (as stated in Theorem 2), for each $L > 0$ there exists a compact set $K_L$ such that $\varlimsup_N N^{-1}\log\bigl(R^N(K_L^c)\bigr) \leq -L$. Thus if we choose the compact set $K_{\Pi,L} = K_{\frac{a}{a-1}(L+\frac{c}{a})}$, then for all $L > 0$, $\varlimsup_N N^{-1}\log\bigl(\Pi^N(K_{\Pi,L}^c)\bigr) \leq -L$. □

4.3. Upper Bound on the Compact Sets

In this section we obtain an upper bound over the compact sets, i.e., the first half of Theorem 1 for $F$ compact. Our method is to obtain an LDP for a simplified Gaussian system (with fixed $A^\nu$ and $c^\nu$), and then to prove that this converges to the required bound as $\nu \to \mu$.

4.3.1. An LDP for a Gaussian Measure

We linearise $\Gamma$ in the following manner. Fix $\nu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ and assume for the moment that $\mu \in \mathcal{E}_2$. Let
$$\Gamma_{[N],2}^\nu(\mu) = \int_{\mathcal{T}_{1,T}^N}\phi_N'(\nu, v)\, d\bar\mu_{1,T}^N(v),$$
where $\phi_N': \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}}) \times \mathcal{T}_{1,T}^N \to \mathbb{R}$ is defined by
$$\phi_N'(\nu, v) = \frac{1}{2\sigma^2}\Bigl(\frac{1}{N}\sum_{j,k=-n}^{n}{}^t\bigl(v^j - c^\nu\bigr)A^{\nu,k}\bigl(v^{k+j} - c^\nu\bigr) + \frac{2}{N}\sum_{j=-n}^{n}\bigl\langle c^\nu, v^j\bigr\rangle - \|c^\nu\|^2\Bigr).$$
Remark 7. Note the subtle difference with the definition of $\phi_N$ in Equation (43): we use $A^{\nu,k}$ instead of $A_{[N]}^{\nu,k}$. When turning to spectral representations of $\phi_N'$ this will bring in the matrices $\tilde A^{\nu,N,l}$, $l = -n,\dots,n$, defined at the beginning of Section 4.3.2.
Let us also define
$$\Gamma_1^N(\nu) = -\frac{1}{2N}\log\det\Bigl(\mathrm{Id}_{NT} + \frac{1}{\sigma^2}K^{\nu,N}\Bigr),$$
where $K^{\nu,N}$ is the $NT \times NT$ matrix with $T \times T$ blocks noted $K^{\nu,N,l}$. We define
$$\Gamma_{[N]}^\nu(\mu) = \Gamma_1^N(\nu) + \Gamma_{[N],2}^\nu(\mu),$$
and $\Gamma_2^\nu(\mu) = \lim_{N\to\infty}\Gamma_{[N],2}^\nu(\mu)$. We find, using the first identity in Proposition 6, that
$$\Gamma_2^\nu(\mu) = \frac{1}{2\sigma^2}\Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde A^\nu(\theta) : \tilde v^\mu(d\theta) - 2\,{}^t c^\nu\tilde A^\nu(0)\,\bar v^\mu + {}^t c^\nu\tilde A^\nu(0)\, c^\nu + 2\bigl\langle c^\nu, \bar v^\mu\bigr\rangle - \|c^\nu\|^2\Bigr),$$
where $\bar v^\mu = E^{\bar\mu_{1,T}}[v^0]$ and $\tilde v^\mu$ is the spectral measure defined in Equation (37). We recall that $:$ denotes double contraction on the indices.
Similarly to Lemma 6, we find that
$$\lim_{N\to\infty}\Gamma_1^N(\nu) = -\frac{1}{4\pi}\int_{-\pi}^{\pi}\log\det\Bigl(\mathrm{Id}_T + \frac{1}{\sigma^2}\tilde K^\nu(\theta)\Bigr)\, d\theta = \Gamma_1(\nu).$$
For $\mu \in \mathcal{E}_2$, we define $H^\nu(\mu) = I^{(3)}(\mu, P) - \Gamma^\nu(\mu)$, where $\Gamma^\nu(\mu) = \Gamma_1(\nu) + \Gamma_2^\nu(\mu)$; for $\mu \notin \mathcal{E}_2$, we set $\Gamma_2^\nu(\mu) = \Gamma^\nu(\mu) = -\infty$ and $H^\nu(\mu) = \infty$.
Definition 6. Let $\bar Q^\nu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ be the measure with $N$-dimensional marginals $\bar Q^{\nu,N}$ given by
$$\bar Q^{\nu,N}(B) = \int_B\exp\bigl(N\Gamma_{[N]}^\nu\bigl(\hat{\bar\mu}_N(v)\bigr)\bigr)\, \bar P^{\otimes N}(dv), \tag{64}$$
where $B \in \mathcal{B}(\mathcal{T}^N)$. This defines a law $Q^\nu \in \mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})$ according to the correspondence in Definition 1.
We have the following lemma.
Lemma 11. $\bar Q_{1,T}^\nu$ is a stationary Gaussian process of mean $c^\nu$. Its $N$-dimensional spatial, $T$-dimensional temporal marginals $\bar Q_{1,T}^{\nu,N}$ are in $\mathcal{M}_{1,S}^+(\mathcal{T}_{1,T}^N)$ and have covariance $\sigma^2\mathrm{Id}_{NT} + K^{\nu,N}$. The spectral density of $\bar Q_{1,T}^\nu$ is $\sigma^2\mathrm{Id}_T + \tilde K^\nu(\theta)$ and, in addition,
$$\bar Q_0^\nu = \mu_I^{\otimes\mathbb{Z}}.$$
Proof. In effect we find that
$$\bar Q_{1,T}^{\nu,N}(B^N) = \Bigl(\det\Bigl(\mathrm{Id}_{NT} + \frac{1}{\sigma^2}K^{\nu,N}\Bigr)\Bigr)^{-\frac{1}{2}}\times\int_{B^N}\exp\frac{1}{2\sigma^2}\Bigl(\sum_{j,k=-n}^{n}{}^t\bigl(v^j - c^\nu\bigr)A^{\nu,k}\bigl(v^{k+j} - c^\nu\bigr) + 2\sum_{j=-n}^{n}\bigl\langle c^\nu, v^j\bigr\rangle - N\|c^\nu\|^2\Bigr)\, \bar P_{1,T}^{\otimes N}(dv),$$
for each Borelian $B^N \in \mathcal{B}\bigl(\mathcal{T}_{1,T}^N\bigr)$. We note $c^{\nu,N}$ the $NT$-dimensional vector obtained by concatenating $N$ times the vector $c^\nu$. We also have that
$$\frac{1}{\sigma^2}\bigl(\mathrm{Id}_{NT} - A^{\nu,N}\bigr) = \bigl(\sigma^2\mathrm{Id}_{NT} + K^{\nu,N}\bigr)^{-1}.$$
Thus, through Proposition 2, we find that
$$\bar Q_{1,T}^{\nu,N}(B^N) = (2\pi)^{-\frac{NT}{2}}\Bigl(\det\Bigl(\Bigl(\frac{1}{\sigma^2}\bigl(\mathrm{Id}_{NT} - A^{\nu,N}\bigr)\Bigr)^{-1}\Bigr)\Bigr)^{-\frac{1}{2}}\int_{B^N}\exp\Bigl(-\frac{1}{2\sigma^2}\,{}^t\bigl(v - c^{\nu,N}\bigr)\bigl(\mathrm{Id}_{NT} - A^{\nu,N}\bigr)\bigl(v - c^{\nu,N}\bigr)\Bigr)\prod_{j=-n}^{n}\prod_{t=1}^{T}dv_t^j. \tag{67}$$
It is seen that $\bar Q_{1,T}^{\nu,N}$ is an $NT$-dimensional Gaussian measure with mean $c^{\nu,N}$, inverse covariance matrix $\frac{1}{\sigma^2}\bigl(\mathrm{Id}_{NT} - A^{\nu,N}\bigr)$, and covariance matrix $\sigma^2\mathrm{Id}_{NT} + K^{\nu,N}$. Hence $\bar Q_{1,T}^{\nu,N}$ is in $\mathcal{M}_{1,S}^+(\mathcal{T}_{1,T}^N)$, and
$$\bar Q^{\nu,N} = \bar Q_{1,T}^{\nu,N} \otimes \mu_I^{\otimes N} \tag{68}$$
is in $\mathcal{M}_{1,S}^+(\mathcal{T}^N)$. It follows also that the spectral density of $\bar Q_{1,T}^\nu$ is $\sigma^2\mathrm{Id}_T + \tilde K^\nu(\theta)$. □
We may thus define the measure $\bar Q^\nu$ of a stationary process over the variables $\{v_s^j\}$, $j \in \mathbb{Z}$, $s = 0,\dots,T$, with $N$-dimensional marginals given by Equations (67) and (68).
Definition 7. Let $\bar\Pi^{\nu,N}$ be the image law of $\bar Q^{\nu,N}$ under $\hat{\bar\mu}_N$, i.e., for $B \in \mathcal{B}\bigl(\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})\bigr)$,
$$\bar\Pi^{\nu,N}(B) = \bar Q^{\nu,N}\bigl(\hat{\bar\mu}_N \in B\bigr).$$
The point is that it can be shown that the image $\bar\Pi^{\nu,N}$ of the measure $\bar Q^{\nu,N}$ satisfies a strong LDP (see the next lemma) and that this LDP can be transferred to $\Pi^N$, see Proposition 12. We begin with the following lemma, which is a generalization of the result in [39].
Lemma 12. The image law $\bar\Pi^{\nu,N}$ satisfies a strong LDP (in the manner of Theorem 1) with good rate function
$$\bar H^\nu(\bar\mu) = I^{(3)}(\bar\mu, \bar P) - \Gamma^\nu(\mu).$$
Proof. We have established an LDP for a Gaussian process in [42]. Since $\bar Q^\nu$ may be separated in the manner of Equation (68), we may use the expression in Theorem 2 to obtain the result. □
For $B \in \mathcal{B}\bigl(\mathcal{M}_{1,S}^+(\mathcal{T}^{\mathbb{Z}})\bigr)$, we define the image law
$$\Pi^{\nu,N}(B) = Q^{\nu,N}\bigl(\hat\mu_N \in B\bigr) = \bar Q^{\nu,N}\bigl(\hat{\bar\mu}_N \in B\bigr).$$
It follows from the contraction principle that if we write $H^\nu(\mu) := \bar H^\nu(\bar\mu)$, then:
Corollary 1. The image law $\Pi^{\nu,N}$ satisfies a strong LDP with good rate function
$$H^\nu(\mu) = I^{(3)}(\mu, P) - \Gamma^\nu(\mu).$$

4.3.2. An Upper Bound for ΠN over Compact Sets

In this section we derive an upper bound for $\Pi^N$ over compact sets using the LDP of the previous section. Before we do this, we require two lemmas governing the 'distance' between $\Gamma^\nu$ and $\Gamma$. Let $\tilde K^{\mu,N}$ be the DFT of $(K^{\mu,j})_{j=-n}^{n}$, and similarly $\tilde A^{\mu,N}$ the DFT of $(A^{\mu,j})_{j=-n}^{n}$. We define
$$C_N^\nu = \sup_{M\geq N,\,(2|l|+1)\leq M}\left\{\left\|\tilde A^{\nu,l}_{[M]} - \tilde A^{\nu,M,l}\right\|,\ \left\|\tilde K^{\nu,l}_{[M]} - \tilde K^{\nu,M,l}\right\|\right\}.$$
Lemma 13. For all $\nu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$, $C_N^\nu$ is finite and
$$C_N^\nu\to0\quad\text{as}\quad N\to\infty.$$
Proof. We recall from Proposition 4 that $\tilde K^{\nu}_{[M],st}(\theta)$ converges uniformly (in $\theta$) to $\tilde K^\nu_{st}(\theta)$. The same holds for $\tilde K^{\nu,M,l}_{st}$, because this represents the partial summation of an absolutely converging Fourier series: for fixed $\theta = 2\pi l/M$, $\tilde K^{\nu,M,l}_{st}\to\tilde K^\nu_{st}(\theta)$ as $M\to\infty$. The result then follows from the equivalence of matrix norms. The proof for $\tilde A^\nu$ is analogous. □
The second lemma, the proof of which can be found in Appendix E, goes as follows.
Lemma 14. There exists a constant $C_0$ such that for all $\nu$ in $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$, all $\varepsilon>0$ and all $\mu\in V_\varepsilon(\nu)\cap\mathcal{E}_2$,
$$\left|\Gamma_{[N]}(\mu) - \Gamma^\nu_{[N]}(\mu)\right|\leq C_0\left(C_N^\nu + \varepsilon\right)\left(1 + \mathbb{E}^{\bar\mu_{1,T}}\left[\|v^0\|^2\right]\right).$$
Here $V_\varepsilon(\nu)$ is the open neighbourhood defined in Proposition 4, and $\bar\mu$ is given in Definition 1.
We are now ready to begin the proof of the upper bound on compact sets, for which we follow the ideas in [4].
Proposition 12. Let $K$ be a compact subset of $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$. Then
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N(K)\right)\leq-\inf_KH.$$
Proof. Fix $\varepsilon>0$. Let $V_\varepsilon(\nu)$ be the open neighbourhood of $\nu$ defined in Proposition 4, and let $\bar V_\varepsilon(\nu)$ be its closure. Since $K$ is compact and $\{V_\varepsilon(\nu)\}_{\nu\in K}$ is an open cover, there exist an $r$ and $\{\nu_i\}_{i=1}^{r}$ such that $K\subseteq\bigcup_{i=1}^{r}V_\varepsilon(\nu_i)$. We find that
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N\Big(\bigcup_{i=1}^{r}V_\varepsilon(\nu_i)\cap K\Big)\right)\leq\sup_{1\leq i\leq r}\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N\left(\bar V_\varepsilon(\nu_i)\cap K\right)\right).$$
It follows from the fact that $\hat\mu_N\in\mathcal{E}_2$, Proposition 10 and Lemma 14 that
$$\Pi^N\left(\bar V_\varepsilon(\nu_i)\cap K\right)\leq\int_{\hat\mu_N(u)\in\bar V_\varepsilon(\nu_i)\cap K}\exp\left(N\Gamma^{\nu_i}_{[N]}(\hat\mu_N(u)) + NC_0(\varepsilon + C_N^{\nu_i})\Big(1 + \frac1N\sum_{j=-n}^{n}\|\Psi_{1,T}(u^j)\|^2\Big)\right)P^N_{1,T}(du).$$
From the definition of $Q^{\nu,N}$ in Equation (64) and Hölder's inequality, for $p,q>1$ such that $\frac1p+\frac1q=1$, we have
$$\Pi^N\left(\bar V_\varepsilon(\nu_i)\cap K\right)\leq\left(Q^{\nu_i,N}\left(\hat\mu_N(u)\in\bar V_\varepsilon(\nu_i)\cap K\right)\right)^{\frac1p}D^{\frac1q},$$
where
$$D = \int_{\hat\mu_N(u)\in\bar V_\varepsilon(\nu_i)\cap K}\exp\left(qNC_0(\varepsilon + C_N^{\nu_i})\Big(1 + \frac1N\sum_{j=-n}^{n}\|\Psi_{1,T}(u^j)\|^2\Big)\right)Q^{\nu_i,N}_{1,T}(du) = \exp\left(qNC_0(\varepsilon + C_N^{\nu_i})\right)\int_{\hat{\bar\mu}_N(v)\in\Psi(\bar V_\varepsilon(\nu_i)\cap K)}\exp\left(qC_0(\varepsilon + C_N^{\nu_i})\sum_{j=-n}^{n}\|v^j\|^2\right)\bar Q^{\nu_i,N}_{1,T}(dv).$$
We note from Lemma 3 that the eigenvalues of the covariance of $\bar Q^{\nu_i,N}_{1,T}$ are upperbounded by $\sigma^2 + \rho_K$. Thus for this integral to converge it is sufficient that
$$qC_0(\varepsilon + C_N^{\nu_i})\leq\frac{1}{2(\sigma^2 + \rho_K)}.$$
This condition will always be satisfied for sufficiently small $\varepsilon$ and sufficiently large $N$ (since $C_N^{\nu_i}\to0$ as $N\to\infty$ by Lemma 13). Considering Equation (73), by Corollary 1,
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(Q^{\nu_i,N}\left(\hat\mu_N(u)\in\bar V_\varepsilon(\nu_i)\cap K\right)\right)\leq-\inf_{\mu\in\bar V_\varepsilon(\nu_i)\cap K}H^{\nu_i}(\mu).$$
We next find an upper bound for the integral appearing in the definition of the quantity $D$. We apply Lemma 19 in Appendix A to find
$$\int_{\hat{\bar\mu}_N(v)\in\Psi(\bar V_\varepsilon(\nu_i)\cap K)}\exp\left(qC_0(\varepsilon + C_N^{\nu_i})\sum_{j=-n}^{n}\|v^j\|^2\right)\bar Q^{\nu_i,N}_{1,T}(dv)\leq\left(\det\left(\left(1 - 2qC_0(\varepsilon + C_N^{\nu_i})\sigma^2\right)\mathrm{Id}_{NT} - 2qC_0(\varepsilon + C_N^{\nu_i})K^{\nu_i,N}\right)\right)^{-\frac12}\times\exp\left(2C_0^2q^2(\varepsilon + C_N^{\nu_i})^2\,{}^t\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)B\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right) + NqC_0(\varepsilon + C_N^{\nu_i})\|c^{\nu_i}\|^2\right),$$
where $\mathrm{Id}_T\otimes1_N$ is the $NT\times T$ block matrix with each block $\mathrm{Id}_T$, and
$$B = \left(\sigma^2\mathrm{Id}_{NT} + K^{\nu_i,N}\right)\left(\left(1 - 2C_0q(\varepsilon + C_N^{\nu_i})\sigma^2\right)\mathrm{Id}_{NT} - 2C_0q(\varepsilon + C_N^{\nu_i})K^{\nu_i,N}\right)^{-1}$$
is a symmetric block-circulant matrix. We note $B^k$, $k=-n,\dots,n$, its $T\times T$ blocks. We have
$${}^t\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)B\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right) = N\,{}^tc^{\nu_i}\Big(\sum_{k=-n}^{n}B^k\Big)c^{\nu_i} = N\,{}^tc^{\nu_i}\tilde B^0c^{\nu_i},$$
where $\tilde B^0$ is the 0th component of the spectral representation of the sequence $(B^k)_{k=-n,\dots,n}$. Let $v_m$ be the largest eigenvalue of $B$. Since (by Lemma 20) the eigenvalues of $\tilde B^0$ are a subset of the eigenvalues of $B$, we have
$${}^t\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)B\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)\leq Nv_m\|c^{\nu_i}\|^2.$$
From the definition of $B$ and through Lemma 3 we have $v_m\leq\frac{\sigma^2+\rho_K}{1 - 2C_0q(\varepsilon + C_N^{\nu_i})(\sigma^2+\rho_K)}$. Hence, since $\|c^{\nu_i}\|^2\leq T\bar J^2$, we have
$$\exp\left(2C_0^2q^2(\varepsilon + C_N^{\nu_i})^2\,{}^t\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)B\left((\mathrm{Id}_T\otimes1_N)c^{\nu_i}\right)\right)\leq\exp\left(NT\cdot\frac{2C_0^2q^2(\varepsilon + C_N^{\nu_i})^2(\sigma^2+\rho_K)\bar J^2}{1 - 2C_0q(\varepsilon + C_N^{\nu_i})(\sigma^2+\rho_K)}\right).$$
Since the determinant is the product of the eigenvalues, we similarly find that
$$\left(\det\left(\left(1 - 2C_0q(\varepsilon + C_N^{\nu_i})\sigma^2\right)\mathrm{Id}_{NT} - 2C_0q(\varepsilon + C_N^{\nu_i})K^{\nu_i,N}\right)\right)^{-\frac12}\leq\left(1 - 2C_0q(\varepsilon + C_N^{\nu_i})(\sigma^2+\rho_K)\right)^{-\frac{NT}{2}}.$$
Upon collecting the above inequalities, and noting that $\|c^\nu\|^2\leq T\bar J^2$, we find that
$$D\leq\exp\left(Ns_N^{\nu_i}(q,\varepsilon)\right),$$
where
$$s_N^{\nu_i}(q,\varepsilon) = T\left(-\frac12\log\left(1 - 2C_0q(\varepsilon + C_N^{\nu_i})(\sigma^2+\rho_K)\right) + \frac{2C_0^2q^2(\varepsilon + C_N^{\nu_i})^2(\sigma^2+\rho_K)\bar J^2}{1 - 2C_0q(\varepsilon + C_N^{\nu_i})(\sigma^2+\rho_K)} + qC_0(\varepsilon + C_N^{\nu_i})\Big(\frac1T + \bar J^2\Big)\right).$$
We let $s(q,\varepsilon) = \overline{\lim}_{N\to\infty}s_N^{\nu_i}(q,\varepsilon)$, and find through Lemma 13 that
$$s(q,\varepsilon) = T\left(-\frac12\log\left(1 - 2C_0q\varepsilon(\sigma^2+\rho_K)\right) + \frac{2C_0^2q^2\varepsilon^2(\sigma^2+\rho_K)\bar J^2}{1 - 2C_0q\varepsilon(\sigma^2+\rho_K)} + qC_0\varepsilon\Big(\frac1T + \bar J^2\Big)\right).$$
Notice that $s(q,\varepsilon)$ is independent of $\nu_i$ and that $s(q,\varepsilon)\to0$ as $\varepsilon\to0$. Using Equations (73), (75) and (76) we thus find that
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N(K)\right)\leq\sup_{1\leq i\leq r}\left(-\frac1p\inf_{\mu\in K\cap\bar V_\varepsilon(\nu_i)}H^{\nu_i}(\mu)\right) + \frac1qs(q,\varepsilon).$$
Recall that $H^\nu(\mu) = \infty$ for all $\mu\notin\mathcal{E}_2$. Thus if $K\cap\mathcal{E}_2 = \emptyset$, we may infer that $\overline{\lim}_{N\to\infty}N^{-1}\log(\Pi^N(K)) = -\infty$ and the proposition is evident. Thus we may assume without loss of generality that $\inf_{\mu\in K}H^{\nu_i}(\mu) = \inf_{\mu\in K\cap\mathcal{E}_2}H^{\nu_i}(\mu)$. Furthermore it follows from Proposition 13 (below) that there exists a constant $C_I$ such that for all $\mu\in\bar V_\varepsilon(\nu_i)\cap\mathcal{E}_2$,
$$H^{\nu_i}(\mu)\geq I^{(3)}(\mu,P) - \Gamma(\mu) - C_I\varepsilon\left(1 + I^{(3)}(\mu,P)\right).$$
We thus find that
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N(K)\right)\leq-\frac1p\inf_{K\cap\mathcal{E}_2}\left(I^{(3)}(\mu,P)(1 - C_I\varepsilon) - \Gamma(\mu)\right) + \frac{s(q,\varepsilon)}{q} + \frac{\varepsilon}{p}C_I.$$
We take $\varepsilon\to0$ and find, through the use of Lemma 15, that
$$\overline{\lim_{N\to\infty}}\,N^{-1}\log\left(\Pi^N(K)\right)\leq-\frac1p\inf_K\left(I^{(3)}(\mu,P) - \Gamma(\mu)\right).$$
The proof may thus be completed by taking $p\to1$. □
Proposition 13. There exists a positive constant $C_I$ such that, for all $\nu$ in $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})\cap\mathcal{E}_2$, all $\varepsilon>0$ and all $\mu\in\bar V_\varepsilon(\nu)\cap\mathcal{E}_2$ (where $\bar V_\varepsilon(\nu)$ is the closure of the neighbourhood defined in Proposition 4),
$$\left|\Gamma^\nu(\mu) - \Gamma(\mu)\right|\leq C_I\varepsilon\left(1 + I^{(3)}(\mu,P)\right).$$
The proof is very similar to that of Lemma 14 and we leave it to the reader. We end this section with Lemma 15, whose proof can be found in Appendix D.
Lemma 15. There exist constants $a>1$ and $c>0$ such that for all $\mu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})\cap\mathcal{E}_2$,
$$\Gamma(\mu)\leq\frac{I^{(3)}(\mu,P) + c}{a}.$$

4.4. End of the Proof of Theorem 1

Lemma 16. $H(\mu)$ is lower semicontinuous.
The proof is very similar to that in [42]. Because $\{\Pi^N\}$ is exponentially tight and satisfies the weak LDP with rate function $H(\mu)$, the following corollary is immediate (Lemma 2.1.5 in [37]).
Corollary 2. H(μ) is a good rate function, i.e., the sets {μ: H(μ) ≤ δ} are compact for all δ ∈ ℝ+ and it satisfies the first condition of Theorem 1.
This allows us to complete the proof of Theorem 1:
Proof. By combining Lemmas 16 and 9, Proposition 11, and Corollary 2, we complete the proof of Theorem 1.

5. Characterization of the Unique Minimum of the Rate Function

We prove that there exists a unique minimum $\mu_e$ of the rate function, and provide explicit equations for $\mu_e$ which would facilitate its numerical simulation. We start with the following lemma.
Lemma 17. For $\mu,\nu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$, $H^\nu(\mu) = 0$ if and only if $\mu = Q^\nu$.
Proof. This is a straightforward consequence of Theorem 1 in [42] and Theorem 2. □
Proposition 14. There is a unique distribution $\mu_e\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$ which minimizes $H$. This distribution satisfies $H(\mu_e) = 0$.
Proof. By the previous lemma, it suffices to prove that there is a unique $\mu_e$ such that
$$Q^{\mu_e} = \mu_e.$$
We define the mapping $L:\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})\to\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$ by
$$\mu\mapsto L(\mu) = Q^\mu.$$
It follows from Equation (65) that
$$Q_0^\mu = \mu_I^{\otimes\mathbb{Z}},$$
which is independent of $\mu$.
It may be inferred from the definitions in Section 3.1 that the marginal of $L(\mu) = Q^\mu$ over $\mathcal{T}^{\mathbb{Z}}_{0,t}$ only depends upon the marginal of $\mu$ over $\mathcal{T}^{\mathbb{Z}}_{0,t-1}$. This follows from the fact that $\bar Q^\mu_{1,t}$ (which determines $Q^\mu_{0,t}$) is completely determined by the means $\{c_s^\mu;\ s=1,\dots,t\}$ and covariances $\{K_{uv}^{\mu,j};\ j\in\mathbb{Z},\ u,v\in[1,t]\}$. In turn, it may be observed from Equations (18) and (23) that these variables are determined by $\mu_{0,t-1}$. Thus for any $\mu,\nu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$ and $t\in[1,T]$, if
$$\mu_{0,t-1} = \nu_{0,t-1},$$
then
$$L(\mu)_{0,t} = L(\nu)_{0,t}.$$
It follows from repeated application of the above identity that, for any $\nu$ satisfying $\nu_0 = \mu_I^{\otimes\mathbb{Z}}$,
$$L^T(\nu)_{0,T} = L^T(L(\nu))_{0,T}.$$
Defining
$$\mu_e = L^T(\nu),$$
it follows from Equation (80) that $\mu_e$ satisfies Equation (78).
Conversely, if $\mu = L(\mu)$ for some $\mu$, then we have that $\mu = L^2(\nu)$ for any $\nu$ such that $\nu_{0,T-2} = \mu_{0,T-2}$. Continuing this reasoning, we find that $\mu = L^T(\nu)$ for any $\nu$ such that $\nu_0 = \mu_0$. But by Equation (79), since $Q^\mu = \mu$, we have $\mu_0 = \mu_I^{\otimes\mathbb{Z}}$. And we have just seen that any $\mu$ satisfying $\mu = L^T(\nu)$, where $\nu_0 = \mu_I^{\otimes\mathbb{Z}}$, is uniquely defined by Equation (81), which means that $\mu = \mu_e$. □
We may use the proof of Proposition 14 to characterize the unique measure $\mu_e$ such that $\mu_e = Q^{\mu_e}$ in terms of its image $\bar\mu_e$. This characterization allows one to directly numerically calculate $\mu_e$. We characterize $\bar\mu_e$ recursively (in time), by providing a method of determining $\bar\mu_{e,0,t}$ in terms of $\bar\mu_{e,0,t-1}$. However, we must firstly outline explicitly the bijective correspondence between $\mu_{e,0,t}$ and $\bar\mu_{e,0,t}$, as follows. For $v\in\mathcal{T}$, we write $\Psi^{-1}(v) = (\Psi^{-1}(v)_0,\dots,\Psi^{-1}(v)_T)$. We recall from Equation (14) that $\Psi^{-1}(v)_0 = v_0$. The coordinate $\Psi^{-1}(v)_t$ is the affine function of the $v_s$, $s=0,\dots,t$, obtained from Equation (14):
$$\Psi^{-1}(v)_t = \sum_{i=0}^{t}\gamma^iv_{t-i},\quad t=0,\dots,T.$$
Let $K^{\mu_e,l}_{(t-1,s-1)}$ be the $(t-1)\times(s-1)$ submatrix of $K^{\mu_e,l}$ composed of the rows from times 1 to $(t-1)$ and the columns from times 1 to $(s-1)$, and
$$c^{\mu_e}_{(t-1)} = \left(c_1^{\mu_e},\dots,c_{t-1}^{\mu_e}\right).$$
Let the measures $\bar\mu^1_{e,0,t}$ and $\bar\mu^{(0,l)}_{e,(t,s)}$ be given by
$$\bar\mu^1_{e,0,t}(dv) = \mu_I(dv_0)\,\mathcal{N}_t\left(c^{\mu_e}_{(t)},\,\sigma^2\mathrm{Id}_t + K^{\mu_e,0}_{(t,t)}\right)dv_1\cdots dv_t,$$
$$\bar\mu^{(0,l)}_{e,(t,s)}(dv^0\,dv^l) = \mu_I(dv_0^0)\,\mu_I(dv_0^l)\,\mathcal{N}_{t+s}\left(\left(c^{\mu_e}_{(t)},c^{\mu_e}_{(s)}\right),\,\sigma^2\mathrm{Id}_{t+s} + K^{\mu_e,(0,l)}_{(t,s)}\right)dv_1^0\cdots dv_t^0\,dv_1^l\cdots dv_s^l,$$
where
$$K^{\mu_e,(0,l)}_{(t,s)} = \begin{bmatrix}K^{\mu_e,0}_{(t,t)} & K^{\mu_e,l}_{(t,s)}\\ {}^tK^{\mu_e,l}_{(t,s)} & K^{\mu_e,0}_{(s,s)}\end{bmatrix}.$$
The lemma below is evident from the definitions above.
Lemma 18. For any $t\in[1,T]$, the variables $\{c_s^{\mu_e}, K_{rs}^{\mu_e,j} : 1\leq r,s\leq t,\ j\in\mathbb{Z}\}$ are necessary and sufficient to completely characterize the measures $\{\bar\mu^1_{e,0,t}, \bar\mu^{(0,l)}_{e,(0,t)} : l\in\mathbb{Z}\}$. In turn, the measures $\{\bar\mu^1_{e,0,t}, \bar\mu^{(0,l)}_{e,(0,t)} : l\in\mathbb{Z}\}$ are necessary and sufficient to characterize $\bar\mu_{e,0,t}$.
The inductive method for calculating $\bar\mu_e$ is outlined in the theorem below.
Theorem 3. We may characterize $\bar\mu_e$ inductively as follows. Initially $\bar\mu_{e,0} = \mu_I^{\otimes\mathbb{Z}}$. Given that we have a complete characterization of
$$\left\{\bar\mu^{(0,l)}_{e,(0,t-1)},\ \bar\mu^1_{e,0,t-1} : l\in\mathbb{Z}\right\},$$
we may characterize
$$\left\{\bar\mu^{(0,l)}_{e,(0,t)},\ \bar\mu^1_{e,0,t} : l\in\mathbb{Z}\right\}$$
according to the following identities. For $s\in[1,t]$,
$$c_s^{\mu_e} = \bar J\int f\left(\Psi^{-1}(v)_{s-1}\right)\bar\mu^1_{e,0,s-1}(dv).$$
For $1\leq r,s\leq t$, $K_{rs}^{\mu_e,k} = \sum_{l=-\infty}^{\infty}\Lambda(k,l)M_{rs}^{\mu_e,l}$. Here, for $p = \max(r-1,s-1)$,
$$M_{rs}^{\mu_e,0} = \int f\left(\Psi^{-1}(v)_{r-1}\right)f\left(\Psi^{-1}(v)_{s-1}\right)\bar\mu^1_{e,0,p}(dv),$$
and for $l\neq0$,
$$M_{rs}^{\mu_e,l} = \int f\left(\Psi^{-1}(v^0)_{r-1}\right)f\left(\Psi^{-1}(v^l)_{s-1}\right)\bar\mu^{(0,l)}_{e,(r-1,s-1)}(dv^0\,dv^l).$$
Of course the measure $\mu_e$ may be determined from $\bar\mu_e$ since $\mu_e = \bar\mu_e\circ\Psi$.
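The recursion of Theorem 3 lends itself to direct numerical implementation. Below is a minimal sketch (Python/NumPy) under heavy simplifying assumptions: we keep only the spatially diagonal moment $M^{\mu_e,0}$ (dropping the other terms of the sum defining $K^{\mu_e,k}$), estimate the Gaussian expectations by Monte Carlo, and sample each new coordinate from its marginal only, ignoring temporal cross-covariances. All parameter values, $f=\tanh$, and the standard normal initial law are assumptions for illustration; a faithful implementation must use the full covariances of Lemma 18.

```python
import numpy as np

# Minimal sketch of the time-marching computation of mu_e_bar, in a
# deliberately simplified setting: only the spatially-diagonal moment
# M^{mu_e,0} is tracked, Gaussian expectations are estimated by Monte
# Carlo, and temporal cross-covariances are ignored when sampling the
# next coordinate. f, gamma, J_bar, sigma2 and mu_I are stand-ins.
rng = np.random.default_rng(1)
T, n_samples = 5, 200_000
J_bar, sigma2, gamma = 0.5, 1.0, 0.6
f = np.tanh

c = np.zeros(T + 1)                    # c^{mu_e}_s, s = 1..T (index 0 unused)
M = np.zeros((T + 1, T + 1))           # M^{mu_e,0}_{rs}
V = [rng.normal(0.0, 1.0, n_samples)]  # samples of v_0 ~ mu_I (toy choice)

for t in range(1, T + 1):
    # Psi^{-1}(v)_s = sum_i gamma^i v_{s-i}, computed recursively.
    U, Us = np.zeros(n_samples), []
    for s in range(t):
        U = gamma * U + V[s]
        Us.append(U.copy())
    c[t] = J_bar * f(Us[t - 1]).mean()             # identity for c_s, s = t
    for r in range(1, t + 1):
        M[r, t] = M[t, r] = (f(Us[r - 1]) * f(Us[t - 1])).mean()
    # Sample v_t from its marginal N(c_t, sigma^2 + M_tt); the full scheme
    # would use the joint Gaussian with covariance sigma^2 Id + K^{mu_e,0}.
    V.append(rng.normal(c[t], np.sqrt(sigma2 + M[t, t]), n_samples))

print("means c_s:", np.round(c[1:], 3))
```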

6. Some Important Consequences of Theorem 1

We state some important consequences of our results, including some which are valid $J$-almost surely (quenched results). We recall that $Q^N(J^N)$ is the conditional law of the $N$ neurons for a given $J^N$.
Theorem 4. $\Pi^N$ converges weakly to $\delta_{\mu_e}$, i.e., for all $\Phi\in C_b(\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}}))$,
$$\lim_{N\to\infty}\int_{\mathcal{T}^N}\Phi\left(\hat\mu_N(u)\right)Q^N(du) = \Phi(\mu_e).$$
Similarly,
$$\lim_{N\to\infty}\int_{\mathcal{T}^N}\Phi\left(\hat\mu_N(u)\right)Q^N(J^N)(du) = \Phi(\mu_e)\quad J\text{-almost surely}.$$
Proof. The proof of the first result follows directly from the existence of an LDP for the measure $\Pi^N$ (see Theorem 1), and is a straightforward adaptation of Theorem 2.5.1 in [18]. The proof of the second result uses the same method, making use of Theorem 5 below. □
We can in fact obtain the following quenched convergence analogue of Equation (16).
Theorem 5. For each closed set $F$ of $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$ and for almost all $J$,
$$\overline{\lim_{N\to\infty}}\,\frac1N\log\left[Q^N(J^N)\left(\hat\mu_N\in F\right)\right]\leq-\inf_{\mu\in F}H(\mu).$$
Proof. The proof is a combination of Chebyshev's inequality and the Borel-Cantelli lemma, and is a straightforward adaptation of Theorem 2.5.4 and Corollary 2.5.6 in [18]. □
We define the spatially-averaged law $\frac1N\sum_{j=-n}^{n}Q^N(J^N)\circ S^{j}$, where we recall the shift operator $S$ defined by Equation (8); clearly this averaged law is in $\mathcal{M}^+_{1,S}(\mathcal{T}^N)$. We denote its $M$th marginal by $Q^{N,M}(J^N)$, and similarly $Q^{N,M}$ is the $M$th marginal of $Q^N$.
Corollary 3. Fix $M$ and let $N>M$. For almost every $J$ and all $h\in C_b(\mathcal{T}^M)$,
$$\lim_{N\to\infty}\int h(u)\,Q^{N,M}(J^N)(du) = \int_{\mathcal{T}^M}h(u)\,\mu_e^M(du),\qquad\lim_{N\to\infty}\int h(u)\,Q^{N,M}(du) = \int_{\mathcal{T}^M}h(u)\,\mu_e^M(du).$$
That is, the $M$th marginals $Q^{N,M}(J^N)$ and $Q^{N,M}$ converge weakly to $\mu_e^M$ as $N\to\infty$.
Proof. It is sufficient to apply Theorem 4 in the case where $\Phi\in C_b(\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}}))$ is defined by
$$\Phi(\mu) = \int_{\mathcal{T}^M}h\,d\mu^M,$$
and to use the fact that $Q^N, Q^N(J)\in\mathcal{M}^+_{1,S}(\mathcal{T}^N)$ (Lemma 1 and the above remark). □
We now prove the following ergodic-type theorem. We may represent the ambient probability space by $\mathcal{W}$, where $\omega\in\mathcal{W}$ is such that $\omega = (J_{ij}, B_t^j, \mu_0^j)$, where $i,j\in\mathbb{Z}$ and $0\leq t\leq T-1$; recall Equation (1). We denote the probability measure governing $\omega$ by $\mathbb{P}$. Let $u^{(N)}(\omega)\in\mathcal{T}^N$ be defined by Equation (1). As an aside, we may then understand $Q^N(J^N)$ to be the conditional law of $\mathbb{P}$ on $u^{(N)}(\omega)$, for given $J^N$.
Theorem 6. Fix $M>0$ and let $h\in C_b(\mathcal{T}^M)$. Then, for $u^{(N)}(\omega)\in\mathcal{T}^N$ (where $N>M$), $\mathbb{P}$-almost surely,
$$\lim_{N\to\infty}\frac1N\sum_{j=-n}^{n}h\left(\pi^M(S^ju^{(N)}(\omega))\right) = \int_{\mathcal{T}^M}h(u)\,d\mu_e^M(u),$$
where $S^ju^{(N)}(\omega)$, $|j|\leq n$, is defined in Equation (8). Hence $\hat\mu_N(u^{(N)}(\omega))$ converges $\mathbb{P}$-almost surely to $\mu_e$.
Proof. Our proof is an adaptation of [18]. We may suppose without loss of generality that $\int_{\mathcal{T}^M}h(u)\,d\mu_e^M(u) = 0$. For $p>1$ let
$$F_p = \left\{\mu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})\ \Big|\ \Big|\int_{\mathcal{T}^M}h(u)\,\mu^M(du)\Big|\geq\frac1p\right\}.$$
Since $\mu_e\notin F_p$, and it is the unique zero of $H$, it follows that $\inf_{F_p}H = m > 0$. Thus by Theorem 1 there exists an $N_0$ such that for all $N>N_0$,
$$Q^N\left(\hat\mu_N\in F_p\right)\leq\exp(-mN).$$
However,
$$\mathbb{P}\left(\omega\ \big|\ \hat\mu_N(u^{(N)}(\omega))\in F_p\right) = Q^N\left(u\ \big|\ \hat\mu_N(u)\in F_p\right).$$
Thus
$$\sum_{N=1}^{\infty}\mathbb{P}\left(\omega\ \big|\ \hat\mu_N(u^{(N)}(\omega))\in F_p\right)<\infty.$$
We may thus conclude from the Borel-Cantelli lemma that, $\mathbb{P}$-almost surely, there exists an $N_p$ such that for all $N\geq N_p$,
$$\left|\frac1N\sum_{j=-n}^{n}h\left(\pi^M(S^ju^{(N)}(\omega))\right)\right|\leq\frac1p.$$
This yields Equation (82) because $p$ is arbitrary. The convergence of $\hat\mu_N(u^{(N)}(\omega))$ is a direct consequence of Equation (82), since this means that each of the $M$th marginals converges. □

7. Possible Extensions

Our results hold true if we assume that Equation (1) is replaced by the more general equation
$$U_t^j = \sum_{k=1}^{l}\gamma_kU_{t-k}^j + \sum_{i=-n}^{n}J_{ji}^Nf\left(U_{t-1}^i\right) + \theta^j + B_{t-1}^j,\quad j=-n,\dots,n,\ t=l,\dots,T,$$
where $l$ is a positive integer strictly less than $T$ (in practice much smaller) and the $\theta^j$ are independent and identically distributed random variables, independent of the synaptic weights $J$ and of the random processes $B^j$. They can be thought of as external stimuli imposed on the neurons. This equation accounts for a more complicated "intrinsic" dynamics of the neurons, i.e., when they are uncoupled. The parameters $\gamma_k$, $k=1,\dots,l$, must satisfy some conditions to ensure stability of the uncoupled dynamics.
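For the reader who wishes to experiment, here is a minimal simulation sketch (Python/NumPy) of this generalized equation for one realization of the disorder. For simplicity the weights are drawn i.i.d. $\mathcal{N}(\bar J/N, 1/N)$, whereas our setting allows spatially correlated Gaussian weights; all parameter values are assumptions for illustration.

```python
import numpy as np

# Minimal sketch: simulate the generalized network equation above for one
# realization of the synaptic weights. The weights are drawn i.i.d. here
# for simplicity; the paper allows spatially correlated Gaussian weights.
rng = np.random.default_rng(2)
N, T, ell = 101, 50, 2                  # N = 2n+1 neurons, horizon T, memory ell
gammas = np.array([0.5, -0.2])          # gamma_k, k = 1..ell (a stable choice)
J_bar, sigma = 1.0, 0.5
f = np.tanh

J = rng.normal(J_bar / N, 1.0 / np.sqrt(N), size=(N, N))   # weights J^N_{ji}
theta = rng.normal(0.0, 0.1, size=N)    # i.i.d. external stimuli theta^j
U = np.zeros((T + 1, N))
U[:ell] = rng.normal(0.0, 1.0, size=(ell, N))              # initial condition

for t in range(ell, T + 1):
    intrinsic = sum(gammas[k - 1] * U[t - k] for k in range(1, ell + 1))
    U[t] = intrinsic + J @ f(U[t - 1]) + theta + sigma * rng.normal(size=N)

print("empirical mean of U_T over neurons:", U[T].mean())
```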
This result can be straightforwardly extended to the case when the noise is correlated but stationary Gaussian, that is, when $\mathrm{cov}(B_s^j, B_t^k)$ is some function of $s$, $t$ and $(k-j)$. It can also be easily extended to the case where the initial distribution is correlated but mixing, using the large deviation principle in [43].
The hypothesis that the synaptic weights are Gaussian is somewhat unrealistic from the biological viewpoint. In his PhD thesis [18], Moynot obtained some preliminary results in the case of uncorrelated weights. We think that this is also a promising avenue.
Moynot, again in his thesis, extended the uncorrelated-weights case to include two populations, with different (Gaussian) statistics for each population. This is also an important practical problem in neuroscience. Extending Moynot's result to the correlated case is probably a low-hanging fruit.
Last but not least, the solutions of the equations for the mean and covariance operator of the measure minimizing the rate function derived in Section 5, and their numerical simulation, are very much worth investigating, and their predictions should be confronted with biological measurements.

8. Conclusions

In recent years there has been a lot of effort to mathematically justify neural-field models through some sort of asymptotic analysis of finite-size neural networks. Many, if not most, of these models assume or prove some sort of thermodynamic limit, whereby if one isolates a particular population of neurons in a localized area of space, they are found to fire increasingly asynchronously as the number in the population asymptotes to infinity [44]. Indeed, this was the result of Moynot and Samuelides. However, our results imply that there are system-wide correlations between the neurons, even in the asymptotic limit. The key reason why we do not have propagation of chaos is that the Radon-Nikodym derivative $\frac{dQ^N}{dP^N}$ of the averaged laws in Proposition 8 cannot be tensored into $N$ independent and identically distributed processes, whereas the simpler assumptions on the weight function $\Lambda$ in Moynot and Samuelides allow the Radon-Nikodym derivative to be tensored. A very important implication of our result is that the mean-field behavior is insufficient to characterize the behavior of a population. Our limit process $\mu_e$ is system-wide and ergodic. Our work challenges the assumption held by some that one cannot have a "concise" macroscopic description of a neural network without an assumption of asynchronicity at the local population level.
It would be of interest to compare our LDP with other analyses of the rate of convergence of neural networks to their limits as the size asymptotes to infinity. This includes the system-size expansion of Bressloff [45], the path-integral formulation of Buice and Cowan [46] and the systematic expansion of the moments by (amongst others) [4749].

Appendix

A. Two Useful Lemmas

The following lemma from Gaussian calculus [18,50], which we recall for completeness, is used several times throughout the paper.
Lemma 19. Let $Z$ be a Gaussian vector of $\mathbb{R}^p$ with mean $c$ and covariance matrix $K$. If $a\in\mathbb{R}^p$ and $b\in\mathbb{R}$ are such that for all eigenvalues $\alpha$ of $K$ the relation $\alpha b>-1$ holds, we have
$$\mathbb{E}\left[\exp\left({}^taZ - \frac b2\|Z\|^2\right)\right] = \frac{1}{\sqrt{\det(\mathrm{Id}_p + bK)}}\exp\left({}^tac - \frac b2\|c\|^2 + \frac12\,{}^t(a - bc)K(\mathrm{Id}_p + bK)^{-1}(a - bc)\right).$$
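Because the identity is exact, it can be checked by direct Monte Carlo simulation. The sketch below (Python/NumPy) does so for a small random example; all test values are arbitrary assumptions, with $b>0$ so that the eigenvalue condition holds.

```python
import numpy as np

# Minimal sketch: Monte Carlo check of the Gaussian identity of Lemma 19.
rng = np.random.default_rng(3)
p = 3
G = rng.normal(size=(p, p))
K = G @ G.T + np.eye(p)          # a positive definite covariance matrix
c = rng.normal(size=p)           # mean
a = rng.normal(size=p)
b = 0.2                          # b > 0, so alpha * b > -1 for every alpha

Z = rng.multivariate_normal(c, K, size=1_000_000)
lhs = np.mean(np.exp(Z @ a - 0.5 * b * np.sum(Z**2, axis=1)))

m = a - b * c
rhs = (np.exp(a @ c - 0.5 * b * (c @ c)
              + 0.5 * m @ K @ np.linalg.solve(np.eye(p) + b * K, m))
       / np.sqrt(np.linalg.det(np.eye(p) + b * K)))
print(lhs, rhs)                  # should agree to Monte Carlo accuracy
```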
Block-circulant matrices may be diagonalised using DFTs as follows.
Lemma 20. Let $B$ be a symmetric block-circulant matrix with $(j,k)$ $T\times T$ block given by $B^{(j-k)\bmod N}$, $j,k=-n,\dots,n$. Let $W^{(N)}$ be the $N\times N$ unitary matrix with elements $W_{jk}^{(N)} = \frac{1}{\sqrt N}\exp\left(\frac{2\pi ijk}{N}\right)$, $j,k=-n,\dots,n$. Then $B$ may be 'block'-diagonalised in the following manner (where $\otimes$ is the Kronecker product and $*$ the conjugate transpose):
$$B = \left(W^{(N)}\otimes\mathrm{Id}_T\right)\mathrm{diag}\left(\tilde B^{-n},\dots,\tilde B^{n}\right)\left(W^{(N)}\otimes\mathrm{Id}_T\right)^*.$$
Here $\tilde B^j$ is a $T\times T$ Hermitian matrix, the DFT defined in Equation (88). We observe also that $\lambda$ is an eigenvalue of $B$ if and only if $\lambda$ is an eigenvalue of $\tilde B^k$ for some $k$.
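A quick numerical check of the lemma (Python/NumPy) on a random symmetric block-circulant matrix: the spectrum of $B$ coincides with the union of the spectra of the DFT blocks $\tilde B^k$. The sizes and random blocks are arbitrary assumptions.

```python
import numpy as np

# Minimal sketch: verify that the eigenvalues of a symmetric block-circulant
# matrix B are the union of the eigenvalues of its DFT blocks B_tilde^k.
rng = np.random.default_rng(4)
N, T = 5, 2
n = N // 2

# Random T x T blocks with C^{-l} = t(C^l), so that B is symmetric.
C = {}
C0 = rng.normal(size=(T, T))
C[0] = C0 + C0.T
for l in range(1, n + 1):
    C[l] = rng.normal(size=(T, T))
    C[-l] = C[l].T

B = np.block([[C[((j - k + n) % N) - n] for k in range(N)] for j in range(N)])

eig_blocks = []
for k in range(-n, n + 1):
    Bk = sum(C[l] * np.exp(-2j * np.pi * l * k / N) for l in range(-n, n + 1))
    eig_blocks.extend(np.linalg.eigvalsh(Bk))   # each B_tilde^k is Hermitian

print(np.sort(np.linalg.eigvalsh(B)))
print(np.sort(np.real(eig_blocks)))             # the two spectra coincide
```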

B. Proof of Proposition 4

We first recall Proposition 4:
Proposition 4. Fix $\nu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$. For all $\varepsilon>0$, there exists an open neighborhood $V_\varepsilon(\nu)$ such that for all $\mu\in V_\varepsilon(\nu)$, all $s,t\in[1,T]$ and all $\theta\in[-\pi,\pi[$,
$$\left|\tilde K^\nu_{st}(\theta) - \tilde K^\mu_{st}(\theta)\right|\leq\varepsilon,\quad\left|\tilde A^\nu_{st}(\theta) - \tilde A^\mu_{st}(\theta)\right|\leq\varepsilon,\quad\left|c_s^\nu - c_s^\mu\right|\leq\varepsilon,$$
and for all $N>0$ and all $k$ such that $|k|\leq n$,
$$\left|\tilde K^{\nu,k}_{[N],st} - \tilde K^{\mu,k}_{[N],st}\right|\leq\varepsilon\quad\text{and}\quad\left|\tilde A^{\nu,k}_{[N],st} - \tilde A^{\mu,k}_{[N],st}\right|\leq\varepsilon.$$
Proof. Let $\mu$ be in $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$ and $\theta\in[-\pi,\pi[$. We have
$$\tilde K^\mu_{st}(\theta) - \tilde K^\nu_{st}(\theta) = \sum_{k=-\infty}^{\infty}\left(K^{\mu,k}_{st} - K^{\nu,k}_{st}\right)e^{ik\theta}.$$
Using Equation (23) we have
$$\tilde K^\mu_{st}(\theta) - \tilde K^\nu_{st}(\theta) = \sum_{k,l=-\infty}^{\infty}\Lambda(k,l)\left(M^{\mu,l}_{st} - M^{\nu,l}_{st}\right)e^{ik\theta},$$
hence
$$\left|\tilde K^\mu_{st}(\theta) - \tilde K^\nu_{st}(\theta)\right|\leq\sum_{k,l=-\infty}^{\infty}|\Lambda(k,l)|\inf_{\xi^{2L}}\int_{\mathcal{T}^L\times\mathcal{T}^L}\left|f(u_{s-1}^0)f(u_{t-1}^l) - f(v_{s-1}^0)f(v_{t-1}^l)\right|\xi^{2L}(du,dv),$$
where $L = 2|l|+1$ and the infimum is taken over the measures $\xi^{2L}$ with marginals $\mu^L$ and $\nu^L$. Since $|f(u_{s-1}^0)f(u_{t-1}^l) - f(v_{s-1}^0)f(v_{t-1}^l)|\leq2\left(k_fd^L(\pi^Lu,\pi^Lv)\wedge1\right)$, where $k_f$ is the Lipschitz constant of the function $f$, we find (through Equation (12)) that
$$\left|\tilde K^\mu_{st}(\theta) - \tilde K^\nu_{st}(\theta)\right|\leq2D(\mu,\nu).$$
Thus for Equation (32) to be satisfied, it suffices for us to stipulate that $V_\varepsilon(\nu)$ is a ball of radius less than $\frac\varepsilon2$ (with respect to the distance metric in Equation (12)). Similar reasoning dictates that Equation (35) is satisfied too.
Moreover, in light of Lemma 4, it is evident that we may take the radius of $V_\varepsilon(\nu)$ to be sufficiently small that Equations (32), (35) and (36) are satisfied. In fact Equation (33) is then also satisfied, as it may be obtained by taking the limit as $N\to\infty$ of Equation (36). Since $c^\mu$ is determined by the one-dimensional spatial marginal of $\mu$, it follows from the definition of the metric in Equation (12) that we may take the radius of $V_\varepsilon(\nu)$ to be sufficiently small that Equation (34) is satisfied too. □

C. Existence of a Lower Bound for ΦN(µ, v)

In order to prove that $\phi_N(\mu,v)$ defined in Equation (42) possesses a lower bound, we use a spectral representation and let $\tilde w^j = \tilde v^j$ for all $j$, except that $\tilde w^0 = \tilde v^0 - Nc^\mu$. We may then write
$$\bar\phi_N(\mu,v) = \frac{1}{2N^2\sigma^2}\sum_{l=-n}^{n}\tilde w^{l,*}\tilde A^{\mu,l}_{[N]}\tilde w^l + \frac{1}{N\sigma^2}\langle c^\mu,\tilde w^0\rangle + \frac{1}{2\sigma^2}\|c^\mu\|^2.$$
Thus in order that the integrand possesses a lower bound, it suffices to prove, since the matrices $\tilde A^{\mu,l}_{[N]}$ are Hermitian positive, that there exists a lower bound for
$$\frac{1}{N^2}\,{}^t\tilde w^0\tilde A^{\mu,0}_{[N]}\tilde w^0 + \frac2N\langle\tilde w^0, c^\mu\rangle.$$
We have made use of the fact that $\tilde w^0$ and $\tilde A^{\mu,0}_{[N]}$ are real (since they are each a sum of real variables). Let $\tilde K^{\mu,0}_{[N]} = O^\mu_{[N]}D^\mu_{[N]}{}^tO^\mu_{[N]}$, where $D^\mu_{[N]}$ is diagonal and $O^\mu_{[N]}$ is orthonormal. We define $X = {}^tO^\mu_{[N]}\tilde w^0$, so that Equation (84) is equal to
$$\frac{1}{N^2}\,{}^tXD^\mu_{[N]}\left(\sigma^2\mathrm{Id}_T + D^\mu_{[N]}\right)^{-1}X + \frac2N\sum_{t=1}^{T}\langle O^\mu_{[N],t}, c^\mu\rangle X_t,$$
where $O^\mu_{[N],t}$ is the $t$-th column vector of $O^\mu_{[N]}$. In order that Equation (85) be bounded below, we require that the coefficient of $X$ converge to zero when $D^\mu_{[N]}$ does. The following lemma is sufficient.
Lemma 21. For each $1\leq t\leq T$,
$$\langle c^\mu, O^\mu_{[N],t}\rangle^2\leq\frac{\bar J^2}{\tilde\Lambda_{min}}D^\mu_{[N],tt},$$
where $\tilde\Lambda_{min}$ is given in Proposition 1.
Proof. If $\bar J = 0$ the conclusion is evident, thus we assume throughout this proof that $\bar J\neq0$. Since $D^\mu_{[N],tt} = \langle O^\mu_{[N],t}, \tilde K^{\mu,0}_{[N]}O^\mu_{[N],t}\rangle$, we find from the definition that
$$D^\mu_{[N],tt} = \sum_{k,m=-n}^{n}\Lambda_N(k,m)\,{}^tO^\mu_{[N],t}M^{\mu,m}O^\mu_{[N],t}.$$
We introduce the matrices $(L^{\mu,k})$, $k\in\mathbb{Z}$, where for $1\leq s,t\leq T$,
$$L^{\mu,k}_{st} = M^{\mu,k}_{st} - \bar c^\mu_s\bar c^\mu_t = \int\left(f(u_{s-1}^0) - \bar c^\mu_s\right)\left(f(u_{t-1}^k) - \bar c^\mu_t\right)\mu(du),$$
where $\bar c^\mu = \frac{1}{\bar J}c^\mu$.
These matrices have the same properties as the matrices $M^{\mu,k}$; in particular the discrete Fourier transform $(\tilde L^{\mu,N,l})_{l=-n,\dots,n}$ is Hermitian positive. Using this spectral representation we write
$$D^\mu_{[N],tt} = \tilde\Lambda_N(0,0)\langle\bar c^\mu, O^\mu_{[N],t}\rangle^2 + \frac1N\sum_{l=-n}^{n}\tilde\Lambda_N(0,l)\,{}^tO^\mu_{[N],t}\tilde L^{\mu,N,l}O^\mu_{[N],t},$$
and since $\tilde\Lambda_N(0,l)$ is positive for all $l=-n,\dots,n$ and ${}^tO^\mu_{[N],t}\tilde L^{\mu,N,l}O^\mu_{[N],t}$ is positive for all $t=1,\dots,T$, we have
$$D^\mu_{[N],tt}\geq\frac{\tilde\Lambda_N(0,0)}{\bar J^2}\langle c^\mu, O^\mu_{[N],t}\rangle^2,$$
and the conclusion follows from Assumption (5). □
We may use the previous lemma to obtain a lower bound for the quadratic form (85). We recall the easily-proved identity from the calculus of quadratics that, for all $x\in\mathbb{R}$ and $a>0$,
$$ax^2 + 2bx\geq-\frac{b^2}{a}.$$
We therefore find, through Lemma 21, that Equation (85) is greater than or equal to
$$-\frac{\bar J^2}{\tilde\Lambda_{min}}\left(T\sigma^2 + \sum_{t=1}^{T}D^\mu_{[N],tt}\right) = -\frac{\bar J^2}{\tilde\Lambda_{min}}\left(T\sigma^2 + \mathrm{Trace}\left(\tilde K^{\mu,0}_{[N]}\right)\right).$$
We have already noted in the proof of Proposition 5 that $\mathrm{Trace}(\tilde K^{\mu,0})\leq T\Lambda_{sum}$. Thus, pulling these results together, we find that $\phi_N(\mu,v)$ is greater than $-\beta_2$, where
$$\beta_2 = \frac{T\bar J^2}{2\sigma^2\tilde\Lambda_{min}}\left(\sigma^2 + \Lambda_{sum}\right).$$

D. Proof of Lemmas 10 and 15

For technical reasons, we need the following definition, which is also used in Appendix E. The motivation is that when we analyze the function $\Phi_N(\mu,v)$ defined in Equation (43), we are led to use spectral representations and to introduce the Fourier transform $\tilde v$ of $v$. Since $\tilde v\in(\mathbb{C}^T)^N$, the correspondence $v\to\tilde v$ from $\mathcal{T}^N_{1,T}$ to $(\mathbb{C}^T)^N$ is not one-to-one. We need to take into account the symmetries of $\tilde v$, hence the following definition.
Definition 8. For $v = (v^j)_{j=-n}^{n}\in\mathcal{T}^N_{1,T}$, we note $\mathcal{F}^N(v) = v' = (v'^{-n},\dots,v'^{n})\in\mathcal{T}^N_{1,T}$, where $v'$ is defined from the discrete Fourier transform $\tilde v = (\tilde v^{-n},\dots,\tilde v^{n})$ of $v$ as follows:
$$\tilde v^k = \sum_{j=-n}^{n}v^j\exp\left(-\frac{2\pi ijk}{N}\right).$$
The inverse transform is given by $v^j = \frac1N\sum_{k=-n}^{n}\tilde v^k\exp\left(\frac{2\pi ijk}{N}\right)$.
Because $v$ is in $\mathcal{T}^N_{1,T}$, the real part of its DFT is even ($\mathrm{Re}(\tilde v^{-k}) = \mathrm{Re}(\tilde v^{k})$, $k=-n,\dots,n$) and similarly its imaginary part is odd. As a consequence we define
$$v'^k = \begin{cases}\tilde v^0 & k=0,\\ \sqrt2\,\mathrm{Im}(\tilde v^{-k}) & k=-n,\dots,-1,\\ \sqrt2\,\mathrm{Re}(\tilde v^{k}) & k=1,\dots,n.\end{cases}$$
It is easily verified that the mapping $v\to v' = \mathcal{F}^N(v)$ is a bijection from $\mathcal{T}^N_{1,T}$ to itself, the inverse being given by
$$v^j = \frac1N\left[v'^0 + \sqrt2\,\mathrm{Re}\left(\sum_{k=1}^{n}\left(v'^k + iv'^{-k}\right)e^{\frac{2\pi ijk}{N}}\right)\right],$$
and that
$$\sum_{k=-n}^{n}\|v'^k\|^2 = \sum_{k=-n}^{n}\tilde v^{k*}\tilde v^k = N\sum_{k=-n}^{n}\|v^k\|^2.$$
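The bijection and the norm identity are easy to verify numerically. The sketch below (Python/NumPy) applies the map of Definition 8 to a random $v$, reconstructs $v$ through the inverse formula, and checks the Parseval-type identity; the sizes are arbitrary assumptions.

```python
import numpy as np

# Minimal sketch: check that v -> v' (Definition 8) is a bijection and that
# sum_k ||v'^k||^2 = N * sum_j ||v^j||^2.
rng = np.random.default_rng(5)
N, T = 7, 3
n = N // 2
idx = range(-n, n + 1)
v = rng.normal(size=(N, T))              # row j + n holds v^j

# Forward DFT: v_tilde^k = sum_j v^j exp(-2 pi i j k / N).
vt = np.array([sum(v[j + n] * np.exp(-2j * np.pi * j * k / N) for j in idx)
               for k in idx])

# Real coordinates: v'^0 = v_tilde^0, v'^k = sqrt(2) Re(v_tilde^k) and
# v'^{-k} = sqrt(2) Im(v_tilde^k) for k = 1..n.
vp = np.empty_like(v)
vp[n] = vt[n].real
for k in range(1, n + 1):
    vp[n + k] = np.sqrt(2) * vt[n + k].real
    vp[n - k] = np.sqrt(2) * vt[n + k].imag

# Inverse map.
v_rec = np.array([(vp[n] + np.sqrt(2) * sum(
    ((vp[n + k] + 1j * vp[n - k]) * np.exp(2j * np.pi * j * k / N)).real
    for k in range(1, n + 1))) / N for j in idx])

print(np.allclose(v, v_rec))                         # True
print(np.isclose((vp**2).sum(), N * (v**2).sum()))   # True
```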
For a probability measure $\mu^N\in\mathcal{M}^+_1(\mathcal{T}^N)$, we define $\mu'^N = \mu^N_{1,T}\circ(\mathcal{F}^N)^{-1}$ to be the image law. We also note $\bar\mu'^N$ the measure $\bar\mu^N_{1,T}\circ(\mathcal{F}^N)^{-1}$ (where $\bar\mu^N$ is given in Definition 1). We note that
$$\bar P'^N = \mathcal{N}_T\left(0_T, N\sigma^2\mathrm{Id}_T\right)^{\otimes N}.$$
We notice that $\Gamma_{[N],2}(\mu) = \int_{\mathcal{T}^N_{1,T}}\phi'_N(\mu,v')\,\bar\mu'^N(dv')$, where
$$\phi'_N(\mu,v') = \frac{1}{2N^2\sigma^2}\sum_{l=-n}^{n}\tilde v^{l,*}\tilde A^{\mu,l}_{[N]}\tilde v^l + \frac{1}{N\sigma^2}\,{}^t\tilde v^0\left(\mathrm{Id}_T - \tilde A^\mu_{[N]}(0)\right)c^\mu - \frac{1}{2\sigma^2}\,{}^tc^\mu\left(\mathrm{Id}_T - \tilde A^\mu_{[N]}(0)\right)c^\mu,$$
and $\tilde v$ is implicitly given by Equation (89) as a function of $v'$. We have used Definition 8 and the DFT diagonalisation of Lemma 20. We note that, since $\tilde A^{\mu,l}_{[N]}$ is Hermitian positive, $\tilde v^{l,*}\tilde A^{\mu,l}_{[N]}\tilde v^l$ is real and positive. We recall Lemma 10 and give a proof.
Lemma 10. There exist positive constants $c>0$ and $a>1$ such that, for all $N$,
$$\int_{\mathcal{T}^N}\exp\left(aN\phi_N\left(\hat\mu_N(u),\Psi_{1,T}(u)\right)\right)P^N(du)\leq\exp(Nc),$$
where $\phi_N$ is defined in Equation (43).
Proof. We have from Equation (83) that $\phi_N(\mu,v) = \bar\phi_N(\mu,w)$, where $w'^j = v'^j$ for all $j$, except that $w'^0 = v'^0 - Nc^\mu$. Since (by Equation (90)) the distribution of the variables $v'$ under $\bar P^N$ is $\mathcal{N}_T(0_T, N\sigma^2\mathrm{Id}_T)^{\otimes N}$, the distribution of $w'$ under $\bar P^N$ is the same, except that the 0th coordinate has law $\mathcal{N}_T(-Nc^\mu, N\sigma^2\mathrm{Id}_T)$. By Lemma 7, the eigenvalues of $\tilde A^{\mu,j}_{[N]}$ are upperbounded by some $0<\alpha<1$, for all $j$. Thus
$$N\bar\phi_N(\mu,w)\leq\frac{\alpha}{2N\sigma^2}\sum_{l=-n}^{n}\|w'^l\|^2 + \frac{1}{\sigma^2}\langle c^\mu, w'^0\rangle + \frac{N}{2\sigma^2}\|c^\mu\|^2.$$
Hence we find that
$$\int_{\mathcal{T}^N}\exp\left(aN\phi_N(\hat\mu_N(u),\Psi_{1,T}(u))\right)P^N(du)\leq\left(2\pi N\sigma^2\right)^{-\frac{NT}{2}}\int_{\mathcal{T}^{N-1}_{1,T}}G_1\exp\left(-\frac{1 - a\alpha}{2N\sigma^2}\sum_{|j|=1}^{n}\|y^j\|^2\right)\prod_{|j|=1}^{n}\prod_{t=1}^{T}dy_t^j,$$
where
$$G_1 = \int_{\mathcal{T}_{1,T}}\exp\left[\frac{1}{2N\sigma^2}\left(a\alpha\|y^0\|^2 + 2aN\langle c^{\hat\mu_N}, y^0\rangle + aN^2\|c^{\hat\mu_N}\|^2 - \|y^0 + Nc^{\hat\mu_N}\|^2\right)\right]\prod_{t=1}^{T}dy_t^0.$$
We note the dependency of $G_1$ on the $(y^j)$ (for all $j\neq0$) via $c^{\hat\mu_N}$. After diagonalisation, we find that
$$G_1 = \int_{\mathcal{T}_{1,T}}\exp\left[\frac{N\|c^{\hat\mu_N}\|^2a(a-1)(1-\alpha)}{2(1-a\alpha)\sigma^2} - \frac{1-a\alpha}{2N\sigma^2}\sum_{s=1}^{T}\left(y_s^0 - \frac{Nc_s^{\hat\mu_N}(a-1)}{1-a\alpha}\right)^2\right]\prod_{s=1}^{T}dy_s^0.$$
We assume that $a>1$ is such that $1-a\alpha>0$. To bound this expression, we note the identity that if $A:\mathbb{R}\to\mathbb{R}$ satisfies $|A(t)|\leq A$ for some constant $A>0$, and $\gamma_c>0$, then
$$\int_{\mathbb{R}}\exp\left(-\frac{1}{2\gamma_c}\left(t - A(t)\right)^2\right)dt\leq2A + \sqrt{2\pi\gamma_c}.$$
Since $|c_s^{\hat\mu_N}|\leq|\bar J|$, $s=1,\dots,T$, and hence $\|c^{\hat\mu_N}\|^2\leq T\bar J^2$, we therefore find that $G_1\leq G_1^c$, where
$$G_1^c = \exp\left[\frac{NT\bar J^2a(a-1)(1-\alpha)}{2\sigma^2(1-a\alpha)}\right]\left(\frac{2N|\bar J|(a-1)}{1-a\alpha} + \sqrt{\frac{2\pi N\sigma^2}{1-a\alpha}}\right)^T.$$
Thus
$$\int_{\mathcal{T}^N}\exp\left(aN\phi_N(\hat\mu_N(u),\Psi_{1,T}(u))\right)P^N(du)\leq G_1^c\left(1-a\alpha\right)^{-\frac{T(N-1)}{2}}\left(2\pi N\sigma^2\right)^{-\frac T2},$$
which yields the lemma. □
We include the proof of Lemma 15 which is used in the proof of the upper bound on compact sets in Section 4.3.2.
Lemma 15. There exist constants $a>1$ and $c>0$ such that for all $\mu\in\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})\cap\mathcal{E}_2$,
$$\Gamma(\mu)\leq\frac{I^{(3)}(\mu,P) + c}{a}.$$
Proof. Let $a>1$ be chosen as in the proof of Lemma 10. We have from Equation (50) that
$$I^{(3)}(\mu,P) = \lim_{N\to\infty}N^{-1}I^{(2)}\left(\mu^N,P^N\right).$$
We recall that $I^{(2)}$ may be expressed using the variational expression (49) as
$$I^{(2)}\left(\mu^N,P^N\right) = \sup_{\varsigma^N\in C_b(\mathcal{T}^N)}\left(\int_{\mathcal{T}^N}\varsigma^N(u)\,\mu^N(du) - \log\int_{\mathcal{T}^N}\exp\left(\varsigma^N(u)\right)P^N(du)\right),$$
where $\varsigma^N$ is a continuous, bounded function on $\mathcal{T}^N$. We let $\varsigma^N_M = a1_{B_M}\varsigma^N_*$, where $\varsigma^N_*(u) = N\left(\phi_N(\mu^N,\Psi_{1,T}(u)) + \Gamma_{[N],1}(\mu)\right)$, and $u\in B_M$ only if either $\|\Psi(u)\|\leq NM$ or $\phi_N(\mu^N,\Psi_{1,T}(u)) + \Gamma_{[N],1}(\mu)\leq0$. We proved in Section 3.3.2 that $\phi_N(\mu^N,\Psi_{1,T}(u))$ possesses a lower bound, which means that $\varsigma^N_M$ is continuous and bounded. Furthermore $\varsigma^N_M$ grows to $\varsigma^N_*$, so that after substituting $\varsigma^N_M$ into Equation (93) and taking $M\to\infty$ (i.e., applying the dominated convergence theorem), we obtain
$$a\int_{\mathcal{T}^N}\varsigma^N_*(u)\,\mu^N(du)\leq\log\int_{\mathcal{T}^N}\exp\left(a\varsigma^N_*(u)\right)P^N(du) + I^{(2)}\left(\mu^N,P^N\right).$$
It can be easily shown, similarly to Lemma 10, that $\log\int_{\mathcal{T}^N}\exp\left(a\varsigma^N_*(u)\right)P^N(du)\leq Nc$. We may thus divide both sides by $aN$ and let $N\to\infty$ to obtain the required result. □

E. Proof of Lemma 14

We prove Lemma 14.
Lemma 14. There exists a constant $C_0$ such that for all $\nu$ in $\mathcal{M}^+_{1,S}(\mathcal{T}^{\mathbb{Z}})$, all $\varepsilon>0$ and all $\mu\in V_\varepsilon(\nu)\cap\mathcal{E}_2$,
$$\left|\Gamma_{[N]}(\mu) - \Gamma^\nu_{[N]}(\mu)\right|\leq C_0\left(C_N^\nu + \varepsilon\right)\left(1 + \mathbb{E}^{\bar\mu_{1,T}}\left[\|v^0\|^2\right]\right).$$
Here $V_\varepsilon(\nu)$ is the open neighborhood defined in Proposition 4, and $\bar\mu$ is given in Definition 1.
Proof. We firstly bound $\Gamma_1$. From Equations (60) and (61) and Lemma 20 we have
$$\left|\Gamma_{[N],1}(\mu) - \Gamma^\nu_{[N],1}(\mu)\right|\leq\frac{1}{2N}\sum_{l=-n}^{n}\left|\log\det\left(\mathrm{Id}_T + \sigma^{-2}\tilde K^{\mu,l}_{[N]}\right) - \log\det\left(\mathrm{Id}_T + \sigma^{-2}\tilde K^{\nu,l}_{[N]}\right)\right| + \frac{1}{2N}\sum_{l=-n}^{n}\left|\log\det\left(\mathrm{Id}_T + \sigma^{-2}\tilde K^{\nu,l}_{[N]}\right) - \log\det\left(\mathrm{Id}_T + \sigma^{-2}\tilde K^{\nu,N,l}\right)\right|.$$
It thus follows from Proposition 4 and Lemma 13 that
$$\left|\Gamma_{[N],1}(\mu) - \Gamma^\nu_{[N],1}(\mu)\right|\leq C_0^*\left(C_N^\nu + \varepsilon\right),$$
for some constant $C_0^*$ which is independent of $\nu$ and $N$.
We define $\phi'^{\nu,N}(\nu,v') = \phi^{\nu,N}\left(\nu,(\mathcal{F}^N)^{-1}(v')\right)$, where $\mathcal{F}^N$ is given in Definition 8 and $\phi^{\nu,N}$ is given in Equation (59), and find that
$$\phi'^{\nu,N}(\nu,v') = \frac{1}{2N^2\sigma^2}\sum_{l=-n}^{n}\tilde v^{l,*}\tilde A^{\nu,N,l}\tilde v^l + \frac{1}{N\sigma^2}\,{}^t\tilde v^0\left(\mathrm{Id}_T - \tilde A^\nu(0)\right)c^\nu - \frac{1}{2\sigma^2}\,{}^tc^\nu\left(\mathrm{Id}_T - \tilde A^\nu(0)\right)c^\nu.$$
This means that
$$\Gamma^\nu_{[N],2}(\mu) - \Gamma_{[N],2}(\mu) = \int_{\mathcal{T}^N_{1,T}}\left(\phi'^{\nu,N}(\nu,v') - \phi'_N(\mu^N,v')\right)\bar\mu'^N(dv').$$
Upon expansion of the above expression, we find that
$$\left|\phi'^{\nu,N}(\nu,v') - \phi'_N(\mu^N,v')\right|\leq\frac{1}{2\sigma^2}\left(\frac{1}{N^2}\sum_{l=-n}^{n}\left\|\tilde A^{\mu,l}_{[N]} - \tilde A^{\nu,N,l}\right\|\|\tilde v^l\|^2 + \frac2N\|d^{\nu,\mu}\|\,\|\tilde v^0\| + |e^{\nu,\mu}|\right),$$
where $d^{\nu,\mu} = \left(\mathrm{Id}_T - \tilde A^{\nu,N,0}\right)c^\nu - \left(\mathrm{Id}_T - \tilde A^{\mu,0}_{[N]}\right)c^\mu$ and $e^{\nu,\mu} = {}^tc^\mu\tilde A^{\mu,0}_{[N]}c^\mu - \|c^\mu\|^2 - {}^tc^\nu\tilde A^{\nu,N,0}c^\nu + \|c^\nu\|^2$. It follows from Proposition 5 and Lemma 13 that the (Euclidean) norm of each of the above terms is bounded by $C^*(C_N^\nu + \varepsilon)$ for some constant $C^*$.
The lemma now follows after consideration of the facts that $\int\|v^k\|^2\,\bar\mu_{1,T}(dv) = \mathbb{E}^{\bar\mu_{1,T}}\left[\|v^0\|^2\right]$, that $\|\tilde v^0\|^2\leq N\sum_{k=-n}^{n}\|v^k\|^2$ (Cauchy-Schwarz) and that, because of the properties of the DFT, $\sum_{l=-n}^{n}\|\tilde v^l\|^2 = N\sum_{k=-n}^{n}\|v^k\|^2$. □

Acknowledgments

Many thanks to Bruno Cessac whose suggestion to look at process-level empirical measures and entropies has been very useful and whose insights into the physical interpretations of our results have been very stimulating.
This work was partially supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 269921 (BrainScaleS), no. 318723 (Mathemacs), and by the ERC advanced grant NerVi no. 227747.

Author Contributions

Both authors contributed to all parts of the article. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References and Notes

  1. Guionnet, A. Dynamique de Langevin d’un verre de spins. Ph.D. Thesis, Université de Paris Sud, Orsay, France, March 1995. [Google Scholar]
  2. Ben-Arous, G.; Guionnet, A. Large deviations for Langevin spin glass dynamics. Probab. Theory Relat. Fields. 1995, 102, 455–509. [Google Scholar]
  3. Ben-Arous, G.; Guionnet, A. Symmetric Langevin Spin Glass Dynamics. Ann. Probab. 1997, 25, 1367–1422. [Google Scholar]
  4. Guionnet, A. Averaged and quenched propagation of chaos for spin glass dynamics. Probab. Theory Relat. Fields. 1997, 109, 183–215. [Google Scholar]
  5. Sompolinsky, H.; Zippelius, A. Dynamic theory of the spin-glass phase. Phys. Rev. Lett. 1981, 47, 359–362. [Google Scholar]
  6. Sompolinsky, H.; Zippelius, A. Relaxational dynamics of the Edwards-Anderson model and the mean-field theory of spin-glasses. Phys. Rev. B 1982, 25, 6860–6875. [Google Scholar]
  7. Crisanti, A.; Sompolinsky, H. Dynamics of spin systems with randomly asymmetric bonds: Langevin dynamics and a spherical model. Phys. Rev. A 1987, 36, 4922–4939. [Google Scholar]
  8. Crisanti, A.; Sompolinsky, H. Dynamics of spin systems with randomly asymmetric bounds: Ising spins and Glauber dynamics. Phys. Rev. A 1987, 37, 4865–4874. [Google Scholar]
  9. Dawson, D.; Gartner, J. Large deviations from the mckean-vlasov limit for weakly interacting diffusions. Stochastics 1987, 20, 247–308. [Google Scholar]
  10. Dawson, D.; Gartner, J. Multilevel large deviations and interacting diffusions. Probab. Theory Relat. Fields. 1994, 98, 423–487. [Google Scholar]
  11. Budhiraja, A.; Dupuis, P.; Markus, F. Large deviation properties of weakly interacting processes via weak convergence methods. Ann. Probab. 2012, 40, 74–102. [Google Scholar]
  12. Fischer, M. On the form of the large deviation rate function for the empirical measures of weakly interacting systems. Bernoulli 2014, 20, 1765–1801. [Google Scholar]
  13. Sompolinsky, H.; Crisanti, A.; Sommers, H. Chaos in Random Neural Networks. Phys. Rev. Lett. 1988, 61, 259–262. [Google Scholar]
  14. Gerstner, W.; Kistler, W. Spiking Neuron Models; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
  15. Izhikevich, E. Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
  16. Ermentrout, G.B.; Terman, D. Foundations of Mathematical Neuroscience; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2010. [Google Scholar]
  17. Cessac, B. Increase in complexity in random neural networks. J. Phys. I 1995, 5, 409–432. [Google Scholar]
  18. Moynot, O. Etude mathématique de la dynamique des réseaux neuronaux aléatoires récurrents. Ph.D. Thesis, Université Paul Sabatier, Toulouse, France, January 2000. [Google Scholar]
  19. Moynot, O.; Samuelides, M. Large deviations and mean-field theory for asymmetric random recurrent neural networks. Probab. Theory Relat. Fields. 2002, 123, 41–75. [Google Scholar]
  20. Cessac, B.; Samuelides, M. From neuron to neural networks dynamics. Eur. Phys. J. Spec. Top. 2007, 142, 7–88. [Google Scholar]
  21. Samuelides, M.; Cessac, B. Random Recurrent Neural Networks. Eur. Phys. J. Spec. Top. 2007, 142, 7–88. [Google Scholar]
  22. Kandel, E.; Schwartz, J.; Jessel, T. Principles of Neural Science, 4th ed; McGraw-Hill: New York, NY, USA, 2000. [Google Scholar]
  23. Dayan, P.; Abbott, L. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems; MIT Press: Cambridge, MA, USA, 2001. [Google Scholar]
  24. Rudin, W. Real and Complex Analysis; McGraw-Hill: New York, NY, USA, 1987. [Google Scholar]
  25. Cugliandolo, L.F.; Kurchan, J.; Le Doussal, P.; Peliti, L. Glassy behaviour in disordered systems with nonrelaxational dynamics. Phys. Rev. Lett. 1997, 78, 350–353. [Google Scholar]
  26. Lapicque, L. Recherches quantitatifs sur l’excitation des nerfs traitee comme une polarisation. J. Physiol. Paris. 1907, 9, 620–635. [Google Scholar]
  27. Daley, D.; Vere-Jones, D. An Introduction to the Theory of Point Processes: General Theory and Structure; Springer: New York, NY, USA, 2007; Volume 2. [Google Scholar]
  28. Gerstner, W.; van Hemmen, J. Coherence and incoherence in a globally coupled ensemble of pulse-emitting units. Phys. Rev. Lett. 1993, 71, 312–315. [Google Scholar]
  29. Gerstner, W. Time structure of the activity in neural network models. Phys. Rev. E 1995, 51, 738–758. [Google Scholar]
  30. Cáceres, M.J.; Carillo, J.A.; Perhame, B. Analysis of nonlinear noisy integrate and fire neuron models: Blow-up and steady states. J. Math. Neurosci. 2011, 1. [Google Scholar] [CrossRef]
  31. Baladron, J.; Fasoli, D.; Faugeras, O.; Touboul, J. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and FitzHugh-Nagumo neurons. J. Math. Neurosci. 2012, 2. [Google Scholar] [CrossRef] [Green Version]
  32. Bogachev, V. Measure Theory, 1 ed; Volume 1 in Measure Theory; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  33. When N is even the formulae are slightly more complicated but all the results we prove below in the case N odd are still valid.
  34. We note N p ( m , Σ ) the law of the p-dimensional Gaussian variable with mean m and covariance matrix Σ.
  35. Ellis, R. Entropy, Large Deviations and Statistical Mechanics; Springer: New York, NY, USA, 1985. [Google Scholar]
  36. Liggett, T.M. Interacting Particle Systems; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  37. Deuschel, J.D.; Stroock, D.W. Large Deviations; Pure and Applied Mathematics; Academic Press: Waltham, MA, USA, 1989; Volume 137. [Google Scholar]
  38. Dembo, A.; Zeitouni, O. Large Deviations Techniques, 2nd ed; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
  39. Donsker, M.; Varadhan, S. Large deviations for stationary Gaussian processes. Commun. Math. Phys. 1985, 97, 187–210. [Google Scholar]
  40. Baxter, J.R.; Jain, N.C. An Approximation Condition for Large Deviations and Some Applications. In Convergence in Ergodic Theory and Probability; Bergulson, V., Ed.; De Gruyter: Boston, MA, USA, 1993. [Google Scholar]
  41. Donsker, M.; Varadhan, S. Asymptotic Evaluation of Certain Markov Process Expectations for Large Time, IV. Commun. Pure Appl. Math. 1983, XXXVI, 183–212. [Google Scholar]
  42. Faugeras, O.; MacLaurin, J. A Large Deviation Principle and an Analytical Expression of the Rate Function for a Discrete Stationary Gaussian Process. arXiv 2013, arXiv:1311.4400.
  43. Chiyonobu, T.; Kusuoka, S. The Large Deviation Principle for Hypermixing Processes. Probab. Theory Relat. Fields. 1988, 78, 627–649. [Google Scholar]
  44. We noted in the introduction that this is termed propagation of chaos by some.
  45. Bressloff, P. Stochastic neural field theory and the system-size expansion. SIAM J. Appl. Math. 2009, 70, 1488–1521. [Google Scholar]
  46. Buice, M.; Cowan, J. Field-theoretic approach to fluctuation effects in neural networks. Phys. Rev. E 2007, 75. [Google Scholar] [CrossRef]
  47. Ginzburg, I.; Sompolinsky, H. Theory of correlations in stochastic neural networks. Phys. Rev. E 1994, 50, 3171–3191. [Google Scholar]
  48. ElBoustani, S.; Destexhe, A. A master equation formalism for macroscopic modeling of asynchronous irregular activity states. Neural Comput. 2009, 21, 46–100. [Google Scholar]
  49. Buice, M.; Cowan, J.; Chow, C. Systematic fluctuation expansion for neural network activity equations. Neural Comput. 2010, 22, 377–426. [Google Scholar]
  50. Neveu, J. Processus aléatoires gaussiens; Presses de l’Université de Montréal: Montréal, QC, Canada, 1968; Volume 34. [Google Scholar]
