Article

Next-Hop Relay Selection for Ad Hoc Network-Assisted Train-to-Train Communications in the CBTC System

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
Submission received: 15 May 2023 / Revised: 21 June 2023 / Accepted: 23 June 2023 / Published: 25 June 2023

Abstract

In the communication-based train control (CBTC) system, traditional modes such as LTE or WLAN in train-to-train (T2T) communication face the problem of complex and costly deployment of base stations and ground core networks. Therefore, the multi-hop ad hoc network, which is comparatively flexible and cheap, is considered for CBTC. However, because of the high mobility of the train, it is likely to move out of the communication range of wayside nodes. Moreover, some wayside nodes are heavily congested, resulting in long packet queuing delays that cannot meet the transmission requirements. To solve these problems, in this paper, we investigate the next-hop relay selection problem in multi-hop ad hoc networks to minimize transmission time, enhance network throughput, and ensure channel quality. In addition, we propose a multiagent dueling deep Q network (DQN) algorithm that optimizes the delay and throughput of the entire link by selecting the next-hop relay node. The simulation results show that, compared with existing routing algorithms, it achieves clear improvements in delay, throughput, and packet loss rate.

1. Introduction

Recently, with rapid urbanization, urban rail transit has become one of the main modes of transportation. With the development of technology, the communication-based train control (CBTC) system plays an important role in urban rail transit, guaranteeing the safe operation of rail trains [1]. To ensure safety and reliability, CBTC systems have strict requirements on transmission delay and channel quality [2]. Long communication delays and link interruptions may lead to emergency braking or collisions [3]. Therefore, it is crucial to design a CBTC communication system with low latency and high channel quality.
In traditional CBTC systems, long-term evolution for metro (LTE-M) and wireless local area networks (WLANs) are widely used in train-to-wayside communication [4]. The train information is first transmitted to the ground-zone controller (G-ZC), which generates control commands for all trains in its management area [5]. After obtaining the commands, the wayside node sends them back to the trains. However, due to the huge computational burden of the G-ZC and the non-direct transmission link [6], the transmission delay of important control commands is excessively large. Therefore, the direct train-to-train (T2T) transmission approach was proposed [4], and the G-ZC was replaced by an onboard-zone controller (On-ZC). Unlike the G-ZC, the On-ZC only needs to generate its own commands, which greatly reduces computation latency.
Although direct T2T transmission greatly reduces latency [7], if we continue to use WLAN or LTE, interruptions and delays caused by hard handoff at base station boundaries remain unavoidable. In addition, the deployment of the terrestrial core network and base stations is complex, which makes network construction and maintenance costly. Therefore, new technologies such as reconfigurable intelligent surfaces [8,9] and wireless ad hoc networks have been proposed to improve T2T communication.
The wireless ad hoc network is a novel approach for improving the performance of CBTC systems. In the wireless ad hoc network, packets are sent from the on-board node to the wayside node. Similar to the vehicular ad hoc network (VANET), the role of wayside nodes is to assist packet transmission [10]. The packets are then forwarded hop by hop through the transmission network formed by the wayside nodes and finally reach the running train, making the transmission link more stable. Furthermore, the deployment of wireless ad hoc network nodes is less costly; the network has no fixed topology and requires no fixed infrastructure to communicate, allowing it to be deployed more flexibly and configured quickly.
The relay selection strategy plays an important role in wireless ad hoc networks to reduce transmission delay, improve throughput, and decrease packet loss. However, since wireless ad hoc networks are rarely used in CBTC systems, no suitable routing strategy has been proposed. Therefore, the existing strategies from VANETs and mobile ad hoc networks (MANETs) should be considered and improved to adapt to the CBTC system.
In VANET and MANET relay selection strategies, routing algorithms are generally divided into two types: proactive routing and reactive routing [11]. Traditional proactive routing approaches, such as optimized link-state routing (OLSR) [12] and destination-sequenced distance vector (DSDV) [13], require significant overhead for node exploration and maintenance of routing tables, which is not feasible for CBTC systems with strict low-latency requirements. Meanwhile, although the traditional reactive protocol greedy perimeter stateless routing (GPSR) can directly select the next-hop relay, it cannot fully account for factors such as node congestion and channel quality. In the CBTC system, the transmission channel is affected by the high mobility of the train, causing shadow fading and multipath fading, which lead to sudden changes in the transmission link. Therefore, the relay selection strategy must be able to make decisions according to the varying channel state.
On the basis of past training experience, a learning-based routing algorithm can make good real-time decisions by observing the current channel state. Meanwhile, in deep reinforcement learning (DRL), a multi-factor optimization problem can be transformed into maximizing cumulative rewards [14]. We can combine diverse factors in the reward design to jointly optimize these indicators. Therefore, learning-based routing algorithms are more suitable for CBTC networks. Existing DRL-based algorithms are often used for relay selection in VANETs and MANETs [15,16]; these algorithms focus mainly on single-hop delay, outage probability, and power consumption as the criteria for next-hop selection. However, they ignore the whole-link delay and throughput.
In this paper, our objective is to design a low-latency and high-throughput routing method in an ad hoc network for a CBTC system. However, packet transmission still faces challenges in multi-hop relay selection. For example, due to the high speed of the train, the transmission distance between the train and a wayside node is limited. In order to decrease the outage probability, we set strict distance limitations when transmitting with trains. In addition, we comprehensively consider the transmission delay, queuing delay, and channel quality, aiming to optimize the overall performance of the link by selecting the next-hop node. Moreover, the process of selecting the next-hop node can be formulated as a Markov decision process (MDP) [17,18]. We propose a multi-agent DRL method to solve the problem. The main contributions of this paper are summarized as follows:
  • We formulate the next-hop relay selection problem in a CBTC system. The goal is to select relay nodes with low transmission delay and high throughput in both the train and the wayside node communication range. Meanwhile, in order to balance the single-hop transmission delay and the whole-link hop count, we propose the concept of “hop tradeoff” to minimize the entire link latency.
  • To handle the time-varying channel state and node congestion, we propose a DRL algorithm to optimize the long-term system reward. Using a multiagent approach [14], all nodes are trained centrally with dueling DQN [19], and then each node makes the next-hop decision individually, in order to avoid nodes with a long queuing delay and poor channel quality.
  • Lastly, we conduct simulations with different numbers of nodes between two trains and different buffer sizes. Meanwhile, the proposed algorithm is compared with several existing algorithms in terms of whole-link delay, packet loss rate, and throughput. The simulation results indicate that the proposed scheme works well in congested networks. In particular, it is also significantly superior to other routing algorithms in terms of whole-link delay, throughput, and packet loss rate.
The remainder of this paper is organized as follows: in Section 2, some related work about routing selection in ad hoc networks is introduced; in Section 3, we present a multi-hop relay selection model for ad hoc networks in CBTC systems; the joint optimization problem of channel throughput and total-link delay is formulated in Section 4; then, we introduce the multiagent deep reinforcement method to solve the formulated problem in Section 5; some simulation results and analyses are presented in Section 6; in Section 7, we conclude the paper and propose some future work.

2. Related Work

2.1. Traditional Communication Method in CBTC System

In the traditional CBTC system, WLAN is widely used for communication between trains and wayside base stations. Zhu et al. [20] proposed a WLAN-based redundant connection scheme for train–ground communication, in which the train connects to a backup link and an active link simultaneously to deal with interruptions at the coverage boundaries of two access nodes. However, many WLAN standards based on IEEE 802.11 are not suitable for high-speed mobile environments [5]. Meanwhile, WLAN operates in an open frequency band, which can easily be interfered with by other devices [21]. LTE has strong anti-interference ability and is more stable in switching between access nodes; thus, LTE-based approaches have been proposed for the CBTC system. In [6], a sensing-based semi-persistent scheduling method for LTE-based T2T communication was proposed, which greatly improved the transmission delay of system safety information. However, both LTE and WLAN suffer packet loss and delay due to switching between access nodes, in addition to the high cost of base stations and the ground core network. Since wireless ad hoc networks do not require a fixed infrastructure, their deployment is more flexible and cheaper. Therefore, ad hoc networks are also a better choice for T2T communication.

2.2. Traditional Ad Hoc Network Route Selection

In ad hoc network applications, packet routing is critical for optimizing transmission delay, throughput, and packet loss. There are two types of routing methods commonly used in ad hoc networks: proactive routing and reactive routing. In proactive routing, the OLSR protocol [12] stores the information of each relay node in the routing table by sending HELLO packets in advance and then selects the shortest path from the routing table. The DSDV protocol [13] uses the Bellman–Ford algorithm to select relay nodes in the routing table. Although these approaches allow the optimal route to be selected, they require a large amount of information to be exchanged among all nodes. Especially when the nodes are dynamic, this leads to a rapid increase in the amount of information exchanged. In the CBTC system, however, the trains move rapidly and the transmission latency requirements are strict; thus, reactive routing is more suitable.
In reactive routing, nodes cannot know the global information of the whole network and can only make decisions to select the next-hop node. In [22], the GPSR protocol was proposed, in which the node closest to the destination within the communication range is selected as the next-hop node. This minimizes the number of hops in the entire link, thus reducing latency. In order to address the high outage probability caused by the high-speed movement of vehicles, the authors of [23] proposed a hybrid relay node selection strategy, in which the relay with the best channel quality and the relay closest to the destination are selected simultaneously as next-hop nodes. Although this method improved the transmission success rate, the overhead of transmitting both packets simultaneously was huge.

2.3. The Relay Selection of Ad Hoc Network Enabled by DRL

A deep learning-based relay selection method can make better next-hop relay choices when the transmission rate and node congestion change in real time, and it can also synthesize multiple performance factors into a comprehensive decision. Therefore, learning-based routing methods perform better in mobile communication networks.
Learning-based routing methods are widely used in MANETs and VANETs [24]. As the mobility of nodes may lead to constant variations in the network topology [25], a learning-based approach can make decisions according to real-time changes and is, thus, more suitable for both scenarios. In [26], Wang et al. proposed a multiagent method wherein all agents share the same DQN-based training experience but make next-hop decisions individually; the aim is to select the optimal route in a MANET that minimizes transmission delay and queuing delay. In order to minimize delay and allocate power reasonably in a vehicle-to-vehicle (V2V) communication network, Zhang et al. [27] proposed a deep reinforcement learning method that chooses the optimal relay according to the velocity, location, and packet number of each vehicle. In [16], He et al. proposed a Q-learning algorithm to find optimal UAV relays to assist V2V communication; their method improved the delivery ratio and delivery latency by comprehensively considering the state transition probability of communication interruption, delay consumption, and energy consumption.
However, these methods only consider the delay of a single hop and ignore the total delay of the whole link. In some cases, if we only pursue the shortest single-hop delay, the closest next-hop relay is likely to be chosen, which increases the hop count of the whole link and, with it, the total link latency. Meanwhile, most methods assume an infinitely long buffer when computing system throughput. In practice, however, the buffer size is limited; therefore, the impact of packet loss on throughput must be considered.

3. System Model

As shown in Figure 1, we consider T2T communication over a multi-hop wireless ad hoc network. In this scenario, since the coverage of one hop is very limited, the train needs the assistance of wayside relays for multi-hop transmission. There are multiple relays within the communication range of each train and wayside node; hence, the most suitable next-hop wayside relay must be selected among these candidate nodes. For example, $R_2$ may communicate with $R_3$, $R_4$, and $R_5$, but $R_3$ is chosen as the next-hop node after considering factors such as channel quality and delay.
For $N$ trains running on the rail, denoted as $\mathcal{T} = \{T_1, T_2, \ldots, T_n, \ldots, T_N\}$, there are $M$ wayside relays distributed beside the rail, denoted as $\mathcal{R} = \{R_1, R_2, \ldots, R_m, \ldots, R_M\}$. The train has high mobility; in order to ensure the quality of the train-to-wayside transmission, we assume that two orthogonal frequency bands are available: band 1 for train-to-wayside (T2W) transmission [28] and band 2 for wireless wayside-to-wayside (W2W) transmission. Since the transmissions are on two orthogonal channels, there is no interference between T2W and W2W transmissions, whereas multiple simultaneous W2W transmissions do interfere with each other.
In multi-hop transmission, all relays follow the decode-and-forward (DF) principle. Furthermore, we assume that the whole system is stationary within a time slot $t$ and that the transmit power of the nodes does not change. All channels follow quasi-static Rayleigh fading, such that the channel gain between node $a$ and node $b$ can be represented as follows:

$$ h_{a,b} = X_{a,b}\, d_{a,b}^{-\beta/2}, $$

where $h_{a,b}$ is the instantaneous channel gain of link $a \to b$, $X_{a,b}$ is the fading coefficient, and $d_{a,b}$ and $\beta$ denote the distance between the two nodes and the path-loss exponent, respectively [29].
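A short numerical sketch of this channel model in Python; the distance, path-loss exponent, and random seed are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_gain(d_ab, beta=3.0):
    """Quasi-static Rayleigh fading gain h_{a,b} = X_{a,b} * d_{a,b}^(-beta/2).

    X_{a,b} is a zero-mean complex Gaussian fading coefficient, so the
    envelope |X_{a,b}| is Rayleigh distributed.
    """
    x_ab = rng.normal(scale=np.sqrt(0.5)) + 1j * rng.normal(scale=np.sqrt(0.5))
    return x_ab * d_ab ** (-beta / 2)

h = channel_gain(d_ab=120.0)   # distance in meters (illustrative)
print(abs(h) ** 2)             # instantaneous power gain |h_{a,b}|^2
```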

3.1. Communication Model

3.1.1. Train-to-Wayside (T2W) Link

The T2W link transmits on an independent channel; thus, there is no interference from other links, and the signal-to-noise ratio (SNR) of the T2W transmission is

$$ \gamma_{T_n,R_m} = \frac{p_{T_n} |h_{T_n,R_m}|^2}{N_0}, $$
where $p_{T_n}$ is the transmission power of train $T_n$, $h_{T_n,R_m}$ is the channel gain between train $T_n$ and wayside relay $R_m$, and $N_0$ is the noise power. Hence, the channel throughput [15] between train $T_n$ and wayside relay $R_m$ is

$$ C_{T_n,R_m} = B \log_2 (1 + \gamma_{T_n,R_m}). $$
For the final hop, wayside node $R_m$ transmits to the destination train $T_{n'}$, which is calculated in the same way as above, except that the transmission direction is reversed. The throughput between $R_m$ and destination train $T_{n'}$ can be expressed as

$$ C_{R_m,T_{n'}} = B \log_2 (1 + \gamma_{R_m,T_{n'}}). $$

3.1.2. Wayside-to-Wayside (W2W) Link

At time slot $t$, more than one W2W link may transmit simultaneously; thus, there is interference between W2W links [16]. During packet transmission, the one-hop link from wayside relay $i$ to wayside relay $j$ at time slot $t$ is denoted as $l_{i,j}(t)$, where $i, j \in \{R_1, R_2, \ldots, R_m, \ldots, R_M\}$. Moreover, $\rho_{i,j}(t)$ represents the transmission status of link $i \to j$: $\rho_{i,j}(t) = 1$ denotes that wayside node $i$ is transmitting to node $j$. Therefore, the SNR of the transmission from wayside node $i$ to wayside node $j$ can be represented as

$$ \gamma_{i,j} = \frac{P_i |h_{i,j}|^2}{N_0 + \sum_{l_{i',j'}(t) \neq l_{i,j}(t)} \rho_{i',j'} P_{i'} |h_{i',j}|^2}, $$

where $l_{i',j'}(t)$ denotes an interfering link during time slot $t$.
The transmission throughput between wayside relay $i$ and wayside relay $j$ is

$$ C_{i,j} = B \log_2 (1 + \gamma_{i,j}). $$

3.1.3. Outage Analysis

In wireless networks, outage events occur when the actual mutual information is less than the required data rate [30]. To ensure reliable information transmission, the SNR of the channel must be greater than the SNR threshold $\gamma_{th}$. At time slot $t$, the transmission condition for train $T_n$ with wayside node $R_m$ is $\gamma_{T_n,R_m} > \gamma_{th}$, and the transmission condition for wayside nodes $i$ and $j$ is $\gamma_{i,j} > \gamma_{th}$. In particular, the maximum transmission distance $R_{\max}$ between the train and a wayside relay while train $T_n$ is moving can be obtained from

$$ \frac{p_{T_n} \left| X_{T_n,R_m}\, d_{T_n,R_m}^{-\beta/2} \right|^2}{N_0} \geq \gamma_{th}. $$

We can obtain the maximum distance $R_{\max}$ as

$$ R_{\max} = d_{T_n,R_m} = \left( \frac{p_{T_n} X_{T_n,R_m}^2}{N_0 \gamma_{th}} \right)^{1/\beta}. $$
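The closed form follows by rearranging the threshold inequality for $d_{T_n,R_m}$. A minimal sketch with illustrative linear-scale values (not the paper's simulation settings):

```python
def max_range(p_tx, x_sq, n0, gamma_th, beta=3.0):
    """Largest d satisfying p_tx * x_sq * d**(-beta) / n0 >= gamma_th."""
    return (p_tx * x_sq / (n0 * gamma_th)) ** (1.0 / beta)

# R_max for a 1 W transmitter, unit fading power, and the threshold of Table 1.
print(max_range(p_tx=1.0, x_sq=1.0, n0=1e-9, gamma_th=31.0))
```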

3.2. Optimal Relay Selection

3.2.1. Mobile Reliable Model

Due to the mobility of trains, train $T_n$ may move out of the communication range of wayside relay $R_m$, resulting in an outage event; hence, the distance between the train and wayside nodes must be limited. The candidate wayside node locations are denoted as $(x_{R_1}, y_{R_1}), (x_{R_2}, y_{R_2}), \ldots, (x_{R_m}, y_{R_m}), \ldots, (x_{R_M}, y_{R_M})$. During the transmission delay $T_c$ between the train and a wayside node, the channel SNR must satisfy the SNR threshold condition whether the train is at its initial position $(x_{T_1}, y_{T_1})$ or at the end-of-transmission position $(x'_{T_1}, y'_{T_1})$. Meanwhile, since packets propagate through the channel much faster than the train moves, we assume that the train travels at its initial speed $v(t)$ throughout the transmission. Within the transmission delay $T_c$, the distances the train moves in the $x$- and $y$-directions can be calculated as follows [15]:

$$ S_x = v(t) \times T_c \times e_x(t), $$
$$ S_y = v(t) \times T_c \times e_y(t), $$

where $e_x(t)$ and $e_y(t)$ are the components of the train's unit direction vector in the $x$- and $y$-directions. Therefore, the location where the train ends its transmission is $(x'_{T_1}, y'_{T_1}) = (x_{T_1} + S_x,\; y_{T_1} + S_y)$. The conditions that a candidate wayside node needs to satisfy are

$$ d_{T_1,R_m} = \sqrt{(x_{T_1} - x_{R_m})^2 + (y_{T_1} - y_{R_m})^2} < R_{\max}, $$
$$ d_{T'_1,R_m} = \sqrt{(x'_{T_1} - x_{R_m})^2 + (y'_{T_1} - y_{R_m})^2} < R_{\max}. $$
A node can only be a candidate transmission node for a train if its location is within the transmission range of the train at the beginning and end of the transmission.
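This screening rule can be implemented directly; the sketch below assumes planar coordinates in meters and speed in m/s, with all numbers illustrative:

```python
import numpy as np

def candidate_relays(train_xy, v, e_xy, t_c, relay_xy, r_max):
    """Keep relays in range at both the start and the end of a transmission.

    train_xy : (2,) train position when transmission starts
    v, e_xy  : train speed and unit direction vector
    t_c      : transmission delay during which the train keeps moving
    relay_xy : (M, 2) candidate wayside-node positions
    """
    start = np.asarray(train_xy, dtype=float)
    end = start + v * t_c * np.asarray(e_xy, dtype=float)  # (x', y')
    d_start = np.linalg.norm(relay_xy - start, axis=1)
    d_end = np.linalg.norm(relay_xy - end, axis=1)
    return np.flatnonzero((d_start < r_max) & (d_end < r_max))

relays = np.array([[50.0, 5.0], [180.0, -5.0], [400.0, 5.0]])
print(candidate_relays([0.0, 0.0], v=22.0, e_xy=[1.0, 0.0], t_c=0.01,
                       relay_xy=relays, r_max=250.0))   # -> [0 1]
```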

3.2.2. Delay Model

A time delay is generated during packet transmission. In this section, we build a delay model to calculate the transmission delay between nodes. We decompose the total delay $D_{i,j}$ of packet transmission between wayside nodes $i$ and $j$ into two main components: the transmission delay $T_c$ caused by the node sending the packet and the queuing delay $T_q$ due to node congestion. In this subsection, the time delay calculation is the same for both T2W and W2W transmission; thus, both are expressed as the transmission between nodes $a$ and $b$.
When a node sends data packets, the transmission delay can be represented as

$$ T_c = \frac{L}{C_{a,b}}, $$

where $L$ is the number of bits in the packet, and $C_{a,b}$ is the transmission rate of the channel between node $a$ and node $b$.
Queuing delay [27] is unavoidable when transmitting large amounts of data; therefore, it is crucial to build a node queuing model. The queue follows the first-in first-out (FIFO) rule. When the CBTC system is stable, we assume that each wayside node can receive multiple data streams simultaneously, eliminating scheduling effects, and that the average arrival rate and queuing situation are essentially fixed. In order to calculate the queuing delay, we use Little's formula.
In Little's law, the average waiting time of a queue is the queue length divided by the effective throughput. Since the buffer length of our model is limited, we calculate the effective throughput by considering the packet loss rate and the packet error rate. According to Little's law [31,32], the packet delay at the next-hop node $b$ can be expressed as

$$ T_q = \frac{Q_b}{Th_b}, $$

where $Q_b$ is the average number of packets queued at node $b$, and $Th_b$ is the effective throughput of node $b$.
The effective throughput of node $b$ is

$$ Th_b = \lambda_b \Delta t \times (1 - p_f), $$

where $\lambda_b$ is the average arrival rate of node $b$, $\lambda_b \Delta t$ is the total number of packets arriving in a time slot $\Delta t$, and $p_f$ is the unsuccessful transmission rate of link $a \to b$. Many factors affect the unsuccessful transmission rate, such as the packet error rate $p_{fe}$ and the packet loss rate $p_{fl}$. If a packet is lost or a transmission error occurs, the packet becomes unusable; hence, the unsuccessful transmission rate can be expressed as

$$ p_f = 1 - (1 - p_{fl})(1 - p_{fe}). $$
For the packet error rate, we assume that the channel uses quadrature phase shift keying (QPSK) modulation. Therefore, the bit error rate (BER) [2] is

$$ p_{be} = \frac{1}{2} \left( 1 - \sqrt{\frac{\gamma_{a,b}}{1 + \gamma_{a,b}}} \right), $$

where $\gamma_{a,b}$ is the SNR between node $a$ and node $b$; when the SNR between the two nodes is large enough, $p_{be} \approx \frac{1}{4 \gamma_{a,b}}$. Meanwhile, as the length of the packet is $L$ bits, the error rate of the whole packet can be represented as

$$ p_{fe} = 1 - (1 - p_{be})^L. $$
For the packet loss rate $p_{fl}$, we build a node queuing model [33,34]. Packet loss is due to the limited buffer length of a node: if the total packet length exceeds the buffer length, the packet is not received by the node, which increases the packet loss rate. We define $M$ as the maximum number of packets that the buffer can hold, $Q_{t-1}$ as the number of packets left over from the previous time slot, and $A_t$ as the number of packets arriving at the node in time slot $t$. The average number of arriving packets per time slot is $A_t = \lambda_b \Delta t$, where $\lambda_b$ is the arrival rate of node $b$. The amount of packet overflow is $F_t$, which can be expressed as

$$ F_t = \max[0,\; (Q_{t-1} + A_t) - M]. $$

When the train or a wayside relay steadily sends packets to the next hop, $A_t$ and $F_t$ remain constant during transmission; therefore, $\lim_{t \to \infty} A_t = A$ and $\lim_{t \to \infty} F_t = F$. The packet loss rate of this node can then be calculated as

$$ p_{fl} = \lim_{T \to \infty} \frac{\sum_{t=1}^{T} F_t}{\sum_{t=1}^{T} A_t} = \frac{E\{F\}}{E\{A\}} = \frac{E\{F\}}{\lambda_b \Delta t}, $$
where $E\{\cdot\}$ denotes mathematical expectation. If there is no packet overflow, the packet loss rate is zero; otherwise, it is the number of lost packets divided by the number of packets arriving at the node. The packet loss rate and the packet error rate calculated in Equations (18) and (20) are then substituted into Equations (14) and (15) to obtain the queuing delay. Therefore, the total delay of the $k$-th hop from node $a$ to node $b$ can be expressed as

$$ D_{a,b}^{k} = T_q + T_c. $$
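The pieces of the delay model chain together as in the following sketch; the helper name and every numeric input are illustrative assumptions, and the queuing delay is converted from slots to seconds via the slot length $\Delta t$:

```python
def per_hop_delay(L_bits, C_ab, snr, Q_b, lam_b, dt, F_mean):
    """One-hop delay D = T_c + T_q under the paper's delay model."""
    t_c = L_bits / C_ab                      # transmission delay T_c = L / C_ab
    p_be = 0.25 / snr                        # high-SNR QPSK bit error rate approx.
    p_fe = 1.0 - (1.0 - p_be) ** L_bits      # packet error rate
    p_fl = F_mean / (lam_b * dt)             # packet loss rate E{F} / (lambda * dt)
    p_f = 1.0 - (1.0 - p_fl) * (1.0 - p_fe)  # unsuccessful transmission rate
    th_b = lam_b * dt * (1.0 - p_f)          # effective throughput, packets/slot
    t_q = Q_b / th_b * dt                    # Little's law, converted to seconds
    return t_c + t_q

# Illustrative numbers only: 2000-bit packets over a 5 Mb/s link at 40 dB SNR.
print(per_hop_delay(L_bits=2000, C_ab=5e6, snr=1e4,
                    Q_b=4, lam_b=200.0, dt=0.01, F_mean=0.1))
```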

3.2.3. Hop Tradeoff

In next-hop selection, pursuing only large throughput and small delay for one hop results in an increased hop count for the entire T2T link. Therefore, we design a "hop tradeoff" indicator to optimize the number of hops on the entire link. The initial train $T_n$ and the destination train $T_{n'}$ need to exchange information, and the distance of the whole T2T link is $d_{T_n,T_{n'}}$. During transmission, the one-hop distance between node $a$ and node $b$ is $S_{a,b}$. We calculate the number of hops $k_{a,b}$ required to complete the entire T2T link at the one-hop distance $S_{a,b}$, which can be represented as

$$ k_{a,b} = \frac{d_{T_n,T_{n'}}}{S_{a,b}}. $$

4. Problem Formulation

In the CBTC scenario, a varying number of wayside nodes assist information transmission depending on the distance between two trains; thus, link selection for transmission is particularly important. To solve the multi-hop relay selection problem in wireless ad hoc networks, we propose an optimal transmission model based on a discrete Markov process. The aim is to design a relay node selection policy that satisfies low latency and high throughput, so that information can be transmitted quickly and accurately between trains.
Since the next-hop selection depends only on the current state, and the next state changes with the current action selection, next-hop selection can be considered an MDP. The transition probabilities between states in the MDP are unknown; thus, we use the DRL approach to solve the proposed problem. In DRL, the agent finds the optimal policy that maximizes the long-term reward according to the channel state information (CSI) and the congestion level of each node. In this paper, we use multiagent DRL (MADRL), in which each node acts as an agent. As agents need to make decisions within a short period of time, the agents' networks are trained offline centrally by collecting information among nodes and then ported to each agent for online decision making. Therefore, each agent selects the next-hop node independently according to the current state, without additional communication for further training. The key components of DRL are described below.
(1)
State Space
In each time slot, the agent updates and learns the policy by observing the variation of the state. In particular, the state contains the number of packets queued at each node and the channel throughput of the links between nodes. In time slot $t$, the state space is defined as

$$ s_t = \{Q(t),\; C(t),\; V(t)\}, $$

where $Q(t)$ indicates the queue length of each node, $C(t)$ is the throughput between the transmitting node and the other nodes, and $V(t) \in \{0, 1\}$: $V(t) = 0$ means that a wayside node sends the packet at time slot $t$, whereas $V(t) = 1$ means that the train sends the packet.
(2)
Action Space
According to the channel state and queue state of each candidate node, the optimal next-hop node $A(t)$ is selected. The action space is given by

$$ a_t = \{A(t)\}, $$

where $A(t) \in \{0, 1, \ldots, m, \ldots, M\}$. If $A(t) = m$, then the next hop is node $m$.
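A minimal sketch of how an agent's observation and action could be encoded; flattening $Q(t)$, $C(t)$, and $V(t)$ into one vector is our assumption, since the paper does not specify the encoding:

```python
import numpy as np

M = 7  # number of wayside nodes, as in Table 1

def make_state(queue_len, throughput, sender_is_train):
    """Flatten s_t = {Q(t), C(t), V(t)} into a single observation vector."""
    return np.concatenate([queue_len,    # Q(t): per-node queue lengths
                           throughput,   # C(t): throughputs to candidate nodes
                           [1.0 if sender_is_train else 0.0]])  # V(t)

s_t = make_state(np.zeros(M), np.full(M, 5e6), sender_is_train=True)
a_t = 3  # A(t) = 3 selects node 3 as the next hop
```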
(3)
Reward Function
In the selection of the next-hop node, the optimization objective is to minimize the delay and maximize the throughput of the entire T2T link while ensuring that the next-hop SNR is greater than the threshold value. The packet is successfully transmitted from the initial train to the destination train after $k \in \{1, 2, \ldots, K\}$ hops through wayside nodes $i, j \in \{R_1, R_2, \ldots, R_m, \ldots, R_M\}$. Furthermore, $T_n$ is the packet source train, and $T_{n'}$ is the packet destination train.
  • Overall Transmission Time of Whole Link
The total transmission time for packet transmission between source train $T_n$ and destination train $T_{n'}$ is

$$ \tau_{T_n,T_{n'}} = D_{T_n,i}^{1} + \sum_{k=2}^{K-1} D_{i,j}^{k} + D_{j,T_{n'}}^{K}. $$
  • Throughput of Whole Link
In order to better measure the quality of each hop in the transmission link, and considering the packet loss and packet error rates, the throughput of the entire link is defined as follows [35,36]:

$$ C_{T_n,T_{n'}} = (1 - p'_f) \times \min \{ E[C_{T_n,i}],\; E[C_{i,j}],\; E[C_{j,T_{n'}}] \}, $$

where $p'_f$ is the unsuccessful transmission rate of the whole link, computed from the one-hop unsuccessful rate $p_f$ of Equation (16), and $C_{i,j}$ denotes the throughput between wayside nodes $i$ and $j$.
  • The Optimization Goal
The optimization goal of the proposed MADDQN is to reduce the latency and improve the throughput of the whole link; hence, it is defined as

$$ \max \;\; \omega_1 \frac{1}{\tau_{T_n,T_{n'}}} + \omega_2 C_{T_n,T_{n'}} $$
$$ \text{s.t.} \quad C1:\; \gamma_{i,j}^{k} > \gamma_{th}, $$
$$ \quad\quad\;\; C2:\; D_{i,j}^{k} < D_{th}, $$
$$ \quad\quad\;\; C3:\; d_{T_n,i} < R_{\max}. $$

Here, $\tau_{T_n,T_{n'}}$ and $C_{T_n,T_{n'}}$ are the whole-link latency and throughput between train $T_n$ and train $T_{n'}$, respectively. $\omega_1$ and $\omega_2$ are the weight factors of delay and throughput ($\omega_1 + \omega_2 = 1$). C1 ensures that the SNR of the channel is greater than the threshold value. C2 requires each hop delay to be less than the target transmission time $D_{th}$; if a single hop takes too long, the transmission is considered to have failed. C3 requires that, when the train transmits with wayside node $i$, their distance is less than the maximum transmission distance.
The defined reward function comprehensively considers the throughput and delay of each hop, together with the hop-tradeoff indicator $k_{i,j}$ from Section 3.2.3. Therefore, the reward function is defined as

$$ r_t = \begin{cases} \omega_1 \dfrac{1}{k_{i,j} D_{i,j}^{k}} + \omega_2 C_{i,j}^{k} + r_s, & \text{if C1--C3 are satisfied} \\[2mm] \omega_1 \dfrac{1}{k_{i,j} D_{i,j}^{k}} + \omega_2 C_{i,j}^{k} + r_s + r_c, & \text{otherwise}, \end{cases} $$

where $r_s$ is an additional reward for the final hop directly to the train; this reward is established to prevent other wayside nodes close to the train from being selected, which could increase the hop count and delay. $r_c$ is the penalty incurred when the next-hop node is out of communication range or the single-hop delay is too long, i.e., when C1–C3 are violated.
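A sketch of this per-hop reward with the branch structure above; the magnitudes of r_s and r_c are illustrative assumptions:

```python
def reward(delay_k, thr_k, k_ab, w1, w2, final_hop, constraints_ok,
           r_s=1.0, r_c=-2.0):
    """Hop-tradeoff-weighted delay plus throughput, with bonus and penalty.

    r_s rewards the final hop that reaches the destination train;
    r_c penalizes violating C1-C3 (outage or over-long single-hop delay).
    """
    r = w1 * 1.0 / (k_ab * delay_k) + w2 * thr_k
    if final_hop:
        r += r_s
    if not constraints_ok:
        r += r_c
    return r
```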

5. Problem Solution

Since value-based methods are suitable for discrete action spaces, and next-hop relay selection is a discrete action, we choose a value-based reinforcement learning approach for policy optimization. Our proposed scheme has a large number of channel and queuing states, which leads to a high-dimensional Q-table and makes the Q-learning [37] algorithm difficult to converge during training. The DQN algorithm solves this problem by combining a deep neural network (DNN) with the Q-value. DQN does not directly select the action with the highest Q-value from a Q-table, but instead fits $Q^{\pi}(s_t, a_t, \theta)$ with a neural network [38]. Compared with recording the Q-value for every action-state pair, DQN only needs to store the weights of each neuron to compute the Q-values for all policies $\pi(s_t, a_t)$, which greatly reduces storage space and makes the algorithm converge faster [39,40].
Since each node needs to make its own decision when selecting the next-hop node, we use the multiagent dueling DQN (MADDQN) approach. MADDQN treats each node as an independent agent, and all agents are trained centrally. When making the next-hop selection, the trained network parameters are shared with all nodes, and each node selects its next-hop node individually. The overall process of MADDQN is shown in Figure 2.

5.1. DQN

In each time slot $t$, each node acts as an agent and observes the current system state $s_t$, including the congestion level and channel state information of all nodes. The agent then chooses a suitable action $a_t$ to select the next-hop node. After the action $a_t$ is taken, a new state $s_{t+1}$ is obtained, and the reward $r_t$ corresponding to this action is computed. The goal of the agent is to find a policy $\pi(s_t, a_t)$ that maximizes the expected discounted cumulative reward $E\left[\sum_{i=t}^{K-1} \gamma^{i-t} r_{i+1}\right]$ [9]. Therefore, the state-action value function is used to calculate the expected discounted cumulative reward of each relay selection policy, and the policy $\pi$ with the largest reward is selected. The state-action value function is defined as

$$ Q^{\pi}(s_t, a_t) \triangleq E\left[ \sum_{i=t}^{K-1} \gamma^{i-t} r_{i+1} \,\middle|\, s_t, a_t, \pi \right], $$

where $\gamma \in (0, 1)$ is the discount factor, which balances immediate and long-term rewards, and $r_{i+1}$ is the immediate reward at time slot $i+1$.
When selecting the action for the next time slot, the agent does not always pick the relay with the maximum $Q^{\pi}(s_t, a_t)$; an $\varepsilon$-greedy policy is used to explore additional actions. In order to try more possible actions and avoid falling into a local maximum, the agent chooses an action randomly with probability $\varepsilon$. The $\varepsilon$-greedy policy is denoted as

$$ a_t = \begin{cases} \arg\max_{a} Q^{\pi}(s_t, a), & \text{with probability } 1 - \varepsilon \\ \text{random action}, & \text{with probability } \varepsilon. \end{cases} $$
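A minimal ε-greedy selector over the candidates' Q-values, assuming they have already been computed:

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_values, epsilon=0.1):
    """Return argmax Q with probability 1 - eps, a random action otherwise."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```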
The different channel states and node congestion levels form a very large state space; thus, it is impossible to tabulate the Q-value for each action and state. In DQN, a neural network is trained to obtain the weight $\theta$ of each neuron. Given the current channel state and the next-hop relay selection action as input, the neural network fits the state-action value $Q(s_t, a_t, \theta)$. To make the network converge faster, DQN uses two networks: a target network and an evaluation network. During training, the weights $\theta$ of the evaluation network are continuously updated and are copied to the target network weights $\theta_{tar}$ at fixed intervals. The weights $\theta$ are updated by stochastic gradient descent to minimize the loss between the target network and the evaluation network, defined as

$$ L(\theta) = E_{(s_t, a_t, r_t, s_{t+1})}\left[ \left( Q_{target} - Q(s_t, a_t, \theta) \right)^2 \right], $$

where the target value for each iteration of the network is

$$ Q_{target} = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}, \theta_{tar}). $$
To make better use of historical experience and to break the correlation between consecutive experiences, DQN also uses an experience replay mechanism. At each time slot during centralized training, a node acting as an agent stores the training experience $e_t = (s_t, a_t, r_t, s_{t+1})$ in the experience pool, forming the sequence $D = \{e_1, e_2, \ldots, e_N\}$. For each training step, a small batch of samples is drawn randomly from the experience pool for network training, which helps the network converge.
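A compact sketch of the experience pool and the TD target; the capacity and batch size follow the settings of Section 6.1 ($J = 1024$, $B = 256$), while the rest is a generic implementation rather than the paper's code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size pool D = {e_1, ..., e_N} of experiences e = (s, a, r, s')."""

    def __init__(self, capacity=1024):
        self.pool = deque(maxlen=capacity)  # oldest experiences fall out

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size=256):
        return random.sample(list(self.pool), batch_size)

def td_target(r, q_next, gamma=0.9):
    """Q_target = r + gamma * max_a' Q(s', a'; theta_tar)."""
    return r + gamma * max(q_next)
```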

5.2. Dueling DQN

The dueling DQN network [19] further improves the DQN network structure. In DQN, the network directly outputs the state-action value $Q(s_t, a_t, \theta)$ for each relay selection policy. In dueling DQN, the output is split into two branches: the state value $V(s_t)$, indicating the value of the current channel and queuing state, and the action advantage $A(s_t, a_t)$, representing the value brought by the relay selection action. The outputs of the two branches are then combined to make the estimate of Q more accurate. The combination can be written as

$$ Q(s_t, a_t; \theta, \sigma, \vartheta) = V(s_t; \theta, \vartheta) + A(s_t, a_t; \theta, \sigma), $$

where $\theta$, $\sigma$, and $\vartheta$ are the coefficients of the neural network. In order to prevent multiple pairs of state value $V(s_t)$ and advantage $A(s_t, a_t)$ from yielding the same state-action value $Q(s_t, a_t; \theta, \sigma, \vartheta)$, and to make the algorithm more stable [38], Equation (30) is replaced by

$$ Q(s_t, a_t; \theta, \sigma, \vartheta) = V(s_t; \theta, \vartheta) + \left( A(s_t, a_t; \theta, \sigma) - \frac{1}{|A|} \sum_{a_{t+1}} A(s_t, a_{t+1}; \theta, \sigma) \right). $$
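The mean-subtracted aggregation amounts to a few lines; this shows the standard dueling combination rather than the authors' network code:

```python
import numpy as np

def dueling_q(v, advantages):
    """Combine the two heads: Q = V + (A - mean(A)).

    Subtracting the mean advantage removes the ambiguity that would let
    many (V, A) pairs produce the same Q-values.
    """
    a = np.asarray(advantages, dtype=float)
    return v + (a - a.mean())

print(dueling_q(v=2.0, advantages=[0.5, -0.3, 0.1]))  # [2.4, 1.6, 2.0]
```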
The proposed multiagent dueling DQN is shown in Algorithm 1.
Algorithm 1 Multiagent dueling DQN (MADDQN)
1: Initialization:
   Initialize the maximum buffer capacity $M$ and packet length $L$;
   Initialize the number of nodes along the rail $N$;
   Initialize memory size $J$, batch size $B$, greedy coefficient $\varepsilon$, and learning rate $\varphi$.
2: for episode = 1 to $K$ do
3:   Reset the channel quality $C$ and the queue length $Q$ of each node as the initial state $S_{initial}$
4:   while $a_t \neq$ destination node do
5:     Choose action: with probability $\varepsilon$, choose the next-hop node at random;
6:       otherwise, choose action $a_t = \arg\max_a Q^{\pi}(s_t, a, \theta)$.
7:     From the current state $s_t$ and action $a_t$, obtain the reward $r_t$ and the next state $s_{t+1}$.
8:     Store $(s_t, a_t, r_t, s_{t+1})$ in the experience replay memory.
9:     Randomly sample a minibatch of $(s_t, a_t, r_t, s_{t+1})$ from the experience replay memory.
10:    Combine the two branches $V(s_t; \theta, \vartheta)$ and $A(s_t, a_t; \theta, \sigma)$ into $Q(s_t, a_t; \theta, \sigma, \vartheta)$.
11:    Calculate the target Q-value:
       $$ Q_{target} = \begin{cases} r_t, & \text{if } a_t \text{ reaches the destination node} \\ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta_{tar}, \sigma, \vartheta), & \text{otherwise}. \end{cases} $$
12:    Minimize the loss function $L(\theta)$ using Equation (30).
13:    Update the target network every several steps with the parameters of the evaluation network.
14:  end while
15: end for

6. Simulation Results

In this section, we verify the effectiveness of the proposed deep learning-based relay selection algorithm by conducting simulation experiments in the CBTC system.

6.1. Simulation Settings

In the simulation, TensorFlow 1.13.1 was imported in Python 3.6 as the simulation environment.
In order to simplify the system model, we simulated relay selection between two adjacent trains. If the SNR between the current node and the next-hop wayside relay is greater than the SNR threshold, communication between the two nodes is possible. Since each node has the same transmission power, the distance between nodes mainly determines the channel throughput; thus, a next-hop node closer to the current node has a higher one-hop channel throughput. In the train system, packets transmitted by a train must pass through wayside nodes and cannot be delivered directly to the forward train. In addition, wayside relays are uniformly distributed on both sides of the track; as the distance between trains becomes longer, the number of hops required for transmission also increases.
Furthermore, in the process of training, each agent has its own training parameters; we set the batch size $B = 256$, greedy coefficient $\varepsilon = 0.1$, learning rate $\varphi = 0.001$, and memory size $J = 1024$. Other main parameters of the communication system are shown in Table 1.

6.2. Performance Analysis

We compare the proposed MADDQN algorithm with two existing algorithms:
  • GPSR [22] (greedy perimeter stateless routing) is often used in the transmission of ad hoc networks, which collects the geographic location information of neighboring nodes and finds the next-hop node with the nearest geographic location to the destination through a greedy algorithm.
  • The random selection scheme randomly selects the next-hop node within the communication range without any optimization strategy.

6.2.1. Performance Comparison of Convergence

Firstly, in order to find the learning rate at which the proposed model converges best, we conducted experiments with three different learning rates. As shown in Figure 3, when the learning rate was $10^{-4}$, the convergence of the agent was slow, and the total reward did not reach the optimal value. To speed up convergence, we increased the learning rate to $10^{-3}$, which made convergence faster and the total reward higher. When the learning rate was further increased to $10^{-2}$, the agent easily converged to a local optimum, resulting in poor convergence results. Therefore, in the training of the agents, the learning rate was set to $10^{-3}$.
The goal of agent training is to better avoid outage events and reduce packet loss rate. As shown in Figure 4, the probability of outage events decreased dramatically in the first 1000 episodes, indicating that the agent learned to select next-hop relays within communication range. At the same time, the probability of network congestion gradually decreased during the training process, which illustrates that the agent successfully avoided congested nodes. The simulation results show that the MADDQN algorithm could effectively avoid outage and congestion events, ensuring the quality of transmission.

6.2.2. Performance Comparison of Different Aspects

The distance between two adjacent trains varies, and the wayside nodes are uniformly distributed; thus, as the distance between two trains becomes larger, the number of trackside nodes between them increases and the topology of the network changes. Figure 5 depicts the variation of the total delay as the number of nodes increases. It can be observed that the whole-link delay increased from 4.31 ms to 8.43 ms under the MADDQN algorithm. The main reason is that, with the increase in the number of relays, the number of hops required for the whole link increased; thus, the total delay increased. Compared with the random selection scheme and the GPSR algorithm, the transmission delay of the MADDQN algorithm was reduced by an average of 2 ms and 0.5 ms, respectively. Although the traditional GPSR algorithm requires a small number of hops, it cannot avoid congested nodes, resulting in a large packet loss rate. In addition, the random scheme can neither select nodes with short queues nor optimize the hop count; hence, its delay is longer than that of the other two algorithms.
The effect of buffer size on total delay is investigated in Figure 6. When the buffer size was less than 300 kb, the total latency increased significantly as the buffer size became larger, because a small buffer keeps the packet queue short, resulting in a lower queuing delay. When the buffer size reached 300 kb, the total delay gradually flattened out, because nodes no longer dropped packets and the queue length of each node tended to be stable. In addition, MADDQN improved the delay by 0.5 ms compared with the GPSR algorithm and by 3 ms compared with the random selection scheme, which illustrates that the proposed MADDQN algorithm can select nodes with shorter queues for transmission.
Figure 7 presents the relationship between the number of nodes and the average loss rate under different schemes. As the number of nodes increased from four to nine, the packet loss rate of MADDQN increased to 0.12, because the total packet loss rate grows with the number of hops. Compared with the other two schemes, the proposed MADDQN selects next-hop relays with shorter queues for transmission, which greatly reduces the packet loss probability. Although the GPSR algorithm cannot avoid congested nodes, it requires fewer hops for transmission; hence, its packet loss rate is also lower than that of the random scheme.
Figure 8 depicts the change in the average loss rate with the buffer size. The average packet loss rate decreased rapidly, reaching zero at a buffer size of 300 kb. With a small buffer, packets easily overflowed; hence, the packet loss rate kept decreasing until the buffer was large enough. Before 300 kb, the average loss rates of the GPSR and random schemes were higher than that of MADDQN, which illustrates that our proposed method was effective in avoiding congested nodes and reducing the number of hops, giving the lowest packet loss rate.
Figure 9 presents how the number of nodes affects the average throughput. It can be observed that the average throughput of the entire link decreased monotonically with the increase in the number of nodes. This is because, as shown in Figure 7, the packet loss rate increased with the number of nodes, which reduced the overall throughput. The throughput of MADDQN was greater than that of the other two methods, indicating that, when selecting the next-hop node, MADDQN chose nodes with relatively large channel throughput and fewer queued packets, ensuring channel quality.
Figure 10 shows the impact of buffer size on throughput. It can be observed that the throughput increased rapidly up to 300 kb and then reached a stable state as the buffer size continued to grow. This is because a gradually increasing buffer size decreases the packet loss rate and, therefore, increases the system throughput; once the system has no packet loss, the throughput stabilizes. Meanwhile, the proposed algorithm had the highest throughput of the three schemes, which demonstrates that MADDQN is effective in selecting routes with large throughput.
Figure 11 illustrates the relationship between the number of nodes and the optimization goal under different weights of latency and channel throughput. The simulation results show that the optimization goal grew with $\omega_1$. Although the optimization goal covers both delay and throughput, when the weight of delay was high, latency was optimized more strongly, while the optimization of throughput was relatively weak. Moreover, when the node number was between four and six, delay accounted for a large proportion of the optimization objective; thus, when $\omega_1 = 0.9$, the total optimization objective was the highest. However, in order to optimize both delay and throughput to a good level, we chose $\omega_1 = 0.5$, $\omega_2 = 0.5$. In practice, different parameters can be chosen according to different needs: for safety information with stricter latency requirements, $\omega_1$ can be increased, whereas for systems where throughput matters more, $\omega_2$ can be increased appropriately, with the constraint that $\omega_1 + \omega_2 = 1$.
The effects of the number of nodes and the buffer size on the optimization goal are investigated in Figure 12 and Figure 13. The optimization goal is derived from Equation (27) and is a comprehensive indicator of channel throughput and transmission delay. In Figure 12, the delay increased and the throughput decreased as the number of nodes grew; thus, the optimization objective gradually declined, showing that the overall performance of the system worsens as the number of nodes increases. In Figure 13, both latency and throughput tended to rise as the buffer size increased, but latency rose faster and had a greater impact on the optimization objective; therefore, the combined optimization goal showed a slight decrease. Moreover, our proposed algorithm always outperformed the existing algorithms, indicating that the MADDQN algorithm can better trade off channel throughput and transmission delay under any topology and buffer size.

7. Conclusions and Future Work

In this paper, we designed a multi-hop relay selection strategy based on wireless ad hoc networks to assist T2T communication. The optimization goal of our proposed algorithm is to reduce the T2T transmission delay and increase the throughput of the entire link in a congested network. Since the channel status changes in real time, an MADDQN approach was proposed to better solve the problem. Simulation results showed that our proposed algorithm could effectively avoid congested nodes and reduce the number of hops for the whole-link transmission, thereby better achieving the optimization goal compared with existing routing algorithms. In future work, the energy consumption of the nodes and the problem of retransmission after packet loss should be considered. Moreover, some secure and energy-efficient technologies, such as reconfigurable intelligent surfaces (RIS), will be applied to the CBTC system to better assist in signal transmission.

Author Contributions

Conceptualization, S.M. and M.L.; methodology, S.M. and R.Y.; software, S.M.; validation, S.M., R.Y., and Y.S.; formal analysis, S.M. and M.L.; investigation, S.M. and Z.W.; resources, S.M., Z.W., and P.S.; data curation, S.M. and M.L.; writing—original draft preparation, S.M.; writing—review and editing, M.L. and P.S.; visualization, Y.S. and Z.W.; supervision, M.L. and P.S.; project administration, M.L., R.Y., and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported through the Beijing Natural Science Foundation under Grants L211002, 4222002, and L202016, and the Foundation of the Beijing Municipal Commission of Education under Grants KM 202010005017 and KM202110005021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, X.; Liu, L.; Tang, T.; Sun, W. Enhancing Communication-Based Train Control Systems Through Train-to-Train Communications. IEEE Trans. Intell. Transp. Syst. 2019, 20, 1544–1561.
  2. Zhu, L.; Yu, F.R.; Ning, B.; Tang, T. Communication-Based Train Control (CBTC) Systems with Cooperative Relaying: Design and Performance Analysis. IEEE Trans. Veh. Technol. 2014, 63, 2162–2172.
  3. Liu, L.; Parag, P.; Tang, J.; Chen, W.Y.; Chamberland, J.F. Resource Allocation and Quality of Service Evaluation for Wireless Communication Systems Using Fluid Models. IEEE Trans. Inf. Theory 2007, 53, 1767–1777.
  4. Wang, X.; Liu, L.; Tang, T. Train-Centric CBTC Meets Age of Information in Train-to-Train Communications. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4072–4085.
  5. Sun, W.; Yu, F.R.; Tang, T.; Bu, B. Energy-Efficient Communication-Based Train Control Systems with Packet Delay and Loss. IEEE Trans. Intell. Transp. Syst. 2016, 17, 452–468.
  6. Wang, X.; Liu, L.; Tang, T.; Zhu, L. Next Generation Train-Centric Communication-Based Train Control System with Train-to-Train (T2T) Communications. In Proceedings of the 2018 International Conference on Intelligent Rail Transportation (ICIRT), Singapore, 12–14 December 2018.
  7. Li, Y.; Zhu, L. Collaborative Cloud and Edge Computing in 5G based Train Control Systems. In Proceedings of the 2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022.
  8. Gong, S.; Lu, X.; Hoang, D.T.; Niyato, D.; Shu, D.; Shu, L.; Kim, D.I.; Liang, Y.C. Toward Smart Wireless Communications via Intelligent Reflecting Surfaces: A Contemporary Survey. IEEE Commun. Surv. Tutor. 2020, 22, 2283–2314.
  9. Ahmed, M.; Wahid, A.; Laique, S.S.; Khan, W.U.; Ihsan, A.; Xu, F.; Chatzinotas, S.; Han, Z. A Survey on STAR-RIS: Use Cases, Recent Advances, and Future Research Challenges. IEEE Internet Things J. 2023.
  10. Ahmed, M.; Mirza, M.A.; Raza, S.; Ahmad, H.; Xu, F.; Khan, W.U.; Lin, Q.; Han, Z. Vehicular Communication Network Enabled CAV Data Offloading: A Review. IEEE Trans. Intell. Transp. Syst. 2023.
  11. Gupta, L.; Jain, R.; Vaszkun, G. Survey of Important Issues in UAV Communication Networks. IEEE Commun. Surv. Tutor. 2016, 18, 1123–1152.
  12. Jacquet, P.; Muhlethaler, P.; Clausen, T.; Laouiti, A.; Qayyum, A.; Viennot, L. Optimized Link State Routing Protocol for Ad Hoc Networks. In Proceedings of the 2001 IEEE International Multi Topic Conference, Lahore, Pakistan, 30 December 2001.
  13. Bai, F.; Sadagopan, N.; Helmy, A. IMPORTANT: A Framework to Systematically Analyze the Impact of Mobility on Performance of Routing Protocols for Adhoc Networks. In Proceedings of the Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, San Francisco, CA, USA, 30 March–3 April 2003.
  14. Ding, R.; Xu, Y.; Gao, F.; Shen, X. Trajectory Design and Access Control for Air–Ground Coordinated Communications System with Multiagent Deep Reinforcement Learning. IEEE Internet Things J. 2022, 9, 5785–5798.
  15. Ding, R.; Chen, J.; Wu, W.; Liu, J.; Gao, F.; Shen, X. Packet Routing in Dynamic Multi-Hop UAV Relay Network: A Multi-Agent Learning Approach. IEEE Trans. Veh. Technol. 2022, 71, 10059–10072.
  16. He, Y.; Zhai, D.; Jiang, Y.; Zhang, R. Relay Selection for UAV-Assisted Urban Vehicular Ad Hoc Networks. IEEE Wirel. Commun. Lett. 2020, 9, 1379–1383.
  17. Wu, Q.; Zheng, J. Performance Modeling and Analysis of the ADHOC MAC Protocol for VANETs. In Proceedings of the IEEE International Conference on Communications (ICC'15), London, UK, 8–12 June 2015.
  18. Wu, Q.; Zheng, J. Performance Modeling and Analysis of the ADHOC MAC Protocol for Vehicular Networks. Wirel. Netw. 2016, 22, 799–812.
  19. Wu, Q.; Zhao, Y.; Fan, Q.; Fan, P.; Wang, J.; Zhang, C. Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning. IEEE J. Sel. Top. Signal Process. 2023, 17, 66–81.
  20. Zhu, L.; Yu, F.R.; Ning, B. Availability Improvement for WLAN-Based Train-Ground Communication Systems in Communication-Based Train Control (CBTC). In Proceedings of the 2010 IEEE 72nd Vehicular Technology Conference, Ottawa, ON, Canada, 6–9 September 2010.
  21. Wang, Y.; Zhu, L.; Zhao, H. Handover Performance Test and Analysis in TD-LTE based CBTC Train Ground Communication Systems. In Proceedings of the 2017 Chinese Automation Congress, Jinan, China, 20–22 October 2017.
  22. Karp, B.; Kung, H.-T. GPSR: Greedy Perimeter Stateless Routing for Wireless Networks. In Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, Boston, MA, USA, 6–11 August 2000.
  23. Liu, K.; Niu, K. A Hybrid Relay Node Selection Strategy for VANET Routing. In Proceedings of the 2017 IEEE/CIC International Conference on Communications in China (ICCC), Qingdao, China, 22–24 October 2017.
  24. Li, F.; Wang, Y. Routing in Vehicular Ad Hoc Networks: A Survey. IEEE Veh. Technol. Mag. 2007, 2, 12–22.
  25. Toor, Y.; Muhlethaler, P.; Laouiti, A.; La Fortelle, A.D. Vehicle Ad Hoc Networks: Applications and Related Technical Issues. IEEE Commun. Surv. Tutor. 2008, 10, 74–88.
  26. Wang, Z.; Han, R.; Li, H.; Knoblock, E.J.; Apaza, R.D.; Gasper, M.R. Deep Reinforcement Learning Based Routing in an Air-to-Air Ad-hoc Network. In Proceedings of the 2022 IEEE/AIAA 41st Digital Avionics Systems Conference (DASC), Portsmouth, VA, USA, 18–22 September 2022.
  27. Zhang, H.; Chong, S.; Zhang, X.; Lin, N. A Deep Reinforcement Learning Based D2D Relay Selection and Power Level Allocation in mmWave Vehicular Networks. IEEE Wirel. Commun. Lett. 2020, 9, 416–419.
  28. Wu, Q.; Zheng, J. Performance Modeling of IEEE 802.11 DCF Based Fair Channel Access for Vehicular-to-Roadside Communication in a Non-Saturated State. In Proceedings of the IEEE International Conference on Communications (ICC'14), Sydney, NSW, Australia, 10–14 June 2014.
  29. Chen, Y.; Feng, Z.; Xu, D.; Liu, Y. Optimal Power Allocation and Relay Selection in Dual-Hop and Multi-Hop Cognitive Networks. In Proceedings of the 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, Canada, 10–15 June 2012.
  30. Wang, Y.; Feng, Z.; Chen, X.; Li, R.; Zhang, P. Outage Constrained Power Allocation and Relay Selection for Multi-Hop Cognitive Network. In Proceedings of the 2012 IEEE Vehicular Technology Conference (VTC Fall), Quebec City, QC, Canada, 3–6 September 2012.
  31. Kleinrock, L. Queueing Systems, Volume I: Theory; Wiley-Interscience: New York, NY, USA, 1975.
  32. Liu, W.; Zhou, S.; Giannakis, G.B. Queuing with Adaptive Modulation and Coding over Wireless Links: Cross-Layer Analysis and Design. IEEE Trans. Wirel. Commun. 2005, 4, 1142–1153.
  33. Ma, R.; Chang, Y.-J.; Chen, H.-H.; Chiu, C.-Y. On Relay Selection Schemes for Relay-Assisted D2D Communications in LTE-A Systems. IEEE Trans. Veh. Technol. 2017, 66, 8303–8314.
  34. Chen, Z.; Smith, D. MmWave M2M Networks: Improving Delay Performance of Relaying. IEEE Trans. Wirel. Commun. 2021, 20, 577–589.
  35. Xia, B.; Fan, Y.; Thompson, J.; Poor, H.V. Buffering in a Three-Node Relay Network. IEEE Trans. Wirel. Commun. 2008, 7, 4492–4496.
  36. Gui, J.; Deng, J. Multi-Hop Relay-Aided Underlay D2D Communications for Improving Cellular Coverage Quality. IEEE Access 2018, 6, 14318–14338.
  37. Liu, M.; Yu, F.R.; Teng, Y.; Leung, V.C.M.; Song, M. Performance Optimization for Blockchain-Enabled Industrial Internet of Things (IIoT) Systems: A Deep Reinforcement Learning Approach. IEEE Trans. Ind. Inf. 2019, 15, 3559–3570.
  38. Wang, Z.; Freitas, N.D.; Lanctot, M. Dueling Network Architectures for Deep Reinforcement Learning. PMLR 2016, 48, 1995–2003.
  39. Du, J.; Cheng, W.; Lu, G.; Cao, H.; Chu, X.; Zhang, Z.; Wang, J. Resource Pricing and Allocation in MEC Enabled Blockchain Systems: An A3C Deep Reinforcement Learning Approach. IEEE Trans. Netw. Sci. Eng. 2022, 9, 33–44.
  40. Du, J.; Yu, F.R.; Lu, G.; Wang, J.; Jiang, J.; Chu, X. MEC-Assisted Immersive VR Video Streaming over Terahertz Wireless Networks: A Deep Reinforcement Learning Approach. IEEE Internet Things J. 2020, 7, 9517–9529.
Figure 1. Next-hop relay selection in T2T communication.
Figure 2. Framework of MADDQN.
Figure 3. Total reward with different learning rates.
Figure 4. Outage probability and congestion probability of the proposed algorithm.
Figure 5. Total delay versus the number of nodes.
Figure 6. Total delay versus buffer size.
Figure 7. Average loss rate versus the number of nodes.
Figure 8. Average loss rate versus buffer size.
Figure 9. Average throughput versus the number of nodes.
Figure 10. Average throughput versus buffer size.
Figure 11. Relationship of optimization goal and number of nodes under different weights.
Figure 12. Relationship of optimization goal and number of nodes under different schemes.
Figure 13. Relationship of optimization goal and buffer size under different schemes.
Table 1. Communication system parameters.

Parameter                                   Value
Number of trains                            2
Number of wayside nodes                     7
Bandwidth                                   10 MHz
Max buffer size                             250 kb
Average packet size                         25 kb
SNR threshold                               31
Gaussian noise power spectral density       −174 dBm/Hz
Weight of latency, $\omega_1$               0.5
Weight of channel throughput, $\omega_2$    0.5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
