Article

Entropic Bounds on the Average Length of Codes with a Space

Department of Computer Science, University of Salerno, 84084 Fisciano, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 13 November 2023 / Revised: 15 March 2024 / Accepted: 23 March 2024 / Published: 26 March 2024
(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)

Abstract

We consider the problem of constructing prefix-free codes in which a designated symbol, a space, can only appear at the end of codewords. We provide a linear-time algorithm to construct almost-optimal codes with this property, meaning that their average length differs from the minimum possible by at most one. We obtain our results by uncovering a relation between our class of codes and the class of one-to-one codes. Additionally, we derive upper and lower bounds to the average length of optimal prefix-free codes with a space in terms of the source entropy.

1. Introduction

Modern natural languages achieve the unique parsability of written texts by inserting a special character (i.e., a space) between words [1] (See [2] for a few exceptions to this rule). Classical Information Theory, instead, studies codes that achieve the unique parsability of texts by imposing diverse combinatorial properties on the codeword set: e.g., the prefix property, unique decipherability, etc. [3]. With respect to the efficiency of such codes (usually measured via the average number of code symbols per source symbol), it is well known that the Shannon entropy of the information source constitutes a fundamental lower bound for it. On the other hand, if one drops the property of the unique parsability of code messages into individual codewords, and simply requires that different source symbols be encoded with different codewords, one can obtain codes (known as one-to-one codes) with efficiency below the source Shannon entropy (although not too much below; see, e.g., [4,5]).
Jaynes [6] took the approach of directly studying source codes in which a designated character of the code alphabet is exclusively used as a word delimiter. More precisely, Jaynes studied the possible decrease of the noiseless channel capacity (see [7], p. 8) associated with any code that uses a designated symbol as an end-of-codeword mark, as compared with the noiseless channel capacity of an unconstrained code. Quite interestingly, Jaynes proved that the decrease of the noiseless channel capacity of codes with an end-of-codeword mark becomes negligible, as the maximum codeword length increases.
In this paper, we study the problem of constructing prefix-free codes in which a designated symbol (referred to as a ‘space’) can appear only at the end of codewords. We call codes of this kind prefix codes ending with a space. We develop a linear-time algorithm that constructs ‘almost’-optimal codes with this property, in the sense that the average length of the constructed codes is at most one unit longer than the shortest possible average length of any prefix-free code in which the space can appear only at the end of codewords. We prove this result by highlighting a connection between our class of codes and the well-known class of one-to-one codes. We also provide upper and lower bounds on the average length of optimal prefix codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
The paper is structured as follows. In Section 2, we illustrate the relationships between prefix codes ending with a space and one-to-one codes. Specifically, we prove that from one-to-one codes one can easily construct prefix codes ending with a space, and we give an upper bound on the average length of the constructed codes. Next, we show that, by removing all the spaces from the codewords of a prefix code ending with a space, one obtains a one-to-one code. This result allows us to prove that the average length of our prefix codes ending with a space differs from the minimum possible by at most one. In Section 3 and Section 4, we derive upper and lower bounds on the average length of optimal prefix codes ending with a space in terms of the source entropy and the cardinality of the code alphabet.

2. Relations between One-to-One Codes and Prefix Codes Ending with a Space

Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on the set $S$ (that is, $p_i$ is the probability of the source symbol $s_i$), and $\{0, \ldots, k-1\}$, $k \ge 2$, be the code alphabet. We denote by $\{0, \ldots, k-1\}^+$ the set of all non-empty sequences over the code alphabet $\{0, \ldots, k-1\}$, and by $\{0, \ldots, k-1\}^+ \sqcup$ the set of all sequences obtained by appending the special space symbol ⊔ to a non-empty $k$-ary sequence.
A prefix-free code ending with a space is a one-to-one mapping
$$C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$$
in which no codeword $C(s)$ is a prefix of another codeword $C(s')$, for any $s, s' \in S$, $s \ne s'$.
A $k$-ary one-to-one code (see [4,5,8,9,10,11] and the references quoted therein) is an injective mapping $D \colon S \to \{0, \ldots, k-1\}^+$ from $S$ into the set of all non-empty sequences over the alphabet $\{0, \ldots, k-1\}$, $k \ge 2$.
The average length of an arbitrary code for the set of source symbols $S = \{s_1, \ldots, s_n\}$, with probabilities $\mathbf{p} = (p_1, \ldots, p_n)$, is $\sum_{i=1}^n p_i l_i$, where $l_i$ is the number of alphabet symbols in the codeword associated with the source symbol $s_i$.
Without loss of generality, we assume that the probability distribution $\mathbf{p} = (p_1, \ldots, p_n)$ is ordered, that is, $p_1 \ge \cdots \ge p_n$. Under this assumption, it is apparent that the best one-to-one code proceeds by assigning the shortest codeword (e.g., in the binary case, the codeword 0) to the highest probability source symbol $s_1$, the next shortest codeword 1 to the source symbol $s_2$, the codeword 00 to $s_3$, the codeword 01 to $s_4$, and so on.
An equivalent approach for constructing an optimal one-to-one code, which we will use later, proceeds as follows. Let us consider the first $n$ non-empty $k$-ary strings according to the radix order [12] (that is, the $k$-ary strings are ordered by length and, for equal lengths, according to the lexicographic order). We assign the strings to the symbols $s_1, \ldots, s_n$ in $S$ by increasing string length and, for equal lengths, by inverse lexicographic order. For example, in the binary case, we assign the codeword 1 to the highest probability source symbol $s_1$, the codeword 0 to the source symbol $s_2$, the codeword 11 to $s_3$, the codeword 10 to $s_4$, and so on. Therefore, one can see that, in the general case of a $k$-ary code alphabet, $k \ge 2$, an optimal one-to-one code of minimal average length assigns a codeword of length $l_i$ to the $i$-th symbol $s_i \in S$, where $l_i$ is given by:
$$l_i = \lfloor \log_k((k-1)i + 1) \rfloor. \qquad (1)$$
Moreover, the codewords of an optimal $k$-ary one-to-one code can be represented as the nodes of a $k$-ary tree of maximum depth $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$, where, for each node $v$, the $k$-ary string (codeword) associated with $v$ is obtained by concatenating all the labels on the path from the root of the tree to $v$.
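To make the two procedures concrete, the following Python sketch (our own illustration, not part of the original paper) builds the codewords of an optimal $k$-ary one-to-one code according to the second procedure: it takes the first $n$ non-empty $k$-ary strings in radix order and assigns them, within each length, in inverse lexicographic order.

```python
def one_to_one_code(n: int, k: int = 2) -> list[str]:
    """Optimal k-ary one-to-one code for symbols s_1, ..., s_n (ordered by
    non-increasing probability): take the first n non-empty k-ary strings
    in radix order and, within each length, assign them to the symbols in
    inverse lexicographic order."""
    code, length, remaining = [], 1, n
    while remaining > 0:
        used = min(remaining, k ** length)    # strings of this length among the first n
        for rank in range(used - 1, -1, -1):  # inverse lexicographic order
            digits, x = [], rank
            for _ in range(length):
                x, d = divmod(x, k)
                digits.append(str(d))
            code.append("".join(reversed(digits)))
        remaining -= used
        length += 1
    return code

print(one_to_one_code(10))
# ['1', '0', '11', '10', '01', '00', '011', '010', '001', '000']
```

The output for $n = 10$ and $k = 2$ is exactly the one-to-one code used in the example below.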
It is evident that, if we apply the above encoding to a sequence of source symbols, the obtained binary sequence is not uniquely parsable in terms of individual codewords. Let us see how one can recover unique parsability by appending a space ⊔ to judiciously chosen codewords of an optimal one-to-one code. To gain insight, let us consider the following example. Let $S = \{s_1, s_2, \ldots, s_{10}\}$ be the set of source symbols, and let us assume that the code alphabet is $\{0, 1\}$. Under the standing hypothesis that $p_1 \ge \cdots \ge p_{10}$, the best prefix-free code $C$ one can obtain by the procedure of appending a space ⊔ to codewords of the optimal one-to-one code for $S$ is the following:
$$C(s_1) = 1{\sqcup} \quad C(s_2) = 0{\sqcup} \quad C(s_3) = 11 \quad C(s_4) = 10 \quad C(s_5) = 01{\sqcup}$$
$$C(s_6) = 00{\sqcup} \quad C(s_7) = 011 \quad C(s_8) = 010 \quad C(s_9) = 001 \quad C(s_{10}) = 000.$$
Observe that we started from the codewords of the optimal one-to-one code constructed according to the second procedure previously described. Moreover, note that the codewords associated with the symbols $s_1$, $s_2$, $s_5$, and $s_6$ necessitate the space character ⊔ at their end; otherwise, the unique parsability of some encoded sequences of source symbols would not be guaranteed. On the other hand, the codewords associated with the symbols $s_3$, $s_4$, $s_7$, $s_8$, $s_9$, and $s_{10}$ do not necessitate the space character ⊔. Indeed, the codeword set
$$\{1{\sqcup},\, 0{\sqcup},\, 11,\, 10,\, 01{\sqcup},\, 00{\sqcup},\, 011,\, 010,\, 001,\, 000\}$$
satisfies the prefix-free condition (i.e., no codeword is a prefix of any other); therefore, it guarantees the unique parsability of any coded message in terms of individual codewords.
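The space-appending step is also easy to express in code. The sketch below (ours; for clarity it uses a quadratic-time prefix test instead of the $O(n)$-time tree argument used in Lemma 1) appends ⊔ exactly to those codewords that are proper prefixes of other codewords, and reproduces the code $C$ of the example above.

```python
def append_spaces(code: list[str], space: str = "⊔") -> list[str]:
    """Append the space mark to every codeword that is a proper prefix of
    another codeword; the resulting codeword set is prefix-free."""
    return [w + space if any(v != w and v.startswith(w) for v in code) else w
            for w in code]

print(append_spaces(one_to_one_code(10)))   # one_to_one_code: see the sketch above
# ['1⊔', '0⊔', '11', '10', '01⊔', '00⊔', '011', '010', '001', '000']
```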
The idea of the above example can be generalized, as shown in the following lemma.
Lemma 1.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$. Let $\{0, \ldots, k-1\}$, $k \ge 2$, be the code alphabet. We can construct a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$, in $O(n)$ time, such that its average length $L(C)$ satisfies
$$L(C) = L^+ + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \qquad (2)$$
$$\le L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i, \qquad (3)$$
where $L^+$ is the average length of an optimal one-to-one code $D \colon S \to \{0, \ldots, k-1\}^+$ and $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$.
Proof. 
Under the standing hypothesis that the probabilities of the source symbols are ordered from the largest to the smallest, we show how to construct a prefix-free code—by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$—having average length equal to (2) and, therefore, upper bounded by (3).
Among the class of all the prefix-free codes that one can obtain by appending the character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$, we aim to construct the one with the minimum average length. Therefore, we have to ensure that, in the $k$-ary tree representation of the code, the following basic condition holds: for any pair of nodes $v_i$ and $v_j$, $i < j$, associated with the symbols $s_i$ and $s_j$, the depth of the node $v_j$ is not smaller than the depth of the node $v_i$. In fact, if it were otherwise, the average length of the code could be improved.
Therefore, by recalling that $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$ is the height of the $k$-ary tree associated with an optimal one-to-one code, we have that the prefix-free code of minimum average length that one can obtain by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$ assigns a codeword of length $l_i$ to the $i$-th symbol $s_i \in S$, where $l_i$ is given by:
$$l_i = \begin{cases} \lfloor \log_k((k-1)i+1) \rfloor + 1, & \text{if } i \le \frac{k^{h-1}-1}{k-1} - 1,\\ \lfloor \log_k((k-1)i+1) \rfloor + 1, & \text{if } \frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil \le i \le \frac{k^h-1}{k-1} - 1,\\ \lfloor \log_k((k-1)i+1) \rfloor, & \text{otherwise}. \end{cases} \qquad (4)$$
We stress that the obtained prefix-free code is not necessarily a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ of minimum average length. We now justify the expression (4). First, since the probabilities $p_1, \ldots, p_n$ are ordered in non-increasing fashion, the codeword lengths $l_i$ of the code are ordered in non-decreasing fashion, that is, $l_1 \le \cdots \le l_n$. Therefore, in the $k$-ary tree representation of the code, the desired basic condition holds: for any pair of nodes $v_i$ and $v_j$, $i < j$, associated with the symbols $s_i$ and $s_j$, the depth of the node $v_i$ is smaller than or equal to the depth of the node $v_j$.
Furthermore, we need to append the space character only to the $k$-ary strings that are prefixes of some others. Therefore, let us consider the first $n$ non-empty $k$-ary strings according to the radix order [12], in which, we recall, the $k$-ary strings are ordered by length and, for equal lengths, according to the lexicographic order. We have that the number of strings that are a prefix of some others is exactly $\lceil n/k \rceil - 1$. One obtains this number by seeing the strings as corresponding to nodes in a $k$-ary tree with labels $0, \ldots, k-1$ on the edges. The number of strings that are a prefix of some others (among the $n$ strings) is exactly equal to the number of internal nodes (except the root) in such a tree. This number of internal nodes is equal to $\lceil \frac{N-1}{k} \rceil - 1$, where $N$ is the total number of nodes, which in our case is equal to $N = n + 1$ (i.e., $N$ also counts the root of the tree).
Moreover, starting from the optimal one-to-one code constructed according to our second method, that is, by assigning $k$-ary strings to the symbols by increasing length and, for equal lengths, by inverse lexicographic order, one can verify that the $\lceil n/k \rceil - 1$ internal nodes are associated with the codewords of the symbols $s_i$, for $i$ that goes from 1 to $\frac{k^{h-1}-1}{k-1} - 1$, and from $\frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil$ to $\frac{k^h-1}{k-1} - 1$.
In fact, since the height of the $k$-ary tree is $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$ and since all the levels of the tree, except the last two, are full, we need to append the space to all symbols $s_i$ with $i$ from 1 to $\frac{k^{h-1}-1}{k-1} - 1$, while on the second-to-last level we have to append the space only to the remaining internal nodes, associated with the symbols $s_i$ with $i$ from $\frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil$ to $\frac{k^h-1}{k-1} - 1$. Those remaining nodes are exactly, among all the nodes in the second-to-last level, the ones associated with the symbols that have the smallest probabilities. Thus, we obtain (4).
Summarizing, we can construct a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$, in $O(n)$ time, with lengths defined as in (4), starting from an optimal one-to-one code. Thus:
$$\begin{aligned} L(C) &= \sum_{i=1}^n p_i l_i = \sum_{i=1}^n p_i \lfloor \log_k((k-1)i+1) \rfloor + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i\\ &= L^+ + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i\\ &\le L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i \quad (\text{since we are adding } \lceil n/k \rceil - 1 \text{ of the } p_i\text{'s, and the } p_i\text{'s are ordered}). \end{aligned}$$ □
Note that, from Lemma 1, we obtain that the average length of any optimal (i.e., of minimum average length) prefix-free code ending with a space is upper bounded by Formula (2). Furthermore, we have an upper bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes.
We can also derive a lower bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes. For this purpose, we need two intermediate results. We first recall that, given a $k$-ary code $C$, its codewords can be represented as nodes in a $k$-ary tree with labels $0, \ldots, k-1$ on the edges. Indeed, for each node $v$, the $k$-ary string (codeword) associated with $v$ can be obtained by concatenating all the labels on the path from the root of the tree to $v$. We also recall that, in prefix-free codes, the codewords correspond to the leaves of the associated tree, while in one-to-one codes, the codewords may also correspond to internal nodes of the associated tree.
Lemma 2.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, and let $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$. There exists an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ such that the following property holds: for any internal node $v$ (except the root) of the tree representation of $C$, if we denote by $w$ the $k$-ary string associated with the node $v$, then the string $w{\sqcup}$ belongs to the codeword set of $C$.
Proof. 
Let $C$ be an arbitrary optimal prefix-free code ending with a space. Let us assume that, in the tree representation of $C$, there exists an internal node $v$ whose associated string $w$ is such that $w{\sqcup}$ does not belong to the codeword set of $C$. Since $v$ is an internal node, there is at least one leaf $x$, descendant of $v$, whose associated string is the codeword of some symbol $s_j$. We modify the encoding by assigning the codeword $w{\sqcup}$ to the symbol $s_j$. The new encoding is still prefix-free, and its average length can only decrease, since the length of the codeword newly assigned to $s_j$ cannot be greater than that of the previous one. We can repeat the argument for all internal nodes that do not satisfy the property stated in the lemma to complete the proof. □
Lemma 3.
Let $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ be an arbitrary prefix-free code ending with a space; then the code $D \colon S \to \{0, \ldots, k-1\}^+$ one obtains from $C$ by removing the space ⊔ from each codeword of $C$ is a one-to-one code.
Proof. 
The proof is straightforward. Since $C$ is prefix-free, it holds that, for any pair $s_i, s_j \in S$ with $s_i \ne s_j$, the codeword $C(s_i)$ is not a prefix of $C(s_j)$ and vice versa. Therefore, since $D$ is obtained from $C$ by removing the space, we have four cases:
  • $C(s_i) = D(s_i)$ and $C(s_j) = D(s_j)$: then $D(s_i) \ne D(s_j)$, since $C(s_i) \ne C(s_j)$;
  • $C(s_i) = D(s_i){\sqcup}$ and $C(s_j) = D(s_j){\sqcup}$: then $D(s_i) \ne D(s_j)$, since $C(s_i)$ is not a prefix of $C(s_j)$ and vice versa;
  • $C(s_i) = D(s_i){\sqcup}$ and $C(s_j) = D(s_j)$: then $D(s_i) \ne D(s_j)$, since $C(s_j)$ is not a prefix of $C(s_i)$;
  • $C(s_i) = D(s_i)$ and $C(s_j) = D(s_j){\sqcup}$: then $D(s_i) \ne D(s_j)$, since $C(s_i)$ is not a prefix of $C(s_j)$.
Therefore, for any pair $s_i, s_j \in S$ with $s_i \ne s_j$, we have $D(s_i) \ne D(s_j)$, and $D$ is a one-to-one code. □
We can now derive a lower bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes.
Lemma 4.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, and let $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$; then the average length of an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$L(C) \ge L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1}, \qquad (5)$$
where $L^+$ is the average length of an optimal $k$-ary one-to-one code on $S$.
Proof. 
From Lemma 2, we know that there exists an optimal prefix-free code $C$ ending with a space in which exactly $\lceil n/k \rceil - 1$ codewords contain the space character at the end. Let $A \subseteq \{1, \ldots, n\}$ be the set of indices associated with the symbols whose codeword contains the space. Moreover, from Lemma 3, we know that the code $D$ obtained by removing the space from $C$ is a one-to-one code. Putting it all together, we obtain that
$$L(D) = L(C) - \sum_{i \in A} p_i. \qquad (6)$$
From (6), we have that
$$\begin{aligned} L(C) &= L(D) + \sum_{i \in A} p_i\\ &\ge L^+ + \sum_{i \in A} p_i \quad (\text{since } D \text{ is a one-to-one code})\\ &\ge L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} \quad \left(\text{since } A \text{ contains } \lceil n/k \rceil - 1 \text{ elements}\right). \end{aligned}$$ □
We notice that the difference between the expression (2) and the lower bound (5) is, because of (3), less than
$$\sum_{i=1}^{\lceil n/k \rceil - 1} p_i - \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} < 1; \qquad (7)$$
therefore, the prefix-free codes ending with a space that we construct in Lemma 1 have an average length that differs from the minimum possible by at most one. Moreover, since both the upper bound (3) and the lower bound (5) are easily computable, we can determine the average length of an optimal prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ with a tolerance of at most one. One can also see that the left-hand side of (7) is often much smaller than one.
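Both bounds are computable directly from the sorted probabilities, so the optimum can be bracketed with a few lines of code. The following sketch (ours) evaluates $L^+$ from the lengths (1), and then returns the lower bound (5) and the upper bound (3); the floor of the logarithm is computed in exact integer arithmetic.

```python
def flog(x: int, k: int) -> int:
    """floor(log_k(x)) for an integer x >= 1, computed exactly."""
    e, p = 0, k
    while p <= x:
        e, p = e + 1, p * k
    return e

def bracket_optimum(p: list[float], k: int = 2) -> tuple[float, float]:
    """Bounds (5) <= OPT <= (3) on the minimum average length of a
    prefix-free code ending with a space; p sorted non-increasingly."""
    n = len(p)
    L_plus = sum(pi * flog((k - 1) * i + 1, k) for i, pi in enumerate(p, 1))
    t = -(-n // k) - 1              # ceil(n/k) - 1 codewords carry a space
    return L_plus + sum(p[n - t:]), L_plus + sum(p[:t])

lo, hi = bracket_optimum([0.25, 0.2, 0.15, 0.1, 0.1, 0.08, 0.06, 0.03, 0.02, 0.01])
print(f"{lo:.3f} <= OPT <= {hi:.3f}")   # the gap is guaranteed to be below one
```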
In the following sections, we will focus on providing upper and lower bounds on the average length $L^+$ of optimal $k$-ary one-to-one codes in terms of the $k$-ary Shannon entropy $H_k(\mathbf{p}) = -\sum_{i=1}^n p_i \log_k p_i$ of the source distribution $\mathbf{p}$. Because of Lemmas 1 and 4, this will give us the corresponding upper and lower bounds on the average length of optimal prefix-free codes ending with a space.

3. Lower Bounds on the Average Length

In this section, we provide lower bounds on the average length of the optimal one-to-one code and, subsequently, thanks to Lemma 4, on the average length of the optimal prefix-free code ending with a space. For technical reasons, it will be convenient to consider one-to-one codes that make use of the empty word $\epsilon$, that is, one-to-one mappings $D_\epsilon \colon S \to \{0, 1, \ldots, k-1\}^+ \cup \{\epsilon\}$. It is easy to see (cf. (1)) that the optimal one-to-one code that makes use of the empty word assigns to the $i$-th symbol $s_i \in S$ a codeword of length $l_i$ given by:
$$l_i = \lfloor \log_k((k-1)i) \rfloor, \qquad (8)$$
where $k$ is the cardinality of the code alphabet.
Thus, denoting by $L^+$ the average length of the optimal one-to-one code that does not make use of the empty word and by $L_\epsilon$ the average length of the optimal one-to-one code that does use it, we obtain the following relation:
$$L^+ = L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (9)$$
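The relation is easy to check numerically: the two length assignments (1) and (8) differ by exactly one at the indices of the form $(k^j-1)/(k-1)$, and these are the probabilities collected by the correction sum. A small sketch of ours (reusing flog from the sketch in Section 2):

```python
def check_empty_word_relation(p: list[float], k: int = 2) -> None:
    """Verify that L+ - L_eps equals the sum of p_i over the indices
    i = (k**j - 1)/(k - 1) that do not exceed n."""
    n = len(p)
    L_plus = sum(pi * flog((k - 1) * i + 1, k) for i, pi in enumerate(p, 1))
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    corr, j = 0.0, 1
    while (k ** j - 1) // (k - 1) <= n:
        corr += p[(k ** j - 1) // (k - 1) - 1]
        j += 1
    assert abs(L_plus - (L_eps + corr)) < 1e-12
```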
Our first result is a generalization of the lower bound on the average length of optimal one-to-one codes presented in [5] from the binary case to the general case of $k$-ary alphabets, $k \ge 2$. Our proof technique differs from that of [5], since we are dealing with a set of source symbols of bounded cardinality (in [5], the authors considered the case of a countably infinite set of source symbols).
Lemma 5.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on $S$, with $p_1 \ge \cdots \ge p_n$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon > H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1),$$
where $H_k(\mathbf{p}) = -\sum_{i=1}^n p_i \log_k p_i$.
Proof. 
The proof is an adaptation of Alon and Orlitsky's proof [4] from the binary case to the $k$-ary case, $k \ge 2$.
We recall that the optimal one-to-one code (i.e., the one whose average length achieves the minimum $L_\epsilon$) has codeword lengths $l_i$ given by:
$$l_i = \lfloor \log_k((k-1)i) \rfloor. \qquad (10)$$
For each $j \in \{0, \ldots, \lfloor \log_k n \rfloor\}$, let us define the quantities $q_j$ as
$$q_j = \sum_{i=\frac{k^j-1}{k-1}+1}^{\frac{k^{j+1}-1}{k-1}} p_i.$$
It holds that $\sum_{j=0}^{\lfloor \log_k n \rfloor} q_j = 1$. Let $Y$ be a random variable that takes values in $\{0, \ldots, \lfloor \log_k n \rfloor\}$ according to the probability distribution $\mathbf{q} = (q_0, \ldots, q_{\lfloor \log_k n \rfloor})$, that is,
$$\Pr\{Y = j\} = q_j \quad \text{for each } j \in \{0, \ldots, \lfloor \log_k n \rfloor\}.$$
From (10), we have
$$L_\epsilon = \sum_{i=1}^n \lfloor \log_k((k-1)i) \rfloor\, p_i = \sum_{j=0}^{\lfloor \log_k n \rfloor} \sum_{i=\frac{k^j-1}{k-1}+1}^{\frac{k^{j+1}-1}{k-1}} \lfloor \log_k((k-1)i) \rfloor\, p_i = \sum_{j=0}^{\lfloor \log_k n \rfloor} j\, q_j = E[Y]. \qquad (11)$$
By applying the entropy grouping rule ([3], Ex. 2.27) to the distribution $\mathbf{p}$, we obtain
$$\begin{aligned} H_2(\mathbf{p}) &= H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} q_j\, H_2\!\left(\frac{p_{\frac{k^j-1}{k-1}+1}}{q_j}, \ldots, \frac{p_{\frac{k^{j+1}-1}{k-1}}}{q_j}\right)\\ &\le H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} q_j \log_2 k^j \quad \left(\text{since each group contains at most } k^j \text{ elements}\right)\\ &= H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} j\, q_j \log_2 k = H_2(\mathbf{q}) + E[Y] \log_2 k. \qquad (12) \end{aligned}$$
We now derive an upper bound on $H_2(Y) = H_2(\mathbf{q})$ in terms of the expected value $E[Y]$. To this end, let us consider an auxiliary random variable $Y'$ with the same distribution as $Y$, but with values ranging from 1 to $\lfloor \log_k n \rfloor + 1$ (instead of from 0 to $\lfloor \log_k n \rfloor$), so that $\Pr\{Y' = j\} = q_{j-1}$. It is easy to verify that $\mu = E[Y'] = E[Y] + 1$.
Let α be a positive number, whose value will be chosen later. We obtain that
$$\begin{aligned} H_k(Y') - \alpha\mu &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{1}{q_{i-1}} - \alpha \sum_{j=1}^{\lfloor \log_k n \rfloor + 1} j\, q_{j-1}\\ &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{1}{q_{i-1}} + \sum_{j=1}^{\lfloor \log_k n \rfloor + 1} q_{j-1} \log_k k^{-\alpha j}\\ &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{k^{-\alpha i}}{q_{i-1}}\\ &\le \log_k \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} k^{-\alpha i} \quad (\text{by Jensen's inequality})\\ &= \log_k \left(\frac{1}{k^\alpha - 1}\left(1 - \frac{1}{k^{\alpha(\lfloor \log_k n \rfloor + 1)}}\right)\right)\\ &\le \log_k \frac{1 - (kn)^{-\alpha}}{k^\alpha - 1} \quad (\text{since } k^{\lfloor \log_k n \rfloor + 1} \le kn). \end{aligned}$$
By substituting $\log_k \frac{\mu}{\mu-1}$ for $\alpha$ in the obtained inequality
$$H_k(Y') \le \alpha\mu + \log_k \frac{1 - (kn)^{-\alpha}}{k^\alpha - 1},$$
we obtain
$$H_k(Y') \le \mu \log_k \frac{\mu}{\mu-1} + \log_k(\mu - 1) + \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right). \qquad (13)$$
Since the last term $\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right)$ is negative, and since $\mu = E[Y] + 1 > 1$ yields $\mu \log_k \frac{\mu}{\mu-1} + \log_k(\mu - 1) = E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1)$ (recall that $H_k(Y) = H_k(Y')$), we obtain:
$$H_k(Y) < E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1). \qquad (14)$$
By applying (14) to (12), and since $H_2(Y) = H_k(Y) \log_2 k$, we obtain
$$H_2(\mathbf{p}) < E[Y] \log_2 k + E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right) + \log_2(E[Y] + 1). \qquad (15)$$
From (11), we have that $L_\epsilon = E[Y]$; moreover, from the inequality (28) of Lemma 7 (proven in Section 4), we know that
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1). \qquad (16)$$
Hence, since the function $f(z) = z \log_k\left(1 + \frac{1}{z}\right)$ is increasing in $z$, we can apply (16) to upper bound the term
$$E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right),$$
to obtain the following inequality:
$$H_2(\mathbf{p}) < L_\epsilon \log_2 k + (H_k(\mathbf{p}) + \log_k(k-1)) \log_2\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) + \log_2(H_k(\mathbf{p}) + \log_k(k-1) + 1). \qquad (17)$$
Rewriting (17), we finally obtain
$$L_\epsilon > H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1),$$
and that concludes our proof. □
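A quick numeric sanity check of the bound of Lemma 5 (our own sketch; flog as in the sketch of Section 2):

```python
import math, random

def H(p: list[float], k: int) -> float:
    """k-ary Shannon entropy."""
    return -sum(pi * math.log(pi, k) for pi in p if pi > 0)

def lemma5_bound(p: list[float], k: int) -> float:
    c = H(p, k) + math.log(k - 1, k)      # note: log_k(k-1) = 0 when k = 2
    return H(p, k) - c * math.log(1 + 1 / c, k) - math.log(c + 1, k)

random.seed(0)
for k in (2, 3, 5):
    p = sorted((random.random() for _ in range(200)), reverse=True)
    s = sum(p)
    p = [x / s for x in p]
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    print(f"k={k}: L_eps = {L_eps:.4f} > bound = {lemma5_bound(p, k):.4f}")
```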
By bringing into play the size of the largest mass in addition to the entropy, Lemma 5 can be improved, as shown in the following result.
Lemma 6.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ has the following lower bounds:
1. If $0 < p_1 \le 0.5$,
$$\begin{aligned} L_\epsilon \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right); \qquad (18) \end{aligned}$$
2. if $0.5 < p_1 \le 1$,
$$\begin{aligned} L_\epsilon \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1)) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right), \qquad (19) \end{aligned}$$
where $H_k(p_1) = -p_1 \log_k p_1 - (1-p_1)\log_k(1-p_1)$.
Proof. 
The proof follows the same lines as that of Lemma 5; however, we change two steps of the argument.
First, since
$$\left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}} = \left(\frac{1}{kn}\right)^{\log_k \frac{E[Y]+1}{E[Y]}} = \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{E[Y]}\right)},$$
since $\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right)$ is decreasing in $\mu$, and since $E[Y] = L_\epsilon = 0 \cdot p_1 + 1 \cdot p_2 + \cdots \ge 1 - p_1$, we have
$$\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right) \le \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (20)$$
Hence, by applying (20) to the right-hand side of (13), we obtain
$$H_k(Y) \le E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1) + \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (21)$$
Now, by applying (21) (instead of (14)) to (12), and since $H_2(Y) = H_k(Y) \log_2 k$, we obtain
$$H_2(\mathbf{p}) \le E[Y] \log_2 k + E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right) + \log_2(E[Y] + 1) + \log_2\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (22)$$
Here, instead of applying the upper bound
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1)$$
of Lemma 7 to the right-hand side of (22), we apply the improved version:
$$L_\epsilon \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) & \text{if } 0 < p_1 \le 0.5,\\ H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) & \text{if } 0.5 < p_1 \le 1, \end{cases}$$
proven in Lemma 8 of Section 4. Then, we simply need to rewrite the inequality, concluding the proof. □
Thanks to Lemma 4 and Formula (9), the above lower bounds on $L_\epsilon$ can be applied to derive our main results for prefix-free codes ending with a space, as shown in the following theorems.
Theorem 1.
The average length of an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$\begin{aligned} L(C) >\; & H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (23) \end{aligned}$$
Proof. 
From Lemma 4 and Formula (9), we have
$$L(C) \ge L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (24)$$
By applying the lower bound of Lemma 5 to (24), we obtain (23). □
Analogously, by exploiting (the possible) knowledge of the maximum source symbol probability value, we have the following result.
Theorem 2.
The average length of the optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ has the following lower bounds:
1. If $0 < p_1 \le 0.5$:
$$\begin{aligned} L(C) \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (25) \end{aligned}$$
2. If $0.5 < p_1 \le 1$:
$$\begin{aligned} L(C) \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1)) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (26) \end{aligned}$$
Proof. 
From Lemma 4 and Formula (9), we have
$$L(C) \ge L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (27)$$
By applying the lower bounds (18) and (19) of Lemma 6 to (27), we obtain (25) or (26), according to the value of the maximum source symbol probability. □

4. Upper Bounds on the Average Length

In this section, we will first derive upper bounds on the average length of optimal one-to-one codes. Subsequently, we will provide corresponding upper bounds on the average length of optimal prefix-free codes ending with a space.
We start by extending the result obtained in [13] from the binary case to the $k$-ary case, $k \ge 2$.
Lemma 7.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1). \qquad (28)$$
Proof. 
Under the standing hypothesis that $p_1 \ge \cdots \ge p_n$, it holds that
$$p_i \le \frac{1}{i}, \quad \text{for each } i = 1, \ldots, n. \qquad (29)$$
We recall that the length of the $i$-th codeword of the optimal one-to-one code $D$ is equal to
$$l_i = \lfloor \log_k((k-1)i) \rfloor. \qquad (30)$$
Therefore, from (29), we can upper bound each length $l_i$ as
$$l_i = \lfloor \log_k((k-1)i) \rfloor \le \log_k((k-1)i) \le \log_k(k-1) + \log_k \frac{1}{p_i}. \qquad (31)$$
Hence, by applying (31) to the average length of $D$, we obtain
$$L_\epsilon = \sum_{i=1}^n p_i l_i \le \sum_{i=1}^n p_i \left(\log_k(k-1) + \log_k \frac{1}{p_i}\right) = H_k(\mathbf{p}) + \log_k(k-1). \qquad (32)$$
This concludes our proof. □
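The bound of Lemma 7 is also easy to check numerically against the exact value of $L_\epsilon$ (our sketch; H and flog as in the earlier sketches):

```python
for k in (2, 3, 7):
    p = [1 / (i * i) for i in range(1, 101)]   # already non-increasing
    s = sum(p)
    p = [x / s for x in p]
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    bound = H(p, k) + math.log(k - 1, k)
    print(f"k={k}: L_eps = {L_eps:.4f} <= {bound:.4f}")
```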
By exploiting the knowledge of the maximum probability value of $\mathbf{p}$, we generalize the upper bound in [5] from $k = 2$ to arbitrary $k \ge 2$.
Lemma 8.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) & \text{if } 0 < p_1 \le 0.5,\\ H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) & \text{if } 0.5 < p_1 \le 1. \end{cases} \qquad (33)$$
Proof. 
Let us first prove that the average length of an optimal one-to-one code satisfies the inequality:
$$L_\epsilon \le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (34)$$
Indeed, by recalling that $l_1 = \lfloor \log_k(k-1) \rfloor = 0$, we can write $L_\epsilon$ as follows:
$$\begin{aligned} L_\epsilon &= \sum_{i=2}^n p_i \lfloor \log_k(i(k-1)) \rfloor\\ &= \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor + \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left\lfloor \log_k\left(\frac{k^j-1}{k-1}(k-1)\right) \right\rfloor\\ &= \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor + \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \log_k(k^j-1)\\ &\quad - \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left(\log_k(k^j-1) - \lfloor \log_k(k^j-1) \rfloor\right)\\ &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left(\log_k(k^j-1) - \lfloor \log_k(k^j-1) \rfloor\right), \end{aligned}$$
where the last inequality holds since
$$\sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor \le \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \log_k(i(k-1)).$$
We note that the function $f(j) = \log_k(k^j - 1) - \lfloor \log_k(k^j - 1) \rfloor$ is increasing in $j$. Therefore, it reaches its minimum at $j = 2$, where it takes the value
$$\log_k(k^2 - 1) - \lfloor \log_k(k^2 - 1) \rfloor = 1 + \log_k\left(1 - \frac{1}{k^2}\right) > 0.5,$$
for any $k \ge 2$. Thus, (34) holds, as we claimed.
Let us now show that
$$L_\epsilon \le H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (35)$$
Since the distribution $\mathbf{p}$ is ordered in a non-increasing fashion, from (29) and (34), we have
$$\begin{aligned} L_\epsilon &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &\le \sum_{i=2}^n p_i \log_k \frac{k-1}{p_i} - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \quad \left(\text{since } i \le \frac{1}{p_i}\right)\\ &= H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \end{aligned}$$
Therefore, (35) holds.
To conclude the proof, it remains to prove that
$$L_\epsilon \le H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (36)$$
By observing that, for any $i \ge 2$, it holds that
$$p_i \le \frac{2(1-p_1)}{i}, \qquad (37)$$
we obtain:
$$\begin{aligned} L_\epsilon &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &\le \sum_{i=2}^n p_i \log_k \frac{2(1-p_1)(k-1)}{p_i} - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \quad \left(\text{since, from (37), } i \le \frac{2(1-p_1)}{p_i}\right)\\ &= H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (\log_k 2 + \log_k(1-p_1) + \log_k(k-1))(1-p_1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &= H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \end{aligned}$$
Therefore, (36) holds as well.
From (35) and (36), since
$$\sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \ge 0,$$
we obtain
$$L_\epsilon \le H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)$$
and
$$L_\epsilon \le H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1).$$
Now, it is easy to verify that $-p_1 \log_k \frac{1}{p_1} \le -H_k(p_1) + (1-p_1)\log_k 2$ for $0 < p_1 \le 0.5$, proving the lemma. □
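As with the previous bounds, Lemma 8 can be spot-checked numerically (our sketch; H and flog as above):

```python
def lemma8_bound(p: list[float], k: int) -> float:
    p1 = p[0]
    if p1 <= 0.5:
        return H(p, k) - p1 * math.log(1 / p1, k) + (1 - p1) * math.log(k - 1, k)
    h1 = -p1 * math.log(p1, k) - (1 - p1) * math.log(1 - p1, k)
    return H(p, k) - h1 + (1 - p1) * math.log(2 * (k - 1), k)

for p1 in (0.3, 0.6, 0.9):
    p = [p1] + [(1 - p1) / 9] * 9        # n = 10 symbols, non-increasing
    L_eps = sum(pi * flog(i, 2) for i, pi in enumerate(p, 1))   # k = 2
    print(f"p1={p1}: L_eps = {L_eps:.4f} <= {lemma8_bound(p, 2):.4f}")
```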
Thanks to the result of Lemma 1 and to Formula (9), the upper bounds obtained above can be used to derive our upper bounds on the average length of optimal prefix-free codes ending with a space, as shown in the following theorems.
Theorem 3.
The average length of an optimal prefix-free code ending with a space $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$\begin{aligned} L(C) &\le H_k(\mathbf{p}) + \log_k(k-1) + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \qquad (38)\\ &\le H_k(\mathbf{p}) + \log_k(k-1) + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i, \qquad (39) \end{aligned}$$
where $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$.
Proof. 
From Lemma 1 and Formula (9), we have
$$L(C) \le L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i. \qquad (40)$$
By applying the upper bound (28) on $L_\epsilon$ of Lemma 7 to (40), we obtain (38). □
Theorem 4.
The average length of an optimal prefix-free code ending with a space $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$L(C) \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} & \text{if } 0 < p_1 \le 0.5,\\[4pt] H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} & \text{if } 0.5 < p_1 \le 1. \end{cases} \qquad (41)$$
Proof. 
From Lemma 1 and Formula (9), and by recalling that $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$, we have
$$L(C) \le L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \le L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (42)$$
We apply the upper bound (33) on $L_\epsilon$ of Lemma 8 to (42). That gives us (41). □
Remark 1.
One can estimate how much the average length of optimal prefix-free codes ending with a space differs from the minimum average length of unrestricted optimal prefix-free codes on the alphabet $\{0, 1, \ldots, k-1, \sqcup\}$, that is, optimal prefix-free codes in which the special symbol ⊔ is not constrained to appear only at the end of codewords.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on $S$. Let us denote by $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ an optimal prefix-free code ending with a space for $S$, and by $C' \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1, \sqcup\}^+$ an optimal prefix-free code without the restriction on where the space can occur. Clearly, $L(C) < H_k(\mathbf{p}) + 1$, since the optimal prefix-free code $\{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+$ (which is more constrained, as it uses no space at all) has an average length less than $H_k(\mathbf{p}) + 1$. Therefore,
$$\begin{aligned} L(C) - L(C') &< H_k(\mathbf{p}) + 1 - L(C')\\ &\le H_k(\mathbf{p}) + 1 - H_{k+1}(\mathbf{p}) \quad (\text{since } L(C') \ge H_{k+1}(\mathbf{p}))\\ &= H_k(\mathbf{p})\left(1 - \frac{1}{\log_k(k+1)}\right) + 1. \end{aligned}$$
Since $\lim_{k \to \infty} \log_k(k+1) = 1$, we have that, as the cardinality of the code alphabet increases, the constraint that the space can appear only at the end of codewords becomes less and less influential.
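For a concrete feel of this estimate, the following snippet of ours tabulates the coefficient $1 - \frac{1}{\log_k(k+1)}$ that multiplies $H_k(\mathbf{p})$ in the gap bound above:

```python
import math

for k in (2, 4, 16, 256):
    coeff = 1 - 1 / math.log(k + 1, k)
    print(f"k = {k:3d}:  L(C) - L(C') < {coeff:.4f} * H_k(p) + 1")
# k =   2:  L(C) - L(C') < 0.3691 * H_k(p) + 1
# k =   4:  L(C) - L(C') < 0.1387 * H_k(p) + 1
```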

5. Concluding Remarks

In this paper, we have introduced the class of prefix-free codes in which a specific symbol (referred to as a ‘space’) can appear only at the end of codewords. We have proposed a linear-time algorithm to construct ‘almost’-optimal codes with this property, and we have shown that their average length is at most one unit longer than the minimum average length of any prefix-free code in which the space can appear only at the end of codewords. We have proven this result by highlighting a connection between our class of codes and the well-known class of one-to-one codes. We have also provided upper and lower bounds on the average length of optimal prefix-free codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
We leave open the problem of providing an efficient algorithm to construct optimal prefix-free codes ending with a space. It seems that there is no easy way to modify the classical Huffman greedy algorithm to solve our problem. It is possible that the more powerful dynamic programming approach could be useful to provide an optimal solution to the problem, as done in [14] for optimal binary codes ending with ones. This will be the subject of future investigations.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the referees and the Guest Editor I. Sason for their corrections and useful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Daniels, P.T.; Bright, W. (Eds.) The World’s Writing Systems; Oxford University Press: Oxford, UK, 1996.
  2. Wikipedia: Writing Systems without Word Boundaries. Available online: https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries (accessed on 25 March 2024).
  3. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  4. Alon, N.; Orlitsky, A. A lower bound on the expected length of one-to-one codes. IEEE Trans. Inf. Theory 1994, 40, 1670–1672.
  5. Blundo, C.; De Prisco, R. New bounds on the expected length of one-to-one codes. IEEE Trans. Inf. Theory 1996, 42, 246–250.
  6. Jaynes, E. Note on unique decipherability. IRE Trans. Inf. Theory 1959, 5, 98–102.
  7. Shannon, C.E. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
  8. Courtade, T.; Verdú, S. Cumulant generating function of codeword lengths in optimal lossless compression. In Proceedings of the IEEE International Symposium on Information Theory, 2014; pp. 2494–2498.
  9. Kontoyiannis, I.; Verdú, S. Optimal lossless data compression: Non-asymptotics and asymptotics. IEEE Trans. Inf. Theory 2014, 60, 777–795.
  10. Kosut, O.; Sankar, L. Asymptotics and non-asymptotics for universal Fixed-to-Variable source coding. IEEE Trans. Inf. Theory 2017, 63, 3757–3772.
  11. Szpankowski, W.; Verdú, S. Minimum expected length of Fixed-to-Variable lossless compression without prefix constraints. IEEE Trans. Inf. Theory 2011, 57, 4017–4025.
  12. Knuth, D. The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd ed.; Addison-Wesley: Boston, MA, USA, 1998.
  13. Wyner, A. An upper bound on the entropy series. Inf. Control 1972, 20, 176–181.
  14. Chan, S.-L.; Golin, M.J. A dynamic programming algorithm for constructing optimal 1-ended binary prefix-free codes. IEEE Trans. Inf. Theory 2000, 46, 1637–1644.