Article

Entropic Bounds on the Average Length of Codes with a Space

Department of Computer Science, University of Salerno, 84084 Fisciano, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 13 November 2023 / Revised: 15 March 2024 / Accepted: 23 March 2024 / Published: 26 March 2024
(This article belongs to the Special Issue Extremal and Additive Combinatorial Aspects in Information Theory)

Abstract

We consider the problem of constructing prefix-free codes in which a designated symbol, a space, can only appear at the end of codewords. We provide a linear-time algorithm to construct almost-optimal codes with this property, meaning that their average length differs from the minimum possible by at most one. We obtain our results by uncovering a relation between our class of codes and the class of one-to-one codes. Additionally, we derive upper and lower bounds to the average length of optimal prefix-free codes with a space in terms of the source entropy.

1. Introduction

Modern natural languages achieve the unique parsability of written texts by inserting a special character (i.e., a space) between words [1] (See [2] for a few exceptions to this rule). Classical Information Theory, instead, studies codes that achieve the unique parsability of texts by imposing diverse combinatorial properties on the codeword set: e.g., the prefix property, unique decipherability, etc. [3]. With respect to the efficiency of such codes (usually measured via the average number of code symbols per source symbol), it is well known that the Shannon entropy of the information source constitutes a fundamental lower bound for it. On the other hand, if one drops the property of the unique parsability of code messages into individual codewords, and simply requires that different source symbols be encoded with different codewords, one can obtain codes (known as one-to-one codes) with efficiency below the source Shannon entropy (although not too much below; see, e.g., [4,5]).
Jaynes [6] took the approach of directly studying source codes in which a designated character of the code alphabet is exclusively used as a word delimiter. More precisely, Jaynes studied the possible decrease of the noiseless channel capacity (see [7], p. 8) associated with any code that uses a designated symbol as an end-of-codeword mark, as compared with the noiseless channel capacity of an unconstrained code. Quite interestingly, Jaynes proved that the decrease of the noiseless channel capacity of codes with an end-of-codeword mark becomes negligible, as the maximum codeword length increases.
In this paper, we study the problem of constructing prefix-free codes in which a designated symbol (referred to as a ‘space’) can appear only at the end of codewords. We call codes of this kind prefix codes ending with a space. We develop a linear-time algorithm that constructs ‘almost’-optimal codes with this property, in the sense that the average length of the constructed codes is at most one unit longer than the shortest possible average length of any prefix-free code in which the space can appear only at the end of codewords. We prove this result by highlighting a connection between our class of codes and the well-known class of one-to-one codes. We also provide upper and lower bounds on the average length of optimal prefix codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
The paper is structured as follows. In Section 2, we illustrate the relationships between prefix codes ending with a space and one-to-one codes. Specifically, we prove that from one-to-one codes one can easily construct prefix codes ending with a space, and we give an upper bound on the average length of the constructed codes. Next, we show that, by removing all the spaces from the codewords of a prefix code ending with a space, one obtains a one-to-one code. This result allows us to prove that the average length of our prefix codes ending with a space differs from the minimum possible by at most one. In Section 3 and Section 4, we derive upper and lower bounds on the average length of optimal prefix codes ending with a space in terms of the source entropy and the cardinality of the code alphabet.

2. Relations between One-to-One Codes and Prefix Codes Ending with a Space

Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on the set $S$ (that is, $p_i$ is the probability of the source symbol $s_i$), and $\{0, \ldots, k-1\}$, $k \ge 2$, be the code alphabet. We denote by $\{0, \ldots, k-1\}^+$ the set of all non-empty sequences over the code alphabet $\{0, \ldots, k-1\}$, and by $\{0, \ldots, k-1\}^+ \sqcup$ the set of all sequences obtained by appending the special space symbol ⊔ to a non-empty $k$-ary sequence.
A prefix-free code ending with a space is a one-to-one mapping
$$C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$$
in which no codeword $C(s)$ is a prefix of another codeword $C(s')$, for any $s, s' \in S$, $s \ne s'$.
A $k$-ary one-to-one code (see [4,5,8,9,10,11] and the references quoted therein) is an injective mapping $D \colon S \to \{0, \ldots, k-1\}^+$ from $S$ into the set of all non-empty sequences over the alphabet $\{0, \ldots, k-1\}$, $k \ge 2$.
The average length of an arbitrary code for the set of source symbols $S = \{s_1, \ldots, s_n\}$, with probabilities $\mathbf{p} = (p_1, \ldots, p_n)$, is $\sum_{i=1}^n p_i l_i$, where $l_i$ is the number of alphabet symbols in the codeword associated with the source symbol $s_i$.
Without loss of generality, we assume that the probability distribution $\mathbf{p} = (p_1, \ldots, p_n)$ is ordered, that is, $p_1 \ge \cdots \ge p_n$. Under this assumption, it is apparent that the best one-to-one code proceeds by assigning the shortest codeword (e.g., in the binary case, the codeword 0) to the highest probability source symbol $s_1$, the next shortest codeword 1 to the source symbol $s_2$, the codeword 00 to $s_3$, the codeword 01 to $s_4$, and so on.
An equivalent approach for constructing an optimal one-to-one code, which we will use later, proceeds as follows. Let us consider the first $n$ non-empty $k$-ary strings according to the radix order [12] (that is, the $k$-ary strings are ordered by length and, for equal lengths, according to the lexicographic order). We assign the strings to the symbols $s_1, \ldots, s_n$ in $S$ by increasing string length and, for equal lengths, by inverse lexicographic order. For example, in the binary case, we assign the codeword 1 to the highest probability source symbol $s_1$, the codeword 0 to the source symbol $s_2$, the codeword 11 to $s_3$, the codeword 10 to $s_4$, and so on. Therefore, one can see that, in the general case of a $k$-ary code alphabet, $k \ge 2$, an optimal one-to-one code of minimal average length assigns a codeword of length $l_i$ to the $i$-th symbol $s_i \in S$, where $l_i$ is given by:
$$l_i = \lfloor \log_k((k-1)i + 1) \rfloor. \qquad (1)$$
Moreover, the codewords of an optimal $k$-ary one-to-one code can be represented as the nodes of a $k$-ary tree of maximum depth $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$, where, for each node $v$, the $k$-ary string (codeword) associated with $v$ is obtained by concatenating all the labels on the path from the root of the tree to $v$.
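To make the two procedures concrete, the following Python sketch (our own illustration, not part of the original paper) builds the codewords of an optimal $k$-ary one-to-one code according to the second procedure: it takes the first $n$ non-empty $k$-ary strings in radix order and assigns them, within each length, in inverse lexicographic order.

```python
def one_to_one_code(n: int, k: int = 2) -> list[str]:
    """Optimal k-ary one-to-one code for symbols s_1, ..., s_n (ordered by
    non-increasing probability): take the first n non-empty k-ary strings
    in radix order and, within each length, assign them to the symbols in
    inverse lexicographic order."""
    code, length, remaining = [], 1, n
    while remaining > 0:
        used = min(remaining, k ** length)    # strings of this length among the first n
        for rank in range(used - 1, -1, -1):  # inverse lexicographic order
            digits, x = [], rank
            for _ in range(length):
                x, d = divmod(x, k)
                digits.append(str(d))
            code.append("".join(reversed(digits)))
        remaining -= used
        length += 1
    return code

print(one_to_one_code(10))
# ['1', '0', '11', '10', '01', '00', '011', '010', '001', '000']
```

The output for $n = 10$ and $k = 2$ is exactly the one-to-one code used in the example below.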
It is evident that, if we apply the above encoding to a sequence of source symbols, the obtained binary sequence is not uniquely parsable in terms of individual codewords. Let us see how one can recover unique parsability by appending a space ⊔ to judiciously chosen codewords of an optimal one-to-one code. To gain insight, let us consider the following example. Let $S = \{s_1, s_2, \ldots, s_{10}\}$ be the set of source symbols, and let us assume that the code alphabet is $\{0, 1\}$. Under the standing hypothesis that $p_1 \ge \cdots \ge p_{10}$, the best prefix-free code $C$ one can obtain by the procedure of appending a space ⊔ to codewords of the optimal one-to-one code for $S$ is the following:
$$C(s_1) = 1{\sqcup} \quad C(s_2) = 0{\sqcup} \quad C(s_3) = 11 \quad C(s_4) = 10 \quad C(s_5) = 01{\sqcup}$$
$$C(s_6) = 00{\sqcup} \quad C(s_7) = 011 \quad C(s_8) = 010 \quad C(s_9) = 001 \quad C(s_{10}) = 000.$$
Observe that we started from the codewords of the optimal one-to-one code constructed according to the second procedure previously described. Moreover, note that the codewords associated with the symbols $s_1$, $s_2$, $s_5$, and $s_6$ necessitate the space character ⊔ at their end; otherwise, the unique parsability of some encoded sequences of source symbols would not be guaranteed. On the other hand, the codewords associated with the symbols $s_3$, $s_4$, $s_7$, $s_8$, $s_9$, and $s_{10}$ do not necessitate the space character ⊔. Indeed, the codeword set
$$\{1{\sqcup},\, 0{\sqcup},\, 11,\, 10,\, 01{\sqcup},\, 00{\sqcup},\, 011,\, 010,\, 001,\, 000\}$$
satisfies the prefix-free condition (i.e., no codeword is a prefix of any other); therefore, it guarantees the unique parsability of any coded message in terms of individual codewords.
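The space-appending step is also easy to express in code. The sketch below (ours; for clarity it uses a quadratic-time prefix test instead of the $O(n)$-time tree argument used in Lemma 1) appends ⊔ exactly to those codewords that are proper prefixes of other codewords, and reproduces the code $C$ of the example above.

```python
def append_spaces(code: list[str], space: str = "⊔") -> list[str]:
    """Append the space mark to every codeword that is a proper prefix of
    another codeword; the resulting codeword set is prefix-free."""
    return [w + space if any(v != w and v.startswith(w) for v in code) else w
            for w in code]

print(append_spaces(one_to_one_code(10)))   # one_to_one_code: see the sketch above
# ['1⊔', '0⊔', '11', '10', '01⊔', '00⊔', '011', '010', '001', '000']
```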
The idea of the above example can be generalized, as shown in the following lemma.
Lemma 1.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$. Let $\{0, \ldots, k-1\}$, $k \ge 2$, be the code alphabet. We can construct a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$, in $O(n)$ time, such that its average length $L(C)$ satisfies
$$L(C) = L^+ + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \qquad (2)$$
$$\le L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i, \qquad (3)$$
where $L^+$ is the average length of an optimal one-to-one code $D \colon S \to \{0, \ldots, k-1\}^+$ and $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$.
Proof. 
Under the standing hypothesis that the probabilities of the source symbols are ordered from the largest to the smallest, we show how to construct a prefix-free code—by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$—having average length equal to (2) and, therefore, upper bounded by (3).
Among the class of all the prefix-free codes that one can obtain by appending the character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$, we aim to construct the one with the minimum average length. Therefore, we have to ensure that, in the $k$-ary tree representation of the code, the following basic condition holds: for any pair of nodes $v_i$ and $v_j$, $i < j$, associated with the symbols $s_i$ and $s_j$, the depth of the node $v_j$ is not smaller than the depth of the node $v_i$. In fact, if it were otherwise, the average length of the code could be improved.
Therefore, by recalling that $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$ is the height of the $k$-ary tree associated with an optimal one-to-one code, we have that the prefix-free code of minimum average length that one can obtain by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for $S$ assigns a codeword of length $l_i$ to the $i$-th symbol $s_i \in S$, where $l_i$ is given by:
$$l_i = \begin{cases} \lfloor \log_k((k-1)i+1) \rfloor + 1, & \text{if } i \le \frac{k^{h-1}-1}{k-1} - 1,\\ \lfloor \log_k((k-1)i+1) \rfloor + 1, & \text{if } \frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil \le i \le \frac{k^h-1}{k-1} - 1,\\ \lfloor \log_k((k-1)i+1) \rfloor, & \text{otherwise}. \end{cases} \qquad (4)$$
We stress that the obtained prefix-free code is not necessarily a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ of minimum average length. We now justify the expression (4). First, since the probabilities $p_1, \ldots, p_n$ are ordered in non-increasing fashion, the codeword lengths $l_i$ of the code are ordered in non-decreasing fashion, that is, $l_1 \le \cdots \le l_n$. Therefore, in the $k$-ary tree representation of the code, the desired basic condition holds: for any pair of nodes $v_i$ and $v_j$, $i < j$, associated with the symbols $s_i$ and $s_j$, the depth of the node $v_i$ is smaller than or equal to the depth of the node $v_j$.
Furthermore, we need to append the space character only to the $k$-ary strings that are prefixes of some others. Therefore, let us consider the first $n$ non-empty $k$-ary strings according to the radix order [12], in which, we recall, the $k$-ary strings are ordered by length and, for equal lengths, according to the lexicographic order. We have that the number of strings that are a prefix of some others is exactly $\lceil n/k \rceil - 1$. One obtains this number by seeing the strings as corresponding to nodes in a $k$-ary tree with labels $0, \ldots, k-1$ on the edges. The number of strings that are a prefix of some others (among the $n$ strings) is exactly equal to the number of internal nodes (except the root) in such a tree. This number of internal nodes is equal to $\lceil \frac{N-1}{k} \rceil - 1$, where $N$ is the total number of nodes, which in our case is equal to $N = n + 1$ (i.e., $N$ also counts the root of the tree).
Moreover, starting from the optimal one-to-one code constructed according to our second method, that is, by assigning $k$-ary strings to the symbols by increasing length and, for equal lengths, by inverse lexicographic order, one can verify that the $\lceil n/k \rceil - 1$ internal nodes are associated with the codewords of the symbols $s_i$, for $i$ that goes from 1 to $\frac{k^{h-1}-1}{k-1} - 1$, and from $\frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil$ to $\frac{k^h-1}{k-1} - 1$.
In fact, since the height of the $k$-ary tree is $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$ and since all the levels of the tree, except the last two, are full, we need to append the space to all symbols $s_i$ with $i$ from 1 to $\frac{k^{h-1}-1}{k-1} - 1$, while on the second-to-last level we have to append the space only to the remaining internal nodes, associated with the symbols $s_i$ with $i$ from $\frac{k^h+k^{h-1}-2}{k-1} - \lceil n/k \rceil$ to $\frac{k^h-1}{k-1} - 1$. Those remaining nodes are exactly, among all the nodes in the second-to-last level, the ones associated with the symbols that have the smallest probabilities. Thus, we obtain (4).
Summarizing, we can construct a prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$, in $O(n)$ time, with lengths defined as in (4), starting from an optimal one-to-one code. Thus:
$$\begin{aligned} L(C) &= \sum_{i=1}^n p_i l_i = \sum_{i=1}^n p_i \lfloor \log_k((k-1)i+1) \rfloor + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i\\ &= L^+ + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i\\ &\le L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i \quad (\text{since we are adding } \lceil n/k \rceil - 1 \text{ of the } p_i\text{'s, and the } p_i\text{'s are ordered}). \end{aligned}$$ □
Note that, from Lemma 1, we obtain that the average length of any optimal (i.e., of minimum average length) prefix-free code ending with a space is upper bounded by Formula (2). Furthermore, we have an upper bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes.
We can also derive a lower bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes. For this purpose, we need two intermediate results. We first recall that, given a $k$-ary code $C$, its codewords can be represented as nodes in a $k$-ary tree with labels $0, \ldots, k-1$ on the edges. Indeed, for each node $v$, the $k$-ary string (codeword) associated with $v$ can be obtained by concatenating all the labels on the path from the root of the tree to $v$. We also recall that, in prefix-free codes, the codewords correspond to the leaves of the associated tree, while in one-to-one codes, the codewords may also correspond to internal nodes of the associated tree.
Lemma 2.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, and let $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$. There exists an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ such that the following property holds: for any internal node $v$ (except the root) of the tree representation of $C$, if we denote by $w$ the $k$-ary string associated with the node $v$, then the string $w{\sqcup}$ belongs to the codeword set of $C$.
Proof. 
Let $C$ be an arbitrary optimal prefix-free code ending with a space. Let us assume that, in the tree representation of $C$, there exists an internal node $v$ whose associated string $w$ is such that $w{\sqcup}$ does not belong to the codeword set of $C$. Since $v$ is an internal node, there is at least one leaf $x$, descendant of $v$, whose associated string is the codeword of some symbol $s_j$. We modify the encoding by assigning the codeword $w{\sqcup}$ to the symbol $s_j$. The new encoding is still prefix-free, and its average length can only decrease, since the length of the codeword newly assigned to $s_j$ cannot be greater than that of the previous one. We can repeat the argument for all internal nodes that do not satisfy the property stated in the lemma to complete the proof. □
Lemma 3.
Let $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ be an arbitrary prefix-free code ending with a space; then the code $D \colon S \to \{0, \ldots, k-1\}^+$ one obtains from $C$ by removing the space ⊔ from each codeword of $C$ is a one-to-one code.
Proof. 
The proof is straightforward. Since $C$ is prefix-free, it holds that, for any pair $s_i, s_j \in S$ with $s_i \ne s_j$, the codeword $C(s_i)$ is not a prefix of $C(s_j)$ and vice versa. Therefore, since $D$ is obtained from $C$ by removing the space, we have four cases:
  • $C(s_i) = D(s_i)$ and $C(s_j) = D(s_j)$: then $D(s_i) \ne D(s_j)$, since $C(s_i) \ne C(s_j)$;
  • $C(s_i) = D(s_i){\sqcup}$ and $C(s_j) = D(s_j){\sqcup}$: then $D(s_i) \ne D(s_j)$, since $C(s_i)$ is not a prefix of $C(s_j)$ and vice versa;
  • $C(s_i) = D(s_i){\sqcup}$ and $C(s_j) = D(s_j)$: then $D(s_i) \ne D(s_j)$, since $C(s_j)$ is not a prefix of $C(s_i)$;
  • $C(s_i) = D(s_i)$ and $C(s_j) = D(s_j){\sqcup}$: then $D(s_i) \ne D(s_j)$, since $C(s_i)$ is not a prefix of $C(s_j)$.
Therefore, for any pair $s_i, s_j \in S$ with $s_i \ne s_j$, we have $D(s_i) \ne D(s_j)$, and $D$ is a one-to-one code. □
We can now derive a lower bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes.
Lemma 4.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols, and let $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n > 0$, be a probability distribution on $S$; then the average length of an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$L(C) \ge L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1}, \qquad (5)$$
where $L^+$ is the average length of an optimal $k$-ary one-to-one code on $S$.
Proof. 
From Lemma 2, we know that there exists an optimal prefix-free code $C$ ending with a space in which exactly $\lceil n/k \rceil - 1$ codewords contain the space character at the end. Let $A \subseteq \{1, \ldots, n\}$ be the set of indices associated with the symbols whose codeword contains the space. Moreover, from Lemma 3, we know that the code $D$ obtained by removing the space from $C$ is a one-to-one code. Putting it all together, we obtain that
$$L(D) = L(C) - \sum_{i \in A} p_i. \qquad (6)$$
From (6), we have that
$$\begin{aligned} L(C) &= L(D) + \sum_{i \in A} p_i\\ &\ge L^+ + \sum_{i \in A} p_i \quad (\text{since } D \text{ is a one-to-one code})\\ &\ge L^+ + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} \quad \left(\text{since } A \text{ contains } \lceil n/k \rceil - 1 \text{ elements}\right). \end{aligned}$$ □
We notice that the difference between the expression (2) and the lower bound (5) is, because of (3), less than
$$\sum_{i=1}^{\lceil n/k \rceil - 1} p_i - \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} < 1; \qquad (7)$$
therefore, the prefix-free codes ending with a space that we construct in Lemma 1 have an average length that differs from the minimum possible by at most one. Moreover, since both the upper bound (3) and the lower bound (5) are easily computable, we can determine the average length of an optimal prefix-free code $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ with a tolerance of at most one. One can also see that the left-hand side of (7) is often much smaller than one.
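Both bounds are computable directly from the sorted probabilities, so the optimum can be bracketed with a few lines of code. The following sketch (ours) evaluates $L^+$ from the lengths (1), and then returns the lower bound (5) and the upper bound (3); the floor of the logarithm is computed in exact integer arithmetic.

```python
def flog(x: int, k: int) -> int:
    """floor(log_k(x)) for an integer x >= 1, computed exactly."""
    e, p = 0, k
    while p <= x:
        e, p = e + 1, p * k
    return e

def bracket_optimum(p: list[float], k: int = 2) -> tuple[float, float]:
    """Bounds (5) <= OPT <= (3) on the minimum average length of a
    prefix-free code ending with a space; p sorted non-increasingly."""
    n = len(p)
    L_plus = sum(pi * flog((k - 1) * i + 1, k) for i, pi in enumerate(p, 1))
    t = -(-n // k) - 1              # ceil(n/k) - 1 codewords carry a space
    return L_plus + sum(p[n - t:]), L_plus + sum(p[:t])

lo, hi = bracket_optimum([0.25, 0.2, 0.15, 0.1, 0.1, 0.08, 0.06, 0.03, 0.02, 0.01])
print(f"{lo:.3f} <= OPT <= {hi:.3f}")   # the gap is guaranteed to be below one
```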
In the following sections, we will focus on providing upper and lower bounds on the average length $L^+$ of optimal $k$-ary one-to-one codes in terms of the $k$-ary Shannon entropy $H_k(\mathbf{p}) = -\sum_{i=1}^n p_i \log_k p_i$ of the source distribution $\mathbf{p}$. Because of Lemmas 1 and 4, this will give us the corresponding upper and lower bounds on the average length of optimal prefix-free codes ending with a space.

3. Lower Bounds on the Average Length

In this section, we provide lower bounds on the average length of the optimal one-to-one code and, subsequently, thanks to Lemma 4, on the average length of the optimal prefix-free code ending with a space. For technical reasons, it will be convenient to consider one-to-one codes that make use of the empty word $\epsilon$, that is, one-to-one mappings $D_\epsilon \colon S \to \{0, 1, \ldots, k-1\}^+ \cup \{\epsilon\}$. It is easy to see (cf. (1)) that the optimal one-to-one code that makes use of the empty word assigns to the $i$-th symbol $s_i \in S$ a codeword of length $l_i$ given by:
$$l_i = \lfloor \log_k((k-1)i) \rfloor, \qquad (8)$$
where $k$ is the cardinality of the code alphabet.
Thus, denoting by $L^+$ the average length of the optimal one-to-one code that does not make use of the empty word and by $L_\epsilon$ the average length of the optimal one-to-one code that does use it, we obtain the following relation:
$$L^+ = L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (9)$$
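The relation is easy to check numerically: the two length assignments (1) and (8) differ by exactly one at the indices of the form $(k^j-1)/(k-1)$, and these are the probabilities collected by the correction sum. A small sketch of ours (reusing flog from the sketch in Section 2):

```python
def check_empty_word_relation(p: list[float], k: int = 2) -> None:
    """Verify that L+ - L_eps equals the sum of p_i over the indices
    i = (k**j - 1)/(k - 1) that do not exceed n."""
    n = len(p)
    L_plus = sum(pi * flog((k - 1) * i + 1, k) for i, pi in enumerate(p, 1))
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    corr, j = 0.0, 1
    while (k ** j - 1) // (k - 1) <= n:
        corr += p[(k ** j - 1) // (k - 1) - 1]
        j += 1
    assert abs(L_plus - (L_eps + corr)) < 1e-12
```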
Our first result is a generalization of the lower bound on the average length of optimal one-to-one codes presented in [5] from the binary case to the general case of $k$-ary alphabets, $k \ge 2$. Our proof technique differs from that of [5], since we are dealing with a set of source symbols of bounded cardinality (in [5], the authors considered the case of a countably infinite set of source symbols).
Lemma 5.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on $S$, with $p_1 \ge \cdots \ge p_n$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon > H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1),$$
where $H_k(\mathbf{p}) = -\sum_{i=1}^n p_i \log_k p_i$.
Proof. 
The proof is an adaptation of Alon and Orlitsky's proof [4] from the binary case to the $k$-ary case, $k \ge 2$.
We recall that the optimal one-to-one code (i.e., the one whose average length achieves the minimum $L_\epsilon$) has codeword lengths $l_i$ given by:
$$l_i = \lfloor \log_k((k-1)i) \rfloor. \qquad (10)$$
For each $j \in \{0, \ldots, \lfloor \log_k n \rfloor\}$, let us define the quantities $q_j$ as
$$q_j = \sum_{i=\frac{k^j-1}{k-1}+1}^{\frac{k^{j+1}-1}{k-1}} p_i.$$
It holds that $\sum_{j=0}^{\lfloor \log_k n \rfloor} q_j = 1$. Let $Y$ be a random variable that takes values in $\{0, \ldots, \lfloor \log_k n \rfloor\}$ according to the probability distribution $\mathbf{q} = (q_0, \ldots, q_{\lfloor \log_k n \rfloor})$, that is,
$$\Pr\{Y = j\} = q_j \quad \text{for each } j \in \{0, \ldots, \lfloor \log_k n \rfloor\}.$$
From (10), we have
$$L_\epsilon = \sum_{i=1}^n \lfloor \log_k((k-1)i) \rfloor\, p_i = \sum_{j=0}^{\lfloor \log_k n \rfloor} \sum_{i=\frac{k^j-1}{k-1}+1}^{\frac{k^{j+1}-1}{k-1}} \lfloor \log_k((k-1)i) \rfloor\, p_i = \sum_{j=0}^{\lfloor \log_k n \rfloor} j\, q_j = E[Y]. \qquad (11)$$
By applying the entropy grouping rule ([3], Ex. 2.27) to the distribution $\mathbf{p}$, we obtain
$$\begin{aligned} H_2(\mathbf{p}) &= H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} q_j\, H_2\!\left(\frac{p_{\frac{k^j-1}{k-1}+1}}{q_j}, \ldots, \frac{p_{\frac{k^{j+1}-1}{k-1}}}{q_j}\right)\\ &\le H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} q_j \log_2 k^j \quad \left(\text{since each group contains at most } k^j \text{ elements}\right)\\ &= H_2(\mathbf{q}) + \sum_{j=0}^{\lfloor \log_k n \rfloor} j\, q_j \log_2 k = H_2(\mathbf{q}) + E[Y] \log_2 k. \qquad (12) \end{aligned}$$
We now derive an upper bound on $H_2(Y) = H_2(\mathbf{q})$ in terms of the expected value $E[Y]$. To this end, let us consider an auxiliary random variable $Y'$ with the same distribution as $Y$, but with values ranging from 1 to $\lfloor \log_k n \rfloor + 1$ (instead of from 0 to $\lfloor \log_k n \rfloor$), so that $\Pr\{Y' = j\} = q_{j-1}$. It is easy to verify that $\mu = E[Y'] = E[Y] + 1$.
Let α be a positive number, whose value will be chosen later. We obtain that
$$\begin{aligned} H_k(Y') - \alpha\mu &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{1}{q_{i-1}} - \alpha \sum_{j=1}^{\lfloor \log_k n \rfloor + 1} j\, q_{j-1}\\ &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{1}{q_{i-1}} + \sum_{j=1}^{\lfloor \log_k n \rfloor + 1} q_{j-1} \log_k k^{-\alpha j}\\ &= \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} q_{i-1} \log_k \frac{k^{-\alpha i}}{q_{i-1}}\\ &\le \log_k \sum_{i=1}^{\lfloor \log_k n \rfloor + 1} k^{-\alpha i} \quad (\text{by Jensen's inequality})\\ &= \log_k \left(\frac{1}{k^\alpha - 1}\left(1 - \frac{1}{k^{\alpha(\lfloor \log_k n \rfloor + 1)}}\right)\right)\\ &\le \log_k \frac{1 - (kn)^{-\alpha}}{k^\alpha - 1} \quad (\text{since } k^{\lfloor \log_k n \rfloor + 1} \le kn). \end{aligned}$$
By substituting $\log_k \frac{\mu}{\mu-1}$ for $\alpha$ in the obtained inequality
$$H_k(Y') \le \alpha\mu + \log_k \frac{1 - (kn)^{-\alpha}}{k^\alpha - 1},$$
we obtain
$$H_k(Y') \le \mu \log_k \frac{\mu}{\mu-1} + \log_k(\mu - 1) + \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right). \qquad (13)$$
Since the last term $\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right)$ is negative, and since $\mu = E[Y] + 1 > 1$ yields $\mu \log_k \frac{\mu}{\mu-1} + \log_k(\mu - 1) = E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1)$ (recall that $H_k(Y) = H_k(Y')$), we obtain:
$$H_k(Y) < E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1). \qquad (14)$$
By applying (14) to (12), and since $H_2(Y) = H_k(Y) \log_2 k$, we obtain
$$H_2(\mathbf{p}) < E[Y] \log_2 k + E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right) + \log_2(E[Y] + 1). \qquad (15)$$
From (11), we have that $L_\epsilon = E[Y]$; moreover, from the inequality (28) of Lemma 7 (proven in Section 4), we know that
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1). \qquad (16)$$
Hence, since the function $f(z) = z \log_k\left(1 + \frac{1}{z}\right)$ is increasing in $z$, we can apply (16) to upper bound the term
$$E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right),$$
to obtain the following inequality:
$$H_2(\mathbf{p}) < L_\epsilon \log_2 k + (H_k(\mathbf{p}) + \log_k(k-1)) \log_2\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) + \log_2(H_k(\mathbf{p}) + \log_k(k-1) + 1). \qquad (17)$$
Rewriting (17), we finally obtain
$$L_\epsilon > H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1),$$
and that concludes our proof. □
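A quick numeric sanity check of the bound of Lemma 5 (our own sketch; flog as in the sketch of Section 2):

```python
import math, random

def H(p: list[float], k: int) -> float:
    """k-ary Shannon entropy."""
    return -sum(pi * math.log(pi, k) for pi in p if pi > 0)

def lemma5_bound(p: list[float], k: int) -> float:
    c = H(p, k) + math.log(k - 1, k)      # note: log_k(k-1) = 0 when k = 2
    return H(p, k) - c * math.log(1 + 1 / c, k) - math.log(c + 1, k)

random.seed(0)
for k in (2, 3, 5):
    p = sorted((random.random() for _ in range(200)), reverse=True)
    s = sum(p)
    p = [x / s for x in p]
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    print(f"k={k}: L_eps = {L_eps:.4f} > bound = {lemma5_bound(p, k):.4f}")
```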
By bringing into play the size of the largest mass in addition to the entropy, Lemma 5 can be improved, as shown in the following result.
Lemma 6.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ has the following lower bounds:
1. If $0 < p_1 \le 0.5$,
$$\begin{aligned} L_\epsilon \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right); \qquad (18) \end{aligned}$$
2. if $0.5 < p_1 \le 1$,
$$\begin{aligned} L_\epsilon \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1)) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right), \qquad (19) \end{aligned}$$
where $H_k(p_1) = -p_1 \log_k p_1 - (1-p_1)\log_k(1-p_1)$.
Proof. 
The proof follows the same lines as that of Lemma 5; however, we change two steps of the argument.
First, since
$$\left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}} = \left(\frac{1}{kn}\right)^{\log_k \frac{E[Y]+1}{E[Y]}} = \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{E[Y]}\right)},$$
since $\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right)$ is decreasing in $\mu$, and since $E[Y] = L_\epsilon = 0 \cdot p_1 + 1 \cdot p_2 + \cdots \ge 1 - p_1$, we have
$$\log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k \frac{\mu}{\mu-1}}\right) \le \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (20)$$
Hence, by applying (20) to the right-hand side of (13), we obtain
$$H_k(Y) \le E[Y] \log_k\left(1 + \frac{1}{E[Y]}\right) + \log_k(E[Y] + 1) + \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (21)$$
Now, by applying (21) (instead of (14)) to (12), and since $H_2(Y) = H_k(Y) \log_2 k$, we obtain
$$H_2(\mathbf{p}) \le E[Y] \log_2 k + E[Y] \log_2\left(1 + \frac{1}{E[Y]}\right) + \log_2(E[Y] + 1) + \log_2\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right). \qquad (22)$$
Here, instead of applying the upper bound
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1)$$
of Lemma 7 to the right-hand side of (22), we apply the improved version:
$$L_\epsilon \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) & \text{if } 0 < p_1 \le 0.5,\\ H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) & \text{if } 0.5 < p_1 \le 1, \end{cases}$$
proven in Lemma 8 of Section 4. Then, we simply need to rewrite the inequality, concluding the proof. □
Thanks to Lemma 4 and Formula (9), the above lower bounds on $L_\epsilon$ can be applied to derive our main results for prefix-free codes ending with a space, as shown in the following theorems.
Theorem 1.
The average length of an optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$\begin{aligned} L(C) >\; & H_k(\mathbf{p}) - (H_k(\mathbf{p}) + \log_k(k-1)) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) + \log_k(k-1)}\right) - \log_k(H_k(\mathbf{p}) + \log_k(k-1) + 1)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (23) \end{aligned}$$
Proof. 
From Lemma 4 and Formula (9), we have
$$L(C) \ge L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (24)$$
By applying the lower bound of Lemma 5 to (24), we obtain (23). □
Analogously, by exploiting (the possible) knowledge of the maximum source symbol probability value, we have the following result.
Theorem 2.
The average length of the optimal prefix-free code ending with a space $C \colon S \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ has the following lower bounds:
1. If $0 < p_1 \le 0.5$:
$$\begin{aligned} L(C) \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (25) \end{aligned}$$
2. If $0.5 < p_1 \le 1$:
$$\begin{aligned} L(C) \ge\; & H_k(\mathbf{p}) - \left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))\right) \log_k\left(1 + \frac{1}{H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1))}\right)\\ & - \log_k\left(H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)(1 + \log_k(k-1)) + 1\right) - \log_k\left(1 - \left(\frac{1}{kn}\right)^{\log_k\left(1 + \frac{1}{1-p_1}\right)}\right)\\ & + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (26) \end{aligned}$$
Proof. 
From Lemma 4 and Formula (9), we have
$$L(C) \ge L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_{n-i+1} + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (27)$$
By applying the lower bounds (18) and (19) of Lemma 6 to (27), we obtain (25) or (26), according to the value of the maximum source symbol probability. □

4. Upper Bounds on the Average Length

In this section, we will first derive upper bounds on the average length of optimal one-to-one codes. Subsequently, we will provide corresponding upper bounds on the average length of optimal prefix-free codes ending with a space.
We start by extending the result obtained in [13] from the binary case to the $k$-ary case, $k \ge 2$.
Lemma 7.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon \le H_k(\mathbf{p}) + \log_k(k-1). \qquad (28)$$
Proof. 
Under the standing hypothesis that $p_1 \ge \cdots \ge p_n$, it holds that
$$p_i \le \frac{1}{i}, \quad \text{for each } i = 1, \ldots, n. \qquad (29)$$
We recall that the length of the $i$-th codeword of the optimal one-to-one code $D$ is equal to
$$l_i = \lfloor \log_k((k-1)i) \rfloor. \qquad (30)$$
Therefore, from (29), we can upper bound each length $l_i$ as
$$l_i = \lfloor \log_k((k-1)i) \rfloor \le \log_k((k-1)i) \le \log_k(k-1) + \log_k \frac{1}{p_i}. \qquad (31)$$
Hence, by applying (31) to the average length of $D$, we obtain
$$L_\epsilon = \sum_{i=1}^n p_i l_i \le \sum_{i=1}^n p_i \left(\log_k(k-1) + \log_k \frac{1}{p_i}\right) = H_k(\mathbf{p}) + \log_k(k-1). \qquad (32)$$
This concludes our proof. □
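The bound of Lemma 7 is also easy to check numerically against the exact value of $L_\epsilon$ (our sketch; H and flog as in the earlier sketches):

```python
for k in (2, 3, 7):
    p = [1 / (i * i) for i in range(1, 101)]   # already non-increasing
    s = sum(p)
    p = [x / s for x in p]
    L_eps = sum(pi * flog((k - 1) * i, k) for i, pi in enumerate(p, 1))
    bound = H(p, k) + math.log(k - 1, k)
    print(f"k={k}: L_eps = {L_eps:.4f} <= {bound:.4f}")
```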
By exploiting the knowledge of the maximum probability value of $\mathbf{p}$, we generalize the upper bound in [5] from $k = 2$ to arbitrary $k \ge 2$.
Lemma 8.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$, $p_1 \ge \cdots \ge p_n$, be a probability distribution on $S$. The average length $L_\epsilon$ of the optimal one-to-one code $D \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{\epsilon\}$ satisfies
$$L_\epsilon \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) & \text{if } 0 < p_1 \le 0.5,\\ H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) & \text{if } 0.5 < p_1 \le 1. \end{cases} \qquad (33)$$
Proof. 
Let us first prove that the average length of an optimal one-to-one code satisfies the inequality:
$$L_\epsilon \le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (34)$$
Indeed, by recalling that $l_1 = \lfloor \log_k(k-1) \rfloor = 0$, we can write $L_\epsilon$ as follows:
$$\begin{aligned} L_\epsilon &= \sum_{i=2}^n p_i \lfloor \log_k(i(k-1)) \rfloor\\ &= \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor + \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left\lfloor \log_k\left(\frac{k^j-1}{k-1}(k-1)\right) \right\rfloor\\ &= \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor + \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \log_k(k^j-1)\\ &\quad - \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left(\log_k(k^j-1) - \lfloor \log_k(k^j-1) \rfloor\right)\\ &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \left(\log_k(k^j-1) - \lfloor \log_k(k^j-1) \rfloor\right), \end{aligned}$$
where the last inequality holds since
$$\sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \lfloor \log_k(i(k-1)) \rfloor \le \sum_{j \ge 1:\, \frac{k^j-1}{k-1}+1 \le n}\; \sum_{i=\frac{k^j-1}{k-1}+1}^{\min\left(\frac{k^{j+1}-1}{k-1}-1,\, n\right)} p_i \log_k(i(k-1)).$$
We note that the function $f(j) = \log_k(k^j - 1) - \lfloor \log_k(k^j - 1) \rfloor$ is increasing in $j$. Therefore, it reaches its minimum at $j = 2$, where it takes the value
$$\log_k(k^2 - 1) - \lfloor \log_k(k^2 - 1) \rfloor = 1 + \log_k\left(1 - \frac{1}{k^2}\right) > 0.5,$$
for any $k \ge 2$. Thus, (34) holds, as we claimed.
Let us now show that
$$L_\epsilon \le H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (35)$$
Since the distribution $\mathbf{p}$ is ordered in a non-increasing fashion, from (29) and (34), we have
$$\begin{aligned} L_\epsilon &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &\le \sum_{i=2}^n p_i \log_k \frac{k-1}{p_i} - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \quad \left(\text{since } i \le \frac{1}{p_i}\right)\\ &= H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \end{aligned}$$
Therefore, (35) holds.
To conclude the proof, it remains to prove that
$$L_\epsilon \le H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \qquad (36)$$
By observing that, for any $i \ge 2$, it holds that
$$p_i \le \frac{2(1-p_1)}{i}, \qquad (37)$$
we obtain:
$$\begin{aligned} L_\epsilon &\le \sum_{i=2}^n p_i \log_k(i(k-1)) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &\le \sum_{i=2}^n p_i \log_k \frac{2(1-p_1)(k-1)}{p_i} - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \quad \left(\text{since, from (37), } i \le \frac{2(1-p_1)}{p_i}\right)\\ &= H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (\log_k 2 + \log_k(1-p_1) + \log_k(k-1))(1-p_1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}\\ &= H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) - 0.5 \sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}}. \end{aligned}$$
Therefore, (36) holds as well.
From (35) and (36), since
$$\sum_{j \ge 2:\, \frac{k^j-1}{k-1} \le n} p_{\frac{k^j-1}{k-1}} \ge 0,$$
we obtain
$$L_\epsilon \le H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1)$$
and
$$L_\epsilon \le H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1).$$
Now, it is easy to verify that $-p_1 \log_k \frac{1}{p_1} \le -H_k(p_1) + (1-p_1)\log_k 2$ for $0 < p_1 \le 0.5$, proving the lemma. □
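As with the previous bounds, Lemma 8 can be spot-checked numerically (our sketch; H and flog as above):

```python
def lemma8_bound(p: list[float], k: int) -> float:
    p1 = p[0]
    if p1 <= 0.5:
        return H(p, k) - p1 * math.log(1 / p1, k) + (1 - p1) * math.log(k - 1, k)
    h1 = -p1 * math.log(p1, k) - (1 - p1) * math.log(1 - p1, k)
    return H(p, k) - h1 + (1 - p1) * math.log(2 * (k - 1), k)

for p1 in (0.3, 0.6, 0.9):
    p = [p1] + [(1 - p1) / 9] * 9        # n = 10 symbols, non-increasing
    L_eps = sum(pi * flog(i, 2) for i, pi in enumerate(p, 1))   # k = 2
    print(f"p1={p1}: L_eps = {L_eps:.4f} <= {lemma8_bound(p, 2):.4f}")
```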
Thanks to the result of Lemma 1 and to Formula (9), the upper bounds obtained above can be used to derive our upper bounds on the average length of optimal prefix-free codes ending with a space, as shown in the following theorems.
Theorem 3.
The average length of an optimal prefix-free code ending with a space $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$\begin{aligned} L(C) &\le H_k(\mathbf{p}) + \log_k(k-1) + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \qquad (38)\\ &\le H_k(\mathbf{p}) + \log_k(k-1) + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i, \qquad (39) \end{aligned}$$
where $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$.
Proof. 
From Lemma 1 and Formula (9), we have
$$L(C) \le L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i. \qquad (40)$$
By applying the upper bound (28) on $L_\epsilon$ of Lemma 7 to (40), we obtain (38). □
Theorem 4.
The average length of an optimal prefix-free code ending with a space $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ satisfies
$$L(C) \le \begin{cases} H_k(\mathbf{p}) - p_1 \log_k \frac{1}{p_1} + (1-p_1)\log_k(k-1) + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} & \text{if } 0 < p_1 \le 0.5,\\[4pt] H_k(\mathbf{p}) - H_k(p_1) + (1-p_1)\log_k 2(k-1) + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} & \text{if } 0.5 < p_1 \le 1. \end{cases} \qquad (41)$$
Proof. 
From Lemma 1 and Formula (9), and by recalling that $h = \lceil \log_k(n - \lceil n/k \rceil) \rceil$, we have
$$L(C) \le L_\epsilon + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}} + \sum_{i=1}^{\frac{k^{h-1}-1}{k-1}-1} p_i + \sum_{i=\frac{k^h+k^{h-1}-2}{k-1}-\lceil n/k \rceil}^{\frac{k^h-1}{k-1}-1} p_i \le L_\epsilon + \sum_{i=1}^{\lceil n/k \rceil - 1} p_i + \sum_{i=1}^{\lceil \log_k \frac{n-1}{k} \rceil} p_{\frac{k^i-1}{k-1}}. \qquad (42)$$
We apply the upper bound (33) on $L_\epsilon$ of Lemma 8 to (42). That gives us (41). □
Remark 1.
One can estimate how much the average length of optimal prefix-free codes ending with a space differs from the minimum average length of unrestricted optimal prefix-free codes on the alphabet $\{0, 1, \ldots, k-1, \sqcup\}$, that is, optimal prefix-free codes in which the special symbol ⊔ is not constrained to appear only at the end of codewords.
Let $S = \{s_1, \ldots, s_n\}$ be the set of source symbols and $\mathbf{p} = (p_1, \ldots, p_n)$ be a probability distribution on $S$. Let us denote by $C \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+ \cup \{0, \ldots, k-1\}^+ \sqcup$ an optimal prefix-free code ending with a space for $S$, and by $C' \colon \{s_1, \ldots, s_n\} \to \{0, \ldots, k-1, \sqcup\}^+$ an optimal prefix-free code without the restriction on where the space can occur. Clearly, $L(C) < H_k(\mathbf{p}) + 1$, since the optimal prefix-free code $\{s_1, \ldots, s_n\} \to \{0, \ldots, k-1\}^+$ (which is more constrained, as it uses no space at all) has an average length less than $H_k(\mathbf{p}) + 1$. Therefore,
$$\begin{aligned} L(C) - L(C') &< H_k(\mathbf{p}) + 1 - L(C')\\ &\le H_k(\mathbf{p}) + 1 - H_{k+1}(\mathbf{p}) \quad (\text{since } L(C') \ge H_{k+1}(\mathbf{p}))\\ &= H_k(\mathbf{p})\left(1 - \frac{1}{\log_k(k+1)}\right) + 1. \end{aligned}$$
Since $\lim_{k \to \infty} \log_k(k+1) = 1$, we have that, as the cardinality of the code alphabet increases, the constraint that the space can appear only at the end of codewords becomes less and less influential.
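For a concrete feel of this estimate, the following snippet of ours tabulates the coefficient $1 - \frac{1}{\log_k(k+1)}$ that multiplies $H_k(\mathbf{p})$ in the gap bound above:

```python
import math

for k in (2, 4, 16, 256):
    coeff = 1 - 1 / math.log(k + 1, k)
    print(f"k = {k:3d}:  L(C) - L(C') < {coeff:.4f} * H_k(p) + 1")
# k =   2:  L(C) - L(C') < 0.3691 * H_k(p) + 1
# k =   4:  L(C) - L(C') < 0.1387 * H_k(p) + 1
```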

5. Concluding Remarks

In this paper, we have introduced the class of prefix-free codes in which a specific symbol (referred to as a ‘space’) can appear only at the end of codewords. We have proposed a linear-time algorithm to construct ‘almost’-optimal codes with this property, and we have shown that their average length is at most one unit longer than the minimum average length of any prefix-free code in which the space can appear only at the end of codewords. We have proven this result by highlighting a connection between our class of codes and the well-known class of one-to-one codes. We have also provided upper and lower bounds on the average length of optimal prefix-free codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
We leave open the problem of providing an efficient algorithm to construct optimal prefix-free codes ending with a space. It seems that there is no easy way to modify the classical Huffman greedy algorithm to solve our problem. It is possible that the more powerful dynamic programming approach could be useful to provide an optimal solution to the problem, as done in [14] for optimal binary codes ending with ones. This will be the subject of future investigations.

Author Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the referees and the Guest Editor I. Sason for their corrections and useful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Daniels, P.T.; Bright, W. (Eds.) The World’s Writing Systems; Oxford University Press: Oxford, UK, 1996.
  2. Wikipedia: Writing Systems without Word Boundaries. Available online: https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries (accessed on 25 March 2024).
  3. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  4. Alon, N.; Orlitsky, A. A lower bound on the expected length of one-to-one codes. IEEE Trans. Inf. Theory 1994, 40, 1670–1672.
  5. Blundo, C.; De Prisco, R. New bounds on the expected length of one-to-one codes. IEEE Trans. Inf. Theory 1996, 42, 246–250.
  6. Jaynes, E. Note on unique decipherability. IRE Trans. Inf. Theory 1959, 5, 98–102.
  7. Shannon, C.E. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
  8. Courtade, T.; Verdú, S. Cumulant generating function of codeword lengths in optimal lossless compression. In Proceedings of the IEEE International Symposium on Information Theory, 2014; pp. 2494–2498.
  9. Kontoyiannis, I.; Verdú, S. Optimal lossless data compression: Non-asymptotics and asymptotics. IEEE Trans. Inf. Theory 2014, 60, 777–795.
  10. Kosut, O.; Sankar, L. Asymptotics and non-asymptotics for universal Fixed-to-Variable source coding. IEEE Trans. Inf. Theory 2017, 63, 3757–3772.
  11. Szpankowski, W.; Verdú, S. Minimum expected length of Fixed-to-Variable lossless compression without prefix constraints. IEEE Trans. Inf. Theory 2011, 57, 4017–4025.
  12. Knuth, D. The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd ed.; Addison-Wesley: Boston, MA, USA, 1998.
  13. Wyner, A. An upper bound on the entropy series. Inf. Control 1972, 20, 176–181.
  14. Chan, S.-L.; Golin, M.J. A dynamic programming algorithm for constructing optimal 1-ended binary prefix-free codes. IEEE Trans. Inf. Theory 2000, 46, 1637–1644.