Abstract

Nearest neighbour search (NNS) is the core of large-scale data retrieval. Learning to hash is an effective way to solve this problem by representing high-dimensional data as compact binary codes. However, existing learning to hash methods need long bit encodings to ensure query accuracy, and long encodings bring a large storage cost, which severely restricts their application to big data. An asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed to solve this problem. The AVBH algorithm uses two types of hash mapping functions to encode the dataset and the query set into bit strings of different lengths. For the dataset, the frequencies of the hash codes obtained after random Fourier feature encoding are statistically analysed: codes with high frequency are compressed into longer encoding representations, and codes with low frequency are compressed into shorter encoding representations. A query point is quantized to a long bit hash code and compared with data points whose codes are cascade-concatenated to the same length. Experiments on public datasets show that the proposed algorithm effectively reduces the storage cost and improves the query accuracy.

1. Introduction

Given a query object/point $q$ and a dataset $X$, the nearest neighbour search (NNS) [1–3] is to return the nearest neighbours of $q$ in $X$. Nowadays, NNS is widely used in many applications such as image retrieval, text classification, and recommendation systems. However, with the exponential growth of data scale and the curse of high data dimensionality, the NNS problem is now much more difficult to solve than before. Therefore, new efficient index structures and query algorithms for similarity search have increasingly become the focus of research on this problem.

The hashing-based NNS methods [3–5] have attracted much attention. Generally, hashing methods project the original data, with locality preserved, into a low-dimensional Hamming space, i.e., binary codes [4–6]. The query complexity of these methods is typically sublinear. In addition, hashing methods only need simple bit operations to compute similarities between Hamming codes, which is very fast. Owing to their high performance in large-scale data retrieval, hashing techniques have gained increasing interest in facilitating cross-view retrieval tasks [7, 8], online retrieval tasks [9], and metric learning tasks [10].

For large-scale data retrieval, time and space costs are the two important issues. As we know, the accuracy of existing hashing methods is limited by the length of the hash encoding, and a longer encoding is usually required to obtain better accuracy. However, long encodings increase the space cost, network communication overhead, and response time.

To solve this problem, a coding quantization mechanism [11] based on an asymmetric hashing algorithm [12] was proposed. Different from direct hash code comparison, it cascade-concatenates the code of each data point up to the encoding length of the query point, which effectively reduces the coding storage cost of the dataset while preserving the accuracy of the results. However, this algorithm uses a unified compression method for all data, ignoring the effect of the data distribution. In practice, the distribution of large-scale data is generally uneven; hence, for most hashing algorithms, the frequencies of the quantized codes also differ. As we know, a longer encoding can preserve more of the original information but brings a higher cost, and vice versa. A careful trade-off among accuracy, computing overhead, and space saving therefore needs to be studied. Intuitively, high-density data require longer encodings to preserve as much of the original information as possible, while low-density data can use shorter encodings and still preserve most of the original information. That is the idea behind our algorithm.

In this paper, an asymmetric learning to hash with variable bit encoding algorithm (AVBH) is proposed. AVBH uses two types of hash mapping functions to quantize the dataset and the query set separately, producing hash codes with different bit lengths. In particular, the frequency of each random Fourier code in the dataset is calculated, and then random Fourier codes with high frequency are compressed into longer hash code representations, while codes with low frequency are compressed into shorter hash code representations.

The main contributions of this paper are as follows: (1) a variable bit encoding mechanism (named AVBH) based on hash code frequency compression is proposed, which makes effective use of the encoding space, and (2) experiments show that AVBH can effectively reduce the storage cost and improve the query accuracy.

2. Preliminaries and Description

In this section, we review some basic knowledge of LSH (locality-sensitive hashing) [13–15], vector quantization [16], and product quantization [17] that is essential to our proposed technique.

2.1. Vector Quantization

Vector quantization (VQ) is a classical data compression technique which compresses the original data into discrete vectors. Formally, for a $d$-dimensional vector, a VQ function can be specified as $q(x) = c_i \in C$, where $x \in \mathbb{R}^d$ is an original data vector, $C = \{c_1, c_2, \dots, c_K\}$ is a pretrained codebook, and $c_i$ is a codeword in the codebook $C$. The objective of a VQ function is to quantize the original real-valued vector to the nearest codeword with the lowest VQ loss. Here, the VQ loss of a vector $x$ is given by

$$L_{\mathrm{VQ}}(x) = \min_{c_i \in C} \|x - c_i\|_2^2.$$
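As an illustration, here is a minimal Python sketch of VQ with a hypothetical 2-D codebook (the codebook values are placeholders, not from the paper): the vector is assigned to its nearest codeword and the VQ loss is the squared distance to that codeword.

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index of the nearest codeword and the VQ loss ||x - c_i||^2."""
    dists = np.sum((codebook - x) ** 2, axis=1)   # squared distance to every codeword
    i = int(np.argmin(dists))                     # nearest codeword index
    return i, dists[i]

# Toy example: a pretrained codebook with 4 codewords in R^2 (illustrative values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
x = np.array([0.9, 0.8])
idx, loss = vq_encode(x, codebook)
print(idx, loss)   # nearest codeword index and its quantization loss
```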

2.2. Product Quantization

Product quantization (PQ) is an optimization of vector quantization. First, the feature space is divided into $M$ mutually exclusive subspaces, and each subspace is then quantized separately using VQ. That is, the coding of each subspace forms a small codebook $C_j$, and the $M$ small codebooks form a large codebook $C = C_1 \times C_2 \times \dots \times C_M$ by the Cartesian product. In this way, high-dimensional data can be decomposed into low-dimensional subspaces and processed in parallel. Suppose an object $x = (x^{(1)}, x^{(2)}, \dots, x^{(M)})$ is represented as a combination of codewords $(c^{(1)}, c^{(2)}, \dots, c^{(M)})$ with $c^{(j)} \in C_j$; then the product quantization loss of the vector $x$ is given by

$$L_{\mathrm{PQ}}(x) = \sum_{j=1}^{M} \min_{c^{(j)} \in C_j} \|x^{(j)} - c^{(j)}\|_2^2.$$
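A minimal sketch of PQ on a toy scale, assuming an 8-D vector split into $M = 2$ subspaces with $K = 4$ codewords each (all values illustrative): each subvector is vector-quantized with its own codebook and the subspace losses are summed.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode x by quantizing each subvector with its own codebook.

    codebooks: list of M arrays, each of shape (K, d/M).
    Returns the list of codeword indices and the total PQ loss.
    """
    M = len(codebooks)
    subvectors = np.split(x, M)              # divide the feature space into M subspaces
    codes, loss = [], 0.0
    for sub, cb in zip(subvectors, codebooks):
        d = np.sum((cb - sub) ** 2, axis=1)  # VQ inside the subspace
        i = int(np.argmin(d))
        codes.append(i)
        loss += d[i]
    return codes, loss

# Toy setup: an 8-D vector, M = 2 subspaces, K = 4 codewords per subspace.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 4)) for _ in range(2)]
x = rng.normal(size=8)
print(pq_encode(x, codebooks))
```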

2.3. Random Fourier Feature

Traditional dimensionality reduction methods, such as PCA, map the data to an independent feature space and keep the main independent features. Such methods ignore the nonlinear information of the sample distribution and often do not fit real data well. The feature mapping method based on random Fourier features (RFF) maps data to the feature space of an approximate kernel function, in which the inner product of any two points approximates their kernel function value. Compared with PCA, RFF can better preserve the data distribution information and can produce a feature of any target dimension, by either reducing or raising the dimensionality. This kind of feature is well suited to feature compression. SKLSH [18] is a classical hashing algorithm based on RFF, which performs well under long bit codes.

The hash learning algorithm in this paper first maps the sample points from the original $d$-dimensional real space to the $D$-dimensional feature space of the approximate kernel function by RFF. Because of the uniform convergence of RFF, the kernel similarity between any two sample points can be preserved.

Specifically, for two points $x, y \in \mathbb{R}^d$, the translation-invariant kernel function [12] satisfies the following equation:

$$K(x, y) = k(x - y) = \mathbb{E}_{w, t}\big[\sqrt{2}\cos(w^{\mathrm T}x + t)\cdot\sqrt{2}\cos(w^{\mathrm T}y + t)\big],$$

where $t$ satisfies the uniform distribution on $[0, 2\pi]$, $w$ obeys the probability distribution $p(w)$ induced by the translation-invariant kernel function (its Fourier transform), and $\sqrt{2}$ is a constant scaling parameter.

Thus, the mapping from the $d$-dimensional original space to the $D$-dimensional feature space of the approximate kernel function can be obtained by the following equation:

$$\Phi(x) = \sqrt{\tfrac{2}{D}}\,\big[\cos(w_1^{\mathrm T}x + t_1), \cos(w_2^{\mathrm T}x + t_2), \dots, \cos(w_D^{\mathrm T}x + t_D)\big]^{\mathrm T},$$

where $w_1, \dots, w_D$ are independent samples drawn from the probability distribution $p(w)$ and $t_1, \dots, t_D$ are independent samples drawn from the uniform distribution on $[0, 2\pi]$. When the translation-invariant kernel function is a Gaussian kernel, $K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$, the distribution $p(w)$ is Gaussian, i.e., $w \sim N(0, \sigma^{-2}I)$.
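A minimal sketch of the RFF mapping for a Gaussian kernel, following the equations above: $w_i \sim N(0, \sigma^{-2}I)$, $t_i \sim U[0, 2\pi]$, and the inner product of the mapped features approximates the kernel value. The dimensions and kernel width below are illustrative.

```python
import numpy as np

def rff_map(X, D, sigma, rng):
    """Map rows of X to a D-dimensional random Fourier feature space
    approximating the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # w_i ~ N(0, sigma^-2 I)
    t = rng.uniform(0.0, 2.0 * np.pi, size=D)        # t_i ~ U[0, 2*pi]
    return np.sqrt(2.0 / D) * np.cos(X @ W + t)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = x + 0.1 * rng.normal(size=16)                    # a nearby point
Phi = rff_map(np.vstack([x, y]), D=2048, sigma=1.0, rng=rng)
approx = Phi[0] @ Phi[1]                             # <Phi(x), Phi(y)>
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)          # Gaussian kernel value, sigma = 1
print(approx, exact)                                 # close for large D
```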

2.4. Orthogonal Procrustes Problem

An orthogonal Procrustes problem is to solve for an orthogonal transformation matrix $R$ so that $RA$ is as close to $B$ as possible, i.e.,

$$\min_{R} \|RA - B\|_F^2, \quad \text{s.t.}\quad R^{\mathrm T}R = I,$$

where $A$ and $B$ are given matrices of the same size. When $B$ itself also has to be learned, as in the hashing problem below, the joint problem is not easy to solve directly, and it can be handled by alternating optimization. Namely, the orthogonal matrix $R$ is first fixed and the matrix $B$ is optimized to reduce the objective value; then the matrix $B$ is fixed and the orthogonal transformation matrix $R$ is optimized to reduce the objective value.
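For fixed $A$ and $B$, the orthogonal Procrustes subproblem has the classical closed-form solution $R = UV^{\mathrm T}$ from the SVD $BA^{\mathrm T} = U\Sigma V^{\mathrm T}$. A short numerical check with random placeholder matrices:

```python
import numpy as np

def procrustes_rotation(A, B):
    """Closed-form solution of min_R ||R @ A - B||_F  s.t.  R^T R = I,
    obtained from the SVD of B @ A.T (classical orthogonal Procrustes result)."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

# Illustrative check: recover a known rotation from noisy observations.
rng = np.random.default_rng(0)
d, n = 8, 200
A = rng.normal(size=(d, n))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # a "true" orthogonal matrix
B = Q @ A + 0.01 * rng.normal(size=(d, n))     # B is approximately Q A
R = procrustes_rotation(A, B)
print(np.linalg.norm(R - Q))                   # small: the rotation is recovered
```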

3. Asymmetric Learning to Hash with Variable Bit Encoding

3.1. Algorithm Framework

For a general hash learning algorithm, the length of the learned hash code is fixed. AVBH adopts the idea of asymmetric hashing: the hash codes for the dataset are short and of variable length, while the code for a query point is long and of fixed length. The steps of the AVBH hashing algorithm are shown in Figure 1, which mainly includes the dataset encoding steps ①–③ and the query point encoding step ④.

The dataset encoding section consists of two phases: random Fourier feature encoding (RFF encoding) and variable bit encoding (AVBH encoding). First, step ① uses random Fourier features (RFF) to map the dataset and obtain its RFF codes. Considering that different RFF codes occur with different frequencies, the RFF codes are sorted by frequency in step ②. According to this ranking, the original dataset is divided into subsets $X_1, X_2, \dots, X_m$, where subset $X_i$ is assigned a code length $b_i$ (with each $b_i$ dividing the query code length $b$), as shown in the figure. As shown in step ③, the AVBH code of subset $X_i$, of length $b_i$, is duplicated $b/b_i$ times sequentially to form a Hamming code of dimension $b$.
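To make steps ①–③ concrete, here is a minimal illustrative sketch (not the paper's exact implementation): RFF features are binarized into sign codes, the distinct codes are ranked by frequency, longer bit lengths are assigned to the more frequent codes, and a short code is cascaded up to the query length $b$. The bit lengths, placeholder data, and the thirds-based assignment rule are assumptions made only for illustration.

```python
import numpy as np
from collections import Counter

b = 8                                        # query-side code length (illustrative)
lengths = [8, 4, 2]                          # candidate data-side code lengths, each divides b

rng = np.random.default_rng(0)
Phi = rng.normal(size=(1000, b))             # RFF features of the dataset (placeholder values)
codes = np.where(Phi > 0, 1, -1).astype(np.int8)   # step 1: SKLSH-style sign codes

# Step 2: rank the distinct RFF codes by how often they occur in the dataset.
keys = [row.tobytes() for row in codes]
ranked = [k for k, _ in Counter(keys).most_common()]

# Step 3 (illustrative rule): the more frequent a code, the longer the bit length it gets.
groups = np.array_split(ranked, len(lengths))
length_of = {k: lengths[i] for i, g in enumerate(groups) for k in g}
bits = np.array([length_of[k] for k in keys])      # per-point code length b_i
print(np.mean(bits))                               # average stored code length

def cascade(code_short, b):
    """Repeat a b_i-bit code b/b_i times to obtain the b-bit cascaded code."""
    return np.tile(code_short, b // len(code_short))

print(cascade(np.array([1, -1, 1, -1]), b))        # a 4-bit code expanded to 8 bits
```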

In the query point encoding section, the query point is quantized into an RFF-based code of length $b$ in step ④.
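For the query side, an illustrative sketch of the asymmetric comparison implied by Figure 1: the query keeps its full $b$-bit code, a data point stores only its short $b_i$-bit code, and the Hamming distance is taken against the cascaded (tiled) data code. Function names and the example codes are illustrative.

```python
import numpy as np

def hamming_asymmetric(query_code, data_code_short):
    """Hamming distance between a full b-bit query code and a stored b_i-bit data code,
    computed after cascading (tiling) the short code to length b."""
    b = len(query_code)
    cascaded = np.tile(data_code_short, b // len(data_code_short))
    return int(np.count_nonzero(query_code != cascaded))

q = np.array([1, -1, -1, 1, 1, 1, -1, -1])   # 8-bit query code
x = np.array([1, -1, -1, 1])                 # 4-bit stored data code
print(hamming_asymmetric(q, x))              # distance to the 8-bit cascaded code
```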

3.2. Objective Function

The target of the AVBH method is to obtain $m$ groups of hash codes with lengths $b_1, b_2, \dots, b_m$ through the hash functions $h_1, h_2, \dots, h_m$, namely,

$$B_i = h_i(X_i) \in \{-1, +1\}^{n_i \times b_i}, \quad i = 1, 2, \dots, m,$$

where $X_i$ is the $i$-th subset of the dataset, $n_i$ is the number of points in $X_i$, and each $b_i$ divides the query code length $b$.

The dataset is divided into the subsets $X_1, \dots, X_m$ according to the RFF code frequency. By cascading the codes $b/b_i$ times, respectively, we can obtain $m$ groups of $b$-bit hash codes. For example, for $b = 4$ and $b_i = 2$, a 2-bit code $(+1, -1)$ is cascaded into the 4-bit code $(+1, -1, +1, -1)$; in general,

$$\bar{B}_i = [\,\underbrace{B_i, B_i, \dots, B_i}_{b/b_i \text{ times}}\,] \in \{-1, +1\}^{n_i \times b}.$$

Then the groups $\bar{B}_1, \dots, \bar{B}_m$ are combined (stacked) to obtain the $b$-bit hash code matrix $\bar{B} \in \{-1, +1\}^{n \times b}$ of the entire dataset. During the query process, the AVBH method calculates similarity by the Hamming distance between the hash code of the query point and the concatenated codes of the dataset. Therefore, for the dataset, we need to construct the hash mapping functions so that the $m$ groups of hash codes, with lengths $b_1, \dots, b_m$, respectively, preserve the original information as much as possible. To this end, the AVBH method obtains the hash mapping functions by minimizing the reconstruction error (8) between the cascaded encoding and the $b$-dimensional RFF sample vectors (the RFF dimension is set equal to the long code length), stacked as $\Phi \in \mathbb{R}^{n \times b}$:

$$\min_{\bar{B}, R} \|\Phi - \bar{B}R\|_F^2, \qquad (8)$$

where $R \in \mathbb{R}^{b \times b}$ is an orthogonal rotation matrix, namely, $RR^{\mathrm T} = R^{\mathrm T}R = I$.

Combining the properties of the matrix trace and the definition of the F-norm of matrices, we can get the following equation:

$$\|\Phi - \bar{B}R\|_F^2 = \|\Phi\|_F^2 - 2\,\mathrm{tr}(\bar{B}R\Phi^{\mathrm T}) + \|\bar{B}R\|_F^2. \qquad (9)$$

As the unknown variables in formula (8) appear as a product, the expanded formula (9) contains terms involving both unknowns, so it is difficult to solve directly. Dropping the constant term $\|\Phi\|_F^2$, we can get the following formula:

$$\min_{\bar{B}, R}\ \|\bar{B}R\|_F^2 - 2\,\mathrm{tr}(\bar{B}R\Phi^{\mathrm T}). \qquad (10)$$

As $RR^{\mathrm T} = I$, it is easy to get $\|\bar{B}R\|_F^2 = \mathrm{tr}(\bar{B}RR^{\mathrm T}\bar{B}^{\mathrm T}) = \|\bar{B}\|_F^2$. As $\bar{B} \in \{-1, +1\}^{n \times b}$, we can get that $\|\bar{B}\|_F^2 = nb$, which is unrelated to $\bar{B}$ and $R$. As a result, $\|\Phi - \bar{B}R\|_F^2$ and $\|\bar{B} - \Phi R^{\mathrm T}\|_F^2$ differ only by terms that are unrelated to $\bar{B}$ and $R$. So formula (10) is simplified as follows:

$$\min_{\bar{B}, R} \|\bar{B} - \Phi R^{\mathrm T}\|_F^2, \quad \text{s.t.}\quad RR^{\mathrm T} = I. \qquad (11)$$

Thus, minimizing the reconstruction error (8) equals minimizing the quantization error (11).
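A quick numerical check of this equivalence with random placeholder matrices: for an orthogonal $R$ and a $\pm 1$ code matrix, the two objective values in fact coincide exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 100, 16
Phi = rng.normal(size=(n, b))                        # RFF feature matrix (placeholder)
B = np.where(rng.normal(size=(n, b)) > 0, 1, -1)     # an arbitrary +/-1 code matrix
R, _ = np.linalg.qr(rng.normal(size=(b, b)))         # a random orthogonal matrix

reconstruction = np.linalg.norm(Phi - B @ R) ** 2    # objective (8)
quantization = np.linalg.norm(B - Phi @ R.T) ** 2    # objective (11)
print(np.isclose(reconstruction, quantization))      # True
```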

The objective of AVBH for encoding the dataset is thus to minimize the reconstruction error of the cascaded $b$-bit encoding by finding the orthogonal rotation matrix $R$. In the extreme case where the dataset is uniformly distributed, there is no significant difference among the frequencies of the hash codes of the dataset, and the AVBH method then degenerates into the ACH algorithm [12]. Compared with the ACH hashing algorithm, the AVBH hashing algorithm adapts better to real data because it can accommodate various data distributions, and its generalization ability is stronger.

3.3. Optimization Algorithm

The objective function (11) can be optimized by alternating optimization. Namely, the rotation matrix $R$ is first fixed and the encoding matrix $\bar{B}$ is optimized to reduce the objective value; then the encoding matrix $\bar{B}$ is fixed and the rotation matrix $R$ is optimized to reduce the objective value. In this way, the value of the objective function decreases until it converges. The following discusses the two alternating updates.

(1) Fix the rotation matrix $R$, and optimize the encoding matrix $\bar{B}$. Let $V = \Phi R^{\mathrm T}$, and let $V_i^{(j)}$ denote the submatrix of $V$ consisting of the rows corresponding to the subset $X_i$ and the columns $(j-1)b_i + 1$ to $jb_i$, i.e., the $j$-th block of $b_i$ columns, for $j = 1, \dots, b/b_i$. From formula (11), we can get the following equation:

$$\|\bar{B} - V\|_F^2 = \|\bar{B}\|_F^2 + \|V\|_F^2 - 2\sum_{i=1}^{m}\sum_{j=1}^{b/b_i}\mathrm{tr}\big(B_i (V_i^{(j)})^{\mathrm T}\big). \qquad (12)$$

As $\|\bar{B}\|_F^2 = nb$ and $\|V\|_F^2$ are unrelated to $\bar{B}$, for a fixed $R$ the problem of minimizing (12) is equal to the problem of maximizing the following formula:

$$\sum_{i=1}^{m}\mathrm{tr}\Big(B_i \Big(\sum_{j=1}^{b/b_i} V_i^{(j)}\Big)^{\mathrm T}\Big). \qquad (13)$$

As $B_i \in \{-1, +1\}^{n_i \times b_i}$, the optimal analytic solution of formula (13) is given by

$$B_i = \mathrm{sgn}\Big(\sum_{j=1}^{b/b_i} V_i^{(j)}\Big), \quad i = 1, 2, \dots, m. \qquad (14)$$

(2) Fix the encoding matrix $\bar{B}$, and optimize the rotation matrix $R$.

Under a fixed $\bar{B}$, the problem of minimizing formula (11) with respect to $R$ is an orthogonal Procrustes problem [9]. It can be written as follows:

$$\min_{R} \|\bar{B} - \Phi R^{\mathrm T}\|_F^2, \quad \text{s.t.}\quad RR^{\mathrm T} = I. \qquad (15)$$

Hence, the problem of optimizing $R$ to obtain the minimum value of formula (15) is equal to the problem of maximizing the following formula:

$$\mathrm{tr}(\bar{B}R\Phi^{\mathrm T}) = \mathrm{tr}(R\Phi^{\mathrm T}\bar{B}). \qquad (16)$$

By calculating the SVD of $\Phi^{\mathrm T}\bar{B}$, we can get the following formula:

$$\Phi^{\mathrm T}\bar{B} = U\Sigma W^{\mathrm T}, \qquad (17)$$

where $U$ is the matrix consisting of the left singular vectors, $W$ is the matrix consisting of the right singular vectors, and $\Sigma$ is the diagonal matrix of the corresponding singular values, whose diagonal elements are $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_b \geq 0$.

By combining formulas (16) and (17), we can get the following equation:

$$\mathrm{tr}(R\Phi^{\mathrm T}\bar{B}) = \mathrm{tr}(RU\Sigma W^{\mathrm T}) = \mathrm{tr}(W^{\mathrm T}RU\Sigma). \qquad (18)$$

Let $Z = W^{\mathrm T}RU$, let $\sigma_i$ be the $i$-th diagonal element of $\Sigma$, and let $w_i$ and $u_i$, respectively, denote the $i$-th columns of $W$ and $U$. Since $Z$ is a product of orthogonal matrices, by the Cauchy–Schwarz inequality [11] the following is obtained:

$$Z_{ii} = w_i^{\mathrm T}Ru_i \leq \|w_i\|_2\,\|Ru_i\|_2 = 1. \qquad (19)$$

So formula (18) can be written as formula (20):

$$\mathrm{tr}(W^{\mathrm T}RU\Sigma) = \mathrm{tr}(Z\Sigma) = \sum_{i=1}^{b} Z_{ii}\sigma_i. \qquad (20)$$

Combining formula (19), when $Z_{ii} = 1$ for all $i$, i.e., $Z = W^{\mathrm T}RU = I$, formula (20) takes its maximum value $\sum_{i=1}^{b}\sigma_i$. For $W^{\mathrm T}RU = I$, we can get the following formula:

$$R = WU^{\mathrm T}. \qquad (21)$$

When $R = WU^{\mathrm T}$, formula (16) takes its maximum value and formula (15) takes its minimum value. As a result, we obtain the optimal rotation matrix by formula (21).
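An illustrative sketch of the rotation update (21) under the notation above, using random placeholder data: take the SVD of $\Phi^{\mathrm T}\bar{B}$ and set $R = WU^{\mathrm T}$. The check confirms that the updated rotation does not increase the objective (11).

```python
import numpy as np

def update_rotation(Phi, B_cascaded):
    """Optimal orthogonal rotation with the cascaded codes fixed (formula (21)):
    compute the SVD Phi^T B = U S W^T and return R = W U^T."""
    U, _, Wt = np.linalg.svd(Phi.T @ B_cascaded)
    return Wt.T @ U.T

rng = np.random.default_rng(0)
n, b = 200, 16
Phi = rng.normal(size=(n, b))                     # RFF feature matrix (placeholder)
B = np.where(rng.normal(size=(n, b)) > 0, 1, -1)  # cascaded +/-1 codes (placeholder)
R0, _ = np.linalg.qr(rng.normal(size=(b, b)))     # an arbitrary starting rotation
R = update_rotation(Phi, B)
before = np.linalg.norm(B - Phi @ R0.T) ** 2      # objective (11) with the old rotation
after = np.linalg.norm(B - Phi @ R.T) ** 2        # objective (11) with the updated rotation
print(after <= before, np.allclose(R @ R.T, np.eye(b)))
```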

3.4. Encoding Functions
3.4.1. Dataset Encoding

When the objective function value converges, we can obtain the mapping function of AVBH for the dataset according to formula (14). For a sample point $x$ in subset $X_i$, let $\Phi(x)$ be the random Fourier feature (RFF) obtained in the mapping stage and let $[\Phi(x)R^{\mathrm T}]^{(j)}$ denote the $j$-th block of $b_i$ components of $\Phi(x)R^{\mathrm T}$; then

$$h_i(x) = \mathrm{sgn}\Big(\sum_{j=1}^{b/b_i}\big[\Phi(x)R^{\mathrm T}\big]^{(j)}\Big).$$
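A sketch of this dataset-side mapping under the notation above: a point's RFF feature is rotated, split into $b/b_i$ blocks of width $b_i$, and the blocks are summed before taking signs. The sizes and data below are placeholders for illustration.

```python
import numpy as np

def encode_data_point(phi_x, R, b_i):
    """b_i-bit AVBH code of one data point: sign of the sum of the b/b_i blocks
    of the rotated RFF feature phi_x @ R.T (cf. formula (14))."""
    v = phi_x @ R.T                                    # length-b rotated feature
    blocks = v.reshape(len(v) // b_i, b_i)             # b/b_i blocks of width b_i
    return np.where(blocks.sum(axis=0) > 0, 1, -1)

rng = np.random.default_rng(0)
b, b_i = 16, 4
R, _ = np.linalg.qr(rng.normal(size=(b, b)))           # learned rotation (placeholder here)
phi_x = rng.normal(size=b)                             # RFF feature of one data point
print(encode_data_point(phi_x, R, b_i))                # a +/-1 code of length b_i
```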

3.4.2. Query Point Encoding

The optimal rotation matrix $R$ is obtained from the training process of dataset coding. For the data in the query set, the main goal of encoding is to keep as much accurate information as possible, so the query set encoding does not need to be compressed and is mapped to a hash code of the full length $b$. Combining formula (14), we can get the mapping function of AVBH for the query set:

$$h_q(q) = \mathrm{sgn}\big(\Phi(q)R^{\mathrm T}\big),$$

where $\Phi(q)$ is the RFF feature of the query point $q$.
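Correspondingly, an illustrative sketch of the query-side mapping: the query's RFF feature is rotated and binarized at the full length $b$, without block summation or compression. Names and data are placeholders.

```python
import numpy as np

def encode_query(phi_q, R):
    """Full b-bit query code: sign of the rotated RFF feature phi_q @ R.T."""
    return np.where(phi_q @ R.T > 0, 1, -1)

rng = np.random.default_rng(0)
b = 16
R, _ = np.linalg.qr(rng.normal(size=(b, b)))   # learned rotation (placeholder here)
phi_q = rng.normal(size=b)                     # RFF feature of one query point
print(encode_query(phi_q, R))                  # a +/-1 code of length b
```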

3.5. Convergence Analysis of AVBH

According to the objective function (8), the objective value $\|\Phi - \bar{B}R\|_F^2$ is nonnegative for any feasible encoding matrix $\bar{B}$ and orthogonal rotation matrix $R$; that is, the objective function is bounded below.

Therefore, the optimization of formula (11) is decomposed into two suboptimization problems, i.e., optimizing $\bar{B}$ with $R$ fixed and optimizing $R$ with $\bar{B}$ fixed. Specifically, for the first subproblem, formula (14) gives the optimal solution; therefore, it is guaranteed that the objective value after the update of formula (14) is less than or equal to the value obtained before. For the second subproblem, formula (21) gives the optimal solution; therefore, it is also guaranteed that the objective value after the update of formula (21) is less than or equal to the value obtained before.

Combining the two parts, the alternation of the updates (14) and (21) guarantees that the objective value after each update is less than or equal to the value before the update. Since the objective is also bounded below, we can conclude that the AVBH algorithm converges.
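To illustrate the convergence argument, the following sketch runs the alternating updates in the degenerate case where every point uses the full length $b$ (the ITQ-like setting mentioned in Section 3.2) and checks that the objective (11) never increases; the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 500, 16
Phi = rng.normal(size=(n, b))                     # RFF feature matrix (placeholder)
R, _ = np.linalg.qr(rng.normal(size=(b, b)))      # initial rotation

prev = np.inf
for _ in range(10):
    B = np.where(Phi @ R.T > 0, 1, -1)            # code update with R fixed (formula (14), full length)
    U, _, Wt = np.linalg.svd(Phi.T @ B)
    R = Wt.T @ U.T                                # rotation update with B fixed (formula (21))
    obj = np.linalg.norm(B - Phi @ R.T) ** 2      # quantization error (11)
    assert obj <= prev + 1e-9                     # the objective never increases
    prev = obj
print(prev)
```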

4. Experimental Results

The experiments were run on a computer with an Intel Core i5-2410M CPU and 8 GB DDR3 memory. We compared the performance of AVBH with that of several typical hashing methods: ACH [12], ITQ [19], KMH [20], PCAH [21], and LSH [13].

4.1. Experimental Datasets
4.1.1. CIFAR-10

It is a set of 60,000 32 × 32 colour images in 10 categories, with each category containing 6,000 images. In this experiment, 320-D GIST features were extracted for each image in the dataset. We randomly selected 1,000 images as the test data and the remaining 59,000 as the training data. In the training data, the closest 50 data points (based on the Euclidean distance) to a test data point were regarded as its nearest neighbours.

4.1.2. SIFT

It is a set of 1,000,000 128-D local SIFT feature vectors. 100,000 of these samples were randomly selected as the training data and 10,000 other samples as the test data.

4.1.3. GIST

It is a set of 1,000,000 960-D global GIST feature vectors. 500,000 of these samples were randomly selected as the training data and 1,000 samples as the test data.

4.2. Performance Evaluation

The performance of AVBH was evaluated mainly by the relationship between the query precision and the recall. We define

$$\text{precision} = \frac{\text{number of true nearest neighbours retrieved}}{\text{number of points retrieved}}, \qquad \text{recall} = \frac{\text{number of true nearest neighbours retrieved}}{\text{total number of true nearest neighbours}}.$$
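A small sketch (with hypothetical helper names) of how these two measures are computed for one query, given the retrieved points and the ground-truth nearest neighbours:

```python
def precision_recall(retrieved, ground_truth):
    """precision = true neighbours retrieved / points retrieved;
    recall = true neighbours retrieved / total true neighbours."""
    hits = len(set(retrieved) & set(ground_truth))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Toy example: 50 true neighbours, 10 of the 20 retrieved points are correct.
retrieved = list(range(20))
ground_truth = list(range(10, 60))
print(precision_recall(retrieved, ground_truth))   # (0.5, 0.2)
```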

For fairness, the average encoding length of AVBH was set to the (fixed) encoding length used by the other methods on a given dataset.

Figures 2–4 show the precision-recall curves of the compared methods on CIFAR-10, SIFT, and GIST with Euclidean nearest neighbour ground truth. Our method, AVBH, achieves better precision on the three datasets. As the AVBH algorithm uses variable bit codes, its total code length is smaller than that of the other algorithms. As a result, our method effectively reduces the storage cost and improves the query accuracy.

5. Conclusion

In this paper, an asymmetric learning to hash with variable bit encoding algorithm was proposed. Based on frequency statistics of the random Fourier feature codes of the dataset, we compress high-frequency hash codes into longer encoding representations and low-frequency hash codes into shorter encoding representations. For a query point, we quantize it to a long bit hash code and compare it with the cascade-concatenated data point codes of the same length to retrieve the nearest neighbours. This ensures that the original data information is preserved as much as possible while the data are compressed, which balances coding compression and query performance. Experiments on open datasets show that the proposed algorithm effectively reduces the storage cost and improves the query accuracy. As we use a two-stage framework for hash code generation, the training stage is time-consuming. In future work, we will work on simplifying the training process.

Data Availability

The datasets for the experiments in this paper are as follows. (1) CIFAR-10 (available at http://www.cs.toronto.edu/∼kriz/cifar.html): it is a set of 60,000 32 × 32 colour images in 10 categories, with each category containing 6,000 images. In this experiment, 320-D GIST features were extracted for each image in the dataset. We randomly selected 1,000 images as the test data and the remaining 59,000 as the training data. In the training data, the closest 50 data points (based on the Euclidean distance) to a test data point were regarded as its nearest neighbours. (2) SIFT (available at http://corpus-texmex.irisa.fr): it is a set of 1,000,000 128-D local SIFT feature vectors. 100,000 of these samples were randomly selected as the training data and 10,000 other samples as the test data. (3) GIST (available at http://corpus-texmex.irisa.fr): it is a set of 1,000,000 960-D global GIST feature vectors. 500,000 of these samples were randomly selected as the training data and 1,000 samples as the test data.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by Zhejiang NSF (Grant nos. LZ20F020001 and LY20F020009) and China NSF (Grant nos. 61472194 and 61572266) as well as programs sponsored by K. C. Wong Magna Fund in Ningbo University.