Fixed- and Variable-Length Data Compression at Finite Blocklength
Presenter
April 13, 2015
Keywords:
- Compression
MSC:
- 47A20
Abstract
In data compression, a block of source symbols is compressed down into a smaller length string, while ensuring that the original block can be recovered from that encoded string, either exactly or approximately. Identifying the optimum rate-fidelity tradeoffs achievable in principle, regardless of the complexity of the code, is the subject of information-theoretic interest in data compression.
Classical information theory characterizes rate-fidelity tradeoffs achievable in the limit of infinite length of the source block to be compressed. A perennial question in information theory is how relevant the asymptotic fundamental limits are when the communication system is forced to operate at a given fixed blocklength. The finite blocklength (delay) constraint is inherent to all communication scenarios. In fact, in many systems of current interest, such as real-time multimedia communication, delays are strictly constrained, while in packetized data communication, packets are frequently on the order of 1000 bits. Despite an obvious practical interest in the nonasymptotic fundamental limits, up until the 2010 award-winning work of Polyanskiy et al., such limits were widely believed to defy analysis.
We show that both in fixed- and variable-length data compression, the nonasymptotic fundamental limit can be closely approximated by a function of just two parameters of the source, namely, the rate-distortion function (Shannon’s asymptotic limit), and the rate-dispersion function, a new parameter of the source that characterizes the nonasymptotic behavior of the minimal achievable rate.
In block-to-block data compression, a source block of k symbols is to be compressed down to n symbols. We study the tradeoff between k, n and the achievable fidelity of reproduction. For memoryless sources, we show that there is a penalty over the rate-distortion function induced by the finite blocklength requirement, and the size of that penalty is closely governed by the rate-dispersion function. Consequently, in block data compression, the rate-dispersion function measures the speed of approach to the rate-distortion function for a given source.
In fixed-to-variable-length data compression, a block of k source outcomes is represented by a variable-length binary string. The nonasymptotic fundamental limit of interest is the minimum average length of the encoded string compatible with a given fidelity. Perhaps surprisingly, the behaviors of fundamental limits in fixed- and variable-length settings are starkly different. First of all, in the variable-length setting, the strong converse does not hold, and allowing a nonvanishing, however small, excess distortion probability, one can asymptotically beat the rate-distortion function. Moreover, this asymptotic limit is approached from below , i.e. the shorter the blocklength the lower the achievable average rate! Still, like in fixed-length data compression, the rate-dispersion function is the only characteristic of the source, which, together with the rate-distortion function, gives an accurate description of that counter-intuitive behavior.