« Hufint » : différence entre les versions
Aucun résumé des modifications |
Aucun résumé des modifications |
||
Ligne 20 :
*speed for math operations on the values is not critical
== Optimality ==
It may be shown (but this is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...
P(
== Hufint slice and base<br> ==
|
Version du 22 janvier 2014 à 11:46
Purpose
A Hufint is a variable-length-coded integer, with an optimal compression perspective.
Hufint stands for Huffman integer, or Huffman coded integer.
The contract of this coding tool is:
- allow to encode any positive integer (including 0) a a set of bits
- allow a non-ambiguous decoding algorithm
- use less bits for smaller integer
- allow non-ambiguous encoding/decoding for arrays of integers.
Hufint are used in the Matscape projects, wherever
- integer values are likely to be small (example: number of child)
- integer values are allowed to be large (but rather seldom)
- integer statistical distribution is decreasing
- encoding size is critical
- speed for math operations on the values is not critical
Optimality
It may be shown (but this is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...
P(n) = k/log(n)
Hufint slice and base
The Hufint slice is the slice size for the Hufint bitset representation.
2slice-1 is then called Hufint base.
Matscape uses Hufint-6 representations (Hufint slice = 6, Hufint base = 32).
- numbers from 0 to base-1 (31) are represented on 6 bits
- numbers from base (32) to base2-1 (1023) are represented on 12 bits
- numbers from base2 (1024) to base3-1 (32767) are represented on 18 bits
- etc...
Algorithm
The algorithm is rather starightforward but not documented here.
At least two public functions (in a java lib) are defined:
int f(BitSet) /* conversion of bitSet into a positive integer - decoding */ BitSet f(int) /* conversion of a positive integer into bitSet - encoding */
Similar functions are available for arrays of integers.
Various other convenience functions aredefined for reporting and checking.