« Hufint » : différence entre les versions
(Page créée avec « == Purpose == A Hufint is a variable-length-coded integer. The contract of this coding tool is: - allow to encode any positive integer (including 0) a a se of bit - allow... ») |
Aucun résumé des modifications |
||
(14 versions intermédiaires par le même utilisateur non affichées) | |||
Ligne 1 : | Ligne 1 : | ||
== Purpose == |
== Purpose == |
||
A Hufint is a variable-length-coded integer. |
A Hufint is a variable-length-coded integer, with an optimal compression perspective. |
||
Hufint stands for Huffman integer, or Huffamn coded integer. It is related to the general [http://en.wikipedia.org/wiki/Huffman_coding Huffman coding] method. |
|||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
*allow a non-ambiguous decoding algorithm |
|||
⚫ | |||
*allow non-ambiguous encoding/decoding for arrays of integers.<br> |
|||
== Use == |
|||
⚫ | |||
Hufint are used in the |
Hufint are used in the Matscape projects, wherever |
||
*integer values are likely to be small (example: number of child in a chess tree) |
|||
⚫ | |||
*integer statistical distribution is decreasing |
|||
⚫ | |||
⚫ | |||
[[Anastomochess|Anastomochess]] is a project where hufints might help a lot. |
|||
⚫ | |||
== Optimality == |
|||
⚫ | |||
It may be shown (but the demonstration is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values... |
|||
⚫ | |||
<blockquote> |
|||
P(n) = k/log(n) |
|||
</blockquote> |
|||
It may also be tested experimentally. |
|||
== Hufint base == |
== Hufint slice and base<br> == |
||
The Hufint |
The Hufint slice is the slice size for the Hufint bitset representation.<br> |
||
2<sup>slice-1</sup> is then called Hufint base. <br> |
|||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
*numbers from base (32) to base<sup>2</sup>-1 (1023) are represented on 12 bits <br> |
|||
*numbers from base<sup>2</sup> (1024) to base<sup>3</sup>-1 (32767) are represented on 18 bits |
|||
*etc... |
|||
== Algorithm == |
|||
The algorithm is rather starightforward but not documented here. |
|||
At least two public functions (in a java lib) are defined: |
|||
<pre>int f(BitSet) /* conversion of bitSet into a positive integer - decoding */ |
|||
BitSet f(int) /* conversion of a positive integer into bitSet - encoding */ </pre> |
|||
Similar functions are available for arrays of integers.<br>Various other convenience functions aredefined for reporting and checking.<br> |
|||
<br> |
Dernière version du 22 janvier 2014 à 12:05
Purpose
A Hufint is a variable-length-coded integer, with an optimal compression perspective.
Hufint stands for Huffman integer, or Huffamn coded integer. It is related to the general Huffman coding method.
The contract of this coding tool is:
- allow to encode any positive integer (including 0) a a set of bits
- allow a non-ambiguous decoding algorithm
- use less bits for smaller integer
- allow non-ambiguous encoding/decoding for arrays of integers.
Use
Hufint are used in the Matscape projects, wherever
- integer values are likely to be small (example: number of child in a chess tree)
- integer values are allowed to be large (but rather seldom)
- integer statistical distribution is decreasing
- encoding size is critical
- speed for math operations on the values is not critical
Anastomochess is a project where hufints might help a lot.
Optimality
It may be shown (but the demonstration is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...
P(n) = k/log(n)
It may also be tested experimentally.
Hufint slice and base
The Hufint slice is the slice size for the Hufint bitset representation.
2slice-1 is then called Hufint base.
Matscape uses Hufint-6 representations (Hufint slice = 6, Hufint base = 32).
- numbers from 0 to base-1 (31) are represented on 6 bits
- numbers from base (32) to base2-1 (1023) are represented on 12 bits
- numbers from base2 (1024) to base3-1 (32767) are represented on 18 bits
- etc...
Algorithm
The algorithm is rather starightforward but not documented here.
At least two public functions (in a java lib) are defined:
int f(BitSet) /* conversion of bitSet into a positive integer - decoding */ BitSet f(int) /* conversion of a positive integer into bitSet - encoding */
Similar functions are available for arrays of integers.
Various other convenience functions aredefined for reporting and checking.