« Hufint » : différence entre les versions

De Lillois Fractale Wiki
Aller à la navigation Aller à la recherche
Contenu ajouté Contenu supprimé
Aucun résumé des modifications
Aucun résumé des modifications
Ligne 16 : Ligne 16 :
*integer values are likely to be small (example: number of child)
*integer values are likely to be small (example: number of child)
*integer values are allowed to be large (but rather seldom)
*integer values are allowed to be large (but rather seldom)
*integer statistical distribution is roughly an exponential decrease
*integer statistical distribution is decreasing
*encoding size is critical
*encoding size is critical
*speed for math operations on the values is not critical
*speed for math operations on the values is not critical

== Optimality ==

It may be shown (but this is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...

P(x=n) = 1/log(n)


== Hufint slice and base<br> ==
== Hufint slice and base<br> ==

Version du 22 janvier 2014 à 11:46

Purpose

A Hufint is a variable-length-coded integer, with an optimal compression perspective.

Hufint stands for Huffman integer, or Huffman coded integer.

The contract of this coding tool is:

  • allow to encode any positive integer (including 0) a a set of bits
  • allow a non-ambiguous decoding algorithm
  • use less bits for smaller integer
  • allow non-ambiguous encoding/decoding for arrays of integers.

Hufint are used in the Matscape projects, wherever

  • integer values are likely to be small (example: number of child)
  • integer values are allowed to be large (but rather seldom)
  • integer statistical distribution is decreasing
  • encoding size is critical
  • speed for math operations on the values is not critical

Optimality

It may be shown (but this is left to a brave subject) that the Hufint coding is optimal when integer densities are distributed as the inverse of the logarithm of their values...

P(x=n) = 1/log(n)

Hufint slice and base

The Hufint slice is the slice size for the Hufint bitset representation.

2slice-1 is then called Hufint base.

Matscape uses Hufint-6 representations (Hufint slice = 6, Hufint base = 32).

  • numbers from 0 to base-1 (31) are represented on 6 bits
  • numbers from base (32) to base2-1 (1023) are represented on 12 bits
  • numbers from base2 (1024) to base3-1 (32767) are represented on 18 bits
  • etc...

Algorithm

The algorithm is rather starightforward but not documented here.

At least two public functions (in a java lib) are defined:

int f(BitSet) /* conversion of bitSet into a positive integer - decoding */

BitSet f(int) /* conversion of a positive integer into bitSet - encoding */ 

Similar functions are available for arrays of integers.
Various other convenience functions aredefined for reporting and checking.