En3ple

From Assothink Wiki

Introduction

The en3/en3ple logic and format offer one possible way to formalize the full knowledge present in the passive jelly of Assothink.

Following several projects based on data produced by DBpedia and Freebase, this approach is being considered by the Assothink development team (May 2013).

The new class would be mscp.structure.en3

Benefits

The possible benefits concern:

  • readability
  • standards
  • extensibility
  • integrability of external resources

Limitations

The en3ple format is not optimal in terms of:

  • memory consumption
  • startup time
  • runtime efficiency

Definitions: en3, en3ple, en3file, en3format

An en3 object may represent most Assothink entities:

  • concepts of any category
  • percepts
  • variants
  • keys
  • languages
  • keysets
  • link predicates
  • definitions
  • textual examples

...

An en3ple is a triple (subject/predicate/object) linking three en3s. En3ples may be used to represent:

  • all language data
  • all data imported from any LACS
  • qualified links of the passive jelly
  • fuzzy links of the passive jelly

Any given en3 is defined only by:

  • the set of en3ples in which it is involved
  • its key
  • an optional content
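A minimal Java sketch of these two definitions (Java 16+, since it uses a record). The class name En3 echoes the proposed mscp.structure.en3, but the package is omitted, and the record En3ple and its helper involving are hypothetical names. The en3 stores nothing but its key and optional content; everything else known about it is recovered from the en3ples it takes part in. Identity comparison assumes one En3 object per key, as the global key map of the data model suggests.

```java
import java.util.List;
import java.util.Objects;

/** An en3 holds only a key and an optional content (null = no content). */
final class En3 {
    final String key;
    final String content;

    En3(String key, String content) {
        this.key = Objects.requireNonNull(key);
        this.content = content;
    }

    @Override
    public String toString() { return key; }
}

/** An en3ple links three en3s as subject, predicate and object (hypothetical record). */
record En3ple(En3 subject, En3 predicate, En3 object) {
    En3ple {
        Objects.requireNonNull(subject);
        Objects.requireNonNull(predicate);
        Objects.requireNonNull(object);
    }

    /** Identity check: assumes exactly one En3 object exists per key. */
    boolean involves(En3 e) {
        return subject == e || predicate == e || object == e;
    }

    /** Apart from key and content, an en3 is described by the en3ples it takes part in. */
    static List<En3ple> involving(En3 e, List<En3ple> all) {
        return all.stream().filter(t -> t.involves(e)).toList();
    }
}
```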

An en3File is a file containing en3ples, one per line. The file is read sequentially, and its format is the en3Format defined below.

All en3Files are stored in a single directory, AlexBase/en3.

Usage

The resource building process may be reorganized into two steps.

Step 1: accumulation/production of en3Files (and nothing else).

Step 2: transformation of the generated en3Files into pk files, according to the data management rules previously used within Assothink.

After these two steps, the software chain used in Assothink remains unchanged.

En3Format specification summary

The lines of an en3File are ordered, since some specifiers refer to previously loaded lines.

An en3 key is a string containing only alphabetic characters (including accented ones), numeric characters, and the characters "#", ":" and "_".

The '#' in a key is reserved for randomly generated unique keys.
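The key rules above can be checked with a regular expression. This is a sketch under assumptions: \p{L} stands in for "alphabetic (including accented chars)" and matches precomposed accented letters only, not letters followed by separate combining marks; \p{Nd} stands in for "numeric"; and the helper class En3Keys is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper checking en3 keys against the en3Format character rules. */
final class En3Keys {
    // Assumption: \p{L} = any Unicode letter (precomposed accented letters included),
    // \p{Nd} = decimal digit, plus the three extra characters '#', ':' and '_'.
    private static final Pattern KEY = Pattern.compile("[\\p{L}\\p{Nd}#:_]+");

    private En3Keys() {}

    /** True if the key is non-empty and uses only the allowed characters. */
    static boolean isValidKey(String s) {
        return s != null && KEY.matcher(s).matches();
    }

    /** True if a hand-written key is valid and avoids the '#' reserved for generated keys. */
    static boolean isValidUserKey(String s) {
        return isValidKey(s) && s.indexOf('#') < 0;
    }
}
```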

The line format is subject|predicate|object.

The field separator is '|'.
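A sketch of sequential reading of an en3File into raw, not yet resolved, specifier triples. It assumes content never contains '|' (the summary defines no escaping) and treats blank lines as errors; the class En3LineReader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader turning en3File lines into three raw specifiers each. */
final class En3LineReader {
    private En3LineReader() {}

    /** Splits "subject|predicate|object" into its three non-empty specifiers. */
    static String[] splitLine(String line, int lineNumber) {
        // limit -1 keeps trailing empty fields, so "a|b|" is rejected instead of shortened
        String[] fields = line.split("\\|", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("line " + lineNumber + ": expected 3 fields, got " + fields.length);
        }
        for (String f : fields) {
            if (f.isEmpty()) {
                throw new IllegalArgumentException("line " + lineNumber + ": empty specifier");
            }
        }
        return fields;
    }

    /** Reads all lines in file order; order matters because lines may refer to earlier ones. */
    static List<String[]> readAll(Reader in) throws IOException {
        List<String[]> triples = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int n = 0;
            while ((line = reader.readLine()) != null) {
                n++;
                triples.add(splitLine(line, n));
            }
        }
        return triples;
    }
}
```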

The three en3 specifiers (subject, predicate and object) are non-empty strings, interpreted as follows:

  • =! : a new en3 with a random-unique key (the key format is #nnn) and no content.
  • == : the same en3 as in the corresponding field of the last loaded en3ple line (usable in any of the three fields).
  • =?=p=o= , =s=?=o= or =s=p=?= : the one and only en3 matching the pattern in previously defined en3ples, where ? marks the unknown position. If there is no match, or more than one, an exception is thrown.
  • =!content : a new en3 with a random-unique key and a content equal to the content part of the specifier. Lengthy content is welcome.
  • anything not starting with '=' : an en3 (created if not yet defined) with the specifier value used as its key, and no content.
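The specifier rules can be sketched as a small loader that consumes lines in file order. Assumptions to note: generated keys come from a sequential counter (#1, #2, ...) rather than a random generator, which keeps them unique but not random; the exception types are arbitrary choices; and En3Loader with its methods are hypothetical names, not the mscp.structure API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal en3: a key and an optional content (null = none); one object per key. */
final class En3 {
    final String key;
    final String content;
    En3(String key, String content) { this.key = key; this.content = content; }
    @Override public String toString() { return key; }
}

/** One loaded line: subject, predicate, object. */
record En3ple(En3 s, En3 p, En3 o) {
    En3 get(int field) { return field == 0 ? s : field == 1 ? p : o; }
}

/** Hypothetical loader applying the en3Format specifier rules line by line. */
final class En3Loader {
    private final Map<String, En3> byKey = new HashMap<>();  // global key -> en3 map
    private final List<En3ple> loaded = new ArrayList<>();
    private En3ple last;        // previously loaded line, target of "=="
    private long counter = 0;   // assumption: sequential #nnn keys instead of random ones

    En3 byKey(String key) { return byKey.get(key); }

    /** Loads one "subject|predicate|object" line; lines must be fed in file order. */
    En3ple load(String line) {
        String[] f = line.split("\\|", -1);
        if (f.length != 3) throw new IllegalArgumentException("expected 3 fields: " + line);
        En3[] r = new En3[3];
        // Resolve all three fields before recording the line, so "==" and queries see only earlier lines.
        for (int i = 0; i < 3; i++) r[i] = resolve(f[i], i);
        En3ple t = new En3ple(r[0], r[1], r[2]);
        loaded.add(t);
        last = t;
        return t;
    }

    private En3 resolve(String spec, int field) {
        if (spec.isEmpty()) throw new IllegalArgumentException("empty specifier");
        if (spec.equals("==")) {
            if (last == null) throw new IllegalStateException("'==' used on the first line");
            return last.get(field);                          // same field of the previous line
        }
        if (spec.startsWith("=!")) {                         // "=!" alone means no content
            return fresh(spec.length() > 2 ? spec.substring(2) : null);
        }
        if (spec.startsWith("=")) return query(spec);        // =?=p=o=, =s=?=o= or =s=p=?=
        return byKey.computeIfAbsent(spec, k -> new En3(k, null));  // plain key
    }

    private En3 fresh(String content) {
        En3 e = new En3("#" + (++counter), content);
        byKey.put(e.key, e);
        return e;
    }

    /** Returns the single en3 filling the '?' position among previously loaded en3ples. */
    private En3 query(String spec) {
        String[] k = spec.length() > 2 && spec.endsWith("=")
                ? spec.substring(1, spec.length() - 1).split("=", -1)
                : new String[0];
        int unknown = -1, marks = 0;
        for (int i = 0; i < k.length; i++) {
            if (k[i].equals("?")) { unknown = i; marks++; }
        }
        if (k.length != 3 || marks != 1) throw new IllegalArgumentException("bad query: " + spec);
        Set<En3> matches = new LinkedHashSet<>();
        for (En3ple t : loaded) {
            boolean ok = true;
            for (int i = 0; i < 3; i++) {
                if (i != unknown && !t.get(i).key.equals(k[i])) ok = false;
            }
            if (ok) matches.add(t.get(unknown));
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(matches.size() + " matches for " + spec);
        }
        return matches.iterator().next();
    }
}
```

Feeding it cat|isA|animal and then ==|label|=!the cat reuses cat as subject and creates a #1 en3 holding the text; a later =?=isA=animal= resolves back to cat. The keys label and isA are only illustrative.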

Data model

The data model used in the mscp.structure.entry class includes:

  • a global HashMap<String,en3> linking keys to en3 objects
  • key (1 per en3)
  • contents (for some en3)
  • many HashMap<en3,HashSet<en3>> (probably 6 per en3, but none for en3s having content).
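The page does not say which six maps are meant. One plausible reading, used in this hypothetical sketch, is one index per role of the en3 and per remaining field: as subject it maps predicates to objects and objects to predicates, and likewise for its roles as predicate and as object. Following the note above, en3s with content get no maps; lookups then fall back on the predicate's index. En3Store and all field names are illustrative.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** en3 with six assumed role indexes; en3s that carry content are left unindexed. */
final class En3 {
    final String key;
    final String content;  // null when the en3 has no content
    // Assumed reading of "6 per en3": role of this en3 x remaining field.
    final Map<En3, Set<En3>> asSubjectByPredicate;  // predicate -> objects
    final Map<En3, Set<En3>> asSubjectByObject;     // object -> predicates
    final Map<En3, Set<En3>> asPredicateBySubject;  // subject -> objects
    final Map<En3, Set<En3>> asPredicateByObject;   // object -> subjects
    final Map<En3, Set<En3>> asObjectBySubject;     // subject -> predicates
    final Map<En3, Set<En3>> asObjectByPredicate;   // predicate -> subjects

    En3(String key, String content) {
        this.key = key;
        this.content = content;
        boolean indexed = content == null;
        asSubjectByPredicate = indexed ? new HashMap<>() : null;
        asSubjectByObject = indexed ? new HashMap<>() : null;
        asPredicateBySubject = indexed ? new HashMap<>() : null;
        asPredicateByObject = indexed ? new HashMap<>() : null;
        asObjectBySubject = indexed ? new HashMap<>() : null;
        asObjectByPredicate = indexed ? new HashMap<>() : null;
    }

    /** Adds value to the set under k; does nothing for an unindexed (content) en3. */
    static void put(Map<En3, Set<En3>> index, En3 k, En3 value) {
        if (index != null) index.computeIfAbsent(k, x -> new HashSet<>()).add(value);
    }
}

/** Global key map plus maintenance of the six indexes (hypothetical). */
final class En3Store {
    final Map<String, En3> byKey = new HashMap<>();

    En3 get(String key) { return byKey.computeIfAbsent(key, k -> new En3(k, null)); }

    En3 create(String key, String content) {
        En3 e = new En3(key, content);
        byKey.put(key, e);
        return e;
    }

    /** Records (s, p, o) in every index that exists for the three en3s. */
    void add(En3 s, En3 p, En3 o) {
        En3.put(s.asSubjectByPredicate, p, o);
        En3.put(s.asSubjectByObject, o, p);
        En3.put(p.asPredicateBySubject, s, o);
        En3.put(p.asPredicateByObject, o, s);
        En3.put(o.asObjectBySubject, s, p);
        En3.put(o.asObjectByPredicate, p, s);
    }

    /** Objects o such that (s, p, o) was added; uses p's index when s carries content. */
    static Set<En3> objects(En3 s, En3 p) {
        if (s.asSubjectByPredicate != null) return s.asSubjectByPredicate.getOrDefault(p, Set.of());
        if (p.asPredicateBySubject != null) return p.asPredicateBySubject.getOrDefault(s, Set.of());
        return Set.of();
    }
}
```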

Limitations

The en3 / en3ple model is well suited to binary relations (true or false).

Indeed, most relations provided by LACS are binary.

But Assothink has to manage fuzzy relations too.

For instance:

  • links between concepts and words
  • fuzzy links between loosely related concepts

For these reasons, the en3 / en3ple model has to be extended into a fuzzy en3 / en3ple model.
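The fuzzy model is not defined on this page. Purely to illustrate the idea, a fuzzy en3ple could carry a degree in [0, 1], with the binary en3ple as the special case of degree 0 (false) or 1 (true). The record FuzzyEn3ple, its string fields and the degree name are hypothetical.

```java
import java.util.Objects;

/** Hypothetical fuzzy en3ple: a triple of keys plus a membership degree in [0, 1]. */
record FuzzyEn3ple(String subject, String predicate, String object, double degree) {
    FuzzyEn3ple {
        Objects.requireNonNull(subject);
        Objects.requireNonNull(predicate);
        Objects.requireNonNull(object);
        if (!(degree >= 0.0 && degree <= 1.0)) {  // this form also rejects NaN
            throw new IllegalArgumentException("degree must be in [0, 1]: " + degree);
        }
    }

    /** A binary en3ple seen as a fuzzy one: the link is fully true. */
    static FuzzyEn3ple crisp(String subject, String predicate, String object) {
        return new FuzzyEn3ple(subject, predicate, object, 1.0);
    }

    /** True when the degree is exactly false (0) or true (1), i.e. a binary relation. */
    boolean isCrisp() { return degree == 0.0 || degree == 1.0; }
}
```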