"Concept Word Variant": difference between revisions

From Lillois Fractale Wiki
 
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
This page is part of the [[Alex II|Alex II]] desciption.
This page is part of the [[Alex II|Alex II]] description.


It describes 4 levels of perception and 3 kinds of nodes:
Line 7: Line 7:
*Variant (node)
*Input

= Classification =


== Concept ==
Line 64: Line 66:


A variant is described by its first three fields. It is also indexed on the combination of these three fields.

== Input ==

The input is the 4th level of perception. An input is NOT translated into an input node (Alex does not use nodes for input).

Instead an input (for instance a word input) is transformed into a set of lexically close variants.

This transformation uses standard lexical metrics, like the Levenshtein distance - and maybe something better when available.

In any case, the input consists of the signals received by the sensors made available to Alex. The first and most convenient sensor is a human being typing words on a keyboard.


== Auxiliary notions: language, lexical category, formal variant<br> ==
Line 73: Line 85:
*'''Formal variant''': plural, feminine, conjugated form... The formal variant is an attribute of a variant node.<br>


= Analysis =
== Link classification<br> ==


== Link classification<br> ==
[[Link_classification|Links are classified]] according to the level of the connected nodes and according to their own nature (qualified or not).<br>


[[Link classification|Links are classified]] according to the level of the connected nodes and according to their own nature (qualified or not).<br>
<span style="color: rgb(255, 102, 0);">A simple link ties 2 concepts together in a fuzzy way.
</span>


== Identifying concepts, words and variants<br> ==
<span style="color: rgb(255, 102, 0);">A connection link ties 2 concepts together in a specialized way, under the meaning of a third concept, which is a connecting concept.
</span>

<span style="color: rgb(255, 102, 0);">From an O-O point of view, only one class exists: the link. A bipolar link is a tripolar link whose connecting concept has been left undefined (null).
</span>

<span style="color: rgb(255, 102, 0);">Tripolar links form a sub-population (and not a subclass) of the links.
</span>

<span style="color: rgb(255, 102, 0);">In all cases, and as in Alex I, links are also characterized by the following attributes:
</span>

*<span style="color: rgb(255, 102, 0);">two permeabilities - scalar quantities -, one for each direction of traversal.
</span>
*<span style="color: rgb(255, 102, 0);">two signals - transient values used to propagate excitation states -, one for each direction of traversal.</span><br>
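
As a rough illustration of the single-class design described above, here is a minimal Python sketch. The class name `Link`, its field names and the concept labels in the usage lines are assumptions made for this example; they do not come from the Alex code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """One class for all links; 'tripolar' is a sub-population, not a subclass."""
    source: str                       # first linked concept (hypothetical label)
    target: str                       # second linked concept (hypothetical label)
    connector: Optional[str] = None   # connecting concept; None means bipolar
    permeability_forward: float = 0.0   # scalar, source -> target
    permeability_backward: float = 0.0  # scalar, target -> source
    signal_forward: float = 0.0         # transient value, source -> target
    signal_backward: float = 0.0        # transient value, target -> source

    def is_tripolar(self) -> bool:
        # A bipolar link is simply a link whose connector was left undefined.
        return self.connector is not None


simple = Link("rose", "flower", permeability_forward=0.7, permeability_backward=0.2)
qualified = Link("rose", "flower", connector="is-a")
```

Keeping a single class means that a bipolar link can later become tripolar just by setting its connector, without changing its type.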

== Identifying concepts, words and variants<br> ==


In order to identify a '''variant''' the following is used: string|language|formal-variant.
Line 113: Line 106:
*rose|F|N&nbsp;&nbsp;&nbsp;&nbsp; (N=Noun)
*rose|F|A&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (A=Adjective)
*rose|E|N&nbsp;&nbsp;&nbsp;&nbsp; (E=English N=noun)
*pink|E|A &nbsp; &nbsp; (E=English A=adjective)


Generally a '''concept''' cannot be identified, except by its links&nbsp; ([[Concepts_classification|significant exceptions exist]]).
Generally a '''concept''' cannot be identified, except by its links&nbsp; ([[Concepts classification|significant exceptions exist]]).

== Evocation and manifestation ==


== Evocation and manifestation ==
Concepts cannot be named, but various concepts can be '''''evoked''''' - in a fuzzy way - via the words or variants that are directly or indirectly linked to them. A set of words and variants directly evokes a set of concepts, with various weighting factors, and indirectly... many interconnected things...


It is not possible to name a concept, but it is possible to '''''evoke''''' - in a fuzzy way - a concept via a set of linked words or via a set of nodes linked thru qualified links (QC links). More generally, a ''set'' of words and variants '''''evokes''''' a set of concepts with various weighting factors, and indirectly from there... many linked concepts...
Moreover, when it communicates and presents itself (in excitation states, for example), a concept '''''manifests''''' itself thru certain words that are linked to it. The choice of words used when the concept manifests itself may depend on the configuration of the excitations at that moment.


Conversely, when a concept presents itself for communication (in some excitation state), it will be presented, '''''manifested''''' thru various words: the words linked to it thru CW links, and the words linked to concepts linked to it thru QC&nbsp;links. The choice of words used for this '''''manifestation''''' may depend on the current excitation state of an active conscience or focus.
== Five layers of evocation/manifestation<br> ==


== Five levels of evocation/manifestation<br> ==
The '''5 EM layers'''&nbsp;are:<br>


The '''5 levels of EM''' are:<br>
#input&nbsp;: any character string (typed on a keyboard or generated by a sensory channel)<br>
#u-variant (unqualified variant)&nbsp;: the set of known word variants, organized by language, with no further properties (1 HashSet per language) <br>
#q-variant (qualified variant)&nbsp;: the set of known word variants, organized by language, with the verbal variant specified (1 HashSet per language, with the verbal variant specification in the key)<br>
#word&nbsp;: the set of base words, roughly the usual notion of a dictionary (1 HashSet per language, with the formal variant specification in the key).<br>
#concept&nbsp;: the concept (no HashSet, no key).


#input&nbsp;: any character string (entered using a keyboard or introduced by an input channel, such as a book reader). Language reference may be absent.<br>
Upward process ('''evocation'''):
#u-variant (unqualified variant)&nbsp;: the set of known variants, per language, with no further properties (1 HashSet per language, no variant form in the key) <br>
#q-variant (qualified variant)&nbsp;:&nbsp;the set of known variants, per language, with the property of the variant form (1 HashSet per language, variant form present in the key)<br>
#word&nbsp;: the set of known word nodes, roughly the usual ''dictionary'' (1 HashSet per language, with the lexical category as part of the key).<br>
#concept&nbsp;: the set of known concepts (no HashSet, no indexation key&nbsp;!).


Upward process ('''evocation'''):
*1-2: distance between words&nbsp;: lousse evokes louse(E) lousy(E) loss (E) pousse (F) mousse(F) loupe (F).... Sometimes a variant is an exact match (distance 0) in one or several languages (plane(E) and plane(F)), but this does not prevent the evocation of variants at non-zero distance.
*2-3: each u-variant holds a HashSet containing the q-variants that share its spelling.
*3-4: each q-variant is linked to exactly 1 word
*4-5: each word is linked to various concepts, and contains a set of fuzzy links tying it to them


*1-2: string distance analysis&nbsp;: ''lousse'' evokes ''louse''(E) ''lousy''(E) ''loss'' (E) ''pousse'' (F) ''mousse''(F) ''loupe'' (F).... Sometimes there is a perfect match with a variant (distance 0) in one or many languages (plane(E) and plane(F)), but this does not prevent the possible evocation of other variants with non-zero distance.
Downward process ('''manifestation''')
*2-3: any u-variant contains the set of the q-variants sharing the same string.
*3-4: any q-variant is linked to exactly 1 word node.
*4-5: any word is linked to various concepts, and contains the set of fuzzy links leading to them.<br>


Downward process ('''manifestation''')
Manifestation always takes place within a communication context (choice of language) and within a conscience context.


Manifestation starts from a context and goes down to the word(s), sometimes further down to the q-variants.
The manifestation generally occurs within a communication context (language preference, maybe more) and within a conscience/focus context.


Manifestation sometimes starts from a single concept, but most often from all the concepts significantly excited in the conscience.
The manifestation starts from a concept and generally goes down to words, sometimes down to variants (q-variants).<br>


Manifestation targets several words, of which only the best-linked one(s) will be expressed.
The manifestation may start from one single concept, but more often it will start from all concepts excited within a focus.


The manifestation finds various more or less likely words, but only emits the most likely.
Each concept contains a set of links of variable permeability to various words.


This is based on the permeabilities between concepts and words.
Manifestation is a selective weighting process.


The manifestation is a process of weighted selection, with restricted emission.
== Building the network ==


== Building the nodes and the jelly ==
The RRR is built in stages:


There is a [[jelly construction|sequential construction of the jelly]] (nodes and links), completely independent from the conscience and focus.
#The set of variants and the set of words are built, language by language, from specialized dictionaries. Dictionaries can be downloaded from the Internet under the name ''corpus''. For French, a corpus is available on the site [http://abu.cnam.fr/ abu.cnam.fr].
#The links from variants to words are built. This is done by a Java program integrated into Alex.
#Various connectors (tripolar links) are defined from configuration files.
#Based on the same configuration files, the concepts and words linked by these connectors are created, along with the concept-word links and the tripolar links.
#... [to be developed]


== Limits ==
== Limits ==


Independently of the general limits of the Alex project, the correspondence between variants, words and concepts still raises various problems.
Besides the global limits of the Alex project, the links between variants, words and concepts still show various problems and limits.


One of them is the issue of expressions. An expression is a combination of words having together a concept-level meaning (evocation) different from the concept-level meaning(s) raised when the involved words are individually used for the evocation process. Thus an expression implies a specific concept-level linking.
One of them is that of '''expressions'''. An expression is a combination of words whose meaning is distinct from the one implied by the words taken alone. An expression therefore has a link to a concept of its own.


French examples&nbsp;: ''en avoir marre&nbsp;; virage en épingle à cheveux&nbsp;; bouc émissaire&nbsp;; tête de turc...''
Examples in French&nbsp;: ''en avoir marre&nbsp;; virage en épingle à cheveux&nbsp;; bouc émissaire&nbsp;; tête de turc...''

Latest revision as of 18:46, 28 February 2010

This page is part of the Alex II description.

It describes 4 levels of perception and 3 kinds of nodes:

  • Concept (node)
  • Word (node)
  • Variant (node)
  • Input

Classification

Concept

By itself a concept has no name (and no indexation key).

However some concepts receive conventional names, and enter into some classification. Among the concepts, some play a specific role: the connecting concepts.

At the conscience/focus level, concepts receive excitation figures.

A concept mainly exists thru the links it has with word nodes (CW links), and thru the qualified links it has with other concepts (QC links).

Actually the concept level is the only level with significant inter-node connectivity.

Words

Word nodes make the second level of nodes.

Words are namable, and indexable.

In OO logic, a word is an object whose attributes are:

  • a string of characters
  • a language identifier
  • one or many lexical categories (like verb or noun... see below)
  • links to concepts (CW links)
  • links to variants (WV links)

A word is not directly linked to other words. For instance, the translation process in Alex II always goes (up and down) thru the concept level.

A word never receives excitation in a conscience or focus.

The indexation key contains the combination of the first 3 fields:

  • string
  • language
  • lexical category

Thus orange (in French) and orange (in English) are distinct words, because they belong to different languages. Similarly, noyer (verb, French) and noyer (noun, French) are distinct words, because they belong to different lexical categories.
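
As an illustration only, the word index can be sketched with a Python dictionary keyed on the three fields. The names `WordKey`, `word_index` and `add_word`, and the one-letter codes, are assumptions made for this sketch; the page itself only says that Alex keeps one HashSet per language.

```python
from typing import NamedTuple


class WordKey(NamedTuple):
    string: str
    language: str   # assumed codes: "F" = French, "E" = English
    lexical: str    # assumed codes: "N" = noun, "V" = verb, "A" = adjective


word_index = {}


def add_word(string, language, lexical):
    """Return the word node for this key, creating it only if it is new."""
    key = WordKey(string, language, lexical)
    return word_index.setdefault(key, {"key": key, "cw_links": [], "wv_links": []})


add_word("orange", "F", "N")
add_word("orange", "E", "N")   # same string, other language: distinct word
add_word("noyer", "F", "V")
add_word("noyer", "F", "N")    # same string, other lexical category: distinct word
add_word("noyer", "F", "N")    # same key again: no new node is created
```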

Variant

The variant is the object found in everyday spoken language. It is an obvious and usual object.

A variant is linked to 1 and only 1 word, and it does not exist without this link.

Variant objects mainly include conjugated forms, plural forms and feminine forms. Horses is a variant of horse. In French, coquettes is a variant of coquet. Seen is a variant of see. Etc.

A variant may be built from a string that also exists as a word string. Thus, in French, été (conjugated form, variant of être) exists in the variant universe, while été (noun) exists in the word universe.

In OO logic, a variant is an object whose attributes are:

  • a string of characters
  • a language identifier
  • a variant form descriptor
  • a link to a word

A variant is described by its first three fields. It is also indexed on the combination of these three fields.
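
A minimal Python sketch of such a variant object follows. The class name `Variant`, the helper `add_variant` and the form codes ("P" for plural, "PP" for past participle) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    string: str
    language: str   # assumed codes: "F" = French, "E" = English
    form: str       # variant form descriptor (codes are assumptions)
    word: tuple     # key of the one and only word this variant belongs to


variant_index = {}


def add_variant(string, language, form, word_key):
    """Create a variant and index it on its first three fields."""
    variant = Variant(string, language, form, word_key)
    variant_index[(string, language, form)] = variant
    return variant


add_variant("horses", "E", "P", ("horse", "E", "N"))
add_variant("été", "F", "PP", ("être", "F", "V"))
```

The mandatory `word` field reflects the rule that a variant does not exist without its link to a word.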

Input

The input is the 4th level of perception. An input is NOT translated into an input node (Alex does not use nodes for input).

Instead an input (for instance a word input) is transformed into a set of lexically close variants.

This transformation uses standard lexical metrics, like the Levenshtein distance - and maybe something better when available.

In any case, the input consists of the signals received by the sensors made available to Alex. The first and most convenient sensor is a human being typing words on a keyboard.
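
The transformation of an input into lexically close variants can be sketched as follows, using the classic dynamic-programming Levenshtein distance. The function names, the `max_distance` threshold of 2 and the small `KNOWN` list are assumptions for illustration; the page does not say which threshold Alex uses.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def close_variants(text, known_variants, max_distance=2):
    """Return (distance, string, language) triples within max_distance, nearest first."""
    scored = [(levenshtein(text, string), string, language)
              for string, language in known_variants]
    return sorted(item for item in scored if item[0] <= max_distance)


# Hypothetical sample of known variant strings with their language.
KNOWN = [("louse", "E"), ("lousy", "E"), ("loss", "E"), ("pousse", "F"),
         ("mousse", "F"), ("loupe", "F"), ("plane", "E"), ("plane", "F")]

nearby = close_variants("lousse", KNOWN)
exact = close_variants("plane", KNOWN, max_distance=0)
```

With this sample, `lousse` reaches six variants across both languages, while `plane` matches exactly in English and in French.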

Auxiliary notions: language, lexical category, formal variant

These notions are for Alex neither nodes nor concepts (although concepts might be defined for them). But they are used as attributes of words and variants.

  • Language: English, French... The language is an attribute of a word node, and an attribute of a variant node.
  • Lexical category: verb, noun, adjective, ... The lexical category is an attribute of a word node.
  • Formal variant: plural, feminine, conjugated form... The formal variant is an attribute of a variant node.

Analysis

Link classification

Links are classified according to the level of the connected nodes and according to their own nature (qualified or not).

Identifying concepts, words and variants

In order to identify a variant the following is used: string|language|formal-variant.

Examples:

  • chevaux|F|P         (F=French P=Plural)
  • voyez|F|P2           (F=French P2=2nd person plural)

In order to identify a word the following is used: string|language|lexical-category.

Examples:

  • rose|F|N     (N=Noun)
  • rose|F|A      (A=Adjective)
  • rose|E|N     (E=English N=noun)
  • pink|E|A     (E=English A=adjective)
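
The pipe-separated identifiers above can be read and written with two small helpers. This is only a sketch: the function names are hypothetical, and it assumes that none of the three fields contains a "|" character.

```python
def parse_key(key):
    """Split 'string|language|qualifier' into its three fields."""
    string, language, qualifier = key.split("|")
    return string, language, qualifier


def format_key(string, language, qualifier):
    """Build the pipe-separated identifier from its three fields."""
    return "|".join((string, language, qualifier))


keys = ["rose|F|N", "rose|F|A", "rose|E|N", "pink|E|A"]
distinct = {parse_key(key) for key in keys}
```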

Generally a concept cannot be identified, except by its links (significant exceptions exist).

Evocation and manifestation

It is not possible to name a concept, but it is possible to evoke - in a fuzzy way - a concept via a set of linked words or via a set of nodes linked thru qualified links (QC links). More generally, a set of words and variants evokes a set of concepts with various weighting factors, and indirectly from there... many linked concepts...

Conversely, when a concept presents itself for communication (in some excitation state), it will be presented, manifested thru various words: the words linked to it thru CW links, and the words linked to concepts linked to it thru QC links. The choice of words used for this manifestation may depend on the current excitation state of an active conscience or focus.

Five levels of evocation/manifestation

The 5 levels of EM are:

  1. input : any character string (entered using a keyboard or introduced by an input channel, such as a book reader). Language reference may be absent.
  2. u-variant (unqualified variant) : the set of known variants, per language, with no further properties (1 HashSet per language, no variant form in the key)
  3. q-variant (qualified variant) : the set of known variants, per language, with the property of the variant form (1 HashSet per language, variant form present in the key)
  4. word : the set of known word nodes, roughly the usual dictionary (1 HashSet per language, with the lexical category as part of the key).
  5. concept : the set of known concepts (no HashSet, no indexation key !).

Upward process (evocation):

  • 1-2: string distance analysis : lousse evokes louse(E) lousy(E) loss (E) pousse (F) mousse(F) loupe (F).... Sometimes there is a perfect match with a variant (distance 0) in one or many languages (plane(E) and plane(F)), but this does not prevent the possible evocation of other variants with non-zero distance.
  • 2-3: any u-variant contains the set of the q-variants sharing the same string.
  • 3-4: any q-variant is linked to exactly 1 word node.
  • 4-5: any word is linked to various concepts, and contains the set of fuzzy links leading to them.
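
Steps 2-3, 3-4 and 4-5 of the upward process can be sketched as a chain of lookups. All table contents, concept labels and link weights below are made up for the example, and summing weight times link strength is only one plausible way to combine them.

```python
# Level 2 -> 3: each u-variant (string, language) lists the q-variants sharing its string.
U_TO_Q = {
    ("chevaux", "F"): [("chevaux", "F", "P")],
    ("horses", "E"): [("horses", "E", "P")],
}

# Level 3 -> 4: each q-variant is linked to exactly one word.
Q_TO_WORD = {
    ("chevaux", "F", "P"): ("cheval", "F", "N"),
    ("horses", "E", "P"): ("horse", "E", "N"),
}

# Level 4 -> 5: each word holds fuzzy (weighted) links to concepts.
WORD_TO_CONCEPTS = {
    ("cheval", "F", "N"): {"C_horse": 0.9, "C_gym_apparatus": 0.2},
    ("horse", "E", "N"): {"C_horse": 0.9},
}


def evoke(u_weights):
    """Propagate weighted u-variants up to concepts and accumulate the scores."""
    scores = {}
    for u_variant, weight in u_weights.items():
        for q_variant in U_TO_Q.get(u_variant, []):
            word = Q_TO_WORD[q_variant]
            for concept, strength in WORD_TO_CONCEPTS.get(word, {}).items():
                scores[concept] = scores.get(concept, 0.0) + weight * strength
    return scores


french_only = evoke({("chevaux", "F"): 1.0})
both = evoke({("chevaux", "F"): 1.0, ("horses", "E"): 1.0})
```

The weights given to the u-variants would typically come from step 1-2, for instance a higher weight for a smaller string distance.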

Downward process (manifestation)

The manifestation generally occurs within a communication context (language preference, maybe more) and within a conscience/focus context.

The manifestation starts from a concept and generally goes down to words, sometimes down to variants (q-variants).

The manifestation may start from one single concept, but more often it will start from all concepts excited within a focus.

The manifestation finds various more or less likely words, but only emits the most likely.

This is based on the permeabilities between concepts and words.

The manifestation is a process of weighted selection, with restricted emission.
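
A minimal sketch of this weighted selection is given below. The scoring rule (excitation multiplied by permeability, summed per word), the `max_words` limit and all the numbers are assumptions for illustration, not the documented Alex algorithm.

```python
# Hypothetical concept -> word permeabilities; words are (string, language) pairs.
PERMEABILITY = {
    "C_rose_flower": {("rose", "F"): 0.9, ("rose", "E"): 0.9,
                      ("fleur", "F"): 0.3, ("flower", "E"): 0.3},
    "C_pink_colour": {("rose", "F"): 0.8, ("pink", "E"): 0.8},
}


def manifest(excitations, permeability, language, max_words=1):
    """Score candidate words in one language and emit only the most likely ones."""
    scores = {}
    for concept, excitation in excitations.items():
        for (string, word_language), p in permeability.get(concept, {}).items():
            if word_language == language:   # communication context: language choice
                scores[string] = scores.get(string, 0.0) + excitation * p
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:max_words]               # restricted emission


focus = {"C_rose_flower": 1.0, "C_pink_colour": 0.5}
french_words = manifest(focus, PERMEABILITY, "F", max_words=2)
english_word = manifest({"C_pink_colour": 1.0}, PERMEABILITY, "E")
```

Changing only the `language` argument yields words of another language for the same excited concepts, which matches the idea that translation goes up and down thru the concept level.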

Building the nodes and the jelly

There is a sequential construction of the jelly (nodes and links), completely independent from the conscience and focus.

Limits

Besides the global limits of the Alex project, the links between variants, words and concepts still show various problems and limits.

One of them is the issue of expressions. An expression is a combination of words having together a concept-level meaning (evocation) different from the concept-level meaning(s) raised when the involved words are individually used for the evocation process. Thus an expression implies a specific concept-level linking.

Examples in French : en avoir marre ; virage en épingle à cheveux ; bouc émissaire ; tête de turc...