Alex list of resource files


Intro

This is an annex of the application part of the Accueil project.

Location

Resource files are located in resourceDir: ~/server/Alex/dico.resources

All files have the extension .pk

Useful commands

>> alr (go to resource dir)

>> alex (view all available alex commands)

>> alex -pke <filename> (edit read-only file)

>> alex -pkv <filename> (view (cat) the whole file)

>> alex -pks <filename(s)> (view file summary: extreme lines)

>> alex -pkt <filename(s)> (create readable unpacked copies in tmp.unpack subdir)

Building pk files

Most or all pk files are built by the resourceBuilder process.

The following command runs the resourceBuilder process (this takes some time):

>> alex -r

KeySet related files - see class mscp.brain.keySet

  • wn.id (wordnet keySet, like D285687)

(note that keys in the Alex keySet are generated, NOT read from files)

LangSet related files (lang: en & fr) - see class mscp.brain.langSet

  • id.en
  • id.fr
  • id.def.en (definitions)
  • id.def.fr
  • id.vari.en (variant words)
  • id.vari.fr
  • id.conceptPercept.en (mapping of concept to percept-words)
  • id.conceptPercept.fr
  • id.perceptVariant.en.ii (mapping of percept-words to variant-words)
  • id.perceptVariant.fr.ii
  • id.manifest.en
  • id.manifest.fr

(manifest, i.e. a non-ambiguous readable expression of a concept)

Manifests are built in step 7 of resourceBuilder.

Q-Link Files

Both files list qualified links rebuilt from wordnet. The first one is in int[][] format; the second is a set of readable lines.

Line order is irrelevant.

Field 1 : concept index.

Field 2 : concept index

Field 3 : connecting concept index

  • id.link.ii
  • id.link
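To illustrate the three-field layout described above, here is a minimal Python sketch that reads readable link lines into a list of integer triples, mirroring the int[][] form. It assumes that a readable copy (for example one produced with alex -pkt) holds three whitespace-separated integers per line. The separator and the helper name parse_links are assumptions, not taken from the Alex code.

```python
from typing import Iterable, List, Tuple

# (concept index, concept index, connecting concept index)
Link = Tuple[int, int, int]


def parse_links(lines: Iterable[str]) -> List[Link]:
    """Parse readable link lines into triples (hypothetical helper)."""
    links: List[Link] = []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        first, second, connecting = (int(f) for f in fields)
        links.append((first, second, connecting))
    return links


# Made-up sample lines, for illustration only.
sample = ["12 34 7", "", "34 56 7"]
links = parse_links(sample)
# Line order is irrelevant, so a sorted view carries the same information.
print(sorted(links))
```

Because line order is irrelevant, two link files can be compared by comparing their sorted triples or the corresponding sets.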

Wordnet related files

All files are created and used during resourceBuilder processing.
They are derived from the downloaded wordnet files (in subdirectory WordNet-3.0).

  • wn.concept
  • wn.def.en
  • wn.def.fr
  • wn.id (keySet file)

line order : co-indexed with the concept index (the line position gives the concept index)
field 1 : wordnet key

  • wn.link

line order : irrelevant
field 1 : wordnet key (concept 1)
field 2 : wordnet key (concept 2)
field 3 : concept index of connecting concept

  • wn.nvadNum (also used in concept class)
  • wn.word.en
  • wn.word.fr
  • wn.wordlist

List of acceptable wordnet words (words with spaces, words starting with digits, ... are refused). Line order is irrelevant. Each line contains a word prefixed by one of VNAD.
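The two refusal rules named above can be sketched in Python as follows. This is only an illustration: the text hints at further rules ("...") that are not listed here and are therefore not implemented, and the function name is_acceptable is hypothetical.

```python
def is_acceptable(word: str) -> bool:
    """Apply only the two refusal rules named in the text.

    Other rules exist in the real builder but are not documented here.
    """
    if not word:
        return False  # assumption: empty strings are not words
    if " " in word:
        return False  # words with spaces are refused
    if word[0].isdigit():
        return False  # words starting with a digit are refused
    return True


# Made-up candidate words, for illustration only.
candidates = ["dog", "hot dog", "3d", "walk"]
print([w for w in candidates if is_acceptable(w)])
```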

  • id.wnkey

mapping of concept index to wordnet keys
line order : irrelevant
field 1 : concept index
field 2 : wordnet key
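The difference between a co-indexed file such as wn.id and an explicitly indexed file such as id.wnkey can be sketched in Python. The sketch makes several assumptions: concept indices start at 0, fields in the readable copies are whitespace-separated, and each concept index maps to a single wordnet key. The helper names and every sample key except D285687 are made up for illustration.

```python
from typing import Dict, List


def load_keys(lines: List[str]) -> List[str]:
    """wn.id style: the line position is the concept index (assumed 0-based)."""
    return [line.strip() for line in lines]


def load_index_to_key(lines: List[str]) -> Dict[int, str]:
    """id.wnkey style: explicit (concept index, wordnet key) pairs, any order."""
    mapping: Dict[int, str] = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        mapping[int(fields[0])] = fields[1]
    return mapping


# Made-up sample content; only D285687 appears in the text above.
keys = load_keys(["D285687\n", "N000001\n"])
mapping = load_index_to_key(["1 N000001", "0 D285687"])
print(keys[0], mapping[0])
```

In the co-indexed form the line order carries the index, so lines must not be reordered or dropped. In the explicit form the order is free because each line carries its own index.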

XDXF File

XDXF processing is step 5 in resourceBuilder.

The input is xdxf subdirectory files, the output is xdxf.pk.

  • xdxf

DELA files

They are derived from the downloaded DELA files (in subdirectory DELA).

DELA processing is done in steps 2 and 6 of resourceBuilder.

  • dela.variants.en
  • dela.variants.fr
  • dela.word.en
  • dela.word.fr

Fuzzy links file

Step 8 of the resourceBuilder generates the fuzzy link file:

  • fuzzy.ii

Running the resourceBuilder (output example)

Starting resource building steps 012345678 (october 2011)
------------------------------------
DataBuild step 0 (resb)
Create <wn.wordlist> file (list of qWords known in wordnet), with relevant indexing
------------------------------------
DataBuild step 1 (resb)
Creation of english langSet <wn.word.en> <wn.def.en>
Creation og <wn.link> and <wn.concept>
------------------------------------
DataBuild step 2 (dela)
Building qWords and qVariants for both languages: <dela.variants.xx> and <dela.words.xx>
------------------------------------
DataBuild step 3 (resb)
Builds <wn.id> <id.wnkey>
Builds <id.link> <id.link.ii>
------------------------------------
DataBuild step 4 (resb)
Translate definitions from english to french : <wn.def.fr>
Uses google translation api; works incrementally.
------------------------------------
DataBuild step 5 (xdxf)
Input is xdxf files in cfg.resourceDir/xdxf/.
Output is xdxf.pk and various id.*.<lang>.pk files
------------------------------------
DataBuild step 6 (dela)
Building qWords and qVariants for both languages: <id.vari.en> and <id.perceptVariant.xx>
------------------------------------
DataBuild step 7 (manifestBuilder)
Creates manifest file in both languages: <id.manifest.xx>
------------------------------------
DataBuild step 8 (fuzzyBuilder)
Creates from any interesting source fuzzy links in <fuzzy.ii>
Uses now wikipedia links.





Hunspell Notes (doubts about usage)

hunspell formatting !

Guide for the command (section 1) and the file format (section 4): man hunspell

Dic location: /usr/share/myspell/dicts

Web site : http://wiki.services.openoffice.org/wiki/Dictionaries

? see /usr/share/dict/...

? see /usr/share/stardict/...

? sudo apt-get install stardict

? http://stardict.sourceforge.net/Dictionaries_fr.php