Plug in API Corpora Archive

Tomas Machalek edited this page Apr 25, 2016 · 2 revisions

Plug-ins / [corparch]

  • interface: plugins.abstract.corpora.AbstractSearchableCorporaArchive
  • type: required
  • client-side: yes

The corparch plug-in provides a way to search for and select corpora.

⚠️ Please note that currently there are some hidden dependencies between this plug-in and other plug-ins (taghelper, live_attributes). We are working to resolve this.

# AbstractSearchableCorporaArchive.get_corpus_info(corp_id, language=None)

Returns:

  • plugins.abstract.corpora.CorpusInfo

# AbstractSearchableCorporaArchive.get_list(user_allowed_corpora)

# AbstractSearchableCorporaArchive.search(plugin_api, user_id, query, offset=0, limit=None, filter_dict=None)

Returns the subset of the corpus list that matches the provided query.

arguments:

  • plugin_api -- a controller.PluginApi instance
  • user_id -- a database ID of the user who triggered the search
  • query -- any search query the concrete plug-in implementation can understand (KonText itself just passes it around).
  • offset -- return a list starting from this index (zero-based; default is 0)
  • limit -- a maximum number of items to return (default is None; the interpretation of None is up to the plug-in, e.g. it can be "no limit" or "default limit")
  • filter_dict -- a dict or werkzeug.datastructures.MultiDict containing additional arguments of the search request; KonText just passes Request.args here

returns:

  • a JSON-serializable dictionary a concrete plug-in implementation understands
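
The argument semantics above can be illustrated with a minimal sketch. It is not KonText's implementation: the class `InMemoryCorparch`, the constant `DEFAULT_LIMIT`, the substring matching, and the keys of the returned dictionary are all assumptions made for illustration. A real plug-in would extend plugins.abstract.corpora.AbstractSearchableCorporaArchive and define its own query language and response format.

```python
import json

# Hypothetical value; what limit=None means is up to the plug-in.
DEFAULT_LIMIT = 2


class InMemoryCorparch(object):
    """Hypothetical stand-in for a concrete corparch plug-in."""

    def __init__(self, corpora):
        # corpora: a list of dicts, each with at least an 'id' key (assumed shape)
        self._corpora = list(corpora)

    def search(self, plugin_api, user_id, query, offset=0, limit=None,
               filter_dict=None):
        # This sketch ignores plugin_api, user_id and filter_dict and treats
        # the query as a case-insensitive substring of the corpus ID.
        needle = (query or '').lower()
        matches = [c for c in self._corpora if needle in c['id'].lower()]
        if limit is None:
            limit = DEFAULT_LIMIT  # "default limit" interpretation of None
        rows = matches[offset:offset + limit]
        # The result only has to be JSON-serializable; these keys are made up.
        return {'rows': rows, 'total': len(matches),
                'offset': offset, 'limit': limit}


arch = InMemoryCorparch([{'id': 'SYN2010'}, {'id': 'SYN2015'},
                         {'id': 'DIA'}, {'id': 'SYN2005'}])
first_page = arch.search(None, 1, 'syn')
second_page = arch.search(None, 1, 'syn', offset=2, limit=10)
print(json.dumps(first_page))
print(json.dumps(second_page))
```

Here offset and limit behave like a zero-based slice over the matching items, so a client can page through results by increasing offset.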

# AbstractSearchableCorporaArchive.initial_search_params(user_id, lang)

# hierarchical_archive (a corparch variant)

⚠️ The latest KonText version (0.7.x, currently in development) does not support this variant yet (we are working on it).

This plug-in reads a hierarchical list of corpora from an XML file (which may, but need not, be part of config.xml). A corpus is described and placed in the hierarchy in the following way:

<corplist title="">
  <corplist title="Synchronic Corpora">
     <corplist title="SYN corpora">
       <corpus id="SYN2010" web="http://www.korpus.cz/syn.php" sentence_struct="s" tagset="czech_tagset" />
       ... etc...
     </corplist>
     <corplist title="Diachronic Corpora">
        <corpus id="DIA" />
     </corplist>
  </corplist>
</corplist>

Attributes for the corplist element:

  • title -- name of the group

Attributes for the corpus element:

  • ident -- name of the corpus (as used within registry files)
  • sentence_struct -- structure delimiting sentences
  • tagset -- (optional) tagset used by this corpus
  • web -- (optional) external link containing information about the corpus

Please note that you do not have to put the corplist subtree into the config.xml file. corparch can be configured to load any XML file and search for the tree node anywhere you want.
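
To make the structure concrete, here is a minimal sketch of how such a tree could be read with Python's standard xml.etree.ElementTree module. It is not the plug-in's actual loader: the helper names `find_corplist` and `walk` and the wrapping `<config>` element are assumptions for illustration. The sketch reads the corpus name from the `id` attribute as written in the example above; the attribute list calls it `ident`, so use whichever name your KonText version expects.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<config>
<corplist title="">
  <corplist title="Synchronic Corpora">
     <corplist title="SYN corpora">
       <corpus id="SYN2010" web="http://www.korpus.cz/syn.php" sentence_struct="s" tagset="czech_tagset" />
     </corplist>
     <corplist title="Diachronic Corpora">
        <corpus id="DIA" />
     </corplist>
  </corplist>
</corplist>
</config>"""


def find_corplist(root):
    """Return the outermost corplist element, wherever it sits in the file."""
    if root.tag == 'corplist':
        return root
    return root.find('.//corplist')  # first match in document order


def walk(node, path=()):
    """Yield (group_path, attributes) for every corpus element below node."""
    for child in node:
        if child.tag == 'corplist':
            # Descend into the group, extending the path by its title
            for item in walk(child, path + (child.get('title', ''),)):
                yield item
        elif child.tag == 'corpus':
            yield path, dict(child.attrib)


root = ET.fromstring(SAMPLE)
corpora = list(walk(find_corplist(root)))
for path, attrs in corpora:
    print(' / '.join(path), '->', attrs.get('id'))
```

Each corpus is returned together with the titles of the groups that enclose it, which is enough to rebuild the hierarchy for display or search.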
