Plug in API Corpora Archive
Plug-ins / [corparch]
- interface: plugins.abstract.corpora.AbstractSearchableCorporaArchive
- type: required
- client-side: yes

The corparch plug-in provides a way to search and select corpora.
# AbstractSearchableCorporaArchive.get_corpus_info(corp_id, language=None)
Returns:
- plugins.abstract.corpora.CorpusInfo
# AbstractSearchableCorporaArchive.get_list(user_allowed_corpora)
# AbstractSearchableCorporaArchive.search(plugin_api, user_id, query, offset=0, limit=None, filter_dict=None)
Returns the subset of corplist that matches the provided query.
arguments:
- plugin_api -- a controller.PluginApi instance
- user_id -- a database ID of the user who triggered the search
- query -- any search query the concrete plug-in implementation can understand (KonText itself just passes it around).
- offset -- return a list starting from this index (zero-based; default is 0)
- limit -- a maximum number of items to return (default is None; interpretation of None is up to the plug-in, i.e. it can be "no limit" or "default limit" etc.)
- filter_dict -- a dict or werkzeug.datastructures.MultiDict containing additional arguments of the search request; KonText just passes Request.args here
returns:
- a JSON-serializable dictionary a concrete plug-in implementation understands
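
The interplay of offset, limit and filter_dict described above can be illustrated with a small standalone sketch. It does not subclass the real AbstractSearchableCorporaArchive. The class name `ToyCorporaArchive`, the `DEFAULT_LIMIT` constant, the `min_size` filter key, the substring matching and the shape of the returned dictionary are all assumptions made for illustration only.

```python
from typing import Any, Dict, List

# Hypothetical page size used when limit is None. A real plug-in is free
# to interpret None differently (e.g. as "no limit").
DEFAULT_LIMIT = 2


class ToyCorporaArchive:
    """Standalone illustration; NOT the KonText abstract class."""

    def __init__(self, corplist: List[Dict[str, Any]]):
        self._corplist = corplist

    def search(self, plugin_api, user_id, query, offset=0, limit=None,
               filter_dict=None):
        if limit is None:
            limit = DEFAULT_LIMIT
        filter_dict = filter_dict or {}
        # 'min_size' is a made-up filter key. Values coming from request
        # arguments are strings, hence the int() conversion. The .get()
        # call works for a plain dict as well as for a MultiDict.
        min_size = int(filter_dict.get('min_size', 0))
        matches = [
            c for c in self._corplist
            if query.lower() in c['id'].lower() and c['size'] >= min_size
        ]
        end = offset + limit
        # The result contains only lists, strings, numbers and None,
        # so it is JSON-serializable.
        return {
            'rows': matches[offset:end],
            'total': len(matches),
            'next_offset': end if end < len(matches) else None,
        }


archive = ToyCorporaArchive([
    {'id': 'SYN2010', 'size': 100},
    {'id': 'SYN2005', 'size': 50},
    {'id': 'SYN2000', 'size': 80},
    {'id': 'DIA', 'size': 10},
])
first_page = archive.search(None, 1, 'syn')
second_page = archive.search(None, 1, 'syn', offset=first_page['next_offset'])
```

Returning a `next_offset` value is one possible way to let a client page through results; the interface itself only requires that the result be a JSON-serializable dictionary.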
# AbstractSearchableCorporaArchive.initial_search_params(user_id, lang)
This plug-in reads a hierarchical list of corpora from an XML file (this can be config.xml itself, but it does not have to be). Each corpus is described and placed in the hierarchy as follows:
    <corplist title="">
      <corplist title="Synchronic Corpora">
        <corplist title="SYN corpora">
          <corpus id="SYN2010" web="http://www.korpus.cz/syn.php" sentence_struct="s" tagset="czech_tagset" />
          ... etc...
        </corplist>
        <corplist title="Diachronic Corpora">
          <corpus id="DIA" />
        </corplist>
      </corplist>
    </corplist>
Attributes for the corplist element:
attr. name | description |
---|---|
title | name of the group |
Attributes for the corpus element:
attr. name | description |
---|---|
ident | name of the corpus (as used within registry files) |
sentence_struct | structure delimiting sentences |
tagset | (optional) tagset used by this corpus |
web | (optional) external link containing information about the corpus |
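
As a rough illustration of how such a hierarchy could be read, the following sketch walks the nested corplist elements with the standard library and produces a flat list in which each corpus carries the path of its groups. The function name `flatten_corplist`, the `path` field and the output dictionary layout are assumptions of this sketch, not the plug-in's actual data model.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

SAMPLE = """
<corplist title="">
  <corplist title="Synchronic Corpora">
    <corplist title="SYN corpora">
      <corpus id="SYN2010" web="http://www.korpus.cz/syn.php" sentence_struct="s" tagset="czech_tagset" />
    </corplist>
    <corplist title="Diachronic Corpora">
      <corpus id="DIA" />
    </corplist>
  </corplist>
</corplist>
"""


def flatten_corplist(node: ET.Element,
                     path: str = '/') -> List[Dict[str, Optional[str]]]:
    """Recursively collect <corpus> entries below a <corplist> node."""
    ans = []
    for child in node:
        if child.tag == 'corplist':
            # Each nested group extends the path by its title.
            title = child.attrib.get('title', '')
            ans.extend(flatten_corplist(child, path + title + '/'))
        elif child.tag == 'corpus':
            # The XML example uses 'id' while the attribute table lists
            # 'ident'; accepting both is an assumption of this sketch.
            corpus_id = child.attrib.get('ident', child.attrib.get('id'))
            ans.append({
                'id': corpus_id,
                'path': path,
                'sentence_struct': child.attrib.get('sentence_struct'),
                'tagset': child.attrib.get('tagset'),  # optional
                'web': child.attrib.get('web'),        # optional
            })
    return ans


corpora = flatten_corplist(ET.fromstring(SAMPLE.strip()))
for corpus in corpora:
    print(corpus['path'], corpus['id'])
```

Optional attributes that are missing simply come back as `None`, which keeps the flat records uniform regardless of how much metadata a corpus element provides.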
Please note that you do not have to put the corplist subtree into the config.xml file. corparch can be configured to load any XML file and to look for the root corplist node at any location within that file.
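
The idea of locating the corplist tree inside an arbitrary XML file can be sketched as follows. The configuration keys KonText uses for the file and node location are not shown here, so the `root_path` parameter, the `find_corplist_root` helper and the surrounding `myapp`/`settings` elements are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the corplist tree sits inside unrelated elements.
CUSTOM_XML = """
<myapp>
  <settings>
    <corplist title="">
      <corplist title="Diachronic Corpora">
        <corpus id="DIA" />
      </corplist>
    </corplist>
  </settings>
</myapp>
"""


def find_corplist_root(xml_text: str, root_path: str) -> ET.Element:
    """Return the element at root_path (ElementTree's XPath subset)."""
    doc = ET.fromstring(xml_text.strip())
    node = doc.find(root_path)
    if node is None:
        raise ValueError(f'No element found at {root_path!r}')
    return node


# An explicit path relative to the document's root element ...
root = find_corplist_root(CUSTOM_XML, './settings/corplist')
# ... or the first corplist element found anywhere in the document.
same_root = find_corplist_root(CUSTOM_XML, './/corplist')
```

Note that `Element.find` returns the first match in document order, so `.//corplist` resolves to the outermost corplist element in this sample.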