Development Outlook
Michal Kren edited this page May 28, 2015 · 12 revisions
This page briefly describes long-term priorities of KonText development listed according to the subject.
Important notice: the list is tentative and very likely to change in the future.
Please do get in touch with us in case you are thinking about similar functionality.
- motivation: people new to corpus linguistics may feel confused by the full functionality of KonText, especially considering its planned enhancements
- current status: expert mode only
- properties: KonText would have two basic modes of operation, beginner and expert; the current interface will serve as the expert mode, while the beginner mode will have to be created as a simplified version of it, designed with mobile devices in mind
- motivation: enabling users with limited knowledge of CQL to use more sophisticated queries by means of an intuitive graphical widget (full CQL will not be supported, though)
- current status: six query types, with a significant gap between CQL and all the others
- properties: easy switching from GQC to CQL (and if possible also vice versa); a nice feature would be updating the CQL form according to what has been selected in GQC; replacement of the other query types
- motivation: to make the tag builder usable on a wider range of tagsets, e.g. for InterCorp or other foreign-language corpora
- current status: tag builder requires a positional tagset where every combination of character & position is guaranteed to have the same meaning
- properties: to be discussed, as there is a trade-off between the general usability of the tag builder and the work required, both in programming and in complex configuration during deployment
- motivation: obvious
- current status: no such functionality; large syntactically annotated corpora do exist, but in an experimental version only
- properties: dependency trees only on the interface level; KonText would create a complex CQL query based on a subtree the user has constructed and display the result as a dependency tree
- notice: functionality already implemented in KorAP
- motivation: user requests as well as easy administration of available (sub)corpora
- current status: no such mechanism; users can only create their own subcorpora, which cannot be shared; storing the within condition is already implemented, but not used so far
- properties: this feature would make use of the within condition that created the particular subcorpus; the within condition would be editable and also shareable among users
- description: to enable users to create their own (possibly lemmatized and tagged) corpora
- current status: no such functionality
- dependencies: corpus sharing required for maximum usability
- description: selection of documents not only according to the given constraints, but also user-selected ratios (e.g. newspaper subcorpus that would contain 30 % title_A, 30 % title_B, 30 % title_C and 10 % other newspapers)
- current status: no such functionality
- properties: given the set of constraints and ratios, the module would select a suitable subset of documents (this is a computationally demanding task, but a sufficient solution can presumably be found in real time)
- dependencies: corpus sharing required for maximum usability
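The ratio-based selection described above could be approximated by a simple greedy pass. The following is only a minimal sketch under stated assumptions, not a planned KonText implementation: the function name `select_by_ratio`, the document tuples `(doc_id, category, size)`, the `"other"` bucket and the use of integer percentages are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def select_by_ratio(docs, ratios, target_size):
    """Greedily pick documents so that each category fills its share.

    docs        -- list of (doc_id, category, size_in_tokens); hypothetical shape
    ratios      -- dict mapping category -> integer percentage (should sum to 100)
    target_size -- desired total size of the subcorpus in tokens
    """
    # Group documents by category; anything not listed in `ratios`
    # falls into the catch-all "other" bucket.
    by_category = defaultdict(list)
    for doc_id, category, size in docs:
        key = category if category in ratios else "other"
        by_category[key].append((doc_id, size))

    selected = []
    for category, percent in ratios.items():
        quota = target_size * percent / 100
        taken = 0
        # Largest documents first; a document is taken only if it
        # does not push the category over its quota.
        for doc_id, size in sorted(by_category[category], key=lambda d: -d[1]):
            if taken + size <= quota:
                selected.append(doc_id)
                taken += size
    return selected


docs = [
    ("a1", "title_A", 30), ("a2", "title_A", 20),
    ("b1", "title_B", 30), ("c1", "title_C", 30),
    ("x1", "title_X", 10), ("x2", "title_Y", 10),
]
ratios = {"title_A": 30, "title_B": 30, "title_C": 30, "other": 10}
subset = select_by_ratio(docs, ratios, target_size=100)
```

A greedy pass like this is fast but can undershoot a quota when document sizes do not fit it well; a real module would probably need a more careful optimisation step.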
- motivation: helping users with limited statistical background to make valid judgements
- current status: being implemented in the CNC
- properties: comparison of two frequencies in the same corpus/between two corpora; lexical richness; statistical confidence based on random samples
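As an illustration of the frequency comparison mentioned above, the sketch below computes the log-likelihood (G2) statistic for one item observed in two corpora. The page does not say which test the module would use, so the choice of log-likelihood, the function name `log_likelihood` and the example figures are assumptions.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood (G2) for one item's frequency in two corpora.

    freq1, freq2 -- absolute frequencies of the item
    size1, size2 -- corpus sizes in tokens
    """
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were spread evenly across both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size

    g2 = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq1 > 0:
        g2 += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        g2 += freq2 * math.log(freq2 / expected2)
    return 2 * g2


# Same relative frequency in both corpora: no difference to report.
same = log_likelihood(10, 1000, 20, 2000)
# Twice as frequent in the first corpus of equal size.
different = log_likelihood(100, 1000, 50, 1000)
```

With one degree of freedom, values above roughly 3.84 correspond to p < 0.05, which is the kind of judgement the module could make on the user's behalf.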
- motivation: enabling two-(or more-)dimensional frequency distributions, e.g. for a combination of txtype and publication year OR education and genre
- current status: one-dimensional frequency distributions only
- properties: structural attributes only; attractive visualisations; related to the statistical module (contingency tables, correlations); possibly n dimensions
- notice: Manatee API seems to provide basic support, but this is definitely worth checking
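To make the idea of a two-dimensional frequency distribution concrete, here is a minimal sketch that builds a contingency table from plain records. It does not use the Manatee API; the `crosstab` function, the attribute names `txtype` and `pubyear`, and the sample values are hypothetical.

```python
from collections import Counter


def crosstab(records, row_attr, col_attr):
    """Build a contingency table (dict of dicts) for two structural attributes.

    records -- iterable of dicts, one per concordance hit, holding the
               structural attribute values (hypothetical input shape)
    """
    counts = Counter((rec[row_attr], rec[col_attr]) for rec in records)
    rows = sorted({key[0] for key in counts})
    cols = sorted({key[1] for key in counts})
    # Fill in zeros so that every row has a cell for every column.
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


hits = [
    {"txtype": "FIC", "pubyear": "2001"},
    {"txtype": "FIC", "pubyear": "2001"},
    {"txtype": "FIC", "pubyear": "2002"},
    {"txtype": "NEW", "pubyear": "2002"},
]
table = crosstab(hits, "txtype", "pubyear")
```

A table in this shape could feed both a heat-map style visualisation and the contingency-table statistics mentioned in the properties.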
- motivation: providing an alternative to the Word Sketches
- current status: only regular collocation lists available
- properties: based on cooccurrence profiles (Belica) and/or p-collocations (Cvrček); another option would be to make use of syntactic relations (if available)