Releases: prohippo/pyelly
Improve Vocabulary Table Lookup
PyElly was failing to recognize the plural of hyphenated terms like NAIL-BITERS. The solution was to limit vocabulary table search keys at the first hyphen, if it comes before a space in input text. The "marking" and "indexing" integration test files had to be changed to be consistent with new PyElly output.
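The key-limiting rule described above can be sketched as follows. This is a minimal illustration, not PyElly's actual code: `limit_search_key` is a hypothetical helper name, and two behaviors are assumptions, namely that a hyphen with no following space still cuts the key, and that the text is returned unchanged when a space comes first.

```python
def limit_search_key(text):
    """Return the part of text to use as a lookup key (hypothetical helper)."""
    hyphen = text.find('-')
    space = text.find(' ')
    # Cut at the first hyphen only when it comes before the first space.
    # Assumption: when there is no space at all, the hyphen still counts.
    if hyphen >= 0 and (space < 0 or hyphen < space):
        return text[:hyphen]
    # Assumption: otherwise the text is left unchanged.
    return text
```

With this rule, input beginning NAIL-BITERS is looked up under NAIL, so a plural ending after the hyphen no longer changes the search key.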
Extend FSA Capabilities
This changes the PyElly FSA algorithm to allow a token string to be split up. A problem in handling the € symbol was fixed. The "marking" example application rules were extended and cleaned up. The PyElly User's Manual was broadly updated, and Appendix G on Unicode was added.
Major Code Cleanup
Add Unicode hyphen to PyElly character set. Add definitionLine recognition of \H as Unicode hyphen; rework code to allow it to be used with vocabularyTable rule definitions. Revise example application *.v.elly files to work with new vocabularyTable. Improve macroTable commentary and debugging statements. Extend "marking" example application. Extend and revise documentation.
Major Overhaul of Vocabulary Table Operation
This cleans up ellyChar conversion of Unicode to ASCII, which is required in nameRecognition and in vocabularyTable. The generation of SQLite search keys for vocabulary entries was cleaned up. A bug in generating temporary rules for vocabulary entries was fixed. The "marking" rules were extended further.
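For readers unfamiliar with Unicode-to-ASCII folding, here is a generic sketch written for Python 3 and using only the standard `unicodedata` module. It is not the ellyChar implementation, whose actual mapping is not described here. The function name `to_ascii` is hypothetical, and dropping characters that have no ASCII decomposition is an assumption made to keep the example short.

```python
import unicodedata


def to_ascii(text):
    """Fold accented Latin characters to plain ASCII (illustrative only)."""
    # NFKD splits a character like e-acute into 'e' plus a combining accent.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks, then drop anything still outside ASCII.
    # Assumption: silently discarding unmappable characters is acceptable.
    return ''.join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ord(ch) < 128
    )
```

A fold of this kind lets a name or vocabulary entry typed with accents match a key stored in plain ASCII.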
Various Bug Fixes, Extend "marking" rules
Continue changes to PyElly to address issues uncovered in processing "wild" text from the Web. This includes changes in vocabulary lookup, macro substitution patterns, handling of non-standard representations of right double quotation marks, and English suffix recognition.
Bug Fix in Stop Exceptions, More Error Checking of Vocabulary Rules
This continues upgrading of PyElly code as it is tested on more "wild" Web text. Faulty stop exception logic was replaced and unit testing was extended. Vocabulary loading now has more error checking to identify problems.
Vocabulary Bug Fix and Diagnostics for Parse Tree Overflow
This provides information on a parse overflow by showing the list of generated tokens, which makes it easier to diagnose the underlying problem. It also fixes a bug when both an uninflected and an inflected form of a term are in a vocabulary table. More progress for the MARKING example application.
Add Extractor for Time Period References
Time references like "early Thursday evening" are easily recognized, and extracting them can greatly help to avoid parse tree overflows when translating long sentences from news stories. This was implemented in Python code, which is a little more readable than using the PyElly FSA.
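To give a feel for what such an extractor does, here is a small regular-expression sketch. It is not the PyElly extractor. The word lists, the pattern, and the name `find_time_references` are all assumptions chosen for illustration, and the sketch only covers an optional modifier, a day name, and an optional part of day.

```python
import re

# Hypothetical word lists; a real extractor would cover far more forms.
_MODIFIER = r"(?:early|late|mid)"
_DAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
_PART = r"(?:morning|afternoon|evening|night)"

# Optional modifier, required day name, optional part of day.
_TIME_REF = re.compile(
    r"\b(?:%s\s+)?%s(?:\s+%s)?\b" % (_MODIFIER, _DAY, _PART),
    re.IGNORECASE,
)


def find_time_references(text):
    """Return every time-period phrase found in text, in order."""
    return [match.group(0) for match in _TIME_REF.finditer(text)]
```

Recognizing a phrase like this as a single unit before parsing means the parser handles one token instead of three, which is how extraction helps to keep the parse tree small.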
Clean Up Punctuation Handling
Corrected problems with default punctuation rules and improved documentation. Added commentary to punctuationRecognizer.py to explain better what is going on there.
Handle M Dash, Vocabulary Table Code Cleanup
The change was mainly to make it possible for a vocabulary table entry to start with an m dash. It then becomes easier to describe certain semi-parenthetical expressions with m dashes. The compile() method for the VocabularyTable class was renamed build() to make the code more Pythonic. More rules were added to the "marking" language definition.