English Resource Grammar
delph-in/erg
Release notes for stable version "ERG 2023"

Highlights:
- Improved overall syntactic coverage on Redwoods profiles to 93.77% on 100K items
- Improved parse selection by about 1% using new redwoods.mem model.
- Improved overall parsing efficiency by about 20%.

2021-12-14 - Added files for Singlish dialect, authored by Siew Yeng Chow based on her Master's thesis at NTU.
2022-07 - Incorporated changes to enable chart-mapping in LKB-FOS, thanks to John Carroll.
2022-10 - Adopted Emerson-Turing construction types for appending SLASH, with thanks to Guy Emerson and John Carroll.
2022-11 - Improved Version.lsp, METADATA, and grammar-loading files for better interface with LTDB, thanks to Francis Bond. Because we now generate erg.hds file each time the grammar is loaded into LKB, discarded erg/etc/rules.hds.
------------------------------------------------------------------------------
Release notes for stable version "ERG 2020"

Punctuation marks now separate tokens - Revised syntactic analysis to treat all punctuation marks as separate tokens instead of as affixes. So syntactic rules combine a punctuation token either with the immediately preceding or following token, except for the possessive apostrophe which attaches to the preceding NP. Thanks to Stephan Oepen for motivation, assistance, and guidance in making this conversion, enabling better consistency of ERG output with that of other NLP tools and conventions. Also thanks to Woodley Packard for engineering support to accommodate treebanking updates in the face of near-universal changes in token counts.

Full Redwoods treebank update - All of the usual treebanked profiles, totaling 1.5M tokens, have now been updated using the full-forest treebanking tool fftb, and reflecting the changed analysis of punctuation. An additional 1000 items from WSJ section 23 have also been treebanked after the release was stable, to provide a new set of annotations for evaluation.

Documentation strings throughout the grammar - Both ACE and the LKB, along with PyDelphin, now fully support the use of triple-quote-marked documentation strings on types and instances, so these have been added to most instances of leaf lexical types, constructions, and lexical rules in the ERG. Thanks to Francis Bond for pushing this cause forward, and to developers for accommodating the necessary formalism changes. Alas, PET does not yet provide full support, so this release of the grammar includes variants of several grammar files ("...for-pet.tdl") where the doc strings have been deleted. For now, compile and run PET using these variant files as follows:

  flop english-for-pet
  cheap -cm -repp -default-les=all -packing -verb=4 english
------------------------------------------------------------------------------
In trunk, as an interim update,
- refreshed support for openproof generation
- expanded coverage with mal-rules and types
- added full-forest treebank profiles for wsj06-09
------------------------------------------------------------------------------
Release notes for stable version "ERG 2018"

Annotations
- Supplied full-forest treebanks for Redwoods profiles, including the first five sections of the WSJ.
- Added profiles for WeSearch user-generated content (wlb03, wnb03), and for Sherlock Holmes story (sh-spec).
- Improved the well-formedness and consistency of the MRSs, aiming for more consistency with an updated version of the semantic algebra.

Token mapping
- Upgraded to use GML 1.0, particularly relevant for the WeScience corpus.
- Improved support for both `strong' brackets (manually inserted) and `weak' delimiters (motivated by, for example, hyphens) to signal phrase boundaries that should not be crossed.

Syntax
- Enabled extraction from within NPs.
- Changed the attachment order of pre- and post-nominal modifiers, so now the pre-modifiers attach before post-modifiers.
- Added two new types of modifiers of verbal projections: indefinite NPs as in |She walked out of the casino a rich woman.|; and gapped clauses of saying, as |They will, we suspect, leave early.|
- Added the `do-be' construction as in |the only thing we said she had to do was finish the assignment|

Semantics
- Moved information-structure constraints from RELS to ICONS in MRSs, including for focus movement (`topicalization') and passivization.
- Simplified the inventory of role labels, notably for conjunction relations, where ARG1 and ARG2 replace the old L-HNDL, R-HNDL, L-INDEX, R-INDEX roles.
- Improved SEM-I consistency.

Platforms and applications
- Added support for ubertagging with PET and ACE, thanks to Rebecca Dridan.
- Added robust parsing mode for ACE using csaw PCFG, thanks to Woodley Packard.
- Added support for `agree' parser/generator
- Expanded the inventory of mal-rules for grammar-checking.
- Added `transfer' rules and support for generation from first order logic.
- Added support for manually inserted `strong' brackets which force phrasal boundaries, as in |we ⌊(⌋ saw a man ⌊)⌋ with a telescope|.
- Changed RNAME value on rules from string to type, to allow `weak' bracket rules to constrain which rules must apply, as for named entities such as |New York Stock Exchange|.
- Added support for robust `bridging' rules (disabled by default).
------------------------------------------------------------------------------
Release notes for trunk version 2016-09-27

Full-forest treebanking of the Redwoods profiles (and eventually WSJ as well) is now underway, with minor grammar corrections being made along the way.
------------------------------------------------------------------------------
Release notes for trunk version 2015-06-19

[After a long hiatus, returning to commenting on trunk version changes.]

Tuned paraphrase rules both for educ and for openproof.
The educ set is mostly for generating variant correct answers for the new Reading composition exercises in the Redbird Language Arts course. The openproof modifications are aimed at reducing the remaining ambiguity in the generated English outputs.
------------------------------------------------------------------------------
Release notes for trunk version 2013-03-19

Added two constructions motivated by Sherlock Holmes corpus:
(1) adverbial clauses with gaps and verbs of saying, as in |You have, I presume, considered this.|
(2) adverbial indefinite NPs as VP modifiers, as in |He arrived a hero and departed a villain|

Also improved treatment of present participles as adjectives, employing verb predications for semantics.
------------------------------------------------------------------------------
Inflectional rules: instances made one-to-one with types
------------------------------------------------------------------------------
Release notes for version "ERG (1212)"

Stable tagged release, including updates of all tsdb/gold profiles. This release is also used for the treebanked profiles of DeepBank 1.0, the Wall Street Journal corpus included in the Penn Treebank. Details and an online demo can be found at www.delph-in.net/erg
------------------------------------------------------------------------------
Release notes for version "ERG (1111)"

Stable tagged release, including updates of all tsdb/gold profiles, plus the addition of two new profiles from the Tanaka corpus: rtc000 and rtc001. Details on ERG coverage of all gold profiles can be found on the Redwoods web page: http://www.delph-in.net/redwoods.
------------------------------------------------------------------------------
Update of `trunk' version as of August 2011:

Added coverage for the following phenomena:
- pre-determiner adjective phrases, as in |too tall a building|.
  |too strong an opponent to overcome|
- `enough' + VP/NP complement, as in |We met a tall enough player to hire.|
- sentence-initial indefinite NP `depictives', as in |A happy cat, she purred.|
- extraposed relative clauses, as in |A cat appeared suddenly which had no tail.|
- gapping constructions, where the second head in a conjoined VP or S is missing
  |He persuades Kim to sing and Abrams to act.|
- `do-be' construction as in |The only thing we didn't expect him to do was give himself a raise.|
- conditional inversion, as in |Were we to visit Paris, we would be happy.|
- more freedom in ordering of complements
  |The book was given to Kim by Sandy.|
  |The book was given by Sandy to Kim.|

Also made minor improvements for generation, including corrected trigger rules.

Gold profile updates are included only for csli, mrs, hike, cb, and jh1
------------------------------------------------------------------------------
Release notes for version "ERG (1010)"

Stable tagged release with full (manual) updates of all gold profiles including LOGON, WeScience, and (after a long hiatus) the Verbmobil and ecommerce treebanks, along with the newly added SemCor (semantically tagged portion of the Brown corpus - the first 3100 items so far). Details on current ERG coverage of these profiles can be found on the Redwoods web page: http://www.delph-in.net/redwoods.
------------------------------------------------------------------------------
Release notes for version "ERG (1007)"

Minor improvements for better coverage of WSJ corpus and of the education and speech application corpora.
------------------------------------------------------------------------------
Release notes for version "ERG (1004)"

This is intended as a `stable' release, accompanied by a full manual update of the `gold' treebanked profiles, and parse-ranking models trained on them.
------------------------------------------------------------------------------
Release notes for version "ERG (1003)"

- This release is essentially a pre-release of a proposed stable release next month ("ERG (1004)"), and will serve as the basis for final tuning, debugging, and updating of the various treebanks.
- At long last, the rule names have all been converted to conform to the naming scheme proposed in 2008, and described at http://wiki.delph-in.net/moin/ErgTop
  PLEASE NOTE that all pre-existing treebanks constructed using the ERG will have to be converted before they can be used for treebank updates with this new grammar version. See http://wiki.delph-in.net/moin/ErgRules for instructions to effect this conversion automatically.
- This release also includes adaptation of the arboretum files for current use in the EPGY grammar-checking application.
- The chart-mapping machinery includes a revised treatment of quote marks in their splendid variety, aiming for more normalization in preprocessing, thanks to Stephan Oepen.
- The grammar currently includes some temporary patches to support generation using unknown words, most recently for experiments in generating from DMRSs. While largely functional, this should be considered work still in progress, since at least the mechanism for assigning semantic predicate names to unknown words is far from ideal.
- The `gold' directory now contains an additional profile `petet' for the Evaluation by Textual Entailment trial data set. In addition to this new profile, the following usual three `gold' profiles have been updated using this version of the grammar: csli, mrs, hike. Expectations are that the full `gold' collection of profiles will be updated by April.
------------------------------------------------------------------------------
Release notes for version "ERG (1002)"

- Re-working of arboretum files to apply to error analysis in grammar checking
- In preprocessor, factoring out of treatment of quote marks.
- Better interim accommodation for unknown words in generation, consistent with current naming convention for unknown predicates (see erg/tmr/pos.tdl).
------------------------------------------------------------------------------
Release notes for version "ERG (0909)"

- Addition of EPGY-specific types and lexical entry constraints
------------------------------------------------------------------------------
Release notes for version "ERG (0907)" (the Barcelona release)

- Note that the attribute STEM has been renamed to ORTH, for more clarity. For those few who use the lexical database in connection with the ERG, it will be necessary to reload the database, using the revised table definitions in this version of the grammar.
- Improved coverage to admit some VP-modifying relative clauses, as in 'Abrams hired us, which bothers Browne.'
- Further stabilizing of the chart-mapping machinery for preprocessing and for accommodation of unknown words.
- Extended support for generation with unknown words
- Experimental support for paraphrasing as an external LOGON MT-like task which uses the external SEM-I (semantic interface) specification. See the file in $LOGONROOT/uio/enen/README for a quick introduction.
------------------------------------------------------------------------------
- Added treebanks for WeScience profiles 3 and 4.
- Added more support for generation with unknown words
------------------------------------------------------------------------------
Release notes for version "ERG (0902)"

- First version making use of the new chart-mapping machinery - please note that you will need a correspondingly new version of the LKB and PET (no older than 22-Feb-09; PET compiled off its `cm' branch).
- for the LOGON tree, please use the `trunk' version and select appropriate PET binaries (from the `cm' branch) as `flop -t' and `cheap -t'.
- Added the first four treebanked profiles in the WeScience corpus
- Updated other profiles in 'gold' subdirectory (but note that a few still await updating, including the SensEval and SemCor profiles).
------------------------------------------------------------------------------
Release notes for version "LinGO (July-08)"

- Elaborated the chart-mapping rules to accommodate the existing treebanked corpora, including more systematic treatment of POS-driven unknown word handling.
- Added additional treebanks for some corpus data from Senseval, SemCor, and ILIAD (Melbourne).
- Added syntactic coverage for some additional sentential modifier phrases ('as'+passiveVP, and NP predicatives like "His project fully funded, Abrams celebrated."), and for marked word order with PPs appearing before some VPs, and before some complement NPs.
------------------------------------------------------------------------------
Release notes for version "LinGO (Apr-08)"

- Include experimental chart-mapping preprocessor rules in inpmap-rules.tdl and lexmap-rules.tdl.
- Enriched the hierarchy of semantic predicates to support underspecification in translation, including abstract predicates for locative 'in, on, at' and for 'the, a, udef' quantifiers.
- Tuned lexicon for Semcor data to support treebanking.
- Added syntactic coverage for 'small clause' predicatives such as 'The dog barked, its heart beating wildly'.
------------------------------------------------------------------------------
Release notes for version "LinGO (26-Jan-08)"

Final tuning for SciBorg's first treebank of six abstracts
Final tuning for LOGON/HandOn treebank update
------------------------------------------------------------------------------
Release notes for version "LinGO (24-Jan-08)"

A few corrections to lexical entries based on most recent HandOn fan-outs
------------------------------------------------------------------------------
Release notes for version "LinGO (23-Jan-08)"

Added a few missing lexical entries for degree specifiers
------------------------------------------------------------------------------
Release notes for version "LinGO (21-Jan-08)"

And still more tuning - maybe the final round - for HandOn
------------------------------------------------------------------------------
Release notes for version "LinGO (20-Jan-08)"

1. More tuning for HandOn driven by 'sti' and 'vei' fan-out logs
------------------------------------------------------------------------------
Release notes for version "LinGO (17-Jan-08)"

1. Minor adjustments to lexicon, grammar, and trigger rules for fine-tuning of HandOn system.
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Jan-08)"

1. Added vocabulary for HandOn based on missing predicates from NoEn
2. Completed tuning of lexicon and preprocessing for HandOn English data
3. One recent change that may affect transfer: Decomposition of N-V compounds like "snow-covered" and "T-marked" - used to be multi-words with single predicate, but are now constructed via compound rule, with the two component EPs and an additional linking EP with PRED |argument_rel| similar to |compound_rel|
------------------------------------------------------------------------------
Release notes for version "LinGO (Nov-07)"

1. Treebanks
   - Updated all treebanks in erg/gold, but have not yet rebuilt jhpstg.mem file
2. MRS quality improvements / harmonization
   - Added type constraints on ARG1s for several classes of modifiers
   - Corrected missing semantic link in P-PP construction "from behind the hill"
   - Removed spurious pron_rel from infinitival subordinate constructions like "Kim sang to impress Sandy."
   - Made minor changes to title construction:
     - changed pred name for post-head titles to be consistent with pre-head one
     - corrected rule for number-headed phrases like "page 3"
------------------------------------------------------------------------------
Release notes for version "LinGO (Oct-07)"

Added lexical coverage for vocabulary in the English data for the HandOn project, in this case keeping the large number of domain-specific proper names in a separate file 'handon-propers.tdl'. Also made some repairs to remaining inconsistencies in MRSs in the message-free universe. In addition, did several bits of minor tuning of syntactic constructions in support of the DFKI Checkpoint project, and added first version of the token-mapping rules for PET's emerging support for this functionality. This release also includes an additional settings file for PET, 'mrs.set', to support development of generation capability for PET.

Note that only three of the 'gold' profiles (csli, hike, and mrs) have been updated in this release; the rest will follow shortly.
------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-07)"

Added lexical coverage for several additional treebanked data sets, including Senseval 2-4, FraCaS, SciBorg, and Acrolinx (though the latter two data sets are not distributable). Also updated the full set of 'gold' profiles for the existing data sets.

PLEASE NOTE that this version requires an up-to-date version of the LKB to get correct behavior with the treebanked data in 'gold', since the derivation trees are now augmented with a specification of which root constraint was used to admit each tree.
------------------------------------------------------------------------------
Release notes for version "LinGO (21-Mar-07)"

The most significant change in this version of the ERG is the complete removal of messages, as announced at the Fefor DELPH-IN meeting to follow the completion of the LOGON demonstrator. This version is a nearly exact non-msg equivalent of the final LOGON version "LinGO (17-Mar-07)", so it should be straightforward to compare and contrast the two variants.

In brief, the distinction among propositions, questions, and commands is now made via the value of the attribute SF ('sentence force' i.e., illocutionary force), a property of events. This attribute and its values are also used in the most recent release of the Grammar Matrix.

In addition, this release contains the following modifications/improvements, the first of which is also included in the final LOGON version:
- Adoption of Stephan Oepen's proposal for a more uniform treatment of properties of MRS events and indices
- Adoption of Berthold Crysmann's proposal for the full cross-product of subtypes for encoding person-number
- Addition of missing vocabulary for the Senseval 2 test data
- Addition of pragmatic EPs to encode focus (formerly referred to as 'topicalization') and promoted arguments in passive constructions.
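
The removal of messages in favor of the SF attribute, as described in the 21-Mar-07 notes, can be pictured with a small toy conversion. This is only a sketch under stated assumptions: of the names used, only prpstn_m_rel and the attribute SF appear in these notes; the other message predicate names, the SF value names (prop, ques, comm), the sample predicate, and the dict-based EP encoding are all hypothetical and should be checked against the grammar itself.

```python
# Toy illustration only. Only prpstn_m_rel and the attribute name SF come
# from the release notes; the other message predicates and the SF value
# names below are assumptions made for the sake of the example.
MESSAGE_TO_SF = {
    "prpstn_m_rel": "prop",  # proposition
    "int_m_rel": "ques",     # question (assumed predicate name)
    "imp_m_rel": "comm",     # command (assumed predicate name)
}


def remove_messages(eps):
    """Drop message EPs and return (remaining EPs, {event: SF value}).

    Each EP is a dict with a "pred" key. In this toy encoding a message
    EP also carries an "event" key naming the event of the clause it wraps.
    """
    remaining, sentence_force = [], {}
    for ep in eps:
        sf = MESSAGE_TO_SF.get(ep["pred"])
        if sf is None:
            remaining.append(ep)               # ordinary EP: keep it
        else:
            sentence_force[ep["event"]] = sf   # message EP: record SF on the event
    return remaining, sentence_force


# Hypothetical input: one message EP wrapping one verbal EP.
eps = [
    {"pred": "prpstn_m_rel", "event": "e2"},
    {"pred": "_bark_v_1_rel", "arg0": "e2"},
]
remaining, sentence_force = remove_messages(eps)
print(remaining)
print(sentence_force)
```

The point of the sketch is only that the illocutionary distinction moves from a separate relation onto a property of the event, so the relation list gets shorter while the information is preserved.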
------------------------------------------------------------------------------
Release notes for version "LinGO (17-Mar-07)" (Final version with messages)

Added missing lexical entries for the known-vocabulary held-out portion of the LOGON corpus (43 proper names and 5 common nouns)
------------------------------------------------------------------------------
Release notes for version "LinGO (20-Dec-06)" (Final LOGON version)

- A few more corrections for mixed case and preprocessor
------------------------------------------------------------------------------
Release notes for version "LinGO (19-Dec-06)"

- Added feature on 'index' called IND for 'individuated', to enable distinction in SEM-I among count nouns, mass nouns, and mass-or-count nouns.
- Added further improvements to mixed case orthography, here primarily for country-related adjectives and nouns like "Englishman" and "Norwegian"
------------------------------------------------------------------------------
Release notes for version "LinGO (15-Dec-06)"

- Corrected preprocessor to preserve mixed case for Norwegian 'special' characters - they were being downcased because PET doesn't like them for interactive parsing, but now that the lexical entries require mixed case for proper names, we have to keep it for batch processing, which works okay.
- Modified lexical entries per latest transfer requests:
  - deleted bogus adjective entries for "U-shaped"
  - corrected "T-marked" to also work in predicative position
  - added entry for "noticeable that S"
------------------------------------------------------------------------------
Release notes for version "LinGO (14-Dec-06)"

- Added more consistent capitalization in orthography (and CARG) for ERG lexicon, to enable higher quality generation.
- Added spelling variants for two lexical entries used in training corpus: 'Hedmarker/Hedemarker' and 'El Dorado/Eldorado'
- Added entries as requested by transfer: 'mackerel', 'arboretum', "vitamin C"
------------------------------------------------------------------------------
Release notes for version "LinGO (13-Dec-06)"

More adjustments for final LOGON integration:
- Added and corrected lexical entries as requested by Transfer
- Corrected generator 'black hole' errors so generator will always terminate at least on the usual test suite data.
- Incorporated new LNK feature which replaces old WLINK, for mapping from MRS relations to their corresponding surface form positions [oe]
------------------------------------------------------------------------------
Release notes for version "LinGO (01-Dec-06)"

Minor additions for final LOGON integration:
- Added remaining missing lexical entries for known-vocabulary held-out data
- Improved efficiency for generation with coordination, by collapsing near-duplicate lexical entries for conjunctions
- Made minor corrections guided by LOGON fan-out logs, to improve both coverage and quality
------------------------------------------------------------------------------
Release notes for version "LinGO (Nov-06)"

NOTE: Users of this version of the ERG are strongly encouraged to also obtain a current version of the LKB and [incr tsdb()], in order to benefit fully from recent enhancements.

- Since the last public release in July, the ERG's lexicon has been expanded to include about 3000 additional nouns and adjectives that occur with high frequency in the British National Corpus (100 times or more).
- Some additional technical vocabulary was added to accommodate a sample of data from the Cambridge SciBorg project; these lexical entries are also tagged with "SciBorg" in the lexical database.
- The remaining changes have focused on tuning the grammar and SEM-I for generation in the near-final LOGON demonstrator.
- Updated treebank summary for LOGON data, in erg/gold. Note that this version of the treebank benefitted from the welcome addition to PET by Yi Zhang enabling the selective unpacking strategy used in the LKB.

  Profile   Items  Parsed  Treebank
  -----------------------------------
  JH0         261     248       226
  JH1        1353    1302      1221
  JH2        1307    1154      1058
  JH3        1443    1367      1230
  JH4        1603    1505      1416
  JH5         464     420       398
  PS          965     908       860
  TG         2014    1875      1735
  ROND       1290    1203      1133
  -----     -----   -----     -----
  Totals    10700    9982      9277
------------------------------------------------------------------------------
Release notes for version "LinGO (13-Oct-06)"

Added entries for digit-orthography cardinal adjectives to help generator.
------------------------------------------------------------------------------
Release notes for version "LinGO (12-Oct-06)"

Maybe the final round of tuning for this integration:
1. Merged falsely ambiguous lexical predicates:

   NEW                     OLD
   "_fine_a_for_rel"       "_fine_a_1_rel"
   "_good_a_at-for_rel"    "_good_a_for_rel"
   "_good_a_at-for_rel"    "_good_a_at_rel"
   "_good_a_at-for_rel"    "_good_a_1_rel"
   "_understand_v_by_rel"  "_understand_v_1_rel"

2. Added missing trigger rules for it-cleft construction.
3. Corrected a few minor errors in grammar rules.
------------------------------------------------------------------------------
Release notes for version "LinGO (10-Oct-06)"

Still more minor tuning
1. Corrected entry for "_guess_" unknown-noun lex entry to work in compounds
2. Corrected NP fragment rules to allow fragments that are conjoined NPs
3. Enabled entry for prep "to" to also modify proper names.
------------------------------------------------------------------------------
Release notes for version "LinGO (09-Oct-06)"

More minor tuning for impending LOGON release:
1. Fixed spelling of 'considerred' for dative passive form
2. Enabled generation of implicit NP coordination
3. Corrected lexical entry's PRED name for 'edge'
4. Corrected modification of imperatives
5. Added missing topmost message for sentence-initial conjunction
6. Allowed Adj-N as title
7. Added missing entries for 'follow' and 'transport': NP+PP-dir
8. Corrected multiple SEM-I entries for 'choose' verb
9. Added missing analysis of NP's + PP construction
10. Added lexical entry for 'mountain pasture' as title
11. Renamed inconsistent degree adverb preds:
    "_a+little_x_deg_rel"
    "_steeply_x_deg_rel"
    "_directly_x_deg_rel"
    "_shortly_x_deg_rel"
12. Added entry for adj 'so' ('true') with expl-it subj: "It is so that ..."

Note that only the gold profiles for 'csli', 'mrs', and 'hike' have been updated for this release.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Oct-06)"

Minor tuning for upcoming LOGON release:
1. Corrected PRED name for "downstairs"
   Old SEM-I entry: "_downstairs_a_1_rel" : ARG0 e, ARG1 u.
   New:             _downstairs_p_rel : ARG0 e, ARG1 u.
2. Corrected nbar-fragment rule to also analyze measure-nouns like "centimeter"
------------------------------------------------------------------------------
Release notes for version "LinGO (27-Sept-06)"

- Further LOGON tuning for better harmonization with NorGram
- Used the newly available selective unpacking in PET to create the treebanks in the 'gold' subdirectory.
- Updated the treebanks for the full set of profiles in 'gold'
------------------------------------------------------------------------------
Release notes for version "LinGO (11-Sept-06)"

- Tuning for improved LOGON generation for 'vei' development corpus
- Added several thousand lexical entries based on frequency of use in the BNC, guided by the unigram and bigram error mining analysis of Yi Zhang. In particular, added entries for (i) those words which were entirely missing and with BNC frequency of 100 or more; and all words with at least one entry already in the ERG, but with (ii) unigram error score of 0.00, or (iii) bigram score of 0.00.
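
The three selection criteria listed for the 11-Sept-06 lexicon expansion can be written down as a simple filter. The following is a minimal sketch, not the procedure actually used: the WordStats record, its field names, and the sample words and numbers are hypothetical, and nothing here reflects the real output format of the error-mining analysis or the ERG lexical database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordStats:
    """Hypothetical per-word record; all field names are assumptions."""
    word: str
    bnc_frequency: int                     # occurrences in the BNC
    erg_entries: int                       # entries already in the ERG lexicon
    unigram_score: Optional[float] = None  # error-mining score, if available
    bigram_score: Optional[float] = None


def is_candidate(w: WordStats) -> bool:
    """Apply criteria (i)-(iii) from the release note."""
    if w.erg_entries == 0:
        # (i) entirely missing, with BNC frequency of 100 or more
        return w.bnc_frequency >= 100
    # (ii) unigram score of 0.00, or (iii) bigram score of 0.00,
    # for words that already have at least one entry
    return w.unigram_score == 0.0 or w.bigram_score == 0.0


# Made-up sample data, purely for illustration.
sample = [
    WordStats("alpha", bnc_frequency=250, erg_entries=0),
    WordStats("beta", bnc_frequency=40, erg_entries=0),
    WordStats("gamma", bnc_frequency=900, erg_entries=2, unigram_score=0.0),
    WordStats("delta", bnc_frequency=900, erg_entries=1,
              unigram_score=0.7, bigram_score=0.4),
]
candidates = [w.word for w in sample if is_candidate(w)]
print(candidates)
```

Note that criteria (ii) and (iii) apply only to words that already have an entry, which is why the frequency threshold is not checked on that branch.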
------------------------------------------------------------------------------
Release notes for version "LinGO (18-Jul-06)"

- Minor tuning to improve coverage on LOGON 'vei' items
------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-06)"

- Improvements in semantic composition (assisted by useful error analysis in utool) and additional lexical entries, as noted in internal LOGON release notes below, since last public release of January 2006.
- Converted leaf lexical type names to conform to new naming conventions, with mapping from old to new names provided in file "new-le-types.txt". See wiki.delph-in.net/erg for documentation of new LE types.
- Adopted use of new variable property mappings given in file "semi.vpm".
- Updated treebank summary for LOGON data, in erg/gold (these profiles were used to retrain the parse selection model in "jh.mem"):

  Profile   Items  Parsed  Treebank
  -----------------------------------
  JH0         261     233       197
  JH1        1254    1132      1043
  JH2        1185    1047       908
  JH3        1311    1197      1057
  JH4        1454    1336      1214
  JH5         464     408       371
  PS          965     892       833
  TG         2014    1831      1656
  ROND       1290    1196      1072
  -----     -----   -----     -----
  Totals    10198    9272      8351
------------------------------------------------------------------------------
Internal release notes for version "LinGO (08-Jun-06)"

Small corrections to semantics of title nouns, both alone and in compounds.

Note that only the 'gold' profiles for csli, mrs, and hike have been updated.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (24-May-06)"

Added lexical entries needed for remaining LOGON development corpus (Turglede and Preikestolen texts).
Made semantics for comparatives, superlatives, and much/many more consistent.
Reduced generation output of variants with commas for modification & coord.
NP-coord - Corrected semantics, adding qeq (more consistent, and more scopes)
Free rels - Made embedded message be prpstn_m_rel, not underspecified.
Corrected semantics errors throughout, using Utool
Added treebank profiles for ps (Preikestolen) and tg (Turglede) data.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (13-Feb-06)"

Corrected semantics for quantifiers 'most' and 'the most', dropping the predicate 'most_q_rel' in favor of decomposed semantics using the usual "many-much_a_rel".
------------------------------------------------------------------------------
Internal release notes for version "LinGO (09-Feb-06)"

Minor improvements in SEM-I content, and correction of an item in gold MRS.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (06-Feb-06)"

More harmony for depictives, now with same semantics as other subordinate clauses. Also corrections to SEM-I for directional PP verbs.
------------------------------------------------------------------------------
Internal release notes for version "LinGO (03-Feb-06)"

Improved harmony:
- Comparative and superlative determiners now have decomposed semantics analogous to corresponding adjectives, consistent with NorGram
- Comparative and superlative adjectives now present the ARG0 of the comp_rel/superl_rel as their INDEX, with one benefit being a better MRS for measured comparatives, as in 'Dogs are 5 cm taller than horses.'
- Free relatives, like ordinary relatives, no longer introduce a TPC value for their message.

Consistency:
- Lexical entry type for named years ('2004') is now treated more like other named entities, undergoing a bare-NP rule to project a full NP.
- Title compounds as in 'project manager Abrams' now have the compound relation take two ref-inds as arguments, like one would expect.
------------------------------------------------------------------------------
Release notes for version "LinGO (Jan-06)"

PLEASE NOTE: This version of the ERG requires up-to-date versions of both the LKB and PET, since it takes advantage of improvements in the treatment of morphology in the LKB, and also depends on a consistent treatment of special characters like \?, \(, and \".

This version includes minor tuning adjustments to the lexicon and grammar, to improve overall precision and coverage on the data sets included in the Redwoods 6 (Norwegian Growth) treebank, which has been expanded to include about 5000 items from the LOGON development corpus on Norwegian back-country tourism. The single-best-parse profiles for this additional data appear as usual in the subdirectory 'gold', in the six directories jh0 - jh5.

In addition, the grammar now includes a semantic interface file 'erg.smi' which currently specifies the minimal properties of each lexical predicate, including its name and its arguments, their types, and their optionality. This file should soon also include the grammar predicates (those introduced by rules rather than by lexical entries), as well as the set of abstract predicates which are intended as part of the external interface to the grammar.
------------------------------------------------------------------------------
Release notes for version "LinGO (05-Dec-05)"

1. Punctuation - Eliminated the duplication in files that was formerly needed for minor differences between the LKB and PET, now resolved.
2. Lexicon - Added vocabulary needed for the LOGON development corpus on tourism in the Norwegian mountains.
3. Generation - Tuned the trigger rules for introducing semantically empty lexical entries, for improved efficiency.
4. Treebanks - There are now additional profiles jh* in the directory gold, for several segments of the LOGON development corpus for the Jotunheimen region.
In this release, only jh1 is updated; the other five sections will follow soon. The other (non-LOGON) profiles are all up to date.

------------------------------------------------------------------------------
Release notes for version "LinGO (23-Nov-05)"

1. Corrected lexical entries for "write" and "unevaluated", as well as the preprocessor-related "twodigitdomersatz". Also added entry for "untrafficked".
2. Repaired error in comma punctuation which was causing overgeneration.
3. Corrected error in lexical types for day-of-month entries which was producing ill-formed MRSs.

------------------------------------------------------------------------------
Release notes for version "LinGO (15-Nov-05)"

1. Added and corrected lexical entries and SEM-I
- Most interestingly, added some entries for 'kind' readings, as for the noun "bear" in "they hunted bear." The predicate names are distinct, since presumably these would be derived from some lexical rule producing a distinct sense, and take the form "_<noun>_n_kind_rel"
- Changed the single entry for the adjective "born" so it is treated semantically more like the passive participle it once was, and now introduces the predicate "_bear_v_2_rel" with a distinct sense of the verb "bear" from that in "Kim can't bear to lose"
- Made changes in response to requests from JTL for transfer.
2. Tuned grammar in minor respects to improve consistency in treebanking the JH corpus.

------------------------------------------------------------------------------
Release notes for version "LinGO (10-Nov-05)"

1. Corrected SEM-I and lexicon errors noted by JTL, and improved constraints on lexical types with handle arguments so the SEM-I reflects these (introducing e.g. [ ARG3 h ] instead of formerly [ ARG3 u ]).
2. Added a few more lexical entries needed for JH, and some minor syntactic additions for constructions like "Try it yourself" and "Kvame became sole owner".
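
Illustration (not part of the grammar distribution): the lexical predicate names cited in these notes, such as "_bear_v_2_rel" and the "_<noun>_n_kind_rel" pattern, follow the _ORTH_POS_SENSE_rel convention described in the Jul-05 notes. The Python sketch below shows one way a consumer of MRS output might split such a name into its fields. The function name split_pred, the LexicalPred type, and the regular expression are hypothetical and are not taken from the ERG, the LKB, or PET. The sketch assumes that ORTH contains no underscore, and it treats the sense field as optional so that names like "_any_q_rel" are accepted.

```python
import re
from typing import NamedTuple, Optional


class LexicalPred(NamedTuple):
    orth: str             # lexeme orthography, e.g. "bear"
    pos: str              # one of: v n a p x q c
    sense: Optional[str]  # sense field, or None when the name has none


# Hypothetical pattern for _ORTH_POS_SENSE_rel.
# Assumptions: ORTH contains no underscore; SENSE is optional.
_PRED_RE = re.compile(
    r"_(?P<orth>[^_]+)_(?P<pos>[vnapxqc])(?:_(?P<sense>[^_]+))?_rel"
)


def split_pred(name: str) -> Optional[LexicalPred]:
    """Split a lexical predicate name into its fields.

    Returns None for names that do not match the lexical pattern,
    for example predicates written without the leading underscore.
    """
    m = _PRED_RE.fullmatch(name)
    if m is None:
        return None
    return LexicalPred(m.group("orth"), m.group("pos"), m.group("sense"))


if __name__ == "__main__":
    for pred in ("_bear_v_2_rel", "_bear_n_kind_rel", "_any_q_rel", "poss_rel"):
        print(pred, "->", split_pred(pred))
```

Names without the leading underscore, such as poss_rel, are deliberately rejected by this sketch, since the convention above is stated only for lexical predicates.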

------------------------------------------------------------------------------
Release notes for version "LinGO (05-Nov-05)"

Quick additional release to make improvements for treebanking Jotunheimen

1. Punctuation - Cleaned out a few more temporary patches in preprocessor and lexicon, especially for |"|, |(|, |)| which had had substitutions.
2. Preprocessing - Added a few more cases revealed by Jotunheimen data.
3. Lexicon - Added a few missing multi-words that emerged from initial treebanking, and changed a few more formerly relational nouns to just ordinary nouns, to avoid spurious ambiguity: 'top, bottom, side, front, back'. Also (finally) corrected the pred names for "anybody", "someone", etc. to now use _any_q_rel rather than any_q_rel, and same for _some_q_rel.
4. Fixed TPC assignments in relative clauses and for 'wonder'.
5. Corrected nominalization, which became too constrained in an attempt to avoid spurious ambiguity.

------------------------------------------------------------------------------
Release notes for version "LinGO (01-Nov-05)"

1. Tuned generation trigger rules to reduce overgeneration and improve efficiency. Also attempted to make more consistent use of TPC, PSV, allowing underspec.
2. Revised morphology to benefit from improvements in LKB and later in PET, now that irregularly inflected words can co-exist with punctuation suffixes (so eliminated files inflr-pet.tdl, inflr-pnct-pet.tdl, robust.tdl, and robust-pnct.tdl).
3. Reduced inventory of scopal adverbs, and improved consistency for adverbs. Note in particular that most so-called discourse adverbs have been converted to scopal adverbs, and the conjunctions 'and, or, but' are now treated as such even when they are sentence-initial.
4. Corrected some errors in lexical types and in syntactic rules; in particular fixed type for mass_ppcomp, which was broken, and improved nbar-coordination whose semantics was not ideal.
5. Some other lexical changes:
- 'both' determiner is now logically equivalent to "the two".
- 'respect (for)' wasn't entered as a mass noun, now is.
- 'cross_over_v1, _v2' removed from lexicon (now done compositionally)
- various entries for cardinal "one" had CARG "01", now just CARG "1".

------------------------------------------------------------------------------
Release notes for version "LinGO (09-Sep-05)"

1. Repaired punctuation overgeneration for non-WH topicalization, by removing the licensing for constructions like "Who won? asked Kim." (not frequent in our data set, though seen in Rondane).
2. Removed STATIVE from grammar, since no longer used
3. Removed spurious fragment rules only used for parsing dictionary definitions
4. Corrected lexical predicates in SEM-I
- _have_v_to_rel => "_have_v_to_rel" (from type to string)
- "_fail1_v_1_rel" => "_fail_v_1_rel" (misspelling)
5. Added missing lexical entry for unaccusative (intransitive) "weaken"
6. Added lexical entries for "move" and "drive" analogous to "put", still using the same inventory of predicates in the SEM-I.
7. Split the lexical rule for prenominal verbal modifiers into two rules, one for present participles and one for passives, to avoid spurious verb-particle entries which should be disallowed as modifiers (since the particle can't be present).
8. Modified the types for raising verbs taking an infinitival VP complement so they uniformly combine with the infinitival "to" which introduces a message.
9. Added reentrancies for TPC and PSV so the appropriate values appear on messages in embedded clauses.
10. Improved generator efficiency by adding grammar-internal feature --TPC, to which new generator compliance rules assign a value based on the public feature TPC.
11. Also further refined trigger rules, and exploited the newly invented compliance rules which adjust the input MRS to comply with grammar-internal constraints (so far restricted to assigning value for --TPC based on TPC).
12. Again for efficiency, added constraints on events introduced by adverbs and degree specifiers so they will not trigger lexical entries in generation.
13. Once again corrected the reported failure to generate some examples like "Abrams could." which made use of ellipsis_rel as underspecification of ellipsis_ref_rel.

------------------------------------------------------------------------------
Release notes for version "LinGO (05-Sep-05)"

Improved generation with punctuation and fragments.
Updated Verbmobil section of Redwoods treebank, and filled in missing gold profiles.

------------------------------------------------------------------------------
Release notes for version "LinGO (02-Sep-05)"

Minor update:
Modified trigger rules to use unification rather than subsumption, and added some abstractions over trigger rules, in mtr.tdl
Further reduced spurious commas preceding modifiers in generation.
Punctuation rules now compatible with current LKB morphology.
Infinitival subjects no longer introduce nominalization (as in "To err is human.")

------------------------------------------------------------------------------
Release notes for version "LinGO (15-Aug-05)"

Minor update: The usual normalizing of predicate names, this time mostly for expletive-it-taking predicates. Also some further tuning of trigger rules, and change to verb_synsem to make sure uninflected lexical entries already identify their INDEX and KEYREL.ARG0, for better generator initialization.

------------------------------------------------------------------------------
Release notes for version "LinGO (09-Aug-05)"

Minor update for yet more consistency in predicate names, especially for relational nouns and adjectives, respectively, to get their related entries to match in predicate names. Also corrected ordering error in prp_infl_rule and added a few additional lexical entries for the LOGON development corpus.

------------------------------------------------------------------------------
Release notes for version "LinGO (05-Aug-05)"

Minor update to improve consistency in predicate naming conventions, and to restore the 'chunking' roots in roots.tdl which are used experimentally in trying to generate from fragmented MRSs. Note that in this release, only the 'gold' profiles for 'csli', 'mrs', and 'hike' have been updated.

------------------------------------------------------------------------------
Release notes for version "LinGO (Jul-05)"

This release incorporates several significant changes to the previous release, but at long last also includes a first step at documenting an external semantic interface for the grammar. The changes will soon be described in a little more detail on the ERG Wiki, but in summary:

1. Punctuation as affixation

Previous versions of the grammar implemented a treatment of punctuation adopting a standard but linguistically dubious strategy of using a preprocessor to make all punctuation marks distinct tokens, adding spaces around each one. This version implements an analysis which leaves the input string unchanged with respect to punctuation (except for apostrophes), and treats the punctuation marks as spell-changing affixes. This change creates backward incompatibilities with earlier treebanks because the tokenization for each sentence is now different. A few infelicities remain from making this change, including
- minor inconsistencies in the readers of affixation rules for the LKB and PET (and even for previous and current versions of the LKB)
- imperfect interaction of irregular inflected forms and punctuation
- imperfect interaction of multi-words and punctuation
There are work-arounds for some of these, awaiting better resolution.

2. Semantics

a. Semantically empty prepositions no longer introduce an EP (they used to add an EP whose predicate name ended in "_sel_rel", for lexically 'selected').
So the generator trigger rules have been augmented to automatically introduce the necessary lexical entries for generation, currently based on predicate-naming conventions for the lexical entries that select empty prepositions.

b. Messages now introduce an additional attribute, ARG0, whose value is the event of the highest-scoping verbal EP within the scope of the message. The main motivation is to make it simpler for applications to identify the relevant event properties of a clause's semantics without looking 'inside' the clause's MRS.

c. All lexical predicates now have some value in the 'sense' field of the predicate name. (Background: by convention in the ERG, each lexical predicate name has the following form:
    _ORTH_POS_SENSE_rel
where ORTH is the lexeme's orthography, POS is a coarse-grained sense distinction drawing from the vocabulary [v n a p x q c], and SENSE is an arbitrary sequence of characters (excluding |_|), and where each of the fields is separated by an underscore. Earlier, the sense field could have been left empty.) The default value for the sense field is now '1'.

d. Relational nouns now specify in their sense field the orthography of the preposition marking their oblique complement (usually 'of').

e. Tag questions previously discarded the semantics of the tag phrase, contrary to the monotonicity assumption in the ERG. This is now corrected, with the result that the semantics of sentences with tag questions is now rather more baroque. The main benefit of the reanalysis is that lexical rules now properly always preserve the semantics of their input lexemes.

f. Sentential subjects were previously analyzed via a nominalization rule. This simplified the syntactic analysis of "That Abrams arrived annoyed Browne" since the "annoy" lexeme could always unify its ARG1 value with the semantic index of its subject. But the resulting asymmetry for the 'extraposed' and non-extraposed variants of lexemes like 'annoy' was annoying.
This version of the grammar now provides the same MRS for both variants ('It annoyed Browne that Abrams arrived' and the above example), via a syntactic variant of an 'it-extraposition' lexical rule, with thanks to Ann Copestake for the suggested implementation. One consequence is that the earlier treatment of examples like "The problem was that Abrams arrived" no longer works, since the identity copula was being used, and requires its complement to supply a referential index. So there is also yet another entry for the verb 'be', which supplies an EP similar to the identity 'be'.

g. Verbal modifiers of nouns were being given an inconsistent semantics, with postnominal modifiers as in 'people singing arias' supplying a message for the modifier phrase, but with prenominal modifiers as in 'the singing people' not contributing a message. In this version of the grammar, verbal projections now always supply a message, making the world a little more consistent, but leaving a sharper contrast now between "the singing children" and "the interesting children" where 'interesting' is analyzed as an adjective and hence does not supply a message.

3. Lexicon

New lexical entries drawn from the Norwegian tourism domain of the LOGON development corpus have been added, bringing the current number of lexemes to 22,750 for this release, of which about 2700 are proper names.

4. SEM-I

A first draft of the semantic interface for the grammar is now presented in the file erg-full.smi, including the predicate names and semantic arguments of all predicates introduced either by lexical entries or by the grammar (either via lexical/syntactic rules or via abstractions over more specific predicates). Documentation of this file is under active development.

5. Naming conventions

The feature name DIVISIBLE on referential indices has been shortened to DIV for better readability of MRSs.

6. LKB warnings on grammar loading

The LKB's new and improved treatment of morphology offers several advantages, and the current version of the grammar benefits from these, but still results in some warning messages when loading. Users can ignore these messages for now, while the developers resolve the underlying causes. The first is about the 'punct_bang_rule', and the others warn of lexical rules that can feed themselves.

------------------------------------------------------------------------------
Release notes for version "LinGO (30-Apr-05)"

This is a minor update to the Apr-05 version, including some lexical additions, adjustments to the semantic predicate hierarchy, and tuning of syntactic analyses, all designed to improve end-to-end translation for LOGON. The only substantive difference is in the analysis of possessive constructions, where the grammar now produces nearly identical MRSs for the two noun phrases "our book" and "a book of ours", using a new lexical entry for "ours" distinct from the ordinary "ours" of "ours are not ready". One consequence of this reanalysis, which unifies the treatment of the two possessive constructions, is that the two arguments in the old 'poss_rel' EP have been reversed: what was the ARG1 is now ARG2, and vice versa.

------------------------------------------------------------------------------
Release notes for version "LinGO (Apr-05)"

Overview of changes:
- Lexicon size increased to 21000 entries
- MRS quality improved
- Unicode now used for lexicon: foreign proper names, archaic spellings
- Coverage added for fragments, locative inversion, 'free' parentheticals
- Changed analyses to allow PP-modif of PPs, APs; adverb-modif of APs
- Support for new domains: 'shanghai', 'gcide'

--Lexicon--

BNC - Based on months of hard labor by former Stanford students Hansook Lee and Mike Orme (with help from Ara Kim), the lexicon now contains all verb subcat entries for the 2000 most frequent verb stems in the British National Corpus.
This should enable some interesting experimentation in automated lexical acquisition, since there are fewer lexical types that need to be hypothesized for non-verbs.

GCIDE - The lexicon now also contains entries for all words observed in the first 10,000 definition 'sentences' in the GNU Collaborative International Dictionary of English (GCIDE), to enable more precise evaluation of syntactic coverage of these definitions.

Shanghai - Based on some 1500 entries constructed by Yi Zhang at CoLI in Saarbruecken, the lexicon now also contains entries for most of the words found in a Web-derived corpus on tourism in Shanghai, analogous to the Rondane corpus built by Becky Neil for the LOGON project in Norway.

--MRS quality--

Based on a substantial implementation effort by Stefan Thater and colleagues at CoLi, Saarbruecken, to check for well-formedness of MRSs produced by the grammar for the Redwoods and Rondane corpora, many errors were identified, enabling improvements in MRS construction in the ERG. Further improvements were enabled by the systematic use of existing capabilities in the LKB for diagnosing MRS errors in ERG analyses. While the current release still produces some flawed MRSs for these data sets, they are largely confined to a small inventory of known and somewhat problematic minor phenomena.

--Unicode--

Drawing on the combined expertise of Stephan Oepen and Francis Bond, the ERG is now fully Unicode-compliant, including the PSQL database. This enables proper representation in the lexicon for orthography of non-English proper names such as "østerbø", and archaic English spellings such as "coöperation". The necessary infrastructure for Unicode is admirably and demonstrably in place in the LKB, PET, [incr tsdb()], and PostgreSQL.

--Coverage--

Fragments - Further work on the treatment of fragments has been motivated largely by the effort to parse the definition sentences in GCIDE, and to give them a consistent semantic representation.
New fragment types now licensed include VPs and PPs with NP gaps, as in "To devour." or "Relying on.".

Locative inversion - The grammar now analyzes some locative inversion phenomena, currently restricted to sentences headed by the finite copula 'be' as in "Near the park is a large dog" but not (yet) "Near the park stood a large tree". These appear with some frequency in the Rondane data, and have also been waiting patiently for twenty years in the CSLI test suite.

'Free' parentheticals - Sentences containing some classes of parenthetical material (which would not survive in situ without the parentheses) will now be analyzed, though further work will be needed in designing the target semantics. Example now covered: "That dog (you should see its owner!) barked."

--Changed analyses--

Modification - Based on more systematic analysis of phenomena found in the Rondane corpus, and corroborated in the Shanghai corpus, the ERG now permits more interesting modification structures. Prepositional phrases, formerly restricted to modifying only VPs and nominal phrases, can now also modify adjective phrases and other PPs. Similarly, adverbs can now also modify adjective phrases, as in "the wildly happy dog barked", freeing the grammar from its former requirement that duplicate degree-specifier lexical entries be added for many adverbs.

--New domains--

The GCIDE corpus has been taken from the GCIDE web site, and carefully prepared by Eric Nichols at NTT in collaboration with Francis Bond, including identification of sentence breaks, normalization, and formatting, all of which are now automated via Perl scripts converting the original GCIDE data into, among other things, an 'item' file format for use with the fine system.

The Shanghai corpus is being collected by Yi Zhang in Saarbruecken as part of his thesis work, and consists of text on tourism in Shanghai, written in English and mostly but not entirely by native English speakers.
The corpus may still be revised, so a profile of this data is not (yet) being distributed with the ERG.