v7 Upgrade, welcome
Version 7 is a very exciting and much-needed change to the library. It's a rewrite, several times over, of the existing API, begun in November 2016 and spanning 700 commits.
It softens many rough edges in the original workflow, and offers a fresh way of working with English text in a casual and liberal way.
basically:
nlp(myText).mySubset().subsetFn().out(myOutput)
the idea is to make it simple to 1) reach in, 2) make a change, and 3) output the result.
// give it your arbitrary text
var r = nlp(`Finally, the api is stable.`)
//grab a subset and make a transformation..
r.nouns().toUpperCase()
//call a subset-specific method
r.sentences().toExclamation()
//output the new thing as whatever
r.out('text')
//"Finally, the API is stable!"
- it's now simply called compromise (Thanks Joshua!)
- all methods now work with terms, instead of a one-off term
- no more looping!
- includes a clever regex-like matching scheme for grammatical patterns
- easy-access to common text treatments (contractions, punctuation, etc)
- one universal input, consistently tagged + parsed
- smarter dependent/consistent/conflicting POS-tag logic
demands less working knowledge of internals + grammar 💥
no longer fusses with lumping/splitting of neighbouring terms 💥
more playful and 'bottom up' design 💥
easier matching of ad-hoc templates 💥
cuter debugging and traceable decision-making 💥
npm install compromise
Instead of single Term objects having the methods & tooling, the library now hoists all this functionality to the main API, so you can filter down, act upon, and inspect any list of terms, just as easily as acting on a single term.
( ie. one word is now just a list of words, of length 1. )
This way, you can work on arbitrary text without arbitrary compromise choices getting in the way:
r= nlp('singing').verbs().toPastTense()
// sang
r= nlp('would have been singing').verbs().toPastTense()
// would have sang
r= nlp('john is singing. Sara was singing.').verbs().toPastTense().out('array')
//[is, was]
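To make the "one word is just a list of length 1" idea concrete, here is a toy sketch. `TermList` is a hypothetical class written for this illustration, not compromise's own code, and the tags are assigned by hand because the sketch has no tagger.

```javascript
// Toy illustration only: TermList is hypothetical, not part of compromise.
class TermList {
  constructor(terms) {
    this.terms = terms; // array of { text, tag } objects
  }
  // filter down to a subset; the result is still a TermList
  nouns() {
    return new TermList(this.terms.filter((t) => t.tag === 'Noun'));
  }
  // act on every term in the list, whether it holds one term or many
  toUpperCase() {
    this.terms.forEach((t) => { t.text = t.text.toUpperCase(); });
    return this;
  }
  // output the current list as plain text
  out() {
    return this.terms.map((t) => t.text).join(' ');
  }
}

// hand-tagged input (an assumption of this sketch)
const doc = new TermList([
  { text: 'the', tag: 'Determiner' },
  { text: 'api', tag: 'Noun' },
  { text: 'is', tag: 'Verb' },
  { text: 'stable', tag: 'Adjective' },
]);

// the subset shares term objects with its parent, so the change shows up in both
doc.nouns().toUpperCase();
const whole = doc.out();

// a single word goes through exactly the same methods
const single = new TermList([{ text: 'api', tag: 'Noun' }]).toUpperCase().out();

console.log(whole);  // the API is stable
console.log(single); // API
```

Because every method returns a list, the same chain works whether the subset holds one term or fifty, which is why no looping is needed on the caller's side.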
## no more nlp.person(), nlp.value()...
every input is now POS-tagged, and each sequence is supplied with the appropriate methods.
let r= nlp('five years old')
r.values().toNumber()
r.out('text')
// '5 years old'
if you don't trust this, you can coerce the POS:
nlp('john is cool').tagAs('Noun').nouns().toPlural().out('text')
//john is cools
## Match/subset-lookup .match()
see match syntax
nlp('john is cool and jane is nice').match('#Person is').out('array')
//[ 'john is', 'jane is']
more functionality:
nlp('john is cool and jane is nice').not('#Person is').out('array')
//[ 'cool', 'nice']
nlp('john is cool and jane is nice').matchOne('#Person is').out('array')
//[ 'john is']
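For intuition about what a pattern like '#Person is' is doing, here is a rough sketch of tag-aware matching over hand-tagged terms. The helpers `tokenMatches` and `matchAll` are hypothetical names invented for this example; compromise's real matcher supports far more syntax and is not implemented this way as far as this document says.

```javascript
// Toy illustration only: these helpers are hypothetical, not compromise's matcher.
// A pattern token starting with '#' tests the term's tag; anything else tests its text.
function tokenMatches(term, token) {
  if (token.startsWith('#')) {
    return term.tag === token.slice(1);
  }
  return term.text.toLowerCase() === token.toLowerCase();
}

// Slide the pattern across the terms and collect every non-overlapping hit.
function matchAll(terms, pattern) {
  const tokens = pattern.split(' ');
  const found = [];
  let i = 0;
  while (i + tokens.length <= terms.length) {
    const hit = tokens.every((tok, j) => tokenMatches(terms[i + j], tok));
    if (hit) {
      found.push(terms.slice(i, i + tokens.length).map((t) => t.text).join(' '));
      i += tokens.length; // skip past the matched terms
    } else {
      i += 1;
    }
  }
  return found;
}

// hand-tagged input (an assumption of this sketch)
const terms = [
  { text: 'john', tag: 'Person' },
  { text: 'is', tag: 'Verb' },
  { text: 'cool', tag: 'Adjective' },
  { text: 'and', tag: 'Conjunction' },
  { text: 'jane', tag: 'Person' },
  { text: 'is', tag: 'Verb' },
  { text: 'nice', tag: 'Adjective' },
];

const all = matchAll(terms, '#Person is'); // every hit
const first = all.slice(0, 1);             // only the first hit

console.log(all);   // [ 'john is', 'jane is' ]
console.log(first); // [ 'john is' ]
```

The point of the sketch is only that a pattern mixes two kinds of test, tag and literal word, and that the result is again a list of term sequences you can keep working on.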
nlp('John is cool').out('normal');
nlp('John is cool').out('text');
nlp('John is cool').out('html');
//also allows a cleaner, less-crowded result
nlp('John is cool').out('json');
//and adhoc-scripting
nlp('John is cool').out(myFunction);
to see all the new features, see compromise.cool/demos
a huge thank you to our 45! contributors to the work.
for low-hanging fruit, check out our todo list